Which NVIDIA GPU offers the most VRAM for large language models?

6 Posts
7 Users
0 Reactions
24 Views
0
Topic starter

Hey everyone! I’ve been diving deep into the world of local large language models lately, specifically experimenting with Llama 3 and some of the larger Mixtral variants. However, I’ve quickly realized that my current setup just isn't cutting it. As many of you know, VRAM is the absolute bottleneck when it comes to loading these massive parameter counts without everything slowing to a crawl on system RAM.

I’m trying to figure out which NVIDIA card—or combination of cards—will give me the most breathing room. I know the RTX 3090 and 4090 are the go-to consumer choices with 24GB, but I’m starting to look at professional options like the RTX A6000 or even older workstation cards with 48GB to avoid having to run heavily compressed quantizations. I’ve even seen some people talk about using multiple GPUs via NVLink or just splitting the layers across two cards, but I'm worried about the complexity and power draw.

My main goal is to run 70B models smoothly at home. Given the massive price jumps between the gaming line and the workstation-grade GPUs, I'm feeling a bit overwhelmed and confused about the best path forward. Should I hunt for a used high-VRAM pro card, or is there a specific NVIDIA model I'm overlooking that offers the best 'VRAM per dollar' for AI work? What’s the maximum VRAM I can realistically get on a single card before hitting those crazy enterprise-only prices?


6 Answers
11

> My main goal is to run 70B models smoothly at home... What’s the maximum VRAM I can realistically get on a single card before hitting those crazy enterprise-only prices?

oh man, i totally get it. honestly, the jump from 24GB to 48GB is where the wallet really starts to scream lol. basically, if ur looking for the absolute most VRAM on a single card without going into the $10k+ territory, the NVIDIA RTX 6000 Ada Generation 48GB is the king, but even that is pricey. I actually went the route of hunting for a used NVIDIA RTX A6000 48GB (the older Ampere one) and it’s been a lifesaver for running Llama 3 70B at decent quants.
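To make the "decent quants" point concrete, here's a rough back-of-envelope sketch of how much VRAM a 70B model needs at different quantization levels. The flat overhead allowance for KV cache and CUDA context is my own ballpark assumption, not a measured number:

```python
# Rough VRAM estimate for a quantized model: weight storage plus a flat
# overhead allowance for KV cache and CUDA context. The overhead figure
# is a ballpark assumption, not a measurement.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 4.0) -> float:
    """Approximate VRAM (GB) needed to load a model at a given quantization."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{estimate_vram_gb(70, bits):.0f} GB")
```

By this estimate a 70B model only fits on a single 48GB card once you're down around 4-bit, which matches what people report with the A6000.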

But I gotta be a bit cautious here... if u go for these pro cards, make sure to check ur cooling. those blower fans are LOUD. Plus, you might want to consider the power draw; these things can get thirsty. if ur on a budget, the "VRAM per dollar" king is definitely the NVIDIA GeForce RTX 3090 24GB. I know u said ur worried about complexity, but honestly, putting two of those in one machine is usually the most cost-effective way to hit 48GB.

Just a heads up tho—running dual cards means u NEED a beefy power supply. I wouldn't trust anything less than an EVGA SuperNOVA 1600 P2 80+ PLATINUM 1600W if ur running two 3090s full tilt. seriously, be careful with the heat buildup in a standard case. anyway, i hope that helps a bit!! gl with the 70B models... they're a total game changer when u get them running right. peace.
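If it helps anyone size their PSU, here's the napkin math I use. All the wattages below are rough board-power-class assumptions (and the 1.3× transient multiplier is my own rule of thumb), so plug in your own numbers:

```python
# Back-of-envelope PSU sizing for a dual-GPU rig. All wattages are rough
# assumptions (board-power-class numbers), not measurements, and the
# transient multiplier is a rule-of-thumb safety margin.

def psu_headroom(psu_watts: int, draws: dict[str, int],
                 transient_factor: float = 1.3) -> float:
    """Return remaining wattage after allowing for transient spikes."""
    peak = sum(draws.values()) * transient_factor
    return psu_watts - peak

build = {
    "rtx_3090_a": 350,       # assumed stock board power
    "rtx_3090_b": 350,
    "cpu": 150,
    "rest_of_system": 100,
}
print(f"Headroom on a 1600 W unit: {psu_headroom(1600, build):.0f} W")
```

With two stock 3090s that still leaves a few hundred watts of margin on a 1600W unit, which is why I wouldn't go smaller.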


10

I went through this last year... snagged a used NVIDIA RTX A6000 48GB (Ampere) for about $2,800. Just be *really* careful with eBay and power draw tbh, it's risky but worth it!


4

> What’s the maximum VRAM I can realistically get on a single card

Honestly, maybe look at the NVIDIA Quadro RTX 8000 48GB, but basically don't forget to consult a pro about cooling and power safety first... it's amazing!! Does ur PSU handle it?


4

+1


4

Following


1

So I've been trying the multi-card thing for a few months and honestly... it's kind of a headache lol. I thought I'd just plug everything in and be good to go, but there are so many little things I didn't think about before diving in!!! Like, it's not just the VRAM you gotta worry about.

- The heat is actually insane if you have cards stacked together, my room literally feels like a sauna after an hour of testing.
- Wait no, the biggest issue was actually my case size—some of these high-VRAM cards are giant bricks and I had to basically move my whole build just to fit them.
- The noise from the fans is basically constant if you're running 70B models for a long time.

Basically just make sure you have way more cooling and physical space than you think you need before buying anything. I mean, do you even have enough PCIe slots for the proper spacing between cards? I'm still trying to figure out if it was even worth the noise...
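For anyone wondering what "splitting the layers across two cards" actually boils down to, here's a toy sketch of the greedy idea: walk the layers in order and spill to the next GPU once the current one's budget is full. Real frameworks (llama.cpp's GPU split, Hugging Face Accelerate's device maps) handle this for you; the layer sizes and per-card budgets below are made-up illustration numbers:

```python
# Toy sketch of splitting model layers across GPUs: assign layers in
# order, moving to the next GPU when the current one's budget is full.
# Layer sizes and per-GPU budgets are made-up illustration numbers.

def split_layers(layer_sizes_gb: list[float],
                 gpu_budgets_gb: list[float]) -> list[int]:
    """Return the GPU index assigned to each layer, in order."""
    assignment = []
    gpu, used = 0, 0.0
    for size in layer_sizes_gb:
        while gpu < len(gpu_budgets_gb) and used + size > gpu_budgets_gb[gpu]:
            gpu += 1          # current card is full, spill to the next one
            used = 0.0
        if gpu == len(gpu_budgets_gb):
            raise MemoryError("model does not fit across these GPUs")
        assignment.append(gpu)
        used += size
    return assignment

# 80 layers of ~0.5 GB each across two 24 GB cards,
# keeping ~4 GB spare per card for KV cache etc.
plan = split_layers([0.5] * 80, [20.0, 20.0])
print(plan.count(0), "layers on GPU 0,", plan.count(1), "on GPU 1")
```

The spacing/heat problems above are exactly why this is harder in practice than the algorithm looks on paper.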

