Hey everyone! I'm currently looking to build a dedicated rig for local LLM inference and I'm stuck in a bit of a rabbit hole. I'm mainly looking to run models like Llama 3 or Mixtral, and I can't decide if the RTX 4090 is the way to go or if I should look into a used A100. On one hand, the 4090 is a beast in terms of clock speeds, and it's much easier to cool in a standard desktop setup. On the other hand, the 24GB VRAM limit on the 4090 feels like a massive bottleneck compared to the 40GB or 80GB on an A100, especially for running larger 70B-parameter models at higher quantizations without heavy offloading.
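For reference, here's the back-of-envelope math I've been doing on weight sizes (the bits-per-weight figures are approximate llama.cpp quant averages, so treat this as a sketch, not gospel):

```python
# Back-of-envelope weight footprint for a dense model at a given quantization.
# bits-per-weight values are approximate llama.cpp quant averages, not exact.

def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    # 1e9 params * (bits / 8) bytes each -> GB
    return params_billions * bits_per_weight / 8

for quant, bpw in [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q5_K_M", 5.7), ("Q4_K_M", 4.8)]:
    print(f"70B @ {quant}: ~{weight_vram_gb(70, bpw):.0f} GB of weights")

# 70B @ Q8_0: ~74 GB, Q6_K: ~58 GB, Q5_K_M: ~50 GB, Q4_K_M: ~42 GB;
# even Q4_K_M doesn't fit in 24GB without offloading.
```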
I'm also curious about memory bandwidth: I've heard the A100's HBM2e is substantially faster than the GDDR6X on the 4090, which should translate directly into more tokens per second for inference. Since the A100 is significantly more expensive, I'm wondering if the performance gain actually justifies the price jump for a home setup. Is the 4090 fast enough to make up for the smaller memory pool, or will I regret not having the extra VRAM for longer context windows? For those who have tested both, which one provides the smoother experience for daily local LLM use?
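And this is the first-order tokens-per-second estimate I keep seeing cited; it assumes batch-1 decode is memory-bandwidth-bound and uses published bandwidth specs, so take it as a ceiling rather than a benchmark:

```python
# First-order decode-speed ceiling: at batch size 1, generating each token
# streams essentially all the weights from VRAM once, so tok/s is bounded by
# bandwidth / weight bytes. Bandwidth numbers below are published specs.

WEIGHTS_GB = 42  # ~70B at Q4_K_M, from the estimate above

for gpu, bw_gbps in [("RTX 4090 (GDDR6X)", 1008), ("A100 80GB PCIe (HBM2e)", 1935)]:
    print(f"{gpu}: ~{bw_gbps / WEIGHTS_GB:.0f} tok/s ceiling")

# RTX 4090: ~24 tok/s, A100 80GB PCIe: ~46 tok/s (if the weights fit at all)
```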
I would suggest two used NVIDIA GeForce RTX 3090 24GB cards at ~$700 each. Market-wise, it's definitely more budget-friendly than a single RTX 4090 24GB for 70B inference!
In my experience, 24GB is basically a tease for 70B models. The VRAM bottleneck is real!
- The RTX 4090 is fast but capacity-limited.
- Buy two used RTX 3090 24GB cards at ~$700 each instead.
- That's 48GB of VRAM for way less than an A100 80GB PCIe.
It's way cheaper and you're going to have a MUCH better time; something like the sketch below is all the setup you need. peace
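Here's a rough sketch of sharding a 70B model across the pair with Hugging Face transformers; the model ID and the 22GiB caps are just example values, and the 4-bit path assumes bitsandbytes is installed:

```python
# Hypothetical sketch: shard a 70B model across two 24GB cards with
# transformers + bitsandbytes 4-bit. Per-card caps sit below 24GB on
# purpose, to leave headroom for the KV cache and activations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # example; gated, needs access

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",                    # splits layers across both GPUs
    max_memory={0: "22GiB", 1: "22GiB"},  # per-card cap, leaves headroom
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Two 3090s vs one A100:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```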
Seconding the recommendation above. I also wanted to add that enterprise gear can be a REAL headache for home setups.
- Stick with the consumer GeForce line from NVIDIA because it's way more plug-and-play.
- Be careful with used enterprise stuff: cards like the A100 PCIe are passively cooled and expect server-chassis airflow, so cooling them in a desktop is basically a nightmare lol.
Honestly, I think you should just get multiple consumer cards from NVIDIA instead of one massive enterprise unit. gl!
For your situation, the RTX 4090 24GB is the safer pick over an A100 80GB PCIe. The A100 has the VRAM, but cooling it in a desktop case is hard and risky. The 4090 is reliable and has worked well for me, so stick with it!
> the 24GB VRAM limit on the 4090 feels like a massive bottleneck
Consumer vs. enterprise in a nutshell: consumer cards are fast but too small, while enterprise cards have the VRAM but run hot and pricey. Not sure about your exact workload, but I think you'll regret 24GB for 70B models... for me it was a total disappointment.
In my experience, you should prioritize total VRAM capacity over raw bandwidth or clock speeds. The NVIDIA A100 80GB PCIe has superior HBM2e throughput, but its cost-per-GB is hard to justify for a local setup. The RTX 4090 24GB is fast, yet you'll hit a hard wall with 70B models: once the weights don't fit, you're offloading layers to system RAM and throughput craters, so for large quants having enough memory buys a smoother experience than faster cores ever will.
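To put numbers on the context-window side of this: the KV cache grows linearly with context and comes out of the same VRAM budget as the weights. A quick sketch using Llama-3-70B's published shape (80 layers, 8 KV heads via GQA, head dim 128), assuming an fp16 cache:

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes
# per token. Defaults below are Llama-3-70B's published config; fp16 cache.

def kv_cache_gb(ctx_len, layers=80, kv_heads=8, head_dim=128, bytes_per=2):
    return 2 * layers * kv_heads * head_dim * bytes_per * ctx_len / 1e9

for ctx in (4096, 8192, 32768):
    print(f"context {ctx}: ~{kv_cache_gb(ctx):.1f} GB of KV cache")

# context 4096: ~1.3 GB, 8192: ~2.7 GB, 32768: ~10.7 GB on top of the weights
```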
Honestly, I have been looking at this for my own budget build, and one thing I didn't see mentioned much is physical fitment and power requirements. If you go for the NVIDIA GeForce RTX 4090, you really have to check your case dimensions because those cards are huge compared to even the older 30 series. You might end up needing a whole new case or a beefier power supply, which just adds to the bill.
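For the power side, a rough sanity check: the 450W TDP is NVIDIA's spec, the CPU and peripheral draws are my assumptions, and the headroom factor is there because 4090 transients can briefly spike above TDP.

```python
# Rough PSU sizing for a single-4090 build. Only the 450W TDP is a published
# spec; the CPU and peripheral figures are assumed round numbers.
components_watts = {
    "RTX 4090 (450W TDP)": 450,
    "CPU under load (assumed)": 150,
    "board/RAM/SSDs/fans (assumed)": 75,
}
steady = sum(components_watts.values())
print(f"steady load ~{steady} W; with ~50% headroom, shop for a ~{steady * 1.5:.0f} W PSU")
# steady load ~675 W; ~1012 W, so a quality 1000W unit is the usual advice
```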