Hey everyone! I'm currently looking to build a dedicated rig for local LLM inference and I'm stuck in a bit of a rabbit hole. I'm mainly looking to run models like Llama 3 or Mixtral, and I can't decide if the RTX 4090 is the way to go or if I should look into a used A100. On one hand, the 4090 is a beast in terms of clock speeds, and it's much easier to cool in a standard desktop setup. On the other hand, the 24GB VRAM limit on the 4090 feels like a massive bottleneck compared to the 40GB or 80GB on an A100, especially for running larger 70B-parameter models at higher quantizations without heavy offloading.
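For reference, here's the back-of-envelope math I've been doing on weight sizes (the bits-per-weight figures are approximate llama.cpp quant averages, so treat this as a sketch, not gospel):

```python
# Back-of-envelope weight footprint for a dense model at a given quantization.
# bits-per-weight values are approximate llama.cpp quant averages, not exact.

def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    # 1e9 params * (bits / 8) bytes each -> GB
    return params_billions * bits_per_weight / 8

for quant, bpw in [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q5_K_M", 5.7), ("Q4_K_M", 4.8)]:
    print(f"70B @ {quant}: ~{weight_vram_gb(70, bpw):.0f} GB of weights")

# 70B @ Q8_0: ~74 GB, Q6_K: ~58 GB, Q5_K_M: ~50 GB, Q4_K_M: ~42 GB;
# even Q4_K_M doesn't fit in 24GB without offloading.
```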
I'm also curious about memory bandwidth: I've heard the A100's HBM2e is substantially faster than the GDDR6X on the 4090, which should translate directly into more tokens per second for inference. Since the A100 is significantly more expensive, I'm wondering if the performance gain actually justifies the price jump for a home setup. Is the 4090 fast enough to make up for the smaller memory pool, or will I regret not having the extra VRAM for longer context windows? For those who have tested both, which one provides the smoother experience for daily local LLM use?
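And this is the first-order tokens-per-second estimate I keep seeing cited; it assumes batch-1 decode is memory-bandwidth-bound and uses published bandwidth specs, so take it as a ceiling rather than a benchmark:

```python
# First-order decode-speed ceiling: at batch size 1, generating each token
# streams essentially all the weights from VRAM once, so tok/s is bounded by
# bandwidth / weight bytes. Bandwidth numbers below are published specs.

WEIGHTS_GB = 42  # ~70B at Q4_K_M, from the estimate above

for gpu, bw_gbps in [("RTX 4090 (GDDR6X)", 1008), ("A100 80GB PCIe (HBM2e)", 1935)]:
    print(f"{gpu}: ~{bw_gbps / WEIGHTS_GB:.0f} tok/s ceiling")

# RTX 4090: ~24 tok/s, A100 80GB PCIe: ~46 tok/s (if the weights fit at all)
```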
I would suggest two used NVIDIA GeForce RTX 3090 24GB cards at ~$700 each. Market-wise, it's definitely more budget-friendly than a single RTX 4090 24GB for 70B inference!
In my experience, 24GB is basically a tease for 70B models. The VRAM bottleneck is real!
- The RTX 4090 is fast but capacity-limited.
- Buy two used RTX 3090 24GB cards at ~$700 each instead.
- That's 48GB of VRAM for way less than an A100 80GB PCIe.
It's way cheaper and you're going to have a MUCH better time; something like the sketch below is all the setup you need. peace
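Here's a rough sketch of sharding a 70B model across the pair with Hugging Face transformers; the model ID and the 22GiB caps are just example values, and the 4-bit path assumes bitsandbytes is installed:

```python
# Hypothetical sketch: shard a 70B model across two 24GB cards with
# transformers + bitsandbytes 4-bit. Per-card caps sit below 24GB on
# purpose, to leave headroom for the KV cache and activations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # example; gated, needs access

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",                    # splits layers across both GPUs
    max_memory={0: "22GiB", 1: "22GiB"},  # per-card cap, leaves headroom
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Two 3090s vs one A100:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```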
Seconding the recommendation above. I also wanted to add that enterprise gear can be a REAL headache for home setups.
- Stick with the consumer GeForce line from NVIDIA because it's way more plug-and-play.
- Be careful with used enterprise stuff: cards like the A100 PCIe are passively cooled and expect server-chassis airflow, so cooling them in a desktop is basically a nightmare lol.
Honestly, I think you should just get multiple consumer cards from NVIDIA instead of one massive enterprise unit. gl!
For your situation, the RTX 4090 24GB is the safer pick over an A100 80GB PCIe. The A100 has the VRAM, but cooling it in a desktop case is hard and risky. The 4090 is reliable and has worked well for me, so stick with it!
> the 24GB VRAM limit on the 4090 feels like a massive bottleneck
Consumer vs. enterprise in a nutshell: consumer cards are fast but too small, while enterprise cards have the VRAM but run hot and pricey. Not sure about your exact workload, but I think you'll regret 24GB for 70B models... for me it was a total disappointment.
In my experience, you should prioritize total VRAM capacity over raw bandwidth or clock speeds. The NVIDIA A100 80GB PCIe has superior HBM2e throughput, but its cost-per-GB is hard to justify for a local setup. The RTX 4090 24GB is fast, yet you'll hit a hard wall with 70B models: once the weights don't fit, you're offloading layers to system RAM and throughput craters, so for large quants having enough memory buys a smoother experience than faster cores ever will.
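To put numbers on the context-window side of this: the KV cache grows linearly with context and comes out of the same VRAM budget as the weights. A quick sketch using Llama-3-70B's published shape (80 layers, 8 KV heads via GQA, head dim 128), assuming an fp16 cache:

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes
# per token. Defaults below are Llama-3-70B's published config; fp16 cache.

def kv_cache_gb(ctx_len, layers=80, kv_heads=8, head_dim=128, bytes_per=2):
    return 2 * layers * kv_heads * head_dim * bytes_per * ctx_len / 1e9

for ctx in (4096, 8192, 32768):
    print(f"context {ctx}: ~{kv_cache_gb(ctx):.1f} GB of KV cache")

# context 4096: ~1.3 GB, 8192: ~2.7 GB, 32768: ~10.7 GB on top of the weights
```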
Honestly, I have been looking at this for my own budget build, and one thing I didn't see mentioned much is physical fitment and power requirements. If you go for the NVIDIA GeForce RTX 4090, you really have to check your case dimensions because those cards are huge compared to even the older 30 series. You might end up needing a whole new case or a beefier power supply, which just adds to the bill.
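For the power side, a rough sanity check: the 450W TDP is NVIDIA's spec, the CPU and peripheral draws are my assumptions, and the headroom factor is there because 4090 transients can briefly spike above TDP.

```python
# Rough PSU sizing for a single-4090 build. Only the 450W TDP is a published
# spec; the CPU and peripheral figures are assumed round numbers.
components_watts = {
    "RTX 4090 (450W TDP)": 450,
    "CPU under load (assumed)": 150,
    "board/RAM/SSDs/fans (assumed)": 75,
}
steady = sum(components_watts.values())
print(f"steady load ~{steady} W; with ~50% headroom, shop for a ~{steady * 1.5:.0f} W PSU")
# steady load ~675 W; ~1012 W, so a quality 1000W unit is the usual advice
```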