I’ve been diving deep into the world of open-source AI lately and really want to start running models like Llama 3 or Mistral locally rather than relying on cloud APIs. I know VRAM is the most critical factor, but I’m a bit stuck on which direction to go. I’m currently torn between hunting for a used RTX 3090 to get that 24GB of memory or just investing in a brand-new 4090 for the better speed and power efficiency. Is the extra performance of the 40-series actually noticeable for inference, or should I just prioritize the cheapest way to get the most VRAM? What’s the current sweet spot for a smooth experience with 70B models?
I've been running local models for a few years now, and honestly it really does come down to VRAM. The model weights have to sit in GPU memory, so capacity matters more than raw chip speed for most people. I started with a used NVIDIA GeForce RTX 3090 24GB and I'm still happy with it. The NVIDIA GeForce RTX 4090 24GB is definitely faster, but for just chatting with models like Llama 3, you probably won't notice the extra speed enough to justify the massive price jump. For the big 70B models, keep in mind that even at 4-bit the weights come to roughly 35-40 GB, so on a single 24 GB card you're looking at either a more aggressive ~2-3 bit quant or offloading some layers to system RAM. It's tight, but it works. If you have the cash, the 40-series is "better," but the 3090 is still the best value pick right now. Good luck!
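In case it helps, here's roughly what 4-bit loading looks like in Python with transformers + bitsandbytes. Just a sketch, not gospel: the model ID is only an example, swap in whatever checkpoint you actually run, and the exact flags depend on your setup.

```python
# Rough sketch of 4-bit inference with transformers + bitsandbytes.
# Back-of-envelope VRAM math: params * 0.5 bytes at 4-bit, so an 8B model
# is ~4 GB of weights, while a 70B is ~35 GB (won't fully fit in 24 GB).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example checkpoint only

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",             # NF4 is the usual choice for inference
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spills layers to CPU RAM if the GPU runs out
)

prompt = "Explain in one sentence why VRAM matters for local LLM inference."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

On a single 3090, an 8B-class model at 4-bit is very comfortable; for a 70B, device_map="auto" will spill layers to system RAM, which is exactly where things slow down.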
Basically, grab a used NVIDIA GeForce RTX 3090 24GB:
- Way cheaper
- Same 24GB of VRAM

Is the NVIDIA GeForce RTX 4090 24GB's extra speed even worth the cash? Probably not for inference.
Solid advice 👍
Nice, I didn't know that.
Just wanted to say thanks for everyone chiming in. Super helpful discussion.