I'm currently planning a build for local LLM inference and Stable Diffusion, and I'm torn between two paths: splurge on a single RTX 4090, or bridge two used RTX 3090s for 48GB of combined VRAM. I'm worried about the power consumption and heat of a dual-card setup, but the extra memory seems crucial for running larger models. Does NVLink actually make a difference here, or is single-card speed the better trade-off? I'd love to hear from anyone who has tested both configurations. Which setup provides better long-term value for a home AI lab?
I went through this last year. Honestly, I was super nervous about the stability of a multi-GPU rig, so I took a very cautious approach. I ended up grabbing two used NVIDIA GeForce RTX 3090 24GB cards because I just couldn't ignore that 48GB of total VRAM for running the big 70B models.
1. Dual NVIDIA GeForce RTX 3090 24GB: The memory is basically a cheat code, but man, the heat is REAL. I had to get a high-end Corsair AX1600i 1600W Digital ATX Power Supply just to feel safe about the transient power spikes. I also use an NVIDIA GeForce RTX NVLink Bridge (4-slot), which helps a bit with P2P transfers (quick check at the end of this post), but it's really about that VRAM headroom.
2. Single NVIDIA GeForce RTX 4090 24GB: I tested one from a buddy, and it's definitely faster for single-stream generation and stays way cooler. But as soon as I tried a model that needed 30GB+, it either wouldn't run at all or needed heavy quantization to fit.
So far I'm pretty satisfied with the dual setup because it actually works well for my specific research, but you've gotta be careful with the cooling... I basically have a desk fan pointed at my case lol. Not sure I'd recommend it if you want a quiet room tho! 👍
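btw, if you end up with the bridge and want to sanity-check that P2P is actually working, here's the quick PyTorch check I mentioned above (assumes a working CUDA + PyTorch install, nothing specific to my rig):

```python
import torch

# Prints whether each GPU pair can do peer-to-peer (P2P) access,
# which is what the NVLink bridge enables: direct card-to-card
# transfers instead of bouncing through system RAM.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: P2P {'enabled' if ok else 'unavailable'}")
```

you can also just run `nvidia-smi topo -m` and look for NV# entries between the two cards instead of PHB/SYS.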
ngl, if you're looking at pure value, the dual NVIDIA GeForce RTX 3090 24GB setup is basically unbeatable for home labs right now. Getting 48GB of VRAM for under $1500 (used prices) is the only practical way to run those heavy 70B models without constant slowdowns from offloading to system RAM. A single NVIDIA GeForce RTX 4090 24GB is way faster for Stable Diffusion, but that 24GB wall is real... once you hit it, you're stuck. NVLink doesn't actually pool the memory for most LLM loaders anyway; it just speeds up card-to-card transfers, and only on architectures that support it (the 3090's Ampere does, the 4090's Ada dropped NVLink entirely). Just make sure your case can breathe, cuz those 3090s get hot af.
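for reference, this is roughly what "running a 70B on 2x24GB" looks like in practice. A minimal sketch assuming the Hugging Face Transformers + accelerate + bitsandbytes stack; the model ID is just an example, swap in whatever you actually run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Example 70B-class model ID (assumption, use your own).
model_id = "meta-llama/Llama-2-70b-hf"

# 4-bit weights: a 70B model lands around 35-40GB, which fits
# across two 24GB cards but not on a single one.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# device_map="auto" shards the layers across both GPUs; the memory
# is never pooled, each card just holds its own slice of the model.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
)
```

point is: it's layer splitting, not memory pooling, which is why NVLink matters less than people think for plain inference.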
Respectfully, I'd consider another option. Ngl, dual 3090s can be a real fire risk if your wiring and PSU aren't up to the sustained load. I'd definitely stick with one card to keep things safe and actually cool lol.
Can vouch for this
Honestly, everyone gets hung up on the VRAM capacity (which is valid), but you should really look at the performance delta in raw throughput. In my benchmarks, the 4090 absolutely smokes a dual 3090 setup on Stable Diffusion XL because of the improved architecture and better support for newer kernels. If you're mostly doing image gen, the single card's speed makes the whole workflow feel way more responsive. But for LLMs, it's a trade-off between scale and latency: even with a decent motherboard, splitting a model across two cards adds overhead that you don't see on a single-die setup. Are you prioritizing being able to run the absolute biggest 70B models at any cost, or are you looking for snappy, high tokens-per-second performance on slightly smaller, highly optimized models? Also, are you planning on doing any local fine-tuning, or is this strictly for inference?
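If you want numbers on your own hardware rather than taking my word for it, here's the rough loop I use. A sketch assuming the Hugging Face diffusers library; the prompt and step count are arbitrary:

```python
import time
import torch
from diffusers import StableDiffusionXLPipeline

# Single-GPU SDXL throughput check. The absolute numbers don't
# matter much; run identical settings on both setups and compare.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

pipe("warmup", num_inference_steps=5)  # warm the kernels first

runs = 3
start = time.perf_counter()
for _ in range(runs):
    pipe("a photo of a home AI lab", num_inference_steps=30)
print(f"avg: {(time.perf_counter() - start) / runs:.1f}s per image")
```

Run the same settings on both configurations and the gap will be obvious either way.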
Works great for me
Yo! I feel you on this. Been down the rabbit hole of multi-GPU setups myself and honestly... it's a bit of a headache but also kinda awesome? I tried running two 3090s last year because I reallyyy wanted to run those huge 70B models locally. The 48GB of VRAM is a game changer for that, but my room felt like a sauna and I had to upgrade my PSU because it kept tripping. NVLink is cool, but for most LLM stuff it doesn't matter as much as just having the raw VRAM capacity.
But before you pull the trigger, I gotta ask: what kind of case and cooling are you working with? Also, are you mostly focused on just running models, or do you plan on doing a lot of training/finetuning? That 4090 is way faster for single-card tasks, but if you wanna load big weights, the 3090s are tempting. gl!
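oh, and if you do go dual, keep an eye on temps and power under sustained load. this is what I watch mine with; a sketch using the nvidia-ml-py bindings (pip install nvidia-ml-py), poll rate is arbitrary:

```python
import time
import pynvml  # from the nvidia-ml-py package

# Polls temperature and power draw for every detected GPU once a second.
pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        readings = []
        for i, h in enumerate(handles):
            temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
            watts = pynvml.nvmlDeviceGetPowerUsage(h) / 1000  # NVML reports mW
            readings.append(f"GPU {i}: {temp}C / {watts:.0f}W")
        print(" | ".join(readings))
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

if the cards are sitting in the high 80s C under load, fix airflow before anything else.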