
RTX 4090 versus A100 for large language model fine-tuning?

0
Topic starter

Can someone tell me if I should buy an RTX 4090 or try to rent an A100 if I want to fine-tune one of those big language models? Sorry if this is a really basic question but I am totally lost with all these numbers and letters lol. I want to build a specialized bot for my plant nursery here in Seattle to help customers with care instructions, and I have about 1800 to 2000 dollars saved up for a new computer this month.

I saw the 4090 in a shop and it looks huge, but then I read online that the A100 has way more memory or something? I don't really get what VRAM is or why it matters so much for the training part. If I buy the 4090, will it just break if I try to run a big model on it? Or is the A100 only for like giant companies, because it seems really expensive to rent hourly?

I just want something that won't take like a week to learn my data sets. My friend said I might need two cards but I definitely can't afford that right now. Just trying to figure out the smartest way to spend my money before I go and buy the wrong thing...


7 Answers
11

I remember picking up an ASUS ROG Strix GeForce RTX 4090 24GB last year for a similar project. Honestly, it was a massive letdown.

  • Memory: 24GB fills up almost instantly once you start full fine-tuning, because the gradients and optimizer states need several times as much memory as the weights themselves.
  • Stability: Unfortunately, my home rig crashed constantly. I finally rented an NVIDIA A100 80GB SXM4 and it finished the training in hours while the local card just choked.
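The "fills up instantly" part is easy to sanity-check with back-of-the-envelope math. Using the usual mixed-precision AdamW rule of thumb (~16 bytes per parameter: fp16 weights + fp16 gradients + fp32 master weights + two fp32 Adam moments), a quick sketch:

```python
# Rough VRAM floor for FULL fine-tuning with mixed precision + AdamW.
# Per parameter: 2 B weights (fp16) + 2 B gradients (fp16)
#              + 4 B fp32 master copy + 8 B Adam moments (m, v) = 16 B.
# Activations come on top of this, so real usage is even higher.

def full_finetune_vram_gb(n_params_billion: float, bytes_per_param: int = 16) -> float:
    """Lower-bound VRAM in GB for full fine-tuning, ignoring activations."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

for size in (7, 13):
    print(f"{size}B model: ~{full_finetune_vram_gb(size):.0f} GB just for weights/grads/optimizer")
```

A 7B model already needs on the order of 112 GB this way, which is why full fine-tuning chokes a 24GB card and why people drop to parameter-efficient methods instead.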


10

^ This. Also, if you're really trying to stretch that 2000 dollar budget, buying a brand new NVIDIA GeForce RTX 4090 24GB is basically the fastest way to go broke. It's a beast for gaming, but for LLMs, you're paying a massive premium for speed you might not fully use if the memory bandwidth is the bottleneck anyway.

  • Honestly, look for a used NVIDIA GeForce RTX 3090 24GB GDDR6X. You get the same 24GB VRAM as the 4090 but it costs way less, usually under a grand these days.
  • Put the money you save toward cloud rentals. You can grab an NVIDIA A100 80GB PCIe for about $1.50 an hour on some sites, which is way more efficient for the heavy lifting.
  • Since it's just a plant nursery bot, you can probably use QLoRA to fit a 7B or 13B model into 24GB VRAM easily. Local builds are fun for tinkering but renting is the smartest way to keep your costs down for a specific project like this. Don't dump your whole savings into a card that might be overkill for a bot about ferns and potting soil.
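For reference, a QLoRA setup along those lines is mostly configuration. Here's a sketch using the Hugging Face transformers/peft/bitsandbytes stack — the model name and LoRA hyperparameters are just placeholder assumptions to tune for your own data, not a recommendation:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder: any ~7B causal LM

# 4-bit NF4 quantization: a 7B base model drops to roughly 4 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Small trainable LoRA adapters on the attention projections;
# r/alpha/dropout here are common starting values, not tuned ones.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```

The frozen base model sits in 4-bit, and only the tiny adapters get gradients and optimizer states, which is how a 7B or 13B model ends up fitting in 24GB.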


3

Honestly, I tried building my own rig for this and it was such a letdown. My setup kept crashing because the memory wasn't enough for the models I wanted. Unfortunately, consumer gear feels really limited for this.

  • I had major heat issues
  • It took way too long
  • Software was a headache

Renting is probably safer so you don't waste your savings on hardware that might fail you.


3

Ngl I'm kinda worried about those 12VHPWR connectors melting during 24/7 training. There were widely reported cases of 4090 power connectors melting, often traced to cables that weren't fully seated... definitely watch your temps and power draw if you go local.
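If you do train locally, logging temps and power is cheap insurance. A minimal watchdog sketch that parses one line of `nvidia-smi --query-gpu=temperature.gpu,power.draw --format=csv,noheader,nounits` output — the 85 °C / 450 W thresholds are just example limits I picked, not official numbers:

```python
def check_gpu_readings(csv_line: str,
                       max_temp_c: float = 85.0,
                       max_power_w: float = 450.0) -> list[str]:
    """Parse one 'temperature, power' CSV line and return any warnings."""
    temp_s, power_s = (field.strip() for field in csv_line.split(","))
    temp, power = float(temp_s), float(power_s)
    warnings = []
    if temp > max_temp_c:
        warnings.append(f"temperature {temp:.0f} C over {max_temp_c:.0f} C limit")
    if power > max_power_w:
        warnings.append(f"power draw {power:.0f} W over {max_power_w:.0f} W limit")
    return warnings

# Hypothetical readings:
print(check_gpu_readings("71, 380.2"))   # []
print(check_gpu_readings("92, 465.8"))   # both limits tripped
```

In practice you'd call `nvidia-smi` with `subprocess` on a timer and feed each line through this, alerting or pausing training when it returns warnings.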


2

I spent way too much on a local build and it was a massive disappointment.

  • Memory bandwidth is the killer. The NVIDIA GeForce RTX 4090 has a much narrower memory bus (384-bit GDDR6X) than the HBM on enterprise gear, so big models crawl.
  • Scaling is a mess. Consumer cards lack the NVLink interconnects found in an NVIDIA A100 Tensor Core GPU, making multi-card setups almost useless for training. Honestly, consumer hardware just isn't designed for this kind of throughput.
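The bandwidth gap is worth putting numbers on. For memory-bound generation, a common rule of thumb is tokens/s ≈ memory bandwidth ÷ bytes of weights read per token. A sketch using the published bandwidth specs (1008 GB/s for the 4090's GDDR6X, 2039 GB/s for the A100 80GB SXM's HBM2e) — this is a rough ceiling, not a benchmark:

```python
def max_tokens_per_sec(bandwidth_gb_s: float,
                       n_params_billion: float,
                       bytes_per_param: int = 2) -> float:
    """Bandwidth-bound upper limit: every token reads all weights once."""
    model_gb = n_params_billion * bytes_per_param  # fp16 = 2 B/param
    return bandwidth_gb_s / model_gb

# 7B model in fp16 (14 GB of weights):
print(f"RTX 4090 (1008 GB/s):     ~{max_tokens_per_sec(1008, 7):.0f} tok/s ceiling")
print(f"A100 80GB SXM (2039 GB/s): ~{max_tokens_per_sec(2039, 7):.0f} tok/s ceiling")
```

Roughly a 2x gap before you even touch interconnects or batch size, which lines up with the "crawls" experience above.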


2

Yep, this is the way


1

tl;dr: I have been stuck with this exact same dilemma for months now and it is honestly driving me crazy. I keep looking for a clear answer, but everywhere I turn there is just more conflicting info. Tbh I have been building PCs for a long time, but this whole LLM hardware requirement thing is a total mess to figure out... I keep looking at my savings, then looking at the 4090, then getting scared I will just be wasting money if the memory isn't enough for what I want to do. It is so frustrating because you want to own your hardware, but the risk of it being totally useless for training in a few months is just sitting there in the back of my mind. Still looking for a real answer myself.

