Price Per Token

Have the GB10 devices become the current "best value" for LLMs?

I want to buy some real hardware because I feel like I'm falling behind. 3090s are >$1,000 on eBay, and building out a server would be very expensive at current memory and storage prices. Macs are backordered for the next 5 months. I have no idea about the status of AMD or Intel products, but I don't want to fight driver and compatibility issues on top of trying to get models and harnesses running.

Are the GB10 variants the best value if you want to buy now? Or is it better to wait for the M5 releases in 2-4 months? That seems like forever in today's fast-moving environment.

9 comments

jacek2023·21d ago

I was considering a Spark as a second device next to my 3090s, but I have the impression these devices are SLOWER, not faster.

1 pt
hurdurdur7·21d ago

I would not buy a GB10. It can fit big models and processes prompts quite fast, but token generation is slow for big models because memory bandwidth is low. On that last point, the price-to-performance is just wrong.
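Napkin math makes the point (a minimal Python sketch; ~273 GB/s is the commonly cited GB10 bandwidth figure, and the model sizes are illustrative):

```python
# Decode is roughly bandwidth-bound: each generated token streams the active
# weights through memory once, so tokens/s <= bandwidth / bytes_per_token.
GB = 1e9

def tg_upper_bound(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Best-case decode speed, ignoring KV cache and activation traffic."""
    return (bandwidth_gb_s * GB) / (active_weights_gb * GB)

# ~273 GB/s is the commonly cited GB10 figure; model sizes are illustrative.
for name, size_gb in [("70B dense @ int4", 35.0), ("17B-active MoE @ int4", 8.5)]:
    print(f"{name}: <= {tg_upper_bound(273.0, size_gb):.0f} t/s")
```

A big dense model lands in single-digit t/s no matter how fast prompt processing is.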

I would wait for the M5 Ultra, or go for a multi-GPU rig.

1 pt

What would your suggestion be for a multi-GPU rig under 2,000 USD?

1 pt
Tyme4Trouble·21d ago

Under 2,000 USD? For anything other than inference you really are going to want 1:1 DRAM to VRAM, which in this economy is going to byte, even with DDR4.

1 pt

I have access to a hardware disposal dump from an MNC. They dump everything other than GPUs (sometimes one slips through), so I should be able to pull anything that isn't soldered to the machine. I already found two sets of DDR5-6000 (2x16GB). I'm trying to build from the junkyard, so the GPU is the only thing I'm trying to buy.

1 pt
Easy-Unit2087·21d ago

I think the GB10 has the best price/performance for local LLMs at 2 nodes (2x $3,400 Asus GX10 1TB, plus an $80 QSFP56 cable). Thanks to 200GbE, adding a second node nearly doubles both speed and the model size that will fit.

Two nodes run Intel/Qwen3.5-397B-A17B-int4-AutoRound at around 1,500 t/s PP and 30 t/s TG on vLLM, perfect for agentic work. They are also great at Stable Diffusion with ComfyUI (the worker node takes half of the images and half of the upscaling) and at fine-tuning small models.
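For anyone curious what that looks like in practice, here is a minimal sketch of the two-node vLLM launch (it assumes a Ray cluster already spans both Sparks over the 200GbE link; parameter names follow recent vLLM releases and may differ in yours):

```python
# Two-node vLLM sketch. Assumes Ray is already running across both nodes:
#   node 1: ray start --head
#   node 2: ray start --address=<node1-ip>:6379
# over the QSFP56 link. Flags are illustrative of recent vLLM releases.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Intel/Qwen3.5-397B-A17B-int4-AutoRound",
    pipeline_parallel_size=2,             # one pipeline stage per node
    distributed_executor_backend="ray",   # shard the engine across the Ray cluster
    max_model_len=32768,                  # illustrative context budget
)

params = SamplingParams(temperature=0.7, max_tokens=256)
print(llm.generate(["Draft a plan for this task."], params)[0].outputs[0].text)
```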

1 pt
__heroes_·21d ago

eBay is expensive; I got my 3090 for less than 800 bucks a few months ago on a local web marketplace. 24 GB was a must for me, but a 4090 is over 2,000 bucks, and I'm not doing that.

1 pt
Tommonen·21d ago

No. B70 if you want a GPU; Strix Halo or a Mac if you want unified memory.

1 pt
semangeIof·21d ago

B70, hahaha. How many TPS are you getting? The memory bandwidth on that card is a joke. $1k USD to fit a Q6 of Gemma 4 31B, and you still have to quantize the KV cache, and it isn't even human reading speed at zero context.

That card is not good. Maybe it will be better months from now, but right now it is too slow.
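The squeeze is easy to reproduce on paper (a sketch; the layer/head counts below are placeholders, not the actual Gemma config):

```python
# Back-of-envelope for why a Q6 31B plus KV cache is a tight fit. All
# architecture numbers below are ILLUSTRATIVE placeholders.
def weights_gb(params_b: float, bits: float) -> float:
    return params_b * bits / 8            # params (billions) * bits -> GB

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx: int, bits: int) -> float:
    # K and V per layer per token: 2 * kv_heads * head_dim values
    return 2 * layers * kv_heads * head_dim * ctx * (bits / 8) / 1e9

w = weights_gb(31, 6.5)                    # Q6-ish quant of a 31B model
kv = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, ctx=32768, bits=8)
print(f"weights ~{w:.1f} GB + 8-bit KV @ 32k ctx ~{kv:.1f} GB = {w + kv:.1f} GB total")
```

Weights alone eat ~25 GB, so even an 8-bit KV cache leaves almost no headroom for context.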

1 pt