Together AI Pricing

Compare Together AI pricing for 257 models, covering serverless inference, dedicated GPU endpoints, and fine-tuning across LLMs and embedding, image, video, and audio models.

Last updated: May 26, 2026

Together AI Overview

Serverless

171

Dedicated GPU

Registry Only

111

Free Tier

$0.01

Cheapest Input/1M

Together AI Model Pricing

Provider	Model	Context	Quant	Input/1M	Output/1M	Dedicated/hr
Fireworks	BGE Base EN v1.5	1k	FP16	$0.008	$0.008	$2.59/hr A100-80GB
Qwen	Qwen2 1.5B Instruct	33k	FP16	$0.020	$0.020	$6.49/hr H100-80GB
Fireworks	E5 Large Instruct	1k	FP16	$0.020	$0.020	—
Liquidai	LiquidAI/LFM2-24B-A2B	33k	FP16	$0.030	$0.120	—
Arcee AI	Trinity Mini	128k	FP16	$0.045	$0.150	$6.49/hr H100-80GB
OpenAI	GPT-OSS-20b	131k	FP16	$0.050	$0.200	$6.49/hr H100-80GB
Google	Gemma 3n 4B	33k	FP16	$0.060	$0.120	—
Meta-llama	Llama 3.2 1B Instruct	131k	FP16	$0.060	$0.060	$6.49/hr H100-80GB
Meta-llama	Llama 3.2 3B Instruct	131k	FP16	$0.060	$0.060	$25.96/hr 4x H100-80GB
Nvidia	Nemotron Nano 9B V2	131k	FP16	$0.060	$0.250	$6.49/hr H100-80GB
Qwen	Qwen2.5 1.5B Instruct	33k	FP16	$0.100	$0.100	—
Meta-llama	Llama 3 8B Instruct	8k	INT4	$0.100	$0.100	—
Mistral AI	Mistral Small 3.1 24B	33k	FP16	$0.100	$0.300	$12.98/hr 2x H100-80GB
Qwen	Qwen3.5 9B	262k	FP16	$0.100	$0.150	$12.98/hr 2x H100-80GB
Fireworks	Salesforce Llama Rank V1	8k	FP16	$0.100	$0.100	$6.49/hr H100-80GB
Essential AI	Rnj 1 Instruct	33k	FP16	$0.150	$0.150	—
OpenAI	GPT-OSS-120b	131k	FP16	$0.150	$0.600	$6.49/hr H100-80GB
Qwen	Qwen3 Next 80B A3B Instruct	262k	FP16	$0.150	$1.500	$25.96/hr 4x H100-80GB
Qwen	Qwen3 Next 80B A3B Thinking	262k	FP16	$0.150	$1.500	$25.96/hr 4x H100-80GB
Deepseek	R1 Distill Qwen 1.5B	131k	FP16	$0.180	$0.180	$6.49/hr H100-80GB
Meta-llama	Llama 4 Scout	1049k	FP16	$0.180	$0.590	$51.92/hr 8x H100-80GB
Meta-llama	Llama 3.1 8B Instruct	131k	FP8	$0.180	$0.180	$6.49/hr H100-80GB
Qwen	Qwen3 VL 8B Instruct	262k	FP16	$0.180	$0.680	$6.49/hr H100-80GB
Meta-llama	Llama 3 8B Instruct	8k	FP16	$0.200	$0.200	$6.49/hr H100-80GB
Mistral AI	Ministral 3 14B 2512	262k	FP16	$0.200	$0.200	$6.49/hr H100-80GB
Mistral AI	Mistral 7B Instruct v0.1	33k	FP16	$0.200	$0.200	$12.98/hr 2x H100-80GB
Mistral AI	Mistral 7B Instruct v0.3	33k	FP16	$0.200	$0.200	$12.98/hr 2x H100-80GB
Qwen	Qwen3 235B A22B Instruct 2507	262k	FP16	$0.200	$0.600	$25.96/hr 4x H100-80GB
Z-ai	GLM 4.5 Air	131k	FP16	$0.200	$1.100	$12.98/hr 2x H100-80GB
Meta-llama	Llama 3.1 8B Instruct	16k	FP16	$0.200	$0.200	$12.98/hr 2x H100-80GB
Meta-llama	Llama Guard 4 12B	1049k	FP16	$0.200	$0.200	—
Fireworks	Rime Arcana v2	—	FP16	$0.270	N/A	$6.49/hr H100-80GB
Meta-llama	Llama 4 Maverick	1049k	FP16	$0.270	$0.850	$51.92/hr 8x H100-80GB
Mistral AI	Voxtral Mini 3B 2507	—	FP16	$0.270	$0.850	—
OpenAI	Whisper Large V3	—	FP16	$0.270	$0.850	$6.49/hr H100-80GB
Google	Gemma 4 31B Instruct	32k	FP16	$0.280	$0.860	—
Minimax	MiniMax M2.7	197k	FP16	$0.300	$1.200	$23.90/hr 2x B200-180GB
Qwen	Qwen2.5 7B Instruct	33k	FP8	$0.300	$0.300	$6.49/hr H100-80GB
Google	Gemma 4 31B Instruct	262k	FP16	$0.390	$0.970	$12.98/hr 2x H100-80GB
Z-ai	GLM 4.7	203k	FP16	$0.450	$2.000	$51.92/hr 8x H100-80GB
Qwen	Qwen3.6 Plus	1000k	FP16	$0.500	$3.000	—
Qwen	Qwen3 Coder Next	262k	FP16	$0.500	$1.200	$12.98/hr 2x H100-80GB
Qwen	Qwen3 VL 32B Instruct	262k	FP16	$0.500	$1.500	$12.98/hr 2x H100-80GB
Mistral AI	Mixtral 8x7B Instruct	33k	FP16	$0.600	$0.600	$12.98/hr 2x H100-80GB
Nousresearch	Nous Hermes 2 Mixtral 8x7B DPO	33k	FP16	$0.600	$0.600	$25.96/hr 4x H100-80GB
Qwen	Qwen3.5 397B A17B	262k	FP16	$0.600	$3.600	—
Z-ai	GLM 4.6	203k	FP16	$0.600	$2.200	$51.92/hr 8x H100-80GB
Deepseek	DeepSeek Coder 33B Instruct	16k	FP16	$0.800	$0.800	$25.96/hr 4x H100-80GB
Google	Gemma 2 27B	8k	FP16	$0.800	$0.800	$12.98/hr 2x H100-80GB
Qwen	Qwen2.5 14B Instruct	33k	FP16	$0.800	$0.800	$12.98/hr 2x H100-80GB
Qwen	Qwen2.5 Coder 32B Instruct	16k	FP16	$0.800	$0.800	$25.96/hr 4x H100-80GB
Meta-llama	Llama 3.3 70B Instruct	131k	FP8	$0.880	$0.880	$12.98/hr 2x H100-80GB
Meta-llama	Llama 3.1 70B Instruct	131k	FP8	$0.880	$0.880	$25.96/hr 4x H100-80GB
Meta-llama	Llama 3 70B Instruct	8k	FP8	$0.880	$0.880	$25.96/hr 4x H100-80GB
Nvidia	Llama 3.1 Nemotron 70B Instruct	33k	FP16	$0.880	$0.880	$25.96/hr 4x H100-80GB
Qwen	Qwen2 72B Instruct	33k	FP16	$0.900	$0.900	$25.96/hr 4x H100-80GB
Z-ai	GLM 4.7	203k	FP16	$1.000	$3.200	$47.80/hr 4x B200-180GB
Moonshotai	Kimi K2.6	262k	FP16	$1.200	$4.500	$95.60/hr 8x B200-180GB
Qwen	Qwen2.5 72B Instruct	33k	FP16	$1.200	$1.200	$25.96/hr 4x H100-80GB
Qwen	Qwen2.5 72B Instruct	131k	FP8	$1.200	$1.200	$25.96/hr 4x H100-80GB
Qwen	Qwen2 VL 72B Instruct	33k	FP16	$1.200	$1.200	$25.96/hr 4x H100-80GB
Qwen	QwQ 32B	131k	FP16	$1.200	$1.200	$25.96/hr 4x H100-80GB
Deepcogito	Cogito v2.1 671B	164k	FP16	$1.250	$1.250	—
Qwen	Qwen3.7 Max	1000k	FP16	$1.250	$3.750	—
Z-ai	GLM 5.1	203k	FP16	$1.400	$4.400	$47.80/hr 4x B200-180GB
Deepseek	R1 Distill Qwen 14B	131k	FP16	$1.600	$1.600	$25.96/hr 4x H100-80GB
Qwen	Qwen2.5 VL 72B Instruct	33k	FP16	$1.950	$8.000	$25.96/hr 4x H100-80GB
Deepseek	R1 Distill Llama 70B	131k	FP16	$2.000	$2.000	$51.92/hr 8x H100-80GB
Qwen	Qwen3 Coder 480B A35B (exacto)	262k	FP16	$2.000	$2.000	$51.92/hr 8x H100-80GB
Deepseek	DeepSeek V4 Pro	512k	FP16	$2.100	$4.400	$95.60/hr 8x B200-180GB
Meta-llama	Llama 3.1 405B Instruct	4k	FP16	$3.500	$3.500	$51.92/hr 8x H100-80GB
Fireworks	Kokoro 82M	—	FP16	$4.000	N/A	—
Canopy Labs	Orpheus 3B	—	FP16	$15.000	N/A	$6.49/hr H100-80GB
Fireworks	Cartesia Sonic	—	FP16	$65.000	N/A	—
Fireworks	Cartesia Sonic 2	—	FP16	$65.000	N/A	—
Fireworks	Cartesia Sonic 3	—	FP16	$65.000	N/A	—
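
To illustrate how the per-1M-token prices and the hourly dedicated rates in the table relate, here is a minimal Python sketch. It estimates the serverless cost of a workload and the token volume per hour at which serverless spend would match a dedicated endpoint. The function names and the example workload are hypothetical, the prices are taken from the GPT-OSS-20b row, and the sketch assumes strictly linear per-token billing with no minimums, discounts, or throughput limits on the dedicated hardware.

```python
def serverless_cost(input_tokens, output_tokens, input_per_1m, output_per_1m):
    """Estimated serverless cost in USD, assuming linear per-token billing."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


def breakeven_tokens_per_hour(dedicated_per_hour, input_per_1m, output_per_1m,
                              output_share):
    """Total tokens per hour at which serverless spend equals the dedicated rate.

    output_share is the fraction of all tokens that are output tokens (0 to 1).
    This ignores whether the dedicated GPU can actually sustain that throughput.
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    # Blended serverless price per 1M tokens for this input/output mix.
    blended_per_1m = (1 - output_share) * input_per_1m + output_share * output_per_1m
    if blended_per_1m <= 0:
        raise ValueError("blended price must be positive")
    return dedicated_per_hour / blended_per_1m * 1_000_000


# Prices from the GPT-OSS-20b row: $0.050 input, $0.200 output, $6.49/hr dedicated.
cost = serverless_cost(2_000_000, 500_000, 0.050, 0.200)
breakeven = breakeven_tokens_per_hour(6.49, 0.050, 0.200, output_share=0.2)

print(f"Serverless cost: ${cost:.2f}")                # about $0.20
print(f"Break-even: {breakeven:,.0f} tokens/hour")    # about 81 million
```

Below the break-even volume, serverless is cheaper under these assumptions; above it, the dedicated endpoint is cheaper, provided the hardware can serve that many tokens per hour.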