Together AI Pricing

Compare Together AI pricing for 257 models. Serverless inference, dedicated GPU endpoints, and fine-tuning across LLMs, embeddings, image, video, and audio models.

Last updated: May 26, 2026

Together AI Overview

Serverless: 77
Dedicated GPU: 171
Registry Only: 0
Free Tier: 111
Cheapest Input/1M: $0.01

Together AI Model Pricing

| Model | Context | Quant | Input/1M | Output/1M | Dedicated/hr |
|---|---|---|---|---|---|
| BGE Base EN v1.5 | 1k | FP16 | $0.008 | $0.008 | $2.59/hr A100-80GB |
| - | 33k | FP16 | $0.020 | $0.020 | $6.49/hr H100-80GB |
| E5 Large Instruct | 1k | FP16 | $0.020 | $0.020 | - |
| - | 33k | FP16 | $0.030 | $0.120 | - |
| - | 128k | FP16 | $0.045 | $0.150 | $6.49/hr H100-80GB |
| - | 131k | FP16 | $0.050 | $0.200 | $6.49/hr H100-80GB |
| - | 33k | FP16 | $0.060 | $0.120 | - |
| - | 131k | FP16 | $0.060 | $0.060 | $6.49/hr H100-80GB |
| - | 131k | FP16 | $0.060 | $0.060 | $25.96/hr 4x H100-80GB |
| - | 131k | FP16 | $0.060 | $0.250 | $6.49/hr H100-80GB |
| - | 33k | FP16 | $0.100 | $0.100 | - |
| - | 8k | INT4 | $0.100 | $0.100 | - |
| - | 33k | FP16 | $0.100 | $0.300 | $12.98/hr 2x H100-80GB |
| - | 262k | FP16 | $0.100 | $0.150 | $12.98/hr 2x H100-80GB |
| Salesforce Llama Rank V1 | 8k | FP16 | $0.100 | $0.100 | $6.49/hr H100-80GB |
| - | 33k | FP16 | $0.150 | $0.150 | - |
| - | 131k | FP16 | $0.150 | $0.600 | $6.49/hr H100-80GB |
| - | 262k | FP16 | $0.150 | $1.500 | $25.96/hr 4x H100-80GB |
| - | 262k | FP16 | $0.150 | $1.500 | $25.96/hr 4x H100-80GB |
| - | 131k | FP16 | $0.180 | $0.180 | $6.49/hr H100-80GB |
| - | 1049k | FP16 | $0.180 | $0.590 | $51.92/hr 8x H100-80GB |
| - | 131k | FP8 | $0.180 | $0.180 | $6.49/hr H100-80GB |
| - | 262k | FP16 | $0.180 | $0.680 | $6.49/hr H100-80GB |
| - | 8k | FP16 | $0.200 | $0.200 | $6.49/hr H100-80GB |
| - | 8k | FP16 | $0.200 | $0.200 | $6.49/hr H100-80GB |
| - | 262k | FP16 | $0.200 | $0.200 | $6.49/hr H100-80GB |
| - | 33k | FP16 | $0.200 | $0.200 | $12.98/hr 2x H100-80GB |
| - | 33k | FP16 | $0.200 | $0.200 | $12.98/hr 2x H100-80GB |
| - | 262k | FP16 | $0.200 | $0.600 | $25.96/hr 4x H100-80GB |
| - | 131k | FP16 | $0.200 | $1.100 | $12.98/hr 2x H100-80GB |
| - | 16k | FP16 | $0.200 | $0.200 | $12.98/hr 2x H100-80GB |
| Llama Guard 4 12B | 1049k | FP16 | $0.200 | $0.200 | - |
| Rime Arcana v2 | - | FP16 | $0.270 | N/A | $6.49/hr H100-80GB |
| - | 1049k | FP16 | $0.270 | $0.850 | $51.92/hr 8x H100-80GB |
| Voxtral Mini 3B 2507 | - | FP16 | $0.270 | $0.850 | - |
| Whisper Large V3 | 0k | FP16 | $0.270 | $0.850 | $6.49/hr H100-80GB |
| - | 32k | FP16 | $0.280 | $0.860 | - |
| - | 197k | FP16 | $0.300 | $1.200 | $23.90/hr 2x B200-180GB |
| - | 33k | FP8 | $0.300 | $0.300 | $6.49/hr H100-80GB |
| - | 262k | FP16 | $0.390 | $0.970 | $12.98/hr 2x H100-80GB |
| - | 203k | FP16 | $0.450 | $2.000 | $51.92/hr 8x H100-80GB |
| Qwen3.6 Plus | 1000k | FP16 | $0.500 | $3.000 | - |
| - | 262k | FP16 | $0.500 | $1.200 | $12.98/hr 2x H100-80GB |
| - | 262k | FP16 | $0.500 | $1.500 | $12.98/hr 2x H100-80GB |
| - | 33k | FP16 | $0.600 | $0.600 | $12.98/hr 2x H100-80GB |
| - | 33k | FP16 | $0.600 | $0.600 | $25.96/hr 4x H100-80GB |
| - | 262k | FP16 | $0.600 | $3.600 | - |
| - | 203k | FP16 | $0.600 | $2.200 | $51.92/hr 8x H100-80GB |
| - | 16k | FP16 | $0.800 | $0.800 | $25.96/hr 4x H100-80GB |
| - | 8k | FP16 | $0.800 | $0.800 | $12.98/hr 2x H100-80GB |
| - | 33k | FP16 | $0.800 | $0.800 | $12.98/hr 2x H100-80GB |
| - | 16k | FP16 | $0.800 | $0.800 | $25.96/hr 4x H100-80GB |
| - | 131k | FP8 | $0.880 | $0.880 | $12.98/hr 2x H100-80GB |
| - | 131k | FP8 | $0.880 | $0.880 | $25.96/hr 4x H100-80GB |
| - | 8k | FP8 | $0.880 | $0.880 | $25.96/hr 4x H100-80GB |
| - | 33k | FP16 | $0.880 | $0.880 | $25.96/hr 4x H100-80GB |
| - | 33k | FP16 | $0.900 | $0.900 | $25.96/hr 4x H100-80GB |
| - | 203k | FP16 | $1.000 | $3.200 | $47.80/hr 4x B200-180GB |
| - | 262k | FP16 | $1.200 | $4.500 | $95.60/hr 8x B200-180GB |
| - | 33k | FP16 | $1.200 | $1.200 | $25.96/hr 4x H100-80GB |
| - | 131k | FP8 | $1.200 | $1.200 | $25.96/hr 4x H100-80GB |
| - | 33k | FP16 | $1.200 | $1.200 | $25.96/hr 4x H100-80GB |
| - | 131k | FP16 | $1.200 | $1.200 | $25.96/hr 4x H100-80GB |
| - | 164k | FP16 | $1.250 | $1.250 | - |
| Qwen3.7 Max | 1000k | FP16 | $1.250 | $3.750 | - |
| - | 203k | FP16 | $1.400 | $4.400 | $47.80/hr 4x B200-180GB |
| - | 131k | FP16 | $1.600 | $1.600 | $25.96/hr 4x H100-80GB |
| - | 33k | FP16 | $1.950 | $8.000 | $25.96/hr 4x H100-80GB |
| - | 131k | FP16 | $2.000 | $2.000 | $51.92/hr 8x H100-80GB |
| - | 262k | FP16 | $2.000 | $2.000 | $51.92/hr 8x H100-80GB |
| - | 512k | FP16 | $2.100 | $4.400 | $95.60/hr 8x B200-180GB |
| - | 4k | FP16 | $3.500 | $3.500 | $51.92/hr 8x H100-80GB |
| Kokoro 82M | - | FP16 | $4.000 | N/A | - |
| Orpheus 3B | - | FP16 | $15.000 | N/A | $6.49/hr H100-80GB |
| Cartesia Sonic | - | FP16 | $65.000 | N/A | - |
| Cartesia Sonic 2 | 0k | FP16 | $65.000 | N/A | - |
| Cartesia Sonic 3 | 0k | FP16 | $65.000 | N/A | - |

Pricing from Together AI. Serverless prices per 1M tokens. Dedicated prices per hour (cheapest GPU config shown). Batch API at 50% of serverless.
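The pricing rules in this note can be turned into a quick estimate. The following is a minimal sketch, not an official calculator: the function names and the sample workload are hypothetical, the prices are taken from one table row ($0.180 input, $0.180 output, $6.49/hr dedicated), and the break-even figure assumes a dedicated endpoint is billed for every hour it runs regardless of traffic.

```python
def serverless_cost(input_tokens, output_tokens, input_per_1m, output_per_1m, batch=False):
    """Estimate the USD cost of a serverless workload priced per 1M tokens."""
    cost = (input_tokens / 1_000_000) * input_per_1m
    cost += (output_tokens / 1_000_000) * output_per_1m
    # The pricing note states that the Batch API costs 50% of serverless.
    return cost * 0.5 if batch else cost


def breakeven_tokens_per_hour(dedicated_per_hour, price_per_1m):
    """Tokens per hour at which serverless spend equals the dedicated hourly rate.

    Assumes a single blended price per 1M tokens (input and output priced alike).
    """
    return dedicated_per_hour / price_per_1m * 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
realtime = serverless_cost(10_000_000, 2_000_000, 0.180, 0.180)
batched = serverless_cost(10_000_000, 2_000_000, 0.180, 0.180, batch=True)
breakeven = breakeven_tokens_per_hour(6.49, 0.180)

print(f"serverless: ${realtime:.2f}")
print(f"batch:      ${batched:.2f}")
print(f"break-even: {breakeven:,.0f} tokens/hour")
```

For this row the break-even works out to roughly 36 million tokens per hour. Whether a single H100 endpoint can sustain that throughput for a given model is not stated on this page, so treat the figure as a lower bound on the traffic needed before dedicated capacity pays off.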

Quantization: FP16 = Reference, FP8 = Turbo, INT4 = Lite. Image/video pricing not available via API.

Compare Together AI with Other Providers