Price Per Token

LLM Fine-Tuning Pricing

Compare fine-tuning costs across 6 providers. Training, inference, and hosting prices for GPT-4o, Gemini, Llama, Mistral, and more.




Fine-Tuning Pricing Comparison

14 models

| Provider | Base Model | Training $/1M | Inference In $/1M | Inference Out $/1M | Min Examples | Hosting |
|---|---|---|---|---|---|---|
| Together AI | Llama 3.1 8B | $0.480 | $0.180 | $0.180 | 1 | Included |
| Together AI | Mistral 7B | $0.480 | $0.200 | $0.200 | 1 | Included |
| Fireworks | Llama 3.1 8B | $0.500 | $0.200 | $0.200 | 1 | Included |
| Mistral | Mistral 7B | $1.000 | $0.250 | $0.250 | 1 | Included |
| Mistral | Mistral Small | $2.000 | $0.200 | $0.600 | 1 | Included |
| Together AI | Llama 3.1 70B | $2.900 | $0.880 | $0.880 | 1 | Included |
| OpenAI | GPT-4o-mini | $3.000 | $0.300 | $1.200 | 10 | Included |
| Google | Gemini 2.0 Flash | $3.000 | $0.150 | $0.600 | 10 | Included |
| Fireworks | Llama 3.1 70B | $3.000 | $0.900 | $0.900 | 1 | Included |
| Cohere | Command R | $3.000 | $0.300 | $1.200 | 2 | Included |
| Cohere | Command R+ | $3.000 | $2.500 | $10.000 | 2 | Included |
| OpenAI | GPT-3.5 Turbo | $8.000 | $3.000 | $6.000 | 10 | Included |
| Google | Gemini 1.5 Flash | $8.000 | $0.075 | $0.300 | 10 | Included |
| OpenAI | GPT-4o | $25.000 | $3.750 | $15.000 | 10 | Included |

Prices last verified February 2026. Training costs are per 1M tokens processed during fine-tuning. Actual costs depend on dataset size, epochs, and provider-specific minimums.

What is LLM Fine-Tuning?

Fine-tuning an LLM means taking a pre-trained foundation model and training it further on your own dataset. This process adjusts the model's weights so it produces outputs that match your specific domain, tone, or task — without building a model from scratch.

Common fine-tuning use cases include customer support bots trained on company knowledge bases, code generation models specialized for internal APIs, and classification systems that follow strict labeling schemas. The result is a model that performs significantly better on your task while retaining the general capabilities of the base model.

Most providers offer supervised fine-tuning (SFT), where you provide input-output pairs. Some also support LoRA (Low-Rank Adaptation) — a parameter-efficient method that trains a small adapter instead of all model weights, reducing cost and time.
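As a concrete sketch, SFT training data is usually a JSONL file with one example per line. The snippet below builds a tiny dataset in OpenAI's chat-message format (other providers use similar prompt/completion pair schemas); the company name and contents are hypothetical placeholders.

```python
import json

# Two hypothetical input-output pairs in OpenAI's chat-message format.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a support agent for Acme Corp."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a support agent for Acme Corp."},
            {"role": "user", "content": "Where can I download my invoices?"},
            {"role": "assistant", "content": "Invoices are under Billing > History as PDF downloads."},
        ]
    },
]

def write_jsonl(path, rows):
    """Write one JSON object per line — the layout fine-tuning upload endpoints expect."""
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

write_jsonl("train.jsonl", examples)
```

In practice the quality of these pairs matters more than their count; a few hundred consistent examples typically beat thousands of noisy ones.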

Fine-Tuning vs Prompt Engineering

Before investing in fine-tuning, consider whether prompt engineering (few-shot examples, system prompts, RAG) can achieve your goals. Here's how they compare:

| Factor | Prompt Engineering | Fine-Tuning |
|---|---|---|
| Upfront cost | Low — no training required | Medium to high — dataset prep + training compute |
| Per-request cost | Higher — long prompts with examples | Lower — shorter prompts, model "knows" the task |
| Setup time | Hours | Days to weeks (data collection + training) |
| Task performance | Good for general tasks | Excellent for specialized tasks |
| Iteration speed | Fast — change prompts instantly | Slow — retrain for each change |
| Best for | Prototyping, low-volume, varied tasks | Production, high-volume, consistent tasks |
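The per-request cost row above can be made concrete with a quick calculation. This sketch uses hypothetical prompt lengths (a 1,200-token few-shot prompt vs. a 150-token prompt for a fine-tuned model) and GPT-4o-mini-style pricing ($0.30/1M in, $1.20/1M out); substitute your own numbers.

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one request, with prices quoted per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Prompt engineering: long prompt carrying instructions + few-shot examples.
pe_cost = request_cost(1200, 300, in_price=0.30, out_price=1.20)
# Fine-tuned: the model "knows" the task, so the prompt shrinks.
ft_cost = request_cost(150, 300, in_price=0.30, out_price=1.20)

print(f"prompt engineering: ${pe_cost * 1000:.3f} per 1K requests")
print(f"fine-tuned:         ${ft_cost * 1000:.3f} per 1K requests")
```

With these assumed numbers the fine-tuned model costs roughly half as much per request, which is the lever the ROI section below relies on.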

Fine-Tuning Cost Breakdown

The total cost of fine-tuning an LLM breaks down into several components:

  • Training compute: The per-token cost charged during the fine-tuning process. This varies dramatically — from $0.48/1M tokens for open-source 7B models on Together AI to $25/1M tokens for GPT-4o on OpenAI. Training typically requires multiple epochs over your dataset.
  • Dataset preparation: Often the hidden cost. You need clean, well-formatted input-output pairs. Manual curation of 500-1,000 high-quality examples can take significant time, though some providers accept as few as 10 examples.
  • Inference cost delta: Fine-tuned model inference is often priced differently than the base model. OpenAI charges more for fine-tuned GPT-4o inference, while Google charges the same rate. Open-source providers like Together AI and Fireworks serve fine-tuned models at base model prices.
  • Hosting and storage: Most cloud providers include hosting in their per-token pricing. Mistral charges $2/month for model storage. Self-hosted options require GPU infrastructure — typically $1-4/hour for an 8B model or $8-16/hour for 70B models.
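The training-compute component can be sketched as tokens processed across all epochs at the per-1M rate, plus any per-job minimum. The function below uses prices from the table above and the Mistral $4 minimum noted later; a 2M-token dataset over 3 epochs is an assumed workload.

```python
def training_cost(dataset_tokens, epochs, price_per_m, min_fee=0.0):
    """Training compute cost: tokens seen across all epochs, priced per 1M tokens."""
    return max(dataset_tokens * epochs * price_per_m / 1_000_000, min_fee)

# Per-1M training prices from the comparison table.
providers = {
    "Together AI / Llama 3.1 8B": dict(price_per_m=0.48),
    "Mistral / Mistral 7B":       dict(price_per_m=1.00, min_fee=4.00),  # $4 minimum per job
    "OpenAI / GPT-4o-mini":       dict(price_per_m=3.00),
    "OpenAI / GPT-4o":            dict(price_per_m=25.00),
}

for name, params in providers.items():
    cost = training_cost(dataset_tokens=2_000_000, epochs=3, **params)
    print(f"{name}: ${cost:.2f}")
```

The same 6M effective training tokens span roughly $3 to $150 depending on provider and model, which is why the training row dominates the decision for large models.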

ROI Analysis: When Fine-Tuning Pays Off

Fine-tuning makes financial sense in specific scenarios:

  • High-volume classification: If you're processing 100K+ requests per day, even small per-request savings from shorter prompts add up. A fine-tuned model that eliminates a 500-token system prompt saves ~$0.15 per 1,000 requests at $0.30/1M input tokens.
  • Domain-specific generation: Medical reports, legal documents, or code in proprietary frameworks — tasks where general models consistently miss the mark. Fine-tuning can reduce error rates by 20-50%, cutting the cost of human review.
  • Latency-sensitive applications: Fine-tuned models need shorter prompts, which means fewer input tokens to process — resulting in lower time-to-first-token latency.

Break-even example:

Training a GPT-4o-mini fine-tune on 100K tokens (3 epochs) costs about $0.90. If the fine-tuned model lets you drop a 400-token system prompt from each request, you save $0.12 per 1,000 requests. At 10,000 requests/day, the training cost pays for itself in under a day.
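The break-even arithmetic above can be written as a small helper, using the same numbers from the example ($0.90 training cost, a 400-token prompt dropped, $0.30/1M input tokens, 10,000 requests/day):

```python
def breakeven_days(training_cost, tokens_saved_per_request,
                   input_price_per_m, requests_per_day):
    """Days until prompt-token savings repay the one-off training cost."""
    saving_per_request = tokens_saved_per_request * input_price_per_m / 1_000_000
    return training_cost / (saving_per_request * requests_per_day)

days = breakeven_days(0.90, 400, 0.30, 10_000)
print(f"break-even after {days:.2f} days")
```

Plugging in the example's figures yields 0.75 days; at lower volumes (say 1,000 requests/day) the same fine-tune takes about a week to pay off, which is still usually worthwhile.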

How to Choose a Fine-Tuning Provider

  • OpenAI — Best for teams already using GPT models. Seamless experience with no infrastructure to manage. GPT-4o-mini offers the best value for most use cases. Minimum 10 examples required.
  • Google (Vertex AI) — Strong choice for enterprise teams on GCP. Gemini 2.0 Flash fine-tuning is competitively priced, and inference costs don't increase for tuned models.
  • Together AI — Best budget option for open-source models. LoRA fine-tuning starts at $0.48/1M tokens. Supports Llama, Mistral, and other open models with serverless inference included.
  • Fireworks — Similar to Together AI with competitive pricing for open-source models. Strong DPO support for RLHF-style fine-tuning at 2x the SFT price.
  • Mistral — Best for teams building with Mistral models specifically. Note the $4 minimum fee per job and $2/month storage cost.
  • Cohere — Good for RAG and enterprise search use cases. Command R models are optimized for retrieval-augmented generation, making fine-tuning particularly effective for search-heavy workflows.