Price Per Token

LLM Trends & Benchmarks

Track the evolution of AI models over time. Compare benchmark scores, pricing trends, and the open source vs closed source frontier.

Benchmark vs Price

How does model performance correlate with API pricing? A model that scores higher on a benchmark at a lower price offers better value.
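One way to make "better value" concrete is to keep only the models that no other model beats on both score and price. The sketch below is illustrative and is not how this page builds its charts: the `Model` fields, the single blended price per million tokens, and the sample entries are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    score: float  # benchmark score, higher is better
    price: float  # assumed blended USD per 1M tokens, lower is better


def best_value(models):
    """Return models not dominated on (score, price), cheapest first."""
    front = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal price, highest score first.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        # Keep a model only if it beats every cheaper-or-equal model seen so far.
        if m.score > best_score:
            front.append(m)
            best_score = m.score
    return front


# Made-up entries for illustration only, not real benchmark or pricing data.
sample = [
    Model("model-a", score=70.0, price=1.0),
    Model("model-b", score=80.0, price=5.0),
    Model("model-c", score=65.0, price=2.0),
    Model("model-d", score=80.0, price=3.0),
]
value_picks = [m.name for m in best_value(sample)]
```

Here `model-c` drops out because `model-a` is both cheaper and higher scoring, and `model-b` drops out because `model-d` matches its score at a lower price.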

MMLU-Pro vs Price

GPQA vs Price

Aider (Coding) vs Price

LiveCodeBench vs Price

MATH Hard vs Price

Context Length vs Price

Benchmarks Over Time

Track the frontier of AI capabilities as new models are released each month.
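A frontier of this kind can be read as a running maximum: for each month with a release, the best score seen up to and including that month. The following is a minimal sketch under that assumption. The function name, the `(month, score)` input shape, and the sample values are hypothetical and not taken from this page's data.

```python
def frontier_over_time(releases):
    """Return (month, best score so far) pairs, sorted by month.

    `releases` is an iterable of (month, score) pairs where month is a
    "YYYY-MM" string, so plain string sorting is chronological.
    """
    # Best score among the models released in each month.
    best_in_month = {}
    for month, score in releases:
        best_in_month[month] = max(score, best_in_month.get(month, float("-inf")))

    # Carry the maximum forward so the frontier never decreases.
    frontier = []
    best_so_far = float("-inf")
    for month in sorted(best_in_month):
        best_so_far = max(best_so_far, best_in_month[month])
        frontier.append((month, best_so_far))
    return frontier


# Made-up scores for illustration only.
sample_releases = [
    ("2024-03", 60.0),
    ("2024-01", 55.0),
    ("2024-03", 58.0),
    ("2024-05", 57.0),
]
sample_frontier = frontier_over_time(sample_releases)
```

Running the same function separately on each provider's releases, or on the open source and closed source groups, would give one frontier line per group for comparison.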

MMLU-Pro Frontier Over Time

Provider frontier comparison - MMLU-Pro benchmark (includes reasoning variants)

GPQA Frontier Over Time

Provider frontier comparison - Graduate-level science (includes reasoning variants)

Aider Frontier Over Time

Provider frontier comparison - Real-world coding (includes reasoning variants)

LiveCodeBench Frontier Over Time

Provider frontier comparison - Competitive programming (includes reasoning variants)

MATH Hard Frontier Over Time

Provider frontier comparison - Competition math (includes reasoning variants)

Context Length Frontier Over Time

Provider frontier comparison - Maximum context window

Open Source vs Closed Source

Compare the frontier capabilities between open source and proprietary models over time.

MMLU-Pro: Open vs Closed

Comparing the best open source and closed source models each month

GPQA: Open vs Closed

Graduate-level science benchmark comparison

Aider: Open vs Closed

Real-world coding benchmark comparison

LiveCodeBench: Open vs Closed

Competitive programming benchmark comparison

MATH Hard: Open vs Closed

Competition math benchmark comparison

Context Length: Open vs Closed

Maximum context window comparison

Data includes 296 models from 2023-05 to 2025-12

Benchmark data sourced from Artificial Analysis. Pricing data updated daily.