Massive Multitask Language Understanding — tests knowledge across 57 subjects.
Data from LayerLens
As of April 18, 2026, the top-scoring model on MMLU is GLM 5 at 91.7% (listed twice in the table below), followed by R1 0528 at 90.5%. In total, 36 models have been evaluated on this benchmark.
Last updated: April 18, 2026
Models: 36 · Best score: 91.7 · Average: 79.2 · Std dev: 17.1
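The summary statistics above can be reproduced from the score column in the table below. A minimal sketch in Python (the score list is transcribed from the table; the dashboard's "Std Dev" matches the population standard deviation):

```python
from statistics import fmean, pstdev

# MMLU scores (%) transcribed from the leaderboard table below.
scores = [
    91.7, 91.7, 90.5, 89.2, 88.9, 88.3, 88.3, 87.6, 87.6,
    85.9, 85.9, 85.7, 85.7, 85.7, 85.7, 85.5, 85.3, 85.3,
    85.3, 85.3, 84.8, 84.6, 84.5, 84.1, 83.4, 82.6, 78.3,
    77.6, 77.2, 76.0, 73.5, 72.8, 68.9, 37.1, 24.6, 15.3,
]

print(len(scores))               # number of models: 36
print(max(scores))               # best score: 91.7
print(round(fmean(scores), 1))   # average: 79.2
print(round(pstdev(scores), 1))  # population std dev: 17.1
```

Note that a sample standard deviation (`statistics.stdev`, dividing by n−1) would give about 17.4 instead, so the page appears to report the population figure.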
| Provider | Model | Input $/M | Output $/M | MMLU (%) |
|---|---|---|---|---|
|  | GLM 5 | $0.720 | $2.300 | 91.7 |
|  | GLM 5 | $0.720 | $2.300 | 91.7 |
|  | R1 0528 | $0.500 | $2.150 | 90.5 |
|  |  | $0.300 | $0.500 | 89.2 |
|  |  | $0.550 | $2.200 | 88.9 |
|  |  | $0.550 | $2.200 | 88.3 |
|  |  | $0.550 | $2.200 | 88.3 |
|  |  | $0.039 | $0.100 | 87.6 |
|  |  | $0.039 | $0.100 | 87.6 |
|  |  | $0.150 | $0.580 | 85.9 |
|  |  | $0.150 | $0.580 | 85.9 |
|  |  | $0.300 | $2.500 | 85.7 |
|  |  | $0.300 | $2.500 | 85.7 |
|  |  | $0.300 | $2.500 | 85.7 |
|  |  | $0.300 | $2.500 | 85.7 |
|  |  | $0.150 | $0.600 | 85.5 |
|  |  | $3.000 | $15.000 | 85.3 |
|  |  | $3.000 | $15.000 | 85.3 |
|  |  | $0.080 | $0.280 | 85.3 |
|  |  | $0.080 | $0.280 | 85.3 |
|  |  | $0.100 | $0.400 | 84.8 |
|  |  | $2.000 | $8.000 | 84.6 |
|  |  | $0.280 | $0.900 | 84.5 |
|  |  | $2.500 | $10.000 | 84.1 |
|  |  | $0.014 | $0.028 | 83.4 |
|  |  | $0.400 | $2.000 | 82.6 |
|  |  | $0.800 | $3.200 | 78.3 |
|  |  | $0.065 | $0.140 | 77.6 |
|  |  | $2.000 | $6.000 | 77.2 |
|  |  | $0.075 | $0.200 | 76.0 |
|  |  | $0.070 | $0.280 | 73.5 |
|  |  | $0.800 | $4.000 | 72.8 |
|  |  | $0.035 | $0.140 | 68.9 |
|  |  | $0.900 | $0.900 | 37.1 |
|  |  | $0.080 | $0.300 | 24.6 |
|  |  | $0.030 | $0.050 | 15.3 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
This leaderboard shows all models with MMLU benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
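One simple way to weigh performance against cost is price paid per MMLU point. A minimal sketch, using three rows from the table above; the 3:1 input-to-output token weighting in the blended price is an illustrative assumption, not a convention from this leaderboard:

```python
# Rank a few leaderboard entries by blended price per MMLU point
# (lower is better). Rows are (input $/M, output $/M, MMLU %)
# taken from the table above.
rows = [
    (0.720, 2.300, 91.7),
    (0.039, 0.100, 87.6),
    (3.000, 15.000, 85.3),
]

def blended_price(inp: float, out: float) -> float:
    """Blend assuming 3 input tokens per output token (an assumption)."""
    return (3 * inp + out) / 4

# Sort by blended dollars per MMLU point.
ranked = sorted(rows, key=lambda r: blended_price(r[0], r[1]) / r[2])
for inp, out, mmlu in ranked:
    print(f"MMLU {mmlu}: ${blended_price(inp, out):.3f}/M blended")
```

On these three rows the cheapest model per point wins despite its lower score, which is exactly the trade-off the pricing columns are meant to surface.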
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.