Price Per Token

MATH-500 Leaderboard

Competition mathematics problems requiring multi-step reasoning, covering algebra, geometry, number theory, and calculus.

Data from Artificial Analysis

As of April 18, 2026, the top-scoring model on MATH-500 is GPT-5 at 99.4%, followed by o3 and Grok 3 Mini, tied at 99.2%. A total of 124 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 124
Best Score: 99.4
Average: 83.5
Std Dev: 16.3

Categories: Mathematical Problem Solving
#     Input $/M   Output $/M   MATH-500
1     $1.250      $10.000      99.4
2     $2.000      $8.000       99.2
3     $0.250      $0.500       99.2
4     $3.000      $15.000      99.1
5     $1.250      $10.000      99.1
6     $3.000      $15.000      99.0
7     $0.550      $2.200       98.9
8     $1.250      $10.000      98.7
9     $1.250      $10.000      98.6
10    $1.100      $4.400       98.5
11    $0.130      $0.600       98.4
12    $0.500      $2.150       98.3
13    $0.100      $0.400       98.3
14    $15.000     $75.000      98.2
15    $0.300      $2.500       98.1
16    $0.300      $2.500       98.1
17    $1.250      $10.000      98.0
18    $0.400      $1.760       98.0
19    $0.071      $0.100       98.0
20    $0.600      $2.200       97.9
21    $0.080      $0.300       97.6
22    $0.090      $0.300       97.5
23    $0.550      $2.200       97.3
24    $0.400      $1.760       97.2
25    $0.550      $2.200       97.1
26    $0.550      $2.200       97.1
27    $15.000     $60.000      97.0
28    $0.100      $0.400       96.9
29    $1.000      $10.000      96.7
30    $0.550      $2.000       96.6
31    $0.130      $0.850       96.5
32    $0.080      $0.240       96.1
33    $0.060      $0.200       96.1
34    $0.080      $0.280       95.9
35    $0.100      $0.400       95.9
36    $0.150      $0.580       95.7
37    $2.000      $8.000       95.7
38    $3.000      $15.000      94.7
39    $0.200      $0.770       94.2
40    $0.220      $0.900       94.2
41    $0.290      $0.290       94.1
42    $15.000     $75.000      94.1
43    $0.700      $0.800       93.5
44    $3.000      $15.000      93.4
45    $0.200      $0.200       93.3
46    $0.300      $2.500       93.2
47    $0.280      $0.900       93.1
48    $0.100      $0.400       93.0
49    $0.455      $0.900       93.0
50    $0.300      $2.500       92.6
51    $0.100      $0.400       92.6
52    $0.200      $0.800       92.5
53    $2.000      $8.000       91.3
54    $0.150      $0.580       91.0
55    $0.400      $2.000       90.7
56    $0.050      $0.200       90.4
57    $0.455      $0.900       90.2
58    $0.070      $0.270       89.3
59    $0.150      $0.600       88.9
60    $0.200      $0.770       88.7
61    $0.080      $0.160       88.3
62    $0.075      $0.200       88.3
63    $0.075      $0.300       87.3
64    $0.075      $0.300       87.3
65    $0.060      $0.200       87.1
66    $3.000      $15.000      87.0
67    $0.080      $0.240       86.9
68    $0.080      $0.280       86.3
69    $1.250      $10.000      86.1
70    $0.120      $0.390       85.8
71    $0.040      $0.130       85.3
72    $3.000      $15.000      85.0
73    $0.100      $0.400       84.8
74    $0.080      $0.300       84.4
75    $0.200      $0.200       84.3
76    $0.200      $0.200       84.3
77    $0.200      $0.200       84.3
78    $2.500      $12.500      83.9
79    $1.040      $4.160       83.5
80    $0.050      $0.200       82.8
81    $2.500      $10.000      81.9
82    $1.000      $1.000       81.7
83    $0.065      $0.140       81.0
84    $0.033      $0.130       80.5
85    $0.200      $0.600       80.5
86    $0.150      $0.600       78.9
87    $0.800      $3.200       78.6
88    $0.100      $0.400       77.5
89    $0.100      $0.320       77.3
90    $3.000      $15.000      77.1
91    $0.060      $0.120       77.1
92    $0.100      $0.400       77.0
93    $0.660      $0.800       76.7
94    $0.040      $0.080       76.6
95    $0.060      $0.240       76.5
96    $3.000      $15.000      74.5
97    $5.000      $15.000      73.7
98    $2.000      $6.000       73.6
99    $0.900      $0.900       73.3
100   $0.800      $4.000       72.1
101   $0.050      $0.080       71.5
102   $2.000      $6.000       71.4
103   $2.000      $6.000       71.4
104   $0.100      $0.300       70.7
105   $0.400      $2.000       70.7
106   $0.900      $0.900       70.3
107   $0.035      $0.140       70.3
108   $0.200      $0.600       67.7
109   $0.200      $0.200       66.0
110   $0.340      $0.390       64.9
111   $0.070      $0.280       63.5
112   $2.000      $8.000       60.0
113   $1.200      $1.200       54.5
114   $0.300      $0.300       53.8
115   $0.500      $1.500       52.7
116   $0.020      $0.050       51.9
117   $0.060      $0.060       51.6
118   $0.030      $0.040       49.9
119   $0.030      $0.050       48.9
120   $0.510      $0.740       48.3
121   $0.500      $1.500       44.1
122   $0.250      $1.250       39.4
123   $0.140      $0.420       29.9
124   $0.020      $0.020       14.0

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About MATH-500

This leaderboard shows all models with MATH-500 benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
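One way to compare performance against cost is to look for the entries that no cheaper entry matches on score (a Pareto frontier). The sketch below is illustrative only: the `Row` type, the `blended_price` helper, and the 75/25 input/output weighting are assumptions made for the example, not something this page defines, and rows are labeled by their position in the leaderboard rather than by model name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    label: str           # position in the leaderboard table, not a model name
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    score: float         # MATH-500 score


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens; the 75/25 split is an assumption."""
    return input_share * row.input_per_m + (1.0 - input_share) * row.output_per_m


def pareto_frontier(rows: list[Row]) -> list[Row]:
    """Keep rows whose score beats every cheaper or equally priced row."""
    frontier: list[Row] = []
    best_score = float("-inf")
    # Cheapest first; among equal prices, highest score first.
    for row in sorted(rows, key=lambda r: (blended_price(r), -r.score)):
        if row.score > best_score:
            frontier.append(row)
            best_score = row.score
    return frontier


# A handful of (input, output, score) rows copied from the leaderboard.
rows = [
    Row("row 1", 1.250, 10.000, 99.4),
    Row("row 2", 2.000, 8.000, 99.2),
    Row("row 3", 0.250, 0.500, 99.2),
    Row("row 14", 15.000, 75.000, 98.2),
    Row("row 19", 0.071, 0.100, 98.0),
    Row("row 124", 0.020, 0.020, 14.0),
]

frontier = pareto_frontier(rows)
for row in frontier:
    print(f"{row.label}: ${blended_price(row):.4f}/M blended, score {row.score}")
```

With this sample, rows 2 and 14 drop out because a cheaper row scores at least as well. A different input/output weighting can change which rows survive, so the split should reflect your own workload.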

Frequently Asked Questions

MATH-500 is a benchmark of competition mathematics problems requiring multi-step reasoning, covering algebra, geometry, number theory, and calculus.
As of April 18, 2026, GPT-5 leads the MATH-500 leaderboard with a score of 99.4. Rankings change as new models are released and evaluated.
Currently, 124 models have been evaluated on MATH-500, with an average score of 83.5 and a standard deviation of 16.3.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
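
The average and standard deviation quoted above can be used to place any single score relative to the rest of the field. The following is a minimal sketch; the helper names are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption, since the page does not say which variant it reports.

```python
import statistics


def summarize(scores):
    """Return (mean, population standard deviation) of a list of scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


def z_score(score, mean, std_dev):
    """Standard deviations a score sits above (+) or below (-) the mean."""
    return (score - mean) / std_dev


# Summary figures as reported on this page.
REPORTED_MEAN = 83.5
REPORTED_STD_DEV = 16.3

top = z_score(99.4, REPORTED_MEAN, REPORTED_STD_DEV)
bottom = z_score(14.0, REPORTED_MEAN, REPORTED_STD_DEV)
print(f"top score:    {top:+.2f} standard deviations")
print(f"bottom score: {bottom:+.2f} standard deviations")

# The same summary computed on the five highest scores only.
mean5, std5 = summarize([99.4, 99.2, 99.2, 99.1, 99.1])
print(f"top five: mean {mean5:.1f}, std dev {std5:.2f}")
```

The best score sits less than one standard deviation above the average, while the lowest sits more than four below it, which suggests the distribution is skewed by a small number of low-scoring models.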