Price Per Token

Mathematics Leaderboard

Mathematics benchmark covering algebra, geometry, number theory, and calculus problems.

Data from LayerLens

As of March 16, 2026, the top score on Mathematics is 95.6%, shared by two Claude Opus 4.6 entries, followed by o4 Mini High at 94.6%. In total, 34 models have been evaluated on this benchmark.

Last updated: March 16, 2026

Models: 34
Best Score: 95.6
Average: 84.2
Std Dev: 12.0

Categories: Mathematical Problem Solving
Input $/M    Output $/M    Mathematics
$5.000       $25.000       95.6
$5.000       $25.000       95.6
$1.100       $4.400        94.6
$0.720       $2.300        94.0
$0.720       $2.300        94.0
$1.100       $4.400        93.1
$0.080       $0.280        93.0
$0.080       $0.280        93.0
$0.550       $2.190        92.7
$0.150       $0.400        92.1
$0.150       $0.400        92.1
$3.000       $15.000       92.0
$15.000      $75.000       91.2
$15.000      $75.000       91.2
$0.100       $0.400        90.7
$3.000       $15.000       90.3
$3.000       $15.000       90.3
$3.000       $15.000       89.0
$3.000       $15.000       89.0
$0.150       $0.600        86.8
$0.030       $0.110        84.9
$0.400       $2.000        84.2
$0.014       $0.028        83.1
$0.080       $0.300        80.0
$0.060       $0.140        78.2
$2.500       $10.000       77.2
$0.800       $3.200        74.8
$0.030       $0.050        74.4
$0.800       $4.000        73.6
$1.000       $10.000       70.8
$0.060       $0.240        70.1
$2.000       $6.000        68.5
$0.035       $0.140        67.0
$2.500       $10.000       36.9

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

108 out of our 483 tracked models have had a price change in March.


About Mathematics

Mathematics benchmark covering algebra, geometry, number theory, and calculus problems.

This leaderboard shows all models with Mathematics benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
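One way to compare performance against cost is to look for rows that no cheaper row outscores. The sketch below is only an illustration, not how this site ranks models: the `Row` type, the `blended_price` helper, and the 3:1 input-to-output token mix are all assumptions made for the example, and the rows are a small sample of price and score values copied from the leaderboard.

```python
from typing import NamedTuple


class Row(NamedTuple):
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens
    score: float         # Mathematics score


# A sample of leaderboard rows (prices and scores only).
ROWS = [
    Row(5.000, 25.000, 95.6),
    Row(1.100, 4.400, 94.6),
    Row(0.080, 0.280, 93.0),
    Row(3.000, 15.000, 92.0),
    Row(0.014, 0.028, 83.1),
    Row(2.500, 10.000, 36.9),
]


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted USD price per million tokens.

    The default 3:1 input/output mix is an assumption; adjust it to
    match your own workload.
    """
    return input_share * row.input_price + (1 - input_share) * row.output_price


def pareto_frontier(rows: list[Row]) -> list[Row]:
    """Return rows that are not beaten on both price and score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; a row is kept only if it
    # scores higher than every cheaper row seen so far.
    for row in sorted(rows, key=lambda r: (blended_price(r), -r.score)):
        if row.score > best_score:
            frontier.append(row)
            best_score = row.score
    return frontier


if __name__ == "__main__":
    for row in pareto_frontier(ROWS):
        print(f"${blended_price(row):.4f}/M blended -> {row.score}")
```

With this sample and mix, the rows priced at $3.000/$15.000 and $2.500/$10.000 drop out because a cheaper row scores higher. A different token mix can change which rows survive.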

Frequently Asked Questions

The Mathematics benchmark covers algebra, geometry, number theory, and calculus problems.
As of March 16, 2026, Claude Opus 4.6 leads the Mathematics leaderboard with a score of 95.6. Rankings change as new models are released and evaluated.
Currently, 34 models have been evaluated on Mathematics, with an average score of 84.2 and a standard deviation of 12.0.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
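The average and standard deviation quoted above can be recomputed from the 34 scores listed on this page. The sketch below uses Python's standard `statistics` module. The population form of the standard deviation (`pstdev`) reproduces the published 12.0 to one decimal place, while the sample form (`stdev`) comes out slightly higher. Whether the site computes it this way is an assumption.

```python
import statistics

# The 34 Mathematics scores from the leaderboard, highest to lowest.
SCORES = [
    95.6, 95.6, 94.6, 94.0, 94.0, 93.1, 93.0, 93.0, 92.7, 92.1,
    92.1, 92.0, 91.2, 91.2, 90.7, 90.3, 90.3, 89.0, 89.0, 86.8,
    84.9, 84.2, 83.1, 80.0, 78.2, 77.2, 74.8, 74.4, 73.6, 70.8,
    70.1, 68.5, 67.0, 36.9,
]

count = len(SCORES)
best = max(SCORES)
average = statistics.mean(SCORES)
# Population standard deviation (divides by N, not N - 1).
std_dev = statistics.pstdev(SCORES)

if __name__ == "__main__":
    print(f"Models: {count}")
    print(f"Best Score: {best:.1f}")
    print(f"Average: {average:.1f}")
    print(f"Std Dev: {std_dev:.1f}")
```

The single score of 36.9 sits far below the rest and accounts for a large share of the spread.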