Price Per Token

ARC Challenge Leaderboard

AI2 Reasoning Challenge (Challenge set) — grade-school science questions requiring complex reasoning.

Data from LayerLens

Models: 35
Best Score: 96.3
Average: 92.0
Std Dev: 6.7

Categories: Reasoning and Logic
Input $/M   Output $/M   ARC Challenge
$1.250      $10.000      96.3
$0.400      $2.200       95.3
$0.300      $0.500       95.2
$2.000      $8.000       95.1
$1.100      $4.400       95.1
$0.450      $2.150       95.1
$0.150      $0.600       95.0
$0.071      $0.100       94.8
$3.000      $15.000      94.7
$3.000      $15.000      94.7
$0.080      $0.240       94.7
$0.080      $0.240       94.7
$0.150      $0.400       94.7
$0.060      $0.140       94.6
$0.100      $0.400       94.2
$0.280      $1.100       94.0
$0.320      $0.890       94.0
$0.080      $0.300       93.9
$0.400      $2.000       93.9
$3.000      $15.000      93.9
$2.500      $10.000      93.7
$3.000      $15.000      93.7
$0.400      $2.000       93.4
$0.800      $3.200       92.7
$2.000      $6.000       92.2
$0.060      $0.180       91.8
$0.700      $2.500       91.5
$0.040      $0.150       91.0
$0.100      $0.300       90.4
$0.800      $4.000       90.4
$2.500      $10.000      89.2
$2.500      $10.000      89.2
$0.035      $0.140       86.9
$0.020      $0.040       81.7
$0.051      $0.340       56.3

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
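
The summary figures at the top of the page (model count, best score, average, standard deviation) can be recomputed from the score column. The sketch below is only an illustration: it assumes the population standard deviation, which the page does not state, and it works from the rounded scores as displayed, so the recomputed average may differ from the published 92.0 in the last digit.

```python
from statistics import mean, pstdev

# ARC Challenge scores as displayed in the leaderboard, highest to lowest.
scores = [
    96.3, 95.3, 95.2, 95.1, 95.1, 95.1, 95.0, 94.8, 94.7, 94.7,
    94.7, 94.7, 94.7, 94.6, 94.2, 94.0, 94.0, 93.9, 93.9, 93.9,
    93.7, 93.7, 93.4, 92.7, 92.2, 91.8, 91.5, 91.0, 90.4, 90.4,
    89.2, 89.2, 86.9, 81.7, 56.3,
]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Assumption: population formula (divide by N), not the sample formula.
    "std_dev": round(pstdev(scores), 1),
}

print(summary)
```

The single low score at the bottom of the list accounts for most of the spread, so the standard deviation says more about that one outlier than about the tightly clustered top of the board.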

About ARC Challenge

This leaderboard lists every tracked model that has an ARC Challenge score, ranked from highest to lowest. Input and output prices per million tokens are shown alongside each score so that performance can be weighed against cost.
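
One way to weigh performance against cost is to keep only the rows that no other row beats on both price and score (a Pareto frontier). The sketch below is a hypothetical illustration, not something the page itself computes: `blended_price` and its 3:1 input-to-output weighting are assumptions, and rows are identified by leaderboard rank rather than by model name.

```python
# (input $/M, output $/M, ARC Challenge score), in leaderboard order.
rows = [
    (1.250, 10.000, 96.3), (0.400, 2.200, 95.3), (0.300, 0.500, 95.2),
    (2.000, 8.000, 95.1), (1.100, 4.400, 95.1), (0.450, 2.150, 95.1),
    (0.150, 0.600, 95.0), (0.071, 0.100, 94.8), (3.000, 15.000, 94.7),
    (3.000, 15.000, 94.7), (0.080, 0.240, 94.7), (0.080, 0.240, 94.7),
    (0.150, 0.400, 94.7), (0.060, 0.140, 94.6), (0.100, 0.400, 94.2),
    (0.280, 1.100, 94.0), (0.320, 0.890, 94.0), (0.080, 0.300, 93.9),
    (0.400, 2.000, 93.9), (3.000, 15.000, 93.9), (2.500, 10.000, 93.7),
    (3.000, 15.000, 93.7), (0.400, 2.000, 93.4), (0.800, 3.200, 92.7),
    (2.000, 6.000, 92.2), (0.060, 0.180, 91.8), (0.700, 2.500, 91.5),
    (0.040, 0.150, 91.0), (0.100, 0.300, 90.4), (0.800, 4.000, 90.4),
    (2.500, 10.000, 89.2), (2.500, 10.000, 89.2), (0.035, 0.140, 86.9),
    (0.020, 0.040, 81.7), (0.051, 0.340, 56.3),
]


def blended_price(inp, out, input_weight=3, output_weight=1):
    """Weighted average $/M tokens. The 3:1 mix is an assumption."""
    total = input_weight + output_weight
    return (input_weight * inp + output_weight * out) / total


def pareto_frontier(rows):
    """Return (rank, blended price, score) for rows not dominated by another row.

    A row is dominated when some other row is at least as cheap and scores
    at least as high, and is strictly better on one of the two.
    """
    points = [
        (rank, blended_price(inp, out), score)
        for rank, (inp, out, score) in enumerate(rows, start=1)
    ]
    frontier = []
    for rank, price, score in points:
        dominated = any(
            p <= price and s >= score and (p < price or s > score)
            for _, p, s in points
        )
        if not dominated:
            frontier.append((rank, round(price, 4), score))
    # Cheapest first, so the list reads as "what each extra dollar buys".
    return sorted(frontier, key=lambda item: item[1])


for rank, price, score in pareto_frontier(rows):
    print(f"rank {rank:>2}: ${price:.4f}/M blended, score {score}")
```

Changing the weights in `blended_price` to match a real workload (for example, output-heavy generation) can change which rows land on the frontier.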
