Price Per Token

ARC Easy Leaderboard

AI2 Reasoning Challenge (Easy set) — grade-school science questions.

Data from LayerLens

As of April 18, 2026, Claude Opus 4 is the top-scoring model on ARC Easy at 99.7%, with two Claude Opus 4 entries tied at that score. Qwen3 32B follows at 99.1%, one of five entries at that score. In total, 40 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 40
Best score: 99.7
Average: 97.9
Std dev: 3.3
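These summary figures can be checked against the score column of the leaderboard table below. What follows is a minimal Python sketch. It assumes the 40 scores are transcribed in listing order. The page does not say whether its standard deviation is the population or the sample form, so the sketch computes both. Each rounds to 3.3.

```python
import statistics

# ARC Easy scores transcribed from the leaderboard table, in listing order.
SCORES = [
    99.7, 99.7, 99.1, 99.1, 99.1, 99.1, 99.1, 99.0, 99.0, 99.0,
    98.9, 98.9, 98.9, 98.9, 98.9, 98.9, 98.9, 98.9, 98.9, 98.8,
    98.8, 98.8, 98.7, 98.7, 98.7, 98.6, 98.6, 98.6, 98.6, 98.2,
    97.9, 97.8, 97.6, 97.5, 97.2, 97.1, 96.6, 95.8, 93.4, 78.6,
]


def summarize(scores):
    """Return (count, best, mean, population std dev, sample std dev)."""
    return (
        len(scores),
        max(scores),
        statistics.mean(scores),
        statistics.pstdev(scores),  # divides by n
        statistics.stdev(scores),   # divides by n - 1
    )


count, best, mean, pop_sd, sample_sd = summarize(SCORES)
print(f"Models: {count}")
print(f"Best score: {best:.1f}")
print(f"Average: {mean:.1f}")
print(f"Std dev: {pop_sd:.1f} (population), {sample_sd:.1f} (sample)")
```

Most of the spread comes from the single lowest entry (78.6). The other 39 scores all fall between 93.4 and 99.7.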

Category: Reasoning and Logic
Models are ranked by ARC Easy score, highest first. Prices are OpenRouter rates in US dollars per million tokens.

#    Input $/M   Output $/M   ARC Easy
1    $15.000     $75.000      99.7
2    $15.000     $75.000      99.7
3    $0.080      $0.240       99.1
4    $0.080      $0.240       99.1
5    $0.400      $2.000       99.1
6    $3.000      $15.000      99.1
7    $3.000      $15.000      99.1
8    $3.000      $15.000      99.0
9    $3.000      $15.000      99.0
10   $2.000      $8.000       99.0
11   $0.065      $0.140       98.9
12   $1.100      $4.400       98.9
13   $0.400      $1.760       98.9
14   $0.400      $1.760       98.9
15   $0.300      $2.500       98.9
16   $0.300      $2.500       98.9
17   $0.300      $2.500       98.9
18   $0.300      $2.500       98.9
19   $0.300      $0.500       98.9
20   $0.800      $3.200       98.8
21   $2.500      $10.000      98.8
22   $0.100      $0.400       98.8
23   $0.150      $0.580       98.7
24   $0.150      $0.580       98.7
25   $0.500      $2.150       98.7
26   $0.550      $2.200       98.6
27   $0.080      $0.300       98.6
28   $0.150      $0.600       98.6
29   $0.014      $0.028       98.6
30   $0.080      $0.160       98.2
31   $0.550      $2.000       97.9
32   $0.075      $0.200       97.8
33   $0.300      $0.300       97.6
34   $0.060      $0.240       97.5
35   $2.500      $10.000      97.2
36   $0.070      $0.280       97.1
37   $2.500      $10.000      96.6
38   $0.035      $0.140       95.8
39   $0.060      $0.120       93.4
40   $0.030      $0.050       78.6

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
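Input and output tokens are priced separately and quoted per million tokens. The cost of a single request is therefore each token count multiplied by its rate, divided by one million. The following minimal sketch shows that arithmetic. The workload of 2,000 input and 500 output tokens is a hypothetical example. `request_cost` is an illustrative helper name. The two price points come from the leaderboard table.

```python
def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Cost in USD of one request, with prices quoted per million tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
INPUT_TOKENS, OUTPUT_TOKENS = 2_000, 500

# Two price points taken from the leaderboard table.
for label, input_per_m, output_per_m in [
    ("$15.000 in / $75.000 out", 15.000, 75.000),
    ("$0.080 in / $0.240 out", 0.080, 0.240),
]:
    cost = request_cost(INPUT_TOKENS, OUTPUT_TOKENS, input_per_m, output_per_m)
    print(f"{label}: ${cost:.5f} per request")
```

For this mix, the top-priced entries cost about $0.0675 per request, compared with about $0.00028 for the $0.080 / $0.240 entries. That is roughly 240 times more.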


About ARC Easy

ARC Easy is the Easy set of the AI2 Reasoning Challenge (ARC), a collection of multiple-choice, grade-school science questions from the Allen Institute for AI.

This leaderboard shows all models with ARC Easy benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
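One way to compare performance against cost is to find the entries that no other entry beats on both price and score, known as a Pareto frontier. The following sketch is one possible approach. It assumes a blended price that weights input and output 3:1. This is a common convention, but this page does not state it. `blended_price`, `pareto_frontier` and the `input_share` parameter are illustrative names. Positions refer to the listing order of the leaderboard table, with 1 as the first row.

```python
# Leaderboard entries in listing order: (input $/M, output $/M, ARC Easy score).
ENTRIES = [
    (15.000, 75.000, 99.7), (15.000, 75.000, 99.7),
    (0.080, 0.240, 99.1), (0.080, 0.240, 99.1),
    (0.400, 2.000, 99.1), (3.000, 15.000, 99.1),
    (3.000, 15.000, 99.1), (3.000, 15.000, 99.0),
    (3.000, 15.000, 99.0), (2.000, 8.000, 99.0),
    (0.065, 0.140, 98.9), (1.100, 4.400, 98.9),
    (0.400, 1.760, 98.9), (0.400, 1.760, 98.9),
    (0.300, 2.500, 98.9), (0.300, 2.500, 98.9),
    (0.300, 2.500, 98.9), (0.300, 2.500, 98.9),
    (0.300, 0.500, 98.9), (0.800, 3.200, 98.8),
    (2.500, 10.000, 98.8), (0.100, 0.400, 98.8),
    (0.150, 0.580, 98.7), (0.150, 0.580, 98.7),
    (0.500, 2.150, 98.7), (0.550, 2.200, 98.6),
    (0.080, 0.300, 98.6), (0.150, 0.600, 98.6),
    (0.014, 0.028, 98.6), (0.080, 0.160, 98.2),
    (0.550, 2.000, 97.9), (0.075, 0.200, 97.8),
    (0.300, 0.300, 97.6), (0.060, 0.240, 97.5),
    (2.500, 10.000, 97.2), (0.070, 0.280, 97.1),
    (2.500, 10.000, 96.6), (0.035, 0.140, 95.8),
    (0.060, 0.120, 93.4), (0.030, 0.050, 78.6),
]


def blended_price(input_per_m, output_per_m, input_share=0.75):
    """Weighted USD per million tokens; the 3:1 input:output default is an assumption."""
    return input_share * input_per_m + (1 - input_share) * output_per_m


def pareto_frontier(entries):
    """Return (position, blended price, score) for each entry that outscores
    every entry sorted before it (cheaper, or equal price and higher score)."""
    ranked = [(pos, blended_price(i, o), s) for pos, (i, o, s) in enumerate(entries, start=1)]
    # Cheapest first; at equal price, higher score first (sort is stable for exact ties).
    ranked.sort(key=lambda r: (r[1], -r[2]))
    frontier, best = [], float("-inf")
    for pos, price, score in ranked:
        if score > best:  # strictly better than everything at or below this price
            frontier.append((pos, price, score))
            best = score
    return frontier


frontier = pareto_frontier(ENTRIES)
for pos, price, score in frontier:
    print(f"position {pos}: ${price:.4f}/M blended, ARC Easy {score}")
```

Under the 3:1 assumption, the frontier contains four entries:

- Position 29: $0.014 / $0.028, score 98.6
- Position 11: $0.065 / $0.140, score 98.9
- Position 3: $0.080 / $0.240, score 99.1
- Position 1: $15.000 / $75.000, score 99.7

A different `input_share` can change which entries qualify.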

Frequently Asked Questions

As of April 18, 2026, Claude Opus 4 leads the ARC Easy leaderboard with a score of 99.7. Rankings change as new models are released and evaluated.
Currently, 40 models have been evaluated on ARC Easy. Their average score is 97.9, with a standard deviation of 3.3.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.