ARC Easy Leaderboard

AI2 Reasoning Challenge (Easy set) — grade-school science questions.

As of June 2, 2026, the top-scoring model on ARC Easy is Claude Opus 4 Thinking at 99.7%, followed by Claude Opus 4 at 99.7% and Qwen3 32B Thinking at 99.1%. A total of 40 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 40
Best Score: 99.7
Average: 97.9
Std Dev: 3.3
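The summary figures above can be reproduced from the ARC Easy column of the table below. The following is a minimal sketch, not the site's own calculation: the scores are copied from the table as score-to-row-count pairs, and the use of the population standard deviation (`pstdev`) is an assumption, since the page does not say which formula it uses. With these scores the sample standard deviation rounds to the same value.

```python
from statistics import mean, pstdev

# ARC Easy scores copied from the table, stored as score -> number of rows.
SCORE_COUNTS = {
    99.7: 2, 99.1: 5, 99.0: 3, 98.9: 9, 98.8: 3, 98.7: 3, 98.6: 4,
    98.2: 1, 97.9: 1, 97.8: 1, 97.6: 1, 97.5: 1, 97.2: 1, 97.1: 1,
    96.6: 1, 95.8: 1, 93.4: 1, 78.6: 1,
}

# Expand the counts back into one score per table row.
scores = [score for score, count in SCORE_COUNTS.items() for _ in range(count)]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Assumption: population standard deviation; the page does not specify.
    "std_dev": round(pstdev(scores), 1),
}
print(summary)
```

Most of the spread comes from the single lowest row (78.6). The other 39 scores all fall between 93.4 and 99.7.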

Categories: Reasoning and Logic

Source: LayerLens

Provider	Model	Input ($/M tokens)	Output ($/M tokens)	ARC Easy
Anthropic	Claude Opus 4 Thinking	$15.000	$75.000	99.7
Anthropic	Claude Opus 4	$15.000	$75.000	99.7
Alibaba	Qwen3 32B Thinking	$0.080	$0.280	99.1
Alibaba	Qwen3 32B	$0.080	$0.280	99.1
Mistral	Mistral Medium 3	$0.400	$2.000	99.1
xAI	Grok 3	$3.000	$15.000	99.1
xAI	Grok 3	$3.000	$15.000	99.1
xAI	Grok 4	$3.000	$15.000	99.0
xAI	Grok 3 Beta	$3.000	$15.000	99.0
OpenAI	GPT-4.1	$2.000	$8.000	99.0
Microsoft	Phi 4	$0.065	$0.140	98.9
OpenAI	o4 Mini High	$1.100	$4.400	98.9
MiniMax	MiniMax M1	$0.400	$2.200	98.9
MiniMax	MiniMax M1	$0.400	$2.200	98.9
Google	Gemini 2.5 Flash Thinking	$0.300	$2.500	98.9
Google	Gemini 2.5 Flash Thinking	$0.300	$2.500	98.9
Google	Gemini 2.5 Flash	$0.300	$2.500	98.9
Google	Gemini 2.5 Flash	$0.300	$2.500	98.9
xAI	Grok 3 Mini Beta	$0.300	$0.500	98.9
Amazon	Nova Pro 1.0	$0.800	$3.200	98.8
Cohere	Command A	$2.500	$10.000	98.8
Google	Gemini 2.0 Flash	$0.100	$0.400	98.8
Alibaba	QwQ 32B	$0.900	$0.900	98.7
Alibaba	QwQ 32B	$0.900	$0.900	98.7
DeepSeek	R1 0528	$0.500	$2.150	98.7
OpenAI	o3 Mini	$0.550	$2.200	98.6
Meta	Llama 4 Scout	$0.080	$0.300	98.6
Meta	Llama 4 Maverick	$0.150	$0.600	98.6
DeepSeek	DeepSeek V3	$0.014	$0.028	98.6
Google	Gemma 3 27B	$0.080	$0.160	98.2
DeepSeek	R1	$0.550	$2.000	97.9
Mistral	Mistral Small 3.2 24B	$0.075	$0.200	97.8
Nous Research	Hermes 3 70B Instruct	$0.300	$0.300	97.6
Amazon	Nova Lite 1.0	$0.060	$0.240	97.5
Inflection	Inflection 3 Productivity	$2.500	$10.000	97.2
Mistral	Devstral Small 1.1	$0.070	$0.280	97.1
Inflection	Inflection 3 Pi	$2.500	$10.000	96.6
Amazon	Nova Micro 1.0	$0.035	$0.140	95.8
Google	Gemma 3n 4B	$0.060	$0.120	93.4
Meta	Llama 3.2 3B Instruct	$0.030	$0.050	78.6

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About ARC Easy

This leaderboard shows all models with ARC Easy benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
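
One way to compare performance against cost is to pick a minimum acceptable score and look for the cheapest model that clears it. The sketch below is illustrative only. It uses a handful of rows copied from the table, and the names `Row`, `blended_price`, `cheapest_at_or_above`, and the 75/25 input-to-output token split are all hypothetical choices, not something the leaderboard defines.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    input_usd_per_m: float   # price per million input tokens
    output_usd_per_m: float  # price per million output tokens
    arc_easy: float          # ARC Easy score


# A small sample of rows copied from the leaderboard table.
ROWS = [
    Row("Claude Opus 4", 15.0, 75.0, 99.7),
    Row("Qwen3 32B", 0.08, 0.28, 99.1),
    Row("Mistral Medium 3", 0.4, 2.0, 99.1),
    Row("GPT-4.1", 2.0, 8.0, 99.0),
    Row("Phi 4", 0.065, 0.14, 98.9),
    Row("DeepSeek V3", 0.014, 0.028, 98.6),
    Row("Llama 3.2 3B Instruct", 0.03, 0.05, 78.6),
]


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens.

    Assumption: input_share is the fraction of tokens that are input;
    0.75 is an arbitrary illustrative value, so adjust it to your workload.
    """
    return input_share * row.input_usd_per_m + (1 - input_share) * row.output_usd_per_m


def cheapest_at_or_above(rows: list[Row], min_score: float) -> Optional[Row]:
    """Return the lowest blended-price row scoring at least min_score, or None."""
    candidates = [r for r in rows if r.arc_easy >= min_score]
    return min(candidates, key=blended_price) if candidates else None


for threshold in (99.0, 98.5):
    best = cheapest_at_or_above(ROWS, threshold)
    print(threshold, best.model if best else None)
```

Because scores near the top of this benchmark differ by only a few tenths of a point, small changes in the threshold can shift the result to a much cheaper model.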