SimpleQA Leaderboard

SimpleQA is a question-answering benchmark that tests factual accuracy and knowledge retrieval.

As of June 2, 2026, the top-scoring model on SimpleQA is Gemini 2.5 Pro at 53.0%, followed by Qwen3 235B A22B Instruct 2507 at 50.6% and Qwen3 VL 235B A22B Instruct at 46.7%. 45 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 45
Best Score: 53.0
Average: 20.8
Std Dev: 14.0

Categories: Reasoning and Logic, General Knowledge

Source: LayerLens

Provider	Model	Input ($/M tokens)	Output ($/M tokens)	SimpleQA (%)
Google	Gemini 2.5 Pro	$1.000	$10.000	53.0
Alibaba	Qwen3 235B A22B Instruct 2507	$0.071	$0.100	50.6
Alibaba	Qwen3 VL 235B A22B Instruct	$0.200	$0.880	46.7
OpenAI	GPT-4.1	$2.000	$8.000	40.4
Alibaba	Qwen3 Next 80B A3B Instruct	$0.090	$0.780	40.1
xAI	Grok 3 Beta	$3.000	$15.000	38.3
Alibaba	Qwen3 VL 235B A22B Thinking	$0.260	$0.900	37.9
xAI	Grok 3	$3.000	$15.000	37.4
xAI	Grok 3	$3.000	$15.000	37.4
Baidu	ERNIE 4.5 300B A47B	$0.900	$0.900	36.9
Anthropic	Claude 3.7 Sonnet Thinking	$3.000	$15.000	32.8
Anthropic	Claude 3.7 Sonnet	$3.000	$15.000	32.8
DeepSeek	R1	$0.550	$2.000	29.1
Kimi	Kimi K2 0711	$0.550	$2.200	26.5
Kimi	Kimi K2 0711	$0.550	$2.200	26.5
DeepSeek	R1 0528	$0.500	$2.150	25.1
DeepSeek	DeepSeek V3.1	$0.210	$0.790	23.3
DeepSeek	DeepSeek V3.1 Thinking	$0.210	$0.790	23.3
DeepSeek	DeepSeek V3	$0.014	$0.028	23.0
Meta	Llama 4 Maverick	$0.150	$0.600	22.1
Mistral	Mistral Medium 3.1	$0.400	$2.000	20.5
Mistral	Mistral Medium 3	$0.400	$2.000	19.7
xAI	Grok 3 Mini Beta	$0.300	$0.500	18.4
Nous Research	Hermes 3 70B Instruct	$0.300	$0.300	17.1
Mistral	Pixtral Large 2411	$2.000	$6.000	16.7
Cohere	Command A	$2.500	$10.000	15.8
OpenAI	o3 Mini	$0.550	$2.200	14.0
Alibaba	Qwen3 235B A22B Thinking	$0.455	$0.900	12.7
Alibaba	Qwen3 235B A22B	$0.455	$0.900	12.7
Amazon	Nova Pro 1.0	$0.800	$3.200	12.6
Alibaba	Tongyi DeepResearch 30B A3B	$0.090	$0.400	11.9
Mistral	Mistral Small 3.2 24B	$0.075	$0.200	9.7
Google	Gemma 3 27B	$0.080	$0.160	8.5
Anthropic	Claude 3.5 Haiku	$0.800	$4.000	8.0
Meta	Llama 4 Scout	$0.080	$0.300	7.3
Mistral	Devstral Small 1.1	$0.070	$0.280	6.7
Amazon	Nova Lite 1.0	$0.060	$0.240	6.6
Alibaba	Qwen3 30B A3B Thinking	$0.080	$0.280	5.6
Alibaba	Qwen3 30B A3B	$0.080	$0.280	5.6
Alibaba	Qwen3 32B Thinking	$0.080	$0.280	5.5
Alibaba	Qwen3 32B	$0.080	$0.280	5.5
Amazon	Nova Micro 1.0	$0.035	$0.140	4.7
Google	Gemma 3n 4B	$0.060	$0.120	4.0
Microsoft	Phi 4	$0.065	$0.140	2.3
Meta	Llama 3.2 3B Instruct	$0.030	$0.050	0.5

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
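
The summary statistics near the top of the page (best score, average, standard deviation) can be checked against the SimpleQA column of the table. The following is a minimal Python sketch, assuming all 45 listed rows (including the repeated Grok 3 and Kimi K2 0711 rows) feed the published figures. The page does not say which standard deviation formula it uses. The population form (divide by N) is an assumption here, chosen because it appears to reproduce the published 14.0, while the sample form comes out slightly higher.

```python
from statistics import mean, pstdev

# SimpleQA scores copied from the table, in ranked order (45 rows).
scores = [
    53.0, 50.6, 46.7, 40.4, 40.1, 38.3, 37.9, 37.4, 37.4, 36.9,
    32.8, 32.8, 29.1, 26.5, 26.5, 25.1, 23.3, 23.3, 23.0, 22.1,
    20.5, 19.7, 18.4, 17.1, 16.7, 15.8, 14.0, 12.7, 12.7, 12.6,
    11.9, 9.7, 8.5, 8.0, 7.3, 6.7, 6.6, 5.6, 5.6, 5.5,
    5.5, 4.7, 4.0, 2.3, 0.5,
]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Assumption: population standard deviation (divide by N), not sample.
    "std_dev": round(pstdev(scores), 1),
}
print(summary)
```

If the duplicated rows were dropped before averaging, the figures would shift, so the match suggests the site counts every listed row.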


About SimpleQA

SimpleQA is a question-answering benchmark that tests factual accuracy and knowledge retrieval.

This leaderboard lists every model with a SimpleQA score, ranked from highest to lowest. Input and output prices are shown alongside each score so you can compare performance against cost.
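
One rough way to compare performance against cost is to divide each score by a blended token price. The Python sketch below uses a few rows copied from the table. The `Row` class, the `blended_price` helper, and the 75/25 input-to-output token split are all hypothetical choices made for illustration, not something the leaderboard defines, and a different split can change the ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    input_usd_per_m: float   # input price, $ per million tokens
    output_usd_per_m: float  # output price, $ per million tokens
    simpleqa: float          # SimpleQA score, percent


# A few rows copied from the leaderboard table.
ROWS = [
    Row("Gemini 2.5 Pro", 1.000, 10.000, 53.0),
    Row("Qwen3 235B A22B Instruct 2507", 0.071, 0.100, 50.6),
    Row("GPT-4.1", 2.000, 8.000, 40.4),
    Row("Claude 3.7 Sonnet", 3.000, 15.000, 32.8),
    Row("DeepSeek V3", 0.014, 0.028, 23.0),
]


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Hypothetical blended $/M tokens, assuming input_share of tokens are input."""
    return input_share * row.input_usd_per_m + (1 - input_share) * row.output_usd_per_m


def points_per_dollar(row: Row) -> float:
    """SimpleQA points per blended dollar per million tokens."""
    return row.simpleqa / blended_price(row)


# Rank the sample rows from most to least score per blended dollar.
ranked = sorted(ROWS, key=points_per_dollar, reverse=True)
for row in ranked:
    print(f"{row.model:32s} {points_per_dollar(row):10.1f}")
```

A ratio like this favors very cheap models even when their absolute accuracy is low, so it is best read alongside the raw score rather than as a replacement for it.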