SWE-bench Lite Leaderboard

A software engineering benchmark that tests a model's ability to resolve real GitHub issues.

As of June 2, 2026, the top-scoring models on SWE-bench Lite are Claude Opus 4.6 Thinking and Claude Opus 4.6, tied at 62.7%, followed by MiniMax M2.5 at 56.3%. In total, 62 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 62
Best score: 62.7
Average: 26.8
Std dev: 19.9
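The Average and Std Dev figures can be reproduced if each of the 8 unscored ("-") entries is counted as 0 and the population standard deviation is taken over all 62 entries. This is an inference from the numbers, not a documented method of the site. A minimal Python sketch, where `summarize` and `missing_as` are hypothetical names:

```python
import statistics
from typing import Dict, List, Optional

# SWE-bench Lite scores for all 62 entries, in table order; None marks a "-" (no score).
SCORES: List[Optional[float]] = [
    62.7, 62.7, 56.3, 54.3, 54.3, 54.3, 54.3, 54.3, 54.3, 53.3, 53.3,
    49.3, 49.3, 44.7, 42.0, 42.0, 42.0, 42.0, 40.0, 39.0, 38.3, 38.3,
    36.5, 36.3, 36.3, 36.3, 33.3, 29.1, 27.7, 26.1, 26.1, 26.1, 26.1,
    26.0, 26.0, 20.0, 20.0, 16.3, 16.3, 14.3, 14.3, 12.7, 12.7, 9.0,
    9.0, 8.0, 7.7, 7.7, 7.0, 5.7, 4.0, 2.7, 0.7, 0.7,
] + [None] * 8


def summarize(scores: List[Optional[float]], missing_as: float = 0.0) -> Dict[str, float]:
    """Summary stats with unscored entries replaced by `missing_as` (assumed to be 0)."""
    filled = [missing_as if s is None else s for s in scores]
    return {
        "models": len(filled),
        "best": max(filled),
        "average": statistics.mean(filled),
        # Population SD (divides by N); statistics.stdev would divide by N - 1.
        "std_dev": statistics.pstdev(filled),
    }


stats = summarize(SCORES)
print({name: round(value, 1) for name, value in stats.items()})
```

Averaging only the 54 scored entries instead gives about 30.8, so how "-" entries are treated noticeably shifts the headline average.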

Categories: Multi-turn

Source: LayerLens

Provider	Model	Input ($/1M tokens)	Output ($/1M tokens)	SWE-bench Lite (%)
Anthropic	Claude Opus 4.6 Thinking	$5.000	$25.000	62.7
Anthropic	Claude Opus 4.6	$5.000	$25.000	62.7
MiniMax	MiniMax M2.5	$0.150	$1.150	56.3
OpenAI	GPT-5	$1.250	$10.000	54.3
OpenAI	GPT-5	$1.250	$10.000	54.3
OpenAI	GPT-5	$1.250	$10.000	54.3
OpenAI	GPT-5	$1.250	$10.000	54.3
Anthropic	Claude Haiku 4.5 Thinking	$1.000	$5.000	54.3
Anthropic	Claude Haiku 4.5	$1.000	$5.000	54.3
Z AI	GLM 5 Thinking	$0.600	$2.080	53.3
Z AI	GLM 5	$0.600	$2.080	53.3
Anthropic	Claude Opus 4.5 Thinking	$5.000	$25.000	49.3
Anthropic	Claude Opus 4.5	$5.000	$25.000	49.3
Alibaba	Qwen3 Coder 480B A35B (exacto)	$0.220	$0.900	44.7
Kimi	Kimi K2 0711	$0.550	$2.200	42.0
Kimi	Kimi K2 0711	$0.550	$2.200	42.0
Z AI	GLM 4.6 Thinking	$0.430	$1.740	42.0
Z AI	GLM 4.6	$0.430	$1.740	42.0
Google	Gemini 2.5 Pro	$1.000	$10.000	40.0
MiniMax	MiniMax M2	$0.255	$1.000	39.0
OpenAI	GPT-5 Mini	$0.250	$2.000	38.3
OpenAI	GPT-5 Mini	$0.250	$2.000	38.3
Mistral	Mistral Medium 3.1	$0.400	$2.000	36.5
Alibaba	Qwen3 235B A22B Instruct 2507	$0.071	$0.100	36.3
OpenAI	GPT-5.1	$1.250	$10.000	36.3
OpenAI	GPT-5.1	$1.250	$10.000	36.3
Mistral	Mistral Large 3 2512	$0.500	$1.500	33.3
DeepSeek	DeepSeek V3	$0.014	$0.028	29.1
Anthropic	Claude 3.5 Haiku	$0.800	$4.000	27.7
Google	Gemini 2.5 Flash Thinking	$0.300	$2.500	26.1
Google	Gemini 2.5 Flash Thinking	$0.300	$2.500	26.1
Google	Gemini 2.5 Flash	$0.300	$2.500	26.1
Google	Gemini 2.5 Flash	$0.300	$2.500	26.1
Z AI	GLM 4.6V Thinking	$0.300	$0.900	26.0
Z AI	GLM 4.6V	$0.300	$0.900	26.0
Alibaba	Qwen3.5 397B A17B	$0.390	$0.900	20.0
Alibaba	Qwen3.5 397B A17B	$0.390	$0.900	20.0
Alibaba	Qwen3 32B Thinking	$0.080	$0.280	16.3
Alibaba	Qwen3 32B	$0.080	$0.280	16.3
DeepSeek	DeepSeek V3.1	$0.210	$0.790	14.3
DeepSeek	DeepSeek V3.1 Thinking	$0.210	$0.790	14.3
Google	Gemini 3 Flash Preview Thinking	$0.500	$3.000	12.7
Google	Gemini 3 Flash Preview	$0.500	$3.000	12.7
OpenAI	GPT-OSS-120b	$0.039	$0.100	9.0
OpenAI	GPT-OSS-120b	$0.039	$0.100	9.0
Meta	Llama 4 Maverick	$0.150	$0.600	8.0
xAI	Grok 4	$3.000	$15.000	7.7
Z AI	GLM 4.5 Thinking	$0.600	$2.200	7.7
xAI	Grok 4 Fast Thinking	$0.200	$0.500	7.0
Mistral	Mistral Small 3.2 24B	$0.075	$0.200	5.7
Meta	Llama 4 Scout	$0.080	$0.300	4.0
Amazon	Nova Pro 1.0	$0.800	$3.200	2.7
xAI	Grok 4.1 Fast Thinking	$0.000	$0.000	0.7
xAI	Grok 4.1 Fast	$0.000	$0.000	0.7
Microsoft	Phi 4	$0.065	$0.140	-
MiniMax	MiniMax M1	$0.400	$2.200	-
MiniMax	MiniMax M1	$0.400	$2.200	-
DeepSeek	DeepSeek V3.2 Exp Thinking	$0.270	$0.410	-
DeepSeek	DeepSeek V3.2 Exp	$0.270	$0.410	-
Kimi	Kimi K2.5 Thinking	$0.400	$1.900	-
Kimi	Kimi K2.5	$0.400	$1.900	-
Qwen	Qwen3.5 Plus	$0.260	$1.560	-

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
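The table's ordering (highest score first, unscored "-" entries last) can be expressed as a two-part sort key. The sketch below assumes each row has already been reduced to five tab-separated fields (provider, model, input price, output price, score) and uses four sample rows from the table; `Row`, `parse_row`, and `rank` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    provider: str
    model: str
    input_per_m: float      # USD per 1M input tokens
    output_per_m: float     # USD per 1M output tokens
    score: Optional[float]  # SWE-bench Lite %, None when the table shows "-"


def parse_row(line: str) -> Row:
    """Parse one tab-separated row, stripping the leading "$" from prices."""
    provider, model, inp, out, score = line.split("\t")
    return Row(
        provider,
        model,
        float(inp.lstrip("$")),
        float(out.lstrip("$")),
        None if score.strip() == "-" else float(score),
    )


def rank(rows: List[Row]) -> List[Row]:
    """Scored rows first (False sorts before True), then by descending score."""
    # sorted() is stable, so rows with equal scores keep their input order.
    return sorted(rows, key=lambda r: (r.score is None, -(r.score or 0.0)))


SAMPLE = [
    "Microsoft\tPhi 4\t$0.065\t$0.140\t-",
    "xAI\tGrok 4.1 Fast\t$0.000\t$0.000\t0.7",
    "MiniMax\tMiniMax M2.5\t$0.150\t$1.150\t56.3",
    "Anthropic\tClaude Opus 4.6\t$5.000\t$25.000\t62.7",
]

for row in rank([parse_row(line) for line in SAMPLE]):
    print(f"{row.model}\t{'-' if row.score is None else row.score}")
```

Keeping "missing" as a separate first key, rather than substituting a sentinel score such as 0, keeps unscored entries below genuinely low scores like 0.7.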

About SWE-bench Lite

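This leaderboard lists every model with a SWE-bench Lite entry, ranked from highest to lowest score; entries without a reported score are marked "-" and listed last. Input and output prices, in US dollars per million tokens, are included to help you compare performance against cost.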
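One way to compare performance against cost is to collapse each model's two prices into a single blended price, then keep only the models that no cheaper model outscores (a Pareto frontier). The sketch below assumes a 3:1 input-to-output token weighting, an illustrative choice that this page does not specify, and uses a handful of rows from the table; `Entry`, `blended_price`, and `pareto_frontier` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    model: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    score: float         # SWE-bench Lite, % of issues resolved


def blended_price(e: Entry, input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output price (3:1 by default, an assumption)."""
    return (input_weight * e.input_per_m + output_weight * e.output_per_m) / (
        input_weight + output_weight
    )


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Keep each entry that scores strictly higher than every cheaper entry."""
    frontier: List[Entry] = []
    best_score = float("-inf")
    # Cheapest first; on equal prices the higher score is visited first.
    for e in sorted(entries, key=lambda x: (blended_price(x), -x.score)):
        if e.score > best_score:
            frontier.append(e)
            best_score = e.score
    return frontier


ENTRIES = [
    Entry("Claude Opus 4.6", 5.000, 25.000, 62.7),
    Entry("MiniMax M2.5", 0.150, 1.150, 56.3),
    Entry("GPT-5", 1.250, 10.000, 54.3),
    Entry("Claude Haiku 4.5", 1.000, 5.000, 54.3),
    Entry("GLM 5", 0.600, 2.080, 53.3),
    Entry("Qwen3 235B A22B Instruct 2507", 0.071, 0.100, 36.3),
    Entry("DeepSeek V3", 0.014, 0.028, 29.1),
    Entry("GPT-OSS-120b", 0.039, 0.100, 9.0),
]

for e in pareto_frontier(ENTRIES):
    print(f"{e.model:<30} ${blended_price(e):.4f}/M  {e.score:.1f}%")
```

Entries listed at $0.000 (such as Grok 4.1 Fast) would always land at the cheap end of such a frontier, so they may be worth filtering out before comparing paid models.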