Price Per Token

MMMU Leaderboard

Multimodal Understanding benchmark testing vision-language models on expert-level tasks.

Data from LayerLens

As of April 18, 2026, the top-scoring model on MMMU is o4 Mini High at 79.2%, followed by GPT-5 at 79.1%. 64 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 64
Best Score: 79.2
Average: 60.1
Std Dev: 15.5

Categories: Multimodal
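The summary statistics above can be derived directly from the column of MMMU scores. A minimal sketch, using a small illustrative sample of scores from the table rather than the full set of 64 entries, and assuming the site reports a population (rather than sample) standard deviation:

```python
# Sketch: deriving leaderboard summary statistics from MMMU scores.
# The score list is a small illustrative sample from the table, not
# the full 64-model set, so the printed values differ from the page's.
import statistics

scores = [79.2, 79.1, 78.1, 60.7, 42.0, 21.8]  # example scores from the table

print(f"Models: {len(scores)}")
print(f"Best Score: {max(scores):.1f}")
print(f"Average: {statistics.mean(scores):.1f}")
print(f"Std Dev: {statistics.pstdev(scores):.1f}")  # population std dev (assumption)
```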
| Provider | Model | Input $/M | Output $/M | MMMU |
|----------|-------|-----------|------------|------|
| — | — | $1.100 | $4.400 | 79.2 |
| — | — | $1.250 | $10.000 | 79.1 |
| — | — | $1.250 | $10.000 | 79.1 |
| — | — | $1.250 | $10.000 | 79.1 |
| — | — | $1.250 | $10.000 | 79.1 |
| — | — | $0.260 | $2.080 | 78.1 |
| — | — | $0.260 | $2.080 | 78.1 |
| — | — | $0.195 | $0.900 | 77.6 |
| — | — | $0.195 | $0.900 | 77.6 |
| — | — | $0.065 | $0.260 | 76.8 |
| — | — | $0.163 | $0.900 | 76.4 |
| — | — | $0.163 | $0.900 | 76.4 |
| — | — | $5.000 | $25.000 | 76.3 |
| — | — | $5.000 | $25.000 | 76.3 |
| — | — | $0.125 | $1.000 | 75.3 |
| — | — | $0.125 | $1.000 | 75.3 |
| — | — | $3.000 | $15.000 | 75.3 |
| — | — | $3.000 | $15.000 | 75.3 |
| — | — | $3.000 | $15.000 | 75.3 |
| — | — | $3.000 | $15.000 | 72.9 |
| — | — | $3.000 | $15.000 | 71.7 |
| — | — | $2.000 | $8.000 | 69.3 |
| — | — | $0.100 | $0.400 | 69.0 |
| — | — | $0.200 | $0.880 | 68.2 |
| — | — | $3.000 | $15.000 | 66.9 |
| — | — | $3.000 | $15.000 | 66.9 |
| — | — | $1.000 | $5.000 | 65.2 |
| — | — | $1.000 | $5.000 | 65.2 |
| — | — | $0.875 | $7.000 | 62.3 |
| — | — | $0.875 | $7.000 | 62.3 |
| — | — | $1.250 | $10.000 | 60.7 |
| — | — | $1.250 | $10.000 | 60.7 |
| — | — | $0.200 | $0.500 | 60.4 |
| — | — | $0.400 | $2.000 | 58.7 |
| — | — | $0.400 | $2.000 | 58.1 |
| — | — | $0.780 | $3.900 | 57.4 |
| — | — | $0.780 | $3.900 | 57.4 |
| — | — | $0.075 | $0.200 | 55.9 |
| — | — | $0.800 | $4.000 | 54.3 |
| — | — | $0.080 | $0.160 | 53.9 |
| — | — | $0.300 | $2.500 | 53.0 |
| — | — | $0.300 | $2.500 | 53.0 |
| — | — | $0.300 | $2.500 | 53.0 |
| — | — | $0.300 | $2.500 | 53.0 |
| — | — | $0.150 | $0.580 | 52.9 |
| — | — | $0.150 | $0.580 | 52.9 |
| — | — | $0.300 | $0.500 | 52.4 |
| — | — | $0.080 | $0.240 | 52.1 |
| — | — | $0.080 | $0.240 | 52.1 |
| — | — | $0.550 | $2.000 | 50.7 |
| — | — | $0.800 | $3.200 | 50.1 |
| — | — | $3.000 | $15.000 | 49.1 |
| — | — | $0.080 | $0.280 | 49.0 |
| — | — | $0.080 | $0.280 | 49.0 |
| — | — | $0.014 | $0.028 | 48.2 |
| — | — | $0.900 | $0.900 | 45.2 |
| — | — | $0.080 | $0.300 | 42.3 |
| — | — | $2.500 | $10.000 | 42.0 |
| — | — | $0.065 | $0.140 | 39.7 |
| — | — | $2.500 | $10.000 | 37.9 |
| — | — | $0.150 | $0.600 | 31.3 |
| — | — | $0.390 | $0.900 | 21.8 |
| — | — | $0.390 | $0.900 | 21.8 |
| — | — | $0.260 | $1.560 | 13.1 |

Pricing from OpenRouter. Benchmarks from Artificial Analysis.


About MMMU

Multimodal Understanding benchmark testing vision-language models on expert-level tasks.

This leaderboard shows all models with MMMU benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
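One way to combine the two columns into a single comparison is a score-per-dollar ratio over a blended token price. A minimal sketch, using example rows from the table above; the 3:1 input-to-output token ratio is an illustrative assumption, not a convention of this site:

```python
# Sketch: comparing MMMU score against price for rows from the table.
# Prices are in $ per million tokens. The 75/25 input/output token
# mix below is an illustrative assumption.

def blended_cost(input_per_m: float, output_per_m: float,
                 input_share: float = 0.75) -> float:
    """Cost per million tokens at the given input/output mix."""
    return input_per_m * input_share + output_per_m * (1 - input_share)

rows = [
    # (input $/M, output $/M, MMMU score) -- values from the table
    (1.100, 4.400, 79.2),
    (0.065, 0.260, 76.8),
    (3.000, 15.000, 75.3),
]

for inp, out, score in rows:
    cost = blended_cost(inp, out)
    print(f"${cost:.3f}/M blended -> {score / cost:.1f} MMMU points per $/M")
```

Under this mix, a cheap mid-scoring model can easily beat the top scorer on points per dollar, which is the trade-off the pricing columns are there to expose.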

Frequently Asked Questions

What is MMMU?
Multimodal Understanding benchmark testing vision-language models on expert-level tasks.

Which model has the highest MMMU score?
As of April 18, 2026, o4 Mini High leads the MMMU leaderboard with a score of 79.2. Rankings change as new models are released and evaluated.

How many models have been evaluated on MMMU?
Currently 64 models have been evaluated on MMMU, with an average score of 60.1 and a standard deviation of 15.5.

How often is the data updated?
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.