Formal Logic Extended Leaderboard

Extended formal logic benchmark testing deductive and propositional reasoning.

As of June 2, 2026, the top-scoring model on Formal Logic Extended is OpenAI's o3 Mini at 99.8%, followed by DeepSeek's R1 0528 at 99.2% and R1 at 98.4%. In total, 19 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 19
Best Score: 99.8
Average: 64.4
Std Dev: 34.4
Categories: Reasoning and Logic
Source: LayerLens

Provider	Model	Input ($/M tokens)	Output ($/M tokens)	Formal Logic Extended (%)
OpenAI	o3 Mini	$0.550	$2.200	99.8
DeepSeek	R1 0528	$0.500	$2.150	99.2
DeepSeek	R1	$0.550	$2.000	98.4
Anthropic	Claude 3.7 Sonnet Thinking	$3.000	$15.000	95.6
OpenAI	GPT-4.1	$2.000	$8.000	90.6
Alibaba	QwQ 32B	$0.900	$0.900	89.5
Alibaba	QwQ 32B	$0.900	$0.900	89.5
Anthropic	Claude 3.7 Sonnet	$3.000	$15.000	87.4
DeepSeek	DeepSeek V3	$0.014	$0.028	83.6
Cohere	Command A	$2.500	$10.000	80.7
xAI	Grok 3 Beta	$3.000	$15.000	71.9
Anthropic	Claude 3.5 Haiku	$0.800	$4.000	60.3
Meta	Llama 3.1 405B Instruct	$0.900	$0.900	58.2
Meta	Llama 3.2 3B Instruct	$0.030	$0.050	52.2
Mistral	Devstral Small 1.1	$0.070	$0.280	45.1
Meta	Llama 4 Maverick	$0.150	$0.600	10.8
Microsoft	Phi 4	$0.065	$0.140	9.7
Meta	Llama 4 Scout	$0.080	$0.300	0.1
Inflection	Inflection 3 Pi	$2.500	$10.000	-

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
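
The summary figures near the top of the page (best score, average, standard deviation) can be recomputed from the score column. The Python sketch below is illustrative only: the page does not say how its figures are calculated, so the use of a population standard deviation and the `missing_as_zero` switch for the entry listed as "-" are assumptions, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, pstdev

# Scores copied from the table above, in row order.
# None marks the entry whose score is listed as "-".
scores = [99.8, 99.2, 98.4, 95.6, 90.6, 89.5, 89.5, 87.4, 83.6, 80.7,
          71.9, 60.3, 58.2, 52.2, 45.1, 10.8, 9.7, 0.1, None]


def summarize(values, missing_as_zero=False):
    """Return (best, average, population std dev) for a list of scores.

    Assumption: unscored entries are either dropped (default) or
    counted as 0.0; the page does not document which it does.
    """
    if missing_as_zero:
        cleaned = [0.0 if v is None else v for v in values]
    else:
        cleaned = [v for v in values if v is not None]
    return max(cleaned), mean(cleaned), pstdev(cleaned)


for flag in (False, True):
    best, avg, sd = summarize(scores, missing_as_zero=flag)
    print(f"missing_as_zero={flag}: best={best:.1f} average={avg:.1f} std_dev={sd:.1f}")
```

Counting the unscored entry as zero appears to land closer to the published average and standard deviation than dropping it, but that is an observation about the numbers, not a documented method.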

About Formal Logic Extended

This leaderboard shows all models with Formal Logic Extended benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
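
One way to compare performance against cost is to divide each score by a blended price. The Python sketch below uses a handful of rows from the table. The 75/25 input-to-output token mix, the `blended_price` and `score_per_dollar` helpers, and the "points per blended dollar" metric are all illustrative assumptions, not something the leaderboard itself defines.

```python
# (model, input $/M tokens, output $/M tokens, score) copied from the table.
rows = [
    ("o3 Mini", 0.550, 2.200, 99.8),
    ("R1 0528", 0.500, 2.150, 99.2),
    ("Claude 3.7 Sonnet Thinking", 3.000, 15.000, 95.6),
    ("GPT-4.1", 2.000, 8.000, 90.6),
    ("DeepSeek V3", 0.014, 0.028, 83.6),
]


def blended_price(input_price, output_price, input_share=0.75):
    """Weighted $/M tokens; input_share is an assumed traffic mix."""
    return input_share * input_price + (1 - input_share) * output_price


def score_per_dollar(row, input_share=0.75):
    """Benchmark points per blended dollar (per million tokens)."""
    _, input_price, output_price, score = row
    return score / blended_price(input_price, output_price, input_share)


# Rank the sampled models from best to worst value under the assumed mix.
ranked = sorted(rows, key=score_per_dollar, reverse=True)
for row in ranked:
    print(f"{row[0]}: {score_per_dollar(row):.1f} points per blended $/M")
```

Changing `input_share` to match a real workload can reorder the ranking, since output tokens are priced several times higher than input tokens for most of these models.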