Knights and Knaves Leaderboard

Logic puzzle benchmark based on knights (truth-tellers) and knaves (liars) puzzles.

As of June 2, 2026, o3 Mini and o4 Mini High share the top score on Knights and Knaves at 99.7%, followed by R1 0528 at 97.9%. A total of 26 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 26
Best Score: 99.7
Average: 54.6
Std Dev: 30.3
Categories: Reasoning and Logic
Source: LayerLens

Provider	Model	Input $/M tokens	Output $/M tokens	Knights and Knaves (%)
OpenAI	o3 Mini	$0.550	$2.200	99.7
OpenAI	o4 Mini High	$1.100	$4.400	99.7
DeepSeek	R1 0528	$0.500	$2.150	97.9
DeepSeek	R1	$0.550	$2.000	97.3
xAI	Grok 3 Beta	$3.000	$15.000	94.0
OpenAI	GPT-OSS-20b	$0.029	$0.140	94.0
OpenAI	GPT-OSS-20b	$0.029	$0.140	94.0
OpenAI	GPT-4.1	$2.000	$8.000	77.1
Mistral	Mistral Medium 3	$0.400	$2.000	60.7
DeepSeek	DeepSeek V3	$0.014	$0.028	60.3
Meta	Llama 4 Maverick	$0.150	$0.600	59.4
Google	Gemma 3 27B	$0.080	$0.160	57.6
Mistral	Mistral Small 3.2 24B	$0.075	$0.200	55.6
Google	Gemini 2.0 Flash	$0.100	$0.400	52.9
Google	Gemma 3n 4B	$0.060	$0.120	40.6
Cohere	Command A	$2.500	$10.000	39.1
Microsoft	Phi 4	$0.065	$0.140	38.3
Meta	Llama 3.1 405B Instruct	$0.900	$0.900	33.9
Mistral	Pixtral Large 2411	$2.000	$6.000	31.4
Meta	Llama 4 Scout	$0.080	$0.300	30.7
Amazon	Nova Pro 1.0	$0.800	$3.200	28.3
Mistral	Devstral Small 1.1	$0.070	$0.280	24.0
Amazon	Nova Micro 1.0	$0.035	$0.140	19.3
Anthropic	Claude 3.5 Haiku	$0.800	$4.000	15.0
Inflection	Inflection 3 Productivity	$2.500	$10.000	12.5
Meta	Llama 3.2 3B Instruct	$0.030	$0.050	6.1

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
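
The summary figures (best score, average, standard deviation) can be checked against the table. The sketch below is an illustration, not the site's own calculation: it copies the 26 listed scores, including the GPT-OSS-20b row that appears twice, and assumes the population standard deviation. The page does not say which formula it uses; the population form reproduces the published 30.3, while the sample form (`statistics.stdev`) gives a larger value.

```python
from statistics import mean, pstdev

# Scores copied from the leaderboard table, in table order.
# The GPT-OSS-20b row is listed twice and is counted twice here.
scores = [
    99.7, 99.7, 97.9, 97.3, 94.0, 94.0, 94.0, 77.1, 60.7, 60.3,
    59.4, 57.6, 55.6, 52.9, 40.6, 39.1, 38.3, 33.9, 31.4, 30.7,
    28.3, 24.0, 19.3, 15.0, 12.5, 6.1,
]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Assumption: population standard deviation (divide by n, not n - 1).
    "std_dev": round(pstdev(scores), 1),
}

print(summary)
# {'models': 26, 'best': 99.7, 'average': 54.6, 'std_dev': 30.3}
```

The average only matches 54.6 when both GPT-OSS-20b rows are included, which suggests the site counts them as separate entries.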

About Knights and Knaves

This leaderboard shows all models with Knights and Knaves benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
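
One way to compare performance against cost is to look for models that no cheaper model outscores (a Pareto frontier). The sketch below is a minimal illustration under stated assumptions: it uses a subset of rows copied from the table, treats output price alone as the cost measure, and the `pareto_frontier` helper is a hypothetical name introduced here rather than anything the site provides.

```python
# Subset of leaderboard rows: (model, output $ per million tokens, score).
rows = [
    ("o3 Mini", 2.200, 99.7),
    ("o4 Mini High", 4.400, 99.7),
    ("R1 0528", 2.150, 97.9),
    ("R1", 2.000, 97.3),
    ("Grok 3 Beta", 15.000, 94.0),
    ("GPT-OSS-20b", 0.140, 94.0),
    ("GPT-4.1", 8.000, 77.1),
    ("DeepSeek V3", 0.028, 60.3),
    ("Llama 3.2 3B Instruct", 0.050, 6.1),
]


def pareto_frontier(rows):
    """Return model names that are not beaten on score by any cheaper-or-equal model.

    Rows are scanned from cheapest to most expensive; a model is kept only
    if it scores strictly higher than every model seen so far.
    """
    frontier = []
    best_score = float("-inf")
    # Sort by price ascending; at equal price, put the higher score first.
    for name, price, score in sorted(rows, key=lambda r: (r[1], -r[2])):
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


frontier = pareto_frontier(rows)
print(frontier)
# ['DeepSeek V3', 'GPT-OSS-20b', 'R1', 'R1 0528', 'o3 Mini']
```

Under these assumptions, o4 Mini High drops out because o3 Mini reaches the same score at a lower output price. A different cost measure, such as a blend of input and output prices weighted by expected usage, could change which models appear on the frontier.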