Price Per Token

Knights and Knaves Leaderboard

Logic puzzle benchmark based on knights (truth-tellers) and knaves (liars) puzzles.

Data from LayerLens

As of April 18, 2026, o3 Mini and o4 Mini High share the top displayed score on Knights and Knaves at 99.7%, with o3 Mini listed first, followed by R1 0528 at 97.9%. 26 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 26
Best Score: 99.7
Average: 54.6
Std Dev: 30.3

Categories: Reasoning and Logic

Input $/M    Output $/M    Knights and Knaves (%)
$0.550       $2.200        99.7
$1.100       $4.400        99.7
$0.500       $2.150        97.9
$0.550       $2.000        97.3
$3.000       $15.000       94.0
$0.030       $0.100        94.0
$0.030       $0.100        94.0
$2.000       $8.000        77.1
$0.400       $2.000        60.7
$0.014       $0.028        60.3
$0.150       $0.600        59.4
$0.080       $0.160        57.6
$0.075       $0.200        55.6
$0.100       $0.400        52.9
$0.060       $0.120        40.6
$2.500       $10.000       39.1
$0.065       $0.140        38.3
$0.900       $0.900        33.9
$2.000       $6.000        31.4
$0.080       $0.300        30.7
$0.800       $3.200        28.3
$0.070       $0.280        24.0
$0.035       $0.140        19.3
$0.800       $4.000        15.0
$2.500       $10.000       12.5
$0.030       $0.050        6.1

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About Knights and Knaves

This leaderboard shows all models with Knights and Knaves benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
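
One way to compare performance against cost is to keep only the rows that no other row beats on both price and score (a Pareto frontier). The sketch below is illustrative, not the site's own method: it uses a handful of rows copied from the table, treats output price as the cost measure (an assumption; input price or a blend would work the same way), and identifies rows by their values because model names are not part of this listing. The names `rows`, `is_dominated`, and `frontier` are hypothetical.

```python
# (input $/M, output $/M, score) rows copied from the leaderboard table.
# This is a subset chosen for illustration, not the full list.
rows = [
    (0.55, 2.20, 99.7),
    (1.10, 4.40, 99.7),
    (3.00, 15.00, 94.0),
    (0.03, 0.10, 94.0),
    (0.014, 0.028, 60.3),
    (2.50, 10.00, 39.1),
    (0.03, 0.05, 6.1),
]


def is_dominated(row, others):
    """Return True if some other row is at least as cheap and at least as
    accurate, and strictly better on one of the two (output price, score)."""
    _, price, score = row
    return any(
        o_price <= price
        and o_score >= score
        and (o_price < price or o_score > score)
        for _, o_price, o_score in others
    )


# Keep the non-dominated rows, cheapest first.
frontier = sorted(
    (r for r in rows if not is_dominated(r, rows)),
    key=lambda r: r[1],
)

for in_price, out_price, score in frontier:
    print(f"${out_price:.3f} output $/M -> {score}")
```

For this subset, three rows survive: the cheapest row that still scores 60.3, the $0.100 row at 94.0, and the $2.200 row at 99.7. Every other row in the subset costs more for an equal or lower score.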

Frequently Asked Questions

As of April 18, 2026, o3 Mini leads the Knights and Knaves leaderboard with a score of 99.7. Rankings change as new models are released and evaluated.
Currently 26 models have been evaluated on Knights and Knaves, with an average score of 54.6 and standard deviation of 30.3.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
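
The summary figures quoted above (26 models, best score 99.7, average 54.6, standard deviation 30.3) can be checked against the score column. The following is a minimal sketch using Python's standard `statistics` module, with the scores copied from the table. The page does not say which standard deviation formula it uses; the population form is an assumption here, and the sample form is computed alongside it for comparison.

```python
from statistics import mean, pstdev, stdev

# Scores copied from the Knights and Knaves column, in ranked order.
scores = [
    99.7, 99.7, 97.9, 97.3, 94.0, 94.0, 94.0, 77.1, 60.7, 60.3,
    59.4, 57.6, 55.6, 52.9, 40.6, 39.1, 38.3, 33.9, 31.4, 30.7,
    28.3, 24.0, 19.3, 15.0, 12.5, 6.1,
]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Population standard deviation (divide by N); assumed to be the page's choice.
    "std_dev_population": round(pstdev(scores), 1),
    # Sample standard deviation (divide by N - 1), shown for comparison.
    "std_dev_sample": round(stdev(scores), 1),
}
print(summary)
```

With these 26 scores, the population form rounds to the published 30.3, while the sample form rounds to 30.9, which suggests the page reports the population figure.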