Logic puzzle benchmark based on knights (truth-tellers) and knaves (liars) puzzles.
Data from LayerLens
As of April 18, 2026, o3 Mini and o4 Mini High share the top score on Knights and Knaves at 99.7%, followed by R1 0528 at 97.9%. A total of 26 models have been evaluated on this benchmark.
Last updated: April 18, 2026
Models: 26
Best Score: 99.7
Average: 54.6
Std Dev: 30.3
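The summary figures can be rechecked from the 26 scores in the leaderboard table. The sketch below is a minimal check, assuming the page computes its statistics over exactly those listed scores; the variable names are illustrative. Under that assumption the population standard deviation (dividing by n) rounds to the published 30.3, while the sample form (dividing by n - 1) comes out near 30.9, so the page appears to use the population form.

```python
import statistics

# Knights and Knaves scores copied from the leaderboard rows, in page order.
SCORES = [
    99.7, 99.7, 97.9, 97.3, 94.0, 94.0, 94.0, 77.1, 60.7, 60.3,
    59.4, 57.6, 55.6, 52.9, 40.6, 39.1, 38.3, 33.9, 31.4, 30.7,
    28.3, 24.0, 19.3, 15.0, 12.5, 6.1,
]

count = len(SCORES)
best = max(SCORES)
mean = statistics.fmean(SCORES)
pop_sd = statistics.pstdev(SCORES)    # population form: divide by n
sample_sd = statistics.stdev(SCORES)  # sample form: divide by n - 1

print(f"Models: {count}")
print(f"Best Score: {best:.1f}")
print(f"Average: {mean:.1f}")
print(f"Std Dev (population): {pop_sd:.1f}")
print(f"Std Dev (sample): {sample_sd:.1f}")
```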
| Provider | Model | Input ($/M tokens) | Output ($/M tokens) | Knights and Knaves (%) |
|---|---|---|---|---|
| | | $0.550 | $2.200 | 99.7 |
| | | $1.100 | $4.400 | 99.7 |
| | | $0.500 | $2.150 | 97.9 |
| | | $0.550 | $2.000 | 97.3 |
| | | $3.000 | $15.000 | 94.0 |
| | | $0.030 | $0.100 | 94.0 |
| | | $0.030 | $0.100 | 94.0 |
| | | $2.000 | $8.000 | 77.1 |
| | | $0.400 | $2.000 | 60.7 |
| | | $0.014 | $0.028 | 60.3 |
| | | $0.150 | $0.600 | 59.4 |
| | | $0.080 | $0.160 | 57.6 |
| | | $0.075 | $0.200 | 55.6 |
| | | $0.100 | $0.400 | 52.9 |
| | | $0.060 | $0.120 | 40.6 |
| | | $2.500 | $10.000 | 39.1 |
| | | $0.065 | $0.140 | 38.3 |
| | | $0.900 | $0.900 | 33.9 |
| | | $2.000 | $6.000 | 31.4 |
| | | $0.080 | $0.300 | 30.7 |
| | | $0.800 | $3.200 | 28.3 |
| | | $0.070 | $0.280 | 24.0 |
| | | $0.035 | $0.140 | 19.3 |
| | | $0.800 | $4.000 | 15.0 |
| | | $2.500 | $10.000 | 12.5 |
| | | $0.030 | $0.050 | 6.1 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
This leaderboard shows all models with Knights and Knaves benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
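One way to compare performance against cost, sketched here rather than taken from the site, is to keep only the rows that no other row beats on input price, output price, and score at once (a Pareto filter). Model names are not available in the table as captured, so rows are identified by their prices and score; `dominates` and `pareto_front` are hypothetical helper names.

```python
# (input $/M, output $/M, score) copied from the leaderboard rows, in page order.
ROWS = [
    (0.550, 2.200, 99.7), (1.100, 4.400, 99.7), (0.500, 2.150, 97.9),
    (0.550, 2.000, 97.3), (3.000, 15.000, 94.0), (0.030, 0.100, 94.0),
    (0.030, 0.100, 94.0), (2.000, 8.000, 77.1), (0.400, 2.000, 60.7),
    (0.014, 0.028, 60.3), (0.150, 0.600, 59.4), (0.080, 0.160, 57.6),
    (0.075, 0.200, 55.6), (0.100, 0.400, 52.9), (0.060, 0.120, 40.6),
    (2.500, 10.000, 39.1), (0.065, 0.140, 38.3), (0.900, 0.900, 33.9),
    (2.000, 6.000, 31.4), (0.080, 0.300, 30.7), (0.800, 3.200, 28.3),
    (0.070, 0.280, 24.0), (0.035, 0.140, 19.3), (0.800, 4.000, 15.0),
    (2.500, 10.000, 12.5), (0.030, 0.050, 6.1),
]


def dominates(a, b):
    """True if row a is at least as cheap and as accurate as row b, and differs from it."""
    no_worse = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]
    # Identical rows do not dominate each other, so exact duplicates both survive.
    return no_worse and a != b


def pareto_front(rows):
    """Keep the rows that no other row dominates, preserving page order."""
    return [r for r in rows if not any(dominates(other, r) for other in rows)]


front = pareto_front(ROWS)
for inp, out, score in front:
    print(f"${inp:.3f} in / ${out:.3f} out -> {score}")
```

On the listed data this leaves six rows: the $0.550 / $2.200 row at 99.7, the rows at 97.9 and 97.3, the two identical $0.030 / $0.100 rows at 94.0, and the cheapest row at 60.3. Treating the two prices as separate dimensions avoids having to assume a particular input-to-output token ratio.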
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.