Weapons of Mass Destruction Proxy (WMDP): a benchmark that tests the safety boundaries of model knowledge.
Data from LayerLens
As of April 18, 2026, the top WMDP score is 86.8%, shared by the two Gemini 3 Flash Preview entries, followed by o3 Mini at 80.5%. Seventeen models have been evaluated on this benchmark.
Last updated: April 18, 2026
Models: 17
Best Score: 86.8
Average: 68.2
Std Dev: 17.3
| Provider | Model | Input $/M tokens | Output $/M tokens | WMDP (%) |
|---|---|---|---|---|
| | Gemini 3 Flash Preview | $0.500 | $3.000 | 86.8 |
| | Gemini 3 Flash Preview | $0.500 | $3.000 | 86.8 |
| | o3 Mini | $0.550 | $2.200 | 80.5 |
| | | $3.000 | $15.000 | 78.2 |
| | | $3.000 | $15.000 | 78.2 |
| | | $3.000 | $15.000 | 74.3 |
| | | $0.100 | $0.400 | 72.0 |
| | | $0.014 | $0.028 | 71.8 |
| | | $2.000 | $8.000 | 71.4 |
| | | $2.500 | $10.000 | 69.6 |
| | | $0.900 | $0.900 | 67.7 |
| | | $0.800 | $3.200 | 67.0 |
| | | $2.000 | $6.000 | 65.3 |
| | | $0.800 | $4.000 | 64.3 |
| | | $0.070 | $0.280 | 61.0 |
| | | $0.065 | $0.140 | 58.0 |
| | | $0.030 | $0.050 | 6.7 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
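The summary figures (17 models, best 86.8, average 68.2, std dev 17.3) can be checked against the score column. The sketch below is one way to do it, assuming the page reports the population standard deviation; the page does not say which formula it uses, and the variable names are illustrative only.

```python
from statistics import mean, pstdev

# WMDP scores copied from the leaderboard table, highest to lowest.
scores = [86.8, 86.8, 80.5, 78.2, 78.2, 74.3, 72.0, 71.8, 71.4,
          69.6, 67.7, 67.0, 65.3, 64.3, 61.0, 58.0, 6.7]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Assumption: population standard deviation (divide by N, not N - 1).
    "std_dev": round(pstdev(scores), 1),
}
print(summary)
```

Under that assumption the computed values match the published ones. The single 6.7 score sits far below the rest and accounts for most of the spread.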
This leaderboard shows all models with WMDP benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
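One rough way to compare performance against cost is score per dollar. The sketch below ranks the table rows by WMDP points per blended dollar, where the blended price is a hypothetical metric (the plain average of input and output price per million tokens) chosen here for illustration; it is not a metric the leaderboard defines. Rows are identified by leaderboard rank.

```python
# (input $/M tokens, output $/M tokens, WMDP score), in leaderboard order.
rows = [
    (0.500, 3.000, 86.8), (0.500, 3.000, 86.8), (0.550, 2.200, 80.5),
    (3.000, 15.000, 78.2), (3.000, 15.000, 78.2), (3.000, 15.000, 74.3),
    (0.100, 0.400, 72.0), (0.014, 0.028, 71.8), (2.000, 8.000, 71.4),
    (2.500, 10.000, 69.6), (0.900, 0.900, 67.7), (0.800, 3.200, 67.0),
    (2.000, 6.000, 65.3), (0.800, 4.000, 64.3), (0.070, 0.280, 61.0),
    (0.065, 0.140, 58.0), (0.030, 0.050, 6.7),
]


def blended_price(input_price: float, output_price: float) -> float:
    """Hypothetical blend: plain average of input and output price."""
    return (input_price + output_price) / 2


# Pair each leaderboard rank with its score per blended dollar, best first.
ranked = sorted(
    (
        (rank, score / blended_price(inp, out))
        for rank, (inp, out, score) in enumerate(rows, start=1)
    ),
    key=lambda item: item[1],
    reverse=True,
)

best_rank, best_ratio = ranked[0]
print(f"Best value under this metric: leaderboard rank {best_rank}")
```

A different blend, such as weighting input tokens more heavily than output tokens, may change the ordering, so the result is best read as a starting point for comparison.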
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.