Price Per Token

WMDP Leaderboard

Weapons of Mass Destruction Proxy (WMDP): a benchmark that measures hazardous knowledge in biosecurity, cybersecurity, and chemical security.

Data from LayerLens

As of April 18, 2026, the top-scoring model on WMDP is Gemini 3 Flash Preview at 86.8% (it appears twice on the leaderboard with the same score), followed by o3 Mini at 80.5%. In total, 17 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 17
Best Score: 86.8
Average: 68.2
Std Dev: 17.3

Categories: Reasoning and Logic
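
| Provider | Model | Input $/M | Output $/M | WMDP |
| --- | --- | --- | --- | --- |
| Google | Gemini 3 Flash Preview | $0.500 | $3.000 | 86.8 |
| Google | Gemini 3 Flash Preview | $0.500 | $3.000 | 86.8 |
| OpenAI | o3 Mini | $0.550 | $2.200 | 80.5 |
| | | $3.000 | $15.000 | 78.2 |
| | | $3.000 | $15.000 | 78.2 |
| | | $3.000 | $15.000 | 74.3 |
| | | $0.100 | $0.400 | 72.0 |
| | | $0.014 | $0.028 | 71.8 |
| | | $2.000 | $8.000 | 71.4 |
| | | $2.500 | $10.000 | 69.6 |
| | | $0.900 | $0.900 | 67.7 |
| | | $0.800 | $3.200 | 67.0 |
| | | $2.000 | $6.000 | 65.3 |
| | | $0.800 | $4.000 | 64.3 |
| | | $0.070 | $0.280 | 61.0 |
| | | $0.065 | $0.140 | 58.0 |
| | | $0.030 | $0.050 | 6.7 |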

Pricing from OpenRouter. Benchmarks from Artificial Analysis.


About WMDP

This leaderboard shows all models with WMDP benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
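
One way to compare performance against cost is to rank the rows by score per blended dollar. The following is a minimal sketch, not the site's own method: the `blended_price` helper and its 3:1 input-to-output token weighting are assumptions made for illustration, and the rows are copied from the table on this page.

```python
# (input $/M, output $/M, WMDP score) rows copied from the leaderboard table.
rows = [
    (0.500, 3.000, 86.8), (0.500, 3.000, 86.8), (0.550, 2.200, 80.5),
    (3.000, 15.000, 78.2), (3.000, 15.000, 78.2), (3.000, 15.000, 74.3),
    (0.100, 0.400, 72.0), (0.014, 0.028, 71.8), (2.000, 8.000, 71.4),
    (2.500, 10.000, 69.6), (0.900, 0.900, 67.7), (0.800, 3.200, 67.0),
    (2.000, 6.000, 65.3), (0.800, 4.000, 64.3), (0.070, 0.280, 61.0),
    (0.065, 0.140, 58.0), (0.030, 0.050, 6.7),
]


def blended_price(input_price, output_price, input_share=0.75):
    """Hypothetical blended $/M tokens.

    Assumes 3 input tokens for every output token (input_share=0.75);
    adjust the share to match your own workload.
    """
    return input_share * input_price + (1 - input_share) * output_price


def points_per_dollar(row):
    """WMDP points per blended dollar per million tokens."""
    input_price, output_price, score = row
    return score / blended_price(input_price, output_price)


# Highest value (most points per blended dollar) first.
ranked = sorted(rows, key=points_per_dollar, reverse=True)
best_value = ranked[0]

for input_price, output_price, score in ranked[:5]:
    value = points_per_dollar((input_price, output_price, score))
    print(f"in ${input_price:.3f}  out ${output_price:.3f}  "
          f"WMDP {score:.1f}  points per $: {value:.1f}")
```

Under this weighting, the best-value row is the one priced at $0.014 input and $0.028 output with a score of 71.8, not the highest-scoring row. A different token mix can change the ordering, so treat the ratio as a rough screening tool.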

Frequently Asked Questions

As of April 18, 2026, Gemini 3 Flash Preview leads the WMDP leaderboard with a score of 86.8. Rankings change as new models are released and evaluated.
Currently, 17 models have been evaluated on WMDP, with an average score of 68.2 and a standard deviation of 17.3.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
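
The summary statistics quoted on this page (17 models, best score 86.8, average 68.2, standard deviation 17.3) can be checked against the listed scores. The sketch below uses only Python's standard `statistics` module. The page does not say which standard deviation formula it uses, so both the population and sample forms are computed.

```python
from statistics import mean, pstdev, stdev

# WMDP scores copied from the leaderboard table, highest to lowest.
scores = [86.8, 86.8, 80.5, 78.2, 78.2, 74.3, 72.0, 71.8, 71.4,
          69.6, 67.7, 67.0, 65.3, 64.3, 61.0, 58.0, 6.7]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Population form: divides by n.
    "std_dev_population": round(pstdev(scores), 1),
    # Sample form: divides by n - 1, so it is slightly larger.
    "std_dev_sample": round(stdev(scores), 1),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The population form reproduces the published 17.3. Much of the spread comes from the single 6.7 outlier at the bottom of the table.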