MedQA is a medical question-answering benchmark built from USMLE-style questions.
Data from LayerLens
Models: 33
Best Score: 95.2
Average: 79.3
Std Dev: 11.1
| Provider | Model | Input $/M | Output $/M | MedQA |
|---|---|---|---|---|
| | | $1.100 | $4.400 | 95.2 |
| | | $1.250 | $10.000 | 94.6 |
| | | $3.000 | $15.000 | 92.3 |
| | | $0.700 | $2.500 | 92.1 |
| | | $1.100 | $4.400 | 91.4 |
| | | $2.000 | $8.000 | 89.7 |
| | | $2.500 | $10.000 | 88.1 |
| | | $3.000 | $15.000 | 87.6 |
| | | $0.150 | $0.400 | 86.5 |
| | | $3.000 | $15.000 | 86.1 |
| | | $1.200 | $6.000 | 85.5 |
| | | $0.080 | $0.280 | 85.3 |
| | | $0.080 | $0.280 | 85.3 |
| | | $0.071 | $0.100 | 84.8 |
| | | $0.100 | $0.400 | 83.2 |
| | | $4.000 | $4.000 | 82.9 |
| | | $0.320 | $0.890 | 80.3 |
| | | $0.800 | $3.200 | 79.3 |
| | | $0.400 | $2.000 | 79.1 |
| | | $0.150 | $0.600 | 78.4 |
| | | $2.000 | $6.000 | 78.3 |
| | | $0.060 | $0.140 | 77.8 |
| | | $0.800 | $4.000 | 77.8 |
| | | $2.500 | $10.000 | 73.3 |
| | | $1.000 | $3.000 | 72.8 |
| | | $1.000 | $3.000 | 72.8 |
| | | $0.060 | $0.180 | 70.5 |
| | | $0.130 | $0.400 | 70.1 |
| | | $0.130 | $0.400 | 70.1 |
| | | $0.040 | $0.150 | 67.5 |
| | | $0.020 | $0.040 | 54.2 |
| | | $0.051 | $0.340 | 52.6 |
| | | $0.080 | $0.300 | 52.0 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
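The summary figures at the top of the page (model count, best score, average, standard deviation) can be cross-checked against the MedQA column. The sketch below copies the 33 scores from the table and recomputes them with Python's standard library. The page does not say which standard deviation formula it uses; the population form (`pstdev`) is an assumption here, chosen because it appears to be the one consistent with the reported value.

```python
from statistics import mean, pstdev

# MedQA scores copied from the table above, in ranked order (33 rows).
scores = [
    95.2, 94.6, 92.3, 92.1, 91.4, 89.7, 88.1, 87.6, 86.5, 86.1, 85.5,
    85.3, 85.3, 84.8, 83.2, 82.9, 80.3, 79.3, 79.1, 78.4, 78.3, 77.8,
    77.8, 73.3, 72.8, 72.8, 70.5, 70.1, 70.1, 67.5, 54.2, 52.6, 52.0,
]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Assumption: population standard deviation (divide by N, not N - 1).
    "std_dev": round(pstdev(scores), 1),
}
print(summary)
```

Swapping `pstdev` for `stdev` gives the sample standard deviation instead, which is slightly larger on the same data.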

This leaderboard shows all models with MedQA benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
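One way to compare performance against cost is to keep only the rows that no cheaper row matches or beats on score (a Pareto frontier). The sketch below is illustrative rather than anything the leaderboard itself computes. It uses a subset of rows from the table, identified by their position because model names are not available here, and it assumes a hypothetical blended price that weights input tokens at 75% and output tokens at 25%. Both `blended_cost` and `pareto_frontier` are names introduced for this example.

```python
# (table row, input $/M, output $/M, MedQA score); a subset of the table above.
rows = [
    (1, 1.100, 4.400, 95.2),
    (2, 1.250, 10.000, 94.6),
    (4, 0.700, 2.500, 92.1),
    (9, 0.150, 0.400, 86.5),
    (12, 0.080, 0.280, 85.3),
    (14, 0.071, 0.100, 84.8),
    (16, 4.000, 4.000, 82.9),
    (31, 0.020, 0.040, 54.2),
    (33, 0.080, 0.300, 52.0),
]


def blended_cost(input_price, output_price, input_share=0.75):
    """Assumed blend: weight input and output prices by expected token mix."""
    return input_share * input_price + (1 - input_share) * output_price


def pareto_frontier(rows):
    """Return rows that are not beaten on score by any cheaper-or-equal row."""
    # Sort cheapest first; on equal cost, put the higher score first.
    ordered = sorted(rows, key=lambda r: (blended_cost(r[1], r[2]), -r[3]))
    frontier = []
    best_score = float("-inf")
    for row in ordered:
        # A row joins the frontier only if it improves on every cheaper row.
        if row[3] > best_score:
            frontier.append(row)
            best_score = row[3]
    return frontier


frontier = pareto_frontier(rows)
for position, inp, out, score in frontier:
    print(f"row {position}: blended ${blended_cost(inp, out):.3f}/M, MedQA {score}")
```

Changing `input_share` to reflect a workload's actual ratio of prompt to completion tokens can change which rows land on the frontier, so the weighting deserves as much attention as the scores.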
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.