MedQA is a medical question-answering benchmark built from USMLE-style questions.
Data from LayerLens
As of April 18, 2026, the top-scoring model on MedQA is o4 Mini High at 95.2%, followed by Gemini 2.5 Pro at 94.6% and Claude 3.7 Sonnet at 92.3%. A total of 34 models have been evaluated on this benchmark.
Last updated: April 18, 2026
Models: 34
Best Score: 95.2
Average: 79.4
Std Dev: 11.0
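The summary figures above can be recomputed from the MedQA column of the table. The following is a minimal sketch, not the site's actual method: the `summarize` helper and the `MEDQA_SCORES` name are hypothetical, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption.

```python
from statistics import mean, pstdev

# MedQA scores copied from the leaderboard table, highest to lowest.
MEDQA_SCORES = [
    95.2, 94.6, 92.3, 92.1, 91.4, 89.7, 87.6, 86.5, 86.5, 86.1,
    85.5, 85.5, 85.3, 85.3, 84.8, 83.2, 82.9, 80.3, 79.3, 79.1,
    78.4, 78.3, 77.8, 77.8, 73.3, 72.8, 72.8, 70.5, 70.1, 70.1,
    67.5, 54.2, 52.6, 52.0,
]


def summarize(scores):
    """Return the four summary figures shown in the leaderboard header."""
    return {
        "models": len(scores),
        "best": max(scores),
        "average": mean(scores),
        # Assumption: population standard deviation (divide by n, not n - 1).
        "std_dev": pstdev(scores),
    }


summary = summarize(MEDQA_SCORES)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Computed from the rounded scores in the table, the average comes out near 79.45 and the population standard deviation rounds to 11.0. The small gap to the listed average of 79.4 would be explained if the page works from unrounded scores or truncates instead of rounding.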
| Provider | Model | Input $/M | Output $/M | MedQA |
|---|---|---|---|---|
| | o4 Mini High | $1.100 | $4.400 | 95.2 |
| | Gemini 2.5 Pro | $1.000 | $10.000 | 94.6 |
| | Claude 3.7 Sonnet | $3.000 | $15.000 | 92.3 |
| | | $0.550 | $2.000 | 92.1 |
| | | $0.550 | $2.200 | 91.4 |
| | | $2.000 | $8.000 | 89.7 |
| | | $3.000 | $15.000 | 87.6 |
| | | $0.150 | $0.580 | 86.5 |
| | | $0.150 | $0.580 | 86.5 |
| | | $3.000 | $15.000 | 86.1 |
| | | $0.780 | $3.900 | 85.5 |
| | | $0.780 | $3.900 | 85.5 |
| | | $0.080 | $0.280 | 85.3 |
| | | $0.080 | $0.280 | 85.3 |
| | | $0.071 | $0.100 | 84.8 |
| | | $0.100 | $0.400 | 83.2 |
| | | $0.900 | $0.900 | 82.9 |
| | | $0.014 | $0.028 | 80.3 |
| | | $0.800 | $3.200 | 79.3 |
| | | $0.400 | $2.000 | 79.1 |
| | | $0.150 | $0.600 | 78.4 |
| | | $2.000 | $6.000 | 78.3 |
| | | $0.065 | $0.140 | 77.8 |
| | | $0.800 | $4.000 | 77.8 |
| | | $2.500 | $10.000 | 73.3 |
| | | $1.000 | $3.000 | 72.8 |
| | | $1.000 | $3.000 | 72.8 |
| | | $0.075 | $0.200 | 70.5 |
| | | $0.130 | $0.400 | 70.1 |
| | | $0.130 | $0.400 | 70.1 |
| | | $0.080 | $0.160 | 67.5 |
| | | $0.060 | $0.120 | 54.2 |
| | | $0.030 | $0.050 | 52.6 |
| | | $0.080 | $0.300 | 52.0 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
This leaderboard shows all models with MedQA benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
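One way to compare performance against cost is to keep only the rows that no cheaper row matches or beats on score (a Pareto frontier). The sketch below is illustrative only: the `Row` type, the `combined_cost` measure (input price plus output price per million tokens), and the `pareto_frontier` helper are assumptions rather than anything the leaderboard itself defines, and the sample rows are a subset of the table.

```python
from typing import Callable, List, NamedTuple


class Row(NamedTuple):
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens
    score: float         # MedQA score


# A subset of rows copied from the leaderboard table.
ROWS = [
    Row(1.10, 4.40, 95.2),
    Row(1.00, 10.00, 94.6),
    Row(3.00, 15.00, 92.3),
    Row(0.55, 2.00, 92.1),
    Row(0.15, 0.58, 86.5),
    Row(0.014, 0.028, 80.3),
    Row(2.50, 10.00, 73.3),
    Row(0.03, 0.05, 52.6),
]


def combined_cost(row: Row) -> float:
    # Assumption: weight input and output tokens equally.
    return row.input_price + row.output_price


def pareto_frontier(rows: List[Row], cost: Callable[[Row], float]) -> List[Row]:
    """Keep rows that beat the score of every row costing the same or less."""
    frontier = []
    best_score = float("-inf")
    # Cheapest first; on equal cost, the higher score comes first.
    for row in sorted(rows, key=lambda r: (cost(r), -r.score)):
        if row.score > best_score:
            frontier.append(row)
            best_score = row.score
    return frontier


frontier = pareto_frontier(ROWS, combined_cost)
for row in frontier:
    print(f"${combined_cost(row):.3f} combined -> {row.score}")
```

A different cost measure, such as a ratio weighted toward input tokens, can be passed in place of `combined_cost` and may change which rows survive.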
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.