Price Per Token

MedQA Leaderboard

Medical question answering benchmark from USMLE-style questions.

Data from LayerLens

As of April 18, 2026, the top-scoring model on MedQA is o4 Mini High at 95.2%, followed by Gemini 2.5 Pro at 94.6% and Claude 3.7 Sonnet at 92.3%. In total, 34 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 34
Best score: 95.2
Average: 79.4
Std dev: 11.0

Categories: Reasoning and Logic
Rank  Input $/M  Output $/M  MedQA  Model
1     $1.100     $4.400      95.2   o4 Mini High
2     $1.000     $10.000     94.6   Gemini 2.5 Pro
3     $3.000     $15.000     92.3   Claude 3.7 Sonnet
4     $0.550     $2.000      92.1
5     $0.550     $2.200      91.4
6     $2.000     $8.000      89.7
7     $3.000     $15.000     87.6
8     $0.150     $0.580      86.5
9     $0.150     $0.580      86.5
10    $3.000     $15.000     86.1
11    $0.780     $3.900      85.5
12    $0.780     $3.900      85.5
13    $0.080     $0.280      85.3
14    $0.080     $0.280      85.3
15    $0.071     $0.100      84.8
16    $0.100     $0.400      83.2
17    $0.900     $0.900      82.9
18    $0.014     $0.028      80.3
19    $0.800     $3.200      79.3
20    $0.400     $2.000      79.1
21    $0.150     $0.600      78.4
22    $2.000     $6.000      78.3
23    $0.065     $0.140      77.8
24    $0.800     $4.000      77.8
25    $2.500     $10.000     73.3
26    $1.000     $3.000      72.8
27    $1.000     $3.000      72.8
28    $0.075     $0.200      70.5
29    $0.130     $0.400      70.1
30    $0.130     $0.400      70.1
31    $0.080     $0.160      67.5
32    $0.060     $0.120      54.2
33    $0.030     $0.050      52.6
34    $0.080     $0.300      52.0

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About MedQA

This leaderboard shows all models with MedQA benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
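One way to compare performance against cost is to collapse the two prices into a single blended figure and keep only the rows that no cheaper row matches or beats. The sketch below does this for a handful of price and score rows from the leaderboard. The 3:1 input-to-output weighting, the `Row` type, and the function names are illustrative assumptions, not something this page defines; adjust the weighting to your own traffic mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    input_usd_per_m: float   # input price, USD per million tokens
    output_usd_per_m: float  # output price, USD per million tokens
    medqa: float             # MedQA score


# A few rows copied from the leaderboard (prices and scores only).
ROWS = [
    Row(1.100, 4.400, 95.2),
    Row(1.000, 10.000, 94.6),
    Row(3.000, 15.000, 92.3),
    Row(0.550, 2.000, 92.1),
    Row(0.150, 0.580, 86.5),
    Row(0.014, 0.028, 80.3),
    Row(2.500, 10.000, 73.3),
]


def blended_cost(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens.

    input_share=0.75 assumes three input tokens per output token,
    which is a hypothetical workload mix, not a figure from the page.
    """
    return input_share * row.input_usd_per_m + (1 - input_share) * row.output_usd_per_m


def pareto_frontier(rows: list[Row]) -> list[Row]:
    """Rows for which no cheaper (or equally cheap) row scores at least as high."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; a row earns its place only
    # if it scores strictly higher than everything cheaper than it.
    for row in sorted(rows, key=lambda r: (blended_cost(r), -r.medqa)):
        if row.medqa > best_score:
            frontier.append(row)
            best_score = row.medqa
    return frontier


frontier = pareto_frontier(ROWS)
for row in frontier:
    print(f"${blended_cost(row):.4f}/M blended -> MedQA {row.medqa}")
```

Under this assumed weighting, the rows priced at $1.000/$10.000, $3.000/$15.000, and $2.500/$10.000 drop out because a cheaper row in the sample scores higher. A different input share can change which rows survive.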

Frequently Asked Questions

As of April 18, 2026, o4 Mini High leads the MedQA leaderboard with a score of 95.2. Rankings change as new models are released and evaluated.
Currently, 34 models have been evaluated on MedQA, with an average score of 79.4 and a standard deviation of 11.0.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
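The average and standard deviation quoted above can be checked against the 34 scores shown in the leaderboard. The sketch below uses Python's standard `statistics` module. It assumes the page reports a population standard deviation, which is a guess based on which variant comes out closest to 11.0. The scores are the one-decimal values displayed on the page, so the computed mean (about 79.45) may differ from the published 79.4 in the last digit if the underlying scores carry more precision.

```python
import statistics

# MedQA scores as displayed in the leaderboard, highest to lowest.
MEDQA_SCORES = [
    95.2, 94.6, 92.3, 92.1, 91.4, 89.7, 87.6, 86.5, 86.5, 86.1,
    85.5, 85.5, 85.3, 85.3, 84.8, 83.2, 82.9, 80.3, 79.3, 79.1,
    78.4, 78.3, 77.8, 77.8, 73.3, 72.8, 72.8, 70.5, 70.1, 70.1,
    67.5, 54.2, 52.6, 52.0,
]

mean_score = statistics.mean(MEDQA_SCORES)
# Population standard deviation (divide by n); assumed to be what the page reports.
population_sd = statistics.pstdev(MEDQA_SCORES)
# Sample standard deviation (divide by n - 1), shown for comparison.
sample_sd = statistics.stdev(MEDQA_SCORES)

print(f"models:        {len(MEDQA_SCORES)}")
print(f"mean:          {mean_score:.2f}")
print(f"population sd: {population_sd:.2f}")
print(f"sample sd:     {sample_sd:.2f}")
```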