Price Per Token

GPQA Leaderboard

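GPQA (Graduate-Level Google-Proof Q&A) consists of graduate-level multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are extremely difficult and are designed to be "Google-proof", meaning they cannot easily be answered by searching the web.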

Data from Artificial Analysis

As of April 4, 2026, the top-scoring model on GPQA is GPT-5.4 at 92.0%, followed by GPT-5.3 Codex at 91.5% and Gemini 3 Pro Preview at 90.8%. In total, 242 models have been evaluated on this benchmark.

Last updated: April 4, 2026

Models: 242
Best score: 92.0
Average: 65.0
Std dev: 17.2

Category: Reasoning and Logic
Prices are in USD per million tokens (input and output are priced separately); GPQA scores are percentages.

   #  Input $/M  Output $/M  GPQA (%)
   1     $2.500     $15.000      92.0
   2     $1.750     $14.000      91.5
   3     $2.000     $12.000      90.8
   4    $10.500     $84.000      90.3
   5     $1.750     $14.000      89.9
   6     $0.500      $3.000      89.8
   7     $5.000     $25.000      89.6
   8     $0.390      $0.900      89.3
   9     $0.383      $1.720      87.9
  10     $3.000     $15.000      87.7
  11     $3.000     $15.000      87.5
  12     $0.300      $1.200      87.4
  13     $1.250     $10.000      87.3
  14     $0.400      $1.200      87.1
  15     $5.000     $25.000      86.6
  16     $0.780      $3.900      86.1
  17     $0.390      $0.900      86.1
  18     $1.250     $10.000      86.0
  19     $0.390      $1.750      85.9
  20     $0.195      $0.900      85.8
  21     $0.260      $2.080      85.7
  22     $0.140      $0.400      85.7
  23     $1.250     $10.000      85.4
  24     $0.200      $0.500      85.3
  25     $0.118      $0.950      84.8
  26     $0.200      $0.500      84.7
  27     $1.200      $4.000      84.7
  28     $0.090      $0.290      84.6
  29    $20.000     $80.000      84.5
  30     $0.163      $0.900      84.5
  31     $1.000     $10.000      84.4
  32     $1.250     $10.000      84.2
  33     $0.195      $0.900      84.2
  34     $0.260      $0.380      84.0
  35     $5.000     $25.000      84.0
  36     $0.550      $2.200      83.8
  37     $1.250     $10.000      83.7
  38     $1.250     $10.000      83.6
  39     $0.090      $0.290      83.5
  40     $3.000     $15.000      83.4
  41     $0.100      $0.300      83.1
  42     $0.270      $0.950      83.0
  43     $0.250      $2.000      82.8
  44     $0.400      $2.000      82.8
  45     $2.000      $8.000      82.7
  46     $0.260      $2.080      82.7
  47     $1.250     $10.000      82.2
  48     $0.720      $2.300      82.0
  49     $0.163      $0.900      81.9
  50     $0.450      $2.150      81.3
  51     $0.250      $2.000      81.3
  52     $0.500      $3.000      81.2
  53     $0.280      $0.900      81.1
  54     $5.000     $25.000      81.0
  55    $15.000     $75.000      80.9
  56     $1.200      $4.000      80.9
  57     $1.250     $10.000      80.8
  58     $0.040      $0.150      80.6
  59     $3.000     $15.000      79.9
  60     $0.270      $0.410      79.7
  61     $3.000     $15.000      79.7
  62    $15.000     $75.000      79.6
  63     $0.210      $0.790      79.2
  64     $0.130      $0.400      79.2
  65     $0.250      $0.500      79.1
  66     $0.300      $2.500      79.0
  67     $0.149      $0.900      79.0
  68     $0.383      $1.720      78.9
  69     $0.040      $0.150      78.6
  70     $1.100      $4.400      78.4
  71     $0.600      $2.200      78.2
  72     $0.039      $0.100      78.2
  73     $0.390      $1.700      78.0
  74     $0.150      $0.750      77.9
  75     $3.000     $15.000      77.7
  76     $0.255      $1.000      77.7
  77     $0.780      $3.900      77.6
  78     $1.100      $4.400      77.3
  79     $3.000     $15.000      77.2
  80     $0.250      $0.750      77.0
  81     $0.400      $2.000      76.7
  82     $0.550      $2.200      76.6
  83     $0.780      $3.900      76.4
  84     $0.780      $3.900      76.4
  85     $0.207      $0.828      76.4
  86     $0.200      $1.100      76.1
  87     $0.050      $0.200      75.7
  88     $0.071      $0.100      75.3
  89     $0.210      $0.790      75.1
  90     $0.260      $0.380      75.1
  91     $0.550      $2.200      74.8
  92     $0.100      $0.400      74.8
  93     $2.500     $15.000      74.8
  94    $15.000     $60.000      74.7
  95     $0.090      $0.780      73.8
  96     $0.270      $0.410      73.8
  97     $0.120      $0.750      73.7
  98     $0.150      $0.750      73.5
  99     $0.130      $0.850      73.3
 100     $1.000      $3.000      72.7
 101     $0.200      $1.500      72.7
 102     $3.000     $15.000      72.7
 103     $0.300      $0.900      71.9
 104     $0.200      $0.880      71.2
 105     $1.750     $14.000      71.2
 106     $0.100      $0.400      70.9
 107     $0.550      $2.000      70.8
 108    $15.000     $75.000      70.1
 109     $0.400      $0.800      70.0
 110     $0.130      $0.400      69.9
 111     $0.130      $0.520      69.5
 112     $3.000     $15.000      69.3
 113     $0.030      $0.100      68.8
 114     $0.600      $1.800      68.4
 115     $3.000     $15.000      68.3
 116     $0.300      $2.500      68.3
 117     $0.400      $1.760      68.2
 118     $0.050      $0.400      67.6
 119     $1.250     $10.000      67.3
 120     $0.039      $0.100      67.2
 121     $1.000      $5.000      67.2
 122     $0.150      $0.600      67.1
 123     $0.104      $0.416      67.1
 124     $0.050      $0.400      67.0
 125     $0.080      $0.240      66.8
 126     $0.200      $0.200      66.7
 127     $2.000      $8.000      66.6
 128     $0.720      $2.300      66.6
 129     $0.400      $1.600      66.4
 130     $0.390      $1.750      66.4
 131     $0.090      $0.300      65.9
 132     $3.000     $15.000      65.6
 133     $0.090      $0.290      65.6
 134     $0.250      $2.000      65.6
 135     $0.200      $0.770      65.5
 136     $0.100      $0.400      65.1
 137     $1.000      $5.000      64.6
 138     $1.250     $10.000      64.3
 139     $0.200      $0.500      63.7
 140     $0.390      $1.700      63.2
 141     $0.100      $0.400      62.5
 142     $0.100      $0.400      62.3
 143     $0.220      $0.900      61.8
 144     $0.080      $0.280      61.6
 145     $0.290      $0.290      61.5
 146     $0.400      $0.800      61.3
 147     $0.150      $0.500      61.0
 148     $0.200      $0.500      60.6
 149     $0.060      $0.200      60.4
 150     $0.300      $2.500      60.3
 151     $3.000     $15.000      59.9
 152     $0.400      $0.900      59.4
 153     $0.150      $0.580      59.3
 154     $0.150      $0.500      59.1
 155     $0.050      $0.200      58.9
 156     $0.400      $2.000      58.8
 157     $1.040      $4.160      58.7
 158     $0.080      $0.300      58.7
 159     $0.060      $0.400      58.1
 160     $3.000     $15.000      57.8
 161     $0.400      $2.000      57.8
 162     $0.065      $0.140      57.5
 163     $0.600      $1.800      57.3
 164     $0.200      $0.200      57.2
 165     $0.200      $0.200      57.2
 166     $0.040      $0.160      57.0
 167     $2.500     $12.500      56.9
 168     $0.300      $0.900      56.6
 169     $0.150      $0.580      55.7
 170     $0.200      $0.770      55.7
 171     $0.040      $0.160      55.7
 172     $0.200      $0.600      53.9
 173     $1.000      $3.000      53.6
 174     $0.075      $0.300      53.5
 175     $0.080      $0.240      53.5
 176     $0.100      $0.200      52.9
 177     $2.500     $10.000      52.7
 178     $0.200      $0.200      52.2
 179     $2.500     $10.000      52.1
 180     $0.200      $0.200      51.7
 181     $0.070      $0.270      51.6
 182     $0.120      $0.200      51.6
 183     $0.900      $0.900      51.5
 184     $0.080      $0.280      51.5
 185     $0.050      $0.200      51.2
 186     $2.000      $6.000      50.5
 187     $0.075      $0.200      50.5
 188     $0.800      $3.200      49.9
 189     $0.100      $0.320      49.8
 190     $0.400      $2.000      49.2
 191     $0.120      $0.390      49.1
 192     $0.130      $0.400      49.1
 193     $2.000      $6.000      48.6
 194     $0.100      $0.400      48.1
 195     $0.100      $0.400      47.4
 196     $2.000      $6.000      47.2
 197     $1.000      $1.000      47.1
 198     $0.150      $0.150      47.1
 199     $0.060      $0.200      47.0
 200     $0.200      $0.600      46.6
 201     $0.880      $0.880      46.5
 202     $0.050      $0.080      46.2
 203     $0.030      $0.110      45.4
 204     $0.050      $0.200      45.2
 205     $0.060      $0.400      45.2
 206     $0.200      $0.200      43.9
 207     $0.060      $0.240      43.3
 208     $0.080      $0.160      42.8
 209     $0.050      $0.400      42.8
 210     $0.080      $0.200      42.7
 211     $0.150      $0.600      42.6
 212     $0.200      $0.200      42.5
 213     $0.200      $0.600      42.4
 214     $0.660      $0.900      41.7
 215     $0.070      $0.280      41.4
 216     $0.033      $0.130      41.0
 217     $0.340      $0.390      40.9
 218     $0.800      $4.000      40.8
 219     $0.700      $0.800      40.2
 220     $0.300      $0.300      40.1
 221     $0.100      $0.200      40.0
 222     $0.050      $0.200      39.9
 223     $0.200      $0.200      39.8
 224     $2.000      $8.000      39.0
 225     $0.510      $0.740      37.9
 226     $0.250      $1.250      37.4
 227     $0.035      $0.140      35.8
 228     $0.500      $1.500      35.1
 229     $0.040      $0.130      34.9
 230     $0.010      $0.020      34.4
 231     $0.030      $0.090      33.9
 232     $1.200      $1.200      33.2
 233     $0.050      $0.200      32.8
 234     $0.500      $1.000      29.7
 235     $0.030      $0.040      29.6
 236     $0.020      $0.040      29.6
 237     $0.140      $0.420      29.2
 238     $0.040      $0.080      29.1
 239     $0.020      $0.050      25.9
 240     $0.030      $0.050      25.5
 241     $0.049      $0.049      22.1
 242     $0.020      $0.020      19.6

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About GPQA

This leaderboard shows all models with GPQA benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
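To compare performance against cost for a concrete workload, the per-million-token prices can be turned into a per-request cost: cost = (input tokens × input $/M + output tokens × output $/M) / 1,000,000. The Python sketch below applies that formula and then keeps only the Pareto-optimal rows, i.e. those for which no other row is both cheaper and higher-scoring. It is an illustration under stated assumptions, not the site's own method: the 2,000-input / 1,000-output token workload, the `Row` type, and the function names are hypothetical, and the six rows are copied by position from the leaderboard table with model names omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure for this sketch)."""
    position: int        # position in the leaderboard table
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    gpqa: float          # GPQA score, percent


def request_cost(row: Row, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: each token count times its per-million price, divided by 1M."""
    return (input_tokens * row.input_per_m + output_tokens * row.output_per_m) / 1_000_000


def pareto_frontier(rows: list[Row], input_tokens: int, output_tokens: int) -> list[Row]:
    """Return the rows that no other row beats on both cost and GPQA score."""
    cost = {r.position: request_cost(r, input_tokens, output_tokens) for r in rows}

    def dominates(a: Row, b: Row) -> bool:
        # a dominates b if it is no worse on both axes and strictly better on at least one.
        no_worse = cost[a.position] <= cost[b.position] and a.gpqa >= b.gpqa
        strictly_better = cost[a.position] < cost[b.position] or a.gpqa > b.gpqa
        return no_worse and strictly_better

    return [b for b in rows if not any(dominates(a, b) for a in rows if a is not b)]


# Six rows taken from the leaderboard table (positions 1, 2, 4, 6, 8, 28).
ROWS = [
    Row(1, 2.500, 15.000, 92.0),
    Row(2, 1.750, 14.000, 91.5),
    Row(4, 10.500, 84.000, 90.3),
    Row(6, 0.500, 3.000, 89.8),
    Row(8, 0.390, 0.900, 89.3),
    Row(28, 0.090, 0.290, 84.6),
]

if __name__ == "__main__":
    # Hypothetical workload: 2,000 input tokens and 1,000 output tokens per request.
    for r in pareto_frontier(ROWS, 2_000, 1_000):
        print(f"#{r.position}: ${request_cost(r, 2_000, 1_000):.5f} per request, GPQA {r.gpqa}")
```

With this workload, position 4 drops out: position 1 costs $0.02 per request against $0.105 and also scores higher. Each of the remaining five rows buys a higher score at a higher cost. Because the token mix changes the per-request cost, the frontier is worth recomputing for each workload.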

Frequently Asked Questions

As of April 4, 2026, GPT-5.4 leads the GPQA leaderboard with a score of 92.0. Rankings change as new models are released and evaluated.
Currently, 242 models have been evaluated on GPQA, with an average score of 65.0 and a standard deviation of 17.2.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
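The average (65.0) and standard deviation (17.2) quoted above give a quick way to place any score relative to the field: the z-score, (score − mean) / standard deviation. For the top score, (92.0 − 65.0) / 17.2 ≈ 1.57, so it sits about 1.57 standard deviations above the average. The Python sketch below shows the calculation. The page does not say whether it reports the population or the sample standard deviation, so the use of `statistics.pstdev` (population) is an assumption, and the function names are hypothetical.

```python
import statistics


def summarize(scores: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of a list of GPQA scores.

    Assumption: population SD (pstdev); statistics.stdev would give the sample SD instead.
    """
    return statistics.fmean(scores), statistics.pstdev(scores)


def z_score(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# Summary figures published for the 242 evaluated models.
MEAN, SD = 65.0, 17.2

if __name__ == "__main__":
    print(round(z_score(92.0, MEAN, SD), 2))  # top score: 1.57
    print(round(z_score(19.6, MEAN, SD), 2))  # lowest score in the table: -2.64
```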