Price Per Token

TerminalBench Leaderboard

TerminalBench is a terminal-based benchmark that tests an AI model's ability to interact with command-line interfaces and solve system tasks.

Data from Artificial Analysis

As of June 2, 2026, the top-scoring model on TerminalBench is GPT-5.4 at 57.6%, followed by Claude Opus 4.7 at 54.5% and Gemini 3.1 Pro Preview at 53.8%. 248 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 248
Best Score: 57.6
Average: 19.2
Std Dev: 15.3

Categories: Multi-turn
Input $/M | Output $/M | TerminalBench
$2.500 | $15.000 | 57.6
$5.000 | $25.000 | 54.5
$2.000 | $12.000 | 53.8
$3.000 | $15.000 | 53.0
$1.750 | $14.000 | 53.0
$5.000 | $25.000 | 51.5
$1.250 | $3.750 | 50.8
$5.000 | $25.000 | 48.5
$5.000 | $25.000 | 47.0
$10.500 | $84.000 | 47.0
$5.000 | $25.000 | 46.2
$3.000 | $15.000 | 46.2
$0.435 | $0.870 | 46.2
$1.500 | $9.000 | 46.2
$1.250 | $10.000 | 45.5
$0.684 | $3.400 | 43.9
$1.750 | $14.000 | 43.2
$0.600 | $2.080 | 43.2
$2.500 | $15.000 | 43.2
$0.980 | $3.080 | 43.2
$0.435 | $0.870 | 43.2
$3.000 | $15.000 | 42.4
$2.000 | $12.000 | 41.7
$5.000 | $25.000 | 40.9
$0.390 | $0.900 | 40.9
$1.500 | $9.000 | 40.9
$0.600 | $2.080 | 39.4
$0.279 | $1.200 | 39.4
$1.500 | $9.000 | 39.4
$0.500 | $3.000 | 38.6
$3.000 | $15.000 | 37.9
$1.250 | $10.000 | 37.9
$1.250 | $10.000 | 37.9
$2.500 | $15.000 | 37.9
$0.684 | $3.400 | 37.9
$2.000 | $8.000 | 37.1
$1.750 | $14.000 | 37.1
$0.120 | $0.370 | 36.4
$0.435 | $0.870 | 36.4
$3.000 | $15.000 | 35.6
$0.229 | $0.343 | 35.6
$0.390 | $0.900 | 35.6
$0.980 | $3.080 | 35.6
$0.435 | $0.870 | 35.6
$0.098 | $0.197 | 35.6
$1.250 | $10.000 | 34.8
$0.270 | $0.400 | 34.8
$0.400 | $1.900 | 34.8
$0.150 | $1.150 | 34.8
$0.400 | $2.000 | 34.8
$0.140 | $0.900 | 34.8
$15.000 | $75.000 | 34.3
$2.000 | $12.000 | 34.1
$0.098 | $0.197 | 34.1
$0.250 | $2.000 | 33.3
$0.250 | $2.000 | 33.3
$1.200 | $4.000 | 33.3
$1.250 | $10.000 | 32.6
$0.229 | $0.343 | 32.6
$0.090 | $0.300 | 32.6
$0.195 | $0.900 | 32.6
$1.200 | $4.000 | 32.6
$0.270 | $0.950 | 31.8
$1.750 | $14.000 | 31.8
$0.500 | $3.000 | 31.8
$0.400 | $1.540 | 31.8
$0.195 | $0.900 | 31.8
$3.000 | $15.000 | 31.1
$15.000 | $75.000 | 31.1
$0.550 | $2.200 | 31.1
$0.270 | $0.410 | 31.1
$0.100 | $0.300 | 31.1
$0.260 | $0.900 | 31.1
$0.270 | $0.950 | 30.3
$0.400 | $1.540 | 30.3
$0.260 | $0.900 | 29.5
$0.250 | $2.000 | 28.8
$3.000 | $15.000 | 28.8
$0.430 | $1.740 | 28.8
$0.290 | $0.950 | 28.8
$0.100 | $0.300 | 28.0
$3.000 | $15.000 | 27.3
$1.000 | $5.000 | 27.3
$1.000 | $5.000 | 27.3
$0.090 | $0.300 | 27.3
$1.000 | $10.000 | 26.5
$1.250 | $10.000 | 26.5
$0.140 | $0.900 | 26.5
$0.250 | $0.750 | 26.5
$0.255 | $1.000 | 25.8
$0.100 | $0.300 | 25.8
$0.140 | $0.900 | 25.8
$0.210 | $0.790 | 25.0
$0.270 | $0.410 | 25.0
$0.430 | $1.740 | 25.0
$0.060 | $0.300 | 25.0
$0.210 | $0.790 | 24.2
$0.000 | $0.000 | 24.2
$0.780 | $3.900 | 24.2
$0.250 | $1.500 | 24.2
$0.040 | $0.150 | 24.2
$0.039 | $0.100 | 23.5
$0.600 | $2.500 | 23.5
$1.250 | $10.000 | 22.7
$0.600 | $2.200 | 22.0
$0.060 | $0.400 | 22.0
$3.000 | $15.000 | 21.2
$3.000 | $15.000 | 21.2
$0.010 | $0.030 | 21.2
$0.125 | $0.850 | 20.4
$0.780 | $3.900 | 20.4
$0.780 | $3.900 | 19.7
$0.220 | $0.900 | 18.9
$0.200 | $0.500 | 18.9
$0.400 | $0.900 | 18.9
$0.400 | $1.900 | 18.9
$1.250 | $10.000 | 18.2
$0.110 | $0.800 | 18.2
$0.040 | $0.150 | 18.2
$0.250 | $0.500 | 17.4
$0.050 | $0.400 | 17.4
$0.200 | $1.500 | 17.4
$0.780 | $3.900 | 17.4
$0.900 | $0.900 | 16.7
$0.500 | $2.150 | 15.9
$0.550 | $2.200 | 15.9
$0.200 | $0.770 | 15.2
$0.550 | $2.200 | 15.2
$0.071 | $0.100 | 15.2
$0.070 | $0.270 | 15.2
$0.000 | $0.000 | 14.4
$0.300 | $0.900 | 14.4
$2.000 | $8.000 | 13.6
$0.300 | $2.500 | 13.6
$0.100 | $0.100 | 13.6
$0.050 | $0.200 | 13.6
$0.060 | $0.300 | 13.6
$15.000 | $60.000 | 12.9
$1.250 | $10.000 | 12.9
$0.100 | $0.400 | 12.9
$0.300 | $2.500 | 12.1
$0.050 | $0.400 | 12.1
$0.200 | $0.500 | 12.1
$0.050 | $0.200 | 12.1
$3.000 | $15.000 | 11.4
$1.000 | $3.000 | 11.4
$0.260 | $0.900 | 11.4
$0.029 | $0.140 | 10.6
$0.400 | $2.000 | 10.6
$0.140 | $0.900 | 10.6
$1.000 | $3.000 | 9.8
$0.098 | $0.300 | 9.8
$0.400 | $2.000 | 9.1
$0.207 | $0.828 | 9.1
$0.200 | $1.100 | 9.1
$0.104 | $0.416 | 8.3
$0.100 | $0.300 | 7.6
$0.400 | $1.600 | 7.6
$0.090 | $0.780 | 7.6
$0.100 | $0.400 | 7.6
$0.104 | $0.416 | 7.6
$0.900 | $0.900 | 6.8
$0.550 | $2.200 | 6.8
$0.200 | $0.770 | 6.8
$0.150 | $0.600 | 6.8
$0.080 | $0.280 | 6.8
$0.075 | $0.200 | 6.8
$0.050 | $0.400 | 6.8
$0.600 | $1.800 | 6.8
$0.200 | $0.880 | 6.8
$2.500 | $12.500 | 6.8
$0.300 | $2.500 | 6.8
$2.000 | $6.000 | 6.1
$0.800 | $3.200 | 6.1
$0.550 | $2.000 | 6.1
$1.100 | $4.400 | 6.1
$0.455 | $0.900 | 6.1
$0.455 | $0.900 | 6.1
$0.900 | $0.900 | 6.1
$0.070 | $0.280 | 6.1
$0.043 | $0.172 | 6.1
$0.130 | $0.520 | 6.1
$0.080 | $0.200 | 5.3
$0.039 | $0.100 | 5.3
$0.600 | $1.800 | 5.3
$0.080 | $0.400 | 5.3
$0.130 | $0.900 | 5.3
$0.100 | $0.400 | 5.3
$0.360 | $0.400 | 4.5
$0.900 | $0.900 | 4.5
$0.200 | $0.200 | 4.5
$0.100 | $0.400 | 4.5
$0.029 | $0.140 | 4.5
$0.130 | $0.400 | 4.5
$0.200 | $0.200 | 4.5
$0.150 | $0.150 | 4.5
$0.200 | $0.200 | 4.5
$0.065 | $0.140 | 3.8
$0.100 | $0.400 | 3.8
$0.080 | $0.160 | 3.8
$0.100 | $0.400 | 3.8
$0.080 | $0.200 | 3.8
$0.400 | $2.000 | 3.8
$0.100 | $0.400 | 3.8
$0.117 | $1.365 | 3.8
$0.060 | $0.400 | 3.8
$0.340 | $0.390 | 3.0
$0.100 | $0.320 | 3.0
$0.080 | $0.280 | 3.0
$0.400 | $2.200 | 3.0
$0.300 | $0.900 | 3.0
$0.800 | $4.000 | 2.3
$0.050 | $0.200 | 2.3
$0.050 | $0.200 | 2.3
$0.080 | $0.280 | 2.3
$0.060 | $0.120 | 2.3
$0.400 | $2.200 | 2.3
$0.100 | $0.400 | 2.3
$2.000 | $8.000 | 2.3
$0.080 | $0.200 | 2.3
$0.035 | $0.140 | 1.5
$0.700 | $0.800 | 1.5
$0.080 | $0.300 | 1.5
$0.200 | $0.200 | 1.5
$0.040 | $0.160 | 1.5
$0.150 | $0.500 | 1.5
$0.250 | $1.250 | 0.8
$0.510 | $0.740 | 0.8
$0.020 | $0.050 | 0.8
$0.060 | $0.060 | 0.8
$0.060 | $0.240 | 0.8
$2.500 | $10.000 | 0.8
$0.040 | $0.130 | 0.8
$0.040 | $0.080 | 0.8
$0.040 | $0.160 | 0.8
$0.120 | $0.200 | 0.8
$0.040 | $0.040 | -
$0.020 | $0.020 | -
$0.050 | $0.200 | -
$0.130 | $0.400 | -
$0.100 | $0.400 | -
$0.100 | $0.400 | -
$0.010 | $0.020 | -
$0.200 | $0.200 | -
$0.100 | $0.200 | -
$0.150 | $0.500 | -
$0.200 | $0.600 | -
$0.200 | $0.200 | -

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About TerminalBench

This leaderboard shows all models with TerminalBench benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
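One way to compare performance against cost is to fold the two per-token prices into a single blended figure and divide the score by it. The sketch below is illustrative only: the `Entry` type, the function names, and the 3:1 input-to-output token mix are assumptions, not something this page defines. The three sample entries pair the model names from the summary with the first three price rows, which assumes the table is listed in descending score order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    score: float         # TerminalBench score


def blended_price(e: Entry, input_share: float = 0.75) -> float:
    """Weighted price per million tokens. The 0.75 input share is an assumption."""
    return input_share * e.input_per_m + (1.0 - input_share) * e.output_per_m


def score_per_dollar(e: Entry, input_share: float = 0.75) -> float:
    """Benchmark points per blended dollar; free models map to infinity."""
    price = blended_price(e, input_share)
    return float("inf") if price == 0 else e.score / price


# Names taken from the page summary; pairing with these prices is assumed.
entries = [
    Entry("GPT-5.4", 2.50, 15.00, 57.6),
    Entry("Claude Opus 4.7", 5.00, 25.00, 54.5),
    Entry("Gemini 3.1 Pro Preview", 2.00, 12.00, 53.8),
]

ranked = sorted(entries, key=score_per_dollar, reverse=True)
for e in ranked:
    print(f"{e.model}: ${blended_price(e):.3f}/M blended, "
          f"{score_per_dollar(e):.2f} points per dollar")
```

Changing `input_share` to match a real workload can reorder the ranking, so the ratio is worth setting from measured token usage rather than left at the default.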

Frequently Asked Questions

As of June 2, 2026, GPT-5.4 leads the TerminalBench leaderboard with a score of 57.6. Rankings change as new models are released and evaluated.
Currently 248 models have been evaluated on TerminalBench, with an average score of 19.2 and standard deviation of 15.3.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
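The summary figures quoted above (model count, best score, average, standard deviation) could be derived from the score column along these lines. This is a rough sketch under stated assumptions: the helper names are hypothetical, rows showing "-" are skipped as unscored, and the population form of the standard deviation is used because the page does not say which form it reports. The sample cells are a handful of values from the table, so the output will not match the full-table figures.

```python
import statistics
from typing import Dict, List, Optional


def parse_score(cell: str) -> Optional[float]:
    """Return the score as a float, or None for an empty or '-' cell."""
    cell = cell.strip()
    return None if cell in ("", "-") else float(cell)


def summarize(cells: List[str]) -> Dict[str, float]:
    """Summary statistics over the scored rows only."""
    scores = [s for s in map(parse_score, cells) if s is not None]
    return {
        "models": len(scores),
        "best": max(scores),
        "average": statistics.mean(scores),
        # Population standard deviation: an assumption, see the note above.
        "std_dev": statistics.pstdev(scores),
    }


# A few score cells from the table, including one unscored row.
sample_cells = ["57.6", "54.5", "53.8", "0.8", "-"]
summary = summarize(sample_cells)
print(summary)
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample form instead; with a couple of hundred rows the two differ only slightly.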