Price Per Token

TerminalBench Leaderboard

TerminalBench is a terminal-based benchmark that tests an AI model's ability to interact with command-line interfaces and solve system tasks.

Data from Artificial Analysis

As of April 18, 2026, the top-scoring model on TerminalBench is GPT-5.4 at 57.6%, followed by Gemini 3.1 Pro Preview at 53.8% and Claude Sonnet 4.6 at 53.0%. 230 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 230
Best Score: 57.6
Average: 17.6
Std Dev: 14.5
Categories: Multi-turn

| Input $/M | Output $/M | TerminalBench |
|---|---|---|
| $2.500 | $15.000 | 57.6 |
| $2.000 | $12.000 | 53.8 |
| $3.000 | $15.000 | 53.0 |
| $1.750 | $14.000 | 53.0 |
| $5.000 | $25.000 | 48.5 |
| $5.000 | $25.000 | 47.0 |
| $10.500 | $84.000 | 47.0 |
| $5.000 | $25.000 | 46.2 |
| $3.000 | $15.000 | 46.2 |
| $1.250 | $10.000 | 45.5 |
| $0.875 | $7.000 | 43.2 |
| $0.720 | $2.300 | 43.2 |
| $0.950 | $3.150 | 43.2 |
| $3.000 | $15.000 | 42.4 |
| $2.000 | $12.000 | 41.7 |
| $5.000 | $25.000 | 40.9 |
| $0.390 | $0.900 | 40.9 |
| $0.720 | $2.300 | 39.4 |
| $0.300 | $1.200 | 39.4 |
| $0.500 | $3.000 | 38.6 |
| $3.000 | $15.000 | 37.9 |
| $1.250 | $10.000 | 37.9 |
| $1.250 | $10.000 | 37.9 |
| $2.500 | $15.000 | 37.9 |
| $2.000 | $8.000 | 37.1 |
| $1.750 | $14.000 | 37.1 |
| $0.130 | $0.380 | 36.4 |
| $3.000 | $15.000 | 35.6 |
| $0.260 | $0.380 | 35.6 |
| $0.390 | $0.900 | 35.6 |
| $0.950 | $3.150 | 35.6 |
| $1.250 | $10.000 | 34.8 |
| $0.400 | $1.200 | 34.8 |
| $0.383 | $1.720 | 34.8 |
| $0.118 | $0.950 | 34.8 |
| $0.400 | $2.000 | 34.8 |
| $15.000 | $75.000 | 34.3 |
| $2.000 | $12.000 | 34.1 |
| $0.125 | $1.000 | 33.3 |
| $0.250 | $2.000 | 33.3 |
| $1.200 | $4.000 | 33.3 |
| $1.250 | $10.000 | 32.6 |
| $0.260 | $0.380 | 32.6 |
| $0.100 | $0.300 | 32.6 |
| $0.195 | $0.900 | 32.6 |
| $1.200 | $4.000 | 32.6 |
| $0.210 | $0.790 | 31.8 |
| $0.875 | $7.000 | 31.8 |
| $0.500 | $3.000 | 31.8 |
| $0.390 | $1.750 | 31.8 |
| $0.195 | $0.900 | 31.8 |
| $3.000 | $15.000 | 31.1 |
| $15.000 | $75.000 | 31.1 |
| $0.550 | $2.200 | 31.1 |
| $0.270 | $0.410 | 31.1 |
| $0.090 | $0.290 | 31.1 |
| $0.260 | $2.080 | 31.1 |
| $0.210 | $0.790 | 30.3 |
| $0.390 | $1.750 | 30.3 |
| $0.260 | $2.080 | 29.5 |
| $0.125 | $1.000 | 28.8 |
| $3.000 | $15.000 | 28.8 |
| $0.390 | $1.740 | 28.8 |
| $0.290 | $0.950 | 28.8 |
| $0.090 | $0.290 | 28.0 |
| $3.000 | $15.000 | 27.3 |
| $1.000 | $5.000 | 27.3 |
| $1.000 | $5.000 | 27.3 |
| $0.100 | $0.300 | 27.3 |
| $1.000 | $10.000 | 26.5 |
| $1.250 | $10.000 | 26.5 |
| $0.163 | $0.900 | 26.5 |
| $0.250 | $0.750 | 26.5 |
| $0.255 | $1.000 | 25.8 |
| $0.090 | $0.290 | 25.8 |
| $0.150 | $0.750 | 25.0 |
| $0.270 | $0.410 | 25.0 |
| $0.390 | $1.740 | 25.0 |
| $0.070 | $0.350 | 25.0 |
| $0.150 | $0.750 | 24.2 |
| $0.200 | $0.500 | 24.2 |
| $0.780 | $3.900 | 24.2 |
| $0.250 | $1.500 | 24.2 |
| $0.040 | $0.150 | 24.2 |
| $0.039 | $0.100 | 23.5 |
| $0.400 | $2.000 | 23.5 |
| $1.250 | $10.000 | 22.7 |
| $0.600 | $2.200 | 22.0 |
| $0.060 | $0.400 | 22.0 |
| $3.000 | $15.000 | 21.2 |
| $3.000 | $15.000 | 21.2 |
| $0.130 | $0.850 | 20.5 |
| $0.780 | $3.900 | 20.5 |
| $0.780 | $3.900 | 19.7 |
| $0.220 | $0.900 | 18.9 |
| $0.200 | $0.500 | 18.9 |
| $0.400 | $0.900 | 18.9 |
| $0.383 | $1.720 | 18.9 |
| $1.250 | $10.000 | 18.2 |
| $0.150 | $0.800 | 18.2 |
| $0.040 | $0.150 | 18.2 |
| $0.250 | $0.500 | 17.4 |
| $0.050 | $0.400 | 17.4 |
| $0.200 | $1.500 | 17.4 |
| $0.780 | $3.900 | 17.4 |
| $0.900 | $0.900 | 16.7 |
| $0.500 | $2.150 | 15.9 |
| $0.550 | $2.200 | 15.9 |
| $0.200 | $0.770 | 15.2 |
| $0.550 | $2.200 | 15.2 |
| $0.071 | $0.100 | 15.2 |
| $0.070 | $0.270 | 15.2 |
| $0.200 | $0.500 | 14.4 |
| $0.300 | $0.900 | 14.4 |
| $2.000 | $8.000 | 13.6 |
| $0.300 | $2.500 | 13.6 |
| $0.130 | $0.600 | 13.6 |
| $0.050 | $0.200 | 13.6 |
| $0.070 | $0.350 | 13.6 |
| $15.000 | $60.000 | 12.9 |
| $1.250 | $10.000 | 12.9 |
| $0.100 | $0.400 | 12.9 |
| $0.300 | $2.500 | 12.1 |
| $0.050 | $0.400 | 12.1 |
| $0.200 | $0.500 | 12.1 |
| $0.050 | $0.200 | 12.1 |
| $3.000 | $15.000 | 11.4 |
| $1.000 | $3.000 | 11.4 |
| $0.260 | $0.900 | 11.4 |
| $0.030 | $0.100 | 10.6 |
| $0.400 | $2.000 | 10.6 |
| $0.163 | $0.900 | 10.6 |
| $1.000 | $3.000 | 9.8 |
| $0.098 | $0.300 | 9.8 |
| $0.400 | $2.000 | 9.1 |
| $0.207 | $0.828 | 9.1 |
| $0.200 | $1.100 | 9.1 |
| $0.104 | $0.416 | 8.3 |
| $0.100 | $0.300 | 7.6 |
| $0.200 | $0.800 | 7.6 |
| $0.090 | $0.780 | 7.6 |
| $0.100 | $0.400 | 7.6 |
| $0.104 | $0.416 | 7.6 |
| $0.900 | $0.900 | 6.8 |
| $0.550 | $2.200 | 6.8 |
| $0.200 | $0.770 | 6.8 |
| $0.150 | $0.600 | 6.8 |
| $0.080 | $0.280 | 6.8 |
| $0.075 | $0.200 | 6.8 |
| $0.050 | $0.400 | 6.8 |
| $0.600 | $1.800 | 6.8 |
| $0.200 | $0.880 | 6.8 |
| $2.500 | $12.500 | 6.8 |
| $0.300 | $2.500 | 6.8 |
| $2.000 | $6.000 | 6.1 |
| $0.800 | $3.200 | 6.1 |
| $0.550 | $2.000 | 6.1 |
| $1.100 | $4.400 | 6.1 |
| $0.455 | $0.900 | 6.1 |
| $0.455 | $0.900 | 6.1 |
| $0.280 | $0.900 | 6.1 |
| $0.070 | $0.280 | 6.1 |
| $0.090 | $0.300 | 6.1 |
| $0.130 | $0.520 | 6.1 |
| $0.060 | $0.200 | 5.3 |
| $0.039 | $0.100 | 5.3 |
| $0.600 | $1.800 | 5.3 |
| $0.080 | $0.300 | 5.3 |
| $0.130 | $0.600 | 5.3 |
| $0.100 | $0.400 | 5.3 |
| $0.120 | $0.390 | 4.5 |
| $0.900 | $0.900 | 4.5 |
| $0.200 | $0.200 | 4.5 |
| $0.100 | $0.400 | 4.5 |
| $0.030 | $0.100 | 4.5 |
| $0.130 | $0.400 | 4.5 |
| $0.200 | $0.200 | 4.5 |
| $0.150 | $0.150 | 4.5 |
| $0.200 | $0.200 | 4.5 |
| $0.065 | $0.140 | 3.8 |
| $0.100 | $0.400 | 3.8 |
| $0.080 | $0.160 | 3.8 |
| $0.100 | $0.400 | 3.8 |
| $0.060 | $0.200 | 3.8 |
| $0.400 | $2.000 | 3.8 |
| $0.100 | $0.400 | 3.8 |
| $0.117 | $1.365 | 3.8 |
| $0.060 | $0.400 | 3.8 |
| $0.340 | $0.390 | 3.0 |
| $0.100 | $0.320 | 3.0 |
| $0.080 | $0.240 | 3.0 |
| $0.400 | $1.760 | 3.0 |
| $0.300 | $0.900 | 3.0 |
| $0.800 | $4.000 | 2.3 |
| $0.050 | $0.200 | 2.3 |
| $0.050 | $0.200 | 2.3 |
| $0.080 | $0.280 | 2.3 |
| $0.060 | $0.120 | 2.3 |
| $0.400 | $1.760 | 2.3 |
| $0.100 | $0.400 | 2.3 |
| $2.000 | $8.000 | 2.3 |
| $0.080 | $0.200 | 2.3 |
| $0.035 | $0.140 | 1.5 |
| $0.700 | $0.800 | 1.5 |
| $0.080 | $0.300 | 1.5 |
| $0.200 | $0.200 | 1.5 |
| $0.040 | $0.160 | 1.5 |
| $0.150 | $0.500 | 1.5 |
| $0.250 | $1.250 | 0.8 |
| $0.510 | $0.740 | 0.8 |
| $0.020 | $0.050 | 0.8 |
| $0.060 | $0.060 | 0.8 |
| $0.060 | $0.240 | 0.8 |
| $2.500 | $10.000 | 0.8 |
| $0.040 | $0.130 | 0.8 |
| $0.040 | $0.080 | 0.8 |
| $0.040 | $0.160 | 0.8 |
| $0.120 | $0.200 | 0.8 |
| $0.030 | $0.040 | - |
| $0.020 | $0.020 | - |
| $0.050 | $0.200 | - |
| $0.130 | $0.400 | - |
| $0.100 | $0.400 | - |
| $0.100 | $0.400 | - |
| $0.010 | $0.020 | - |
| $0.200 | $0.200 | - |
| $0.100 | $0.200 | - |
| $0.150 | $0.500 | - |
| $0.200 | $0.600 | - |
| $0.200 | $0.200 | - |

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About TerminalBench

This leaderboard shows all models with TerminalBench benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
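
One way to compare performance against cost is to divide each score by a blended token price. The sketch below is illustrative only: the `Row` type, the helper names, and the 75% input-token share are assumptions rather than anything the leaderboard defines, and the three sample rows are price and score triples copied from the table above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    score: float         # TerminalBench score


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted USD price per million tokens.

    input_share is an assumed fraction of input tokens in the workload;
    adjust it to match your own traffic.
    """
    return input_share * row.input_per_m + (1.0 - input_share) * row.output_per_m


def score_per_dollar(row: Row, input_share: float = 0.75) -> float:
    """Benchmark points per blended dollar (higher means better value)."""
    return row.score / blended_price(row, input_share)


# Sample rows taken from the leaderboard table.
rows = [
    Row(2.500, 15.000, 57.6),
    Row(2.000, 12.000, 53.8),
    Row(0.390, 0.900, 40.9),
]

# Rank the sample rows by value rather than by raw score.
for r in sorted(rows, key=score_per_dollar, reverse=True):
    print(f"score={r.score:5.1f}  blended=${blended_price(r):.3f}/M  "
          f"points per $={score_per_dollar(r):.1f}")
```

A ratio like this favors cheap models heavily, so it is best read alongside the raw score rather than as a replacement for it.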

Frequently Asked Questions

As of April 18, 2026, GPT-5.4 leads the TerminalBench leaderboard with a score of 57.6. Rankings change as new models are released and evaluated.
Currently, 230 models have been evaluated on TerminalBench, with an average score of 17.6 and a standard deviation of 14.5.
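
As a rough illustration of how these summary figures can be used, the sketch below computes a mean and standard deviation for a small sample of scores with Python's `statistics` module, then expresses the top score as a distance from the published average. The page does not say whether its standard deviation is the population or the sample form, so both are shown. The five-score sample is only the top of the table and will not reproduce the published 17.6 and 14.5.

```python
import statistics

# Top five scores from the leaderboard table (a small sample, not all models).
sample_scores = [57.6, 53.8, 53.0, 53.0, 48.5]

sample_mean = statistics.fmean(sample_scores)
population_sd = statistics.pstdev(sample_scores)  # divides by n
sample_sd = statistics.stdev(sample_scores)       # divides by n - 1

# Published summary figures from this page.
PUBLISHED_MEAN = 17.6
PUBLISHED_SD = 14.5
TOP_SCORE = 57.6

# How many standard deviations the top score sits above the average.
z_top = (TOP_SCORE - PUBLISHED_MEAN) / PUBLISHED_SD

print(f"sample mean={sample_mean:.2f}")
print(f"population sd={population_sd:.2f}, sample sd={sample_sd:.2f}")
print(f"top score z={z_top:.2f}")
```

With the published figures, the top score works out to roughly 2.76 standard deviations above the average.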
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.