Price Per Token

TerminalBench Leaderboard

TerminalBench is a terminal-based benchmark that tests an AI model's ability to interact with command-line interfaces and solve system tasks.

Data from Artificial Analysis

Models: 173
Best Score: 48.5
Average: 14.9
Std Dev: 13.2

Categories: Multi-turn
Input $/M    Output $/M    TerminalBench
$5.000       $25.000       48.5
$5.000       $25.000       47.0
$21.000      $168.000      47.0
$5.000       $25.000       46.2
$3.000       $15.000       46.2
$0.800       $2.560        43.2
$2.000       $12.000       41.7
$5.000       $25.000       40.9
$0.800       $2.560        39.4
$0.500       $3.000        38.6
$3.000       $15.000       37.9
$1.250       $10.000       37.9
$2.000       $8.000        37.1
$3.000       $15.000       35.6
$0.250       $0.400        35.6
$1.250       $10.000       34.8
$0.400       $1.200        34.8
$0.450       $2.200        34.8
$15.000      $75.000       34.3
$0.250       $2.000        33.3
$0.250       $2.000        33.3
$1.250       $10.000       32.6
$0.250       $0.400        32.6
$0.210       $0.790        31.8
$1.750       $14.000       31.8
$0.500       $3.000        31.8
$0.300       $1.400        31.8
$3.000       $15.000       31.1
$15.000      $75.000       31.1
$0.270       $0.410        31.1
$0.090       $0.290        31.1
$0.210       $0.790        30.3
$0.300       $1.400        30.3
$3.000       $15.000       28.8
$0.350       $1.710        28.8
$0.270       $0.950        28.8
$0.090       $0.290        28.0
$3.000       $15.000       27.3
$1.000       $5.000        27.3
$1.000       $5.000        27.3
$1.250       $10.000       26.5
$0.255       $1.000        25.8
$0.090       $0.290        25.8
$0.150       $0.750        25.0
$0.270       $0.410        25.0
$0.350       $1.710        25.0
$0.150       $0.750        24.2
$0.200       $0.500        24.2
$1.200       $6.000        24.2
$0.039       $0.190        23.5
$0.400       $2.000        23.5
$1.250       $10.000       22.7
$0.550       $2.000        22.0
$0.060       $0.400        22.0
$3.000       $15.000       21.2
$3.000       $15.000       21.2
$0.130       $0.850        20.5
$1.200       $6.000        20.5
$0.220       $1.000        18.9
$0.200       $0.500        18.9
$0.400       $2.000        18.9
$0.450       $2.200        18.9
$0.120       $0.750        18.2
$0.200       $1.500        17.4
$0.450       $2.150        15.9
$0.550       $2.200        15.9
$0.200       $0.770        15.2
$1.100       $4.400        15.2
$0.071       $0.100        15.2
$0.070       $0.270        15.2
$0.200       $0.500        14.4
$0.300       $0.900        14.4
$2.000       $8.000        13.6
$0.300       $2.500        13.6
$0.000       $0.000        13.6
$0.050       $0.200        13.6
$15.000      $60.000       12.9
$0.100       $0.400        12.9
$0.300       $2.500        12.1
$0.050       $0.400        12.1
$0.200       $0.500        12.1
$0.050       $0.200        12.1
$3.000       $15.000       11.4
$1.000       $3.000        11.4
$0.030       $0.140        10.6
$0.400       $2.000        10.6
$1.000       $3.000        9.8
$0.400       $2.000        9.1
$0.207       $0.828        9.1
$0.200       $1.100        9.1
$2.500       $10.000       8.3
$2.500       $10.000       8.3
$0.104       $0.416        8.3
$0.350       $0.560        7.6
$0.400       $1.600        7.6
$0.090       $1.100        7.6
$0.100       $0.400        7.6
$4.000       $4.000        6.8
$1.100       $4.400        6.8
$0.200       $0.770        6.8
$0.150       $0.600        6.8
$0.080       $0.280        6.8
$0.060       $0.180        6.8
$0.600       $1.800        6.8
$0.200       $0.880        6.8
$2.500       $12.500       6.8
$0.300       $2.500        6.8
$2.000       $6.000        6.1
$0.800       $3.200        6.1
$0.700       $2.500        6.1
$1.100       $4.400        6.1
$0.455       $1.820        6.1
$0.455       $1.820        6.1
$0.280       $1.100        6.1
$0.100       $0.300        6.1
$0.090       $0.300        6.1
$0.130       $0.520        6.1
$0.060       $0.240        5.3
$0.600       $1.800        5.3
$0.100       $0.400        5.3
$0.120       $0.390        4.5
$1.200       $1.200        4.5
$0.100       $0.400        4.5
$0.130       $0.400        4.5
$0.200       $0.600        4.5
$0.150       $0.150        4.5
$0.200       $0.200        4.5
$0.060       $0.140        3.8
$0.100       $0.400        3.8
$0.040       $0.150        3.8
$0.100       $0.400        3.8
$0.060       $0.240        3.8
$0.400       $2.000        3.8
$0.100       $0.400        3.8
$0.060       $0.400        3.8
$0.400       $0.400        3.0
$0.100       $0.320        3.0
$0.080       $0.240        3.0
$0.300       $0.900        3.0
$0.800       $4.000        2.3
$0.050       $0.400        2.3
$0.050       $0.400        2.3
$0.080       $0.280        2.3
$0.020       $0.040        2.3
$0.400       $2.200        2.3
$0.100       $0.400        2.3
$2.000       $8.000        2.3
$0.080       $0.500        2.3
$0.035       $0.140        1.5
$0.700       $0.800        1.5
$0.080       $0.300        1.5
$0.040       $0.160        1.5
$0.150       $0.500        1.5
$0.250       $1.250        0.8
$0.510       $0.740        0.8
$0.020       $0.050        0.8
$0.049       $0.049        0.8
$0.060       $0.240        0.8
$2.500       $10.000       0.8
$0.040       $0.130        0.8
$0.040       $0.080        0.8
$0.040       $0.160        0.8
$0.120       $0.200        0.8
$0.030       $0.040        -
$0.027       $0.200        -
$0.050       $0.200        -
$0.130       $0.400        -
$0.010       $0.020        -
$0.200       $0.600        -
$0.100       $0.200        -
$0.150       $0.500        -
$0.200       $0.600        -
$0.200       $0.200        -

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About TerminalBench

This leaderboard lists models ranked by TerminalBench score, from highest to lowest; models without a reported score appear at the end, marked "-". Input and output prices, in US dollars per million tokens, are included so you can compare performance against cost.
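One possible way to compare performance against cost is to divide each score by a blended token price. The sketch below is illustrative only: the `Row` class, the function names, and the 3:1 input-to-output token weighting are assumptions made for this example, not something this page defines. The three sample rows are price and score values taken from the leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    score: float         # TerminalBench score


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens.

    input_share=0.75 corresponds to an assumed 3:1 input:output token mix.
    """
    return input_share * row.input_per_m + (1 - input_share) * row.output_per_m


def score_per_dollar(row: Row, input_share: float = 0.75) -> float:
    """Benchmark points per blended dollar; free models map to infinity."""
    price = blended_price(row, input_share)
    if price == 0:
        return float("inf")
    return row.score / price


# Sample rows copied from the leaderboard (input $/M, output $/M, score).
rows = [
    Row(5.000, 25.000, 48.5),
    Row(0.800, 2.560, 43.2),
    Row(0.250, 0.400, 35.6),
]

# Rank by value for money rather than by raw score.
ranked = sorted(rows, key=score_per_dollar, reverse=True)

for r in ranked:
    print(f"score={r.score:5.1f}  blended=${blended_price(r):.4f}/M  "
          f"points per $={score_per_dollar(r):.2f}")
```

Under this assumed weighting, the cheapest of the three sample rows ranks first on points per dollar even though it has the lowest raw score. A different token mix will shift the ordering, so `input_share` should be set to match your own workload.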
