Price Per Token

Tau2 Leaderboard

The Tau2 benchmark tests multi-turn agent capabilities in the airline and retail domains.

Data from Artificial Analysis

Models: 178

Best Score: 98.8

Average: 41.7

Std Dev: 28.0
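As a rough illustration of how summary figures like these can be derived from a score column, the sketch below computes the count, best score, mean, and standard deviation while skipping entries marked `-`. The function name `summarize` and the sample values are hypothetical, and the use of the population standard deviation is an assumption: the page does not say which variant it reports.

```python
import statistics


def summarize(raw_scores):
    """Summarize a score column in which "-" marks a missing score."""
    # Keep only rows that actually carry a numeric score.
    scores = [float(s) for s in raw_scores if s != "-"]
    return {
        "models": len(raw_scores),  # all listed rows, scored or not
        "scored": len(scores),      # rows with a numeric score
        "best": max(scores),
        "average": statistics.mean(scores),
        # Assumption: population standard deviation (pstdev).
        "std_dev": statistics.pstdev(scores),
    }


# A handful of illustrative values, not the full leaderboard.
sample = ["98.8", "98.2", "97.4", "5.0", "-"]
summary = summarize(sample)
print(summary)
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample standard deviation instead.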

Categories: Multi-turn
| Input $/M | Output $/M | Tau2 |
|---|---|---|
| $0.060 | $0.400 | 98.8 |
| $0.800 | $2.560 | 98.2 |
| $0.800 | $2.560 | 97.4 |
| $0.300 | $1.400 | 95.9 |
| $0.450 | $2.200 | 95.9 |
| $0.090 | $0.290 | 95.0 |
| $0.300 | $1.400 | 94.2 |
| $0.200 | $0.500 | 93.3 |
| $0.090 | $0.290 | 93.3 |
| $5.000 | $25.000 | 92.1 |
| $0.060 | $0.400 | 91.8 |
| $0.250 | $0.400 | 90.6 |
| $5.000 | $25.000 | 89.5 |
| $0.207 | $0.828 | 88.6 |
| $2.000 | $12.000 | 87.1 |
| $1.250 | $10.000 | 86.8 |
| $0.255 | $1.000 | 86.8 |
| $5.000 | $25.000 | 86.3 |
| $0.270 | $0.950 | 85.4 |
| $1.250 | $10.000 | 84.8 |
| $21.000 | $168.000 | 84.8 |
| $5.000 | $25.000 | 84.8 |
| $0.090 | $0.290 | 83.9 |
| $1.200 | $6.000 | 83.6 |
| $1.250 | $10.000 | 83.0 |
| $0.450 | $2.200 | 81.3 |
| $2.000 | $8.000 | 80.7 |
| $0.500 | $3.000 | 80.4 |
| $0.120 | $0.750 | 79.5 |
| $3.000 | $15.000 | 79.5 |
| $0.250 | $0.400 | 78.9 |
| $3.000 | $15.000 | 78.1 |
| $0.350 | $1.710 | 76.9 |
| $0.200 | $1.500 | 75.7 |
| $3.000 | $15.000 | 74.9 |
| $1.200 | $6.000 | 74.3 |
| $15.000 | $75.000 | 73.4 |
| $0.400 | $2.000 | 73.4 |
| $15.000 | $75.000 | 71.4 |
| $3.000 | $15.000 | 70.5 |
| $0.350 | $1.710 | 70.5 |
| $0.250 | $2.000 | 68.4 |
| $0.039 | $0.190 | 65.8 |
| $0.200 | $0.500 | 65.8 |
| $3.000 | $15.000 | 64.6 |
| $0.200 | $0.500 | 63.7 |
| $0.200 | $0.500 | 63.7 |
| $0.250 | $2.000 | 62.9 |
| $15.000 | $60.000 | 62.6 |
| $0.300 | $2.500 | 62.0 |
| $0.550 | $2.200 | 61.1 |
| $0.030 | $0.140 | 60.2 |
| $1.100 | $4.400 | 55.6 |
| $3.000 | $15.000 | 54.7 |
| $1.000 | $5.000 | 54.7 |
| $1.250 | $10.000 | 54.1 |
| $0.000 | $0.000 | 53.2 |
| $0.400 | $1.600 | 52.9 |
| $3.000 | $15.000 | 52.3 |
| $3.000 | $15.000 | 50.0 |
| $3.000 | $15.000 | 48.8 |
| $0.200 | $0.770 | 47.1 |
| $2.000 | $8.000 | 47.1 |
| $0.130 | $0.850 | 46.5 |
| $1.250 | $10.000 | 46.5 |
| $1.750 | $14.000 | 46.5 |
| $0.220 | $1.000 | 43.6 |
| $0.500 | $3.000 | 43.3 |
| $0.550 | $2.000 | 43.0 |
| $0.050 | $0.200 | 40.9 |
| $0.400 | $2.000 | 40.6 |
| $2.500 | $12.500 | 38.3 |
| $0.150 | $0.750 | 37.4 |
| $0.210 | $0.790 | 37.1 |
| $0.210 | $0.790 | 37.1 |
| $2.000 | $6.000 | 36.5 |
| $0.450 | $2.150 | 36.5 |
| $0.050 | $0.400 | 36.5 |
| $0.200 | $0.880 | 35.1 |
| $0.150 | $0.750 | 34.8 |
| $0.120 | $0.390 | 34.5 |
| $0.060 | $0.240 | 34.5 |
| $0.070 | $0.270 | 34.5 |
| $0.270 | $0.410 | 33.9 |
| $0.270 | $0.410 | 33.9 |
| $0.071 | $0.100 | 33.3 |
| $2.000 | $6.000 | 33.0 |
| $1.000 | $5.000 | 32.5 |
| $0.060 | $0.240 | 32.2 |
| $0.300 | $2.500 | 31.6 |
| $0.400 | $2.200 | 31.6 |
| $0.300 | $0.900 | 31.6 |
| $1.100 | $4.400 | 31.3 |
| $2.000 | $6.000 | 30.7 |
| $0.100 | $0.400 | 30.7 |
| $0.300 | $0.900 | 30.7 |
| $0.100 | $0.400 | 30.4 |
| $0.080 | $0.240 | 29.8 |
| $0.100 | $0.400 | 29.5 |
| $0.060 | $0.180 | 29.5 |
| $0.080 | $0.500 | 29.2 |
| $0.104 | $0.416 | 29.2 |
| $2.500 | $10.000 | 28.9 |
| $1.100 | $4.400 | 28.7 |
| $0.100 | $0.300 | 28.4 |
| $0.100 | $0.400 | 28.1 |
| $0.050 | $0.400 | 27.8 |
| $0.455 | $1.820 | 27.2 |
| $0.200 | $0.200 | 27.2 |
| $0.100 | $0.320 | 26.6 |
| $1.000 | $3.000 | 26.6 |
| $0.200 | $1.100 | 26.6 |
| $0.150 | $0.150 | 26.6 |
| $0.080 | $0.280 | 26.0 |
| $0.050 | $0.200 | 25.4 |
| $2.500 | $10.000 | 25.1 |
| $0.350 | $0.560 | 25.1 |
| $0.100 | $0.400 | 25.1 |
| $0.050 | $0.400 | 24.9 |
| $0.400 | $2.000 | 24.9 |
| $0.800 | $4.000 | 24.6 |
| $0.400 | $2.000 | 24.3 |
| $0.455 | $1.820 | 24.0 |
| $0.040 | $0.160 | 23.4 |
| $1.200 | $1.200 | 23.1 |
| $0.200 | $0.770 | 22.8 |
| $0.600 | $1.800 | 22.5 |
| $0.130 | $0.400 | 22.5 |
| $0.080 | $0.280 | 22.2 |
| $1.000 | $3.000 | 22.2 |
| $0.700 | $0.800 | 21.9 |
| $0.040 | $0.160 | 21.9 |
| $0.130 | $0.400 | 21.6 |
| $0.090 | $1.100 | 21.6 |
| $0.200 | $0.600 | 21.3 |
| $0.200 | $0.600 | 21.3 |
| $0.250 | $1.250 | 21.1 |
| $0.051 | $0.340 | 21.1 |
| $0.400 | $2.000 | 19.9 |
| $0.050 | $0.080 | 19.6 |
| $0.600 | $1.800 | 19.6 |
| $0.200 | $0.600 | 19.3 |
| $4.000 | $4.000 | 19.0 |
| $0.100 | $0.400 | 19.0 |
| $0.130 | $0.520 | 19.0 |
| $0.100 | $0.400 | 18.4 |
| $0.150 | $0.600 | 17.8 |
| $0.060 | $0.240 | 17.5 |
| $0.100 | $0.400 | 17.3 |
| $0.020 | $0.050 | 16.4 |
| $0.080 | $0.300 | 15.5 |
| $0.400 | $0.400 | 15.2 |
| $2.500 | $10.000 | 15.2 |
| $0.300 | $2.500 | 14.9 |
| $0.049 | $0.049 | 14.6 |
| $0.800 | $3.200 | 14.0 |
| $0.035 | $0.140 | 14.0 |
| $2.000 | $8.000 | 13.5 |
| $0.100 | $0.200 | 12.6 |
| $0.700 | $2.500 | 11.4 |
| $0.040 | $0.130 | 10.8 |
| $0.040 | $0.150 | 10.5 |
| $0.010 | $0.020 | 10.5 |
| $0.090 | $0.300 | 10.2 |
| $0.040 | $0.080 | 5.0 |
| $0.020 | $0.040 | 5.0 |
| $0.510 | $0.740 | - |
| $0.030 | $0.040 | - |
| $0.200 | $0.200 | - |
| $0.027 | $0.200 | - |
| $0.060 | $0.140 | - |
| $0.050 | $0.200 | - |
| $0.280 | $1.100 | - |
| $0.120 | $0.200 | - |
| $0.150 | $0.500 | - |
| $0.400 | $1.200 | - |
| $0.150 | $0.500 | - |
| $0.200 | $0.200 | - |

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About Tau2

The Tau2 benchmark tests multi-turn agent capabilities in the airline and retail domains.

This leaderboard shows all models with Tau2 benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
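One way to compare performance against cost is to fold the two prices into a single blended figure and divide the score by it. The sketch below is only an illustration: the `Row` type, the function names, and the 75/25 input-to-output token mix are assumptions made here, not something the leaderboard defines. The three sample rows reuse price and score values that appear in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    tau2: float          # Tau2 score


def blended_price(row, input_share=0.75):
    """Weighted average price; input_share is an assumed traffic mix."""
    return input_share * row.input_per_m + (1 - input_share) * row.output_per_m


def score_per_dollar(row, input_share=0.75):
    """Tau2 points per blended dollar; free models rank first."""
    price = blended_price(row, input_share)
    return float("inf") if price == 0 else row.tau2 / price


rows = [
    Row(0.060, 0.400, 98.8),
    Row(0.800, 2.560, 98.2),
    Row(5.000, 25.000, 92.1),
]

# Highest score per blended dollar first.
ranked = sorted(rows, key=score_per_dollar, reverse=True)
for row in ranked:
    print(f"{row.tau2:5.1f}  ${blended_price(row):.3f}/M  {score_per_dollar(row):.1f} pts/$")
```

The token mix matters: agentic, multi-turn workloads can be input-heavy or output-heavy, so `input_share` is worth adjusting to match the actual usage pattern before drawing conclusions.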