Price Per Token

Tau2 Leaderboard

Tau2 is a benchmark that tests multi-turn agent capabilities in the airline and retail domains.

Data from Artificial Analysis

As of June 2, 2026, the top-scoring model on Tau2 is GLM-4.7-Flash at 98.8%, followed by GLM-5 Turbo and GLM-5V Turbo, both at 98.5%. In total, 251 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 251
Best score: 98.8
Average: 51.0
Std dev: 30.7
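The figures above are ordinary summary statistics over the Tau2 score column. The page does not say whether "Std dev" is the population or sample standard deviation, or how rows shown with "-" are handled. The Python sketch below therefore assumes the population formula, counts unscored rows as models (the 251 figure matches the total number of table rows, including those without a score), and leaves them out of the best, average, and standard-deviation figures. The `summarize` function is a hypothetical helper, not the site's method.

```python
import statistics
from typing import Optional, Sequence


def summarize(scores: Sequence[Optional[float]]) -> dict:
    """Summary statistics in the style of the figures above.

    None stands in for a model listed with "-" (no Tau2 score). Such rows are
    counted as models but excluded from best/average/std dev; this handling is
    an assumption, not the site's documented method.
    """
    scored = [s for s in scores if s is not None]
    return {
        "models": len(scores),
        "best": max(scored),
        "average": round(statistics.mean(scored), 1),
        # Population standard deviation; statistics.stdev gives the sample (n - 1) version.
        "std_dev": round(statistics.pstdev(scored), 1),
    }


# Six rows from the leaderboard: the top three, the two lowest scored, and one unscored row.
print(summarize([98.8, 98.5, 98.5, 5.0, 5.0, None]))
# → {'models': 6, 'best': 98.8, 'average': 61.2, 'std_dev': 45.9}
```

Run over the full score column, the same function would show whether the site's 51.0 and 30.7 match these assumptions.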

Category: Multi-turn
Prices are in USD per million tokens; Tau2 scores are percentages.

Input $/M  Output $/M  Tau2
$0.060     $0.400      98.8
$1.200     $4.000      98.5
$1.200     $4.000      98.5
$0.600     $2.080      98.3
$0.980     $3.080      97.7
$0.600     $2.080      97.4
$0.980     $3.080      97.1
$0.435     $0.870      96.2
$0.400     $1.540      95.9
$0.400     $1.900      95.9
$0.684     $3.400      95.9
$0.390     $0.900      95.6
$2.000     $12.000     95.6
$1.500     $9.000      95.6
$0.150     $1.150      95.3
$0.140     $0.900      95.3
$1.500     $9.000      95.3
$0.100     $0.300      95.0
$0.098     $0.197      95.0
$1.250     $3.750      94.7
$0.090     $0.300      94.4
$0.098     $0.197      94.4
$0.400     $1.540      94.2
$0.435     $0.870      94.2
$0.195     $0.900      93.9
$0.684     $3.400      93.9
$0.260     $0.900      93.6
$0.000     $0.000      93.3
$0.100     $0.300      93.3
$0.550     $2.200      93.0
$1.750     $14.000     92.1
$5.000     $25.000     92.1
$0.060     $0.400      91.8
$0.400     $2.000      91.2
$0.435     $0.870      91.2
$0.229     $0.343      90.6
$0.250     $0.500      90.3
$5.000     $25.000     89.5
$0.140     $0.900      89.2
$0.207     $0.828      88.6
$5.000     $25.000     88.6
$0.090     $0.300      87.4
$2.000     $12.000     87.1
$0.195     $0.900      87.1
$2.500     $15.000     87.1
$1.250     $10.000     86.8
$0.255     $1.000      86.8
$0.040     $0.150      86.8
$1.250     $10.000     86.6
$5.000     $25.000     86.3
$0.140     $0.900      86.3
$1.750     $14.000     86.0
$0.010     $0.030      86.0
$0.290     $0.950      85.4
$0.040     $0.150      85.1
$0.140     $0.900      85.1
$1.250     $10.000     84.8
$10.500    $84.000     84.8
$5.000     $25.000     84.8
$0.279     $1.200      84.8
$0.260     $0.900      84.5
$1.250     $10.000     84.2
$0.100     $0.300      83.9
$0.390     $0.900      83.9
$0.780     $3.900      83.6
$0.780     $3.900      83.6
$1.250     $10.000     83.0
$1.250     $10.000     81.9
$0.400     $1.900      81.3
$2.000     $8.000      80.7
$0.500     $3.000      80.4
$0.110     $0.800      79.5
$3.000     $15.000     79.5
$0.229     $0.343      79.0
$3.000     $15.000     79.0
$3.000     $15.000     78.1
$0.430     $1.740      76.9
$0.200     $1.500      75.7
$3.000     $15.000     75.7
$3.000     $15.000     74.9
$2.500     $15.000     74.6
$0.780     $3.900      74.3
$1.750     $14.000     74.3
$5.000     $25.000     74.0
$15.000    $75.000     73.4
$0.600     $2.500      73.4
$0.435     $0.870      72.5
$15.000    $75.000     71.4
$0.250     $2.000      71.0
$0.250     $0.750      70.8
$3.000     $15.000     70.5
$0.430     $1.740      70.5
$0.250     $2.000      68.4
$2.000     $12.000     68.1
$1.250     $10.000     67.0
$0.039     $0.100      65.8
$0.200     $0.500      65.8
$3.000     $15.000     64.6
$0.200     $0.500      63.7
$0.000     $0.000      63.7
$0.250     $2.000      62.9
$15.000    $60.000     62.6
$0.300     $2.500      62.0
$0.550     $2.200      61.1
$0.029     $0.140      60.2
$0.120     $0.370      59.9
$1.500     $9.000      58.8
$0.550     $2.200      55.6
$3.000     $15.000     54.7
$1.000     $5.000      54.7
$1.000     $10.000     54.1
$0.260     $0.900      54.1
$0.100     $0.100      53.2
$0.400     $1.600      52.9
$3.000     $15.000     52.3
$0.029     $0.140      50.3
$3.000     $15.000     50.0
$3.000     $15.000     48.8
$0.200     $0.770      47.1
$2.000     $8.000      47.1
$0.125     $0.850      46.5
$1.250     $10.000     46.5
$1.750     $14.000     46.5
$0.104     $0.416      45.6
$0.039     $0.100      45.0
$0.220     $0.900      43.6
$0.060     $0.300      43.6
$0.500     $3.000      43.3
$0.600     $2.200      43.0
$0.098     $0.300      41.5
$0.050     $0.200      40.9
$0.400     $2.000      40.6
$0.060     $0.300      40.4
$2.500     $12.500     38.3
$0.210     $0.790      37.4
$0.270     $0.950      37.1
$0.270     $0.950      37.1
$2.000     $6.000      36.5
$0.500     $2.150      36.5
$0.050     $0.400      36.5
$0.200     $0.880      35.1
$2.500     $15.000     35.1
$0.210     $0.790      34.8
$0.360     $0.400      34.5
$0.080     $0.200      34.5
$0.070     $0.270      34.5
$0.400     $2.200      34.2
$0.270     $0.410      33.9
$0.270     $0.410      33.9
$0.071     $0.100      33.3
$2.000     $6.000      33.0
$0.780     $3.900      32.8
$1.000     $5.000      32.5
$0.080     $0.200      32.2
$0.300     $2.500      31.6
$0.400     $2.200      31.6
$0.300     $0.900      31.6
$1.100     $4.400      31.3
$0.250     $1.500      31.3
$2.000     $6.000      30.7
$0.100     $0.400      30.7
$0.300     $0.900      30.7
$0.050     $0.400      30.4
$0.100     $0.400      30.4
$0.080     $0.280      29.8
$0.100     $0.400      29.5
$0.075     $0.200      29.5
$0.080     $0.200      29.2
$0.104     $0.416      29.2
$0.550     $2.200      28.6
$0.070     $0.280      28.4
$0.080     $0.400      28.1
$0.100     $0.400      28.1
$0.050     $0.200      27.8
$0.455     $0.900      27.2
$0.200     $0.200      27.2
$0.100     $0.400      26.9
$0.100     $0.320      26.6
$0.200     $0.200      26.6
$1.000     $3.000      26.6
$0.200     $1.100      26.6
$0.150     $0.150      26.6
$0.080     $0.280      26.0
$0.050     $0.400      25.7
$0.200     $0.200      25.4
$0.050     $0.200      25.4
$0.100     $0.300      25.1
$0.100     $0.400      25.1
$0.050     $0.200      24.9
$0.400     $0.900      24.9
$0.800     $4.000      24.6
$0.400     $2.000      24.3
$0.455     $0.900      24.0
$0.040     $0.160      23.4
$0.900     $0.900      23.1
$0.200     $0.770      22.8
$0.600     $1.800      22.5
$0.130     $0.400      22.5
$0.117     $1.365      22.5
$0.080     $0.280      22.2
$1.000     $3.000      22.2
$0.700     $0.800      21.9
$0.040     $0.160      21.9
$0.130     $0.400      21.6
$0.090     $0.780      21.6
$0.200     $0.200      21.3
$0.200     $0.600      21.3
$0.250     $1.250      21.1
$0.030     $0.050      21.1
$0.400     $2.000      19.9
$0.130     $0.900      19.9
$0.050     $0.080      19.6
$0.600     $1.800      19.6
$0.200     $0.200      19.3
$0.900     $0.900      19.0
$0.200     $0.200      19.0
$0.100     $0.400      19.0
$0.130     $0.520      19.0
$0.100     $0.400      18.4
$0.150     $0.600      17.8
$0.060     $0.240      17.5
$0.100     $0.400      17.3
$0.020     $0.050      16.4
$0.080     $0.300      15.5
$0.340     $0.390      15.2
$2.500     $10.000     15.2
$0.300     $2.500      14.9
$0.060     $0.060      14.6
$0.800     $3.200      14.0
$0.035     $0.140      14.0
$2.000     $8.000      13.5
$0.100     $0.200      12.6
$0.550     $2.000      11.4
$0.040     $0.130      10.8
$0.080     $0.160      10.5
$0.010     $0.020      10.5
$0.043     $0.172      10.2
$0.040     $0.080      5.0
$0.060     $0.120      5.0
$0.510     $0.740      -
$0.040     $0.040      -
$0.020     $0.020      -
$0.065     $0.140      -
$0.050     $0.200      -
$0.900     $0.900      -
$1.250     $10.000     -
$0.120     $0.200      -
$0.150     $0.500      -
$0.270     $0.400      -
$0.150     $0.500      -
$0.200     $0.200      -

Pricing from OpenRouter. Benchmarks from Artificial Analysis.


About Tau2


This leaderboard lists models ranked by Tau2 score, from highest to lowest; models without a score are listed last and marked "-". Input and output prices are included so you can compare performance against cost.
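Because prices are quoted per million tokens, the list-price cost of a workload is (input tokens × input price + output tokens × output price) ÷ 1,000,000. The Python sketch below applies that formula, together with the page's ordering (highest score first, unscored rows last), to three rows from the table. The `Row` class, the function names, and the workload of 2,000,000 input and 500,000 output tokens are hypothetical, and actual bills can differ from list prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One leaderboard row (hypothetical structure; model names omitted)."""
    input_per_m: float      # USD per million input tokens
    output_per_m: float     # USD per million output tokens
    tau2: Optional[float]   # None for rows shown as "-"


def run_cost(row: Row, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of a workload with the given token counts."""
    return (input_tokens * row.input_per_m + output_tokens * row.output_per_m) / 1_000_000


def rank(rows: List[Row]) -> List[Row]:
    """Highest Tau2 score first; rows without a score go last.

    sorted() is stable, so tied rows keep their original relative order.
    """
    return sorted(rows, key=lambda r: (r.tau2 is None, -(r.tau2 or 0.0)))


# Three rows from the leaderboard; the middle one has no Tau2 score.
rows = [Row(5.000, 25.000, 92.1), Row(0.510, 0.740, None), Row(0.060, 0.400, 98.8)]

# Hypothetical workload: 2,000,000 input tokens and 500,000 output tokens.
for row in rank(rows):
    print(row.tau2, f"${run_cost(row, 2_000_000, 500_000):.2f}")
# Prints:
# 98.8 $0.32
# 92.1 $22.50
# None $1.39
```

For the top row, that workload comes to (2,000,000 × $0.060 + 500,000 × $0.400) ÷ 1,000,000 = $0.32, compared with $22.50 for the $5.000 / $25.000 row that scores 92.1.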

Frequently Asked Questions

As of June 2, 2026, GLM-4.7-Flash leads the Tau2 leaderboard with a score of 98.8. Rankings change as new models are released and evaluated.
Currently, 251 models have been evaluated on Tau2, with an average score of 51.0 and a standard deviation of 30.7.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.