Price Per Token

Tau2 Leaderboard

The Tau2 benchmark tests multi-turn agent capabilities in the airline and retail domains.

Data from Artificial Analysis

As of April 18, 2026, the top-scoring model on Tau2 is GLM-4.7-Flash at 98.8%, followed by GLM-5 Turbo and GLM-5V Turbo, both at 98.5%. A total of 233 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 233
Best Score: 98.8
Average: 48.2
Std Dev: 29.9
Categories: Multi-turn

Input $/M   Output $/M   Tau2
$0.060      $0.400       98.8
$1.200      $4.000       98.5
$1.200      $4.000       98.5
$0.720      $2.300       98.2
$0.950      $3.150       97.7
$0.720      $2.300       97.4
$0.950      $3.150       97.1
$0.390      $1.750       95.9
$0.383      $1.720       95.9
$0.390      $0.900       95.6
$2.000      $12.000      95.6
$0.118      $0.950       95.3
$0.090      $0.290       95.0
$0.100      $0.300       94.4
$0.390      $1.750       94.2
$0.195      $0.900       93.9
$0.260      $2.080       93.6
$0.200      $0.500       93.3
$0.090      $0.290       93.3
$0.550      $2.200       93.0
$1.750      $14.000      92.1
$5.000      $25.000      92.1
$0.060      $0.400       91.8
$0.400      $2.000       91.2
$0.260      $0.380       90.6
$0.250      $0.500       90.4
$5.000      $25.000      89.5
$0.163      $0.900       89.2
$0.207      $0.828       88.6
$0.100      $0.300       87.4
$2.000      $12.000      87.1
$0.195      $0.900       87.1
$2.500      $15.000      87.1
$1.250      $10.000      86.8
$0.255      $1.000       86.8
$0.040      $0.150       86.8
$1.250      $10.000      86.5
$5.000      $25.000      86.3
$0.163      $0.900       86.3
$1.750      $14.000      86.0
$0.290      $0.950       85.4
$0.040      $0.150       85.1
$1.250      $10.000      84.8
$10.500     $84.000      84.8
$5.000      $25.000      84.8
$0.300      $1.200       84.8
$0.260      $2.080       84.5
$1.250      $10.000      84.2
$0.090      $0.290       83.9
$0.390      $0.900       83.9
$0.780      $3.900       83.6
$0.780      $3.900       83.6
$1.250      $10.000      83.0
$1.250      $10.000      81.9
$0.383      $1.720       81.3
$2.000      $8.000       80.7
$0.500      $3.000       80.4
$0.150      $0.800       79.5
$3.000      $15.000      79.5
$0.260      $0.380       78.9
$3.000      $15.000      78.9
$3.000      $15.000      78.1
$0.390      $1.740       76.9
$0.200      $1.500       75.7
$3.000      $15.000      75.7
$3.000      $15.000      74.9
$0.780      $3.900       74.3
$0.875      $7.000       74.3
$15.000     $75.000      73.4
$0.400      $2.000       73.4
$15.000     $75.000      71.4
$0.125      $1.000       71.1
$0.250      $0.750       70.8
$3.000      $15.000      70.5
$0.390      $1.740       70.5
$0.125      $1.000       68.4
$2.000      $12.000      68.1
$1.250      $10.000      67.0
$0.039      $0.100       65.8
$0.200      $0.500       65.8
$3.000      $15.000      64.6
$0.200      $0.500       63.7
$0.200      $0.500       63.7
$0.250      $2.000       62.9
$15.000     $60.000      62.6
$0.300      $2.500       62.0
$0.550      $2.200       61.1
$0.030      $0.100       60.2
$0.130      $0.380       59.9
$0.550      $2.200       55.6
$3.000      $15.000      54.7
$1.000      $5.000       54.7
$1.000      $10.000      54.1
$0.260      $0.900       54.1
$0.130      $0.600       53.2
$0.200      $0.800       52.9
$3.000      $15.000      52.3
$0.030      $0.100       50.3
$3.000      $15.000      50.0
$3.000      $15.000      48.8
$0.200      $0.770       47.1
$2.000      $8.000       47.1
$0.130      $0.850       46.5
$1.250      $10.000      46.5
$0.875      $7.000       46.5
$0.104      $0.416       45.6
$0.039      $0.100       45.0
$0.220      $0.900       43.6
$0.070      $0.350       43.6
$0.500      $3.000       43.3
$0.600      $2.200       43.0
$0.098      $0.300       41.5
$0.050      $0.200       40.9
$0.400      $2.000       40.6
$0.070      $0.350       40.4
$2.500      $12.500      38.3
$0.150      $0.750       37.4
$0.210      $0.790       37.1
$0.210      $0.790       37.1
$2.000      $6.000       36.5
$0.500      $2.150       36.5
$0.050      $0.400       36.5
$0.200      $0.880       35.1
$2.500      $15.000      35.1
$0.150      $0.750       34.8
$0.120      $0.390       34.5
$0.060      $0.200       34.5
$0.070      $0.270       34.5
$0.400      $1.760       34.2
$0.270      $0.410       33.9
$0.270      $0.410       33.9
$0.071      $0.100       33.3
$2.000      $6.000       33.0
$0.780      $3.900       32.7
$1.000      $5.000       32.5
$0.060      $0.200       32.2
$0.300      $2.500       31.6
$0.400      $1.760       31.6
$0.300      $0.900       31.6
$1.100      $4.400       31.3
$0.250      $1.500       31.3
$2.000      $6.000       30.7
$0.100      $0.400       30.7
$0.300      $0.900       30.7
$0.050      $0.400       30.4
$0.100      $0.400       30.4
$0.080      $0.240       29.8
$0.100      $0.400       29.5
$0.075      $0.200       29.5
$0.080      $0.200       29.2
$0.104      $0.416       29.2
$0.550      $2.200       28.7
$0.070      $0.280       28.4
$0.080      $0.300       28.1
$0.100      $0.400       28.1
$0.050      $0.200       27.8
$0.455      $0.900       27.2
$0.200      $0.200       27.2
$0.100      $0.400       26.9
$0.100      $0.320       26.6
$0.200      $0.200       26.6
$1.000      $3.000       26.6
$0.200      $1.100       26.6
$0.150      $0.150       26.6
$0.080      $0.280       26.0
$0.050      $0.400       25.7
$0.200      $0.200       25.4
$0.050      $0.200       25.4
$0.100      $0.300       25.1
$0.100      $0.400       25.1
$0.050      $0.200       24.9
$0.400      $0.900       24.9
$0.800      $4.000       24.6
$0.400      $2.000       24.3
$0.455      $0.900       24.0
$0.040      $0.160       23.4
$0.900      $0.900       23.1
$0.200      $0.770       22.8
$0.600      $1.800       22.5
$0.130      $0.400       22.5
$0.117      $1.365       22.5
$0.080      $0.280       22.2
$1.000      $3.000       22.2
$0.700      $0.800       21.9
$0.040      $0.160       21.9
$0.130      $0.400       21.6
$0.090      $0.780       21.6
$0.200      $0.200       21.3
$0.200      $0.600       21.3
$0.250      $1.250       21.1
$0.030      $0.050       21.1
$0.400      $2.000       19.9
$0.130      $0.600       19.9
$0.050      $0.080       19.6
$0.600      $1.800       19.6
$0.200      $0.200       19.3
$0.900      $0.900       19.0
$0.200      $0.200       19.0
$0.100      $0.400       19.0
$0.130      $0.520       19.0
$0.100      $0.400       18.4
$0.150      $0.600       17.8
$0.060      $0.240       17.5
$0.100      $0.400       17.3
$0.020      $0.050       16.4
$0.080      $0.300       15.5
$0.340      $0.390       15.2
$2.500      $10.000      15.2
$0.300      $2.500       14.9
$0.060      $0.060       14.6
$0.800      $3.200       14.0
$0.035      $0.140       14.0
$2.000      $8.000       13.5
$0.100      $0.200       12.6
$0.550      $2.000       11.4
$0.040      $0.130       10.8
$0.080      $0.160       10.5
$0.010      $0.020       10.5
$0.090      $0.300       10.2
$0.040      $0.080       5.0
$0.060      $0.120       5.0
$0.510      $0.740       -
$0.030      $0.040       -
$0.020      $0.020       -
$0.065      $0.140       -
$0.050      $0.200       -
$0.280      $0.900       -
$1.250      $10.000      -
$0.120      $0.200       -
$0.150      $0.500       -
$0.400      $1.200       -
$0.150      $0.500       -
$0.200      $0.200       -

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About Tau2

The Tau2 benchmark tests multi-turn agent capabilities in the airline and retail domains.

This leaderboard shows all models with Tau2 benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
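
One way to compare performance against cost is to look for rows that no other row beats on both price and score (a Pareto frontier). The sketch below is illustrative only: the seven rows are copied from the leaderboard table, but the `Row` type, the `blended_cost` helper, and its 50/50 input/output weighting are assumptions made for this example, not something the leaderboard itself defines.

```python
from typing import NamedTuple


class Row(NamedTuple):
    input_per_m: float   # input price, $ per million tokens
    output_per_m: float  # output price, $ per million tokens
    tau2: float          # Tau2 score


# A handful of (input, output, score) rows taken from the leaderboard.
ROWS = [
    Row(0.060, 0.400, 98.8),
    Row(1.200, 4.000, 98.5),
    Row(0.720, 2.300, 98.2),
    Row(2.000, 12.000, 95.6),
    Row(0.090, 0.290, 95.0),
    Row(0.040, 0.150, 86.8),
    Row(5.000, 25.000, 92.1),
]


def blended_cost(row: Row, input_share: float = 0.5) -> float:
    """Hypothetical blended $/M price; input_share is an assumed traffic mix."""
    return input_share * row.input_per_m + (1 - input_share) * row.output_per_m


def pareto_frontier(rows: list[Row]) -> list[Row]:
    """Keep rows that are not beaten on both blended cost and score."""
    frontier: list[Row] = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on equal cost, higher score first.
    for row in sorted(rows, key=lambda r: (blended_cost(r), -r.tau2)):
        if row.tau2 > best_score:
            frontier.append(row)
            best_score = row.tau2
    return frontier


for row in pareto_frontier(ROWS):
    print(f"${blended_cost(row):.3f}/M blended -> Tau2 {row.tau2}")
```

Changing `input_share` to match your own ratio of prompt to completion tokens can change which rows end up on the frontier, so the weighting is worth setting deliberately.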

Frequently Asked Questions

The Tau2 benchmark tests multi-turn agent capabilities in the airline and retail domains.
As of April 18, 2026, GLM-4.7-Flash leads the Tau2 leaderboard with a score of 98.8. Rankings change as new models are released and evaluated.
Currently, 233 models have been evaluated on Tau2, with an average score of 48.2 and a standard deviation of 29.9.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
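
As a rough illustration of how the average and standard deviation above can be used, the sketch below expresses a score as a number of standard deviations from the mean. The `z_score` helper is a hypothetical name introduced here. Because scores are bounded and widely spread, the result is only a coarse guide and does not assume the scores follow a normal distribution.

```python
MEAN_TAU2 = 48.2     # average score reported for the leaderboard
STD_DEV_TAU2 = 29.9  # standard deviation reported for the leaderboard


def z_score(score: float, mean: float = MEAN_TAU2, std_dev: float = STD_DEV_TAU2) -> float:
    """Distance of a score from the mean, in standard deviations."""
    return (score - mean) / std_dev


# The leading score of 98.8 sits a little under 1.7 standard deviations above the average.
print(f"{z_score(98.8):.2f}")
```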