Price Per Token

IFBench Leaderboard

IFBench is an instruction-following benchmark that measures an LLM's ability to adhere to nuanced writing constraints and formatting requirements.

Data from Artificial Analysis

As of May 20, 2026, the top-scoring model on IFBench is MiMo v2.5 Pro at 79.9%, followed by DeepSeek V4 Flash (Non-Reasoning) at 79.2% and Qwen3.5 397B A17B at 78.8%. A total of 253 models have been evaluated on this benchmark.

Last updated: May 20, 2026

Models: 253
Best Score: 79.9
Average: 47.9
Std Dev: 15.4
Categories: Instruction Following

Input $/M  Output $/M  IFBench
$1.000     $3.000      79.9
$0.112     $0.224      79.2
$0.390     $0.900      78.8
$0.500     $3.000      78.0
$1.750     $14.000     77.6
$0.250     $1.500      77.2
$2.000     $12.000     77.1
$0.435     $0.870      76.5
$0.980     $3.080      76.3
$0.730     $3.400      76.0
$0.260     $0.900      75.7
$0.279     $1.200      75.7
$0.195     $0.900      75.6
$0.120     $0.370      75.6
$0.250     $2.000      75.4
$10.500    $84.000     75.4
$1.750     $14.000     75.4
$1.250     $10.000     74.2
$2.500     $15.000     74.0
$1.200     $4.000      73.2
$1.250     $10.000     73.1
$0.625     $5.000      72.9
$0.140     $0.900      72.5
$0.060     $0.300      72.5
$0.255     $1.000      72.3
$0.600     $1.920      72.3
$0.100     $0.300      71.8
$0.150     $1.150      71.6
$2.000     $8.000      71.4
$0.250     $2.000      71.2
$0.050     $0.200      71.1
$0.780     $3.900      70.8
$1.250     $10.000     70.6
$2.000     $12.000     70.4
$15.000    $60.000     70.3
$0.400     $1.900      70.2
$1.250     $10.000     70.0
$0.290     $0.950      69.9
$0.250     $0.750      69.8
$0.039     $0.180      69.0
$1.100     $4.400      68.7
$0.207     $0.828      68.4
$0.550     $2.200      68.1
$0.250     $2.000      67.9
$0.400     $1.750      67.9
$0.050     $0.400      67.5
$1.100     $4.400      67.1
$0.040     $0.150      66.7
$1.250     $10.000     66.6
$0.100     $0.300      66.5
$0.150     $0.500      66.0
$0.050     $0.400      65.9
$2.500     $15.000     65.9
$0.875     $7.000      65.2
$0.030     $0.140      65.1
$0.100     $0.300      64.6
$0.150     $0.900      64.3
$0.100     $0.300      64.2
$0.287     $0.431      63.9
$1.200     $4.000      61.1
$0.060     $0.400      60.8
$0.098     $0.300      60.7
$0.252     $0.378      60.7
$0.104     $0.416      59.4
$5.000     $25.000     58.6
$0.039     $0.180      58.3
$5.000     $25.000     58.0
$0.030     $0.140      57.8
$0.010     $0.030      57.4
$3.000     $15.000     57.3
$0.270     $0.950      57.0
$3.000     $15.000     56.6
$0.260     $0.900      56.5
$15.000    $75.000     55.4
$0.600     $1.920      55.2
$0.500     $3.000      55.1
$3.000     $15.000     54.7
$0.400     $1.750      54.6
$1.000     $5.000      54.3
$0.270     $0.410      54.1
$0.780     $3.900      53.8
$15.000    $75.000     53.7
$3.000     $15.000     53.7
$0.400     $2.000      53.5
$5.000     $25.000     53.1
$0.200     $0.500      52.7
$0.100     $0.400      52.6
$0.980     $3.080      52.0
$0.390     $0.900      51.6
$0.149     $0.900      51.2
$0.260     $0.900      50.8
$0.080     $0.300      50.7
$0.200     $0.500      50.5
$0.300     $2.500      50.3
$0.100     $0.400      49.9
$0.200     $0.200      49.8
$2.000     $12.000     49.7
$0.150     $0.500      49.1
$0.252     $0.378      49.0
$1.000     $10.000     48.7
$2.500     $15.000     48.4
$3.000     $15.000     48.3
$0.780     $3.900      48.0
$0.875     $7.000      47.4
$0.112     $0.224      47.2
$0.100     $0.320      47.1
$3.000     $15.000     46.9
$0.195     $0.900      46.9
$0.900     $0.900      46.3
$0.060     $0.400      46.3
$0.071     $0.100      46.1
$0.250     $0.500      45.9
$0.435     $0.870      45.8
$1.250     $10.000     45.6
$0.060     $0.300      45.4
$3.000     $15.000     45.4
$0.130     $0.900      45.1
$1.250     $10.000     45.0
$5.000     $25.000     44.6
$0.140     $0.900      44.5
$0.730     $3.400      44.3
$0.780     $3.900      44.1
$0.600     $2.200      44.1
$3.000     $15.000     44.0
$0.400     $1.900      43.7
$5.000     $25.000     43.6
$0.390     $1.740      43.4
$15.000    $75.000     43.3
$0.625     $5.000      43.2
$0.270     $0.410      43.1
$0.150     $0.600      43.0
$2.000     $8.000      43.0
$5.000     $25.000     43.0
$0.800     $4.000      42.8
$1.000     $3.000      42.7
$0.200     $0.880      42.6
$3.000     $15.000     42.6
$3.000     $15.000     42.4
$1.000     $5.000      42.0
$0.100     $0.400      41.8
$0.400     $2.200      41.8
$0.400     $2.000      41.7
$0.080     $0.280      41.5
$0.550     $2.200      41.5
$0.210     $0.790      41.5
$0.120     $0.200      41.5
$0.200     $1.500      41.4
$0.400     $2.200      41.2
$0.270     $0.950      41.2
$3.000     $15.000     41.2
$0.200     $0.770      41.0
$0.060     $0.200      40.5
$0.300     $2.500      40.5
$0.220     $0.900      40.5
$0.100     $0.400      40.2
$0.100     $0.300      39.9
$0.117     $1.365      39.9
$0.400     $2.000      39.8
$0.090     $0.780      39.7
$0.500     $2.150      39.6
$0.080     $0.300      39.5
$0.100     $0.400      39.5
$0.400     $2.000      39.3
$0.104     $0.416      39.2
$0.200     $0.600      39.2
$0.280     $0.900      39.1
$0.900     $0.900      39.1
$0.550     $2.000      39.0
$0.300     $2.500      39.0
$0.900     $0.900      38.8
$0.455     $0.900      38.7
$0.200     $0.800      38.3
$0.800     $3.200      38.1
$0.050     $0.200      38.1
$0.100     $0.400      38.1
$0.400     $0.900      38.1
$0.210     $0.790      37.8
$0.040     $0.150      37.8
$0.200     $0.500      37.7
$0.130     $0.850      37.5
$0.050     $0.200      37.5
$0.510     $0.740      37.1
$0.100     $0.400      37.0
$0.360     $0.400      36.9
$0.040     $0.130      36.7
$0.390     $1.740      36.7
$0.455     $0.900      36.6
$2.500     $10.000     36.5
$0.200     $0.500      36.5
$0.080     $0.280      36.3
$2.500     $12.500     36.2
$0.150     $0.900      36.2
$0.250     $1.250      36.1
$0.110     $0.800      35.2
$2.000     $8.000      35.2
$0.200     $0.770      34.8
$1.000     $3.000      34.8
$0.070     $0.280      34.6
$2.000     $6.000      34.5
$0.340     $0.390      34.4
$0.600     $1.800      34.2
$0.060     $0.240      34.2
$0.200     $1.100      34.0
$0.200     $0.200      33.5
$0.050     $0.200      33.5
$0.075     $0.200      33.5
$0.130     $0.520      33.1
$0.090     $0.300      33.1
$0.100     $0.400      32.9
$0.100     $0.200      32.8
$1.000     $3.000      32.7
$0.070     $0.270      32.6
$0.200     $0.200      32.5
$0.050     $0.400      32.5
$0.080     $0.200      32.3
$0.050     $0.200      32.0
$0.200     $0.200      32.0
$0.080     $0.280      31.9
$0.200     $0.200      31.9
$0.080     $0.160      31.8
$2.000     $6.000      31.6
$0.080     $0.280      31.5
$0.100     $0.400      31.5
$0.130     $0.400      31.3
$2.000     $6.000      31.2
$0.150     $0.600      30.9
$0.900     $0.900      30.8
$0.060     $0.060      30.4
$0.300     $0.900      30.1
$0.100     $0.300      29.9
$0.400     $2.000      29.9
$0.035     $0.140      29.4
$0.150     $0.150      29.1
$0.130     $0.400      29.0
$0.600     $1.800      28.6
$0.020     $0.050      28.6
$0.050     $0.200      28.6
$0.040     $0.080      28.3
$0.060     $0.120      27.9
$0.300     $0.900      27.9
$0.040     $0.160      27.6
$0.700     $0.800      27.6
$0.040     $0.160      27.1
$0.200     $0.200      26.9
$0.050     $0.080      26.4
$0.010     $0.020      26.3
$0.030     $0.050      26.2
$0.200     $0.200      25.9
$0.040     $0.040      24.6
$0.060     $0.200      23.9
$0.065     $0.140      23.5
$0.290     $0.290      22.9
$0.020     $0.020      22.8

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About IFBench

This leaderboard shows all models with IFBench benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
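
One way to read the score and price columns together is to divide each score by a blended per-token price. The sketch below does this for the three top-ranked rows of the leaderboard. The 3:1 input-to-output token mix and the helper names `blended_price` and `points_per_dollar` are illustrative assumptions, not part of the leaderboard's methodology, and the model names are paired with rows by rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    ifbench: float       # IFBench score


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted average price per million tokens.

    input_share=0.75 encodes an assumed 3:1 input:output token mix.
    """
    return input_share * row.input_per_m + (1 - input_share) * row.output_per_m


def points_per_dollar(row: Row) -> float:
    """IFBench points per blended dollar (a hypothetical comparison metric)."""
    return row.ifbench / blended_price(row)


# Top three leaderboard rows, paired with the named models by rank.
rows = [
    Row("MiMo v2.5 Pro", 1.000, 3.000, 79.9),
    Row("DeepSeek V4 Flash (Non-Reasoning)", 0.112, 0.224, 79.2),
    Row("Qwen3.5 397B A17B", 0.390, 0.900, 78.8),
]

ranked = sorted(rows, key=points_per_dollar, reverse=True)
for row in ranked:
    print(f"{row.model}: ${blended_price(row):.3f}/M blended, "
          f"{points_per_dollar(row):.1f} points per dollar")
```

Under this assumed weighting the three scores sit within about one point of each other, while the blended prices range from roughly $0.14 to $1.50 per million tokens, so price dominates the ordering. A different token mix would shift the numbers, which is why `input_share` is left as a parameter.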

Frequently Asked Questions

IFBench is an instruction-following benchmark that measures an LLM's ability to adhere to nuanced writing constraints and formatting requirements.
As of May 20, 2026, MiMo v2.5 Pro leads the IFBench leaderboard with a score of 79.9. Rankings change as new models are released and evaluated.
Currently, 253 models have been evaluated on IFBench, with an average score of 47.9 and a standard deviation of 15.4.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
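
The reported average and standard deviation can be used to place an individual score relative to the rest of the field. This is a minimal sketch, assuming the published figures (47.9 and 15.4) describe all 253 scores. The function name `z_score` is illustrative, and nothing here assumes the scores are normally distributed.

```python
MEAN = 47.9     # average IFBench score reported on this page
STD_DEV = 15.4  # standard deviation reported on this page


def z_score(score: float, mean: float = MEAN, std_dev: float = STD_DEV) -> float:
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (score - mean) / std_dev


# Best and lowest scores listed on the leaderboard.
for label, score in [("best", 79.9), ("lowest", 22.8)]:
    print(f"{label}: {score} -> z = {z_score(score):+.2f}")
```

With these figures the best score (79.9) sits about 2.08 standard deviations above the average, and the lowest listed score (22.8) about 1.63 below it.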