Price Per Token

IFBench Leaderboard

IFBench is an instruction-following benchmark that measures how well LLMs adhere to nuanced writing constraints and formatting requirements.

Data from Artificial Analysis

As of April 4, 2026, the top-scoring model on IFBench is Qwen3.5 397B A17B at 78.8%, followed by Gemini 3 Flash Preview at 78.0% and GPT-5.2-Codex at 77.6%. A total of 218 models have been evaluated on this benchmark.

Last updated: April 4, 2026

Models: 218
Best Score: 78.8
Average: 46.2
Std Dev: 15.0

Categories: Instruction Following
Input $/M   Output $/M   IFBench
$0.390      $0.900       78.8
$0.500      $3.000       78.0
$1.750      $14.000      77.6
$0.260      $2.080       75.7
$0.300      $1.200       75.7
$0.195      $0.900       75.6
$0.125      $1.000       75.4
$10.500     $84.000      75.4
$1.750      $14.000      75.4
$1.250      $10.000      74.1
$2.500      $15.000      73.9
$1.200      $4.000       73.2
$0.625      $5.000       73.1
$0.625      $5.000       72.9
$0.163      $0.900       72.5
$0.255      $1.000       72.3
$0.720      $2.300       72.3
$0.090      $0.290       71.8
$0.118      $0.950       71.6
$2.000      $8.000       71.4
$0.050      $0.200       71.1
$0.780      $3.900       70.7
$0.625      $5.000       70.6
$2.000      $12.000      70.4
$15.000     $60.000      70.3
$0.383      $1.720       70.2
$1.250      $10.000      70.0
$0.270      $0.950       69.9
$0.250      $0.750       69.8
$0.039      $0.100       69.0
$0.550      $2.200       68.7
$0.207      $0.828       68.4
$0.550      $2.200       68.1
$0.250      $2.000       67.9
$0.390      $1.750       67.9
$0.050      $0.400       67.6
$1.100      $4.400       67.1
$0.040      $0.150       66.7
$0.625      $5.000       66.6
$0.150      $0.500       66.0
$0.050      $0.400       65.9
$0.030      $0.100       65.1
$0.100      $0.300       64.6
$0.090      $0.290       64.2
$0.400      $1.200       63.9
$0.250      $2.000       61.2
$0.060      $0.400       60.8
$0.260      $0.380       60.7
$0.039      $0.100       58.3
$5.000      $25.000      58.0
$3.000      $15.000      57.3
$0.210      $0.790       57.0
$3.000      $15.000      56.6
$15.000     $75.000      55.4
$0.720      $2.300       55.2
$0.500      $3.000       55.1
$3.000      $15.000      54.7
$0.390      $1.750       54.6
$1.000      $5.000       54.3
$0.270      $0.410       54.1
$0.780      $3.900       53.8
$15.000     $75.000      53.7
$3.000      $15.000      53.7
$0.400      $2.000       53.5
$5.000      $25.000      53.1
$0.200      $0.500       52.7
$0.100      $0.400       52.6
$0.390      $0.900       51.6
$0.149      $0.900       51.2
$0.260      $2.080       50.8
$0.200      $0.500       50.5
$0.300      $2.500       50.3
$0.100      $0.400       49.9
$0.200      $0.200       49.8
$0.150      $0.500       49.1
$0.260      $0.380       49.0
$1.000      $10.000      48.7
$2.500      $15.000      48.4
$3.000      $15.000      48.3
$0.780      $3.900       48.0
$1.750      $14.000      47.4
$0.100      $0.320       47.1
$3.000      $15.000      46.9
$0.195      $0.900       46.9
$0.060      $0.400       46.3
$0.071      $0.100       46.1
$0.250      $0.500       45.9
$0.625      $5.000       45.6
$3.000      $15.000      45.4
$5.000      $25.000      44.6
$0.163      $0.900       44.5
$0.600      $2.200       44.1
$0.780      $3.900       44.1
$3.000      $15.000      44.0
$0.383      $1.720       43.7
$0.390      $1.700       43.4
$15.000     $75.000      43.3
$0.625      $5.000       43.2
$0.270      $0.410       43.1
$0.150      $0.600       43.0
$2.000      $8.000       43.0
$5.000      $25.000      43.0
$0.800      $4.000       42.8
$0.200      $0.880       42.7
$3.000      $15.000      42.7
$3.000      $15.000      42.4
$1.000      $5.000       42.0
$0.100      $0.400       41.8
$0.400      $2.000       41.7
$0.080      $0.280       41.5
$0.550      $2.200       41.5
$0.150      $0.750       41.5
$0.120      $0.200       41.5
$0.200      $1.500       41.4
$0.400      $1.760       41.2
$0.210      $0.790       41.2
$3.000      $15.000      41.2
$0.200      $0.770       41.0
$0.060      $0.200       40.5
$0.220      $0.900       40.5
$0.300      $2.500       40.5
$0.100      $0.400       40.2
$0.090      $0.290       39.9
$0.400      $2.000       39.8
$0.090      $0.780       39.7
$0.450      $2.150       39.6
$0.080      $0.300       39.5
$0.400      $2.000       39.3
$0.104      $0.416       39.2
$0.200      $0.600       39.2
$0.280      $0.900       39.1
$0.900      $0.900       39.0
$0.550      $2.000       39.0
$0.300      $2.500       39.0
$0.150      $0.580       38.8
$0.400      $0.800       38.7
$0.200      $0.800       38.3
$0.800      $3.200       38.1
$0.050      $0.200       38.1
$0.400      $0.900       38.1
$0.150      $0.750       37.8
$0.040      $0.150       37.8
$0.200      $0.500       37.7
$0.130      $0.850       37.6
$0.050      $0.200       37.5
$0.510      $0.740       37.1
$0.100      $0.400       37.0
$0.120      $0.390       36.9
$0.040      $0.130       36.7
$0.390      $1.700       36.7
$0.400      $0.800       36.6
$2.500      $10.000      36.5
$0.200      $0.500       36.5
$0.080      $0.240       36.3
$2.500      $12.500      36.2
$0.250      $1.250       36.1
$2.500      $10.000      36.0
$2.000      $8.000       35.2
$0.120      $0.750       35.2
$0.200      $0.770       34.8
$1.000      $3.000       34.8
$0.070      $0.280       34.6
$2.000      $6.000       34.5
$0.340      $0.390       34.4
$0.600      $1.800       34.2
$0.060      $0.240       34.1
$0.200      $1.100       34.0
$0.050      $0.200       33.5
$0.200      $0.200       33.5
$0.075      $0.200       33.5
$0.090      $0.300       33.1
$0.130      $0.520       33.1
$0.100      $0.400       32.9
$0.100      $0.200       32.8
$0.070      $0.270       32.7
$1.000      $3.000       32.7
$0.200      $0.200       32.5
$0.050      $0.400       32.5
$0.080      $0.200       32.3
$0.050      $0.200       32.0
$0.200      $0.200       32.0
$0.080      $0.280       31.9
$0.200      $0.200       31.9
$0.080      $0.160       31.8
$2.000      $6.000       31.6
$0.080      $0.240       31.5
$0.100      $0.400       31.5
$0.130      $0.400       31.3
$2.000      $6.000       31.2
$0.150      $0.600       31.0
$0.900      $0.900       30.7
$0.049      $0.049       30.4
$0.100      $0.200       30.4
$0.300      $0.900       30.1
$0.030      $0.110       29.9
$0.400      $2.000       29.9
$0.035      $0.140       29.4
$0.150      $0.150       29.1
$0.130      $0.400       29.0
$0.020      $0.050       28.6
$0.050      $0.200       28.6
$0.600      $1.800       28.6
$0.040      $0.080       28.3
$0.020      $0.040       27.9
$0.300      $0.900       27.9
$0.700      $0.800       27.6
$0.040      $0.160       27.6
$0.040      $0.160       27.1
$0.200      $0.200       26.9
$0.050      $0.080       26.4
$0.010      $0.020       26.3
$0.030      $0.050       26.2
$0.200      $0.200       25.9
$0.030      $0.040       24.6
$0.060      $0.200       23.9
$0.065      $0.140       23.5
$0.290      $0.290       22.9
$0.020      $0.020       22.8

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About IFBench

This leaderboard shows all models with IFBench benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
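One way to compare performance against cost is to collapse the two prices into a single blended figure and divide the score by it. The sketch below is illustrative only: the `Row` class, the helper names, and the 3:1 input-to-output token mix are assumptions made for this example, not something the leaderboard defines. The three rows are the highest-scoring entries in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    ifbench: float       # IFBench score


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted average price per million tokens.

    input_share=0.75 is an assumed 3:1 input:output mix; adjust it to
    match your own workload.
    """
    return input_share * row.input_per_m + (1.0 - input_share) * row.output_per_m


def points_per_dollar(row: Row, input_share: float = 0.75) -> float:
    """IFBench points per blended dollar (hypothetical value metric)."""
    return row.ifbench / blended_price(row, input_share)


# The three highest-scoring rows of the leaderboard.
top_rows = [
    Row(0.390, 0.900, 78.8),
    Row(0.500, 3.000, 78.0),
    Row(1.750, 14.000, 77.6),
]

# Print rows from best to worst value under the assumed token mix.
for row in sorted(top_rows, key=points_per_dollar, reverse=True):
    print(
        f"score {row.ifbench:4.1f}  "
        f"blended ${blended_price(row):.4f}/M  "
        f"{points_per_dollar(row):.1f} points per dollar"
    )
```

Under this assumed mix the top-scoring row is also the best value of the three, but the ordering can change with a different `input_share`, so treat the metric as a rough screening aid rather than a ranking.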

Frequently Asked Questions

As of April 4, 2026, Qwen3.5 397B A17B leads the IFBench leaderboard with a score of 78.8. Rankings change as new models are released and evaluated.
Currently, 218 models have been evaluated on IFBench, with an average score of 46.2 and a standard deviation of 15.0.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
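
The average and standard deviation quoted above give a quick way to place any single score relative to the field. The following is a minimal sketch that expresses a score as a z-score (its distance from the mean in standard deviations). The function name is hypothetical, and the page does not say whether the published standard deviation is a sample or population figure, so the results are approximate.

```python
def z_score(score: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations by which `score` differs from `mean`."""
    return (score - mean) / std_dev


# Summary statistics published on this page.
MEAN = 46.2
STD_DEV = 15.0

best = z_score(78.8, MEAN, STD_DEV)    # highest score on the leaderboard
lowest = z_score(22.8, MEAN, STD_DEV)  # lowest score on the leaderboard

print(f"best:   {best:+.2f} standard deviations from the mean")
print(f"lowest: {lowest:+.2f} standard deviations from the mean")
```

With the published figures, the top score sits a little over two standard deviations above the average, while the lowest score sits roughly one and a half below it.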