IFEval (Instruction Following Evaluation) is a benchmark that tests how well LLMs follow detailed formatting and content constraints.
Data from LayerLens
As of May 20, 2026, the top-scoring model on IFEval is Kimi K2.5 at 92.6%, followed by Gemini 2.5 Pro at 90.8%. A total of 26 models have been evaluated on this benchmark.
Last updated: May 20, 2026
Models: 26
Best Score: 92.6
Average: 83.8
Std Dev: 6.3
| Provider | Model | Input $/M | Output $/M | IFEval |
|---|---|---|---|---|
| | | $0.400 | $1.900 | 92.6 |
| | | $0.400 | $1.900 | 92.6 |
| | | $1.000 | $10.000 | 90.8 |
| | | $0.400 | $1.750 | 90.8 |
| | | $0.400 | $1.750 | 90.8 |
| | | $3.000 | $15.000 | 88.5 |
| | | $0.270 | $0.410 | 88.1 |
| | | $0.270 | $0.410 | 88.1 |
| | | $0.550 | $2.200 | 87.6 |
| | | $0.550 | $2.200 | 87.6 |
| | | $0.280 | $0.900 | 85.4 |
| | | $0.300 | $2.500 | 84.3 |
| | | $0.300 | $2.500 | 84.3 |
| | | $0.300 | $2.500 | 84.3 |
| | | $0.300 | $2.500 | 84.3 |
| | | $0.080 | $0.300 | 83.9 |
| | | $0.220 | $0.900 | 82.8 |
| | | $0.800 | $3.200 | 81.9 |
| | | $0.080 | $0.160 | 81.9 |
| | | $0.080 | $0.280 | 81.3 |
| | | $0.080 | $0.280 | 81.3 |
| | | $0.070 | $0.270 | 78.7 |
| | | $1.000 | $3.000 | 75.4 |
| | | $1.000 | $3.000 | 75.4 |
| | | $0.130 | $0.400 | 68.6 |
| | | $0.130 | $0.400 | 68.6 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
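The summary figures (Models, Best Score, Average, Std Dev) can be checked against the IFEval column of the table. The sketch below is one way to do that, not the site's own method. It assumes the reported Std Dev is a population standard deviation (dividing by N), which is the variant that reproduces 6.3 from these 26 scores. The `scores` list and `summary` dictionary are illustrative names.

```python
from statistics import mean, pstdev

# IFEval scores copied from the table, in table order (26 rows).
scores = [
    92.6, 92.6, 90.8, 90.8, 90.8, 88.5, 88.1, 88.1, 87.6, 87.6,
    85.4, 84.3, 84.3, 84.3, 84.3, 83.9, 82.8, 81.9, 81.9, 81.3,
    81.3, 78.7, 75.4, 75.4, 68.6, 68.6,
]

summary = {
    "models": len(scores),
    "best": max(scores),
    "average": round(mean(scores), 1),
    # Assumption: population standard deviation (divide by N, not N - 1).
    "std_dev": round(pstdev(scores), 1),
}
print(summary)
```

Swapping `pstdev` for `statistics.stdev` (the sample form) gives a slightly larger value, so the choice of estimator is worth noting when comparing against other leaderboards.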
This leaderboard shows all models with IFEval benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
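One way to compare performance against cost is to fold the two prices into a single blended figure and divide the score by it. The sketch below does that for a few rows copied from the table. The 75% input / 25% output token mix (`input_share`) is a hypothetical workload assumption, not something the leaderboard specifies, and `blended_price` and `points_per_dollar` are illustrative helper names.

```python
# (input $/M, output $/M, IFEval) for a few rows copied from the table.
rows = [
    (0.400, 1.900, 92.6),
    (1.000, 10.000, 90.8),
    (0.270, 0.410, 88.1),
    (0.080, 0.160, 81.9),
]


def blended_price(input_price, output_price, input_share=0.75):
    """Weighted $/M tokens; input_share is an assumed workload mix."""
    return input_share * input_price + (1 - input_share) * output_price


def points_per_dollar(row):
    """IFEval points per blended dollar (per million tokens)."""
    input_price, output_price, score = row
    return score / blended_price(input_price, output_price)


# Highest score-per-dollar first.
ranked = sorted(rows, key=points_per_dollar, reverse=True)
for row in ranked:
    input_price, output_price, score = row
    price = blended_price(input_price, output_price)
    print(f"{score:5.1f}  ${price:.3f}/M  {points_per_dollar(row):.0f} pts/$")
```

A ratio like this favors the cheapest models heavily, so it works best as a tiebreaker among models that already clear the score you need rather than as a ranking on its own.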
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.