Price Per Token

IFEval Leaderboard

Instruction Following Evaluation benchmark testing how well LLMs follow detailed formatting and content constraints.

Data from LayerLens

As of April 4, 2026, the top two entries on IFEval are both listed as Kimi K2.5 at 92.6%, followed by Gemini 2.5 Pro at 90.8%. 24 models have been evaluated on this benchmark.

Last updated: April 4, 2026

Models: 24
Best Score: 92.6
Average: 83.8
Std Dev: 6.6

Categories: Instruction Following

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About IFEval

IFEval (Instruction Following Evaluation) is a benchmark that tests how well LLMs follow detailed formatting and content constraints.
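
To make "formatting and content constraints" concrete, here is a minimal Python sketch of how such constraints can be checked programmatically. The three checks and the helper names (`check_min_words`, `check_no_commas`, `check_bullet_count`, `instruction_accuracy`) are illustrative assumptions, not the benchmark's actual implementation or scoring code.

```python
import re


def check_min_words(response: str, minimum: int) -> bool:
    """True if the response has at least `minimum` whitespace-separated words."""
    return len(response.split()) >= minimum


def check_no_commas(response: str) -> bool:
    """True if the response contains no comma characters."""
    return "," not in response


def check_bullet_count(response: str, expected: int) -> bool:
    """True if exactly `expected` lines start with a '-' or '*' bullet marker."""
    bullets = [
        line for line in response.splitlines()
        if re.match(r"^\s*[*-]\s+", line)
    ]
    return len(bullets) == expected


def instruction_accuracy(results: list[bool]) -> float:
    """Fraction of constraints that were satisfied (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


# Hypothetical model response to a prompt such as:
# "List three qualities as bullets, use no commas, at least 5 words."
response = "- fast\n- cheap\n- accurate"
results = [
    check_min_words(response, 5),
    check_no_commas(response),
    check_bullet_count(response, 3),
]
accuracy = instruction_accuracy(results)
```

Because each check is a deterministic function of the response text, a score built this way needs no human or model judge, which is what makes this style of constraint easy to evaluate at scale.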

This leaderboard shows all models with IFEval benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
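
As a rough illustration of comparing performance against cost, the sketch below ranks entries by score and then picks the entry with the most score points per dollar. The model names, scores, and prices are hypothetical placeholders rather than leaderboard data, and "points per dollar" is only one assumed way to weigh the two.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    score: float         # IFEval score, in percent
    usd_per_mtok: float  # hypothetical price per million tokens, in USD


# Placeholder rows; substitute real leaderboard and pricing data.
entries = [
    Entry("model-a", 88.0, 4.00),
    Entry("model-b", 92.0, 10.00),
    Entry("model-c", 80.0, 0.50),
]

# Leaderboard order: highest score first.
ranked = sorted(entries, key=lambda e: e.score, reverse=True)


def points_per_dollar(entry: Entry) -> float:
    """Score points obtained per USD of per-million-token price."""
    return entry.score / entry.usd_per_mtok


# The best-value entry under this (assumed) metric.
best_value = max(entries, key=points_per_dollar)

for e in ranked:
    print(f"{e.model}: {e.score:.1f} ({points_per_dollar(e):.1f} pts/USD)")
```

With these placeholder numbers the top-ranked model is not the best value, which is the kind of trade-off the pricing column is meant to surface.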

Frequently Asked Questions

As of April 4, 2026, Kimi K2.5 leads the IFEval leaderboard with a score of 92.6. Rankings change as new models are released and evaluated.
Currently, 24 models have been evaluated on IFEval, with an average score of 83.8 and a standard deviation of 6.6.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
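
One way to read the summary statistics above is to express a score as a z-score, that is, its distance from the average in units of standard deviation. The sketch below uses only the figures quoted on this page; the `z_score` helper is an illustrative name, and the calculation assumes nothing about how the scores are distributed.

```python
# Figures quoted on this page (as of April 4, 2026).
BEST_SCORE = 92.6
THIRD_SCORE = 90.8   # Gemini 2.5 Pro
MEAN = 83.8
STD_DEV = 6.6


def z_score(score: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations `score` lies above the mean."""
    return (score - mean) / std_dev


top_z = z_score(BEST_SCORE, MEAN, STD_DEV)
third_z = z_score(THIRD_SCORE, MEAN, STD_DEV)

print(f"Top score:      {top_z:.2f} standard deviations above average")
print(f"Gemini 2.5 Pro: {third_z:.2f} standard deviations above average")
```

The top score of 92.6 works out to about 1.33 standard deviations above the 83.8 average, so the leaders sit above the field but not far outside it.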