Price Per Token

IFEval Leaderboard

Instruction Following Evaluation benchmark testing how well LLMs follow detailed formatting and content constraints.

Data from LayerLens

As of May 20, 2026, the top score on IFEval is 92.6%, held by Kimi K2.5 (listed in both first and second place at that score), followed by Gemini 2.5 Pro at 90.8%. A total of 26 models have been evaluated on this benchmark.

Last updated: May 20, 2026

Models: 26
Best Score: 92.6
Average: 83.8
Std Dev: 6.3

Categories: Instruction Following

Pricing from OpenRouter. Benchmarks from Artificial Analysis.


About IFEval

IFEval (Instruction Following Evaluation) is a benchmark that tests how well LLMs follow detailed formatting and content constraints.
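
To make "formatting and content constraints" concrete, here is a minimal sketch of how such constraints could be checked programmatically. The checker functions and the three example constraints (a word limit, a required keyword, a bullet count) are illustrative assumptions and not the benchmark's actual implementation.

```python
import re
from typing import Callable, List


def max_words(response: str, limit: int) -> bool:
    """True if the response has at most `limit` whitespace-separated words."""
    return len(response.split()) <= limit


def contains_keyword(response: str, keyword: str) -> bool:
    """True if the keyword appears anywhere in the response (case-insensitive)."""
    return keyword.lower() in response.lower()


def bullet_count(response: str, expected: int) -> bool:
    """True if the response has exactly `expected` lines starting with '-' or '*'."""
    bullets = [ln for ln in response.splitlines() if re.match(r"^\s*[-*] ", ln)]
    return len(bullets) == expected


def follows_all(response: str, checks: List[Callable[[str], bool]]) -> bool:
    """A response passes only if every constraint check passes."""
    return all(check(response) for check in checks)


# Hypothetical model response and constraints, for illustration only.
response = "- Tokens cost money\n- Shorter prompts cost less"
checks = [
    lambda r: max_words(r, 20),
    lambda r: contains_keyword(r, "tokens"),
    lambda r: bullet_count(r, 2),
]
print(follows_all(response, checks))
```

Because each check is a plain yes/no test on the response text, a score can be computed without a human or model judge.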

This leaderboard shows all models with IFEval benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
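
As a rough sketch of what comparing performance against cost can look like, the snippet below ranks a few entries by score and then finds the one with the most score points per dollar. The model names, scores, and prices are hypothetical placeholders rather than leaderboard data, and `points_per_dollar` is one possible value metric, not one this site is known to use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    model: str
    score: float           # IFEval score in percent
    price_per_mtok: float  # USD per million tokens (hypothetical values)


# Placeholder entries, not real leaderboard data.
entries: List[Entry] = [
    Entry("model-b", 85.0, 2.0),
    Entry("model-a", 90.0, 10.0),
    Entry("model-c", 80.0, 0.5),
]

# Rank from highest to lowest score, as the leaderboard does.
ranked = sorted(entries, key=lambda e: e.score, reverse=True)


def points_per_dollar(entry: Entry) -> float:
    """Score points per dollar of token price (an assumed value metric)."""
    return entry.score / entry.price_per_mtok


best_value = max(entries, key=points_per_dollar)

for e in ranked:
    print(f"{e.model}: {e.score:.1f}% at ${e.price_per_mtok:.2f}/Mtok")
print("Best value:", best_value.model)
```

In this toy data the top-ranked model is not the best value, which is the kind of trade-off the pricing column is meant to expose.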

Frequently Asked Questions

As of May 20, 2026, Kimi K2.5 leads the IFEval leaderboard with a score of 92.6. Rankings change as new models are released and evaluated.
Currently, 26 models have been evaluated on IFEval, with an average score of 83.8 and a standard deviation of 6.3.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
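
The average and standard deviation quoted above are summary statistics over the list of model scores. The sketch below shows how they can be computed with Python's standard library. The scores are hypothetical placeholders, and the page does not say whether it reports the population or the sample standard deviation, so both are shown.

```python
import statistics

# Hypothetical scores in percent, not the actual 26 leaderboard values.
scores = [92.0, 88.0, 84.0, 80.0, 76.0]

mean = statistics.mean(scores)
population_sd = statistics.pstdev(scores)  # divides by n
sample_sd = statistics.stdev(scores)       # divides by n - 1

print(f"mean={mean:.1f}")
print(f"population sd={population_sd:.2f}")
print(f"sample sd={sample_sd:.2f}")
```

With only a few dozen models the two variants differ slightly, so it is worth knowing which one a leaderboard uses before comparing spread across benchmarks.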