Price Per Token

BIRD-CRITIC Leaderboard

BIRD-CRITIC is a multi-turn benchmark that tests SQL generation and database interaction.

Data from LayerLens

As of April 18, 2026, the top score on BIRD-CRITIC is 34.0%, shared by two Claude Opus 4.6 entries, followed by GLM 4.7 at 33.0%. In total, 31 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 31
Best Score: 34.0
Average: 27.7
Std Dev: 3.5

Category: Multi-turn

#   Input $/M   Output $/M   BIRD-CRITIC
1   $5.000      $25.000      34.0
2   $5.000      $25.000      34.0
3   $0.390      $1.750       33.0
4   $0.390      $1.750       33.0
5   $0.383      $1.720       31.3
6   $0.383      $1.720       31.3
7   $0.220      $0.900       31.0
8   $0.071      $0.100       30.9
9   $3.000      $15.000      29.7
10  $3.000      $15.000      29.7
11  $0.300      $2.500       29.7
12  $0.300      $2.500       29.7
13  $0.300      $2.500       29.7
14  $0.300      $2.500       29.7
15  $0.039      $0.100       25.8
16  $0.039      $0.100       25.8
17  $0.080      $0.160       25.8
18  $0.290      $0.950       25.7
19  $0.100      $0.400       25.3
20  $0.100      $0.400       25.3
21  $0.100      $0.400       25.3
22  $0.100      $0.400       25.3
23  $0.300      $0.900       25.3
24  $0.300      $0.900       25.3
25  $0.500      $1.500       25.0
26  $0.060      $0.400       25.0
27  $0.060      $0.400       25.0
28  $0.200      $1.100       24.7
29  $0.050      $0.200       22.7
30  $0.050      $0.200       22.7
31  $0.280      $0.900       21.7

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
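
The summary figures above (31 models, best 34.0, average 27.7, standard deviation 3.5) can be checked against the score column. Below is a minimal Python sketch. It assumes the 31 scores are transcribed in table order. The page does not say whether it reports the population or the sample standard deviation, so the sketch computes both; they round to the same value here.

```python
from statistics import mean, pstdev, stdev

# BIRD-CRITIC scores for all 31 table rows, highest to lowest.
SCORES = [
    34.0, 34.0, 33.0, 33.0, 31.3, 31.3, 31.0, 30.9,
    29.7, 29.7, 29.7, 29.7, 29.7, 29.7,
    25.8, 25.8, 25.8, 25.7,
    25.3, 25.3, 25.3, 25.3, 25.3, 25.3,
    25.0, 25.0, 25.0, 24.7, 22.7, 22.7, 21.7,
]


def summarize(scores):
    """Return the headline figures: count, best, average, and both std-dev variants."""
    return {
        "models": len(scores),
        "best": max(scores),
        "average": round(mean(scores), 1),
        # Population std dev divides by n; sample std dev divides by n - 1.
        # Which one the leaderboard uses is not stated (an assumption either way).
        "std_dev_population": round(pstdev(scores), 1),
        "std_dev_sample": round(stdev(scores), 1),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```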

About BIRD-CRITIC

This leaderboard shows all models with BIRD-CRITIC benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
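
One way to compare performance against cost is to keep only the price-performance Pareto frontier. These are the rows that no cheaper row matches or beats on score. The Python sketch below is illustrative and is not how this site ranks models. It collapses input and output prices into one blended figure using a 3:1 input-to-output token ratio; that ratio is an assumption, so adjust the hypothetical `input_share` parameter to your own workload. Rows are identified by their position in the table (1 = top).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    position: int        # 1-based row position in the leaderboard table
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    score: float         # BIRD-CRITIC score


# (input $/M, output $/M, score) for each table row, top to bottom.
_TABLE = [
    (5.000, 25.000, 34.0), (5.000, 25.000, 34.0), (0.390, 1.750, 33.0),
    (0.390, 1.750, 33.0), (0.383, 1.720, 31.3), (0.383, 1.720, 31.3),
    (0.220, 0.900, 31.0), (0.071, 0.100, 30.9), (3.000, 15.000, 29.7),
    (3.000, 15.000, 29.7), (0.300, 2.500, 29.7), (0.300, 2.500, 29.7),
    (0.300, 2.500, 29.7), (0.300, 2.500, 29.7), (0.039, 0.100, 25.8),
    (0.039, 0.100, 25.8), (0.080, 0.160, 25.8), (0.290, 0.950, 25.7),
    (0.100, 0.400, 25.3), (0.100, 0.400, 25.3), (0.100, 0.400, 25.3),
    (0.100, 0.400, 25.3), (0.300, 0.900, 25.3), (0.300, 0.900, 25.3),
    (0.500, 1.500, 25.0), (0.060, 0.400, 25.0), (0.060, 0.400, 25.0),
    (0.200, 1.100, 24.7), (0.050, 0.200, 22.7), (0.050, 0.200, 22.7),
    (0.280, 0.900, 21.7),
]
ROWS = [Row(i + 1, *t) for i, t in enumerate(_TABLE)]


def blended_price(row, input_share=0.75):
    """Weighted $/M, assuming `input_share` of all tokens are input tokens (3:1 by default)."""
    return input_share * row.input_per_m + (1 - input_share) * row.output_per_m


def pareto_frontier(rows, input_share=0.75):
    """Rows whose score is strictly higher than every row with a lower blended price."""
    # Cheapest first; at equal price, higher score first. Exact duplicates keep the first row.
    ordered = sorted(rows, key=lambda r: (blended_price(r, input_share), -r.score))
    frontier, best_score = [], float("-inf")
    for r in ordered:
        if r.score > best_score:  # beats everything cheaper than it
            frontier.append(r)
            best_score = r.score
    return frontier


if __name__ == "__main__":
    for r in pareto_frontier(ROWS):
        print(r.position, round(blended_price(r), 5), r.score)
```

With the default 3:1 weighting, the frontier is the rows at positions 15, 8, 7, 5, 3 and 1. Each one scores higher than every row with a lower blended price. A different `input_share` can change which rows qualify.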

Frequently Asked Questions

As of April 18, 2026, Claude Opus 4.6 leads the BIRD-CRITIC leaderboard with a score of 34.0. Rankings change as new models are released and evaluated.
Currently 31 models have been evaluated on BIRD-CRITIC, with an average score of 27.7 and standard deviation of 3.5.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
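
Prices in the table are quoted in USD per million tokens. Below is a minimal sketch of a daily pricing refresh. It assumes OpenRouter's public `https://openrouter.ai/api/v1/models` endpoint returns a `data` list whose entries have an `id` and a `pricing` object, with per-token USD prices as strings under `prompt` and `completion`. Verify that shape against OpenRouter's current API documentation. The helper names are hypothetical, and this is not necessarily how Price Per Token ingests its data.

```python
import json
import urllib.request

# Public model-list endpoint (response shape assumed; see the lead-in).
MODELS_URL = "https://openrouter.ai/api/v1/models"


def per_million(pricing):
    """Convert per-token USD price strings to (input $/M, output $/M), rounded to 3 decimals like the table."""
    return (
        round(float(pricing["prompt"]) * 1_000_000, 3),
        round(float(pricing["completion"]) * 1_000_000, 3),
    )


def fetch_prices(url=MODELS_URL, timeout=30):
    """Download the model list and map each model id to its (input, output) $/M prices."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return {
        model["id"]: per_million(model["pricing"])
        for model in payload["data"]
        if "pricing" in model  # skip entries without a pricing object
    }


if __name__ == "__main__":
    # Offline example: a model priced at $0.000005 / $0.000025 per token.
    print(per_million({"prompt": "0.000005", "completion": "0.000025"}))
```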