Price Per Token

BIRD-CRITIC Leaderboard

BIRD-CRITIC is a multi-turn benchmark that tests SQL generation and database interaction.

Data from LayerLens

As of June 2, 2026, the top-scoring model on BIRD-CRITIC is Claude Opus 4.6 at 34.0%, followed by GLM 4.7 at 33.0%. A total of 31 models have been evaluated on this benchmark.

Last updated: June 2, 2026

Models: 31
Best Score: 34.0
Average: 27.7
Std Dev: 3.5
Categories: Multi-turn

| Input $/M | Output $/M | BIRD-CRITIC |
|---|---|---|
| $5.000 | $25.000 | 34.0 |
| $5.000 | $25.000 | 34.0 |
| $0.400 | $1.540 | 33.0 |
| $0.400 | $1.540 | 33.0 |
| $0.400 | $1.900 | 31.3 |
| $0.400 | $1.900 | 31.3 |
| $0.220 | $0.900 | 31.0 |
| $0.071 | $0.100 | 30.9 |
| $3.000 | $15.000 | 29.7 |
| $3.000 | $15.000 | 29.7 |
| $0.300 | $2.500 | 29.7 |
| $0.300 | $2.500 | 29.7 |
| $0.300 | $2.500 | 29.7 |
| $0.300 | $2.500 | 29.7 |
| $0.039 | $0.100 | 25.8 |
| $0.039 | $0.100 | 25.8 |
| $0.080 | $0.160 | 25.8 |
| $0.290 | $0.950 | 25.7 |
| $0.100 | $0.400 | 25.3 |
| $0.100 | $0.400 | 25.3 |
| $0.100 | $0.400 | 25.3 |
| $0.100 | $0.400 | 25.3 |
| $0.300 | $0.900 | 25.3 |
| $0.300 | $0.900 | 25.3 |
| $0.500 | $1.500 | 25.0 |
| $0.060 | $0.400 | 25.0 |
| $0.060 | $0.400 | 25.0 |
| $0.200 | $1.100 | 24.7 |
| $0.050 | $0.200 | 22.7 |
| $0.050 | $0.200 | 22.7 |
| $0.900 | $0.900 | 21.7 |

Pricing from OpenRouter. Benchmarks from Artificial Analysis.


About BIRD-CRITIC


This leaderboard shows all models with BIRD-CRITIC benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
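
One way to compare performance against cost is to divide each score by a blended per-token price. The sketch below is illustrative only: the `Row` class, the `blended_price` and `points_per_dollar` helpers, and the 3:1 input-to-output token mix are assumptions made for this example, not part of the leaderboard's methodology. The four rows are copied from the table's price and score columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row: prices in USD per million tokens, plus the score."""
    input_usd_per_m: float
    output_usd_per_m: float
    score: float


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens.

    input_share=0.75 assumes three input tokens per output token;
    this ratio is an assumption, so adjust it to match your workload.
    """
    return input_share * row.input_usd_per_m + (1 - input_share) * row.output_usd_per_m


def points_per_dollar(row: Row, input_share: float = 0.75) -> float:
    """BIRD-CRITIC points per blended dollar (higher means cheaper per point)."""
    return row.score / blended_price(row, input_share)


# A few rows taken from the table (input $/M, output $/M, score).
rows = [
    Row(5.000, 25.000, 34.0),
    Row(0.400, 1.540, 33.0),
    Row(0.071, 0.100, 30.9),
    Row(0.900, 0.900, 21.7),
]

# Sort from best to worst value under the assumed token mix.
ranked = sorted(rows, key=points_per_dollar, reverse=True)

for row in ranked:
    print(f"score={row.score:5.1f}  blended=${blended_price(row):.3f}/M  "
          f"points per $={points_per_dollar(row):.1f}")
```

Under this assumed mix, the top-scoring row is also the most expensive per point, so the ordering by value differs sharply from the ordering by score. A different token mix will shift the blended prices, so treat the ranking as workload-dependent.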

Frequently Asked Questions

As of June 2, 2026, Claude Opus 4.6 leads the BIRD-CRITIC leaderboard with a score of 34.0. Rankings change as new models are released and evaluated.
Currently, 31 models have been evaluated on BIRD-CRITIC, with an average score of 27.7 and a standard deviation of 3.5.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
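
The summary figures quoted on this page (31 models, best score 34.0, average 27.7, standard deviation 3.5) can be checked against the score column of the table. The following is a minimal sketch using Python's standard `statistics` module. The page does not say whether the population or the sample standard deviation is reported, so both are computed here.

```python
import statistics

# Scores copied from the BIRD-CRITIC column of the table, in table order.
scores = (
    [34.0, 34.0, 33.0, 33.0, 31.3, 31.3, 31.0, 30.9]
    + [29.7] * 6
    + [25.8] * 3
    + [25.7]
    + [25.3] * 6
    + [25.0] * 3
    + [24.7, 22.7, 22.7, 21.7]
)

model_count = len(scores)
best_score = max(scores)
average = statistics.mean(scores)

# Population standard deviation treats the 31 models as the whole set;
# sample standard deviation treats them as a sample of a larger population.
population_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print(f"models={model_count} best={best_score:.1f} average={average:.1f}")
print(f"population sd={population_sd:.1f} sample sd={sample_sd:.1f}")
```

Both standard deviation variants round to the same one-decimal value for this data, so the published figure is consistent with either definition.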