AGIEval Chinese — reasoning tasks from Chinese standardized exams (Gaokao, civil service).
Data from LayerLens
As of April 18, 2026, the top-scoring model on AGIEval Chinese is DeepSeek V3.2 Exp at 90.1%, followed by two Qwen3 235B A22B entries tied at 89.4%. In total, 32 models have been evaluated on this benchmark.
Last updated: April 18, 2026
Models: 32
Best Score: 90.1
Average: 77.1
Std Dev: 11.4
| Provider | Model | Input ($/M tokens) | Output ($/M tokens) | AGIEval Chinese (%) |
|---|---|---|---|---|
|  | DeepSeek V3.2 Exp | $0.270 | $0.410 | 90.1 |
|  | Qwen3 235B A22B | $0.455 | $0.900 | 89.4 |
|  | Qwen3 235B A22B | $0.455 | $0.900 | 89.4 |
|  |  | $0.280 | $0.900 | 89.0 |
|  |  | $0.390 | $1.740 | 88.2 |
|  |  | $0.390 | $1.740 | 88.2 |
|  |  | $0.550 | $2.000 | 87.8 |
|  |  | $0.150 | $0.580 | 87.3 |
|  |  | $0.150 | $0.580 | 87.3 |
|  |  | $0.080 | $0.240 | 86.7 |
|  |  | $0.080 | $0.240 | 86.7 |
|  |  | $0.270 | $0.410 | 85.8 |
|  |  | $0.080 | $0.280 | 85.5 |
|  |  | $0.080 | $0.280 | 85.5 |
|  |  | $0.071 | $0.100 | 84.4 |
|  |  | $0.120 | $0.390 | 78.4 |
|  |  | $0.300 | $0.500 | 77.4 |
|  |  | $0.014 | $0.028 | 75.8 |
|  |  | $0.550 | $2.200 | 74.8 |
|  |  | $0.400 | $2.000 | 74.2 |
|  |  | $0.400 | $2.000 | 74.2 |
|  |  | $2.500 | $10.000 | 73.6 |
|  |  | $0.100 | $0.400 | 73.3 |
|  |  | $3.000 | $15.000 | 71.4 |
|  |  | $2.000 | $8.000 | 69.3 |
|  |  | $0.800 | $3.200 | 64.8 |
|  |  | $2.000 | $6.000 | 63.1 |
|  |  | $0.065 | $0.140 | 60.8 |
|  |  | $0.080 | $0.160 | 60.3 |
|  |  | $0.070 | $0.280 | 59.8 |
|  |  | $0.035 | $0.140 | 56.2 |
|  |  | $2.500 | $10.000 | 49.7 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
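The summary figures near the top of the page (32 models, best 90.1, average 77.1, standard deviation 11.4) can be checked against the score column of the table. The following is a minimal sketch, not the site's own calculation: the `summarize` helper is a hypothetical name, and the page does not say which standard-deviation formula it uses, so the population form (`statistics.pstdev`) is an assumption.

```python
import statistics

# AGIEval Chinese scores copied from the table, highest to lowest.
scores = [
    90.1, 89.4, 89.4, 89.0, 88.2, 88.2, 87.8, 87.3,
    87.3, 86.7, 86.7, 85.8, 85.5, 85.5, 84.4, 78.4,
    77.4, 75.8, 74.8, 74.2, 74.2, 73.6, 73.3, 71.4,
    69.3, 64.8, 63.1, 60.8, 60.3, 59.8, 56.2, 49.7,
]


def summarize(values):
    """Return count, best score, mean, and standard deviation (1 decimal)."""
    return {
        "models": len(values),
        "best": max(values),
        "average": round(statistics.fmean(values), 1),
        # Assumption: population standard deviation, not the sample form.
        "std_dev": round(statistics.pstdev(values), 1),
    }


summary = summarize(scores)
print(summary)
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample standard deviation instead, which is slightly larger for the same data.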
This leaderboard shows all models with AGIEval Chinese benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
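One way to compare performance against cost is to keep only the entries that no cheaper entry matches or beats on score (a Pareto frontier). The sketch below is illustrative rather than anything the leaderboard itself computes: the `Row` type, the `blended_price` helper, and the 3:1 input-to-output token mix are all assumptions, and the sample rows are a subset of the price and score values from the table.

```python
from typing import NamedTuple


class Row(NamedTuple):
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens
    score: float         # AGIEval Chinese score


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens.

    Assumption: 75% of tokens are input and 25% are output (a 3:1 mix).
    """
    return input_share * row.input_price + (1 - input_share) * row.output_price


def pareto_frontier(rows):
    """Keep rows whose score exceeds every cheaper-or-equally-priced row."""
    frontier = []
    best_score = float("-inf")
    # Cheapest first; at equal price, the higher score comes first.
    for row in sorted(rows, key=lambda r: (blended_price(r), -r.score)):
        if row.score > best_score:
            frontier.append(row)
            best_score = row.score
    return frontier


# A subset of (input price, output price, score) rows from the table.
rows = [
    Row(0.270, 0.410, 90.1),
    Row(0.455, 0.900, 89.4),
    Row(0.080, 0.240, 86.7),
    Row(0.071, 0.100, 84.4),
    Row(0.014, 0.028, 75.8),
    Row(2.500, 10.000, 73.6),
    Row(3.000, 15.000, 71.4),
    Row(2.500, 10.000, 49.7),
]

for row in pareto_frontier(rows):
    print(f"${blended_price(row):.4f} per M tokens -> {row.score}")
```

A different token mix changes the blended prices and can change which rows survive, so `input_share` is worth adjusting to match the actual workload.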
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.