GAIA — General AI Assistants benchmark testing multi-step real-world tasks.
Data from LayerLens
As of April 18, 2026, the top-scoring model on GAIA is GPT-5 Mini at 44.8%, followed by Claude 3.7 Sonnet at 43.9%. 12 models have been evaluated on this benchmark.
Last updated: April 18, 2026
Models: 12 | Best Score: 44.8 | Average: 27.5 | Std Dev: 13.6
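The summary statistics above can be reproduced from the 12 GAIA scores in the table below; a minimal sketch (the "Average" and "Std Dev" figures match the mean and population standard deviation, rounded to one decimal):

```python
from statistics import mean, pstdev

# GAIA scores for the 12 evaluated models, taken from the table below.
scores = [44.8, 44.8, 43.9, 43.9, 33.3, 27.9, 23.3, 20.6, 12.3, 12.3, 11.5, 11.5]

best = max(scores)                 # 44.8
average = round(mean(scores), 1)   # 27.5
std_dev = round(pstdev(scores), 1) # 13.6 (population std dev)
```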
| Provider | Model | Input $/M | Output $/M | GAIA |
|---|---|---|---|---|
| | | $0.125 | $1.000 | 44.8 |
| | | $0.125 | $1.000 | 44.8 |
| | | $3.000 | $15.000 | 43.9 |
| | | $3.000 | $15.000 | 43.9 |
| | | $1.000 | $10.000 | 33.3 |
| | | $0.500 | $2.150 | 27.9 |
| | | $0.400 | $2.000 | 23.3 |
| | | $0.090 | $0.400 | 20.6 |
| | | $0.080 | $0.240 | 12.3 |
| | | $0.080 | $0.240 | 12.3 |
| | | $0.150 | $0.750 | 11.5 |
| | | $0.150 | $0.750 | 11.5 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
This leaderboard shows all models with GAIA benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
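One way to compare performance against cost is score per dollar of blended token price. A sketch using the prices and scores from the table above (model names are blank there, so rows are identified by price and score only; the 3:1 input-to-output mix is an illustrative assumption, not part of the leaderboard):

```python
# (input $/M, output $/M, GAIA score) per row of the table above.
rows = [
    (0.125, 1.000, 44.8), (0.125, 1.000, 44.8),
    (3.000, 15.000, 43.9), (3.000, 15.000, 43.9),
    (1.000, 10.000, 33.3), (0.500, 2.150, 27.9),
    (0.400, 2.000, 23.3), (0.090, 0.400, 20.6),
    (0.080, 0.240, 12.3), (0.080, 0.240, 12.3),
    (0.150, 0.750, 11.5), (0.150, 0.750, 11.5),
]

def blended_price(inp, out):
    # Assumed 3:1 input:output token mix; adjust for your workload.
    return (3 * inp + out) / 4

# Rank rows by GAIA points per blended $/M, best value first.
ranked = sorted(rows, key=lambda r: r[2] / blended_price(r[0], r[1]), reverse=True)
```

Under this mix, the cheapest top scorer (44.8 at $0.125/$1.000) comes out as the best value.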
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.