GAIA — General AI Assistants benchmark testing multi-step real-world tasks.
Data from LayerLens
| Models | Best Score | Average | Std Dev |
|---|---|---|---|
| 12 | 44.8 | 25.3 | 12.8 |
| Provider | Model | Input ($/M tokens) | Output ($/M tokens) | GAIA |
|---|---|---|---|---|
| | | $0.250 | $2.000 | 44.8 |
| | | $3.000 | $15.000 | 43.9 |
| | | $3.000 | $15.000 | 43.9 |
| | | $1.250 | $10.000 | 33.3 |
| | | $0.450 | $2.150 | 27.9 |
| | | $0.400 | $2.000 | 23.3 |
| | | $0.090 | $0.450 | 20.6 |
| | | $2.500 | $10.000 | 17.6 |
| | | $0.080 | $0.240 | 12.3 |
| | | $0.080 | $0.240 | 12.3 |
| | | $0.150 | $0.750 | 11.5 |
| | | $0.150 | $0.750 | 11.5 |
Pricing from OpenRouter. Benchmarks from Artificial Analysis.
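
The summary figures at the top of the page (model count, best score, average, standard deviation) can be approximately recomputed from the GAIA column. The following is a minimal sketch, not the site's own method: the scores are copied from the table as displayed, and the use of a population standard deviation (`pstdev`) is an assumption, since the page does not say which variant it reports.

```python
from statistics import mean, pstdev

# GAIA scores copied from the table, in rank order (rounded as displayed).
gaia_scores = [44.8, 43.9, 43.9, 33.3, 27.9, 23.3,
               20.6, 17.6, 12.3, 12.3, 11.5, 11.5]

summary = {
    "models": len(gaia_scores),
    "best": max(gaia_scores),
    "average": round(mean(gaia_scores), 1),
    # Assumption: population standard deviation. The page does not state
    # whether it uses the population or the sample formula.
    "std_dev": round(pstdev(gaia_scores), 1),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the recomputed average and standard deviation land within about 0.1 of the published values. A gap of that size would be consistent with the page computing its statistics from unrounded scores, though the source does not confirm this.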

This leaderboard shows all models with GAIA benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
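
One way to compare performance against cost is to divide each GAIA score by a blended token price. The sketch below is illustrative only and rests on stated assumptions: the `Row` class, the `blended_usd` and `points_per_dollar` helpers, and the 3:1 input-to-output token weighting are all hypothetical choices, not something the leaderboard defines. Rows are identified by rank because that is the only identifier available in the table as shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    rank: int          # position in the leaderboard, 1 = highest GAIA score
    input_usd: float   # price per 1M input tokens
    output_usd: float  # price per 1M output tokens
    gaia: float        # GAIA score as displayed

    def blended_usd(self, input_share: float = 0.75) -> float:
        # Assumption: 3 input tokens for every output token (75% / 25%).
        return input_share * self.input_usd + (1 - input_share) * self.output_usd

    def points_per_dollar(self) -> float:
        # GAIA points per blended dollar; higher means better value.
        return self.gaia / self.blended_usd()


# (input price, output price, GAIA score) copied from the table, in rank order.
table = [
    (0.250, 2.000, 44.8), (3.000, 15.000, 43.9), (3.000, 15.000, 43.9),
    (1.250, 10.000, 33.3), (0.450, 2.150, 27.9), (0.400, 2.000, 23.3),
    (0.090, 0.450, 20.6), (2.500, 10.000, 17.6), (0.080, 0.240, 12.3),
    (0.080, 0.240, 12.3), (0.150, 0.750, 11.5), (0.150, 0.750, 11.5),
]
rows = [Row(rank, i, o, s) for rank, (i, o, s) in enumerate(table, start=1)]

# Sort by value for money instead of raw score.
by_value = sorted(rows, key=Row.points_per_dollar, reverse=True)

for row in by_value:
    print(f"rank {row.rank:>2}: GAIA {row.gaia:>4} at "
          f"${row.blended_usd():.4f}/M blended -> "
          f"{row.points_per_dollar():.1f} points per dollar")
```

With this weighting, the ordering by value differs from the ordering by score: a cheap mid-table entry can come out ahead of the top scorers. A different `input_share` will shift the ranking, so the ratio is best treated as a rough screening tool rather than a verdict.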
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.