Price Per Token

Big-Bench Hard Leaderboard

Big-Bench Hard is a challenging subset of BIG-Bench made up of tasks on which earlier language models underperformed. It tests advanced reasoning capabilities.

112 out of our 301 tracked models have had a price change in February.

About Big-Bench Hard

This leaderboard shows all models with Big-Bench Hard benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
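
As a rough illustration of comparing performance against cost, the sketch below ranks a few entries by score and then picks the one with the most benchmark points per dollar. Everything in it is an assumption made for the example: the model names, scores, and prices are placeholders rather than leaderboard data, and `ModelEntry`, `rank_by_score`, and `score_per_dollar` are hypothetical helpers, not part of any site API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    bbh_score: float       # benchmark score in percent (placeholder value)
    price_per_mtok: float  # USD per million tokens (placeholder value)


def rank_by_score(entries):
    """Return entries ordered from highest to lowest benchmark score."""
    return sorted(entries, key=lambda e: e.bbh_score, reverse=True)


def score_per_dollar(entry):
    """Benchmark points per USD per million tokens; free models rank as infinite."""
    if entry.price_per_mtok == 0:
        return float("inf")
    return entry.bbh_score / entry.price_per_mtok


# Placeholder entries, not real leaderboard rows.
entries = [
    ModelEntry("model-a", 80.0, 10.0),
    ModelEntry("model-b", 90.0, 30.0),
    ModelEntry("model-c", 60.0, 2.0),
]

ranked = rank_by_score(entries)
best_value = max(entries, key=score_per_dollar)

for e in ranked:
    print(f"{e.name}: score={e.bbh_score}, points per dollar={score_per_dollar(e):.1f}")
print("Best value:", best_value.name)
```

A single ratio like this is only one possible way to weigh score against price. It favors cheap models heavily, so a minimum acceptable score may be worth applying first.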