Price Per Token

Big-Bench Hard Leaderboard

A challenging subset of BIG-Bench that focuses on tasks where language models previously underperformed, designed to test advanced reasoning capabilities.

About Big-Bench Hard

This leaderboard shows all models with Big-Bench Hard benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
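The ranking and cost comparison described above can be illustrated with a minimal Python sketch. It assumes each model has a single Big-Bench Hard score and a single blended price per million tokens. The `ModelEntry` type, the `score_per_dollar` ratio, and the sample entries are hypothetical placeholders, not real leaderboard data and not necessarily how this site computes its rankings.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    name: str
    bbh_score: float         # Big-Bench Hard score (assumed to be a percentage)
    usd_per_m_tokens: float  # hypothetical blended price per million tokens, in USD


def rank_by_score(entries):
    """Return a new list of entries sorted from highest to lowest BBH score."""
    return sorted(entries, key=lambda e: e.bbh_score, reverse=True)


def score_per_dollar(entry):
    """Score points per USD per million tokens; higher means cheaper performance."""
    if entry.usd_per_m_tokens <= 0:
        raise ValueError("price must be positive")
    return entry.bbh_score / entry.usd_per_m_tokens


if __name__ == "__main__":
    # Hypothetical placeholder entries for illustration only.
    entries = [
        ModelEntry("model-a", 80.0, 10.0),
        ModelEntry("model-b", 90.0, 30.0),
        ModelEntry("model-c", 70.0, 2.0),
    ]
    for e in rank_by_score(entries):
        print(f"{e.name}: score={e.bbh_score:.1f}, score/$={score_per_dollar(e):.2f}")
```

A simple score-to-price ratio is only one way to weigh performance against cost; with these placeholder numbers the top-scoring model is also the least cost-effective, which is exactly the kind of trade-off the pricing column is meant to surface.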

Built by @aellman

© 2026 68 Ventures, LLC. All rights reserved.