This leaderboard compares the mathematical reasoning performance of LLMs. Models are ranked by their MATH (Hard) benchmark score and shown alongside pricing, to help you choose a model for mathematical reasoning.
Only models with benchmark data available are listed. Prices are per million tokens, as listed on OpenRouter. MATH (Hard) tests advanced mathematical reasoning using competition-level problems.
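To make the ranking and the per-million-token pricing concrete, here is a minimal sketch of how such a table could be sorted and how a price per million tokens translates into the cost of a single request. Everything in it is an illustrative assumption rather than the site's actual implementation: the `ModelEntry` type, the split into input and output prices, the USD unit, and the placeholder model names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    """One leaderboard row (hypothetical structure, placeholder values)."""
    name: str
    math_hard_score: float     # benchmark score, in percent
    input_price_per_m: float   # assumed USD per million input tokens
    output_price_per_m: float  # assumed USD per million output tokens


def rank_by_score(entries):
    """Return entries sorted from highest to lowest MATH (Hard) score."""
    return sorted(entries, key=lambda e: e.math_hard_score, reverse=True)


def request_cost(entry, input_tokens, output_tokens):
    """Convert per-million-token prices into the cost of one request."""
    total = (input_tokens * entry.input_price_per_m
             + output_tokens * entry.output_price_per_m)
    return total / 1_000_000


# Placeholder rows, not real benchmark results or real prices.
entries = [
    ModelEntry("model-a", 50.0, 1.0, 2.0),
    ModelEntry("model-b", 75.0, 4.0, 8.0),
    ModelEntry("model-c", 60.0, 0.5, 1.0),
]

ranked = rank_by_score(entries)
for position, entry in enumerate(ranked, start=1):
    cost = request_cost(entry, input_tokens=500_000, output_tokens=250_000)
    print(f"{position}. {entry.name}: score {entry.math_hard_score}, cost {cost:.2f}")
```

Sorting by score alone matches how the leaderboard is described. Weighing score against cost is a separate decision left to the reader.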
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.