Compare mathematical reasoning performance across LLMs using competition-level benchmarks. This leaderboard ranks models by their MATH (Hard) benchmark score, alongside pricing information, to help you find the best LLM for mathematical reasoning.
Every model listed has benchmark data available. Pricing is sourced from OpenRouter and shown per million tokens. The MATH (Hard) benchmark tests advanced mathematical reasoning with competition-level problems.
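The ranking described above can be sketched as a simple sort over model records. This is a minimal illustration, not the leaderboard's actual implementation; the model names, scores, and prices below are invented placeholders, not real benchmark results.

```python
# Hypothetical sketch of the leaderboard's ranking logic.
# All model names, scores, and prices are illustrative only.
models = [
    {"name": "model-a", "math_hard": 42.5, "price_per_mtok": 3.00},
    {"name": "model-b", "math_hard": 61.0, "price_per_mtok": 15.00},
    {"name": "model-c", "math_hard": 55.2, "price_per_mtok": 0.50},
]

# Rank by MATH (Hard) score, highest first.
ranked = sorted(models, key=lambda m: m["math_hard"], reverse=True)

for rank, m in enumerate(ranked, start=1):
    print(f"{rank}. {m['name']}: {m['math_hard']:.1f} "
          f"(${m['price_per_mtok']:.2f}/M tokens)")
```

Pricing is displayed alongside the score rather than factored into the ranking, so a cheap mid-scoring model and an expensive top scorer can be compared at a glance.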