LLM Rankings & Leaderboards
Compare AI model performance across industry-standard benchmarks, alongside current pricing data.
Rankings by Task
Best LLM for Coding
Ranked by LiveCodeBench
Find the best AI models for code generation, debugging, and programming tasks.
Best LLM for Math
Ranked by MATH (Hard)
Find the best AI models for mathematical reasoning and problem-solving.
Best LLM for Writing
Ranked by IFEval
Find the best AI models for writing and instruction-following tasks.
Best LLM for RAG
Ranked by MMLU-Pro
Find the best AI models for retrieval-augmented generation pipelines.
Community Voted Leaderboards
Vote for the best AI models for your favorite coding tools.
Best LLM for Cursor
Community Rated
Community-voted rankings for Cursor, the AI-powered code editor.
Best LLM for Windsurf
Community Rated
Community-voted rankings for Windsurf by Codeium.
Best LLM for Zed
Community Rated
Community-voted rankings for Zed, the high-performance Rust editor.
Best LLM for Trae
Community Rated
Community-voted rankings for Trae, ByteDance's free AI IDE.
Best LLM for OpenClaw
Community Rated
Community-voted rankings for OpenClaw agentic workflows.
Best LLM for Cline
Community Rated
Community-voted rankings for Cline, the autonomous VS Code agent.
Best LLM for Roo Code
Community Rated
Community-voted rankings for Roo Code, the Cline-fork coding agent.
Best LLM for Claude Code
Community Rated
Community-voted rankings for Claude Code, Anthropic's CLI agent.
Best LLM for OpenCode
Community Rated
Community-voted rankings for OpenCode, the open-source terminal agent.
Best LLM for Warp
Community Rated
Community-voted rankings for Warp, the AI-powered terminal.
Best LLM for Aider
Community Rated
Community-voted rankings for Aider, the AI pair programmer.
Best LLM for GitHub Copilot
Community Rated
Community-voted rankings for GitHub Copilot.
Best LLM for Bolt
Community Rated
Community-voted rankings for Bolt.new, the AI-powered web development platform.
Best LLM for Bolt.diy
Community Rated
Community-voted rankings for Bolt.diy, the open-source AI web builder.
Best LLM for Lovable
Community Rated
Community-voted rankings for Lovable, the AI app builder.
Best LLM for Open WebUI
Community Rated
Community-voted rankings for Open WebUI, the self-hosted LLM interface.
Best LLM for LibreChat
Community Rated
Community-voted rankings for LibreChat, the multi-model chat platform.
Best LLM for TypingMind
Community Rated
Community-voted rankings for TypingMind, the bring-your-own-key (BYOK) AI chat client.
Best LLM for Jan
Community Rated
Community-voted rankings for Jan, the privacy-focused AI assistant.
Best LLM for n8n
Community Rated
Community-voted rankings for n8n, the workflow automation platform.
Best LLM for Flowise
Community Rated
Community-voted rankings for Flowise, the visual LLM app builder.
Best LLM for Dify
Community Rated
Community-voted rankings for Dify, the AI workflow canvas.
Best LLM for AnythingLLM
Community Rated
Community-voted rankings for AnythingLLM, the RAG and AI-agents platform.
Best LLM for Continue.dev
Community Rated
Community-voted rankings for Continue.dev, the open-source AI code assistant.
Best LLM for Tabnine
Community Rated
Community-voted rankings for Tabnine, the AI code-completion assistant.
Best LLM for Pieces
Community Rated
Community-voted rankings for Pieces, the AI-powered developer tool.
Best LLM for Cody
Community Rated
Community-voted rankings for Cody, Sourcegraph's AI code assistant.
Best LLM for Raycast AI
Community Rated
Community-voted rankings for Raycast AI on Mac.
Featured Benchmarks
Performance evaluations across domains
GPQA
Graduate-level multiple-choice questions written by domain experts in biology, physics, and chemistry. Questions are designed to be "Google-proof": skilled non-experts cannot reliably answer them even with unrestricted web access.
MMLU-Pro
A harder successor to MMLU (Massive Multitask Language Understanding), with reasoning-focused questions spanning 14 disciplines from STEM to law, and ten answer options per question instead of MMLU's four.
AIME 2024
Problems from the 2024 American Invitational Mathematics Examination, a qualifier for the USA Mathematical Olympiad, testing multi-step mathematical reasoning; every answer is an integer from 000 to 999 (see the scoring sketch after this list).
LiveCodeBench
Contamination-resistant coding benchmark that continuously collects new problems from LeetCode, AtCoder, and Codeforces contests, testing code generation and problem-solving abilities.
Aider
Aider's code-editing benchmark, measuring how reliably an LLM modifies existing code from natural-language instructions.
Big-Bench Hard
A suite of 23 challenging BIG-Bench tasks on which earlier language models fell short of average human-rater performance, stressing advanced multi-step reasoning.
MATH (Hard)
The hardest (Level 5) problems from the MATH dataset: competition problems requiring multi-step reasoning across algebra, geometry, number theory, counting and probability, and precalculus.
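For context on how an exact-answer benchmark like AIME 2024 is typically scored, here is a minimal Python sketch of exact-match grading against zero-padded three-digit answers. The helper names (normalize_aime_answer, exact_match_score) and the sample responses are illustrative assumptions, not the harness behind these rankings.

```python
import re

def normalize_aime_answer(text: str) -> str | None:
    """Pull the last integer out of a model response and zero-pad it
    to the three-digit 000-999 format AIME uses; return None when no
    in-range integer is found. (Illustrative extraction only.)"""
    matches = re.findall(r"\d+", text)
    if not matches:
        return None
    value = int(matches[-1])
    if not 0 <= value <= 999:
        return None
    return f"{value:03d}"

def exact_match_score(responses: list[str], gold: list[str]) -> float:
    """Fraction of responses whose normalized answer equals the gold
    answer exactly."""
    hits = sum(normalize_aime_answer(r) == g for r, g in zip(responses, gold))
    return hits / len(gold)

# Hypothetical mini-run: two responses, one correct.
print(exact_match_score(["The answer is 73.", "So N = 204."], ["073", "205"]))  # 0.5
```

Real evaluation harnesses usually apply stricter answer extraction (for example, requiring an explicitly marked final answer), but the exact-match core is the same.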
