Price Per Token

Best LLM for RAG

Compare LLM performance for retrieval-augmented generation. Models are ranked by MMLU-Pro (knowledge breadth) with reasoning and instruction-following scores.


About This Leaderboard

This leaderboard ranks AI models by community votes from developers, with MMLU-Pro and other benchmark scores shown alongside, helping you find the best LLM for RAG.

Pricing is shown per million tokens from OpenRouter.
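Per-million-token pricing converts to a per-request cost with simple arithmetic. A minimal sketch, using hypothetical prices (the function name and example rates are illustrative, not actual OpenRouter figures):

```python
def cost_usd(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost of one request given per-million-token input/output prices."""
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m

# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
# A 50K-token RAG prompt with a 2K-token answer:
print(round(cost_usd(50_000, 2_000, 3.0, 15.0), 4))  # 0.18
```

Note that RAG workloads are input-heavy: retrieved documents inflate the prompt, so the input price usually dominates the bill.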

Frequently Asked Questions

Which model is best for RAG?
Based on MMLU-Pro knowledge breadth scores, the top-ranked model currently leads our RAG leaderboard. RAG performance depends on knowledge retrieval accuracy, context handling, and reasoning over retrieved documents.
What makes a model good at RAG?
Good RAG models need strong knowledge breadth (MMLU-Pro), reasoning ability (GPQA, BBH), and instruction following (IFEval). Large context windows also help when processing retrieved documents.
Does context window size matter for RAG?
Yes. Larger context windows allow more retrieved documents to be included, but model quality matters more than raw context size. A model that reasons well over 32K tokens often outperforms one that accepts 128K tokens but loses information.
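Fitting retrieved documents into a fixed context window is typically a budgeting problem: take the highest-ranked chunks until the token budget is spent. A minimal sketch, assuming chunks arrive pre-sorted by retrieval score and using a crude whitespace tokenizer as a stand-in for a real one:

```python
def pack_context(chunks, budget_tokens, count_tokens):
    """Greedily add retrieved chunks (best first) until the token budget is exhausted."""
    packed, used = [], 0
    for chunk in chunks:  # assumed sorted by descending retrieval score
        n = count_tokens(chunk)
        if used + n > budget_tokens:
            break  # next chunk would overflow the window
        packed.append(chunk)
        used += n
    return packed

# Whitespace word count as a rough token proxy (a real pipeline
# would use the model's own tokenizer).
approx = lambda text: len(text.split())

docs = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(pack_context(docs, 5, approx))  # ['alpha beta gamma', 'delta epsilon']
```

This is why raw window size is not the whole story: a model that reasons reliably over a tight, well-packed 32K budget can beat one fed a sprawling 128K prompt it only partially attends to.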
How often is the leaderboard updated?
Benchmark scores are updated when new evaluations are published. Community votes update in real time. Pricing data is refreshed daily.