Price Per Token

Best Local LLMs & Local Models (2026)

Community-voted rankings for the best local models — open-source LLMs for coding, math, reasoning, and more


[Leaderboard table omitted: columns are Provider, Model, Input $/M, Output $/M, LiveCodeBench, MATH Hard, GPQA, Vote, and Score. Model and provider names did not survive extraction; in the scraped rows, input prices range from $0.00 to $0.80 per million tokens and output prices from $0.00 to $1.80 per million tokens.]

Vote for open-source models that work well (or don't) for local use.

Pricing from OpenRouter.

Running LLMs Locally

Open-source models can run on your own hardware using tools like Ollama, llama.cpp, or vLLM. This gives you full privacy, zero API costs, and offline capability.

1. VRAM Requirements

7B models need ~6GB VRAM (4-bit quantized). 13B models need ~10GB. 70B models need ~40GB or multi-GPU.
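The rule of thumb above can be sketched as a quick estimate: quantized weight size is roughly parameters × bits ÷ 8 bytes, and runtime overhead (KV cache, activations) adds a few GB on top. The flat overhead allowance below is an assumption for illustration; real overhead grows with context length.

```python
def weight_gb(params_b: float, bits: int) -> float:
    """Approximate in-VRAM size of quantized weights in GB."""
    return params_b * bits / 8  # billions of params * bytes per param

def vram_estimate_gb(params_b: float, bits: int, overhead_gb: float = 2.0) -> float:
    """Weights plus a rough flat allowance for KV cache and activations.
    The 2 GB overhead is an assumption; it grows with context length."""
    return weight_gb(params_b, bits) + overhead_gb

# 7B at 4-bit: 3.5 GB of weights + ~2 GB overhead, in line with the ~6 GB rule of thumb
print(round(vram_estimate_gb(7, 4), 1))  # 5.5
```

The same arithmetic gives ~6.5 GB of weights for a 13B model and ~35 GB for 70B at 4-bit, matching the figures above once overhead is added.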

2. Recommended: Ollama

Install Ollama, run ollama pull qwen3:8b, then point your tool at http://localhost:11434/v1.
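Once Ollama is serving, any OpenAI-style client can talk to that endpoint. A minimal stdlib-only sketch (assumes Ollama's default port and that you have already pulled qwen3:8b; the commented lines require the server to be running):

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to /chat/completions; the body follows the OpenAI chat schema."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("qwen3:8b", "Write a haiku about GPUs.")
# With Ollama running:
#   resp = urllib.request.urlopen(req)
#   print(json.loads(resp.read())["choices"][0]["message"]["content"])
print(req.full_url)
```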

3. Alternative: llama.cpp / vLLM

For more control over quantization and batched inference, these give you a full OpenAI-compatible API with fine-grained settings.
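Because llama.cpp's server and vLLM expose the same OpenAI-style API, client code stays identical across runners; only the base URL changes. The ports below are these tools' common defaults, but treat them as assumptions and verify against your setup:

```python
# Same /chat/completions request works against any of these; only the endpoint differs.
ENDPOINTS = {
    "ollama":    "http://localhost:11434/v1",
    "llama.cpp": "http://localhost:8080/v1",  # llama-server default port (verify locally)
    "vllm":      "http://localhost:8000/v1",  # vllm serve default port (verify locally)
}
for name, url in ENDPOINTS.items():
    print(name, url + "/chat/completions")
```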

Compare all local LLM runners →

About This Leaderboard

This leaderboard ranks open-source, locally-runnable AI models by community votes from developers. Models are filtered to open-weight models with 13B parameters or fewer — small enough to run on a single consumer GPU.

Use the tabs above to see which local models are best for specific tasks like coding, math, or general reasoning. Benchmark scores from Artificial Analysis are shown alongside votes to help you compare real-world experience with synthetic performance.

Frequently Asked Questions

What is the best local model for coding?
Based on community votes and LiveCodeBench scores, Qwen 2.5 Coder and DeepSeek Coder V2 are top-rated for local coding. Performance depends on model size and available VRAM.

How much VRAM do I need to run a local LLM?
7B models need ~6GB of VRAM (4-bit quantized). 13B models need ~10GB. 70B models need 40GB+ or a multi-GPU setup. CPU-only inference works but is too slow for interactive use.

Which local models are best at math?
DeepSeek and Qwen models with reasoning capabilities tend to score highest on math benchmarks. Check the Math tab for current community rankings.

What is the easiest way to run a local model?
Ollama is the easiest option: one command downloads and serves any model behind an OpenAI-compatible API. For more control, llama.cpp and vLLM offer advanced quantization and batching options.

Can local models replace frontier API models?
For small, focused tasks, the best open-source models come close. But frontier API models still lead on complex multi-step reasoning, long-context tasks, and agentic coding. Many teams use a hybrid approach: local models for simple tasks, API models for hard ones.