About
Ref provides token-efficient access to technical documentation for AI coding agents, enabling precise search and retrieval of API references, library guides, and service documentation. Unlike bulk documentation loaders, it uses intelligent search to deliver only the relevant context, reducing token consumption by 60-95%.

Key capabilities:

- Search across public documentation or index private GitHub repositories and PDFs for custom knowledge bases
- Session-aware search that filters duplicate results and tracks query trajectories to refine results
- Smart content extraction that surfaces only the most relevant sections of documentation pages
- Optimized MCP implementation designed to minimize context rot while maximizing information retrieval accuracy
README
Ref MCP
A Model Context Protocol (MCP) server that gives your AI coding tool or agent access to documentation for APIs, services, libraries, and more. It's your one-stop shop for keeping your agent up to date on documentation in a fast and token-efficient way.
For more info, see ref.tools.
Agentic search for exactly the right context
Ref's tools are designed to match how models search while using as little context as possible, to reduce context rot. The goal is to find exactly the context your coding agent needs to be successful while using the minimum number of tokens.
Depending on the complexity of the prompt, LLM coding agents like Claude Code will typically do one or more searches and then choose a few resources to read in more depth.
For a simple query about Figma's Comment REST API, it will make a couple of calls to get exactly what it needs:
```
SEARCH 'Figma API post comment endpoint documentation' (54 tokens)
READ https://www.figma.com/developers/api#post-comments-endpoint (385 tokens)
```
For more complex situations, the LLM will refine its query as it reads results. For example:
```
SEARCH 'n8n merge node vs Code node multiple inputs best practices' (126 tokens)
READ https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.merge/#merge (4961 tokens)
READ https://docs.n8n.io/flow-logic/merging/#merge-data-from-multiple-node-executions (138 tokens)
SEARCH 'n8n Code node multiple inputs best practices when to use' (107 tokens)
READ https://docs.n8n.io/code/code-node/#usage (80 tokens)
SEARCH 'n8n Code node access multiple inputs from different nodes' (370 tokens)
SEARCH 'n8n Code node $input access multiple node inputs' (372 tokens)
READ https://docs.n8n.io/code/builtin/output-other-nodes/#output-of-other-nodes (2310 tokens)
```
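To make this concrete, here is a minimal sketch of driving that same search-then-read loop by hand with the MCP TypeScript SDK. The tool names (ref_search_documentation, ref_read_url) and argument shapes are assumptions based on this README, not a verified contract:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function main() {
  const client = new Client({ name: "example-agent", version: "1.0.0" });
  await client.connect(
    new StreamableHTTPClientTransport(
      new URL("https://api.ref.tools/mcp?apiKey=YOUR_API_KEY")
    )
  );

  // Step 1: search, as the agent would.
  const search = await client.callTool({
    name: "ref_search_documentation", // assumed tool name
    arguments: { query: "Figma API post comment endpoint documentation" },
  });
  console.log(search.content);

  // Step 2: read the most promising result in depth.
  const page = await client.callTool({
    name: "ref_read_url", // assumed tool name
    arguments: { url: "https://www.figma.com/developers/api#post-comments-endpoint" },
  });
  console.log(page.content);

  await client.close();
}

main().catch(console.error);
```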
Ref takes advantage of MCP sessions to track search trajectory and minimize context usage. There are a lot more ideas cooking, but here's what we've implemented so far.
1. Filtering search results
For repeated similar searches in a session, Ref will never return repeated results. Traditionally, you dig farther into search results by paging to the next page, but this approach allows the agent to page AND adjust the query at the same time.
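A hypothetical sketch of what that session-level dedup could look like; the real implementation isn't shown in this README, this just illustrates the idea:

```typescript
type SearchResult = { url: string; title: string; snippet: string };

class SessionFilter {
  // URLs already returned to the agent in this MCP session.
  private seen = new Set<string>();

  // Return only results this session has never seen, so a repeated or
  // refined query acts like "next page" and "new query" at the same time.
  filter(results: SearchResult[]): SearchResult[] {
    const fresh = results.filter((r) => !this.seen.has(r.url));
    for (const r of fresh) this.seen.add(r.url);
    return fresh;
  }
}
```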
2. Fetching the part of the page that matters

When reading a page of documentation, Ref will use the agent's session search history to drop less relevant sections and return the most relevant 5k tokens. This helps Ref avoid a big problem with standard fetch() web scraping: when it hits a large documentation page, you can easily end up pulling 20k+ tokens into context, most of which are irrelevant.
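Here is a hypothetical sketch of that idea: score each section of a page against the session's queries and greedily keep the best sections within a token budget. The tokenizer and scoring below are stand-ins, not Ref's actual algorithm:

```typescript
type Section = { heading: string; body: string };

// Rough heuristic: ~4 characters per token.
const approxTokens = (text: string) => Math.ceil(text.length / 4);

// Score sections by how many terms from the session's queries they contain,
// then keep the highest-scoring sections within a ~5k token budget.
function extractRelevant(
  sections: Section[],
  sessionQueries: string[],
  budget = 5000
): Section[] {
  const terms = new Set(sessionQueries.join(" ").toLowerCase().split(/\W+/));
  const scored = sections
    .map((s) => ({
      section: s,
      score: [...terms].filter((t) => t && s.body.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score);

  const kept: Section[] = [];
  let used = 0;
  for (const { section } of scored) {
    const cost = approxTokens(section.heading + section.body);
    if (used + cost > budget) continue;
    kept.push(section);
    used += cost;
  }
  return kept;
}
```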
Why does minimizing tokens from documentation context matter?

1. More context makes models dumber
It's well documented that, as of July 2025, models get dumber as you put more tokens into context. You might have heard that models are great with long context now; that's kind of true, but not the whole picture. For a quick primer on some of the research, check out this video from the team at Chroma.
2. Tokens cost $$$
Imagine you're using Claude Opus as a background agent, and it starts by pulling documentation into context: suppose it pulls in 10,000 tokens, with 4,000 being relevant and 6,000 being extra noise. At API pricing, those 6k tokens cost about $0.09 PER STEP. If one prompt ends up taking 11 steps with Opus, you've spent $1 for no reason.
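A back-of-envelope check of that math, assuming Opus input pricing of $15 per million tokens (pricing changes; verify current rates):

```typescript
// Assumed Opus input price: $15 per 1M tokens.
const PRICE_PER_TOKEN = 15 / 1_000_000; // USD
const wastedTokensPerStep = 6_000;
const steps = 11;

const wastePerStep = wastedTokensPerStep * PRICE_PER_TOKEN; // $0.09
const totalWaste = wastePerStep * steps;                    // ~$0.99

console.log(
  `~$${wastePerStep.toFixed(2)} wasted per step, ~$${totalWaste.toFixed(2)} over ${steps} steps`
);
```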
Setup
There are two options for setting up Ref as an MCP server: the streamable HTTP server (recommended) or the local stdio server (legacy).
This repo contains the legacy stdio server.
Streamable HTTP (recommended)
[Install in Cursor](https://cursor.com/install-mcp?name=Ref&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIm1jcC1yZW1vdGVAMC4xLjAtMCIsImh0dHBzOi8vYXBpLnJlZi50b29scy9tY3AiLCItLWhlYWRlcj14LXJlZi1hcGkta2V5OjxzaWduIHVwIHRvIGdldCBhbiBhcGkga2V5PiJdfQ==)
"Ref": {
"type": "http",
"url": "https://api.ref.tools/mcp?apiKey=YOUR_API_KEY"
}
stdio
[Install in Cursor](https://cursor.com/install-mcp?name=Ref&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyJyZWYtdG9vbHMtbWNwIl0sImVudiI6eyJSRUZfQVBJX0tFWSI6IjxzaWduIHVwIHRvIGdldCBhbiBhcGkga2V5PiJ9fQ==)
```
"Ref": {
  "command": "npx",
  "args": ["ref-tools-mcp@latest"],
  "env": {
    "REF_API_KEY": "YOUR_API_KEY"
  }
}
```