PDF Text Reader

by wfyi-joy

About

PDF Text Reader extracts text content from PDF documents, supporting both local files and remote URLs. It enables quick text extraction for research, summarization, and citation workflows by eliminating manual copy-paste operations.

Key features:

- Extract text from local PDF files via Docker volume mounts
- Fetch and parse PDFs from remote URLs
- Auto-detection of PDF encoding formats
- Robust error handling for corrupt, invalid, or inaccessible PDFs
- Standardized JSON output format for easy integration with other tools

README

PDF Reader MCP Server

A Model Context Protocol (MCP) server that provides tools for reading and extracting text from PDF files, supporting both local files and URLs.

Author

Philip Van de Walker

Email: philip.vandewalker@gmail.com

GitHub: https://github.com/trafflux

Features

Read text content from local PDF files

Read text content from PDF URLs

Error handling for corrupt or invalid PDFs

Volume mounting for accessing local PDFs

Auto-detection of PDF encoding

Standardized JSON output format

Installation

1. Clone the repository:

git clone https://github.com/trafflux/pdf-reader-mcp.git
cd pdf-reader-mcp

2. Build the Docker image:

docker build -t mcp/pdf-reader .

Usage

Running the Server

To run the server with access to local PDF files:

docker run -i --rm -v /path/to/pdfs:/pdfs mcp/pdf-reader

Replace /path/to/pdfs with the actual path to your PDF files directory.
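Because the host directory is mounted at /pdfs, the server sees files under their container path, not their host path. A minimal sketch of that mapping follows, assuming POSIX-style host paths; the helper name `to_container_path` is hypothetical and not part of this project.

```python
from pathlib import PurePosixPath


def to_container_path(
    host_file: str,
    host_dir: str = "/path/to/pdfs",  # the host side of the -v mount
    mount_point: str = "/pdfs",       # the container side of the -v mount
) -> str:
    """Map a host file under host_dir to the path visible inside the container.

    Raises ValueError if host_file is not located under host_dir, since such
    a file would not be reachable through the volume mount.
    """
    relative = PurePosixPath(host_file).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(mount_point) / relative)


if __name__ == "__main__":
    print(to_container_path("/path/to/pdfs/reports/q1.pdf"))
```

The resulting string is what would be passed as the `path` argument to `read_local_pdf`.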

If you only need to read PDFs from URLs, omit the volume mount:

docker run -i --rm mcp/pdf-reader

MCP Configuration

Add to your MCP settings configuration:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-v",
        "/path/to/pdfs:/pdfs",
        "mcp/pdf-reader"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}

Without local PDF files:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp/pdf-reader"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
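The two configurations above differ only in the `-v` mount arguments. If you generate MCP settings from a script, a sketch like the following could produce either variant; the function name `build_config` is hypothetical, and the keys simply mirror the JSON shown above.

```python
import json
from typing import Optional


def build_config(pdf_dir: Optional[str] = None, image: str = "mcp/pdf-reader") -> dict:
    """Build the mcpServers entry, adding a volume mount only when pdf_dir is given."""
    args = ["run", "-i", "--rm"]
    if pdf_dir is not None:
        # Mount the host directory at /pdfs inside the container.
        args += ["-v", f"{pdf_dir}:/pdfs"]
    args.append(image)
    return {
        "mcpServers": {
            "pdf-reader": {
                "command": "docker",
                "args": args,
                "disabled": False,
                "autoApprove": [],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_config("/path/to/pdfs"), indent=2))
```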

Available Tools

1. read_local_pdf

- Purpose: Read text content from a local PDF file

- Input:

     {
       "path": "/pdfs/document.pdf"
     }

- Output:

     {
       "success": true,
       "data": {
         "text": "Extracted content..."
       }
     }

2. read_pdf_url

- Purpose: Read text content from a PDF URL

- Input:

     {
       "url": "https://example.com/document.pdf"
     }

- Output:

     {
       "success": true,
       "data": {
         "text": "Extracted content..."
       }
     }
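MCP clients normally issue tool calls for you. For reference, the sketch below shows how the inputs above could be wrapped in a JSON-RPC 2.0 `tools/call` request, the method the Model Context Protocol defines for invoking tools. The helper names are hypothetical, and the newline-delimited framing assumes the stdio transport; a real session also requires the MCP initialization handshake before any tool call.

```python
import json


def tool_call_request(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def encode_message(message: dict) -> str:
    """Serialize one message as a single line (assumed stdio framing)."""
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    local = tool_call_request("read_local_pdf", {"path": "/pdfs/document.pdf"})
    remote = tool_call_request(
        "read_pdf_url", {"url": "https://example.com/document.pdf"}, request_id=2
    )
    print(encode_message(local), end="")
    print(encode_message(remote), end="")
```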

Error Handling

The server handles various error cases with clear error messages:

Invalid or corrupt PDF files

Missing files

Failed URL requests

Permission issues

Network connectivity problems

Error responses follow the format:

{
  "success": false,
  "error": "Detailed error message"
}
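Since success and error responses share the `success` flag, a caller can branch on it before touching `data`. A minimal sketch follows, assuming the JSON text of a tool result has already been obtained from your MCP client; `PdfReaderError` and `extract_text` are hypothetical names, not part of this server.

```python
import json


class PdfReaderError(Exception):
    """Raised when the server reports success: false."""


def extract_text(raw_response: str) -> str:
    """Return the extracted text, or raise PdfReaderError with the server's message."""
    payload = json.loads(raw_response)
    if payload.get("success"):
        return payload["data"]["text"]
    raise PdfReaderError(payload.get("error", "unknown error"))


if __name__ == "__main__":
    ok = '{"success": true, "data": {"text": "Extracted content..."}}'
    print(extract_text(ok))

    failed = '{"success": false, "error": "Detailed error message"}'
    try:
        extract_text(failed)
    except PdfReaderError as exc:
        print(f"PDF read failed: {exc}")
```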

Dependencies

Python 3.11+

PyPDF2: PDF parsing and text extraction

requests: HTTP client for fetching PDFs from URLs

MCP SDK: Model Context Protocol implementation

Project Structure

.
├── Dockerfile          # Container configuration
├── README.md          # This documentation
├── requirements.txt   # Python dependencies
└── src/
    ├── __init__.py    # Package initialization
    └── server.py      # Main server implementation

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Contact

For questions, issues, or contributions, please contact Philip Van de Walker:

Email: philip.vandewalker@gmail.com

GitHub: https://github.com/trafflux
