PDF Text Reader

by wfyi-joy

About

PDF Text Reader extracts text content from PDF documents, supporting both local files and remote URLs. It enables quick text extraction for research, summarization, and citation workflows by eliminating manual copy-paste operations.

Key features:

  • Extract text from local PDF files via Docker volume mounts
  • Fetch and parse PDFs from remote URLs
  • Auto-detection of PDF encoding formats
  • Robust error handling for corrupt, invalid, or inaccessible PDFs
  • Standardized JSON output format for easy integration with other tools

README

PDF Reader MCP Server

A Model Context Protocol (MCP) server that provides tools for reading and extracting text from PDF files, supporting both local files and URLs.

Author

Philip Van de Walker

  • Email: philip.vandewalker@gmail.com
  • GitHub: https://github.com/trafflux

Features

  • Read text content from local PDF files
  • Read text content from PDF URLs
  • Error handling for corrupt or invalid PDFs
  • Volume mounting for accessing local PDFs
  • Auto-detection of PDF encoding
  • Standardized JSON output format

Installation

    1. Clone the repository:

    git clone https://github.com/trafflux/pdf-reader-mcp.git
    cd pdf-reader-mcp
    

    2. Build the Docker image:

    docker build -t mcp/pdf-reader .
    

    Usage

    Running the Server

    To run the server with access to local PDF files:

    docker run -i --rm -v /path/to/pdfs:/pdfs mcp/pdf-reader
    

    Replace /path/to/pdfs with the actual path to your PDF files directory.
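Because the host directory is mounted at /pdfs, a file must be addressed by its path inside the container, not its host path. The following is a minimal sketch of that translation, assuming the mount shown above; the helper name `to_container_path` and the example file names are hypothetical and not part of the project.

```python
from pathlib import PurePosixPath


def to_container_path(
    host_file: str,
    host_dir: str = "/path/to/pdfs",  # placeholder host directory from the docker command
    mount_point: str = "/pdfs",       # container-side mount point from the docker command
) -> str:
    """Translate a host file path into the path visible inside the container.

    Hypothetical helper for illustration only. Raises ValueError when the
    file lies outside the mounted directory, since the container cannot
    see such files.
    """
    relative = PurePosixPath(host_file).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(mount_point) / relative)


if __name__ == "__main__":
    print(to_container_path("/path/to/pdfs/reports/q1.pdf"))
```

Files outside the mounted directory raise `ValueError` in this sketch, which mirrors the fact that only the mounted directory is visible to the server.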

    If not using local PDF files:

    docker run -i --rm mcp/pdf-reader
    

    MCP Configuration

    Add to your MCP settings configuration:

    {
      "mcpServers": {
        "pdf-reader": {
          "command": "docker",
          "args": [
            "run",
            "-i",
            "--rm",
            "-v",
            "/path/to/pdfs:/pdfs",
            "mcp/pdf-reader"
          ],
          "disabled": false,
          "autoApprove": []
        }
      }
    }
    

    Without local PDF files:

    {
      "mcpServers": {
        "pdf-reader": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "mcp/pdf-reader"],
          "disabled": false,
          "autoApprove": []
        }
      }
    }
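
The two configurations differ only in the volume-mount arguments. As an illustration, the sketch below generates either variant from one function; the function name `build_pdf_reader_config` is hypothetical, and the keys and values simply mirror the JSON shown above.

```python
import json
from typing import Optional


def build_pdf_reader_config(host_pdf_dir: Optional[str] = None) -> dict:
    """Build the MCP settings entry for the pdf-reader server.

    Hypothetical helper for illustration. When host_pdf_dir is given,
    the directory is mounted at /pdfs; otherwise no volume is mounted
    and only URL-based reading is usable.
    """
    args = ["run", "-i", "--rm"]
    if host_pdf_dir is not None:
        args += ["-v", f"{host_pdf_dir}:/pdfs"]
    args.append("mcp/pdf-reader")
    return {
        "mcpServers": {
            "pdf-reader": {
                "command": "docker",
                "args": args,
                "disabled": False,
                "autoApprove": [],
            }
        }
    }


if __name__ == "__main__":
    # Print the variant with a local PDF directory mounted.
    print(json.dumps(build_pdf_reader_config("/path/to/pdfs"), indent=2))
```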
    

    Available Tools

    1. read_local_pdf

    - Purpose: Read text content from a local PDF file
    - Input:

         {
           "path": "/pdfs/document.pdf"
         }
         
    - Output:
         {
           "success": true,
           "data": {
             "text": "Extracted content..."
           }
         }
         

    2. read_pdf_url

    - Purpose: Read text content from a PDF URL
    - Input:

         {
           "url": "https://example.com/document.pdf"
         }
         
    - Output:
         {
           "success": true,
           "data": {
             "text": "Extracted content..."
           }
         }
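
MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests, with the tool input passed as `arguments`. The sketch below builds such a request for either tool. It is an illustration only: the function name `make_tool_call` is hypothetical, and a real client also performs the protocol's initialization handshake first, which MCP client libraries normally handle.

```python
import json


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that calls an MCP tool.

    Hypothetical helper for illustration; in practice an MCP client
    library constructs and sends this message.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Inputs match the tool descriptions above.
    print(make_tool_call(1, "read_local_pdf", {"path": "/pdfs/document.pdf"}))
    print(make_tool_call(2, "read_pdf_url", {"url": "https://example.com/document.pdf"}))
```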
         

    Error Handling

    The server returns a descriptive error message for each of the following cases:

  • Invalid or corrupt PDF files
  • Missing files
  • Failed URL requests
  • Permission issues
  • Network connectivity problems

    Error responses follow the format:

    {
      "success": false,
      "error": "Detailed error message"
    }
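
Since success and error responses share the `success` flag, a caller can branch on it before reading `data.text`. The following is a minimal sketch of that check, assuming the response arrives as a JSON string in the formats shown above; `PdfReaderError` and `extract_text` are hypothetical names, not part of the server.

```python
import json


class PdfReaderError(Exception):
    """Raised when the server reports success == false (hypothetical type)."""


def extract_text(raw_response: str) -> str:
    """Return the extracted text, or raise PdfReaderError with the server's message."""
    payload = json.loads(raw_response)
    if payload.get("success"):
        return payload["data"]["text"]
    # Fall back to a generic message if the "error" field is absent.
    raise PdfReaderError(payload.get("error", "unknown error"))


if __name__ == "__main__":
    print(extract_text('{"success": true, "data": {"text": "Extracted content..."}}'))
    try:
        extract_text('{"success": false, "error": "Detailed error message"}')
    except PdfReaderError as exc:
        print(f"PDF read failed: {exc}")
```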
    

    Dependencies

  • Python 3.11+
  • PyPDF2: PDF parsing and text extraction
  • requests: HTTP client for fetching PDFs from URLs
  • MCP SDK: Model Context Protocol implementation

    Project Structure

    .
    ├── Dockerfile          # Container configuration
    ├── README.md          # This documentation
    ├── requirements.txt   # Python dependencies
    └── src/
        ├── __init__.py    # Package initialization
        └── server.py      # Main server implementation
    

    License

    Copyright 2025 Philip Van de Walker

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

    Contributing

    Contributions are welcome! Please feel free to submit a Pull Request.

    Contact

    For questions, issues, or contributions, please contact Philip Van de Walker:

  • Email: philip.vandewalker@gmail.com
  • GitHub: https://github.com/trafflux