Mineru Document Parsing Server

by demomagic

About

Mineru Document Parsing Server integrates with the Mineru API to extract structured content from documents with support for OCR, formula recognition, and table detection. Key features:

  • Single and batch document processing via URL or local file upload
  • Multi-format support including PDF, DOC/DOCX, PPT/PPTX, and image files (PNG, JPG, JPEG)
  • OCR text recognition for scanned documents and images
  • Mathematical formula recognition and table structure extraction
  • Multi-language document processing including Chinese and English
  • Real-time task status monitoring for tracking parsing progress
  • Optional export to additional formats including DOCX, HTML, and LaTeX

README

Mineru MCP Server

A Model Context Protocol (MCP) document parsing server that integrates with the Mineru API to parse documents into structured content.

Features

  • Single File Parsing: Create document parsing tasks via URL
  • Batch File Parsing: Upload and parse multiple files in one batch
  • Task Status Monitoring: Query parsing progress and results in real time
  • Multi-format Support: PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG and other formats
  • OCR Functionality: Optional OCR text recognition
  • Formula Recognition: Support mathematical formula recognition
  • Table Recognition: Support table structure recognition
  • Multi-language Support: Support Chinese, English and other languages

    Installation

    npm install
    

    Configuration

    Before use, configure the Mineru API key:

    const config = {
      mineruApiKey: "your-mineru-api-bearer-token", // Mineru API Bearer token
      mineruBaseUrl: "https://mineru.net/api/v4" // Mineru API base URL
    };
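To avoid hard-coding the token, the same config shape could be built from environment variables. This is a minimal sketch: the helper `loadConfig` and the variable names `MINERU_API_KEY` and `MINERU_BASE_URL` are assumptions for illustration, not names defined by this server.

```javascript
// Hypothetical helper: builds the config object shown above from environment
// variables. MINERU_API_KEY and MINERU_BASE_URL are assumed names.
function loadConfig(env = process.env) {
  const mineruApiKey = env.MINERU_API_KEY;
  if (!mineruApiKey) {
    throw new Error("MINERU_API_KEY is not set");
  }
  return {
    mineruApiKey, // Mineru API Bearer token
    // Fall back to the base URL given in the configuration example.
    mineruBaseUrl: env.MINERU_BASE_URL || "https://mineru.net/api/v4",
  };
}
```

Failing fast on a missing key surfaces a configuration problem at startup rather than as a token error on the first request.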
    

    Available Tools

    1. create_parsing_task

    Create a document parsing task for a single file

    Parameters:

  • url (required): File URL
  • is_ocr (optional): Enable OCR, default false
  • enable_formula (optional): Enable formula recognition, default true
  • enable_table (optional): Enable table recognition, default true
  • language (optional): Document language, default "ch"
  • page_ranges (optional): Page ranges, e.g., "1-10,15-20"
  • model_version (optional): Model version, "v1" or "v2"
  • extra_formats (optional): Additional export formats, ["docx", "html", "latex"]
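
The defaults listed above can be applied on the client side before the tool is called. The sketch below is illustrative only: `buildParsingArgs` is a hypothetical helper, and the `page_ranges` pattern is an assumption inferred from the single example "1-10,15-20" (the server may accept other forms).

```javascript
// Defaults as documented in the parameter list above.
const DEFAULTS = {
  is_ocr: false,
  enable_formula: true,
  enable_table: true,
  language: "ch",
};

// Assumed format: comma-separated pages or ranges, e.g. "1-10,15-20".
const PAGE_RANGES = /^\d+(-\d+)?(,\d+(-\d+)?)*$/;

// Hypothetical helper: validates required fields and fills in defaults.
function buildParsingArgs(args) {
  if (typeof args.url !== "string" || args.url === "") {
    throw new Error("url is required");
  }
  if (args.page_ranges !== undefined && !PAGE_RANGES.test(args.page_ranges)) {
    throw new Error(`invalid page_ranges: ${args.page_ranges}`);
  }
  // Caller-supplied values override the defaults.
  return { ...DEFAULTS, ...args };
}
```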

    2. get_task_status

    Query parsing task status

    Parameters:

  • task_id (required): Task ID

    3. create_batch_parsing_task

    Create a batch file upload parsing task (for local file uploads)

    Parameters:

  • files (required): File array, each file contains name, is_ocr, page_ranges and other properties
  • enable_formula (optional): Enable formula recognition
  • enable_table (optional): Enable table recognition
  • language (optional): Document language
  • model_version (optional): Model version
  • extra_formats (optional): Additional export formats
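
The `files` array described above could be assembled from a list of local paths. This is a sketch under assumptions: `toBatchFiles` is a hypothetical helper, and it sets only the `name` and `is_ocr` properties named in the parameter description.

```javascript
// Hypothetical helper: turns local file paths into entries for the files array.
// Only properties named in the parameter list (name, is_ocr) are set here.
function toBatchFiles(paths, { is_ocr = false } = {}) {
  return paths.map((p) => ({
    // Take the last path segment, handling both "/" and "\" separators.
    name: p.split(/[\\/]/).pop(),
    is_ocr,
  }));
}
```

The result can be passed as `files` to `create_batch_parsing_task`.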

    4. create_batch_url_parsing_task

    Create a batch URL parsing task (for remote file URLs)

    Parameters:

  • files (required): File array, each file contains url, is_ocr, page_ranges and other properties
  • enable_formula (optional): Enable formula recognition
  • enable_table (optional): Enable table recognition
  • language (optional): Document language
  • model_version (optional): Model version
  • extra_formats (optional): Additional export formats

    5. get_batch_task_results

    Query batch parsing task results (supports both URL batch parsing and local upload batch parsing)

    Parameters:

  • batch_id (required): Batch task ID (from create_batch_url_parsing_task or create_batch_parsing_task)

    Usage Examples

    Single File Parsing

    // Create parsing task
    const taskResult = await create_parsing_task({
      url: "https://example.com/document.pdf",
      is_ocr: true,
      enable_formula: true,
      language: "en"
    });

    // Query task status
    const status = await get_task_status({ task_id: taskResult.task_id });
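
Because parsing is asynchronous, a caller will usually query the status repeatedly until the task finishes. The sketch below shows one generic way to do that. `pollUntil` is a hypothetical helper, and the `state` field and `"done"` value in the usage comment are assumptions, since the response shape of `get_task_status` is not described here.

```javascript
// Generic polling helper (hypothetical). fetchStatus and isDone are supplied
// by the caller, so no response shape is assumed inside the helper itself.
async function pollUntil(fetchStatus, isDone, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isDone(status)) return status;
    // Wait before the next query, except after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Task did not finish after ${maxAttempts} attempts`);
}

// Usage (field name and value are assumptions):
// const finalStatus = await pollUntil(
//   () => get_task_status({ task_id: taskResult.task_id }),
//   (s) => s.state === "done"
// );
```

Bounding the number of attempts keeps a stuck task from blocking the caller indefinitely.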

    Batch File Upload Parsing

    // Create batch upload task
    const batchResult = await create_batch_parsing_task({
      files: [
        { name: "document1.pdf", is_ocr: true },
        { name: "document2.docx" }
      ],
      enable_formula: true,
      language: "ch"
    });

    // Query batch task results (applicable to both batch parsing methods)
    const batchStatus = await get_batch_task_results({ batch_id: batchResult.batch_id });

    Batch URL Parsing

    // Create batch URL parsing task
    const batchUrlResult = await create_batch_url_parsing_task({
      files: [
        { url: "https://example.com/doc1.pdf", is_ocr: true },
        { url: "https://example.com/doc2.docx" }
      ],
      enable_formula: true,
      language: "en"
    });

    // Query batch task results (applicable to both batch parsing methods)
    const batchUrlStatus = await get_batch_task_results({ batch_id: batchUrlResult.batch_id });

    Development

    npm run dev
    

    Important Notes

    1. A single file cannot exceed 200MB or 600 pages
    2. Each account has a daily quota of 2000 pages parsed at highest priority
    3. Due to network restrictions, foreign URLs such as GitHub and AWS may time out
    4. Batch upload file links are valid for 24 hours
    5. There is no need to set the Content-Type header when uploading files
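
The size and page limits in the first note can be checked locally before a task is submitted. This is a minimal sketch: `checkLimits` is a hypothetical helper, and it assumes "200MB" means 200 × 1024 × 1024 bytes, which the notes do not specify.

```javascript
// Limits from the notes above. The byte interpretation of "200MB" is an assumption.
const MAX_FILE_BYTES = 200 * 1024 * 1024;
const MAX_PAGES = 600;

// Hypothetical helper: returns a list of limit violations (empty if none).
function checkLimits({ sizeBytes, pageCount }) {
  const problems = [];
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push(`file size ${sizeBytes} bytes exceeds ${MAX_FILE_BYTES}`);
  }
  if (pageCount > MAX_PAGES) {
    problems.push(`page count ${pageCount} exceeds ${MAX_PAGES}`);
  }
  return problems;
}
```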

    Common Error Codes

    | Error Code | Description | Solution |
    |------------|-------------|----------|
    | A0202 | Token error | Check if the Token is correct, or replace with a new Token |
    | A0211 | Token expired | Replace with a new Token |
    | -500 | Parameter error | Ensure parameter types and Content-Type are correct |
    | -10001 | Service exception | Please try again later |
    | -10002 | Request parameter error | Check request parameter format |
    | -6000
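
A client could map these codes to actionable hints. The sketch below covers only the codes whose rows are complete in the table above. `describeError` is a hypothetical helper, and how the code appears in an actual error response is not specified here.

```javascript
// Hints taken from the complete rows of the error code table.
const ERROR_HINTS = new Map([
  ["A0202", "Token error: check that the Token is correct, or replace it with a new Token"],
  ["A0211", "Token expired: replace it with a new Token"],
  ["-500", "Parameter error: ensure parameter types and Content-Type are correct"],
  ["-10001", "Service exception: try again later"],
  ["-10002", "Request parameter error: check the request parameter format"],
]);

// Hypothetical helper: accepts the code as a string or a number.
function describeError(code) {
  return ERROR_HINTS.get(String(code)) ?? `Unknown error code: ${code}`;
}
```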
