Price Per Token

mcp-crew-risk

by deeppath-ai

GitHub · 3 246 uses · Remote

About

mcp-crew-risk is an automated crawler compliance risk assessment framework that evaluates websites for crawler-friendliness across legal, ethical, and technical dimensions. It analyzes target webpages to help developers avoid legal disputes, ethical concerns, and technical obstacles when planning web scraping strategies.

Key features of mcp-crew-risk:

  • Legal risk detection, including Terms of Service restrictions, copyright declarations, and sensitive personal data (emails, phone numbers, ID numbers)
  • Social and ethical compliance checks for robots.txt rules, anti-crawling technologies (e.g., Cloudflare JS Challenge), and privacy protection measures
  • Technical risk assessment covering redirects, CAPTCHAs, JavaScript rendering obstacles, and API path exposure
  • Multi-level risk ratings (allowed, partial, blocked) with specific recommendations for crawler strategy planning

README

mcp-crew-risk

A Crawler Risk Assessor based on the Model Context Protocol (MCP). This server provides a simple API interface that allows users to perform a comprehensive crawler compliance risk assessment for a specified webpage.

Crawler Compliance Risk Assessment Framework Description

This framework aims to provide crawler developers and operators with a comprehensive automated compliance detection toolset to evaluate the crawler-friendliness and potential risks of target websites. It covers three major dimensions: legal, social ethics, and technical aspects. Through multi-level risk warnings and specific recommendations, it helps plan crawler strategies reasonably to avoid legal disputes and negative social impacts while improving technical stability and efficiency.

---

Framework Structure

1. Legal Risk

#### Detection Content

  • Whether there are explicit Terms of Service restricting crawler activities
  • Whether the website declares copyright information and whether content is copyright protected
  • Whether pages contain sensitive personal data (e.g., emails, phone numbers, ID numbers)

#### Risk Significance

Violating terms may lead to breach of contract, infringement, or criminal liability; scraping sensitive data may violate privacy regulations such as the GDPR and CCPA.

#### Detection Examples

  • Detect `<meta>` tags and key keywords in page content
  • Regex matching for emails and phone numbers
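The regex check from the examples above can be sketched as follows. This is a minimal illustration; the patterns are simplified assumptions, not the framework's actual rules.

```javascript
// Sketch of regex-based sensitive-data detection.
// EMAIL_RE and PHONE_RE are simplified, illustrative patterns;
// production detection would need stricter, locale-aware rules.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_RE = /\+?\d[\d\s-]{7,14}\d/g;

function detectSensitiveData(html) {
  return {
    emails: html.match(EMAIL_RE) || [],
    phones: html.match(PHONE_RE) || [],
  };
}
```

A page that matches either pattern would be flagged as carrying personal-data risk in the legal dimension.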
---

2. Social/Ethical Risk

#### Detection Content

  • Whether robots.txt disallows crawler access to specific paths
  • Anti-crawling technologies deployed by the site (e.g., Cloudflare JS Challenge)
  • Risks of collecting user privacy or sensitive information

#### Risk Significance

Excessive crawling may harm user experience and trust; collecting private data carries ethical risks and social responsibility implications.

#### Detection Examples

  • Accessing and parsing robots.txt
  • Detecting anti-crawling mechanisms and JS challenges
  • Sensitive information extraction warnings
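The robots.txt check above could be sketched as a minimal parser for the wildcard agent only. Real parsers (and presumably this framework) also handle `Allow`, `Crawl-delay`, and per-agent groups; this only illustrates the `Disallow` lookup step.

```javascript
// Collect Disallow rules for "User-agent: *" from a robots.txt body.
function parseDisallows(robotsTxt) {
  const disallows = [];
  let applies = false;
  for (const raw of robotsTxt.split("\n")) {
    const [key, ...rest] = raw.trim().split(":");
    const value = rest.join(":").trim();
    if (/^user-agent$/i.test(key.trim())) applies = value === "*";
    else if (applies && /^disallow$/i.test(key.trim()) && value) disallows.push(value);
  }
  return disallows;
}

// A path is treated as allowed if no wildcard Disallow rule prefixes it.
function isPathAllowed(robotsTxt, path) {
  return !parseDisallows(robotsTxt).some((prefix) => path.startsWith(prefix));
}
```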
---

3. Technical Risk

#### Detection Content

  • Whether redirects, CAPTCHAs, or JS rendering obstacles are encountered during access
  • Whether robots.txt can be successfully accessed to obtain crawler rules
  • Exposure of target API paths and possible permission or rate-limiting restrictions

#### Risk Significance

Technical risks may cause crawler failure, IP bans, or incomplete data, affecting business stability.

#### Detection Examples

  • HTTP status code and response header analysis
  • Anti-crawling technology detection
  • API path scanning
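The status-code and header analysis above might look like the following sketch. The signal names and header checks are illustrative assumptions, not the framework's actual rules.

```javascript
// Derive anti-crawling signals from an HTTP response's status code
// and (lowercased) headers. Signal names are illustrative only.
function classifyResponse(status, headers) {
  const signals = [];
  if (status === 403 || status === 429) signals.push("access-restricted");
  if (status >= 300 && status < 400) signals.push("redirect");
  if ((headers["server"] || "").toLowerCase().includes("cloudflare"))
    signals.push("cloudflare");
  if ("retry-after" in headers) signals.push("rate-limited");
  return signals;
}
```

Signals like these would feed into the technical-risk rating for the page.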
---

Rating System

  • allowed: No obvious restrictions or risks; generally safe to crawl
  • partial: Some restrictions (e.g., robots.txt disallows some paths, anti-crawling measures); requires cautious operation
  • blocked: Severe restrictions or high risk (e.g., heavy JS anti-crawling challenges, sensitive data protection); crawling is not recommended
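One plausible way to combine per-dimension findings into these three levels is a worst-finding-wins rule. This is an assumption for illustration; the framework's actual aggregation logic may differ.

```javascript
// Collapse per-dimension ratings into one overall level.
// The worst (most restrictive) finding determines the result.
const LEVELS = ["allowed", "partial", "blocked"];

function overallRating(findings) {
  // findings: array of "allowed" | "partial" | "blocked"
  const worst = Math.max(...findings.map((f) => LEVELS.indexOf(f)), 0);
  return LEVELS[worst];
}
```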
---

Recommendations

| Risk Dimension | Summary Recommendations |
| --- | --- |
| Legal Risk | Carefully read and comply with the target site's Terms of Service; avoid scraping sensitive or personal data; consult legal counsel if necessary. |
| Social/Ethical Risk | Control crawl frequency; avoid impacting server performance and user experience; be transparent about data sources and usage. |
| Technical Risk | Use appropriate crawler frameworks and strategies; support dynamic rendering and anti-crawling bypass; handle exceptions and monitor access health in real time. |

---

Implementation Process

1. Pre-crawl Assessment: Run the compliance assessment on the target site to confirm risk levels and restrictions.

2. Compliance Strategy Formulation: Adjust crawler access frequency and content scope according to the assessment results to avoid breaches or violations.

3. Crawler Execution and Monitoring: Continuously monitor technical exceptions and risk changes during crawling; reassess regularly.

4. Data Processing and Protection: Ensure crawled data complies with privacy protection requirements and perform necessary anonymization.
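The assessment-to-strategy step of this workflow could be driven by a small planner like the following hypothetical sketch. The delay values are made-up illustrations, not the framework's recommendations.

```javascript
// Hypothetical pre-crawl planner: map an assessment rating
// ("allowed" | "partial" | "blocked") to a crawl decision.
// Delay values are illustrative assumptions.
function planCrawl(rating) {
  if (rating === "blocked") return { proceed: false, reason: "high risk" };
  return { proceed: true, delayMs: rating === "partial" ? 5000 : 1000 };
}
```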

    ---

Technical Implementation Overview

  • Use Axios + node-fetch for HTTP requests, supporting timeout and redirect control.
  • Parse robots.txt and page `<meta>` tags to automatically identify crawler rules.
  • Use regex to detect privacy-sensitive information (emails, phone numbers, ID numbers, etc.).
  • Detect anti-crawling tech (e.g., Cloudflare JS Challenge) and exposed API endpoints.
  • Provide legal, social, and technical risk warnings and comprehensive suggestions via risk evaluation functions.
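As one concrete example of the `<meta>` tag parsing mentioned above, a minimal extractor for `<meta name="robots">` directives might look like this. It is regex-based for brevity; a real HTML parser would be more robust against attribute ordering and malformed markup.

```javascript
// Extract directives (e.g., "noindex", "nofollow") from a page's
// <meta name="robots"> tag. Assumes name precedes content, which a
// proper HTML parser would not need to assume.
function metaRobotsDirectives(html) {
  const m = html.match(
    /<meta[^>]*name=["']robots["'][^>]*content=["']([^"']*)["']/i
  );
  return m ? m[1].split(",").map((s) => s.trim().toLowerCase()) : [];
}
```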
---

Future Extensions

  • Integrate Puppeteer/Playwright for JavaScript-rendered page detection.
  • Automatically parse and notify on Terms of Service restrictions.