About
ShadowCrawl is a self-hosted web scraping and federated search engine designed for AI agent workflows. Built in Rust, it provides a privacy-focused alternative to services like Firecrawl, Jina, and Tavily.

Key features:
- Federated web search across multiple sources with real-time results
- Advanced anti-bot bypass capabilities using Browserless stealth, Playwright rendering, and proxy rotation to handle Cloudflare, DataDome, Akamai, and PerimeterX protections
- Human-in-the-Loop (HITL) mode that launches a native Brave Browser instance for manual CAPTCHA solving and authentication when automated methods fail
- Semantic research memory that recalls and builds upon prior search sessions
- Structured data extraction using schema-driven approaches from messy or JavaScript-heavy websites
- Batch URL scraping and bounded recursive website crawling for large-scale data collection
- Clean Markdown output with noise reduction to minimize LLM token costs
- MCP-native integration supporting both stdio and HTTP transports with Cursor, Claude Desktop, and IDEs
- 100% self-hosted via Docker with no API keys or third-party data tracking
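To make "bounded recursive crawling" concrete, here is a minimal sketch of a breadth-first crawl that stops at a depth limit and a page budget. It is not ShadowCrawl's implementation: the function name `bounded_crawl`, its parameters, and the toy `SITE` link graph are all hypothetical, and link discovery is passed in as a callable so the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable


def bounded_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 50,
) -> list[str]:
    """Breadth-first crawl bounded by a depth limit and a page budget."""
    visited: list[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # pages at the depth limit are visited but not expanded
        for link in get_links(url):
            if link not in seen:  # never enqueue the same URL twice
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory link graph standing in for real pages.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/d": [],
}

pages = bounded_crawl("/", lambda u: SITE.get(u, []), max_depth=2, max_pages=10)
```

With `max_depth=2`, the crawl reaches `/c` but never expands it, so `/d` is not visited. The two limits are independent: the page budget caps total work even when the depth limit alone would allow a very wide crawl.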
README
🥷 ShadowCrawl MCP
Bypass Anything. Scrape Everything.
The 99.99% Success Rate Stealth Engine for AI Agents
The Sovereign, Self-Hosted Alternative to Firecrawl, Jina, and Tavily.
---
ShadowCrawl is not just a scraper: it's a Cyborg Intelligence Layer. While other APIs fail against Cloudflare, Akamai, and PerimeterX, ShadowCrawl leverages a unique Human-AI Collaboration model to achieve a near-perfect bypass rate on even the most guarded "Boss Level" sites (LinkedIn, Airbnb, Ticketmaster).
🚀 Why ShadowCrawl?
---
💎 The "Nuclear Option": Stealth Scrape (HITL)
Most scrapers try to "act" like a human and fail. ShadowCrawl uses a human when it matters.
stealth_scrape is our flagship tool for high-fidelity rendering. It launches a visible, native Brave Browser instance on your machine.
navigator.webdriver, etc.) before extraction.
---
💥 Shattering the "Unscrapable" (Anti-Bot Bypass)
Most scraping APIs surrender when facing enterprise-grade shields. ShadowCrawl is the Hammer that breaks through. We successfully bypass and extract data from:
The Secret? The Cyborg Approach (HITL). ShadowCrawl doesn't just "imitate" a human—it bridges your real, native Brave/Chrome session into the agent's workflow. If a human can see it, ShadowCrawl can scrape it.
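The escalation idea described here (try automation first, hand off to a human-driven browser only when a challenge appears) can be sketched as follows. This is an illustration under stated assumptions, not ShadowCrawl's code: `BLOCK_MARKERS`, `looks_blocked`, `scrape_with_escalation`, and both stub fetchers are hypothetical names, and real challenge detection would be considerably more involved than substring matching.

```python
from typing import Callable

# Hypothetical markers of a challenge page (assumption, not taken from ShadowCrawl).
BLOCK_MARKERS = ("captcha", "verify you are human", "access denied")


def looks_blocked(html: str) -> bool:
    """Return True if the page text contains any known challenge marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def scrape_with_escalation(
    url: str,
    automated_fetch: Callable[[str], str],
    human_fetch: Callable[[str], str],
) -> tuple[str, str]:
    """Try the automated path first; escalate to the human-in-the-loop path if blocked."""
    html = automated_fetch(url)
    if not looks_blocked(html):
        return "automated", html
    return "hitl", human_fetch(url)


# Stub fetchers so the sketch runs without a browser or network access.
def fake_automated(url: str) -> str:
    if "guarded" in url:
        return "<h1>Please verify you are human</h1>"
    return "<p>open content</p>"


def fake_human(url: str) -> str:
    return "<p>content after manual challenge</p>"


open_result = scrape_with_escalation("https://example.com/open", fake_automated, fake_human)
guarded_result = scrape_with_escalation("https://example.com/guarded", fake_automated, fake_human)
```

Passing the two fetchers in as callables keeps the escalation policy separate from how pages are actually retrieved, so the same logic works whether the human path is a visible browser window or something else.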
---
📂 Verified Evidence (Boss-Level Targets)
We don't just claim to bypass—we provide the receipts. All evidence below was captured using stealth_scrape (feature flag: non_robot_search) with the Safety Kill Switch enabled (2026-02-14).
| Target Site | Protection | Evidence Size | Data Extracted | Status |
|-------------|-----------|---------------|----------------|--------|
| LinkedIn | Cloudflare + Auth | 413KB | 📄 JSON · 📝 Snippet | 60+ job IDs, listings ✅ |
| Ticketmaster | Cloudflare Turnstile | 1.1MB | 📄 JSON · 📝 Snippet | Tour dates, venues ✅ |
| Airbnb | DataDome | 1.8MB | 📄 JSON · 📝 Snippet | 1000+ Tokyo listings ✅ |
| Upwork | reCAPTCHA | 300KB | 📄 JSON · 📝 Snippet | 160K+ job postings ✅ |
| Amazon | AWS Shield | 814KB | 📄 JSON · 📝 Snippet | RTX 5070 Ti results ✅ |
| nowsecure.nl | Cloudflare | 168KB | 📄 JSON · 📸 Screenshot | Manual button tested ✅ |
> 📖 Full Documentation: See proof/README.md for verification steps, protection analysis, and quality metrics.
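For readers wiring up an MCP client by hand, the sketch below builds the JSON-RPC 2.0 `tools/call` request that the Model Context Protocol uses to invoke a tool such as stealth_scrape. The `tools/call` method and the `name`/`arguments` parameter shape come from the MCP specification; the `url` argument is an assumption, since the tool's actual input schema is not shown in this README, and `tool_call_message` is a hypothetical helper name.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests each carry a unique id


def tool_call_message(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line for the stdio transport."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited: one JSON message per line.
    return json.dumps(request) + "\n"


# The "url" argument name is assumed for illustration.
line = tool_call_message("stealth_scrape", {"url": "https://example.com"})
```

A real client would complete the MCP `initialize` handshake before sending this message, and would normally call `tools/list` first to discover the tool's real argument names.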
---
🛠 Features at a Glance
| Feature | Description |
| --- | --- |
| Search & Discovery | Federated search via SearXNG. Finds what Google hides. |
| Deep Crawling | Recursive, bounded crawling to map entire subdomains. |
| Semantic Memory | (Optional) Qdrant integration for long-term research recall. |
| Proxy Master | Native rotation logic for HTTP/SOCKS5 pools. |
| Hydration Scraper | Specialized logic to extract "hidden" JSON data from React/Next.js sites. |
| Universal Janitor | Automatic removal of popups, cookie banners, and overlays. |
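As an illustration of what "hidden" JSON on a Next.js site usually means: pages built with the Next.js Pages Router embed their hydration payload in a `<script id="__NEXT_DATA__" type="application/json">` tag, which can be read without rendering the page. The sketch below shows that general technique only; `extract_next_data` and the sample HTML are hypothetical, and ShadowCrawl's own extraction logic may work differently.

```python
import json
import re
from typing import Optional

# Matches the hydration script tag emitted by the Next.js Pages Router.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ payload, or None if absent or malformed."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


# Made-up sample page for demonstration.
SAMPLE = """<html><body><div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {"title": "Demo", "price": 42}}}</script>
</body></html>"""

data = extract_next_data(SAMPLE)
```

Reading the embedded payload directly tends to give cleaner structured data than scraping the rendered DOM, because it is the same object the page's own components consume.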
---
🏆 Comparison
| Feature | Firecrawl / Jina | ShadowCrawl |
| --- | --- | --- |
| Cost | Monthly Subscription | $0 (Self-hosted) |
| Privacy | They see your data | 100% Private |
| LinkedIn/Airbnb | Often Blocked | 99.99% Success (via HITL) |
| **J