# Crawl4AI MCP Server

[MseeP](https://mseep.ai/app/weidwonder-crawl4ai-mcp-server) · [Smithery](https://smithery.ai/server/@weidwonder/crawl4ai-mcp-server)
This is an intelligent information retrieval server based on MCP (Model Context Protocol), providing powerful search capabilities and LLM-optimized web content understanding for AI assistant systems. Through multi-engine search and intelligent content extraction, it helps AI systems efficiently acquire and understand internet information, converting web content into the most suitable format for LLM processing.
## Features
- 🔍 Powerful multi-engine search capability, supporting DuckDuckGo and Google
- 📚 LLM-optimized web content extraction, intelligently filtering non-core content
- 🎯 Focus on information value, automatically identifying and retaining key content
- 📝 Multiple output formats, supporting citation traceability
- 🚀 High-performance asynchronous design based on FastMCP
## Installation
### Method 1: Manual installation (most scenarios)
1. Ensure your system meets the following requirements:
- Python >= 3.9
- A dedicated virtual environment is recommended
2. Clone the repository:
```bash
git clone https://github.com/weidwonder/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
```
3. Create and activate a virtual environment:
```bash
python -m venv crawl4ai_env
source crawl4ai_env/bin/activate # Linux/Mac
# Or
.\crawl4ai_env\Scripts\activate # Windows
```
4. Install dependencies:
```bash
pip install -r requirements.txt
```
5. Install the Playwright browsers:
```bash
playwright install
```
### Method 2: Install to Claude desktop client via Smithery
Install and automatically configure the Crawl4AI MCP server for the Claude desktop client via [Smithery](https://smithery.ai/server/@weidwonder/crawl4ai-mcp-server):
```bash
npx -y @smithery/cli install @weidwonder/crawl4ai-mcp-server --client claude
```
## Usage
The server provides the following tools:
### search
Powerful web search tool, supporting multiple search engines:
- DuckDuckGo search (default): No API key required, comprehensively processes AbstractText, Results, and RelatedTopics
- Google search: Requires API key configuration, providing accurate search results
- Supports using multiple engines simultaneously to obtain more comprehensive results
Parameter Description:
- `query`: Search query string
- `num_results`: Number of results to return (default 10)
- `engine`: Search engine selection
- "duckduckgo": DuckDuckGo search (default)
- "google": Google search (requires API key)
- "all": Use all available search engines simultaneously
Example:
```python
# DuckDuckGo search (default)
{
"query": "python programming",
"num_results": 5
}
# Use all available engines
{
"query": "python programming",
"num_results": 5,
"engine": "all"
}
```
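When `engine` is set to `"all"`, results from every configured engine have to be combined. The sketch below illustrates one plausible way to do that, de-duplicating by URL; it is not the server's actual implementation, and the result field names (`url`, `title`) are assumptions.

```python
# Illustrative sketch of multi-engine result merging with URL-based
# de-duplication. Not the server's actual code; field names are assumed.

def merge_results(*engine_results, num_results=10):
    """Combine per-engine result lists, keeping the first hit per unique URL."""
    seen, merged = set(), []
    for results in engine_results:
        for item in results:
            url = item.get("url")
            if url and url not in seen:
                seen.add(url)
                merged.append(item)
    return merged[:num_results]

duck = [{"url": "https://a.com", "title": "A"}, {"url": "https://b.com", "title": "B"}]
google = [{"url": "https://b.com", "title": "B (Google)"}, {"url": "https://c.com", "title": "C"}]
combined = merge_results(duck, google, num_results=3)
```

Keeping the first occurrence means the earlier-listed engine wins on duplicate URLs, which preserves the default DuckDuckGo ordering.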
### read_url
LLM-optimized web content understanding tool that extracts the core content of a page and converts it to the requested `format`:
- `markdown_with_citations`: Markdown with inline citations (default), maintaining information traceability
- `fit_markdown`: LLM-optimized concise content with redundant information removed
- `raw_markdown`: Basic HTML→Markdown conversion
- `references_markdown`: Separate references section
- `fit_html`: The filtered HTML used to generate `fit_markdown`
- `markdown`: Standard Markdown output
Example:
```python
{
"url": "https://example.com",
"format": "markdown_with_citations"
}
```
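To make the `markdown_with_citations` format concrete, here is a minimal sketch of how inline Markdown links can be rewritten as numbered citations with a separate reference list. This is illustrative only, not the library's actual conversion logic.

```python
# Illustrative sketch: rewrite inline [text](url) links as numbered
# citations and collect a reference list. Not Crawl4AI's actual code.
import re

def to_citations(markdown: str):
    """Replace inline links with numbered citations; return (body, references)."""
    refs = []

    def repl(match):
        text, url = match.group(1), match.group(2)
        if url not in refs:
            refs.append(url)
        return f"{text}[{refs.index(url) + 1}]"

    body = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", repl, markdown)
    ref_block = "\n".join(f"[{i + 1}] {u}" for i, u in enumerate(refs))
    return body, ref_block

body, refs = to_citations("See [Python](https://python.org) docs.")
```

Repeated links to the same URL reuse the same citation number, which keeps the reference list compact.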
To use Google search with the `search` tool, configure your API key in `config.json` (see [Configuration Instructions](#configuration-instructions) below), then select the engine explicitly:
```python
# Google search
{
    "query": "python programming",
    "num_results": 5,
    "engine": "google"
}
```
## LLM Content Optimization
The server employs a series of content optimization strategies specifically for LLMs:
- Intelligent Content Recognition: Automatically identifies and retains the main body of the article and key information paragraphs
- Noise Filtering: Automatically filters out navigation bars, advertisements, footers, and other content that does not help with understanding
- Information Integrity: Retains URL references to support information traceability
- Length Optimization: Uses a minimum word count threshold (10) to filter out invalid fragments
- Format Optimization: Defaults to outputting in markdown_with_citations format for easy LLM understanding and citation
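The length-optimization strategy above can be sketched as a simple word-count filter. This is a minimal illustration of the idea, not the server's actual filtering pipeline; the sample text blocks are invented.

```python
# Illustrative sketch of minimum-word-count filtering for content blocks.
# The threshold of 10 matches the value described above; sample data is invented.

def filter_fragments(blocks, min_words=10):
    """Keep only text blocks that meet the minimum word-count threshold."""
    return [b for b in blocks if len(b.split()) >= min_words]

blocks = [
    "Home About Contact Login",  # navigation noise: only 4 words, dropped
    "Crawl4AI converts raw web pages into clean Markdown that large "
    "language models can consume efficiently and cite reliably.",  # kept
]
kept = filter_fragments(blocks)
```

Short fragments like menu labels and button text rarely survive a threshold like this, while full sentences of body text do.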
## Development Notes
Project Structure:
```
crawl4ai_mcp_server/
├── src/
│ ├── index.py # Main server implementation
│ └── search.py # Search function implementation
├── config_demo.json # Configuration file example
├── pyproject.toml # Project configuration
├── requirements.txt # Dependency list
└── README.md # Project documentation
```
## Configuration Instructions
1. Copy the configuration example file:
```bash
cp config_demo.json config.json
```
2. To use Google search, configure the API key in config.json:
```json
{
"google": {
"api_key": "your-google-api-key",
"cse_id": "your-google-cse-id"
}
}
```
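A server consuming this file might load the Google credentials along these lines, falling back to DuckDuckGo-only mode when they are absent. This is a hedged sketch under the config shape shown above; the function name and behavior are illustrative, not the project's actual code.

```python
# Illustrative sketch: read Google credentials from config.json, returning
# None when unconfigured (DuckDuckGo-only mode). Not the project's actual code.
import json
from pathlib import Path

def load_google_config(path="config.json"):
    """Return (api_key, cse_id) if both are configured, else None."""
    p = Path(path)
    if not p.exists():
        return None
    cfg = json.loads(p.read_text())
    google = cfg.get("google", {})
    if "api_key" in google and "cse_id" in google:
        return google["api_key"], google["cse_id"]
    return None
```

Returning `None` instead of raising lets the server start without Google credentials and simply skip that engine.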
## Changelog
- 2025.02.08: Added search function, supporting DuckDuckGo (default) and Google search
- 2025.02.07: Refactored project structure, implemented with FastMCP, optimized dependency management
- 2025.02.07: Optimized content filtering configuration, improved token efficiency and maintained URL integrity
## License
MIT License
## Contribution
Issues and Pull Requests are welcome!
## Author
- Owner: weidwonder
- Coder: Claude Sonnet 3.5
- 100% of the code was written by Claude. Cost: $9 ($2 for code writing, $7 for debugging 😭)
- Total time: 3 hours (0.5 hours writing code, 0.5 hours preparing the environment, 2 hours debugging 😭)
## Acknowledgements
Thanks to all the developers who contributed to the project!
Special thanks to:
- The [Crawl4ai](https://github.com/crawl4ai/crawl4ai) project, for its excellent web content extraction capabilities