# Crawl4AI MCP Server

[MseeP](https://mseep.ai/app/weidwonder-crawl4ai-mcp-server) · [Smithery](https://smithery.ai/server/@weidwonder/crawl4ai-mcp-server)
This is an intelligent information retrieval server based on MCP (Model Context Protocol), providing powerful search capabilities and LLM-optimized web content understanding for AI assistant systems. Through multi-engine search and intelligent content extraction, it helps AI systems efficiently acquire and understand internet information, converting web content into the most suitable format for LLM processing.
## Features
- 🔍 Powerful multi-engine search capability, supporting DuckDuckGo and Google
- 📚 LLM-optimized web content extraction, intelligently filtering non-core content
- 🎯 Focus on information value, automatically identifying and retaining key content
- 📝 Multiple output formats, supporting citation traceability
- 🚀 High-performance asynchronous design based on FastMCP
## Installation
### Method 1: Manual installation (most scenarios)
1. Ensure your system meets the following requirements:
- Python >= 3.9
- A dedicated virtual environment is recommended
2. Clone the repository:
```bash
git clone https://github.com/weidwonder/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
```
3. Create and activate a virtual environment:
```bash
python -m venv crawl4ai_env
source crawl4ai_env/bin/activate # Linux/Mac
# Or
.\crawl4ai_env\Scripts\activate # Windows
```
4. Install dependencies:
```bash
pip install -r requirements.txt
```
5. Install the Playwright browsers:
```bash
playwright install
```
### Method 2: Install to Claude desktop client via Smithery
Install and automatically configure the Crawl4AI MCP server for the Claude desktop client via [Smithery](https://smithery.ai/server/@weidwonder/crawl4ai-mcp-server):
```bash
npx -y @smithery/cli install @weidwonder/crawl4ai-mcp-server --client claude
```
## Usage
The server provides the following tools:
### search
Powerful web search tool, supporting multiple search engines:
- DuckDuckGo search (default): No API key required, comprehensively processes AbstractText, Results, and RelatedTopics
- Google search: Requires API key configuration, providing accurate search results
- Supports using multiple engines simultaneously to obtain more comprehensive results
Parameter Description:
- `query`: Search query string
- `num_results`: Number of results to return (default 10)
- `engine`: Search engine selection
- "duckduckgo": DuckDuckGo search (default)
- "google": Google search (requires API key)
- "all": Use all available search engines simultaneously
Example:
```python
# DuckDuckGo search (default)
{
"query": "python programming",
"num_results": 5
}
# Use all available engines
{
"query": "python programming",
"num_results": 5,
"engine": "all"
}
```
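When `engine` is set to `"all"`, results from every configured engine have to be combined. The sketch below illustrates one plausible way to do that, de-duplicating by URL; it is not the server's actual implementation, and the result field names (`url`, `title`) are assumptions.

```python
# Illustrative sketch of multi-engine result merging with URL-based
# de-duplication. Not the server's actual code; field names are assumed.

def merge_results(*engine_results, num_results=10):
    """Combine per-engine result lists, keeping the first hit per unique URL."""
    seen, merged = set(), []
    for results in engine_results:
        for item in results:
            url = item.get("url")
            if url and url not in seen:
                seen.add(url)
                merged.append(item)
    return merged[:num_results]

duck = [{"url": "https://a.com", "title": "A"}, {"url": "https://b.com", "title": "B"}]
google = [{"url": "https://b.com", "title": "B (Google)"}, {"url": "https://c.com", "title": "C"}]
combined = merge_results(duck, google, num_results=3)
```

Keeping the first occurrence means the earlier-listed engine wins on duplicate URLs, which preserves the default DuckDuckGo ordering.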
### read_url
LLM-optimized web content understanding tool that extracts the core content of a page and converts it to the requested `format`:
- `markdown_with_citations`: Markdown with inline citations (default), maintaining information traceability
- `fit_markdown`: LLM-optimized concise content with redundant information removed
- `raw_markdown`: Basic HTML→Markdown conversion
- `references_markdown`: Separate references section
- `fit_html`: The filtered HTML used to generate `fit_markdown`
- `markdown`: Standard Markdown output
Example:
```python
{
"url": "https://example.com",
"format": "markdown_with_citations"
}
```
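To make the `markdown_with_citations` format concrete, here is a minimal sketch of how inline Markdown links can be rewritten as numbered citations with a separate reference list. This is illustrative only, not the library's actual conversion logic.

```python
# Illustrative sketch: rewrite inline [text](url) links as numbered
# citations and collect a reference list. Not Crawl4AI's actual code.
import re

def to_citations(markdown: str):
    """Replace inline links with numbered citations; return (body, references)."""
    refs = []

    def repl(match):
        text, url = match.group(1), match.group(2)
        if url not in refs:
            refs.append(url)
        return f"{text}[{refs.index(url) + 1}]"

    body = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", repl, markdown)
    ref_block = "\n".join(f"[{i + 1}] {u}" for i, u in enumerate(refs))
    return body, ref_block

body, refs = to_citations("See [Python](https://python.org) docs.")
```

Repeated links to the same URL reuse the same citation number, which keeps the reference list compact.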
To use Google search with the `search` tool, configure your API key in `config.json` (see [Configuration Instructions](#configuration-instructions) below), then select the engine explicitly:
```python
# Google search
{
    "query": "python programming",
    "num_results": 5,
    "engine": "google"
}
```
## LLM Content Optimization
The server employs a series of content optimization strategies specifically for LLMs:
- Intelligent Content Recognition: Automatically identifies and retains the main body of the article and key information paragraphs
- Noise Filtering: Automatically filters out navigation bars, advertisements, footers, and other content that does not help with understanding
- Information Integrity: Retains URL references to support information traceability
- Length Optimization: Uses a minimum word count threshold (10) to filter out invalid fragments
- Format Optimization: Defaults to outputting in markdown_with_citations format for easy LLM understanding and citation
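The length-optimization strategy above can be sketched as a simple word-count filter. This is a minimal illustration of the idea, not the server's actual filtering pipeline; the sample text blocks are invented.

```python
# Illustrative sketch of minimum-word-count filtering for content blocks.
# The threshold of 10 matches the value described above; sample data is invented.

def filter_fragments(blocks, min_words=10):
    """Keep only text blocks that meet the minimum word-count threshold."""
    return [b for b in blocks if len(b.split()) >= min_words]

blocks = [
    "Home About Contact Login",  # navigation noise: only 4 words, dropped
    "Crawl4AI converts raw web pages into clean Markdown that large "
    "language models can consume efficiently and cite reliably.",  # kept
]
kept = filter_fragments(blocks)
```

Short fragments like menu labels and button text rarely survive a threshold like this, while full sentences of body text do.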
## Development Notes
Project Structure:
```
crawl4ai_mcp_server/
├── src/
│ ├── index.py # Main server implementation
│ └── search.py # Search function implementation
├── config_demo.json # Configuration file example
├── pyproject.toml # Project configuration
├── requirements.txt # Dependency list
└── README.md # Project documentation
```
## Configuration Instructions
1. Copy the configuration example file:
```bash
cp config_demo.json config.json
```
2. To use Google search, configure the API key in config.json:
```json
{
"google": {
"api_key": "your-google-api-key",
"cse_id": "your-google-cse-id"
}
}
```
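A server consuming this file might load the Google credentials along these lines, falling back to DuckDuckGo-only mode when they are absent. This is a hedged sketch under the config shape shown above; the function name and behavior are illustrative, not the project's actual code.

```python
# Illustrative sketch: read Google credentials from config.json, returning
# None when unconfigured (DuckDuckGo-only mode). Not the project's actual code.
import json
from pathlib import Path

def load_google_config(path="config.json"):
    """Return (api_key, cse_id) if both are configured, else None."""
    p = Path(path)
    if not p.exists():
        return None
    cfg = json.loads(p.read_text())
    google = cfg.get("google", {})
    if "api_key" in google and "cse_id" in google:
        return google["api_key"], google["cse_id"]
    return None
```

Returning `None` instead of raising lets the server start without Google credentials and simply skip that engine.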
## Changelog
- 2025.02.08: Added search function, supporting DuckDuckGo (default) and Google search
- 2025.02.07: Refactored project structure, implemented with FastMCP, optimized dependency management
- 2025.02.07: Optimized content filtering configuration, improved token efficiency and maintained URL integrity
## License
MIT License
## Contribution
Issues and Pull Requests are welcome!
## Author
- Owner: weidwonder
- Coder: Claude Sonnet 3.5
- 100% of the code was written by Claude. Cost: $9 ($2 for code writing, $7 for debugging 😭)
- Total time: 3 hours (0.5 hours writing code, 0.5 hours preparing the environment, 2 hours debugging 😭)
## Acknowledgements
Thanks to all the developers who contributed to the project!
Special thanks to:
- The [Crawl4ai](https://github.com/crawl4ai/crawl4ai) project, for its excellent web content extraction capabilities