Content

<h1 align="center">Crawl4AI RAG MCP Server</h1> <p align="center"> <em>Web Crawling and RAG Capabilities for AI Agents and AI Coding Assistants</em> </p> A powerful implementation of the [Model Context Protocol (MCP)](https://modelcontextprotocol.io) integrated with [Crawl4AI](https://crawl4ai.com) and [Supabase](https://supabase.com/) for providing AI agents and AI coding assistants with advanced web crawling and RAG capabilities. With this MCP server, you can <b>scrape anything</b> and then <b>use that knowledge anywhere</b> for RAG. ## Overview This MCP server provides tools that enable AI agents to crawl websites, store content in a vector database (Supabase), and perform RAG over the crawled content. ## Features - **Smart URL Detection**: Automatically detects and handles different URL types (regular webpages, sitemaps, text files) - **Recursive Crawling**: Follows internal links to discover content - **Parallel Processing**: Efficiently crawls multiple pages simultaneously - **Content Chunking**: Intelligently splits content by headers and size for better processing - **Vector Search**: Performs RAG over crawled content, optionally filtering by data source for precision - **Source Retrieval**: Retrieve sources available for filtering to guide the RAG process ## Tools The server provides four essential web crawling and search tools: 1. **`crawl_single_page`**: Quickly crawl a single web page and store its content in the vector database 2. **`smart_crawl_url`**: Intelligently crawl a full website based on the type of URL provided (sitemap, llms-full.txt, or a regular webpage that needs to be crawled recursively) 3. **`get_available_sources`**: Get a list of all available sources (domains) in the database 4. **`perform_rag_query`**: Search for relevant content using semantic search with optional source filtering ## Prerequisites - [Docker/Docker Desktop](https://www.docker.com/products/docker-desktop/) if running the MCP server as a container (recommended) - [Python 3.12+](https://www.python.org/downloads/) if running the MCP server directly through uv - [Supabase](https://supabase.com/) (database for RAG) - [OpenAI API key](https://platform.openai.com/api-keys) (for generating embeddings) ## Installation ### Using Docker (Recommended) 1. Clone this repository: ```bash git clone https://github.com/coleam00/mcp-crawl4ai-rag.git cd mcp-crawl4ai-rag ``` 2. Build the Docker image: ```bash docker build -t mcp/crawl4ai-rag --build-arg PORT=8051 . ``` 3. Create a `.env` file based on the configuration section below ### Using uv directly (no Docker) 1. Clone this repository: ```bash git clone https://github.com/coleam00/mcp-crawl4ai-rag.git cd mcp-crawl4ai-rag ``` 2. Install uv if you don't have it: ```bash pip install uv ``` 3. Create and activate a virtual environment: ```bash uv venv .venv\Scripts\activate # on Mac/Linux: source .venv/bin/activate ``` 4. Install dependencies: ```bash uv pip install -e . crawl4ai-setup ``` 5. Create a `.env` file based on the configuration section below ### Running Supabase Locally with Docker (optional) To run Supabase locally using Docker, follow these steps: 1. **Get the Supabase code:** ```bash git clone --depth 1 https://github.com/supabase/supabase ``` 2. **Create your new Supabase project directory:** ```bash mkdir supabase-project ``` 3. **Copy the compose files to your project:** ```bash cp -rf supabase/docker/* supabase-project ``` 4. **Copy the fake environment variables:** ```bash cp supabase/docker/.env.example supabase-project/.env ``` 5. **Switch to your project directory:** ```bash cd supabase-project ``` 6. **Pull the latest images:** ```bash docker compose pull ``` 7. **Start the services (in detached mode):** ```bash docker compose up -d ``` After starting Supabase locally, ensure you configure your `.env` file in this project with the correct `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` pointing to your local Supabase instance. Typically, for a local setup, these would be: ## Database Setup Before running the server, you need to set up the database with the pgvector extension: 1. Go to the SQL Editor in your Supabase dashboard (create a new project first if necessary) 2. Create a new query and paste the contents of `crawled_pages.sql` 3. Run the query to create the necessary tables and functions ## Configuration Create a `.env` file in the project root with the following variables: ``` # MCP Server Configuration HOST=0.0.0.0 PORT=8051 TRANSPORT=sse # OpenAI API Configuration OPENAI_API_KEY=your_openai_api_key # Supabase Configuration SUPABASE_URL=your_supabase_project_url SUPABASE_SERVICE_KEY=your_supabase_service_key #local supbase config SUPABASE_URL=your_local_supbase_url SUPABASE_SERVICE_KEY=yuut_local_supbase_service_key ``` ## Running the Server ### Using Docker ```bash docker run --env-file .env -p 8051:8051 mcp/crawl4ai-rag ``` ### Using Python ```bash uv run src/crawl4ai_mcp.py ``` The server will start and listen on the configured host and port. ## Integration with MCP Clients ### SSE Configuration Once you have the server running with SSE transport, you can connect to it using this configuration: ```json { "mcpServers": { "crawl4ai-rag": { "transport": "sse", "url": "http://localhost:8051/sse" } } } ``` > **Note for Windsurf users**: Use `serverUrl` instead of `url` in your configuration: > ```json > { > "mcpServers": { > "crawl4ai-rag": { > "transport": "sse", > "serverUrl": "http://localhost:8051/sse" > } > } > } > ``` > > **Note for Docker users**: Use `host.docker.internal` instead of `localhost` if your client is running in a different container. This will apply if you are using this MCP server within n8n! ### Stdio Configuration Add this server to your MCP configuration for Claude Desktop, Windsurf, or any other MCP client: ```json { "mcpServers": { "crawl4ai-rag": { "command": "python", "args": ["path/to/crawl4ai-mcp/src/crawl4ai_mcp.py"], "env": { "TRANSPORT": "stdio", "OPENAI_API_KEY": "your_openai_api_key", "SUPABASE_URL": "your_supabase_url", "SUPABASE_SERVICE_KEY": "your_supabase_service_key" } } } } ``` ### Docker with Stdio Configuration ```json { "mcpServers": { "crawl4ai-rag": { "command": "docker", "args": ["run", "--rm", "-i", "-e", "TRANSPORT", "-e", "OPENAI_API_KEY", "-e", "SUPABASE_URL", "-e", "SUPABASE_SERVICE_KEY", "mcp/crawl4ai"], "env": { "TRANSPORT": "stdio", "OPENAI_API_KEY": "your_openai_api_key", "SUPABASE_URL": "your_supabase_url", "SUPABASE_SERVICE_KEY": "your_supabase_service_key" } } } } ``` ## Building Your Own Server This implementation provides a foundation for building more complex MCP servers with web crawling capabilities. To build your own: 1. Add your own tools by creating methods with the `@mcp.tool()` decorator 2. Create your own lifespan function to add your own dependencies 3. Modify the `utils.py` file for any helper functions you need 4. Extend the crawling capabilities by adding more specialized crawlers

Crawl4AI-RAG-MCP-Server

Content

You Might Also Like

Ollama

n8n

OpenWebUI

Dify

Zed

MarkItDown MCP

Crawl4AI-RAG-MCP-Server

Scan with WeChat to Share

Content

You Might Also Like

Ollama

n8n

OpenWebUI

Dify

Zed

MarkItDown MCP