<h1 align="center">Crawl4AI RAG MCP Server</h1>
<p align="center">
<em>Web Crawling and RAG Capabilities for AI Agents and AI Coding Assistants</em>
</p>
An implementation of the [Model Context Protocol (MCP)](https://modelcontextprotocol.io) integrated with [Crawl4AI](https://crawl4ai.com) and [Supabase](https://supabase.com/) that gives AI agents and AI coding assistants advanced web crawling and RAG capabilities.
With this MCP server, you can <b>scrape anything</b> and then <b>use that knowledge anywhere</b> for RAG.
## Overview
This MCP server provides tools that enable AI agents to crawl websites, store content in a vector database (Supabase), and perform RAG over the crawled content.
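To make the retrieval step concrete, here is a minimal in-memory sketch of semantic search with an optional source filter. It is not the server's implementation: the server stores embeddings in Supabase and generates them with OpenAI, whereas the toy vectors, the `Chunk` type, and the `search` function below are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str             # domain the chunk was crawled from
    text: str
    embedding: list[float]  # toy vector; real embeddings come from an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_embedding, source=None, top_k=3):
    """Return the top_k most similar chunks, optionally limited to one source."""
    candidates = [c for c in chunks if source is None or c.source == source]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


chunks = [
    Chunk("docs.example.com", "How to install", [1.0, 0.0]),
    Chunk("docs.example.com", "API reference", [0.0, 1.0]),
    Chunk("blog.example.com", "Install tips", [0.9, 0.1]),
]
results = search(chunks, [1.0, 0.0], source="docs.example.com", top_k=1)
print(results[0].text)
```

Filtering before ranking keeps chunks from other sources out of the results entirely, which is the behavior the optional source filter is meant to provide.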
## Features
- **Smart URL Detection**: Automatically detects and handles different URL types (regular webpages, sitemaps, text files)
- **Recursive Crawling**: Follows internal links to discover content
- **Parallel Processing**: Efficiently crawls multiple pages simultaneously
- **Content Chunking**: Intelligently splits content by headers and size for better processing
- **Vector Search**: Performs RAG over crawled content, optionally filtering by data source for precision
- **Source Retrieval**: Retrieve sources available for filtering to guide the RAG process
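
The header-and-size chunking idea from the list above can be sketched as follows. This is an assumption about the general technique, not the server's actual code: the function name `chunk_markdown` and the `max_chars` default are hypothetical, and a production version would probably cut oversized sections at paragraph boundaries rather than at fixed offsets.

```python
import re


def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split markdown at header lines, then cap each section at max_chars."""
    # Split before every line that starts with 1-6 '#' characters and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Oversized sections are cut into fixed-size pieces.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


doc = "# Title\nIntro text.\n## Usage\nRun the tool."
for chunk in chunk_markdown(doc, max_chars=40):
    print(repr(chunk))
```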
## Tools
The server provides four essential web crawling and search tools:
1. **`crawl_single_page`**: Quickly crawl a single web page and store its content in the vector database
2. **`smart_crawl_url`**: Intelligently crawl a full website based on the type of URL provided (sitemap, llms-full.txt, or a regular webpage that needs to be crawled recursively)
3. **`get_available_sources`**: Get a list of all available sources (domains) in the database
4. **`perform_rag_query`**: Search for relevant content using semantic search with optional source filtering
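
As a rough illustration of how `smart_crawl_url` might decide between the three URL types, the sketch below classifies a URL by its path. The function name `classify_url` and the exact matching rules are assumptions made for this example; the server's own detection logic may differ.

```python
from urllib.parse import urlparse


def classify_url(url: str) -> str:
    """Return 'sitemap', 'text_file', or 'webpage' based on the URL path."""
    path = urlparse(url).path.lower()
    if path.endswith(".xml") and "sitemap" in path:
        return "sitemap"      # crawl every URL listed in the sitemap
    if path.endswith(".txt"):
        return "text_file"    # fetch the file directly (for example llms-full.txt)
    return "webpage"          # crawl recursively through internal links


for url in (
    "https://example.com/sitemap.xml",
    "https://example.com/llms-full.txt",
    "https://example.com/docs/",
):
    print(url, "->", classify_url(url))
```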
## Prerequisites
- [Docker/Docker Desktop](https://www.docker.com/products/docker-desktop/) if running the MCP server as a container (recommended)
- [Python 3.12+](https://www.python.org/downloads/) if running the MCP server directly through uv
- [Supabase](https://supabase.com/) (database for RAG)
- [OpenAI API key](https://platform.openai.com/api-keys) (for generating embeddings)
## Installation
### Using Docker (Recommended)
1. Clone this repository:
```bash
git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag
```
2. Build the Docker image:
```bash
docker build -t mcp/crawl4ai-rag --build-arg PORT=8051 .
```
3. Create a `.env` file based on the configuration section below
### Using uv directly (no Docker)
1. Clone this repository:
```bash
git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag
```
2. Install uv if you don't have it:
```bash
pip install uv
```
3. Create and activate a virtual environment:
```bash
uv venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
# source .venv/bin/activate
```
4. Install dependencies:
```bash
uv pip install -e .
crawl4ai-setup
```
5. Create a `.env` file based on the configuration section below
### Running Supabase Locally with Docker (optional)
To run Supabase locally using Docker, follow these steps:
1. **Get the Supabase code:**
```bash
git clone --depth 1 https://github.com/supabase/supabase
```
2. **Create your new Supabase project directory:**
```bash
mkdir supabase-project
```
3. **Copy the compose files to your project:**
```bash
cp -rf supabase/docker/* supabase-project
```
4. **Copy the example environment variables:**
```bash
cp supabase/docker/.env.example supabase-project/.env
```
5. **Switch to your project directory:**
```bash
cd supabase-project
```
6. **Pull the latest images:**
```bash
docker compose pull
```
7. **Start the services (in detached mode):**
```bash
docker compose up -d
```
After starting Supabase locally, configure the `.env` file in this project so that `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` point to your local Supabase instance.
## Database Setup
Before running the server, you need to set up the database with the pgvector extension:
1. Go to the SQL Editor in your Supabase dashboard (create a new project first if necessary)
2. Create a new query and paste the contents of `crawled_pages.sql`
3. Run the query to create the necessary tables and functions
## Configuration
Create a `.env` file in the project root with the following variables:
```
# MCP Server Configuration
HOST=0.0.0.0
PORT=8051
TRANSPORT=sse
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key
# Supabase Configuration
SUPABASE_URL=your_supabase_project_url
SUPABASE_SERVICE_KEY=your_supabase_service_key
# Local Supabase configuration (use these instead of the two values above)
# SUPABASE_URL=your_local_supabase_url
# SUPABASE_SERVICE_KEY=your_local_supabase_service_key
```
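A server can fail fast when one of these variables is missing. The sketch below is one hedged way to do that with the standard library only. It assumes the variables are already in the process environment (for example via `--env-file`), and it treats the example values above for `HOST`, `PORT`, and `TRANSPORT` as defaults, which is an assumption. The name `load_settings` is hypothetical.

```python
import os

REQUIRED = ("OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_SERVICE_KEY")


def load_settings(env=None) -> dict:
    """Collect settings from a mapping (defaults to os.environ) and fail fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {
        # Assumed defaults, mirroring the example values in the .env file.
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8051")),
        "transport": env.get("TRANSPORT", "sse"),
        **{name.lower(): env[name] for name in REQUIRED},
    }


example_env = {
    "OPENAI_API_KEY": "your_openai_api_key",
    "SUPABASE_URL": "your_supabase_project_url",
    "SUPABASE_SERVICE_KEY": "your_supabase_service_key",
    "PORT": "9000",
}
settings = load_settings(example_env)
print(settings["port"], settings["transport"])
```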
## Running the Server
### Using Docker
```bash
docker run --env-file .env -p 8051:8051 mcp/crawl4ai-rag
```
### Using Python
```bash
uv run src/crawl4ai_mcp.py
```
The server will start and listen on the configured host and port.
## Integration with MCP Clients
### SSE Configuration
Once you have the server running with SSE transport, you can connect to it using this configuration:
```json
{
  "mcpServers": {
    "crawl4ai-rag": {
      "transport": "sse",
      "url": "http://localhost:8051/sse"
    }
  }
}
```
> **Note for Windsurf users**: Use `serverUrl` instead of `url` in your configuration:
> ```json
> {
>   "mcpServers": {
>     "crawl4ai-rag": {
>       "transport": "sse",
>       "serverUrl": "http://localhost:8051/sse"
>     }
>   }
> }
> ```
>
> **Note for Docker users**: Use `host.docker.internal` instead of `localhost` if your client is running in a different container. This will apply if you are using this MCP server within n8n!
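
The SSE variants above differ only in the host name and the key that holds the URL, so they can be generated from one helper. This is an illustrative sketch: `sse_client_config` is a hypothetical function, not part of the server or of any MCP client.

```python
import json


def sse_client_config(host="localhost", port=8051, url_key="url"):
    """Build the MCP client entry for the SSE transport."""
    return {
        "mcpServers": {
            "crawl4ai-rag": {
                "transport": "sse",
                url_key: f"http://{host}:{port}/sse",
            }
        }
    }


# Client running in a different container (for example n8n in Docker):
print(json.dumps(sse_client_config(host="host.docker.internal"), indent=2))

# Windsurf expects "serverUrl" instead of "url":
print(json.dumps(sse_client_config(url_key="serverUrl"), indent=2))
```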
### Stdio Configuration
Add this server to your MCP configuration for Claude Desktop, Windsurf, or any other MCP client:
```json
{
  "mcpServers": {
    "crawl4ai-rag": {
      "command": "python",
      "args": ["path/to/crawl4ai-mcp/src/crawl4ai_mcp.py"],
      "env": {
        "TRANSPORT": "stdio",
        "OPENAI_API_KEY": "your_openai_api_key",
        "SUPABASE_URL": "your_supabase_url",
        "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
      }
    }
  }
}
```
### Docker with Stdio Configuration
```json
{
  "mcpServers": {
    "crawl4ai-rag": {
      "command": "docker",
      "args": ["run", "--rm", "-i",
               "-e", "TRANSPORT",
               "-e", "OPENAI_API_KEY",
               "-e", "SUPABASE_URL",
               "-e", "SUPABASE_SERVICE_KEY",
               "mcp/crawl4ai-rag"],
      "env": {
        "TRANSPORT": "stdio",
        "OPENAI_API_KEY": "your_openai_api_key",
        "SUPABASE_URL": "your_supabase_url",
        "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
      }
    }
  }
}
```
## Building Your Own Server
This implementation provides a foundation for building more complex MCP servers with web crawling capabilities. To build your own:
1. Add your own tools by creating methods with the `@mcp.tool()` decorator
2. Create your own lifespan function to add your own dependencies
3. Modify the `utils.py` file for any helper functions you need
4. Extend the crawling capabilities by adding more specialized crawlers
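
The lifespan function in step 2 can be sketched with the standard library alone. The example shows only the shape of the pattern: an async context manager that creates shared dependencies on startup and releases them on shutdown. The names `AppContext` and `app_lifespan` are hypothetical, the `cache` field stands in for a real crawler or database client, and the wiring into the MCP server object is omitted.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field


@dataclass
class AppContext:
    """Hypothetical container for dependencies shared by tools."""
    cache: dict = field(default_factory=dict)
    closed: bool = False


@asynccontextmanager
async def app_lifespan(server=None):
    """Create dependencies on startup and release them on shutdown."""
    ctx = AppContext()
    try:
        yield ctx
    finally:
        ctx.closed = True  # stand-in for closing a crawler or database client


async def main() -> AppContext:
    async with app_lifespan() as ctx:
        ctx.cache["status"] = "ready"  # tools would read shared state here
    return ctx


ctx = asyncio.run(main())
print(ctx.cache, ctx.closed)
```

Putting cleanup in the `finally` block means resources are released even if a tool raises while the server is running.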