This is a sample that vectorizes Markdown documents so that an MCP client can search them and have them explained via RAG (retrieval-augmented generation).
Vectorization uses [Plamo-Embedding-1B](https://tech.preferred.jp/ja/blog/plamo-embedding-1b/).
## Features
- Text extraction and vectorization from markdown files
- Vector search using DuckDB
- Persistence of vector data via Parquet files
- Vector search from MCP
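
To make the vector search feature concrete, here is a minimal sketch of the ranking such a search performs. It assumes cosine similarity as the metric and uses plain Python instead of DuckDB, so `cosine_similarity`, `top_k`, and the toy data are hypothetical names for illustration, not this project's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float],
    rows: list[tuple[str, list[float]]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Rank (text, vector) rows by similarity to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; a real embedding model produces far longer ones.
rows = [
    ("install guide", [1.0, 0.0, 0.0]),
    ("api reference", [0.0, 1.0, 0.0]),
    ("setup notes", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
```

In the actual sample, the stored vectors would come from the Parquet file and the ranking would presumably run as a DuckDB query rather than in a Python loop.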
## Usage
### Generating Vector Data
First, place the Markdown files you want to search in a directory, then convert them to a Parquet file with the following command.
```bash
uv run main.py --directory ~/path/to/markdown/files --parquet vectors.parquet
```
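
As a rough illustration of the text-extraction step that precedes embedding, the sketch below collects `.md` files under a directory and splits each one into chunks at headings. How `main.py` actually chunks documents is not stated here, so the heading-based strategy and the names `split_by_heading` and `collect_chunks` are assumptions.

```python
import re
from pathlib import Path


def split_by_heading(markdown: str) -> list[str]:
    """Split markdown text into chunks, starting a new chunk at each heading.

    Simplification: lines beginning with '#' inside fenced code blocks are
    also treated as headings.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def collect_chunks(directory: Path) -> list[tuple[str, str]]:
    """Return (file path, chunk text) pairs for every .md file under directory."""
    pairs: list[tuple[str, str]] = []
    for path in sorted(directory.expanduser().rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        pairs.extend((str(path), chunk) for chunk in split_by_heading(text))
    return pairs
```

Each chunk would then be passed to the embedding model, and the resulting vectors written to the Parquet file alongside their source text.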
### MCP Configuration
#### Build
The following command generates a single binary at `dist/server`.
```bash
uv run pyinstaller --clean --strip --noconfirm --onefile server.py
```
#### MCP Client Configuration
Configure the server according to the client you want to use.
For Claude Desktop, run the following command, setting `VECTOR_PARQUET` to the Parquet file you generated earlier:
```bash
uv run mcp install server.py -v VECTOR_PARQUET=/path/to/vectors.parquet
```
It will be configured as follows:
```JSON:~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "DuckDB-RAG-MCP-Sample": {
      "command": "/path/to/dist/server",
      "env": {
        "VECTOR_PARQUET": "/path/to/vectors.parquet"
      }
    }
  }
}
```
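
If you prefer to edit the client configuration by hand or from a script, the following sketch shows one way to merge a server entry into an existing config without disturbing other servers. The helper `add_server` and the `other-server` entry are hypothetical; only the `mcpServers`, `command`, and `env` keys come from the configuration shown above.

```python
import json


def add_server(config: dict, name: str, command: str, parquet_path: str) -> dict:
    """Return a copy of config with an MCP server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {
        "command": command,
        "env": {"VECTOR_PARQUET": parquet_path},
    }
    return {**config, "mcpServers": servers}


# Hypothetical existing config that already contains another server.
existing = {"mcpServers": {"other-server": {"command": "/usr/bin/other"}}}
updated = add_server(
    existing,
    name="DuckDB-RAG-MCP-Sample",
    command="/path/to/dist/server",
    parquet_path="/path/to/vectors.parquet",
)
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary rather than mutating its input, so the original config can be kept as a backup before the file is written.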
### Starting Development Server
```bash
uv run mcp dev server.py
```
## License
DuckDB RAG MCP Sample is provided under the Apache License, Version 2.0.