PageIndex MCP

VectifyAI

PageIndex MCP is a reasoning-based RAG system using hierarchical tree structures.

<div align="center">
  <a href="https://pageindex.ai/mcp">
    <img src="https://docs.pageindex.ai/images/general/mcp_banner.jpg">
  </a>
</div>

# PageIndex MCP

> If you find this repo useful, please also star our **[main PageIndex repo](https://github.com/VectifyAI/PageIndex)** ⭐

[![PageIndex GitHub](https://img.shields.io/badge/PageIndex_GitHub-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/VectifyAI/PageIndex) [![PageIndex MCP Home](https://img.shields.io/badge/PageIndex_MCP-4280d3?style=for-the-badge&logo=readthedocs&logoColor=white)](https://pageindex.ai/mcp) [![PageIndex Home](https://img.shields.io/badge/PageIndex-3B82F6?style=for-the-badge&logo=homeadvisor&logoColor=white)](https://vectify.ai/pageindex)

📘 [**PageIndex**](https://github.com/VectifyAI/PageIndex) is a vectorless, reasoning-based RAG system that represents documents as hierarchical **tree structures**. It enables LLMs to navigate and retrieve information through structure and **reasoning** rather than vector similarity, much like a human would retrieve information using a book's index.

🔌 [**PageIndex MCP**](https://pageindex.ai/mcp) exposes this **LLM-native, in-context tree index** directly to LLMs via MCP, allowing platforms like **Claude**, **Cursor**, and other MCP-compatible agents or LLMs to reason over document structure and retrieve the right information without vector databases.

Want to chat with long PDFs but keep hitting "context limit reached" errors? Add your file to PageIndex to chat with long PDFs on any agent or LLM platform.

✨ Chat with long PDFs the **human-like, reasoning-based way** ✨

- Supports local and online PDFs
- Free 1000 pages
- Unlimited conversations

For more information, visit the [PageIndex MCP](https://pageindex.ai/mcp) page.

💡 Looking for a fully hosted experience? Try [**PageIndex Chat**](https://chat.pageindex.ai) 🤖: a human-like document analyst that lets you chat with long PDFs using the same agentic, reasoning-based workflow as PageIndex MCP.

<p align="center">
  <a href="https://pageindex.ai/mcp">
    <img src="https://github.com/user-attachments/assets/d807d506-131d-4c7b-837c-96ab1adb2271">
  </a>
</p>

# What is PageIndex?

<div align="center">
  <a href="https://pageindex.ai/mcp">
    <img src="https://docs.pageindex.ai/images/cookbook/vectorless-rag.png" width="70%">
  </a>
</div>

PageIndex is a vectorless, **reasoning-based RAG** system that generates hierarchical **tree structures** of documents and uses multi-step **reasoning** and tree search to retrieve information like a human expert would. It has the following key properties:

- **Higher Accuracy**: Relevance beyond similarity
- **Better Transparency**: Clear reasoning trajectory with traceable search paths
- **Like A Human**: Retrieve information like a human expert navigates documents
- **No Vector DB**: No extra infrastructure overhead
- **No Chunking**: Preserve full document context and structure
- **No Top-K**: Retrieve all relevant passages automatically

---

# PageIndex MCP Setup

### For Developers

Connect PageIndex to your agent framework or AI SDK via MCP. Works with [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python), [Vercel AI SDK](https://ai-sdk.dev/docs/ai-sdk-core/mcp-tools), [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/mcp/), [LangChain](https://github.com/langchain-ai/langchain-mcp-adapters), and any MCP-compatible client. Authentication uses a simple API key; no OAuth flow is required.

1. Go to the [PageIndex Dashboard](https://dash.pageindex.ai/api-keys) to create an API key
2. Copy the generated key
3. Add it to your MCP configuration:

```json
{
  "mcpServers": {
    "pageindex": {
      "type": "http",
      "url": "https://api.pageindex.ai/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

For more details, visit the [PageIndex API Dashboard](https://dash.pageindex.ai).
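
The developer configuration above embeds the API key as a literal string. As a minimal sketch, the same structure can be generated from an environment variable so the key stays out of version control. The variable name `PAGEINDEX_API_KEY` and the helper `build_mcp_config` are assumptions made for illustration; PageIndex does not define them.

```python
import json
import os


def build_mcp_config(api_key: str) -> dict:
    """Return the developer MCP configuration with the given key filled in."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "mcpServers": {
            "pageindex": {
                "type": "http",
                "url": "https://api.pageindex.ai/mcp",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # PAGEINDEX_API_KEY is an assumed variable name, not one defined by PageIndex.
    key = os.environ.get("PAGEINDEX_API_KEY", "your_api_key")
    print(json.dumps(build_mcp_config(key), indent=2))
```

The printed JSON can be pasted into whichever configuration file your MCP client reads.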

### For PageIndex Chat Users

If you already have a [PageIndex Chat](https://chat.pageindex.ai) account, you can connect your MCP client directly via OAuth.

**Claude Desktop (One-Click Install):**

Download the `.mcpb` file from [Releases](https://github.com/VectifyAI/pageindex-mcp/releases) and double-click to install. OAuth authentication is handled automatically.

**Other MCP Clients:**

```json
{
  "mcpServers": {
    "pageindex": {
      "type": "http",
      "url": "https://chat.pageindex.ai/mcp"
    }
  }
}
```

**Local MCP Server (with local PDF upload):**

If you need to upload local PDF files, you can run the local MCP server (requires Node.js ≥18.0.0):

```json
{
  "mcpServers": {
    "pageindex": {
      "command": "npx",
      "args": ["-y", "@pageindex/mcp"]
    }
  }
}
```

For more details, visit [PageIndex Chat](https://chat.pageindex.ai).

# Related Links

[![PageIndex Home](https://img.shields.io/badge/PageIndex_Home-3B82F6?style=for-the-badge&logo=homeadvisor&logoColor=white)](https://vectify.ai/pageindex) [![PageIndex GitHub](https://img.shields.io/badge/PageIndex_GitHub-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/VectifyAI/PageIndex)

## License

This project is licensed under the terms of the MIT open source license. Please refer to [MIT](./LICENSE) for the full terms.

PageIndex MCP Tools (8)

process_document

Upload and process PDF documents from public URLs. Entry point for document workflow. Performs document processing and intelligent content analysis. Returns document name for subsequent operations. Supports files up to 100MB. After processing, use get_document() to check status before accessing content.

Parameters (2)

url string Required

Direct URL to PDF file

folder_id string | null Optional

Target folder ID. Use null for root folder, omit to use X-Folder-Scope header default.
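
MCP tool invocations travel as JSON-RPC 2.0 requests with the method `tools/call`; most MCP clients build this envelope for you. The sketch below shows what a `process_document` call could look like and how the three `folder_id` cases (a folder ID, `null` for the root folder, or omitted) differ on the wire. The helper name `process_document_request` and the `_OMIT` sentinel are hypothetical.

```python
import json
from typing import Any

# Sentinel meaning "leave folder_id out of the arguments entirely".
_OMIT = object()


def process_document_request(url: str, folder_id: Any = _OMIT, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for process_document."""
    arguments: dict[str, Any] = {"url": url}
    if folder_id is not _OMIT:
        # None serializes to JSON null, which selects the root folder.
        arguments["folder_id"] = folder_id
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "process_document", "arguments": arguments},
    }


if __name__ == "__main__":
    # Omitted folder_id: the server falls back to the X-Folder-Scope header default.
    print(json.dumps(process_document_request("https://example.com/report.pdf")))
    # Explicit null: upload to the root folder.
    print(json.dumps(process_document_request("https://example.com/report.pdf", folder_id=None)))
```

Keeping "omitted" and `null` distinct matters here because the tool description assigns them different meanings.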

recent_documents

List your recent document uploads with processing status. Returns up to 5 most recent documents with status and actionable suggestions. Use this to check which documents are ready for analysis, or when a user asks 'what documents do I have' or 'show my documents'.

No parameters required

find_relevant_documents

DOCUMENT DISCOVERY: Find documents in your collection by filtering on document NAMES and DESCRIPTIONS (metadata only, NOT content). This searches the document list, not document content. Returns up to 20 documents per page with cursor-based pagination. Use when: User asks 'What documents do I have about X?' or hasn't specified which document to analyze. Don't use when: Already working with a specific document - use get_page_content() instead. Note: To search WITHIN a document's content, use get_document_structure() to locate sections, then get_page_content() to extract them.

Parameters (4)

name_or_description_filter string Optional

Filter documents by their name or description (metadata only, NOT content). Example: 'climate' finds 'Climate Report 2023.pdf'. This does NOT search document content.

folder_id string | null Optional

Filter by folder ID. Use null for root folder files, omit for all files.

cursor string Optional

Pagination cursor for fetching next page

limit number Optional

Number of documents to return per page (1-20, default 10)
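
Cursor-based pagination over `find_relevant_documents` can be sketched as a generator that keeps requesting pages until no cursor comes back. The `call_tool` callable stands in for whatever your MCP client provides, and the response keys `documents` and `next_cursor` are assumptions; the listing above does not document the response shape, so check the real output before relying on them.

```python
from typing import Any, Callable, Iterator, Optional


def iter_documents(
    call_tool: Callable[[str, dict], dict],
    name_filter: Optional[str] = None,
    page_size: int = 20,
) -> Iterator[Any]:
    """Yield every matching document, following pagination cursors."""
    if not 1 <= page_size <= 20:
        raise ValueError("limit must be between 1 and 20")
    cursor = None
    while True:
        args: dict = {"limit": page_size}
        if name_filter is not None:
            args["name_or_description_filter"] = name_filter
        if cursor is not None:
            args["cursor"] = cursor
        page = call_tool("find_relevant_documents", args)
        # "documents" and "next_cursor" are assumed field names.
        yield from page.get("documents", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Because the filter matches only names and descriptions, this loop is for discovering documents, not for searching inside them.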

get_document

Get detailed information about a specific document by name. Requires: doc_name (string). Optional: wait_for_completion (boolean) to automatically wait up to 3 minutes if document is still processing. Returns document status, metadata, and intelligent next-step suggestions based on processing state and document size. Use this to check document readiness before get_document_structure() or get_page_content().

Parameters (2)

doc_name string Required

Document name from recent_documents()

wait_for_completion boolean Optional

If true and document is processing, automatically wait up to 3 minutes until completed. Reduces repeated tool calls.

get_document_structure

Extract the hierarchical structure of a completed document. Optional: wait_for_completion (boolean) to wait up to 3 minutes if still processing. Returns structured outline with headers, sections, and page references. REQUIRED for documents over 20 pages - use this first to understand layout and identify relevant sections before extracting content. For targeted questions, use structure to locate relevant pages, then extract with get_page_content().

Parameters (3)

doc_name string Required

Document name from recent_documents()

part integer Optional

Part number for pagination (1-based)

wait_for_completion boolean Optional

If true and document is processing, automatically wait up to 3 minutes until completed. Reduces repeated tool calls.
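
The structure-first workflow (read the outline, pick the relevant sections, then request only their pages) can be illustrated with a small tree walk. This is a sketch under loud assumptions: the node keys `title`, `start_page`, `end_page`, and `children` are hypothetical, and a plain keyword match stands in for the reasoning step that an LLM would actually perform over the outline.

```python
def find_sections(nodes: list, keyword: str, path: tuple = ()) -> list:
    """Return (section path, page spec) pairs for titles containing keyword."""
    hits = []
    for node in nodes:
        here = path + (node["title"],)
        if keyword.lower() in node["title"].lower():
            # The page spec is formatted for the `pages` argument of get_page_content.
            hits.append((" > ".join(here), f'{node["start_page"]}-{node["end_page"]}'))
        # Recurse into subsections; leaf nodes may omit "children".
        hits.extend(find_sections(node.get("children", []), keyword, here))
    return hits


if __name__ == "__main__":
    # Hypothetical outline; the real get_document_structure() output may differ.
    outline = [
        {"title": "1 Introduction", "start_page": 1, "end_page": 3, "children": []},
        {
            "title": "2 Methods",
            "start_page": 4,
            "end_page": 20,
            "children": [{"title": "2.1 Risk Model", "start_page": 6, "end_page": 9}],
        },
    ]
    for section, pages in find_sections(outline, "risk"):
        print(section, "->", pages)
```

Each returned page spec can then be passed to `get_page_content()` instead of extracting the whole document.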

get_page_content

Extract specific page content from processed documents. Flexible page selection: single page ('5'), ranges ('3-7'), or multiple pages ('1,5,10'). Optional: wait_for_completion (boolean) to wait up to 3 minutes if still processing. Returns structured text content with image paths when available. Best practice: Use get_document_structure() first to identify relevant sections, then extract only necessary pages to optimize performance. Avoid extracting entire documents at once.

Parameters (3)

doc_name string Required

Document name from recent_documents()

pages string Required

Page specification: "5", "3,7,10", "5-10", or "1-3,7,9-12"

wait_for_completion boolean Optional

If true and document is processing, automatically wait up to 3 minutes until completed. Reduces repeated tool calls.
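
The page specification syntax can be checked on the client before a call is made. The following sketch expands a spec such as `"1-3,7,9-12"` into individual page numbers; `parse_pages` is a hypothetical helper, and the server's own validation rules may differ from the ones assumed here.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec like "1-3,7,9-12" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page spec: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)


if __name__ == "__main__":
    print(parse_pages("1-3,7,9-12"))  # [1, 2, 3, 7, 9, 10, 11, 12]
```

Counting the expanded pages before sending a request is one way to avoid accidentally extracting an entire document.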

get_document_image

Retrieve an embedded image from a document. Requires image_path from get_page_content(), format: <docName>/<imagePath> (e.g. MyDoc.pdf/figures/fig1.png). Returns image_base64 and content_type for rendering the image directly.

Parameters (1)

image_path string Required

Image path from get_page_content(), format: <docName>/<imagePath>
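
Handling a `get_document_image` result might look like the sketch below. It assumes the tool result is available as a dictionary carrying the `image_base64` and `content_type` fields named above, and that the document name itself contains no `/`; both helper names are hypothetical.

```python
import base64


def split_image_path(image_path: str) -> tuple[str, str]:
    """Split "<docName>/<imagePath>" at the first slash."""
    doc_name, sep, inner_path = image_path.partition("/")
    if not sep or not doc_name or not inner_path:
        raise ValueError(f"expected <docName>/<imagePath>, got {image_path!r}")
    return doc_name, inner_path


def decode_image(result: dict) -> tuple[bytes, str]:
    """Return the raw image bytes and the MIME type from a tool result."""
    return base64.b64decode(result["image_base64"]), result["content_type"]


if __name__ == "__main__":
    print(split_image_path("MyDoc.pdf/figures/fig1.png"))
    # Stand-in result; a real one would come from get_document_image.
    fake_result = {
        "image_base64": base64.b64encode(b"\x89PNG").decode("ascii"),
        "content_type": "image/png",
    }
    data, mime = decode_image(fake_result)
    print(len(data), mime)
```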

remove_document

Permanently delete documents and all associated data. Accepts an array of document names for batch deletion (maximum 10 documents per batch). Returns detailed success/failure status for each document. Use this to manage storage space. WARNING: This action is irreversible - deleted documents cannot be recovered.

Parameters (1)

doc_names array Required

Array of document names to delete
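
Because each `remove_document` call accepts at most 10 names, a longer list has to be split on the client. A minimal sketch, with `batches` as a hypothetical helper:

```python
from typing import Iterator

MAX_BATCH = 10  # maximum documents per remove_document call, per the description above


def batches(doc_names: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of doc_names with at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(doc_names), size):
        yield doc_names[start:start + size]


if __name__ == "__main__":
    names = [f"doc{i}.pdf" for i in range(23)]
    for batch in batches(names):
        # Each batch would become the doc_names argument of one remove_document call.
        print(len(batch), batch[0], "...", batch[-1])
```

Since deletion is irreversible, it is worth reviewing each batch before sending it.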
