PageIndex MCP

VectifyAI
PageIndex MCP is a reasoning-based RAG system using hierarchical tree structures.

Tools (8)

process_document

Upload and process a PDF document from a public URL. This is the entry point for the document workflow: the tool processes the document, analyzes its content, and returns the document name used by subsequent operations. Supports files up to 100 MB. After submitting a document, use get_document() to check its processing status before accessing its content.

Parameters (2)
url string Required

Direct URL to a publicly accessible PDF file

folder_id ['string', 'null'] Optional

Target folder ID. Pass null to use the root folder; omit the parameter to use the default from the X-Folder-Scope header.
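Because folder_id treats "null" and "omitted" differently, a client has to keep those two cases apart when it builds the request. A minimal Python sketch, assuming the arguments are sent as a JSON object (so Python's None becomes null); the helper name build_process_args, the _OMIT sentinel, and the http(s) check are illustrative assumptions, not part of the PageIndex API:

```python
import json

# Sentinel meaning "leave folder_id out of the request", which is
# different from passing None (serialized as JSON null).
_OMIT = object()


def build_process_args(url: str, folder_id=_OMIT) -> dict:
    """Build arguments for a process_document call (hypothetical helper).

    folder_id omitted -> key absent; the X-Folder-Scope header default applies
    folder_id=None    -> "folder_id": null; the document goes to the root folder
    folder_id="..."   -> the document goes to that folder
    """
    # Assumed client-side sanity check: the tool expects a public URL.
    if not url.startswith(("http://", "https://")):
        raise ValueError("process_document expects a public http(s) URL")
    args = {"url": url}
    if folder_id is not _OMIT:
        args["folder_id"] = folder_id
    return args


print(json.dumps(build_process_args("https://example.com/report.pdf")))
# {"url": "https://example.com/report.pdf"}
print(json.dumps(build_process_args("https://example.com/report.pdf", folder_id=None)))
# {"url": "https://example.com/report.pdf", "folder_id": null}
```

A sentinel object is used instead of a default of None because None is itself a meaningful value here (the root folder).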

recent_documents

List your recent document uploads with their processing status. Returns up to the 5 most recent documents, each with its status and suggested next steps. Use this to check which documents are ready for analysis, or when a user asks 'what documents do I have?' or 'show my documents'.

No parameters required
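Like any MCP tool, recent_documents is invoked with a JSON-RPC 2.0 "tools/call" request as defined by the Model Context Protocol. A minimal Python sketch of serializing that request; the make_tool_call helper and the id counter are illustrative, and the transport (stdio or HTTP) is not shown:

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC ids only need to be unique within the session


def make_tool_call(name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP tools/call request (hypothetical helper)."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(request)


# recent_documents takes no parameters, so "arguments" is an empty object.
print(make_tool_call("recent_documents"))
# {"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "recent_documents", "arguments": {}}}
```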

find_relevant_documents

DOCUMENT DISCOVERY: Find documents in your collection by filtering on document NAMES and DESCRIPTIONS (metadata only, NOT content). This searches the document list, not the text inside documents. Returns up to 20 documents per page with cursor-based pagination.

Use when: the user asks 'What documents do I have about X?' or hasn't specified which document to analyze.

Don't use when: you are already working with a specific document; use get_page_content() instead.

Note: to search WITHIN a document's content, use get_document_structure() to locate the relevant sections, then get_page_content() to extract them.

Parameters (4)
name_or_description_filter string Optional

Filter documents by name or description (metadata only; document content is NOT searched). Example: 'climate' matches 'Climate Report 2023.pdf'.

folder_id ['string', 'null'] Optional

Filter by folder ID. Pass null to list only files in the root folder; omit the parameter to list files in all folders.

cursor string Optional

Pagination cursor for fetching the next page

limit number Optional

Number of documents to return per page (1-20, default 10)

get_document

Get detailed information about a specific document by name. Requires doc_name (string). Optionally, set wait_for_completion (boolean) to wait up to 3 minutes if the document is still processing. Returns the document's status, metadata, and suggested next steps based on its processing state and size. Use this to confirm that a document is ready before calling get_document_structure() or get_page_content().

Parameters (2)
doc_name string Required

Document name, as returned by process_document() or recent_documents()

wait_for_completion boolean Optional

If true and the document is still processing, wait up to 3 minutes for processing to complete. Reduces the need for repeated tool calls.
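wait_for_completion moves the waiting to the server. For comparison, the client-side polling it replaces might look like the following Python sketch. It assumes the parsed get_document response carries a "status" field equal to "completed" once processing finishes (an assumption, not a documented schema); the call function, clock, and sleep are injected so the loop can be exercised offline:

```python
import time
from typing import Callable

# Hypothetical: sends get_document with these arguments, returns parsed response.
ToolCall = Callable[[dict], dict]


def wait_until_ready(call: ToolCall, doc_name: str, timeout: float = 180.0,
                     interval: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_document until status == "completed" or the timeout expires."""
    deadline = clock() + timeout
    while True:
        info = call({"doc_name": doc_name})
        if info.get("status") == "completed":  # assumed status value
            return info
        if clock() >= deadline:
            raise TimeoutError(f"{doc_name} still processing after {timeout:.0f}s")
        sleep(interval)
```

In most cases a single call with {"doc_name": name, "wait_for_completion": true} achieves the same result in one round trip; a loop like this is only worth having when the client needs its own timeout or progress reporting.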

get_document_structure

Extract the hierarchical structure of a completed document. Optionally, set wait_for_completion (boolean) to wait up to 3 minutes if the document is still processing. Returns a structured outline with headers, sections, and page references. REQUIRED for documents over 20 pages: call this first to understand the layout and identify relevant sections before extracting content. For targeted questions, use the structure to locate the relevant pages, then extract them with get_page_content().

Parameters (3)
doc_name string Required

Document name, as returned by process_document() or recent_documents()

part integer Optional

Part number for pagination (1-based)

wait_for_completion boolean Optional

If true and the document is still processing, wait up to 3 minutes for processing to complete. Reduces the need for repeated tool calls.
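The "structure first, then pages" workflow amounts to a search over the outline tree followed by a targeted page request. A minimal Python sketch, assuming each outline node has a title and an inclusive page range; the Node fields are hypothetical rather than the actual get_document_structure schema, and a plain keyword match stands in for the model's reasoning over the outline:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One outline entry (hypothetical fields, not PageIndex's schema)."""
    title: str
    start_page: int
    end_page: int
    children: List["Node"] = field(default_factory=list)


def find_sections(node: Node, keyword: str) -> List[Node]:
    """Depth-first search for sections whose title contains keyword.

    A matching node is returned without descending further, because its
    page range already covers its children.
    """
    if keyword.lower() in node.title.lower():
        return [node]
    hits: List[Node] = []
    for child in node.children:
        hits.extend(find_sections(child, keyword))
    return hits


def to_page_spec(sections: List[Node]) -> str:
    """Turn section page ranges into a get_page_content() spec like "3-7,12"."""
    parts = []
    for s in sorted(sections, key=lambda n: n.start_page):
        parts.append(str(s.start_page) if s.start_page == s.end_page
                     else f"{s.start_page}-{s.end_page}")
    return ",".join(parts)


report = Node("Annual Report", 1, 40, [
    Node("Revenue", 3, 7),
    Node("Risk Factors", 12, 12),
    Node("Outlook", 30, 35, [Node("Revenue Forecast", 33, 34)]),
])
print(to_page_spec(find_sections(report, "revenue")))  # 3-7,33-34
```

The resulting string can be passed as the pages argument of get_page_content(), so only the sections that matter are extracted.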

get_page_content

Extract the content of specific pages from a processed document. Pages can be selected as a single page ('5'), a range ('3-7'), or a list of pages ('1,5,10'). Optionally, set wait_for_completion (boolean) to wait up to 3 minutes if the document is still processing. Returns structured text content, with image paths when available. Best practice: use get_document_structure() first to identify the relevant sections, then extract only the pages you need. Avoid extracting entire documents at once.

Parameters (3)
doc_name string Required

Document name, as returned by process_document() or recent_documents()

pages string Required

Page specification: a single page ("5"), a list ("3,7,10"), a range ("5-10"), or a combination ("1-3,7,9-12")

wait_for_completion boolean Optional

If true and the document is still processing, wait up to 3 minutes for processing to complete. Reduces the need for repeated tool calls.
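A client can validate a page specification locally, or build a compact one from a list of page numbers, before calling get_page_content(). A minimal Python sketch of both directions; the function names are hypothetical, and the parser normalizes to sorted, de-duplicated pages, which may differ from how the server orders its output:

```python
from typing import Iterable, List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1-3,7,9-12" into sorted, unique page numbers."""
    pages = set()
    for token in spec.replace(" ", "").split(","):
        if not token:
            raise ValueError(f"empty item in page spec {spec!r}")
        if "-" in token:
            start_s, end_s = token.split("-", 1)
            start, end = int(start_s), int(end_s)  # int("") raises ValueError
            if start > end:
                raise ValueError(f"descending range {token!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(token))
    if min(pages) < 1:
        raise ValueError("page numbers start at 1")
    return sorted(pages)


def format_page_spec(pages: Iterable[int]) -> str:
    """Compress page numbers into the shortest spec, e.g. [1, 2, 3, 7] -> "1-3,7"."""
    nums = sorted(set(pages))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        # Extend the run while the next page is consecutive.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        parts.append(str(nums[i]) if i == j else f"{nums[i]}-{nums[j]}")
        i = j + 1
    return ",".join(parts)


print(parse_page_spec("1-3,7,9-12"))              # [1, 2, 3, 7, 9, 10, 11, 12]
print(format_page_spec([12, 1, 2, 3, 7, 9, 10, 11]))  # 1-3,7,9-12
```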

get_document_image

Retrieve an image embedded in a document. Requires an image_path returned by get_page_content(), in the format <docName>/<imagePath> (e.g. MyDoc.pdf/figures/fig1.png). Returns image_base64 (the Base64-encoded image data) and content_type, which together are enough to render the image directly.

Parameters (1)
image_path string Required

Image path as returned by get_page_content(), in the format <docName>/<imagePath>
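Handling an image on the client mostly means splitting the path and decoding the Base64 payload. A minimal Python sketch; it assumes document names never contain "/" (so the first slash separates <docName> from <imagePath>), and the helper names are hypothetical. The response keys image_base64 and content_type are the ones listed above:

```python
import base64
import binascii
from typing import Tuple


def split_image_path(image_path: str) -> Tuple[str, str]:
    """Split "<docName>/<imagePath>" at the first slash (assumes no "/" in docName)."""
    doc_name, sep, inner = image_path.partition("/")
    if not sep or not doc_name or not inner:
        raise ValueError(f"expected <docName>/<imagePath>, got {image_path!r}")
    return doc_name, inner


def decode_image(response: dict) -> Tuple[bytes, str]:
    """Return the raw image bytes and MIME type from a get_document_image response."""
    try:
        data = base64.b64decode(response["image_base64"], validate=True)
    except binascii.Error as exc:
        raise ValueError("image_base64 is not valid Base64") from exc
    return data, response["content_type"]


print(split_image_path("MyDoc.pdf/figures/fig1.png"))  # ('MyDoc.pdf', 'figures/fig1.png')
```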

remove_document

Permanently delete documents and all their associated data. Accepts an array of document names for batch deletion (at most 10 documents per call). Returns a success or failure status for each document. Use this to free up storage space. WARNING: This action is irreversible; deleted documents cannot be recovered.

Parameters (1)
doc_names array Required

Array of document names to delete (maximum 10)
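Deleting more than 10 documents therefore takes several calls. A minimal Python sketch of splitting a list of names into batches that respect the limit; deletion_batches is a hypothetical helper, and each yielded batch would be sent as the doc_names argument of one remove_document call:

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 10  # remove_document accepts at most 10 names per call


def deletion_batches(doc_names: Sequence[str],
                     batch_size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield de-duplicated document names in batches of at most batch_size."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    unique = list(dict.fromkeys(doc_names))  # drop duplicates, keep order
    for i in range(0, len(unique), batch_size):
        yield unique[i:i + batch_size]


names = [f"doc{i}.pdf" for i in range(23)]
print([len(b) for b in deletion_batches(names)])  # [10, 10, 3]
```

Since deletion cannot be undone, it is worth reviewing the per-document status returned for each batch before sending the next one.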