PageIndex MCP (8 tools)
PageIndex MCP is a reasoning-based retrieval-augmented generation (RAG) system built on hierarchical tree structures.
process_document
Upload and process a PDF document from a public URL. This is the entry point for the document workflow: it processes the document, analyzes its content, and returns the document name for use in subsequent operations. Supports files up to 100 MB. After submitting a document, call get_document() to check its status before accessing content.
Direct URL to PDF file
Target folder ID. Use null for the root folder; omit to use the default from the X-Folder-Scope header.
recent_documents
List your recent document uploads with their processing status. Returns up to the 5 most recent documents, each with its status and actionable suggestions. Use this to check which documents are ready for analysis, or when a user asks 'what documents do I have' or 'show my documents'.
No parameters required
find_relevant_documents
DOCUMENT DISCOVERY: Find documents in your collection by filtering on document NAMES and DESCRIPTIONS (metadata only, NOT content). This searches the document list, not document content. Returns up to 20 documents per page with cursor-based pagination. Use when: User asks 'What documents do I have about X?' or hasn't specified which document to analyze. Don't use when: Already working with a specific document - use get_page_content() instead. Note: To search WITHIN a document's content, use get_document_structure() to locate sections, then get_page_content() to extract them.
Filter documents by their name or description (metadata only, NOT content). Example: 'climate' finds 'Climate Report 2023.pdf'. This does NOT search document content.
Filter by folder ID. Use null for files in the root folder; omit to include all files.
Pagination cursor for fetching the next page
Number of documents to return per page (1-20, default 10)
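The cursor-based pagination described above can be sketched as a simple loop. This is a minimal illustration, not the server's documented contract: the argument names query, cursor, and limit, the response fields documents and next_cursor, and the call_tool transport function are all assumptions, since this reference does not name them.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: takes a tool name and an arguments dict and
# returns the parsed tool result as a dict. Supplied by the caller.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_relevant_documents(
    call_tool: ToolCaller, query: str, page_size: int = 10
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, following cursors page by page."""
    # The reference states a page size of 1-20 with a default of 10.
    if not 1 <= page_size <= 20:
        raise ValueError("page_size must be between 1 and 20")

    cursor: Optional[str] = None
    while True:
        # "query", "limit" and "cursor" are assumed argument names.
        args: Dict[str, Any] = {"query": query, "limit": page_size}
        if cursor is not None:
            args["cursor"] = cursor

        page = call_tool("find_relevant_documents", args)

        # "documents" and "next_cursor" are assumed response fields.
        yield from page.get("documents", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Because the function is a generator, a caller can stop after the first relevant match without fetching the remaining pages.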
get_document
Get detailed information about a specific document by name. Requires: doc_name (string). Optional: wait_for_completion (boolean) to automatically wait up to 3 minutes if the document is still processing. Returns the document's status and metadata, plus next-step suggestions based on its processing state and size. Use this to check that a document is ready before calling get_document_structure() or get_page_content().
Document name, as returned by process_document() or recent_documents()
If true and the document is still processing, wait up to 3 minutes for it to complete. Reduces repeated tool calls.
get_document_structure
Extract the hierarchical structure of a completed document. Optional: wait_for_completion (boolean) to wait up to 3 minutes if still processing. Returns structured outline with headers, sections, and page references. REQUIRED for documents over 20 pages - use this first to understand layout and identify relevant sections before extracting content. For targeted questions, use structure to locate relevant pages, then extract with get_page_content().
Document name, as returned by process_document() or recent_documents()
Part number for pagination (1-based)
If true and the document is still processing, wait up to 3 minutes for it to complete. Reduces repeated tool calls.
get_page_content
Extract specific page content from processed documents. Flexible page selection: single page ('5'), ranges ('3-7'), or multiple pages ('1,5,10'). Optional: wait_for_completion (boolean) to wait up to 3 minutes if still processing. Returns structured text content with image paths when available. Best practice: Use get_document_structure() first to identify relevant sections, then extract only necessary pages to optimize performance. Avoid extracting entire documents at once.
Document name, as returned by process_document() or recent_documents()
Page specification: "5", "3,7,10", "5-10", or "1-3,7,9-12"
If true and the document is still processing, wait up to 3 minutes for it to complete. Reduces repeated tool calls.
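To make the page specification format concrete, here is a minimal client-side sketch that expands a specification string into a sorted list of page numbers. The function name parse_page_spec is hypothetical, and the checks (1-based pages, ascending ranges) are assumptions about what a client might validate before calling the tool, not documented server behavior.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1-3,7,9-12" into sorted, unique page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page specification: {spec!r}")
        if "-" in part:
            # A range such as "5-10", inclusive at both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            # Assumption: pages are 1-based and ranges must ascend.
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)
```

Counting the expanded pages before a request is one way to follow the advice above about not extracting entire documents at once.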
get_document_image
Retrieve an embedded image from a document. Requires image_path from get_page_content(), format: <docName>/<imagePath> (e.g. MyDoc.pdf/figures/fig1.png). Returns image_base64 and content_type for rendering the image directly.
Image path from get_page_content(), format: <docName>/<imagePath>
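The image path format and the returned fields can be handled roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the result is assumed to arrive as a dict carrying the image_base64 and content_type fields named above, and document names are assumed to contain no "/" character.

```python
import base64
from typing import Any, Dict, Tuple


def split_image_path(image_path: str) -> Tuple[str, str]:
    """Split "<docName>/<imagePath>" into (doc_name, path_inside_document)."""
    # Assumption: the document name itself contains no "/", so the first
    # slash separates it from the image path within the document.
    doc_name, sep, inner_path = image_path.partition("/")
    if not sep or not doc_name or not inner_path:
        raise ValueError(f"expected <docName>/<imagePath>, got {image_path!r}")
    return doc_name, inner_path


def decode_image(result: Dict[str, Any]) -> Tuple[bytes, str]:
    """Return (raw_bytes, content_type) from a get_document_image result."""
    # The field names come from the tool description; the dict shape
    # is an assumption about how a client receives the result.
    raw = base64.b64decode(result["image_base64"])
    return raw, result["content_type"]
```

The decoded bytes can then be written to a file or passed to an image renderer, using content_type to pick the format.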
remove_document
Permanently delete documents and all associated data. Accepts an array of document names for batch deletion (maximum 10 documents per batch). Returns detailed success/failure status for each document. Use this to manage storage space. WARNING: This action is irreversible - deleted documents cannot be recovered.
Array of document names to delete
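Since each call accepts at most 10 document names, a client deleting a longer list needs to split it into batches first. A minimal sketch of that splitting step follows; the helper name batched_names is hypothetical, and how each batch is then sent to remove_document is left to the caller.

```python
from typing import Iterator, List, Sequence

# Stated limit for remove_document: at most 10 documents per batch.
MAX_BATCH = 10


def batched_names(
    doc_names: Sequence[str], batch_size: int = MAX_BATCH
) -> Iterator[List[str]]:
    """Yield consecutive slices of doc_names, each within the batch limit."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    for start in range(0, len(doc_names), batch_size):
        yield list(doc_names[start:start + batch_size])
```

Because deletion is irreversible, it is worth reviewing the per-document success/failure status returned for each batch before sending the next one.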