# NCBI Model Context Protocol (MCP)

A Python implementation of the Model Context Protocol for interacting with NCBI databases.

## Setup

1. Clone this repository
2. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Create a `.env` file with your NCBI API key and email:
   ```
   NCBI_API_KEY=your_api_key_here
   NCBI_EMAIL=your_email@example.com
   ```

## Running the MCP Server

```
python ncbi_mcp.py
```

## Using with Cursor/Claude

Once the MCP server is running, you can interact with it using natural language in Cursor/Claude.

### Using Natural Language Queries

You can use natural language to perform searches and retrieve information:

```
tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Find research articles about BRCA1"
  }
}
```

Or, more simply, use the query directly:

```
@ncbi-mcp Find research articles about BRCA1
```

### Example Natural Language Queries

Here are some example natural language queries you can try:

1. Gene function information:
   ```
   @ncbi-mcp Please summarize the function of TNF-alpha
   ```
2. Genome size and statistics:
   ```
   @ncbi-mcp How big is the genome for Saccharomyces cerevisiae?
   ```
3. Assembly statistics:
   ```
   @ncbi-mcp What are the reported L50 and N50 statistics for the most recent E. coli genome?
   ```
4. Dataset counts:
   ```
   @ncbi-mcp How many datasets are available in the biosample database for b16f10 mouse melanoma cells?
   ```
5. Search for scientific articles:
   ```
   @ncbi-mcp Find the latest research on COVID-19 vaccines
   ```
6. Get gene information:
   ```
   @ncbi-mcp Tell me about the BRCA1 gene
   ```
7. Fetch genome information:
   ```
   @ncbi-mcp Get genome information for Homo sapiens
   ```

## Testing

To test the MCP server with various queries, you can use the included test files:

```
# Test natural language query functionality (default)
.\run_test.bat

# Test all tools
.\run_test.bat all

# Test specific test file
.\run_test.bat test_all_tools.jsonl

# Test high-level tools
.\run_test.bat test_high_level_tools.jsonl
```

The test script will:

1. Start the MCP server in the background
2. Send test requests from the specified file
3. Wait a few seconds to allow processing
4. Terminate the server and display the output

This approach is used because the MCP server is designed to run continuously as a service. For manual testing without automatic termination, you can use:

```
# Run manually with any test file
type test_nlp_query.jsonl | python ncbi_mcp.py
```

The test files contain example JSON-RPC requests that simulate how Cursor/Claude would interact with the MCP server.

## Available Tools

The NCBI MCP provides both high-level tools that understand natural language and low-level tools for direct database interaction.

## Tool Usage Guidelines for LLMs

### Recommended Workflow Patterns

**For most biological queries, start with `nlp-query`.** It handles complex questions and automatically routes them to the appropriate specialized tools.

**Common Research Workflows:**

1. **Gene Analysis Workflow:**
   - Start with `nlp-query` for general gene questions
   - Use `summarize-gene` for comprehensive gene information
   - Use `get_gene_info` for detailed structured data
   - Use `ncbi-search` + `ncbi-fetch` for specific database queries
2. **Genome Analysis Workflow:**
   - Use `genome-stats` for organism genome statistics
   - Use `get_genome_info` for detailed genome metadata
   - Use `count-datasets` to explore available genome assemblies
3. **Literature Research Workflow:**
   - Use `nlp-query` for natural language literature searches
   - Use `ncbi-search` with database="pubmed" for precise searches
   - Use `ncbi-fetch` to get full publication details
4. **Dataset Discovery Workflow:**
   - Use `count-datasets` to assess data availability
   - Use `nlp-query` to explore datasets with natural language
   - Use `ncbi-search` for systematic database exploration
5. **E-utilities Workflow (Advanced):**
   - Use `ncbi-info` to discover available databases
   - Use `ncbi-global-query` to see which databases contain your search term
   - Use `ncbi-search` to find specific UIDs in target databases
   - Use `ncbi-summary` to get overview information about records
   - Use `ncbi-fetch` to retrieve complete records
   - Use `ncbi-link` to find related records across databases
6. **Cross-Database Analysis Workflow:**
   - Use `ncbi-search` to find genes of interest
   - Use `ncbi-link` to find related proteins, structures, or literature
   - Use `ncbi-summary` to get metadata about related records
   - Use `ncbi-fetch` to retrieve detailed information

### Tool Selection Guide

**High-Level Tools (Recommended for most users):**

- **`nlp-query`**: Use for general biological questions, complex queries, and when you're unsure which tool to use
- **`summarize-gene`**: Use for comprehensive gene analysis and understanding gene function
- **`genome-stats`**: Use for genome size, assembly quality, and organism comparison
- **`count-datasets`**: Use for research planning and data availability assessment
- **`get_gene_info`**: Use for detailed, structured gene information
- **`get_genome_info`**: Use for detailed, structured genome information

**Low-Level E-utilities Tools (For advanced users):**

- **`ncbi-search` (ESearch)**: Use for precise database searches with specific filters, Boolean operators, and field qualifiers
- **`ncbi-fetch` (EFetch)**: Use to retrieve complete records after searching; supports multiple formats (GenBank, FASTA, XML)
- **`ncbi-summary` (ESummary)**: Use to get document summaries without fetching complete records
- **`ncbi-link` (ELink)**: Use to find related records across databases (e.g., gene to protein, protein to structure)
- **`ncbi-info` (EInfo)**: Use to discover available databases and their capabilities
- **`ncbi-global-query` (EGQuery)**: Use to search across all databases simultaneously
- **`ncbi-spell` (ESpell)**: Use to get spelling suggestions for search terms
- **`ncbi-citation-match` (ECitMatch)**: Use to find PMIDs from citation information

### Biological Context and Terminology

**Understanding NCBI Databases:**

- **Gene**: Contains gene records with symbols, names, functions, and genomic locations
- **Protein**: Contains protein sequences and annotations
- **Nucleotide**: Contains DNA/RNA sequences (genes, transcripts, genomic regions)
- **PubMed**: Contains scientific literature and publications
- **BioSample**: Contains biological sample metadata (tissues, cell lines, etc.)
- **BioProject**: Contains research project information
- **SRA**: Contains raw sequencing data
- **Assembly**: Contains genome assembly information

**Common Biological Terms:**

- **Gene Symbol**: Short abbreviation (e.g., BRCA1, TP53, TNF)
- **Gene ID**: Unique NCBI identifier (e.g., 672 for BRCA1)
- **Accession**: Unique sequence identifier (e.g., NM_001126114.3)
- **N50/L50**: Assembly contiguity metrics (a larger N50 and a smaller L50 generally indicate a more contiguous assembly)
- **Reference Genome**: High-quality representative genome for a species
- **Organism**: Use scientific names (Homo sapiens) or common names (human)

**Search Strategies:**

- Use specific gene symbols for precise results
- Include organism names to avoid ambiguity
- Use Boolean operators (AND, OR, NOT) for complex searches
- Use field qualifiers like [Gene], [Organism], [Protein Name] for targeted searches

### High-Level Tools

#### Natural Language Query Processor

```
tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Please summarize the function of TNF-alpha"
  }
}
```

#### Gene Summarizer

```
tools/call
{
  "name": "summarize-gene",
  "arguments": {
    "gene_name": "BRCA1"
  }
}
```

#### Genome Statistics

```
tools/call
{
  "name": "genome-stats",
  "arguments": {
    "organism": "Escherichia coli"
  }
}
```

#### Dataset Counter

```
tools/call
{
  "name": "count-datasets",
  "arguments": {
    "database": "biosample",
    "query": "mouse melanoma b16f10"
  }
}
```

### Low-Level Tools

#### Search NCBI Databases

```
tools/call
{
  "name": "ncbi-search",
  "arguments": {
    "database": "pubmed",
    "term": "BRCA1",
    "filters": {
      "organism": "Homo sapiens",
      "date_range": {
        "start": "2020"
      }
    }
  }
}
```

#### Fetch NCBI Records

```
tools/call
{
  "name": "ncbi-fetch",
  "arguments": {
    "database": "gene",
    "ids": ["70"],
    "rettype": "gb"
  }
}
```

#### Get Gene Information

```
tools/call
{
  "name": "get_gene_info",
  "arguments": {
    "gene_id": "672"
  }
}
```

#### Get Genome Information

```
tools/call
{
  "name": "get_genome_info",
  "arguments": {
    "organism": "Homo sapiens",
    "reference": true
  }
}
```

## License

Apache-2.0
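
## Appendix: Sketch of Building a Test Request File

The Testing section says the test files contain JSON-RPC requests, one per line, but does not show the full envelope. The sketch below is one way such lines could be generated in Python. It assumes the standard JSON-RPC 2.0 framing that MCP uses for `tools/call` (`jsonrpc`, `id`, `method`, `params`). The helper names `make_tool_call` and `to_jsonl` are hypothetical and not part of this repository, and the bundled test files may differ in detail (for example, they may include an `initialize` request first).

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build one JSON-RPC 2.0 request for the MCP `tools/call` method.

    Assumption: the server accepts the standard MCP envelope shown here.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def to_jsonl(requests):
    """Serialize requests as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in requests)


# Tool names and arguments are taken from the examples in this README.
requests = [
    make_tool_call(1, "nlp-query", {"query": "Find research articles about BRCA1"}),
    make_tool_call(2, "summarize-gene", {"gene_name": "BRCA1"}),
]
jsonl_text = to_jsonl(requests)
print(jsonl_text, end="")
```

Saving the printed output to a file (say, a hypothetical `my_test.jsonl`) would let it be piped to the server in the same way as the manual testing command in the Testing section.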