# NCBI Model Context Protocol (MCP)
A Python implementation of the Model Context Protocol for interacting with NCBI databases.
## Setup
1. Clone this repository
2. Install dependencies:
```
pip install -r requirements.txt
```
3. Create a `.env` file with your NCBI API key and contact email:
```
NCBI_API_KEY=your_api_key_here
NCBI_EMAIL=your_email@example.com
```
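How `ncbi_mcp.py` reads these values is not shown here. As a minimal sketch, assuming a plain `KEY=VALUE` file, the two settings could be parsed and turned into the `api_key` and `email` request parameters that NCBI E-utilities accept. The helper names `parse_env` and `eutils_credentials` are hypothetical and not part of this repository.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def eutils_credentials(env):
    """Map the .env settings to E-utilities request parameters.

    Assumption: the server forwards NCBI_API_KEY as `api_key` and
    NCBI_EMAIL as `email`; empty or missing values are left out.
    """
    params = {}
    if env.get("NCBI_API_KEY"):
        params["api_key"] = env["NCBI_API_KEY"]
    if env.get("NCBI_EMAIL"):
        params["email"] = env["NCBI_EMAIL"]
    return params


example = "NCBI_API_KEY=your_api_key_here\nNCBI_EMAIL=your_email@example.com\n"
print(eutils_credentials(parse_env(example)))
```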
## Running the MCP Server
```
python ncbi_mcp.py
```
## Using with Cursor/Claude
Once the MCP server is running, you can interact with it using natural language in Cursor/Claude.
### Using Natural Language Queries
You can use natural language to perform searches and retrieve information:
```
tools/call
{
"name": "nlp-query",
"arguments": {
"query": "Find research articles about BRCA1"
}
}
```
Or more simply, just use the query directly:
```
@ncbi-mcp Find research articles about BRCA1
```
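MCP messages are JSON-RPC 2.0, so the `tools/call` payload shown above is the `params` part of a request whose `method` is `tools/call`. The sketch below builds such a request as one line of JSON, the shape used in the `.jsonl` test files. The helper name `make_tool_call` is hypothetical.

```python
import json


def make_tool_call(name, arguments, request_id=1):
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = make_tool_call(
    "nlp-query", {"query": "Find research articles about BRCA1"}
)
# One JSON object per line, ready to append to a .jsonl file.
print(json.dumps(request))
```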
### Example Natural Language Queries
Here are some example natural language queries you can try:
1. Gene function information:
```
@ncbi-mcp Please summarize the function of TNF-alpha
```
2. Genome size and statistics:
```
@ncbi-mcp How big is the genome for Saccharomyces cerevisiae?
```
3. Assembly statistics:
```
@ncbi-mcp What are the reported L50 and N50 statistics for the most recent E. coli genome?
```
4. Dataset counts:
```
@ncbi-mcp How many datasets are available in the biosample database for b16f10 mouse melanoma cells?
```
5. Search for scientific articles:
```
@ncbi-mcp Find the latest research on COVID-19 vaccines
```
6. Get gene information:
```
@ncbi-mcp Tell me about the BRCA1 gene
```
7. Fetch genome information:
```
@ncbi-mcp Get genome information for Homo sapiens
```
## Testing
To test the MCP server with various queries, you can use the included test files:
```
# Test natural language query functionality (default)
.\run_test.bat
# Test all tools
.\run_test.bat all
# Test specific test file
.\run_test.bat test_all_tools.jsonl
# Test high-level tools
.\run_test.bat test_high_level_tools.jsonl
```
The test script will:
1. Start the MCP server in the background
2. Send test requests from the specified file
3. Wait for a few seconds to allow processing
4. Terminate the server and display the output
This approach is used because the MCP server is designed to run continuously as a service. For manual testing without automatic termination, you can use:
```
# Run manually with any test file
type test_nlp_query.jsonl | python ncbi_mcp.py
```
The test files contain example JSON-RPC requests that simulate how Cursor/Claude would interact with the MCP server.
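The four steps the test script performs can be sketched in Python for platforms where the batch file is not available. This is an illustration under stated assumptions, not a port of `run_test.bat`: the function name `run_requests` and the default wait time are hypothetical, and the server is assumed to read one JSON-RPC request per line from standard input, as the `type ... | python ncbi_mcp.py` command suggests.

```python
import subprocess
import sys
import time


def run_requests(server_cmd, request_lines, wait_seconds=3.0):
    """Start a server, send it requests on stdin, wait, then stop it.

    Returns everything the server wrote to stdout and stderr.
    """
    # 1. Start the server in the background.
    proc = subprocess.Popen(
        server_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # 2. Send the test requests, one per line.
    for line in request_lines:
        proc.stdin.write(line.rstrip("\n") + "\n")
    proc.stdin.flush()
    # 3. Give the server a few seconds to process them.
    time.sleep(wait_seconds)
    # 4. Terminate the server and collect its output.
    proc.terminate()
    try:
        output, _ = proc.communicate(timeout=5)
    except subprocess.TimeoutExpired:
        proc.kill()
        output, _ = proc.communicate()
    return output
```

A call might look like `run_requests([sys.executable, "ncbi_mcp.py"], open("test_nlp_query.jsonl"))`, with the returned text printed or inspected afterwards.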
## Available Tools
The NCBI MCP provides both high-level tools that understand natural language and low-level tools for direct database interaction.
## Tool Usage Guidelines for LLMs
### Recommended Workflow Patterns
**For most biological queries, start with `nlp-query`** - it handles complex questions and automatically routes them to the appropriate specialized tools.
**Common Research Workflows:**
1. **Gene Analysis Workflow:**
- Start with `nlp-query` for general gene questions
- Use `summarize-gene` for comprehensive gene information
- Use `get_gene_info` for detailed structured data
- Use `ncbi-search` + `ncbi-fetch` for specific database queries
2. **Genome Analysis Workflow:**
- Use `genome-stats` for organism genome statistics
- Use `get_genome_info` for detailed genome metadata
- Use `count-datasets` to explore available genome assemblies
3. **Literature Research Workflow:**
- Use `nlp-query` for natural language literature searches
- Use `ncbi-search` with database="pubmed" for precise searches
- Use `ncbi-fetch` to get full publication details
4. **Dataset Discovery Workflow:**
- Use `count-datasets` to assess data availability
- Use `nlp-query` to explore datasets with natural language
- Use `ncbi-search` for systematic database exploration
5. **E-utilities Workflow (Advanced):**
- Use `ncbi-info` to discover available databases
- Use `ncbi-global-query` to see which databases contain your search term
- Use `ncbi-search` to find specific UIDs in target databases
- Use `ncbi-summary` to get overview information about records
- Use `ncbi-fetch` to retrieve complete records
- Use `ncbi-link` to find related records across databases
6. **Cross-Database Analysis Workflow:**
- Use `ncbi-search` to find genes of interest
- Use `ncbi-link` to find related proteins, structures, or literature
- Use `ncbi-summary` to get metadata about related records
- Use `ncbi-fetch` to retrieve detailed information
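As an illustration, the Gene Analysis Workflow above can be written as an ordered list of `tools/call` payloads. The sketch uses only argument names that appear in the examples in this README; the function name `gene_analysis_calls` is hypothetical, and in practice the IDs passed to `ncbi-fetch` would come from the preceding `ncbi-search` result rather than being known in advance.

```python
def gene_analysis_calls(gene_symbol, gene_id):
    """Return the Gene Analysis Workflow as a list of tool-call payloads."""
    return [
        # Start with a general natural language question.
        {"name": "nlp-query",
         "arguments": {"query": f"Tell me about the {gene_symbol} gene"}},
        # Comprehensive gene summary.
        {"name": "summarize-gene", "arguments": {"gene_name": gene_symbol}},
        # Detailed structured data for a known Gene ID.
        {"name": "get_gene_info", "arguments": {"gene_id": gene_id}},
        # Specific database query: search, then fetch the matching records.
        {"name": "ncbi-search",
         "arguments": {"database": "gene", "term": gene_symbol}},
        {"name": "ncbi-fetch",
         "arguments": {"database": "gene", "ids": [gene_id]}},
    ]


for call in gene_analysis_calls("BRCA1", "672"):
    print(call["name"], call["arguments"])
```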
### Tool Selection Guide
**High-Level Tools (Recommended for most users):**
- **`nlp-query`**: Use for general biological questions, complex queries, and when you're unsure which tool to use
- **`summarize-gene`**: Use for comprehensive gene analysis and understanding gene function
- **`genome-stats`**: Use for genome size, assembly quality, and organism comparison
- **`count-datasets`**: Use for research planning and data availability assessment
- **`get_gene_info`**: Use for detailed, structured gene information
- **`get_genome_info`**: Use for detailed, structured genome information
**Low-Level E-utilities Tools (For advanced users):**
- **`ncbi-search` (ESearch)**: Use for precise database searches with specific filters, Boolean operators, and field qualifiers
- **`ncbi-fetch` (EFetch)**: Use to retrieve complete records after searching, supports multiple formats (GenBank, FASTA, XML)
- **`ncbi-summary` (ESummary)**: Use to get document summaries without fetching complete records
- **`ncbi-link` (ELink)**: Use to find related records across databases (e.g., gene to protein, protein to structure)
- **`ncbi-info` (EInfo)**: Use to discover available databases and their capabilities
- **`ncbi-global-query` (EGQuery)**: Use to search across all databases simultaneously
- **`ncbi-spell` (ESpell)**: Use to get spelling suggestions for search terms
- **`ncbi-citation-match` (ECitMatch)**: Use to find PMIDs from citation information
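Each low-level tool is named after an NCBI E-utility, and each E-utility has its own endpoint under the public E-utilities base URL. The lookup below pairs the tool names from this list with those endpoints. Whether the server calls these URLs in exactly this way is an assumption, and the helper name `endpoint_url` is hypothetical.

```python
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Tool name (from this README) -> E-utility endpoint file name.
TOOL_TO_ENDPOINT = {
    "ncbi-search": "esearch.fcgi",
    "ncbi-fetch": "efetch.fcgi",
    "ncbi-summary": "esummary.fcgi",
    "ncbi-link": "elink.fcgi",
    "ncbi-info": "einfo.fcgi",
    "ncbi-global-query": "egquery.fcgi",
    "ncbi-spell": "espell.fcgi",
    "ncbi-citation-match": "ecitmatch.cgi",
}


def endpoint_url(tool_name):
    """Return the E-utilities URL that corresponds to a low-level tool."""
    if tool_name not in TOOL_TO_ENDPOINT:
        raise ValueError(f"{tool_name!r} is not a low-level E-utilities tool")
    return EUTILS_BASE + TOOL_TO_ENDPOINT[tool_name]


print(endpoint_url("ncbi-search"))
```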
### Biological Context and Terminology
**Understanding NCBI Databases:**
- **Gene**: Contains gene records with symbols, names, functions, and genomic locations
- **Protein**: Contains protein sequences and annotations
- **Nucleotide**: Contains DNA/RNA sequences (genes, transcripts, genomic regions)
- **PubMed**: Contains scientific literature and publications
- **BioSample**: Contains biological sample metadata (tissues, cell lines, etc.)
- **BioProject**: Contains research project information
- **SRA**: Contains raw sequencing data
- **Assembly**: Contains genome assembly information
**Common Biological Terms:**
- **Gene Symbol**: Short abbreviation (e.g., BRCA1, TP53, TNF)
- **Gene ID**: Unique NCBI identifier (e.g., 672 for BRCA1)
- **Accession**: Unique sequence identifier (e.g., NM_001126114.3)
- **N50/L50**: Assembly contiguity metrics (a larger N50 and a smaller L50 generally indicate a more contiguous assembly)
- **Reference Genome**: High-quality representative genome for a species
- **Organism**: Use scientific names (Homo sapiens) or common names (human)
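To make the N50/L50 entry concrete: sort the contigs from longest to shortest and add up their lengths until the running total reaches half of the assembly size. N50 is the length of the contig at which that happens, and L50 is how many contigs it took. The sketch below follows this standard definition; the function name `n50_l50` is hypothetical and the contig lengths are made-up illustration values.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # `length` is the shortest contig needed to cover half the assembly.
            return length, count


# Total length is 290, so half is 145; 80 + 70 = 150 reaches it.
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```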
**Search Strategies:**
- Use specific gene symbols for precise results
- Include organism names to avoid ambiguity
- Use Boolean operators (AND, OR, NOT) for complex searches
- Use field qualifiers like [Gene], [Organism], [Protein Name] for targeted searches
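The strategies above can be combined in a single search term. The sketch below joins field-qualified clauses with a Boolean operator, quoting multi-word values so they are treated as a phrase. The function names are hypothetical, and the qualifiers are the ones listed above; which qualifiers a given database accepts depends on that database.

```python
def qualified(value, field):
    """Format one clause, e.g. BRCA1[Gene] or "Homo sapiens"[Organism]."""
    text = f'"{value}"' if " " in value else value
    return f"{text}[{field}]"


def build_term(clauses, operator="AND"):
    """Join (value, field) clauses with a Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR, or NOT")
    return f" {operator} ".join(qualified(value, field) for value, field in clauses)


term = build_term([("BRCA1", "Gene"), ("Homo sapiens", "Organism")])
print(term)  # BRCA1[Gene] AND "Homo sapiens"[Organism]
```

The resulting string can be passed as the `term` argument of `ncbi-search`.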
### High-Level Tools
#### Natural Language Query Processor
```
tools/call
{
"name": "nlp-query",
"arguments": {
"query": "Please summarize the function of TNF-alpha"
}
}
```
#### Gene Summarizer
```
tools/call
{
"name": "summarize-gene",
"arguments": {
"gene_name": "BRCA1"
}
}
```
#### Genome Statistics
```
tools/call
{
"name": "genome-stats",
"arguments": {
"organism": "Escherichia coli"
}
}
```
#### Dataset Counter
```
tools/call
{
"name": "count-datasets",
"arguments": {
"database": "biosample",
"query": "mouse melanoma b16f10"
}
}
```
### Low-Level Tools
#### Search NCBI Databases
```
tools/call
{
"name": "ncbi-search",
"arguments": {
"database": "pubmed",
"term": "BRCA1",
"filters": {
"organism": "Homo sapiens",
"date_range": {
"start": "2020"
}
}
}
}
```
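How the server turns `filters` into an actual query is not documented here. One plausible translation, offered only as a sketch, maps the arguments above onto ESearch request parameters: `db`, `term`, and the `datetype`, `mindate`, and `maxdate` date parameters. Appending the organism as an `[Organism]` clause, using publication date (`pdat`), and defaulting an open-ended range to `3000` are all assumptions, as is the function name `esearch_params`.

```python
def esearch_params(arguments):
    """Translate ncbi-search arguments into a dict of ESearch parameters."""
    term = arguments["term"]
    filters = arguments.get("filters", {})

    # Assumption: the organism filter becomes an extra field-qualified clause.
    organism = filters.get("organism")
    if organism:
        term = f'({term}) AND "{organism}"[Organism]'

    params = {"db": arguments["database"], "term": term}

    # ESearch expects mindate and maxdate together, so an open-ended
    # range is closed with a far-future year (an assumption).
    date_range = filters.get("date_range")
    if date_range:
        params["datetype"] = "pdat"
        params["mindate"] = date_range["start"]
        params["maxdate"] = date_range.get("end", "3000")
    return params


print(esearch_params({
    "database": "pubmed",
    "term": "BRCA1",
    "filters": {"organism": "Homo sapiens", "date_range": {"start": "2020"}},
}))
```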
#### Fetch NCBI Records
```
tools/call
{
"name": "ncbi-fetch",
"arguments": {
"database": "gene",
"ids": ["70"],
"rettype": "gb"
}
}
```
#### Get Gene Information
```
tools/call
{
"name": "get_gene_info",
"arguments": {
"gene_id": "672"
}
}
```
#### Get Genome Information
```
tools/call
{
"name": "get_genome_info",
"arguments": {
"organism": "Homo sapiens",
"reference": true
}
}
```
## License
Apache-2.0