# MCP Chat API Service
This project integrates MCP tools into an HTTP API service, allowing clients to interact with large language models and invoke various tools via API requests. Two API formats are supported: a Simplified API and an OpenAI-compatible API.
## Features
- Converts a command-line chat interface into an HTTP API service
- Supports the use of the MCP toolset
- Maintains context for multiple sessions
- Automatic retry mechanism and error handling
- Supports Cross-Origin Resource Sharing (CORS)
- **New**: Supports OpenAI-compatible API format
- **New**: Supports Streaming responses
## Installation
1. Clone the repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Create a `.env` file and set the following environment variables:
```
OPENAI_API_KEY=Your OpenAI API key
OPENAI_BASE_URL=https://api.openai.com/v1
DEFAULT_MODEL=gpt-3.5-turbo
PORT=8000
HOST=0.0.0.0
```
4. Ensure that the `servers_config.json` file is correctly configured with the required MCP servers
## Usage
### Start the server
```bash
python main.py
```
The server runs on `http://localhost:8000` by default.
### API Endpoints
#### Simplified API
##### GET /
Returns a simple welcome message.
##### POST /chat
Sends a chat message and gets a response.
Request body format:
```json
{
  "message": "Your question or message",
  "session_id": "Optional session ID"
}
```
If `session_id` is not provided, the server creates a new session.
Response format:
```json
{
  "response": "The large model's response",
  "session_id": "Session ID for subsequent requests"
}
```
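The session flow above can be sketched with a small Python helper that keeps the `session_id` across turns. This is a minimal sketch using only the standard library; the endpoint path and field names follow the formats shown above, and `BASE_URL` is assumed to be the default address:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust to your deployment

def build_chat_payload(message, session_id=None):
    """Build the request body for POST /chat; session_id is optional."""
    payload = {"message": message}
    if session_id is not None:
        payload["session_id"] = session_id
    return payload

def chat(message, session_id=None):
    """Send one message and return (response_text, session_id)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(build_chat_payload(message, session_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["response"], data["session_id"]

# First turn creates a session; pass the returned session_id to keep context:
#   answer, sid = chat("What tables are in the database?")
#   answer, _ = chat("Describe the first one.", session_id=sid)
```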
#### OpenAI-compatible API
##### GET /v1/models
Gets a list of available models.
Response format:
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-3.5-turbo",
      "object": "model",
      "created": 1677610602,
      "owned_by": "organization-owner"
    },
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1677610602,
      "owned_by": "organization-owner"
    }
  ]
}
```
##### POST /v1/chat/completions
Sends a chat message and gets a response, fully compatible with the OpenAI API format. Supports normal and streaming responses.
Request body format (set `stream` to `true` to enable a streaming response):
```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, introduce yourself."}
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": false
}
```
**Normal Response Format:**
```json
{
  "id": "chatcmpl-123abc456def",
  "object": "chat.completion",
  "created": 1677610602,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am an AI assistant..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 100,
    "total_tokens": 130
  }
}
```
**Streaming Response Format:**
When `stream=true` is set, the server returns a stream of Server-Sent Events (SSE):
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
... [More content chunks]
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
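A client consuming this stream extracts the assistant text by parsing each `data:` line and stopping at the `[DONE]` sentinel. A minimal parsing sketch (the event shapes follow the chunks shown above):

```python
import json

def iter_sse_content(lines):
    """Yield assistant content deltas from raw SSE 'data:' lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore comments and blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content") is not None:
            yield delta["content"]
```

In practice you would feed this generator the lines of the HTTP response body (e.g. via `httpx`'s `iter_lines()`), printing each yielded fragment as it arrives.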
### Client Examples
Two client examples are provided:
1. `client.py` - Command-line client using the Simplified API
2. `test_openai_client.py` - Test client using the OpenAI-compatible API, supporting normal and streaming responses
Run the client examples:
```bash
# Simplified API client
python client.py
# OpenAI-compatible API client
python test_openai_client.py
```
## Using OpenAI SDK
Since this service is compatible with the OpenAI API format, you can directly use the official OpenAI SDK or other third-party libraries to call this service. Just set the base_url to the address of this service:
### Normal Response Example
```python
from openai import OpenAI
# Specify base_url when creating the client
client = OpenAI(
    api_key="Any string, not actually used",
    base_url="http://localhost:8000/v1"
)

# Usage is exactly the same as calling the OpenAI API
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how is the weather today?"}
    ]
)
print(response.choices[0].message.content)
```
### Streaming Response Example
```python
from openai import OpenAI
# Specify base_url when creating the client
client = OpenAI(
    api_key="Any string, not actually used",
    base_url="http://localhost:8000/v1"
)

# Streaming response call
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell a story about artificial intelligence"}
    ],
    stream=True  # Enable streaming response
)

# Process the response chunk by chunk
print("AI response: ", end="")
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```
## Customization
- Modify `servers_config.json` to add or remove MCP servers
- Change the model or other configurations in the `.env` file
- Adjust the timeout settings and retry policies in `main.py`
## Notes
- `allow_origins` for CORS should be restricted in a production environment
- Consider adding an API authentication mechanism
- Session persistence can be implemented as needed
- Token counts are currently estimated and are not guaranteed to match OpenAI's tokenizer exactly
- Tool calls are not supported in streaming mode; if a tool call is detected, the service falls back to a normal (non-streaming) response
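As a rough illustration of what a token estimate might look like (this is a hypothetical heuristic for illustration only, not the service's actual implementation):

```python
def estimate_tokens(text):
    """Hypothetical heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)
```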
## Limitations of Streaming Responses
When using streaming responses, there are the following limitations:
1. MCP tool calls are not supported: if the model's output is detected to be a tool call (JSON format), the system automatically switches to non-streaming processing
2. Tool execution results are not streamed in real time; the service waits for the tool to finish and then returns the result all at once
3. A streaming response cannot be interrupted midway; the client must wait for it to finish
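The tool-call detection described above could look something like the following sketch. The `tool` key name is an assumption for illustration; the actual detection logic lives in `main.py`:

```python
import json

def looks_like_tool_call(text):
    """Sketch: treat a JSON object containing a 'tool' key as a tool call.
    The key name is assumed; check main.py for the real detection logic."""
    try:
        data = json.loads(text)
    except (ValueError, TypeError):
        return False  # not valid JSON, so it's plain assistant text
    return isinstance(data, dict) and "tool" in data
```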
## Environment Requirements
- Python 3.7+
- Dependencies:
- httpx
- python-dotenv
- mcp-sdk
- fastapi
- uvicorn
- pydantic
- requests
- sseclient-py
## Configuration
### 1. Environment Variable Configuration
Create a `.env` file and configure the following environment variables:
```env
# LLM API Configuration
OPENAI_API_KEY=Your API key
OPENAI_BASE_URL=https://api.openai.com/v1 # Optional, defaults to the official OpenAI address
DEFAULT_MODEL=gpt-3.5-turbo # Optional, defaults to gpt-3.5-turbo
PORT=8000
HOST=0.0.0.0
# JianShu Configuration (if needed)
JIANSHU_USER_ID=Your user ID
JIANSHU_COOKIES=Your Cookie string
```
### 2. Server Configuration
Edit the `servers_config.json` file to configure the servers to connect to:
```json
{
  "mcpServers": {
    "sqlite": {
      "command": "sqlite-server",
      "args": ["database.db"],
      "env": {
        "DB_PATH": "path/to/database.db"
      }
    },
    "jianshu": {
      "type": "sse",
      "url": "http://your-sse-server/sse"
    }
  }
}
```
Two types of servers are supported:
- Standard input/output server: requires specifying `command` and `args`
- SSE server: requires specifying `type: "sse"` and `url`
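A small validator for these two entry shapes, based only on the rules above (`server_kind` is a hypothetical helper for illustration, not part of the project):

```python
def server_kind(entry):
    """Classify a servers_config.json entry as 'sse' or 'stdio'."""
    if entry.get("type") == "sse":
        if "url" not in entry:
            raise ValueError("SSE server entries require a 'url'")
        return "sse"
    if "command" in entry:
        return "stdio"  # standard input/output server with 'command' and 'args'
    raise ValueError("entry needs either 'type': 'sse' or a 'command'")
```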
## Command-Line Usage
1. Ensure the configuration file is set correctly
2. Run the chatbot:
```bash
python main.py
```
3. Start the conversation:
- Enter a question or instruction
- The assistant automatically selects the appropriate tool to handle the request
- Enter "quit" or "exit" to exit the program
## Available Tools
### SQLite Tool
- `read_query`: Execute SELECT query
- `write_query`: Execute INSERT/UPDATE/DELETE query
- `create_table`: Create a new table
- `list_tables`: List all tables
- `describe_table`: Get table structure
- `append_insight`: Add business insight
## Log Level
Logging defaults to the INFO level. For debugging, change the log level in `main.py`:
```python
logging.basicConfig(
    level=logging.DEBUG,  # Change to DEBUG for more detailed logs
    format="%(asctime)s - %(levelname)s - %(message)s"
)
```
## Error Handling
- Tool execution failures are automatically retried (2 times by default)
- An empty response prompts the user to ask the question again
- Server connection failures will be logged and the program will exit
- Resources are automatically cleaned up when the program exits
## Development Notes
### Adding a New Tool
1. Implement the tool function on the server side
2. Add the server configuration in `servers_config.json`
3. The tool will be automatically discovered and integrated into the chatbot
### Custom Response Handling
You can customize the response processing logic by modifying the `process_llm_response` method.
### Session Management
The `ChatSession` class is responsible for managing the entire conversation process, including:
- Initializing server connections
- Processing user input
- Calling LLM to get a response
- Executing tool calls
- Cleaning up resources
## Additional Notes
1. Please ensure the API key is secure and do not submit it to the version control system
2. SSE servers need to support long connections
3. A large number of debugging logs may affect performance
4. Please check and update the dependency package version regularly
## Common Issues
1. If you encounter a connection error, please check:
- Network connection
- Whether the API key is correct
- Whether the server address is accessible
2. If the tool execution fails, please check:
- Whether the tool parameters are correct
- Server status
- Error messages in the log
3. If you receive an empty response, you can:
- Rephrase the question
- Check the API quota
- View detailed logs
## Contribution Guide
Issues and suggestions for improvement are welcome! Please:
1. Provide a clear problem description
2. Include the necessary log information
3. Explain the reproduction steps
## License
MIT License