# MCP Chat API Service
This project wraps the MCP toolkit in an HTTP API service, letting you interact with large language models and invoke various tools through API requests. Two API formats are currently supported: a Simplified API and an OpenAI-compatible API.
## Features
- Converts command-line chat interface into an HTTP API service
- Supports the use of MCP tools
- Maintains context for multiple sessions
- Automatic retry mechanism and error handling
- Supports Cross-Origin Resource Sharing (CORS)
- **New**: Supports OpenAI-compatible API format
- **New**: Supports streaming responses
## Installation
1. Clone the repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Create a `.env` file and set the following environment variables:
```
OPENAI_API_KEY=your OpenAI API key
OPENAI_BASE_URL=https://api.openai.com/v1
DEFAULT_MODEL=gpt-3.5-turbo
PORT=8000
HOST=0.0.0.0
```
4. Ensure that the `servers_config.json` file is correctly configured with the required MCP servers
## Usage
### Start the Server
```bash
python main.py
```
The server runs at `http://localhost:8000` by default.
### API Endpoints
#### Simplified API
##### GET /
Returns a simple welcome message.
##### POST /chat
Sends a chat message and receives a reply.
Request body format:
```json
{
  "message": "your question or message",
  "session_id": "optional session ID"
}
```
If `session_id` is not provided, the server will create a new session.
Response format:
```json
{
  "response": "reply from the large model",
  "session_id": "session ID for subsequent requests"
}
```
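The `/chat` flow above can be sketched with the standard library; the endpoint path and field names come from this README, while the helper functions themselves are only illustrative:

```python
import json
import urllib.request
from typing import Optional, Tuple

BASE_URL = "http://localhost:8000"  # default host/port from the .env example

def build_chat_payload(message: str, session_id: Optional[str] = None) -> dict:
    """Build a /chat request body; omit session_id to start a new session."""
    payload = {"message": message}
    if session_id is not None:
        payload["session_id"] = session_id
    return payload

def chat(message: str, session_id: Optional[str] = None) -> Tuple[str, str]:
    """POST to /chat and return (reply, session_id) so context can be reused."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(build_chat_payload(message, session_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return data["response"], data["session_id"]
```

Pass the `session_id` returned by the first call into subsequent calls to keep the conversation context.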
#### OpenAI-Compatible API
##### GET /v1/models
Retrieves a list of available models.
Response format:
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-3.5-turbo",
      "object": "model",
      "created": 1677610602,
      "owned_by": "organization-owner"
    },
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1677610602,
      "owned_by": "organization-owner"
    }
  ]
}
```
##### POST /v1/chat/completions
Sends a chat message and receives a reply, fully compatible with OpenAI API format. Supports both standard and streaming responses.
Request body format:
```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, please introduce yourself."}
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": false
}
```
Set `"stream": true` to enable streaming responses.
**Standard response format:**
```json
{
  "id": "chatcmpl-123abc456def",
  "object": "chat.completion",
  "created": 1677610602,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am an AI assistant..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 100,
    "total_tokens": 130
  }
}
```
**Streaming response format:**
When using the `stream=true` parameter, the server will return a series of SSE (Server-Sent Events) events:
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
... [more content chunks]
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
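Each event in the stream above is a `data:`-prefixed line, terminated by a `data: [DONE]` sentinel. A minimal parser for this format might look like the following (the function names are illustrative, not part of the service):

```python
import json
from typing import Iterable, Iterator

def parse_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chat.completion.chunk objects from raw SSE lines.

    Skips blank keep-alive lines and stops at the 'data: [DONE]' sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

def delta_text(chunk: dict) -> str:
    """Extract the incremental content from a chunk, if any."""
    return chunk["choices"][0]["delta"].get("content", "")
```

Feeding the example stream above through `parse_sse_chunks` yields one dict per `chat.completion.chunk`; `delta_text` returns an empty string for the initial role-only chunk and the content fragment for the rest.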
### Client Examples
Two client examples are provided:
1. `client.py` - Command-line client using the Simplified API
2. `test_openai_client.py` - Test client using the OpenAI-compatible API, supporting both standard and streaming responses
Run client examples:
```bash
# Simplified API client
python client.py
# OpenAI-compatible API client
python test_openai_client.py
```
## Using OpenAI SDK
Since this service is compatible with OpenAI's API format, you can use the official OpenAI SDK or other third-party libraries to call it directly. Just set `base_url` to this service's address:
### Standard Response Example
```python
from openai import OpenAI

# Specify base_url when creating the client
client = OpenAI(
    api_key="any string, not actually used",
    base_url="http://localhost:8000/v1"
)

# Usage is identical to calling the OpenAI API
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how is the weather today?"}
    ]
)
print(response.choices[0].message.content)
```
### Streaming Response Example
```python
from openai import OpenAI

# Specify base_url when creating the client
client = OpenAI(
    api_key="any string, not actually used",
    base_url="http://localhost:8000/v1"
)

# Streaming response call
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell a story about artificial intelligence"}
    ],
    stream=True  # Enable streaming responses
)

# Process the response chunk by chunk
print("AI reply: ", end="")
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```
## Customization
- Modify `servers_config.json` to add or remove MCP servers
- Change models or other configurations in the `.env` file
- Adjust timeout settings and retry strategies in `main.py`
## Notes
- In production environments, CORS `allow_origins` should be restricted
- Consider adding API authentication mechanisms
- Persistent storage for sessions can be implemented as needed
- Currently, token counting is estimated and may not match OpenAI's calculations exactly
- Tool calls are not supported in streaming mode; if a tool call is detected, the service falls back to a standard (non-streaming) response
## Limitations of Streaming Responses
When using streaming responses, the following limitations apply:
1. MCP tool calls are not supported - If the model returns content indicating a tool call (in JSON format), the system will automatically switch to non-streaming mode for processing
2. Tool execution results will not be returned in real-time; they will be returned all at once after the tool execution is complete
3. Streaming responses cannot be interrupted; you must wait for the complete response to finish
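The automatic fallback in point 1 implies some check on the model's output. A rough sketch of such a check follows; the key names (`tool`, `arguments`) and the heuristic itself are assumptions for illustration, not the actual detection logic in `main.py`:

```python
import json

def looks_like_tool_call(text: str) -> bool:
    """Heuristically detect a JSON tool-call payload in model output.

    Assumes (for illustration) that a tool call is a JSON object with
    'tool' and 'arguments' keys; the real implementation may differ.
    """
    stripped = text.strip()
    if not (stripped.startswith("{") and stripped.endswith("}")):
        return False
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "tool" in data and "arguments" in data
```

When such a payload is detected, the service abandons the stream, executes the tool, and returns the final answer in one non-streaming response.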
## Environment Requirements
- Python 3.7+
- Dependencies:
- httpx
- python-dotenv
- mcp-sdk
- fastapi
- uvicorn
- pydantic
- requests
- sseclient-py
## Configuration
### 1. Environment Variable Configuration
Create a `.env` file and configure the following environment variables:
```env
# LLM API configuration
OPENAI_API_KEY=your API key
OPENAI_BASE_URL=https://api.openai.com/v1 # Optional, defaults to OpenAI official address
DEFAULT_MODEL=gpt-3.5-turbo # Optional, defaults to gpt-3.5-turbo
PORT=8000
HOST=0.0.0.0
# Jianshu configuration (if needed)
JIANSHU_USER_ID=your user ID
JIANSHU_COOKIES=your cookie string
```
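Loading these variables with their documented defaults can be sketched as follows; `python-dotenv`'s `load_dotenv()` would populate `os.environ` from the `.env` file first, and the helper function here is illustrative rather than the service's actual code:

```python
import os

def load_config() -> dict:
    """Read service settings from the environment, using the defaults
    documented above for the optional variables."""
    return {
        "api_key": os.getenv("OPENAI_API_KEY", ""),
        "base_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "model": os.getenv("DEFAULT_MODEL", "gpt-3.5-turbo"),
        "port": int(os.getenv("PORT", "8000")),
        "host": os.getenv("HOST", "0.0.0.0"),
    }
```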
### 2. Server Configuration
Edit the `servers_config.json` file to configure the servers you need to connect to:
```json
{
  "mcpServers": {
    "sqlite": {
      "command": "sqlite-server",
      "args": ["database.db"],
      "env": {
        "DB_PATH": "path/to/database.db"
      }
    },
    "jianshu": {
      "type": "sse",
      "url": "http://your-sse-server/sse"
    }
  }
}
```
Two types of servers are supported:
- Standard input/output server: Requires specifying `command` and `args`
- SSE server: Requires specifying `type: "sse"` and `url`
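Given that distinction, a config loader can tell the two server types apart by checking for `"type": "sse"`. A sketch (the function is illustrative; the real loader in `main.py` may differ):

```python
from typing import Dict, Tuple

def split_servers(config: dict) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Split mcpServers entries into (stdio_servers, sse_servers).

    Entries with "type": "sse" are SSE servers; everything else is
    treated as a standard input/output server with command/args.
    """
    stdio, sse = {}, {}
    for name, spec in config.get("mcpServers", {}).items():
        if spec.get("type") == "sse":
            sse[name] = spec
        else:
            stdio[name] = spec
    return stdio, sse
```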
## Command-Line Usage
1. Ensure the configuration file is correctly set
2. Run the chatbot:
```bash
python main.py
```
3. Start the conversation:
- Enter questions or commands
- The chatbot automatically selects the appropriate tool to handle each request
- Enter "quit" or "exit" to exit the program
## Available Tools
### SQLite Tools
- `read_query`: Execute SELECT queries
- `write_query`: Execute INSERT/UPDATE/DELETE queries
- `create_table`: Create a new table
- `list_tables`: List all tables
- `describe_table`: Get table structure
- `append_insight`: Add business insights
## Log Level
The default log level is INFO. For debugging, you can change the log level in `main.py`:
```python
logging.basicConfig(
    level=logging.DEBUG,  # Change to DEBUG for more detailed logs
    format="%(asctime)s - %(levelname)s - %(message)s"
)
```
## Error Handling
- Tool execution failures will automatically retry (default 2 times)
- Empty responses prompt the user to rephrase the question
- Server connection failures will log errors and exit
- Resources will be automatically cleaned up upon program exit
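The retry behavior described above can be sketched as a small helper; the default of 2 retries matches the note, while the helper itself is illustrative rather than the actual code in `main.py`:

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], retries: int = 2, delay: float = 1.0) -> T:
    """Run fn, retrying up to `retries` times on failure before re-raising."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries:
                raise  # retries exhausted, propagate the last error
            logging.warning(
                "Tool call failed (%s), retry %d/%d", exc, attempt + 1, retries
            )
            time.sleep(delay)
    raise AssertionError("unreachable")
```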
## Development Instructions
### Adding New Tools
1. Implement tool functionality on the server side
2. Add server configuration in `servers_config.json`
3. Tools will be automatically discovered and integrated into the chatbot
### Custom Response Handling
You can customize the response handling logic by modifying the `process_llm_response` method.
### Session Management
The `ChatSession` class is responsible for managing the entire conversation process, including:
- Initializing server connections
- Handling user input
- Calling LLM for responses
- Executing tool calls
- Cleaning up resources
## Additional Notes
1. Please ensure API keys are kept secure and not submitted to version control systems
2. SSE servers need to support long connections
3. Extensive debug logs may impact performance
4. Please regularly check and update dependency versions
## Frequently Asked Questions
1. If you encounter connection errors, please check:
- Network connection
- Whether the API key is correct
- Whether the server address is accessible
2. If tool execution fails, please check:
- Whether the tool parameters are correct
- Server status
- Error messages in the logs
3. If you receive an empty response, you can:
- Rephrase the question
- Check API quotas
- Review detailed logs
## Contribution Guidelines
Contributions and suggestions for improvements are welcome! Please ensure:
1. Provide a clear description of the issue
2. Include necessary log information
3. Describe the steps to reproduce
## License
MIT License