# Android MCP (Model Context Protocol) Framework
This is an Android device-control and application self-learning framework built on the Model Context Protocol (MCP). It can control Android devices through natural language commands and automatically learn how to operate the applications installed on the device.
## Project Overview
The framework mainly includes the following functional modules:
1. **MCP Core Protocol**: Defines the basic data structures and operation types for device control.
2. **Device Communication Interface**: Provides TCP communication capabilities with Android devices.
3. **Application Self-Learning Engine**: Capable of automatically learning and memorizing the UI structure and operation methods of Android applications.
4. **Deep Application Exploration**: Offers deeper application UI exploration and element detection functionalities.
5. **Natural Language Understanding**: Parses user commands through AI models and converts them into device operation sequences.
6. **HTTP API**: Provides a RESTful API interface for easy integration into other systems.
## System Architecture
```
                +----------------+
                |   HTTP APIs    |
                +-------+--------+
                        |
+-----------------------v------------------------+
|                Model Interface                 |
|  (Natural Language Understanding and           |
|   Action Generation)                           |
+-----------------------+------------------------+
                        |
+-------------+  +------v-------+  +--------------+
| App Learner |<-| MCP Context  |->| MCP Protocol |
+------^------+  +------+-------+  +--------------+
       |                |
       |       +--------v---------+
       +------>| App Deep Explorer|
               +------------------+
                        |
               +--------v---------+
               |    MCP Server    |
               +--------+---------+
                        |
               +--------v---------+
               |  Android Device  |
               +------------------+
```
## Tech Stack
- **Backend**: Python, Flask
- **AI Model**: Any model compatible with the OpenAI SDK
- **Communication Protocol**: Custom TCP protocol, HTTP RESTful API
- **Storage**: JSON files (application knowledge base)
## Installation Requirements
- Python 3.8+
- Flask
- OpenAI Python SDK (any OpenAI-SDK-compatible model works)
- Android device or emulator (client application must be installed)
## Installation Steps
1. Clone the repository to your local machine:
```bash
git clone https://github.com/lmee/mcp_for_android.git
cd mcp_for_android
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
```
3. Install the dependencies:
```bash
pip install -r requirements.txt
```
4. Configure the API key:
Update the API key for the model being used in `main.py`, or set the environment variable:
```bash
export DEEPSEEK_API_KEY="your-api-key-here"
```
## Running the Server
Start the MCP Server and Flask application:
```bash
python main.py
```
The server will run on the following ports:
- MCP TCP Server: 8080
- HTTP API Server: 5000
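Once the server is up, a quick way to confirm the HTTP side is alive is to poll the status endpoint; a minimal sketch, assuming the default port above:

```python
import requests

STATUS_URL = "http://localhost:5000/status"  # default HTTP API port

try:
    # /status returns the current system state as JSON
    status = requests.get(STATUS_URL, timeout=5).json()
    print(status)
except requests.exceptions.RequestException:
    print("HTTP API not reachable -- is `python main.py` running?")
```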
## API Interface Description
### Device Registration
```
POST /register_device
{
  "device_id": "your-device-id",
  "capabilities": ["click", "swipe", "type_text", ...]
}
```
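In Python, the registration call can be sketched as follows; the `device_id` and capability list are placeholders, so use the values your Android client actually reports:

```python
import requests

# Example payload -- device_id and capabilities are placeholders.
payload = {
    "device_id": "my-android-phone",
    "capabilities": ["click", "swipe", "type_text"],
}

try:
    resp = requests.post("http://localhost:5000/register_device",
                         json=payload, timeout=5)
    print(resp.json())
except requests.exceptions.RequestException:
    print("MCP server not reachable -- start it with `python main.py`")
```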
### Execute Command
```
POST /execute
{
  "device_id": "your-device-id",
  "command": "Open WeChat and send a message to Zhang San",
  "session_id": "optional-session-id"
}
```
### Learn an Application
```
POST /learn_app
{
  "device_id": "your-device-id",
  "package_name": "com.example.app"  // Optional; if omitted, all applications are learned
}
```
### Text Analysis
```
POST /analyze
{
  "text": "Open WeChat to send a message",
  "device_id": "your-device-id"  // Optional
}
```
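Analysis without execution can be exercised the same way; a short sketch where the text is just an example command:

```python
import requests

payload = {"text": "Open WeChat to send a message"}  # device_id is optional

try:
    resp = requests.post("http://localhost:5000/analyze",
                         json=payload, timeout=5)
    print(resp.json())  # the parsed action sequence for the command
except requests.exceptions.RequestException:
    print("server not reachable")
```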
### Get System Status
```
GET /status
```
## Client Setup
To use this framework, you need to install the accompanying MCP client application on your Android device. The client is responsible for:
1. Connecting to the MCP server
2. Receiving and executing operation commands
3. Providing device UI status information
4. Supporting application exploration and learning
The client installation steps will be provided in a separate document.
## Application Self-Learning Feature
One of the core features of this framework is the ability to automatically learn the application operation methods on the device. The learning process includes:
1. **Application Discovery**: Scanning the applications installed on the device
2. **UI Exploration**: Launching the application and exploring its UI structure
3. **Element Recognition**: Identifying key UI elements in the application (buttons, input fields, etc.)
4. **Operation Learning**: Learning common operations (searching, playing, navigating, etc.)
5. **Knowledge Storage**: Saving the learned knowledge to the application knowledge base
Once the learning is complete, the system can automatically execute the corresponding application operations based on the user's natural language instructions.
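The exact schema of the knowledge base is defined by `app_learner.py`; purely as an illustration of the idea, a learned entry might be stored along these lines (all field names below are hypothetical, not the actual schema):

```python
import json

# Hypothetical knowledge-base entry -- field names are illustrative only.
entry = {
    "package_name": "com.tencent.mm",
    "app_label": "WeChat",
    "elements": [
        {"id": "search_button", "type": "button", "bounds": [960, 80, 1040, 160]},
    ],
    "operations": {
        "search": ["click search_button", "type_text <query>"],
    },
}

# Persist the entry to a JSON knowledge base (path is an example).
with open("knowledge_base.json", "w", encoding="utf-8") as f:
    json.dump(entry, f, ensure_ascii=False, indent=2)

print(json.dumps(entry["operations"], indent=2))
```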
## Example Usage
### 1. Control Devices Using Natural Language
```python
import requests

api_url = "http://localhost:5000/execute"
data = {
    "device_id": "my-android-phone",
    "command": "Open WeChat and send 'Hello' to Zhang San"
}
response = requests.post(api_url, json=data)
print(response.json())
```
### 2. Learning Specific Applications
```python
import requests

api_url = "http://localhost:5000/learn_app"
data = {
    "device_id": "my-android-phone",
    "package_name": "com.tencent.mm"  # WeChat package name
}
response = requests.post(api_url, json=data)
print(response.json())
```
## Deep Application Exploration
In addition to basic application learning features, the system also provides deep application exploration capabilities, which can:
1. Wait for the application to fully load
2. Detect more types of UI elements
3. Support hierarchical exploration, allowing access to more screens by clicking on key elements
4. Generate a more complete application knowledge graph
Through deep exploration, the system can acquire a more comprehensive understanding of the application, enhancing control accuracy.
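Hierarchical exploration can be pictured as a breadth-first traversal over screens, following clickable elements to reach new ones. The real implementation lives in `app_deep_explorer.py` and drives an actual device; the toy graph below is hard-coded purely to illustrate the traversal:

```python
from collections import deque

# Toy screen graph: screen -> {clickable element: destination screen}.
# In the real system these transitions are discovered by clicking
# elements on the device; here they are hard-coded for illustration.
SCREENS = {
    "home": {"search_btn": "search", "profile_btn": "profile"},
    "search": {"back_btn": "home"},
    "profile": {"settings_btn": "settings", "back_btn": "home"},
    "settings": {"back_btn": "profile"},
}

def explore(start):
    """Breadth-first walk over reachable screens, as deep exploration does."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        screen = queue.popleft()
        order.append(screen)
        for element, target in SCREENS.get(screen, {}).items():
            if target not in seen:   # skip screens already visited
                seen.add(target)
                queue.append(target)
    return order

print(explore("home"))  # visits every screen reachable from "home"
```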
## Project Structure
```
android-mcp-framework/
├── mcp/
│   ├── mcp_protocol.py       # Protocol definition
│   ├── mcp_interface.py      # MCP server implementation
│   ├── model_interface.py    # Model interface
│   └── route_handler.py      # API route handling
├── app_learn/
│   ├── app_learner.py        # Application learning engine
│   └── app_deep_explorer.py  # Deep exploration module
├── main.py                   # Main program
├── requirements.txt          # Dependency package list
└── README.md                 # Project description
```
## Notes
1. This project provides ideas and reference implementations; the code is sample code and may not work out of the box.
2. This framework requires a compatible Android client to function.
3. Some applications may have anti-scraping or security mechanisms that could restrict automated operations.
4. A valid API key for the configured model (DeepSeek by default) is required.
5. The application knowledge base will grow with learning, so ensure there is sufficient storage space.
## Issues to be Resolved
1. The accuracy of application learning still needs improvement.
2. With a large number of prompts, model inference becomes very slow.
## Contribution Guidelines
We welcome code contributions or questions! Please follow these steps:
1. Fork the project
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.