<!-- markdownlint-disable MD033 MD041 MD024 -->
<p align="center">
<img alt="LOGO" src="https://cdn.jsdelivr.net/gh/MaaAssistantArknights/design@main/logo/maa-logo_512x512.png" width="256" height="256" />
</p>
<div align="center">

# MaaMCP

[MaaFramework](https://github.com/MaaXYZ/MaaFramework) · [PyPI](https://pypi.org/project/maa-mcp/)

An MCP server based on [MaaFramework](https://github.com/MaaXYZ/MaaFramework) that gives AI assistants automation capabilities for Android devices and Windows desktops.

[English](README_EN.md) | 中文
</div>
---
## Introduction
MaaMCP is an MCP server that exposes MaaFramework's automation capabilities to AI assistants (such as Claude) through the standard MCP interface. With it, an AI assistant can:
- 🤖 **Android Automation** - Connect to and control Android devices/emulators via ADB
- 🖥️ **Windows Automation** - Control Windows desktop applications
- 🎯 **Background Operations** - On Windows, screen capture and input run in the background without taking over the mouse and keyboard, so you can keep using your computer for other tasks
- 🔗 **Multi-Device Collaboration** - Control multiple devices/windows simultaneously to achieve cross-device automation
- 👁️ **Intelligent Recognition** - Use OCR (Optical Character Recognition) to identify text content on the screen
- 🎯 **Precise Operations** - Execute clicks, swipes, text input, key presses, and other actions
- 📸 **Screenshots** - Capture real-time screenshots for visual analysis
Talk is cheap; see the **[🎞️ Bilibili Video Demonstration](https://www.bilibili.com/video/BV1eGmhBaEZz/)**.
## Features
### 🔍 Device Discovery and Connection
- `find_adb_device_list` - Scan for available ADB devices
- `find_window_list` - Scan for available desktop windows on Windows
- `connect_adb_device` - Connect to an Android device
- `connect_window` - Connect to a Windows window
### 👀 Screen Recognition
- `screencap_and_ocr` - Capture the screen and run OCR (Optical Character Recognition); efficient, so try this first
- `screencap_only` - Capture the screen only, for the large model's vision to interpret; use only when needed, as it costs many tokens
### 🎮 Device Control
- `click` - Click at the specified coordinates (supports multi-touch, mouse button selection, and long press)
  - On Windows, the mouse button can be specified: left, right, or middle
- `double_click` - Double-click at the specified coordinates
- `swipe` - Swipe gesture (preferred for scrolling/paging on Android devices)
- `input_text` - Input text
- `click_key` - Press a key (supports long press)
  - On Android, simulates system keys: Back (4), Home (3), Menu (82), volume keys, etc.
  - On Windows, accepts virtual key codes: Enter (13), ESC (27), arrow keys, etc.
- `keyboard_shortcut` - Keyboard shortcuts
  - Supports combinations such as Ctrl+C, Ctrl+V, and Alt+Tab
- `scroll` - Mouse wheel (Windows only)
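The key codes listed above are the standard Android `KeyEvent` codes and Windows virtual-key codes. The hypothetical helper below (not part of MaaMCP) looks them up and turns a shortcut string into a list of codes; whether `click_key` and `keyboard_shortcut` accept codes in exactly this form is an assumption, so treat this as a reference sketch.

```python
# Hypothetical lookup tables; the values are the standard platform codes.
ANDROID_KEYCODES = {
    "HOME": 3,
    "BACK": 4,
    "VOLUME_UP": 24,
    "VOLUME_DOWN": 25,
    "MENU": 82,
}

WINDOWS_VK_CODES = {
    "TAB": 0x09,
    "ENTER": 0x0D,
    "CTRL": 0x11,
    "ALT": 0x12,
    "ESC": 0x1B,
    "LEFT": 0x25,
    "UP": 0x26,
    "RIGHT": 0x27,
    "DOWN": 0x28,
}


def key_code(platform: str, name: str) -> int:
    """Return the key code for `name` on "android" or "windows"."""
    tables = {"android": ANDROID_KEYCODES, "windows": WINDOWS_VK_CODES}
    try:
        return tables[platform.lower()][name.upper()]
    except KeyError:
        raise ValueError(f"unknown key {name!r} for platform {platform!r}") from None


def shortcut_codes(combo: str) -> list[int]:
    """Translate a Windows shortcut such as "Ctrl+C" into virtual-key codes."""
    codes = []
    for part in combo.split("+"):
        part = part.strip().upper()
        if len(part) == 1 and part.isalnum():
            # VK codes for A-Z and 0-9 equal their uppercase ASCII codes
            codes.append(ord(part))
        else:
            codes.append(key_code("windows", part))
    return codes


print(key_code("android", "back"))  # 4
print(shortcut_codes("Ctrl+C"))     # [17, 67]
print(shortcut_codes("Alt+Tab"))    # [18, 9]
```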
### 📝 Pipeline Generation and Execution
- `get_pipeline_protocol` - Retrieve the Pipeline protocol document
- `save_pipeline` - Save Pipeline JSON to a file (supports creation and updates)
- `load_pipeline` - Read an existing Pipeline file
- `run_pipeline` - Execute the Pipeline and return the execution result
- `open_pipeline_in_browser` - Open the Pipeline visualization interface in the browser
## Quick Start
### Installation Method
#### Method 1: Install via uv (Recommended)
First, install [uv](https://docs.astral.sh/uv/#installation), then run:
```bash
uvx maa-mcp
```
#### Method 2: Install via pip
```bash
pip install maa-mcp
```
#### Method 3: Install from Source
1. **Clone the Repository**
```bash
git clone https://github.com/MistEO/MaaMCP.git
cd MaaMCP
```
2. **Install Python Dependencies**
```bash
pip install -e .
```
### Configure Client
In clients such as Cursor, add the MCP server:
```json
{
  "mcpServers": {
    "MaaMCP": {
      "command": "maa-mcp"
    }
  }
}
```
Or, in clients such as Cherry Studio, add the MCP command:
```shell
uvx maa-mcp
```
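The two setups above differ only in how the server is launched: the JSON form assumes `maa-mcp` is already on `PATH` (pip install), while the command form starts it through `uvx`. The sketch below builds the `mcpServers` entry for either case. The `command` plus `args` shape used for the uvx variant is the convention most MCP clients follow, but check your client's documentation before relying on it.

```python
import json


def maa_mcp_config(use_uvx: bool = False) -> dict:
    """Build an `mcpServers` entry for MaaMCP.

    use_uvx=False assumes `maa-mcp` is on PATH (installed with pip);
    use_uvx=True launches it through uvx without a separate install step.
    """
    if use_uvx:
        server = {"command": "uvx", "args": ["maa-mcp"]}
    else:
        server = {"command": "maa-mcp"}
    return {"mcpServers": {"MaaMCP": server}}


# Paste the printed JSON into your client's MCP settings.
print(json.dumps(maa_mcp_config(use_uvx=True), indent=2))
```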
## Usage Example
After the configuration is complete, you can use it like this:
**Android Automation Example:**
```text
Please use the MaaMCP tool to help me connect to the Android device, open Meituan, and order takeout. I want Chinese food, one serving, around 20 yuan.
```
**Windows Automation Example:**
```text
Please use the MaaMCP tool to show me how to add a rotation effect to this PPT slide.
```
**Pipeline Generation Example:**
```text
Please use the MaaMCP tool to connect my device, help me open settings, go to display settings, and adjust the brightness to 50%.
After the operation is completed, please generate the Pipeline JSON for this process so that it can be run directly later.
```
MaaMCP will automatically:
1. Scan for available devices/windows
2. Establish a connection
3. Automatically download and load OCR resources
4. Execute recognition and operation tasks
## Large Model Prompts
If you want the AI to complete automation tasks quickly and efficiently, without narrating each step as it goes, add the following to your prompt:
```text
# Role: UI Automation Agent
## Workflow Optimization Rules
1. **Minimize Round-Trips**: Your goal is to complete tasks with the fewest interactions possible.
2. **Critical Pattern**: When it comes to form/chat input, you must follow the atomic operation sequence of **[Click Focus -> Input Text -> Send Key]**.
   - 🚫 Incorrect: click, wait for the result; then input text, wait for the result; then press Enter.
   - ✅ Correct: after `click`, do not wait for its result; based on logical inference, append `input_text` and `click_key` directly to the same `tool_calls` list.
## Communication Style
- **NO YAPPING**: Do not restate the user's instructions or explain your steps.
- **Direct Execution**: Receive instructions -> (think internally) -> output the JSON tool calls directly.
```
### Performance Recommendations
For the fastest execution, use a **Flash-tier** large language model (for example, Gemini Flash), which responds significantly faster while remaining highly capable.
## Workflow
MaaMCP follows a simple operational workflow that supports multi-device/multi-window collaborative work:
```mermaid
graph LR
A[Scan Devices] --> B[Establish Connection]
B --> C[Execute Automation Operations]
```
1. **Scan** - Use `find_adb_device_list` or `find_window_list`
2. **Connect** - Use `connect_adb_device` or `connect_window` (can connect to multiple devices/windows, obtaining multiple controller IDs)
3. **Operate** - Perform automation operations such as OCR, clicks, and swipes on multiple devices/windows by specifying different controller IDs
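At the protocol level, each step is an MCP `tools/call` request; the assistant's MCP client normally builds these for you. The sketch below only shows the message shape for scan, connect, and operate across two controllers, including a batched click, input, Enter sequence. The tool names come from this README, but every argument name (`controller_id`, `device_name`, `window_name`, `x`, `y`, `text`, `key`) and the controller ID values are hypothetical placeholders; real IDs are returned by the connect tools.

```python
import itertools
import json

_ids = itertools.count(1)


def tool_call(name: str, **arguments) -> dict:
    """Wrap one MCP `tools/call` request in a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# 1. Scan for devices and windows
scan = [tool_call("find_adb_device_list"), tool_call("find_window_list")]

# 2. Connect; each connection yields its own controller ID (placeholders here)
connect = [
    tool_call("connect_adb_device", device_name="emulator-5554"),
    tool_call("connect_window", window_name="Notepad"),
]
phone, desktop = "ctrl-phone", "ctrl-desktop"

# 3. Operate; the controller ID routes each call to a device or window.
#    Click -> input -> Enter is sent as one batch, without waiting in between.
operate = [
    tool_call("screencap_and_ocr", controller_id=phone),
    tool_call("click", controller_id=desktop, x=200, y=300),
    tool_call("input_text", controller_id=desktop, text="hello"),
    tool_call("click_key", controller_id=desktop, key=13),  # 13 = Enter on Windows
]

for message in scan + connect + operate:
    print(json.dumps(message))
```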
## Pipeline Generation Functionality
MaaMCP lets the AI convert the operations it has performed into the [MaaFramework Pipeline](https://maafw.xyz/docs/3.1-PipelineProtocol) JSON format: **operate once, reuse indefinitely**.
### Working Principle
```mermaid
graph LR
A[AI performs operations] --> B[Operation completed]
B --> C[AI reads Pipeline documentation]
C --> D[AI intelligently generates Pipeline]
D --> E[Save JSON file]
E --> F[Run validation]
F --> G{Is it successful?}
G -->|Yes| H[Complete]
G -->|No| I[Analyze failure reasons]
I --> J[Modify Pipeline]
J --> F
```
1. **Perform operations** - AI performs automated operations such as OCR, clicking, and swiping normally.
2. **Obtain documentation** - Call `get_pipeline_protocol` to obtain the Pipeline protocol specification.
3. **Intelligently generate** - AI converts **valid operations** into Pipeline JSON based on the documentation specification.
4. **Save file** - Call `save_pipeline` to save the generated Pipeline.
5. **Run validation** - Call `run_pipeline` to verify whether the Pipeline runs correctly.
6. **Iterative optimization** - Analyze failure reasons based on the running results and modify the Pipeline until successful.
### Advantages of Intelligent Generation
Unlike mechanical recording, AI intelligent generation has the following advantages:
- **Only retains successful paths**: If multiple paths were attempted during the operation (for example, first entering menu A and not finding anything, then returning and entering menu B to find the item), AI will only retain the final successful path and discard the failed attempts.
- **Understanding operational intent**: AI can understand the purpose of each operation and generate semantically clear node names.
- **Optimizing recognition conditions**: Sets recognition areas and matching conditions based on the OCR (Optical Character Recognition) results.
- **Validation and iteration**: Identify issues through running validations, automatically fix them, and enhance robustness.
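To make "only retains successful paths" concrete, here is a toy, stack-based illustration: each "back" step cancels the navigation step before it, so detours disappear from the final path. This is a deliberately simplified heuristic for illustration only; MaaMCP relies on the AI's understanding of the operation history, not on this rule, and the step strings are hypothetical.

```python
def prune_to_successful_path(trace: list[str]) -> list[str]:
    """Drop detours: each "back" step cancels the step before it."""
    path: list[str] = []
    for step in trace:
        if step == "back":
            if path:
                path.pop()  # the previous step led nowhere; forget it
        else:
            path.append(step)
    return path


# Menu A was a dead end, so only the route through Menu B survives.
trace = ["open Settings", "enter Menu A", "back", "enter Menu B", "click Item"]
print(prune_to_successful_path(trace))
# ['open Settings', 'enter Menu B', 'click Item']
```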
### Verification and Iterative Optimization
After the Pipeline is generated, the AI will automatically perform verification and optimization:
1. **Run Verification** - Execute the Pipeline to check if it is successful.
2. **Failure Analysis** - If it fails, analyze which specific node encountered an error and the reason.
3. **Intelligent Repair** - Common fixes include:
   - Adding alternative recognition nodes (multiple candidates in the `next` list)
   - Relaxing OCR matching conditions (regular expressions or partial matching)
   - Adjusting the `roi` (region of interest) recognition area
   - Increasing the wait time (`post_delay`)
   - Adding nodes that detect intermediate states
4. **Re-Verification** - Run again after modifications until stable success is achieved.
If issues are found in the Pipeline logic itself, the AI can also re-execute the automation operations, combining new and old experiences to generate a more refined Pipeline.
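Several of the repairs listed above are edits to a single node's fields (`expected`, `roi`, `post_delay`, `next`). The hypothetical helper below, which is not part of MaaMCP or MaaFramework, applies them to a copy of a node so the before and after can be compared. It assumes `roi` is `[x, y, width, height]` in pixels and `post_delay` is in milliseconds; the `Close Popup` fallback node is made up and would also need to exist in the Pipeline.

```python
import copy


def relax_node(node, *, alternatives=(), roi=None, post_delay_ms=None, extra_next=()):
    """Return a loosened copy of one Pipeline node; the input is not modified."""
    fixed = copy.deepcopy(node)

    # Relax OCR matching: keep the original text and add extra candidates
    # (plain strings or regular expressions).
    expected = fixed.get("expected")
    if expected is not None:
        candidates = [expected] if isinstance(expected, str) else list(expected)
        candidates += [a for a in alternatives if a not in candidates]
        fixed["expected"] = candidates

    # Restrict recognition to a region of interest: [x, y, width, height].
    if roi is not None:
        fixed["roi"] = list(roi)

    # Wait longer after the action, never shortening a delay already set on the node.
    if post_delay_ms is not None:
        fixed["post_delay"] = max(fixed.get("post_delay", 0), post_delay_ms)

    # Append fallback node names to the `next` candidate list.
    if extra_next:
        nxt = fixed.get("next", [])
        nxt = [nxt] if isinstance(nxt, str) else list(nxt)
        fixed["next"] = nxt + [n for n in extra_next if n not in nxt]
    return fixed


node = {"recognition": "OCR", "expected": "Settings", "action": "Click", "next": ["Enter Display"]}
fixed = relax_node(node, alternatives=["Setting"], roi=[0, 0, 720, 400],
                   post_delay_ms=1000, extra_next=["Close Popup"])
print(fixed)
```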
### Example Output
```json
{
  "Start Task": {
    "recognition": "DirectHit",
    "action": "DoNothing",
    "next": ["Click Settings"]
  },
  "Click Settings": {
    "recognition": "OCR",
    "expected": "Settings",
    "action": "Click",
    "next": ["Enter Display"]
  },
  "Enter Display": {
    "recognition": "OCR",
    "expected": "Display",
    "action": "Click",
    "next": ["Adjust Brightness"]
  },
  "Adjust Brightness": {
    "recognition": "OCR",
    "expected": "Brightness",
    "action": "Swipe",
    "begin": [200, 500],
    "end": [400, 500],
    "duration": 200
  }
}
```
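Before calling `run_pipeline`, a cheap local check can catch typos in node names. The sketch below, which is not how MaaFramework itself validates Pipelines, loads the example above, reports any `next` entry that names a missing node, and follows the first candidate from the entry node. It treats every `next` entry as a plain node name, which is a simplifying assumption; for a saved file, load the dict with `json.load` instead.

```python
PIPELINE = {
    "Start Task": {"recognition": "DirectHit", "action": "DoNothing", "next": ["Click Settings"]},
    "Click Settings": {"recognition": "OCR", "expected": "Settings", "action": "Click", "next": ["Enter Display"]},
    "Enter Display": {"recognition": "OCR", "expected": "Display", "action": "Click", "next": ["Adjust Brightness"]},
    "Adjust Brightness": {"recognition": "OCR", "expected": "Brightness", "action": "Swipe",
                          "begin": [200, 500], "end": [400, 500], "duration": 200},
}


def next_list(node: dict) -> list[str]:
    """`next` may be a single name or a list of names; normalize to a list."""
    nxt = node.get("next", [])
    return [nxt] if isinstance(nxt, str) else list(nxt)


def dangling_links(pipeline: dict) -> list[str]:
    """List every `next` reference that points at a node that does not exist."""
    return [f"{name} -> {target}"
            for name, node in pipeline.items()
            for target in next_list(node)
            if target not in pipeline]


def first_choice_path(pipeline: dict, entry: str) -> list[str]:
    """Follow the first `next` candidate from `entry`, stopping on loops or dead ends."""
    path, node = [], entry
    while node in pipeline and node not in path:
        path.append(node)
        candidates = next_list(pipeline[node])
        node = candidates[0] if candidates else None
    return path


print(dangling_links(PIPELINE))  # []
print(first_choice_path(PIPELINE, "Start Task"))
```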
## Notes
📌 **Windows Automation Limitations**:
- Some games or applications' anti-cheat mechanisms may intercept background control operations.
- If the target application runs with administrator privileges, MaaMCP must also be launched with administrator privileges.
- Operations on minimized windows are not supported; please keep the target window in a non-minimized state.
- If the default background screenshot/input method is unavailable (e.g., screenshots are empty, operations are unresponsive), the AI assistant may attempt to switch to foreground mode, which will take control of the mouse and keyboard.
## Frequently Asked Questions
### OCR Recognition Failure, Error "Failed to load det or rec" or Resource Not Found
On first use, the OCR model files are downloaded automatically. If the download fails, check the data directory:
- Windows: `C:\Users\<username>\AppData\Local\MaaXYZ\MaaMCP\resource\model\ocr\`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/resource/model/ocr/`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/resource/model/ocr/`
1. Check if the model files (`det.onnx`, `rec.onnx`, `keys.txt`) are present in the above directory.
2. Check if there are any resource download exceptions in `model/download.log`.
3. Manually execute `python -c "from maa_mcp.download import download_and_extract_ocr; download_and_extract_ocr()"` to try downloading again.
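Steps 1 and 2 above can be scripted. The sketch below resolves the data directory listed above for the current OS and reports which of the three model files are missing. It assumes Windows' `%LOCALAPPDATA%` is the `AppData\Local` folder shown above and, on Linux, uses `~/.local/share` exactly as listed (it ignores `XDG_DATA_HOME`); MaaMCP's own path logic may differ.

```python
import os
import sys
from pathlib import Path

MODEL_FILES = ("det.onnx", "rec.onnx", "keys.txt")


def maa_mcp_data_dir(platform: str = sys.platform) -> Path:
    """MaaMCP data directory as listed in this README, per OS."""
    if platform.startswith("win"):
        base = Path(os.environ.get("LOCALAPPDATA", Path.home() / "AppData" / "Local"))
    elif platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        base = Path.home() / ".local" / "share"
    return base / "MaaXYZ" / "MaaMCP"


def missing_ocr_files(data_dir: Path) -> list[str]:
    """Return the OCR model files that are absent from the data directory."""
    model_dir = data_dir / "resource" / "model" / "ocr"
    return [name for name in MODEL_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    data_dir = maa_mcp_data_dir()
    print("OCR model directory:", data_dir / "resource" / "model" / "ocr")
    print("Missing files:", missing_ocr_files(data_dir) or "none")
```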
### Reporting Issues
When submitting an issue, please provide the log file. The log file path is as follows:
- Windows: `C:\Users\<username>\AppData\Local\MaaXYZ\MaaMCP\debug\maa.log`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/debug/maa.log`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/debug/maa.log`
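To find the log quickly, the sketch below resolves the `maa.log` path listed above for the current OS and prints its last lines so you can confirm it covers the failure before attaching the whole file. The same path assumptions apply as for the data directory: `%LOCALAPPDATA%` on Windows and `~/.local/share` on Linux.

```python
import os
import sys
from collections import deque
from pathlib import Path


def maa_log_path(platform: str = sys.platform) -> Path:
    """Location of MaaMCP's debug log as listed in this README, per OS."""
    if platform.startswith("win"):
        base = Path(os.environ.get("LOCALAPPDATA", Path.home() / "AppData" / "Local"))
    elif platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        base = Path.home() / ".local" / "share"
    return base / "MaaXYZ" / "MaaMCP" / "debug" / "maa.log"


def tail(path: Path, lines: int = 20) -> list[str]:
    """Return the last `lines` lines of a text file."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return list(deque(f, maxlen=lines))


if __name__ == "__main__":
    log = maa_log_path()
    if log.is_file():
        print(f"Attach this file to your issue: {log}")
        print("".join(tail(log)))
    else:
        print(f"No log found at {log}")
```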
## License
This project is licensed under the [GNU AGPL v3](LICENSE) license.
## Acknowledgments
- **[MaaFramework](https://github.com/MaaXYZ/MaaFramework)** - Provides a powerful automation framework
- **[FastMCP](https://github.com/jlowin/fastmcp)** - Simplifies MCP Server development
- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Defines standards for AI tool integration