<p align="center">
<img alt="LOGO" src="https://cdn.jsdelivr.net/gh/MaaAssistantArknights/design@main/logo/maa-logo_512x512.png" width="256" height="256" />
</p>
<div align="center">
# MaaMCP



MCP Server based on [MaaFramework](https://github.com/MaaXYZ/MaaFramework)
Provides Android device and Windows desktop automation capabilities for AI assistants
[English](README_EN.md) | 中文
</div>
---
## Introduction
MaaMCP is an MCP server that exposes the powerful automation capabilities of MaaFramework to AI assistants (such as Claude) through a standardized MCP interface. With this server, AI assistants can:
- 🤖 **Android Automation** - Connect to and control Android devices/emulators via ADB
- 🖥️ **Windows Automation** - Control Windows desktop applications
- 🎯 **Background Operation** - Screenshots and controls on Windows run in the background without occupying the mouse and keyboard, allowing you to continue using the computer for other tasks
- 🔗 **Multi-Device Collaboration** - Control multiple devices/windows simultaneously to achieve cross-device automation
- 👁️ **Intelligent Recognition** - Use OCR to recognize text content on the screen
- 🎯 **Precise Operation** - Perform operations such as clicking, swiping, text input, and key presses
- 📸 **Screenshot** - Obtain real-time screenshots for visual analysis
See it in action: **[🎞️ Bilibili Video Demonstration](https://www.bilibili.com/video/BV1eGmhBaEZz/)**
## Features
### 🔍 Device Discovery and Connection
- `find_adb_device_list` - Scan for available ADB devices
- `find_window_list` - Scan for available Windows windows
- `connect_adb_device` - Connect to an Android device
- `connect_window` - Connect to a Windows window
### 👀 Screen Recognition
- `screencap_and_ocr` - Take a screenshot and run OCR on it (efficient; prefer this by default)
- `screencap_only` - Take a screenshot only, to be processed by the large model's vision capability (use on demand; high token cost)
### 🎮 Device Control
- `click` - Click on the specified coordinates (supports multi-touch/mouse button selection, long press)
  - Supports specifying mouse buttons on Windows: left, right, middle
- `double_click` - Double-click the specified coordinates
- `swipe` - Swipe gesture (preferred for scrolling/paging on Android devices)
- `input_text` - Input text
- `click_key` - Key operation (supports long press)
  - Can simulate system keys on Android: Back key (4), Home key (3), Menu key (82), Volume keys, etc.
  - Supports virtual key codes on Windows: Enter (13), ESC (27), arrow keys, etc.
- `keyboard_shortcut` - Keyboard shortcut
  - Supports combination keys: Ctrl+C, Ctrl+V, Alt+Tab, etc.
- `scroll` - Mouse wheel (Windows only)
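
The key codes listed above can be kept in a small lookup table. This is a minimal sketch: the numeric codes are the ones named in this README, while the `KEY_CODES` table and the `key_code` helper are hypothetical names used only for illustration and are not part of MaaMCP.

```python
# Codes copied from the list above; the table and helper are illustrative only.
KEY_CODES = {
    "android": {"back": 4, "home": 3, "menu": 82},
    "windows": {"enter": 13, "esc": 27},
}


def key_code(platform: str, name: str) -> int:
    """Return the numeric key code for a named key on the given platform."""
    try:
        return KEY_CODES[platform.lower()][name.lower()]
    except KeyError:
        raise ValueError(f"unknown key {name!r} for platform {platform!r}") from None


print(key_code("android", "back"))   # 4
print(key_code("windows", "enter"))  # 13
```

The resulting integer is what would be passed to `click_key`; the same number means different keys on Android and Windows, so the platform has to be part of the lookup.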
### 📝 Pipeline Generation and Execution
- `get_pipeline_protocol` - Get the Pipeline protocol document
- `save_pipeline` - Save Pipeline JSON to a file (supports creating and updating)
- `load_pipeline` - Read an existing Pipeline file
- `run_pipeline` - Run the Pipeline and return the execution result
- `open_pipeline_in_browser` - Open the Pipeline visualization interface in the browser
## Quick Start
### Installation
#### Method 1: Install via uv (Recommended)
Install [uv](https://docs.astral.sh/uv/#installation) first, then run:
```bash
uvx maa-mcp
```
#### Method 2: Install via pip
```bash
pip install maa-mcp
```
#### Method 3: Install from source
1. **Clone the repository**
```bash
git clone https://github.com/MistEO/MaaMCP.git
cd MaaMCP
```
2. **Install Python dependencies**
```bash
pip install -e .
```
### Configure Client
In software such as Cursor, add the MCP server:
```json
{
  "mcpServers": {
    "MaaMCP": {
      "command": "maa-mcp"
    }
  }
}
```
Alternatively, in software such as Cherry Studio, add the MCP command:
```shell
uvx maa-mcp
```
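
Both client setups above can be written in the same `mcpServers` JSON shape. The sketch below generates either variant. Treat the `command`/`args` form for `uvx` as an assumption about what your client accepts (it is the shape many MCP clients use), and note that `mcp_server_entry` is a hypothetical helper name.

```python
import json


def mcp_server_entry(use_uvx: bool = False) -> dict:
    """Build an `mcpServers` config that launches MaaMCP.

    use_uvx=False assumes `maa-mcp` is already on PATH (e.g. after pip install).
    use_uvx=True launches it through `uvx maa-mcp` instead.
    """
    if use_uvx:
        server = {"command": "uvx", "args": ["maa-mcp"]}
    else:
        server = {"command": "maa-mcp"}
    return {"mcpServers": {"MaaMCP": server}}


# Print the uvx variant, ready to paste into a client's MCP settings.
print(json.dumps(mcp_server_entry(use_uvx=True), indent=2))
```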
## Usage Examples
After configuration, you can use it like this:
**Android Automation Example:**
```text
Please use the MaaMCP tool to help me connect to an Android device, open Meituan to order a takeaway for me, I want to eat Chinese food, for one person, around 20 yuan
```
**Windows Automation Example:**
```text
Please use the MaaMCP tool to see how to add a rotation effect to this PPT page, show me how to do it
```
**Pipeline Generation Example:**
```text
Please use the MaaMCP tool to connect to my device, help me open the settings, enter the display settings, and adjust the brightness to 50%.
After the operation is completed, help me generate the Pipeline JSON for this process so that it can be run directly later.
```
MaaMCP will automatically:
1. Scan available devices/windows
2. Establish a connection
3. Automatically download and load OCR resources
4. Execute recognition and operation tasks
## Large Model Prompt
If you want AI to complete automation tasks quickly and efficiently, and don't want to see detailed explanations of the running process, you can add the following to your prompt:
```
# Role: UI Automation Agent
## Workflow Optimization Rules
1. **Minimize Round-Trips**: Your goal is to complete the task with the fewest number of interactions.
2. **Critical Pattern**: For form/chat input, you must follow the atomic operation sequence **[Click Focus -> Input Text -> Send Key]**.
   - 🚫 Wrong way: Click first and wait for the result; then input and wait for the result; then press Enter.
   - ✅ Correct way: After `click`, do not wait for a return value. Infer the remaining steps and append `input_text` and `click_key` to the same `tool_calls` list.
## Communication Style
- **NO YAPPING**: Do not repeat the user's instructions, do not explain your steps.
- **Direct Execution**: Receive instructions -> (internal thinking) -> directly output JSON tool calls.
```
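
The atomic sequence described in the prompt can be pictured as one ordered batch of tool calls. This is a sketch only: the tool names come from this README, but the argument names (`x`, `y`, `text`, `key`) and the `build_input_batch` helper are assumptions for illustration, not MaaMCP's actual schema.

```python
def build_input_batch(x: int, y: int, text: str, enter_key: int = 13) -> list:
    """Return click -> input_text -> click_key as one ordered batch.

    Argument names are hypothetical. enter_key defaults to 13,
    the Windows Enter virtual key code listed in this README.
    """
    return [
        {"name": "click", "arguments": {"x": x, "y": y}},           # focus the field
        {"name": "input_text", "arguments": {"text": text}},        # type the message
        {"name": "click_key", "arguments": {"key": enter_key}},     # send it
    ]


batch = build_input_batch(320, 640, "hello")
print([call["name"] for call in batch])  # ['click', 'input_text', 'click_key']
```

Emitting all three calls in a single response saves two model round-trips compared with waiting for each result in turn.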
### Performance Suggestions
For the fastest response, use a **Flash-class** large language model (such as Gemini Flash), which can significantly improve response speed while maintaining a high level of intelligence.
## Workflow
MaaMCP follows a simple operation process and supports multi-device/multi-window collaboration:
```mermaid
graph LR
A[Scan Devices] --> B[Establish Connection]
B --> C[Execute Automation Operations]
```
1. **Scan** - Use `find_adb_device_list` or `find_window_list`
2. **Connect** - Use `connect_adb_device` or `connect_window` (can connect multiple devices/windows to obtain multiple controller IDs)
3. **Operate** - Perform OCR, click, swipe, and other automation operations on multiple devices/windows by specifying different controller IDs
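
The scan, connect, operate flow across several devices can be sketched as below. The tool names are taken from this README, but everything else is an assumption: the argument names, the return shapes, and the `fake_call_tool` stub that stands in for a real MCP client are hypothetical.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict], Any]


def ocr_all_adb_devices(call_tool: ToolCaller) -> dict:
    """Scan, connect to every ADB device, then OCR each one by controller ID."""
    results = {}
    for device in call_tool("find_adb_device_list", {}):                      # 1. scan
        controller_id = call_tool("connect_adb_device", {"device": device})   # 2. connect
        results[controller_id] = call_tool(                                   # 3. operate
            "screencap_and_ocr", {"controller_id": controller_id}
        )
    return results


def fake_call_tool(name: str, args: dict) -> Any:
    """Stand-in for a real MCP client; returns canned data."""
    if name == "find_adb_device_list":
        return ["emulator-5554", "emulator-5556"]
    if name == "connect_adb_device":
        return f"ctrl-{args['device']}"
    if name == "screencap_and_ocr":
        return [f"text seen on {args['controller_id']}"]
    raise ValueError(f"unexpected tool: {name}")


print(ocr_all_adb_devices(fake_call_tool))
```

Each connection yields its own controller ID, and every later operation names the ID it targets; that is what lets one conversation drive several devices or windows side by side.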
## Pipeline Generation Function
MaaMCP lets the AI convert the operations it has executed into [MaaFramework Pipeline](https://maafw.xyz/docs/3.1-PipelineProtocol) JSON, achieving **operate once, reuse infinitely**.
### Working Principle
```mermaid
graph LR
A[AI Executes Operation] --> B[Operation Completed]
B --> C[AI Reads Pipeline Document]
C --> D[AI Intelligently Generates Pipeline]
D --> E[Save JSON File]
E --> F[Run Verification]
F --> G{Successful?}
G -->|Yes| H[Complete]
G -->|No| I[Analyze Failure Reason]
I --> J[Modify Pipeline]
J --> F
```
1. **Execute Operation** - The AI performs OCR, click, swipe, and other automation operations as usual
2. **Get Document** - Call `get_pipeline_protocol` to get the Pipeline protocol specification
3. **Intelligent Generation** - The AI converts the **valid operations** into Pipeline JSON according to the specification
4. **Save File** - Call `save_pipeline` to save the generated Pipeline
5. **Run Verification** - Call `run_pipeline` to verify that the Pipeline runs correctly
6. **Iterative Optimization** - Analyze the cause of any failure from the run results and modify the Pipeline until it succeeds
### Advantages of Intelligent Generation
Unlike mechanical recording, AI intelligent generation has the following advantages:
- **Only Keep Successful Paths**: If multiple paths are tried during the operation (such as entering menu A first and not finding it, then returning and entering menu B to find it), AI will only keep the final successful path and remove the failed attempts
- **Understand Operation Intent**: AI can understand the purpose of each operation and generate node names with clear semantics
- **Optimize Recognition Conditions**: Intelligently set the recognition area and matching conditions based on OCR results
- **Verification and Iteration**: Discover problems through running verification, automatically fix and enhance robustness
### Verification and Iterative Optimization
After the Pipeline is generated, AI will automatically verify and optimize it:
1. **Run Verification** - Execute the Pipeline to check if it is successful
2. **Failure Analysis** - If it fails, analyze which node failed and the reason
3. **Intelligent Repair** - Common optimization methods:
   - Add alternative recognition nodes (add multiple candidates to the `next` list)
   - Relax OCR matching conditions (use regular expressions or partial matching)
   - Adjust the `roi` recognition area
   - Increase waiting time (`post_delay`)
   - Add intermediate state detection nodes
4. **Re-verification** - Run again after modification until stable success
If the Pipeline logic itself turns out to be flawed, the AI can also re-execute the automation operations and combine the old and new experience to generate a more complete Pipeline.
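
The verify-and-repair cycle above can be sketched as a bounded loop. This is not MaaMCP's implementation: `verify_and_repair`, `fake_run`, and `add_delay` are hypothetical names, the runner is a stub rather than a real `run_pipeline` call, and the only repair shown is increasing `post_delay`.

```python
from typing import Callable, Optional, Tuple


def verify_and_repair(
    pipeline: dict,
    run: Callable[[dict], Optional[str]],
    repair: Callable[[dict, str], dict],
    max_rounds: int = 3,
) -> Tuple[dict, bool]:
    """Run the pipeline; on failure repair the failed node and retry.

    `run` returns the name of the failed node, or None on success.
    At most `max_rounds` repairs are applied before giving up.
    """
    for attempt in range(max_rounds + 1):
        failed_node = run(pipeline)
        if failed_node is None:
            return pipeline, True
        if attempt < max_rounds:
            pipeline = repair(pipeline, failed_node)
    return pipeline, False


def add_delay(pipeline: dict, node: str) -> dict:
    """Example repair: add 500 ms of post_delay to the failed node (on a copy)."""
    patched = {name: dict(cfg) for name, cfg in pipeline.items()}
    patched[node]["post_delay"] = patched[node].get("post_delay", 0) + 500
    return patched


def fake_run(pipeline: dict) -> Optional[str]:
    """Stub runner: pretends the UI needs at least 1000 ms to settle."""
    node = pipeline["Click Settings"]
    return None if node.get("post_delay", 0) >= 1000 else "Click Settings"


draft = {"Click Settings": {"recognition": "OCR", "expected": "Settings", "action": "Click"}}
fixed, ok = verify_and_repair(draft, fake_run, add_delay)
print(ok, fixed["Click Settings"]["post_delay"])  # True 1000
```

Bounding the number of rounds matters: when repairs stop helping, the better move is to re-execute the operations and regenerate the Pipeline, as the paragraph above notes.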
### Example Output
```json
{
  "Start Task": {
    "recognition": "DirectHit",
    "action": "DoNothing",
    "next": ["Click Settings"]
  },
  "Click Settings": {
    "recognition": "OCR",
    "expected": "Settings",
    "action": "Click",
    "next": ["Enter Display"]
  },
  "Enter Display": {
    "recognition": "OCR",
    "expected": "Display",
    "action": "Click",
    "next": ["Adjust Brightness"]
  },
  "Adjust Brightness": {
    "recognition": "OCR",
    "expected": "Brightness",
    "action": "Swipe",
    "begin": [200, 500],
    "end": [400, 500],
    "duration": 200
  }
}
```
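
Before running a generated Pipeline, a quick structural check can catch `next` entries that point at nodes that do not exist. The sketch below is a hypothetical helper, not part of MaaMCP or MaaFramework, and it handles only plain node names in `next`.

```python
def dangling_next_targets(pipeline: dict) -> list:
    """Return (node, target) pairs whose `next` target is not defined."""
    missing = []
    for name, node in pipeline.items():
        targets = node.get("next", [])
        if isinstance(targets, str):  # tolerate a single name instead of a list
            targets = [targets]
        for target in targets:
            if target not in pipeline:
                missing.append((name, target))
    return missing


# Abbreviated version of the example above, with one deliberate typo.
pipeline = {
    "Start Task": {"next": ["Click Settings"]},
    "Click Settings": {"next": ["Enter Displya"]},
    "Enter Display": {},
}
print(dangling_next_targets(pipeline))  # [('Click Settings', 'Enter Displya')]
```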
## Precautions
📌 **Windows Automation Limitations**:
- The anti-cheat mechanism of some games or applications may intercept background control operations
- If the target application is running with administrator privileges, MaaMCP also needs to be started with administrator privileges
- Minimized windows cannot be operated on; keep the target window in a non-minimized state
- If the default background screenshot/input method is unavailable (for example, the screenshot is empty or operations get no response), the AI assistant may try switching to the foreground method, which will occupy the mouse and keyboard
## Common Problems
### OCR recognition fails, reports "Failed to load det or rec" or prompts that the resource does not exist
The OCR model files are downloaded automatically on first use, but the download can fail. Please check the data directory:
- Windows: `C:\Users\<Username>\AppData\Local\MaaXYZ\MaaMCP\resource\model\ocr\`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/resource/model/ocr/`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/resource/model/ocr/`
1. Check whether there are model files (`det.onnx`, `rec.onnx`, `keys.txt`) in the above directory
2. Check whether resource download exceptions appear in `model/download.log`
3. Manually execute `python -c "from maa_mcp.download import download_and_extract_ocr; download_and_extract_ocr()"` to try downloading again
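
The first check can be scripted. This is a minimal sketch that rebuilds the data directory from the paths listed above and reports which model files are missing; `default_ocr_dir` and `missing_ocr_files` are hypothetical helper names and are not part of the `maa_mcp` package.

```python
import sys
from pathlib import Path

REQUIRED_FILES = ("det.onnx", "rec.onnx", "keys.txt")


def default_ocr_dir() -> Path:
    """Rebuild the OCR model directory from the per-OS paths listed above."""
    home = Path.home()
    if sys.platform == "win32":
        base = home / "AppData" / "Local"
    elif sys.platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = home / ".local" / "share"
    return base / "MaaXYZ" / "MaaMCP" / "resource" / "model" / "ocr"


def missing_ocr_files(ocr_dir: Path) -> list:
    """Return the required model files that are absent from ocr_dir."""
    return [name for name in REQUIRED_FILES if not (ocr_dir / name).is_file()]


missing = missing_ocr_files(default_ocr_dir())
print("OCR models OK" if not missing else f"Missing: {', '.join(missing)}")
```

If any file is reported missing, continue with steps 2 and 3 above.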
### Reporting Issues
When submitting an issue, please attach the log file. The log file path is as follows:
- Windows: `C:\Users\<Username>\AppData\Local\MaaXYZ\MaaMCP\debug\maa.log`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/debug/maa.log`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/debug/maa.log`
## License
This project uses the [GNU AGPL v3](LICENSE) license.
## Acknowledgements
- **[MaaFramework](https://github.com/MaaXYZ/MaaFramework)** - Provides a powerful automation framework
- **[FastMCP](https://github.com/jlowin/fastmcp)** - Simplifies MCP server development
- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Defines AI tool integration standards