
<p align="center">
  <img alt="LOGO" src="https://cdn.jsdelivr.net/gh/MaaAssistantArknights/design@main/logo/maa-logo_512x512.png" width="256" height="256" />
</p>

<div align="center">

# MaaMCP

![license](https://img.shields.io/github/license/MistEO/MaaMCP)
![activity](https://img.shields.io/github/commit-activity/m/MistEO/MaaMCP?color=%23ff69b4)
![stars](https://img.shields.io/github/stars/MistEO/MaaMCP?style=social)
[![MaaFramework](https://img.shields.io/badge/MaaFramework-v5-green)](https://github.com/MaaXYZ/MaaFramework)
[![PyPI](https://img.shields.io/pypi/v/maa-mcp?logo=pypi&logoColor=white)](https://pypi.org/project/maa-mcp/)

MCP Server based on [MaaFramework](https://github.com/MaaXYZ/MaaFramework)
Provides Android device and Windows desktop automation capabilities for AI assistants

[English](README_EN.md) | 中文

</div>

---

## Introduction

MaaMCP is an MCP server that exposes the powerful automation capabilities of MaaFramework to AI assistants (such as Claude) through a standardized MCP interface.
Through this server, AI assistants can:

- 🤖 **Android Automation** - Connect to and control Android devices/emulators via ADB
- 🖥️ **Windows Automation** - Control Windows desktop applications
- 🎯 **Background Operation** - Screenshots and control operations on Windows run in the background without occupying the mouse and keyboard, so you can keep using your computer for other tasks
- 🔗 **Multi-Device Collaboration** - Control multiple devices/windows simultaneously for cross-device automation
- 👁️ **Intelligent Recognition** - Use OCR to recognize text on the screen
- 🎯 **Precise Operation** - Perform operations such as clicking, swiping, text input, and key presses
- 📸 **Screenshot** - Obtain real-time screenshots for visual analysis

Talk is cheap, please see: **[🎞️ Bilibili Video Demonstration](https://www.bilibili.com/video/BV1eGmhBaEZz/)**

## Features

### 🔍 Device Discovery and Connection

- `find_adb_device_list` - Scan for available ADB devices
- `find_window_list` - Scan for available Windows windows
- `connect_adb_device` - Connect to an Android device
- `connect_window` - Connect to a Windows window

### 👀 Screen Recognition

- `screencap_and_ocr` - Optical Character Recognition (efficient, recommended as the first choice)
- `screencap_only` - Screenshot only, to be processed by the large model's vision (use on demand; large token overhead)

### 🎮 Device Control

- `click` - Click on the specified coordinates (supports multi-touch/mouse button selection, long press)
  - Supports specifying mouse buttons on Windows: left, right, middle
- `double_click` - Double-click the specified coordinates
- `swipe` - Swipe gesture (preferred for scrolling/paging on Android devices)
- `input_text` - Input text
- `click_key` - Key operation (supports long press)
  - Can simulate system keys on Android: Back key (4), Home key (3), Menu key (82), Volume keys, etc.
  - Supports virtual key codes on Windows: Enter (13), ESC (27), arrow keys, etc.
- `keyboard_shortcut` - Keyboard shortcut
  - Supports combination keys: Ctrl+C, Ctrl+V, Alt+Tab, etc.
- `scroll` - Mouse wheel (Windows only)

### 📝 Pipeline Generation and Execution

- `get_pipeline_protocol` - Get the Pipeline protocol documentation
- `save_pipeline` - Save Pipeline JSON to a file (supports creating and updating)
- `load_pipeline` - Read an existing Pipeline file
- `run_pipeline` - Run the Pipeline and return the execution result
- `open_pipeline_in_browser` - Open the Pipeline visualization interface in a browser

## Quick Start

### Installation

#### Method 1: Install via uv (Recommended)

Requires installing [uv](https://docs.astral.sh/uv/#installation) first

```bash
uvx maa-mcp
```

#### Method 2: Install via pip

```bash
pip install maa-mcp
```

#### Method 3: Install from Source Code

1. **Clone the Repository**

   ```bash
   git clone https://github.com/MistEO/MaaMCP.git
   cd MaaMCP
   ```

2. **Install Python Dependencies**

   ```bash
   pip install -e .
   ```

### Configure Client

In software such as Cursor, add the MCP server:

```json
{
  "mcpServers": {
    "MaaMCP": {
      "command": "maa-mcp"
    }
  }
}
```

Or, in software such as Cherry Studio, add the MCP command:

```shell
uvx maa-mcp
```

## Usage Examples

After configuration, you can use it like this:

**Android Automation Example:**

```text
Please use the MaaMCP tool to help me connect to my Android device, open Meituan to order a takeaway for me, I want to eat Chinese food, for one person, around 20 yuan
```

**Windows Automation Example:**

```text
Please use the MaaMCP tool to see how I can add a rotation effect to this PPT page, show me how to do it
```

**Pipeline Generation Example:**

```text
Please use the MaaMCP tool to connect to my device, help me open the settings, enter the display settings, and adjust the brightness to 50%.
After the operation is complete, help me generate the Pipeline JSON for this process so that it can be run directly later.
```

MaaMCP will automatically:

1. Scan available devices/windows
2. Establish a connection
3. Download and load OCR resources
4. Execute recognition and operation tasks

## Large Model Prompt

If you want the AI to complete automation tasks quickly and efficiently, without detailed explanations of the running process, you can add the following to your prompt:

```
# Role: UI Automation Agent

## Workflow Optimization Rules
1. **Minimize Round-Trips**: Your goal is to complete the task with the fewest number of interactions.
2. **Critical Pattern**: When it comes to form/chat input, you must follow the atomic operation sequence of **[Click Focus -> Input Text -> Send Key]**.
   - 🚫 Wrong way: Click first, wait for the result; then Input, wait for the result; then Press Enter.
   - ✅ Correct way: After `click`, there is no need to wait for a return, directly append `input_text` and `click_key` in the same `tool_calls` list according to logic inference.

## Communication Style
- **NO YAPPING**: Do not repeat the user's instructions, do not explain your steps.
- **Direct Execution**: Receive instructions -> (internal thinking) -> directly output JSON tool calls.
```

### Performance Suggestions

For the fastest running speed, use the **Flash versions** of large language models (such as Gemini 3 Flash), which significantly improve response speed while maintaining a high level of intelligence.

## Workflow

MaaMCP follows a simple operation process and supports multi-device/multi-window collaboration:

```mermaid
graph LR
    A[Scan Devices] --> B[Establish Connection]
    B --> C[Execute Automation Operations]
```

1. **Scan** - Use `find_adb_device_list` or `find_window_list`
2. **Connect** - Use `connect_adb_device` or `connect_window` (you can connect multiple devices/windows to obtain multiple controller IDs)
3. **Operate** - Perform OCR, click, swipe, and other automation operations on multiple devices/windows by specifying different controller IDs

## Pipeline Generation Function

MaaMCP lets the AI convert the operations it has executed into [MaaFramework Pipeline](https://maafw.xyz/docs/3.1-PipelineProtocol) JSON, so you can **operate once and reuse indefinitely**.

### Working Principle

```mermaid
graph LR
    A[AI Executes Operations] --> B[Operation Completed]
    B --> C[AI Reads Pipeline Documentation]
    C --> D[AI Intelligently Generates Pipeline]
    D --> E[Save JSON File]
    E --> F[Run Verification]
    F --> G{Successful?}
    G -->|Yes| H[Complete]
    G -->|No| I[Analyze Failure Reason]
    I --> J[Modify Pipeline]
    J --> F
```

1. **Execute Operations** - The AI executes OCR, click, swipe, and other automation operations as usual
2. **Get Documentation** - Call `get_pipeline_protocol` to get the Pipeline protocol specification
3. **Intelligent Generation** - The AI converts the **valid operations** into Pipeline JSON according to the documentation
4. **Save File** - Call `save_pipeline` to save the generated Pipeline
5. **Run Verification** - Call `run_pipeline` to verify that the Pipeline runs correctly
6. **Iterative Optimization** - Analyze the cause of any failure from the run results and modify the Pipeline until it succeeds

### Advantages of Intelligent Generation

Unlike mechanical recording, AI-driven generation has the following advantages:

- **Only Keep Successful Paths**: If several paths are tried during the operation (for example, entering menu A, not finding the target, then going back and finding it in menu B), the AI keeps only the final successful path and removes the failed attempts
- **Understand Operation Intent**: The AI understands the purpose of each operation and generates semantically clear node names
- **Optimize Recognition Conditions**: The recognition area and matching conditions are set based on the OCR results
- **Verification and Iteration**: Problems are discovered by running the Pipeline, then fixed automatically to improve robustness

### Verification and Iterative Optimization

After the Pipeline is generated, the AI automatically verifies and optimizes it:

1. **Run Verification** - Execute the Pipeline to check whether it succeeds
2. **Failure Analysis** - If it fails, analyze which node failed and why
3. **Intelligent Repair** - Common optimization methods:
   - Add alternative recognition nodes (add multiple candidates to the `next` list)
   - Relax OCR matching conditions (use regular expressions or partial matching)
   - Adjust the `roi` recognition area
   - Increase the waiting time (`post_delay`)
   - Add intermediate state detection nodes
4. **Re-verification** - Run again after each modification until it succeeds reliably

If the Pipeline logic itself turns out to be flawed, the AI can also re-execute the automation operations and combine the old and new experience to generate a more complete Pipeline.
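As a rough illustration of the failure-analysis step, the sketch below checks a Pipeline dictionary for `next` targets that are not defined as nodes, which is one cheap sanity check to run before calling `run_pipeline`. The helper name `find_dangling_next` is hypothetical and is not part of MaaMCP or MaaFramework. The sketch assumes `next` holds either a single node name or a list of node names, and the sample node names are made up for the example.

```python
import json


def find_dangling_next(pipeline: dict) -> list[tuple[str, str]]:
    """Return (node, target) pairs whose `next` target is not a node in the pipeline.

    Hypothetical helper for illustration only; not a MaaMCP or MaaFramework API.
    """
    dangling = []
    for name, node in pipeline.items():
        targets = node.get("next", [])
        # Assumption: `next` may be a single node name or a list of names.
        if isinstance(targets, str):
            targets = [targets]
        for target in targets:
            if target not in pipeline:
                dangling.append((name, target))
    return dangling


# A small pipeline in which one candidate in `next` was never defined.
pipeline = json.loads("""
{
  "Click Settings": {
    "recognition": "OCR",
    "expected": "Settings",
    "action": "Click",
    "next": ["Enter Display", "Enter Display Fallback"]
  },
  "Enter Display": {
    "recognition": "OCR",
    "expected": "Display",
    "action": "Click"
  }
}
""")

print(find_dangling_next(pipeline))
# → [('Click Settings', 'Enter Display Fallback')]
```

A check like this only catches structural mistakes. Recognition problems such as an overly strict `expected` value still have to be found by actually running the Pipeline.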
### Example Output

```json
{
  "Start Task": {
    "recognition": "DirectHit",
    "action": "DoNothing",
    "next": ["Click Settings"]
  },
  "Click Settings": {
    "recognition": "OCR",
    "expected": "Settings",
    "action": "Click",
    "next": ["Enter Display"]
  },
  "Enter Display": {
    "recognition": "OCR",
    "expected": "Display",
    "action": "Click",
    "next": ["Adjust Brightness"]
  },
  "Adjust Brightness": {
    "recognition": "OCR",
    "expected": "Brightness",
    "action": "Swipe",
    "begin": [200, 500],
    "end": [400, 500],
    "duration": 200
  }
}
```

## Precautions

📌 **Windows Automation Limitations:**

- Anti-cheat mechanisms in some games or applications may intercept background control operations
- If the target application is running with administrator privileges, MaaMCP must also be started with administrator privileges
- Minimized windows cannot be operated on; keep the target window in a non-minimized state
- If the default background screenshot/input method is unavailable (for example, the screenshot is empty or operations get no response), the AI assistant may try switching to the foreground method, which will occupy the mouse and keyboard

## Common Issues

### OCR recognition fails, reporting "Failed to load det or rec" or prompting that the resource does not exist

The OCR model files are downloaded automatically on first use, but the download can fail. Please check the data directory:

- Windows: `C:\Users\<Username>\AppData\Local\MaaXYZ\MaaMCP\resource\model\ocr\`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/resource/model/ocr/`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/resource/model/ocr/`

1. Check whether the model files (`det.onnx`, `rec.onnx`, `keys.txt`) exist in the above directory
2. Check whether `model/download.log` records any resource download exceptions
3. Manually run `python -c "from maa_mcp.download import download_and_extract_ocr; download_and_extract_ocr()"` to try downloading again

### About Issues

When submitting an issue, please provide the log file. The log file path is as follows:

- Windows: `C:\Users\<Username>\AppData\Local\MaaXYZ\MaaMCP\debug\maa.log`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/debug/maa.log`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/debug/maa.log`

## License

This project is licensed under the [GNU AGPL v3](LICENSE).

## Acknowledgments

- **[MaaFramework](https://github.com/MaaXYZ/MaaFramework)** - Provides a powerful automation framework
- **[FastMCP](https://github.com/jlowin/fastmcp)** - Simplifies MCP server development
- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Defines AI tool integration standards
