# Windows MCP.Net
[English](README.en.md) | **中文**
A .NET-based Windows desktop automation MCP (Model Context Protocol) server that provides AI assistants with the ability to interact with the Windows desktop environment.
## 📋 Table of Contents
- [Features](#-features)
- [Use Cases](#-use-cases)
- [Demo Screenshots](#-demo-screenshots)
- [Tech Stack](#️-tech-stack)
- [API Documentation](#-api-documentation)
- [Project Structure](#️-project-structure)
- [Feature Extension Suggestions](#-feature-extension-suggestions)
- [Configuration](#-configuration)
- [Contribution Guidelines](#-contribution-guidelines)
- [Changelog](#-changelog)
- [Support](#-support)
## 🚀 Quick Start
### Prerequisites
- Windows operating system
- .NET 10.0 Runtime or higher
**Important Note**: This project requires .NET 10. If it is not installed on your machine, download and install it from the [.NET 10 download page](https://dotnet.microsoft.com/zh-cn/download/dotnet/10.0).
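The prerequisite can be checked before configuring a client by inspecting the output of `dotnet --list-runtimes`, which prints one installed runtime per line. The Python sketch below is one possible check, not part of this project; the helper names `has_required_runtime` and `installed_runtimes` are hypothetical.

```python
import re
import shutil
import subprocess

# Lines of `dotnet --list-runtimes` start with the runtime name and version,
# for example: "Microsoft.NETCore.App 10.0.0 [<install path>]".
RUNTIME_LINE = re.compile(r"^Microsoft\.NETCore\.App\s+(\d+)\.")


def has_required_runtime(listing: str, min_major: int = 10) -> bool:
    """Return True if the listing contains a .NET runtime with major >= min_major."""
    for line in listing.splitlines():
        match = RUNTIME_LINE.match(line.strip())
        if match and int(match.group(1)) >= min_major:
            return True
    return False


def installed_runtimes() -> str:
    """Run `dotnet --list-runtimes`; return "" if the dotnet CLI is not on PATH."""
    if shutil.which("dotnet") is None:
        return ""
    result = subprocess.run(
        ["dotnet", "--list-runtimes"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling `has_required_runtime(installed_runtimes())` returns `False` both when the `dotnet` CLI is missing and when only older runtimes are installed.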
### 1. MCP Client Configuration
Add the following entry to your MCP client's configuration file:
#### Using Globally Installed Tools (Recommended)
```json
{
  "mcpServers": {
    "WindowsMCP.Net": {
      "type": "stdio",
      "command": "dnx",
      "args": ["WindowsMCP.Net@", "--yes"],
      "env": {}
    }
  }
}
```
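If the client already has other servers configured, the entry above needs to be merged in rather than pasted over the file. The following Python sketch shows one way to do that on the parsed JSON; the helper `add_server` and the `other-server` entry are hypothetical, and the server entry is copied verbatim from the configuration above.

```python
import json
from copy import deepcopy

# Server entry copied verbatim from the configuration above.
WINDOWS_MCP_ENTRY = {
    "type": "stdio",
    "command": "dnx",
    "args": ["WindowsMCP.Net@", "--yes"],
    "env": {},
}


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = deepcopy(config)
    merged.setdefault("mcpServers", {})[name] = deepcopy(entry)
    return merged


# Hypothetical existing client configuration with one unrelated server.
existing = {
    "mcpServers": {
        "other-server": {"type": "stdio", "command": "other", "args": []}
    }
}

updated = add_server(existing, "WindowsMCP.Net", WINDOWS_MCP_ENTRY)
print(json.dumps(updated, indent=2))
```

Working on a copy keeps the original configuration untouched until the merged result has been written back successfully.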
#### Running Directly from Project Source (Development Mode)
**Method 1: Workspace Configuration**
Create a `.vscode/mcp.json` file in the project root directory:
```json
{
  "mcpServers": {
    "Windows-MCP.Net-Dev": {
      "type": "stdio",
      "command": "dotnet",
      "args": ["run", "--project", "src/Windows-MCP.Net.csproj"],
      "cwd": "${workspaceFolder}",
      "env": {}
    }
  }
}
```
**Method 2: User Configuration**
Run `MCP: Open User Configuration` from the VS Code command palette and add:
```json
{
  "mcpServers": {
    "Windows-MCP.Net-Local": {
      "type": "stdio",
      "command": "dotnet",
      "args": ["run", "--project", "src/Windows-MCP.Net.csproj"],
      "env": {}
    }
  }
}
```
> **Note**: Running from source is convenient for development and debugging because code changes take effect without reinstalling the tool. VS Code 1.102 and later supports automatic discovery and management of MCP servers.
### 2. Installation and Running
#### Method 1: Global Installation (Recommended)
```bash
dotnet tool install --global WindowsMCP.Net
```
#### Method 2: Running from Source
```bash
# Clone the repository
git clone https://github.com/AIDotNet/Windows-MCP.Net.git
cd Windows-MCP.Net
# Build the project
dotnet build
# Run the project
dotnet run --project src/Windows-MCP.Net.csproj
```
### 3. Getting Started
After configuration, restart your MCP client to start using Windows desktop automation features!
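Under the hood, an MCP client talks to a stdio server by exchanging JSON-RPC 2.0 messages, one JSON object per line. The Python sketch below builds the opening messages of a session as defined by the MCP specification (`initialize`, `notifications/initialized`, `tools/list`). The client name, version, and protocol revision are assumptions for illustration, and nothing here is specific to this project.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request, which carries an id and expects a response."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def notification(method: str) -> dict:
    """Build a JSON-RPC 2.0 notification, which has no id and gets no response."""
    return {"jsonrpc": "2.0", "method": method}


def frame(message: dict) -> bytes:
    """Encode a message for the stdio transport: UTF-8 JSON followed by a newline."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


handshake = [
    request("initialize", {
        "protocolVersion": "2025-06-18",  # assumption: server may negotiate another revision
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical client
    }),
    # Sent only after the server has answered the initialize request.
    notification("notifications/initialized"),
    request("tools/list"),
]

for message in handshake:
    print(frame(message).decode("utf-8"), end="")
```

A real client writes each framed message to the server's standard input and reads responses line by line from its standard output. Most users never do this by hand, since the MCP client performs it automatically.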
## 🚀 Features
### Core Features
- **Application Launch**: Start applications from the Start menu by name
- **PowerShell Integration**: Execute PowerShell commands and return results
- **Desktop State Capture**: Get the current desktop state, including active applications, UI elements, etc.
- **Clipboard Operations**: Copy and paste text content
- **Mouse Operations**: Click, drag, and move the mouse cursor
- **Keyboard Operations**: Text input, key operations, and keyboard shortcuts
- **Window Management**: Resize and reposition windows, switch applications
- **Scrolling Operations**: Scroll at specified coordinates
- **Web Scraping**: Retrieve web content and convert it to Markdown format
- **Browser Operations**: Open specified URLs in the default browser
- **Screenshot Functionality**: Capture the screen and save it to a temporary directory
- **File System Operations**: Create, read, write, copy, move, and delete files and directories
- **OCR Text Recognition**: Extract text from the full screen or a specified region, and locate text on the screen
- **System Control**: Adjust screen brightness, system volume, screen resolution, and other system settings
- **Wait Control**: Add delays between operations
### Supported Tools
#### Desktop Automation Tools
| Tool Name | Function Description |
|-----------|----------------------|
| **LaunchTool** | Launch applications from the Start menu |
| **PowershellTool** | Execute PowerShell commands and return status codes |
| **StateTool** | Capture desktop state information, including applications and UI elements |
| **ClipboardTool** | Clipboard copy and paste operations |
| **ClickTool** | Mouse click operations (supports left, right, and middle buttons; single, double, and triple clicks) |
| **TypeTool** | Type text at specified coordinates; can optionally clear existing text first and press Enter afterwards |
| **ResizeTool** | Resize and reposition windows |
| **SwitchTool** | Switch to specified application window |
| **ScrollTool** | Scroll at specified coordinates or current mouse position |
| **DragTool** | Drag from source coordinates to target coordinates |
| **MoveTool** | Move the mouse cursor to specified coordinates |
| **ShortcutTool** | Execute keyboard shortcut combinations |
| **KeyTool** | Press a single keyboard key |
| **WaitTool** | Pause execution for a specified number of seconds |
| **ScrapeTool** | Scrape web content and convert it to Markdown format |
| **ScreenshotTool** | Capture the screen and save it to a temporary directory, returning the image path |
| **OpenBrowserTool** | Open specified URLs in the default browser |
#### File System Tools
| Tool Name | Function Description |
|-----------|----------------------|
| **ReadFileTool** | Read the content of a specified file |
| **WriteFileTool** | Write content to a file |
| **CreateFileTool** | Create a new file and write specified content |
| **CopyFileTool** | Copy a file to a specified location |
| **MoveFileTool** | Move or rename a file |
| **DeleteFileTool** | Delete a specified file |
| **GetFileInfoTool** | Get file information (size, creation time, etc.) |
| **ListDirectoryTool** | List files and subdirectories in a directory |
| **CreateDirectoryTool** | Create a new directory |
| **DeleteDirectoryTool** | Delete a directory and its contents |
| **SearchFilesTool** | Search for files in a specified directory |
#### OCR Tools
| Tool Name | Function Description |
|-----------|----------------------|
| **ExtractTextFromScreenTool** | Use OCR to extract text from the entire screen |
| **ExtractTextFromRegionTool** | Use OCR to extract text from a specified area of the screen |
| **FindTextOnScreenTool** | Use OCR to find specified text on the screen |
| **GetTextCoordinatesTool** | Get the coordinates of text on the screen |
| **ExtractTextFromFileTool** | Use OCR to extract text from image files |
#### UI Element Recognition Tools
| Tool Name | Function Description |
|-----------|----------------------|
| **FindElementByTextTool** | Find UI elements by text content |
| **FindElementByClassNameTool** | Find UI elements by class name |
| **FindElementByAutomationIdTool** | Find UI elements by automation ID |
| **GetElementPropertiesTool** | Get property information of elements at specified coordinates |
| **WaitForElementTool** | Wait for specified elements to appear on the interface |
#### System Control Tools
| Tool Name | Function Description |
|-----------|----------------------|
| **BrightnessTool** | Adjust screen brightness; supports increasing, decreasing, or setting a specific percentage |
| **VolumeTool** | Adjust system volume; supports increasing, decreasing, or setting a specific percentage |
| **ResolutionTool** | Set screen resolution (high, medium, low) |
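An MCP client invokes any of the tools above with a `tools/call` request that names the tool and passes its arguments. The Python sketch below builds such requests for a launch, wait, type sequence. The tool names are taken from the tables above and the argument keys are hypothetical; the names and input schemas the server actually exposes should be read from its `tools/list` response.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical tool names and argument keys, for illustration only.
steps = [
    ("LaunchTool", {"name": "notepad"}),
    ("WaitTool", {"duration": 2}),
    ("TypeTool", {"x": 400, "y": 300, "text": "Hello from MCP", "clear": True}),
]

calls = [tool_call(i, name, args) for i, (name, args) in enumerate(steps, start=1)]

for call in calls:
    print(json.dumps(call))
```

Each request gets its own `id` so the client can match the server's responses to the calls it issued.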
## 💡 Use Cases
### 🤖 AI Assistant Desktop Automation
- **Intelligent Customer Service Bot**: AI assistants can automatically operate Windows applications to help users complete complex desktop tasks
- **Voice Assistant Integration**: Control desktop applications through voice commands combined with speech recognition
- **Smart Office Assistant**: AI assistants automatically handle daily office tasks such as document organization and email sending
### 📊 Office Automation
- **Data Entry Automation**: Automatically extract data from web pages or documents and input it into Excel or other applications
- **Report Generation**: Automatically collect system information and screenshots to generate formatted report documents
- **Batch File Processing**: Automatically organize, rename, and categorize large numbers of files and documents
- **Email Automation**: Automatically send periodic reports and notification emails
### 🧪 Software Testing and Quality Assurance
- **UI Automation Testing**: Simulate user operations to automatically test the functionality of desktop applications
- **Regression Testing**: Automatically execute repetitive test cases to ensure software quality
- **Performance Monitoring**: Automatically collect application performance data and generate monitoring reports
- **Bug Reproduction**: Automatically reproduce user-reported issues to assist developers in debugging
### 🎯 Business Process Automation
- **Customer Service**: Automatically handle customer requests and update CRM systems
- **Order Processing**: Automatically collect order information from multiple channels and input it into the system
- **Inventory Management**: Automatically update inventory data and generate restock reminders
- **Financial Reconciliation**: Automatically compare financial data from different systems and mark discrepancies
### 🔍 Data Collection and Analysis
- **Web Data Scraping**: Automatically collect product prices, news, and other information from multiple websites
- **Competitor Analysis**: Regularly collect product information and pricing data from competitors
- **Market Research**: Automatically collect and organize market data to generate analysis reports
- **Social Media Monitoring**: Monitor brand mentions and automatically collect user feedback
### 🎮 Gaming and Entertainment
- **Game Assistance**: Automatically perform repetitive game tasks (please adhere to game rules)
- **Live Streaming Assistant**: Automatically manage live streaming software, switch scenes, and send messages
- **Media Management**: Automatically organize music and video files, updating the media library
### 🏥 Healthcare and Wellness
- **Medical Record Entry**: Automatically convert paper medical records into electronic format
- **Medical Image Analysis**: Combine OCR technology to automatically extract key information from medical reports
- **Appointment Management**: Automatically handle patient appointment requests and update hospital management systems
### 🏫 Education and Training
- **Online Exams**: Automatically grade multiple-choice questions and generate score reports
- **Course Management**: Automatically update course information and send notifications to students
- **Learning Progress Tracking**: Automatically record students' learning activities and generate progress reports
### 🏭 Manufacturing and Logistics
- **Production Data Collection**: Automatically collect data from production equipment and update ERP systems
- **Quality Inspection**: Combine image recognition to automatically inspect product quality
- **Logistics Tracking**: Automatically update cargo status and send tracking information to customers
### 🔧 System Operations and Maintenance
- **Server Monitoring**: Automatically check server status and generate monitoring reports
- **Log Analysis**: Automatically analyze system logs to identify abnormal patterns
- **Backup Management**: Automatically perform data backups and verify backup integrity
- **Software Deployment**: Automate software installation and configuration processes
## 📸 Demo Screenshots
### Text Input Demo
Automatically input text into Notepad using TypeTool:

### Web Search Demo
Open and search web content using ScrapeTool:

### 📹 Demo Video
Complete demonstration of desktop automation operations:
[Web Search Demo](assets/video.mp4)
## 🛠️ Tech Stack
- **.NET 10.0**: Target platform for the server
- **Model Context Protocol**: Communication between the server and AI clients over MCP
- **Microsoft.Extensions.Hosting**: Application hosting framework
- **Serilog**: Structured logging
- **HtmlAgilityPack**: HTML parsing and web scraping
- **ReverseMarkdown**: HTML to Markdown conversion
## 🏗️ Project Structure
```
src/
├── Windows-MCP.Net/  # Main project
│   ├── .mcp/  # MCP server configuration
│   │   └── server.json  # Server configuration file
│   ├── Exceptions/  # Custom exception classes (to be extended)
│   ├── Interface/  # Service interface definitions
│   │   ├── IDesktopService.cs  # Desktop service interface
│   │   ├── IFileSystemService.cs  # File system service interface
│   │   └── IOcrService.cs  # OCR service interface
│   ├── Models/  # Data models (to be extended)
│   ├── Prompts/  # Prompt templates (to be extended)
│   ├── Services/  # Core service implementations
│   │   ├── DesktopService.cs  # Desktop operation service
│   │   ├── FileSystemService.cs  # File system service
│   │   └── OcrService.cs  # OCR service
│   ├── Tools/  # MCP tool implementations
│   │   ├── Desktop/  # Desktop operation tools
│   │   │   ├── ClickTool.cs  # Click tool
│   │   │   ├── ClipboardTool.cs  # Clipboard tool
│   │   │   ├── DragTool.cs  # Drag tool
│   │   │   ├── GetWindowInfoTool.cs  # Window info tool
│   │   │   ├── KeyTool.cs  # Key tool
│   │   │   ├── LaunchTool.cs  # Launch application tool
│   │   │   ├── MoveTool.cs  # Mouse move tool
│   │   │   ├── OpenBrowserTool.cs  # Browser open tool
│   │   │   ├── PowershellTool.cs  # PowerShell execution tool
│   │   │   ├── ResizeTool.cs  # Window resize tool
│   │   │   ├── ScrapeTool.cs  # Web scraping tool
│   │   │   ├── ScreenshotTool.cs  # Screenshot tool
│   │   │   ├── ScrollTool.cs  # Scroll tool
│   │   │   ├── ShortcutTool.cs  # Shortcut tool
│   │   │   ├── StateTool.cs  # Desktop state tool
│   │   │   ├── SwitchTool.cs  # Application switch tool
│   │   │   ├── TypeTool.cs  # Text input tool
│   │   │   ├── UIElementTool.cs  # UI element operation tool
│   │   │   └── WaitTool.cs  # Wait tool
│   │   ├── FileSystem/  # File system tools
│   │   │   ├── CopyFileTool.cs  # File copy tool
│   │   │   ├── CreateDirectoryTool.cs  # Directory create tool
│   │   │   ├── CreateFileTool.cs  # File create tool
│   │   │   ├── DeleteDirectoryTool.cs  # Directory delete tool
│   │   │   ├── DeleteFileTool.cs  # File delete tool
│   │   │   ├── GetFileInfoTool.cs  # File info tool
│   │   │   ├── ListDirectoryTool.cs  # Directory list tool
│   │   │   ├── MoveFileTool.cs  # File move tool
│   │   │   ├── ReadFileTool.cs  # File read tool
│   │   │   ├── SearchFilesTool.cs  # File search tool
│   │   │   └── WriteFileTool.cs  # File write tool
│   │   └── OCR/  # OCR recognition tools
│   │       ├── ExtractTextFromRegionTool.cs  # Region text extraction tool
│   │       ├── ExtractTextFromScreenTool.cs  # Screen text extraction tool
│   │       ├── FindTextOnScreenTool.cs  # Screen text finding tool
│   │       └── GetTextCoordinatesTool.cs  # Text coordinates retrieval tool
│   ├── Program.cs  # Program entry point
│   └── Windows-MCP.Net.csproj  # Project file
└── Windows-MCP.Net.Test/  # Test project
    ├── DesktopToolsExtendedTest.cs  # Desktop tools extension test
    ├── FileSystemToolsExtendedTest.cs  # File system tools extension test
    ├── OCRToolsExtendedTest.cs  # OCR tools extension test
    ├── ToolTest.cs  # Tool base test
    ├── UIElementToolTest.cs  # UI element tool test
    └── Windows-MCP.Net.Test.csproj  # Test project file
```
## 🚧 Feature Extension Suggestions
### Planned Features
#### Advanced UI Recognition and Interaction
- **Enhanced UI Element Recognition**: Support for more UI frameworks (WPF, WinForms, UWP)
- **OCR Text Recognition Optimization**: Multi-language support, improved recognition accuracy
- **Intelligent Wait Mechanism**: Dynamically wait for elements to load
#### Enhanced File System Operations
- **Advanced File Search**: Support for content search and regular expression matching
- **Batch File Operations**: Support for batch copying, moving, and renaming
- **File Monitoring**: Real-time monitoring of file system changes
#### System Monitoring and Performance Analysis
- **System Resource Monitoring**: CPU, memory, disk, and network usage
- **Process Management**: Retrieve process lists, monitor performance, and control processes
- **Performance Analysis Reports**: Generate detailed system performance reports
#### Multimedia Processing Capabilities
- **Audio Control**: System volume control and audio device management
- **Image Processing**: Image resizing, cropping, and format conversion
- **Screen Recording**: Support for screen recording and playback
#### Networking and Communication Features
- **Network Diagnostics**: Ping, port scanning, connectivity testing
- **HTTP Client**: Support for RESTful API calls
- **WiFi Management**: WiFi network scanning and connection management
#### Security and Permission Management
- **Permission Checks**: User permission validation and management
- **Data Encryption**: Sensitive data encrypted storage
- **Operation Auditing**: Complete operation logs and audit trails
### Development Roadmap
#### Phase One (High Priority) - Core Feature Enhancements
- ✅ UI Element Recognition Tool (Windows API implementation completed)
- 🔄 File Management Tool Enhancements
- 📋 System Monitoring Tools
- 🔒 Basic Security Tools
#### Phase Two (Medium Priority) - Feature Extensions
- 📋 OCR Text Recognition Optimization
- 📋 Advanced File Search
- 📋 Audio Control Tools
- 📋 Network Diagnostic Tools
- 📋 Excel Operation Support
#### Phase Three (Low Priority) - Advanced Features
- 📋 Image Processing Tools
- 📋 Task Scheduling System
- 📋 Database Operation Support
- 📋 Macro Recording and Playback
## 🔧 Configuration
### Logging Configuration
The project uses Serilog for logging, with log files stored in the `logs/` directory:
- Console Output: Real-time log display
- File Output: Daily rolling, retaining logs for 31 days
- Log Level: Debug and above
### Environment Variables
| Variable Name | Description | Default Value |
|---------------|-------------|---------------|
| `ASPNETCORE_ENVIRONMENT` | Runtime environment | `Production` |
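The variable can be supplied either through the `env` field of the server entry in the MCP client configuration or through the environment of the process that launches the server. The Python sketch below shows both shapes. The helper names are hypothetical, `Development` is used only as an example of a non-default value, and the command and arguments are copied from the development-mode configuration under Quick Start.

```python
import os


def server_env(environment: str = "Production") -> dict:
    """Copy the current process environment and set ASPNETCORE_ENVIRONMENT."""
    env = dict(os.environ)
    env["ASPNETCORE_ENVIRONMENT"] = environment
    return env


def server_entry(environment: str) -> dict:
    """Build a server entry in the Quick Start shape with `env` filled in."""
    return {
        "type": "stdio",
        "command": "dotnet",
        "args": ["run", "--project", "src/Windows-MCP.Net.csproj"],
        "env": {"ASPNETCORE_ENVIRONMENT": environment},
    }


entry = server_entry("Development")
print(entry["env"])
```

The dictionary returned by `server_env` can be passed as the `env` argument of `subprocess.Popen` when starting the server manually.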
## 📝 License
This project is open-source under the MIT License. Please refer to the [LICENSE](LICENSE) file for details.
## 🔗 Related Links
- [Model Context Protocol](https://modelcontextprotocol.io/)
- [.NET Documentation](https://docs.microsoft.com/dotnet/)
- [Windows API Documentation](https://docs.microsoft.com/windows/win32/)
## 🤝 Contribution Guidelines
We welcome community contributions! If you would like to contribute to the project, please follow these steps:
### Setting Up the Development Environment
1. **Clone the Repository**
```bash
git clone https://github.com/AIDotNet/Windows-MCP.Net.git
cd Windows-MCP.Net
```
2. **Install Dependencies**
```bash
dotnet restore
```
3. **Run Tests**
```bash
dotnet test
```
4. **Build the Project**
```bash
dotnet build
```
### Contribution Process
1. Fork this repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Create a Pull Request
### Code Standards
- Follow C# coding standards
- Add unit tests for new features
- Update relevant documentation
- Ensure all tests pass
### Issue Reporting
When reporting issues, please provide:
- Operating system version
- .NET version
- Detailed error messages
- Steps to reproduce
## 📞 Support
If you encounter issues or have suggestions, please:
1. Check [Issues](https://github.com/xuzeyu91/Windows-MCP.Net/issues)
2. Create a new Issue
3. Participate in discussions
4. Check the [Wiki](https://github.com/xuzeyu91/Windows-MCP.Net/wiki) for more help
---
**Note**: This tool requires appropriate Windows permissions to perform desktop automation operations. Please ensure it is used in a trusted environment.
**Disclaimer**: When using this tool for automation operations, please comply with relevant laws and software usage agreements. The developers are not responsible for any consequences arising from misuse of the tool.