# Windows MCP.Net
[English](README.en.md) | **中文**
A .NET-based Windows desktop automation MCP (Model Context Protocol) server that provides AI assistants with the ability to interact with the Windows desktop environment.
## 📋 Table of Contents
- [Features](#-features)
- [Use Cases](#-use-cases)
- [Demo Screenshots](#-demo-screenshots)
- [Technology Stack](#️-technology-stack)
- [API Documentation](#-api-documentation)
- [Project Structure](#️-project-structure)
- [Feature Extension Suggestions](#-feature-extension-suggestions)
- [Configuration](#-configuration)
- [Contribution Guide](#-contribution-guide)
- [Update Log](#-update-log)
- [Support](#-support)
## 🚀 Quick Start
### Prerequisites
- Windows operating system
- .NET 10.0 Runtime or later
**Important Note**: This project requires .NET 10. If it is not installed locally, download and install it from the [.NET 10 download page](https://dotnet.microsoft.com/zh-cn/download/dotnet/10.0).
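
One way to confirm this prerequisite from a terminal is to scan the output of `dotnet --list-runtimes`. The sketch below is only an illustration: `has_dotnet10` is a hypothetical helper, not part of this project, and it assumes the runtime is listed as a `Microsoft.NETCore.App` line with a major version of 10 or higher.

```bash
# Hypothetical helper: succeeds when the text on stdin (the output of
# `dotnet --list-runtimes`) lists a Microsoft.NETCore.App runtime with
# major version 10 or later.
has_dotnet10() {
  grep -Eq '^Microsoft\.NETCore\.App (1[0-9]|[2-9][0-9])\.'
}

# Usage (requires the dotnet CLI on PATH):
#   dotnet --list-runtimes | has_dotnet10 && echo ".NET 10+ runtime found"
```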
### 1. MCP Client Configuration
Add the following configuration to your MCP client configuration:
#### Using Globally Installed Tools (Recommended)
```json
{
"mcpServers": {
"WindowsMCP.Net": {
"type": "stdio",
"command": "dnx",
"args": ["WindowsMCP.Net@", "--yes"],
"env": {}
}
}
}
```
#### Running Directly from Project Source Code (Development Mode)
**Method 1: Workspace Configuration**
Create a `.vscode/mcp.json` file in the project root directory:
```json
{
"mcpServers": {
"Windows-MCP.Net-Dev": {
"type": "stdio",
"command": "dotnet",
"args": ["run", "--project", "src/Windows-MCP.Net.csproj"],
"cwd": "${workspaceFolder}",
"env": {}
}
}
}
```
**Method 2: User Configuration**
Run `MCP: Open User Configuration` through the VS Code command palette and add:
```json
{
"mcpServers": {
"Windows-MCP.Net-Local": {
"type": "stdio",
"command": "dotnet",
"args": ["run", "--project", "src/Windows-MCP.Net.csproj"],
"env": {}
}
}
}
```
> **Note**: Running from source is convenient for development and debugging, because code changes take effect without reinstalling the tool. VS Code 1.102+ supports automatic discovery and management of MCP servers.
### 2. Installation and Running
#### Method 1: Global Installation (Recommended)
```bash
dotnet tool install --global WindowsMCP.Net
```
#### Method 2: Run from Source Code
```bash
# Clone the repository
git clone https://github.com/AIDotNet/Windows-MCP.Net.git
cd Windows-MCP.Net
# Build the project
dotnet build
# Run the project
dotnet run --project src/Windows-MCP.Net.csproj
```
### 3. Get Started
After completing the configuration, restart your MCP client to start using Windows desktop automation features!
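
To check the server outside an MCP client, you can send it the first messages a client would send over stdio. The sketch below prints the `initialize` request, the `notifications/initialized` notification, and a `tools/list` request as newline-delimited JSON-RPC, following the MCP specification. `mcp_handshake` is a hypothetical helper, and the `protocolVersion` and `clientInfo` values are illustrative assumptions rather than values taken from this project.

```bash
# Print the three newline-delimited JSON-RPC messages an MCP client sends
# first: initialize, the initialized notification, and tools/list.
# The protocolVersion and clientInfo values are illustrative assumptions.
mcp_handshake() {
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.1"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}

# Usage, from the repository root (assumes the server writes its responses
# to stdout before stdin closes, which may not hold in every setup):
#   mcp_handshake | dotnet run --project src/Windows-MCP.Net.csproj
```

If the server responds, the reply to the `tools/list` request should contain the tool names and input schemas that the server actually exposes.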
## 🚀 Features
### Core Features
- **Application Launch**: Launch applications from the Start menu by name
- **PowerShell Integration**: Execute PowerShell commands and return results
- **Desktop State Capture**: Get the current desktop state, including active applications, UI elements, etc.
- **Clipboard Operations**: Copy and paste text content
- **Mouse Operations**: Click, drag, move the mouse cursor
- **Keyboard Operations**: Text input, key presses, shortcut combinations
- **Window Management**: Adjust window size, position, switch applications
- **Scroll Operations**: Perform scroll operations at specified coordinates
- **Webpage Scraping**: Get webpage content and convert it to Markdown format
- **Browser Operations**: Open the specified URL in the default browser
- **Screenshot Function**: Capture the screen and save it to a temporary directory
- **File System Operations**: Create, read, write, copy, move, delete files and directories
- **OCR Text Recognition**: Extract text from the screen or a specified area, find text positions
- **System Control**: Adjust screen brightness, system volume, screen resolution, and other system settings
- **Wait Control**: Add delays between operations
### Supported Tools
#### Desktop Tools
| Tool Name | Description |
|---------|----------|
| **LaunchTool** | Launch applications from the Start menu |
| **PowershellTool** | Execute PowerShell commands and return the status code |
| **StateTool** | Capture desktop state information, including applications and UI elements |
| **ClipboardTool** | Clipboard copy and paste operations |
| **ClickTool** | Mouse click operations (supports left, right, middle buttons, single, double, triple clicks) |
| **TypeTool** | Enter text at specified coordinates, supports clearing and carriage return |
| **ResizeTool** | Adjust window size and position |
| **SwitchTool** | Switch to the specified application window |
| **ScrollTool** | Scroll at specified coordinates or the current mouse position |
| **DragTool** | Drag from source coordinates to target coordinates |
| **MoveTool** | Move the mouse cursor to specified coordinates |
| **ShortcutTool** | Execute keyboard shortcut combinations |
| **KeyTool** | Press a single keyboard key |
| **WaitTool** | Pause execution for the specified number of seconds |
| **ScrapeTool** | Scrape webpage content and convert it to Markdown format |
| **ScreenshotTool** | Capture the screen and save it to a temporary directory, return the image path |
| **OpenBrowserTool** | Open the specified URL in the default browser |
#### FileSystem Tools
| Tool Name | Description |
|---------|----------|
| **ReadFileTool** | Read the content of the specified file |
| **WriteFileTool** | Write content to a file |
| **CreateFileTool** | Create a new file and write the specified content |
| **CopyFileTool** | Copy a file to the specified location |
| **MoveFileTool** | Move or rename a file |
| **DeleteFileTool** | Delete the specified file |
| **GetFileInfoTool** | Get file information (size, creation time, etc.) |
| **ListDirectoryTool** | List files and subdirectories in a directory |
| **CreateDirectoryTool** | Create a new directory |
| **DeleteDirectoryTool** | Delete a directory and its contents |
| **SearchFilesTool** | Search for files in the specified directory |
#### OCR Tools
| Tool Name | Description |
|---------|----------|
| **ExtractTextFromScreenTool** | Use OCR to extract text from the entire screen |
| **ExtractTextFromRegionTool** | Use OCR to extract text from a specified area of the screen |
| **FindTextOnScreenTool** | Use OCR to find the specified text on the screen |
| **GetTextCoordinatesTool** | Get the coordinate position of the text on the screen |
| **ExtractTextFromFileTool** | Use OCR to extract text from an image file |
#### UI Element Recognition Tools
| Tool Name | Description |
|---------|----------|
| **FindElementByTextTool** | Find UI elements by text content |
| **FindElementByClassNameTool** | Find UI elements by class name |
| **FindElementByAutomationIdTool** | Find UI elements by automation ID |
| **GetElementPropertiesTool** | Get property information of the element at the specified coordinates |
| **WaitForElementTool** | Wait for the specified element to appear on the interface |
#### SystemControl Tools
| Tool Name | Description |
|---------|----------|
| **BrightnessTool** | Adjust screen brightness, supports increasing, decreasing, and setting a specific percentage |
| **VolumeTool** | Adjust system volume, supports increasing, decreasing, and setting a specific percentage |
| **ResolutionTool** | Set screen resolution (high, medium, low) |
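
MCP clients invoke any of the tools above with a JSON-RPC `tools/call` request. The sketch below builds such a request. `mcp_tool_call` is a hypothetical helper, and the tool name `wait` and its `seconds` argument are placeholders: the names and argument schemas this server actually exposes are not listed here and should be taken from its `tools/list` response.

```bash
# Build a JSON-RPC tools/call request. The first argument is the request id,
# the second the tool name, the third a JSON object with the tool arguments.
mcp_tool_call() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' "$1" "$2" "$3"
}

# Hypothetical call: "wait" and "seconds" are placeholder names, not
# confirmed identifiers from this server.
mcp_tool_call 3 wait '{"seconds":2}'
```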
## 💡 Use Cases
### 🤖 AI Assistant Desktop Automation
- **Intelligent Customer Service Robot**: AI assistants can automatically operate Windows applications to help users complete complex desktop tasks
- **Voice Assistant Integration**: Combined with voice recognition, control desktop applications through voice commands
- **Intelligent Office Assistant**: AI assistants automatically handle daily office tasks, such as document organization, email sending, etc.
### 📊 Office Automation
- **Data Entry Automation**: Automatically extract data from web pages or documents and enter it into Excel or other applications
- **Report Generation**: Automatically collect system information, screenshots, and generate formatted report documents
- **Batch File Processing**: Automatically organize, rename, and classify large numbers of files and documents
- **Email Automation**: Automatically send periodic reports and notification emails
### 🧪 Software Testing and Quality Assurance
- **UI Automation Testing**: Simulate user operations to automatically test the functionality of desktop applications
- **Regression Testing**: Automatically execute repetitive test cases to ensure software quality
- **Performance Monitoring**: Automatically collect application performance data and generate monitoring reports
- **Bug Reproduction**: Automatically reproduce user-reported issues to assist developers in debugging
### 🎯 Business Process Automation
- **Customer Service**: Automatically process customer requests and update the CRM system
- **Order Processing**: Automatically collect order information from multiple channels and enter it into the system
- **Inventory Management**: Automatically update inventory data and generate replenishment reminders
- **Financial Reconciliation**: Automatically compare financial data from different systems and mark differences
### 🔍 Data Collection and Analysis
- **Webpage Data Scraping**: Automatically collect product prices, news, and other information from multiple websites
- **Competitive Product Analysis**: Regularly collect product information and price data from competitors
- **Market Research**: Automatically collect and organize market data to generate analysis reports
- **Social Media Monitoring**: Monitor brand mentions and automatically collect user feedback
### 🎮 Games and Entertainment
- **Game Assistance**: Automatically perform repetitive game tasks (please comply with game rules)
- **Live Streaming Assistant**: Automatically manage live streaming software, switch scenes, and send messages
- **Media Management**: Automatically organize music and video files, update the media library
### 🏥 Healthcare
- **Medical Record Entry**: Automatically convert paper medical records into electronic format
- **Medical Image Analysis**: Combined with OCR technology, automatically extract key information from medical reports
- **Appointment Management**: Automatically process patient appointment requests and update the hospital management system
### 🏫 Education and Training
- **Online Exams**: Automatically grade multiple-choice questions and generate grade reports
- **Course Management**: Automatically update course information and send notifications to students
- **Learning Progress Tracking**: Automatically record students' learning activities and generate progress reports
### 🏭 Manufacturing and Logistics
- **Production Data Collection**: Automatically collect data from production equipment and update the ERP system
- **Quality Inspection**: Combined with image recognition, automatically detect product quality
- **Logistics Tracking**: Automatically update the status of goods and send tracking information to customers
### 🔧 System Maintenance
- **Server Monitoring**: Automatically check server status and generate monitoring reports
- **Log Analysis**: Automatically analyze system logs and identify abnormal patterns
- **Backup Management**: Automatically perform data backups and verify backup integrity
- **Software Deployment**: Automate the software installation and configuration process
## 📸 Demo Screenshots
### Text Input Demo
Automatically enter text in Notepad through TypeTool:

### Webpage Search Demo
Open and search webpage content using ScrapeTool:

### 📹 Demo Video
Complete desktop automation operation demo:
[Webpage Search Demo](assets/video.mp4)
## 🛠️ Technology Stack
- **.NET 10.0**: Built on .NET 10
- **Model Context Protocol**: Uses MCP for communication with clients
- **Microsoft.Extensions.Hosting**: Application hosting framework
- **Serilog**: Structured logging
- **HtmlAgilityPack**: HTML parsing and webpage scraping
- **ReverseMarkdown**: HTML to Markdown conversion
## 🏗️ Project Structure
```
src/
├── Windows-MCP.Net/  # Main project
│   ├── .mcp/  # MCP server configuration
│   │   └── server.json  # Server configuration file
│   ├── Exceptions/  # Custom exception classes (to be extended)
│   ├── Interface/  # Service interface definitions
│   │   ├── IDesktopService.cs  # Desktop service interface
│   │   ├── IFileSystemService.cs  # File system service interface
│   │   └── IOcrService.cs  # OCR service interface
│   ├── Models/  # Data models (to be extended)
│   ├── Prompts/  # Prompt templates (to be extended)
│   ├── Services/  # Core service implementations
│   │   ├── DesktopService.cs  # Desktop operation service
│   │   ├── FileSystemService.cs  # File system service
│   │   └── OcrService.cs  # OCR service
│   ├── Tools/  # MCP tool implementations
│   │   ├── Desktop/  # Desktop operation tools
│   │   │   ├── ClickTool.cs  # Click tool
│   │   │   ├── ClipboardTool.cs  # Clipboard tool
│   │   │   ├── DragTool.cs  # Drag tool
│   │   │   ├── GetWindowInfoTool.cs  # Window information tool
│   │   │   ├── KeyTool.cs  # Key tool
│   │   │   ├── LaunchTool.cs  # Launch application tool
│   │   │   ├── MoveTool.cs  # Mouse movement tool
│   │   │   ├── OpenBrowserTool.cs  # Browser opening tool
│   │   │   ├── PowershellTool.cs  # PowerShell execution tool
│   │   │   ├── ResizeTool.cs  # Window resizing tool
│   │   │   ├── ScrapeTool.cs  # Webpage scraping tool
│   │   │   ├── ScreenshotTool.cs  # Screenshot tool
│   │   │   ├── ScrollTool.cs  # Scroll tool
│   │   │   ├── ShortcutTool.cs  # Shortcut tool
│   │   │   ├── StateTool.cs  # Desktop state tool
│   │   │   ├── SwitchTool.cs  # Application switching tool
│   │   │   ├── TypeTool.cs  # Text input tool
│   │   │   ├── UIElementTool.cs  # UI element operation tool
│   │   │   └── WaitTool.cs  # Wait tool
│   │   ├── FileSystem/  # File system tools
│   │   │   ├── CopyFileTool.cs  # File copying tool
│   │   │   ├── CreateDirectoryTool.cs  # Directory creation tool
│   │   │   ├── CreateFileTool.cs  # File creation tool
│   │   │   ├── DeleteDirectoryTool.cs  # Directory deletion tool
│   │   │   ├── DeleteFileTool.cs  # File deletion tool
│   │   │   ├── GetFileInfoTool.cs  # File information tool
│   │   │   ├── ListDirectoryTool.cs  # Directory listing tool
│   │   │   ├── MoveFileTool.cs  # File moving tool
│   │   │   ├── ReadFileTool.cs  # File reading tool
│   │   │   ├── SearchFilesTool.cs  # File searching tool
│   │   │   └── WriteFileTool.cs  # File writing tool
│   │   └── OCR/  # OCR recognition tools
│   │       ├── ExtractTextFromRegionTool.cs  # Region text extraction tool
│   │       ├── ExtractTextFromScreenTool.cs  # Screen text extraction tool
│   │       ├── FindTextOnScreenTool.cs  # Screen text finding tool
│   │       └── GetTextCoordinatesTool.cs  # Text coordinate acquisition tool
│   ├── Program.cs  # Program entry point
│   └── Windows-MCP.Net.csproj  # Project file
└── Windows-MCP.Net.Test/  # Test project
    ├── DesktopToolsExtendedTest.cs  # Desktop tool extended tests
    ├── FileSystemToolsExtendedTest.cs  # File system tool extended tests
    ├── OCRToolsExtendedTest.cs  # OCR tool extended tests
    ├── ToolTest.cs  # Tool basic tests
    ├── UIElementToolTest.cs  # UI element tool tests
    └── Windows-MCP.Net.Test.csproj  # Test project file
```
## 🚧 Feature Extension Suggestions
### Planned Features
#### Advanced UI Recognition and Interaction
- **Enhanced UI Element Recognition**: Support more UI frameworks (WPF, WinForms, UWP)
- **Optimized OCR Text Recognition**: Multilingual support, improved recognition accuracy
- **Intelligent Waiting Mechanism**: Dynamically wait for elements to finish loading
#### Enhanced File System Operations
- **Advanced File Search**: Support content search, regular expression matching
- **Batch File Operations**: Support batch copying, moving, renaming
- **File Monitoring**: Real-time monitoring of file system changes
#### System Monitoring and Performance Analysis
- **System Resource Monitoring**: CPU, memory, disk, network usage
- **Process Management**: Process list retrieval, performance monitoring, process control
- **Performance Analysis Report**: Generate detailed system performance reports
#### Multimedia Processing Capabilities
- **Audio Control**: System volume control, audio device management
- **Image Processing**: Image scaling, cropping, format conversion
- **Screen Recording**: Support screen recording and playback
#### Network and Communication Functions
- **Network Diagnosis**: Ping, port scanning, connectivity testing
- **HTTP Client**: Support RESTful API calls
- **WiFi Management**: WiFi network scanning and connection management
#### Security and Permissions Management
- **Permission Check**: User permission verification and management
- **Data Encryption**: Sensitive data encryption storage
- **Operation Audit**: Complete operation logs and audit trails
### Development Roadmap
#### Phase 1 (High Priority) - Core Function Enhancement
- ✅ UI element recognition tool (Windows API implementation completed)
- 🔄 File management tool enhancement
- 📋 System monitoring tool
- 🔒 Basic security tools
#### Phase 2 (Medium Priority) - Function Expansion
- 📋 OCR text recognition optimization
- 📋 Advanced file search
- 📋 Audio control tool
- 📋 Network diagnosis tool
- 📋 Excel operation support
#### Phase 3 (Low Priority) - Advanced Features
- 📋 Image processing tool
- 📋 Task scheduling system
- 📋 Database operation support
- 📋 Macro recording and playback
## 🔧 Configuration
### Log Configuration
The project uses Serilog for logging, and the log files are saved in the `logs/` directory:
- Console output: Real-time log display
- File output: Rolls over daily, with files retained for 31 days
- Log level: Debug and above
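
Because the log file rolls over daily, the file to inspect changes from day to day. The sketch below picks the most recently modified file in the log directory without assuming any file naming pattern. `latest_log` is a hypothetical helper, and the default `logs` path assumes you run it from the directory where the server writes its logs.

```bash
# Print the path of the most recently modified file in a log directory
# (defaults to ./logs). Prints nothing when the directory is empty or missing.
latest_log() {
  local dir newest
  dir="${1:-logs}"
  newest="$(ls -t "$dir" 2>/dev/null | head -n 1)" || true
  if [ -n "$newest" ]; then
    printf '%s/%s\n' "$dir" "$newest"
  fi
}

# Usage: follow the current log file.
#   tail -f "$(latest_log)"
```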
### Environment Variables
| Variable Name | Description | Default Value |
|---------------|-------------|---------------|
| `ASPNETCORE_ENVIRONMENT` | Running environment | `Production` |
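
As a small illustration of the default in the table, the sketch below prints the value of `ASPNETCORE_ENVIRONMENT` and falls back to `Production` when the variable is unset or empty. `effective_environment` is a hypothetical helper for checking your shell. It does not query the server itself.

```bash
# Print ASPNETCORE_ENVIRONMENT, falling back to the documented default
# "Production" when the variable is unset or empty.
effective_environment() {
  printf '%s\n' "${ASPNETCORE_ENVIRONMENT:-Production}"
}

# Usage: set the variable for a single run from source.
#   ASPNETCORE_ENVIRONMENT=Development dotnet run --project src/Windows-MCP.Net.csproj
```

When the server is started by an MCP client instead, the `env` object in the client configuration is the usual place to set the same variable.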
## 📝 License
This project is open source under the MIT license. See the [LICENSE](LICENSE) file for details.
## 🔗 Related Links
- [Model Context Protocol](https://modelcontextprotocol.io/)
- [.NET Documentation](https://docs.microsoft.com/dotnet/)
- [Windows API Documentation](https://docs.microsoft.com/windows/win32/)
## 🤝 Contribution Guide
We welcome community contributions! If you would like to contribute to the project, please follow these steps:
### Development Environment Setup
1. **Clone the repository**
```bash
git clone https://github.com/AIDotNet/Windows-MCP.Net.git
cd Windows-MCP.Net
```
2. **Install dependencies**
```bash
dotnet restore
```
3. **Run tests**
```bash
dotnet test
```
4. **Build the project**
```bash
dotnet build
```
### Contribution Process
1. Fork this repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Create a Pull Request
### Code Specifications
- Follow the C# coding standards
- Add unit tests for new features
- Update relevant documentation
- Ensure all tests pass
### Issue Reporting
When reporting an issue, please provide:
- Operating system version
- .NET version
- Detailed error message
- Steps to reproduce
## 📞 Support
If you encounter problems or have suggestions, please:
1. Check [Issues](https://github.com/xuzeyu91/Windows-MCP.Net/issues)
2. Create a new Issue
3. Participate in discussions
4. Check [Wiki](https://github.com/xuzeyu91/Windows-MCP.Net/wiki) for more help
---
**Note**: This tool requires appropriate Windows permissions to perform desktop automation operations. Please ensure that it is used in a trusted environment.
**Disclaimer**: When using this tool for automation, please comply with relevant laws, regulations, and software usage agreements. The developers accept no liability for misuse of the tool.