
# UI Vision Analyzer MCP Server

[![npm version](https://badge.fury.io/js/auu-uivision-mcp.svg)](https://badge.fury.io/js/auu-uivision-mcp) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org/)

A Model Context Protocol (MCP) server that analyzes software user interface screenshots using Google's Gemini AI vision capabilities. This server provides detailed descriptions of UI elements, layout structure, functionality, and accessibility information.

## Features

- **UI Analysis**: Comprehensive analysis of software interfaces including buttons, forms, navigation, and layout
- **Multiple Input Sources**: Support for local files, base64 data, and image URLs
- **Flexible Prompts**: Use default UI analysis prompts or provide custom analysis instructions
- **Multiple Models**: Support for various Gemini models (2.0 Flash, 1.5 Pro, 1.5 Flash, etc.)
- **Format Support**: PNG, JPEG, GIF, WebP, BMP image formats
- **Size Validation**: Configurable image size limits with validation
- **Error Handling**: Comprehensive error reporting and validation

## Installation

### Global Installation

```bash
npm install -g auu-uivision-mcp
```

### Local Installation

```bash
npm install auu-uivision-mcp
```

### Direct Usage with npx

```bash
# Set API key and run
export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp

# Or on Windows:
set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
```

### Development Installation

```bash
git clone https://github.com/superauu/auu-uivision-mcp.git
cd auu-uivision-mcp
npm install
npm run build
```

## Configuration

### Required Environment Variables

You must set the `GEMINI_API_KEY` environment variable before running the server.
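To illustrate what this requirement means in practice, here is a minimal TypeScript sketch of a fail-fast configuration loader. It is not the package's actual `config.ts`: the `ServerConfig` shape and the `loadConfig` function are hypothetical names chosen for this example. The variable names, the defaults, and the error message are the ones documented in this README.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  apiKey: string;
  model: string;
  maxImageSize: number; // bytes
  supportedFormats: string[];
}

type Env = Record<string, string | undefined>;

// Build a config from an environment-like object.
// Throws immediately when the required API key is missing or blank.
function loadConfig(env: Env): ServerConfig {
  const apiKey = (env["GEMINI_API_KEY"] ?? "").trim();
  if (apiKey === "") {
    throw new Error("GEMINI_API_KEY environment variable is required");
  }

  // Documented default: 10 MB.
  const rawSize = env["MAX_IMAGE_SIZE"];
  const maxImageSize = rawSize === undefined ? 10 * 1024 * 1024 : Number(rawSize);
  if (!isFinite(maxImageSize) || Math.floor(maxImageSize) !== maxImageSize || maxImageSize <= 0) {
    throw new Error("MAX_IMAGE_SIZE must be a positive integer number of bytes");
  }

  // Documented default: png,jpg,jpeg,gif,webp,bmp.
  const supportedFormats = (env["SUPPORTED_FORMATS"] ?? "png,jpg,jpeg,gif,webp,bmp")
    .split(",")
    .map((f) => f.trim().toLowerCase())
    .filter((f) => f.length > 0);

  return {
    apiKey,
    model: env["GEMINI_MODEL"] ?? "gemini-2.0-flash", // documented default model
    maxImageSize,
    supportedFormats,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.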
#### Linux/macOS:

```bash
export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
```

#### Windows (Command Prompt):

```cmd
set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
```

#### Windows (PowerShell):

```powershell
$env:GEMINI_API_KEY="your_gemini_api_key_here"
npx auu-uivision-mcp
```

**Get your API key from**: https://makersuite.google.com/app/apikey

### Optional Environment Variables

```bash
# Configure the default Gemini model to use
# Available models: gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
export GEMINI_MODEL=gemini-2.0-flash

# Maximum image size in bytes (default: 10 MB; this example raises the limit to 20 MB)
export MAX_IMAGE_SIZE=20971520

# Supported image formats (default: png,jpg,jpeg,gif,webp,bmp)
export SUPPORTED_FORMATS=png,jpg,jpeg,gif,webp,bmp
```

## Usage

### MCP Client Configuration

Add this server to your MCP client configuration:

#### Claude Desktop

```json
{
  "mcpServers": {
    "uivision-analyzer": {
      "command": "auu-uivision-mcp",
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```

#### Cline (VS Code Extension)

```json
{
  "mcpServers": {
    "uivision-analyzer": {
      "command": "npx",
      "args": ["auu-uivision-mcp"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```

### Available Tools

#### analyze_ui_screenshot

Analyzes a software UI screenshot and provides a detailed description.
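When building requests for this tool from typed client code, it can help to model the arguments and check them before sending. The sketch below is a client-side illustration only. The `AnalyzeArgs`, `validateArgs`, and `buildRequest` names are hypothetical, and the rule that exactly one image source must be given is an assumption: the parameter list marks each image input as optional without stating how they combine. The field names and the `tool`/`arguments` envelope follow the parameters and examples in this README.

```typescript
// Field names mirror the documented parameters of analyze_ui_screenshot.
interface AnalyzeArgs {
  image_path?: string;
  image_base64?: string;
  image_url?: string;
  prompt?: string;
  model?: string;
}

// Returns a list of problems; an empty list means the arguments look usable.
// Assumption (not stated in the README): exactly one image source per call.
function validateArgs(args: AnalyzeArgs): string[] {
  const errors: string[] = [];
  const sources = [args.image_path, args.image_base64, args.image_url].filter(
    (v) => typeof v === "string" && v.length > 0
  );
  if (sources.length === 0) {
    errors.push("Provide one of image_path, image_base64, or image_url");
  } else if (sources.length > 1) {
    errors.push("Provide only one of image_path, image_base64, or image_url");
  }
  return errors;
}

// Wrap validated arguments in the request shape used by this README's examples.
function buildRequest(args: AnalyzeArgs): { tool: string; arguments: AnalyzeArgs } {
  const errors = validateArgs(args);
  if (errors.length > 0) {
    throw new Error(errors.join("; "));
  }
  return { tool: "analyze_ui_screenshot", arguments: args };
}
```

For example, `buildRequest({ image_path: "/path/to/screenshot.png", model: "gemini-2.0-flash" })` yields the same object as the local-file example in this README, while `buildRequest({})` throws before any request is sent.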
**Parameters:**

- `image_path` (string, optional): Local file path to the screenshot
- `image_base64` (string, optional): Base64 encoded image data
- `image_url` (string, optional): URL of the image to analyze
- `prompt` (string, optional): Custom analysis prompt
- `model` (string, optional): Gemini model to use

**Example Usage:**

```javascript
// Analyze local file
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_path": "/path/to/screenshot.png",
    "model": "gemini-2.0-flash"
  }
}

// Analyze image from URL
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_url": "https://example.com/screenshot.jpg",
    "prompt": "Focus on accessibility issues and color contrast"
  }
}

// Analyze base64 image
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
    "model": "gemini-1.5-pro"
  }
}
```

**Response Format:**

```json
{
  "description": "Overall description of the interface and its purpose",
  "elements": [
    {
      "type": "button",
      "description": "Primary call-to-action button with 'Get Started' text",
      "position": { "x": 250, "y": 400, "width": 120, "height": 40 },
      "text": "Get Started",
      "interactive": true
    }
  ],
  "layout": {
    "structure": "Centered card layout",
    "organization": "Vertical flow with clear visual hierarchy",
    "responsiveness": "Appears to be responsive with adaptive containers",
    "visualHierarchy": "Clear with prominent headline and supporting elements"
  },
  "functionality": [
    "User registration/signup workflow",
    "Social login integration",
    "Form validation and error handling"
  ],
  "accessibility": {
    "colorContrast": "Good contrast ratios for text readability",
    "textReadability": "Clear fonts with appropriate sizing",
    "navigationClarity": "Logical tab order and keyboard navigation",
    "altTextStatus": "Images appear to have descriptive alt text"
  }
}
```

## Development

### Project Structure

```
src/
├── index.ts            # Main MCP server entry point
├── gemini-client.ts    # Gemini API integration
├── image-processor.ts  # Image handling utilities
├── config.ts           # Environment configuration
└── types.ts            # TypeScript type definitions
```

### Scripts

```bash
# Development with auto-reload
npm run dev:watch

# Development without auto-reload
npm run dev

# Build for production
npm run build

# Start production server
npm start
```

### Environment Setup

1. Copy `.env.example` to `.env`
2. Add your Gemini API key
3. Install dependencies: `npm install`
4. Build the project: `npm run build`

## Supported Gemini Models

- `gemini-2.5-pro` - Latest high-quality model with advanced reasoning capabilities
- `gemini-2.0-flash` (default) - Fast, efficient for most UI analysis tasks
- `gemini-1.5-pro` - Higher quality analysis, slower processing
- `gemini-1.5-flash` - Balanced speed and quality
- `gemini-1.0-pro` - Legacy model support

### Model Selection via Environment Variables

You can set the default model using the `GEMINI_MODEL` environment variable:

```env
# Use the latest high-quality model
GEMINI_MODEL=gemini-2.5-pro

# Or use the fast default model
GEMINI_MODEL=gemini-2.0-flash
```

You can also specify a different model per request using the `model` parameter:

```javascript
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_path": "/path/to/screenshot.png",
    "model": "gemini-2.5-pro" // Override default model for this request
  }
}
```

## Image Requirements

- **Formats**: PNG, JPEG, GIF, WebP, BMP
- **Maximum Size**: 10MB (configurable)
- **Recommended Resolution**: 1920x1080 or higher for best results
- **Content**: Clear screenshots without excessive compression artifacts

## Error Handling

The server provides detailed error messages for common issues:

- **Missing API Key**: Configure `GEMINI_API_KEY` environment variable
- **Invalid Image**: Unsupported format or corrupted file
- **Size Limits**: Image exceeds maximum allowed size
- **Network Errors**: Failed to download images from URLs
- **API Errors**: Gemini API quota limits or service issues

## API Rate Limits

- Gemini API has usage quotas and rate limits
- Consider implementing caching for repeated analysis
- Monitor your API usage in the Google Cloud Console

## Troubleshooting

### Common Issues

1. **"GEMINI_API_KEY environment variable is required"**
   - Set the `GEMINI_API_KEY` in your `.env` file or environment variables
   - Get an API key from https://makersuite.google.com/app/apikey

2. **"Failed to connect to Gemini API"**
   - Verify your API key is valid and active
   - Check network connectivity
   - Ensure API is enabled in your Google Cloud project

3. **"Image size exceeds maximum allowed size"**
   - Reduce image size or increase `MAX_IMAGE_SIZE` limit
   - Compress images before analysis

4. **"Unsupported image format"**
   - Use supported formats: PNG, JPEG, GIF, WebP, BMP
   - Convert images to supported format before analysis

### Debug Mode

Enable debug logging by setting:

```bash
DEBUG=uivision:* auu-uivision-mcp
```

## License

MIT License - see LICENSE file for details.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## Support

- Create issues on GitHub for bug reports
- Check the documentation for common solutions
- Review the error messages for specific guidance

## Changelog

### v1.0.0

- Initial release
- UI screenshot analysis with Gemini API
- Support for multiple image input sources
- Configurable models and parameters
- Comprehensive error handling
