# UI Vision Analyzer MCP Server
A Model Context Protocol (MCP) server that analyzes software user interface screenshots using Google's Gemini AI vision capabilities. This server provides detailed descriptions of UI elements, layout structure, functionality, and accessibility information.
## Features
- **UI Analysis**: Comprehensive analysis of software interfaces including buttons, forms, navigation, and layout
- **Multiple Input Sources**: Support for local files, base64 data, and image URLs
- **Flexible Prompts**: Use default UI analysis prompts or provide custom analysis instructions
- **Multiple Models**: Support for various Gemini models (2.0 Flash, 1.5 Pro, 1.5 Flash, etc.)
- **Format Support**: PNG, JPEG, GIF, WebP, BMP image formats
- **Size Validation**: Configurable image size limits with validation
- **Error Handling**: Comprehensive error reporting and validation
## Installation
### Global Installation
```bash
npm install -g auu-uivision-mcp
```
### Local Installation
```bash
npm install auu-uivision-mcp
```
### Direct Usage with npx
```bash
# Set API key and run
export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
# Or, on Windows (Command Prompt):
set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
```
### Development Installation
```bash
git clone https://github.com/superauu/auu-uivision-mcp.git
cd auu-uivision-mcp
npm install
npm run build
```
## Configuration
### Required Environment Variables
You must set the `GEMINI_API_KEY` environment variable before running the server.
#### Linux/macOS:
```bash
export GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
```
#### Windows (Command Prompt):
```cmd
set GEMINI_API_KEY=your_gemini_api_key_here
npx auu-uivision-mcp
```
#### Windows (PowerShell):
```powershell
$env:GEMINI_API_KEY="your_gemini_api_key_here"
npx auu-uivision-mcp
```
**Get your API key from**: https://makersuite.google.com/app/apikey
### Optional Environment Variables
```bash
# Configure the default Gemini model to use
# Available models: gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
export GEMINI_MODEL=gemini-2.0-flash
# Maximum image size in bytes (default: 10MB); this example raises the limit to 20 MB (20 * 1024 * 1024)
export MAX_IMAGE_SIZE=20971520
# Supported image formats (default: png,jpg,jpeg,gif,webp,bmp)
export SUPPORTED_FORMATS=png,jpg,jpeg,gif,webp,bmp
```
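The variables above could be read and validated along these lines. This is a minimal sketch, not the server's actual `config.ts`: the `ServerConfig` shape and the `loadConfig` helper are hypothetical names, and the 10 MB default is assumed to mean 10 * 1024 * 1024 bytes.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  apiKey: string;
  model: string;
  maxImageSize: number; // bytes
  supportedFormats: string[];
}

// Assumption: the documented "10MB" default means 10 MiB.
const DEFAULT_MAX_IMAGE_SIZE = 10 * 1024 * 1024;

// Reads settings from an environment-like record and applies the documented defaults.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env["GEMINI_API_KEY"];
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY environment variable is required");
  }

  const rawSize = env["MAX_IMAGE_SIZE"];
  const maxImageSize = rawSize === undefined ? DEFAULT_MAX_IMAGE_SIZE : Number(rawSize);
  if (!Number.isInteger(maxImageSize) || maxImageSize <= 0) {
    throw new Error(`MAX_IMAGE_SIZE must be a positive integer, got "${rawSize}"`);
  }

  // Normalize the comma-separated list: trim, lowercase, drop empty entries.
  const supportedFormats = (env["SUPPORTED_FORMATS"] ?? "png,jpg,jpeg,gif,webp,bmp")
    .split(",")
    .map((format) => format.trim().toLowerCase())
    .filter((format) => format.length > 0);

  return {
    apiKey,
    model: env["GEMINI_MODEL"] ?? "gemini-2.0-flash",
    maxImageSize,
    supportedFormats,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`.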
## Usage
### MCP Client Configuration
Add this server to your MCP client configuration:
#### Claude Desktop
```json
{
  "mcpServers": {
    "uivision-analyzer": {
      "command": "auu-uivision-mcp",
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
#### Cline (VS Code Extension)
```json
{
  "mcpServers": {
    "uivision-analyzer": {
      "command": "npx",
      "args": ["auu-uivision-mcp"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
### Available Tools
#### analyze_ui_screenshot
Analyzes a software UI screenshot and returns a detailed description. Supply the image through one of `image_path`, `image_base64`, or `image_url`.
**Parameters:**
- `image_path` (string, optional): Local file path to the screenshot
- `image_base64` (string, optional): Base64 encoded image data
- `image_url` (string, optional): URL of the image to analyze
- `prompt` (string, optional): Custom analysis prompt
- `model` (string, optional): Gemini model to use
**Example Usage:**
```javascript
// Analyze local file
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_path": "/path/to/screenshot.png",
    "model": "gemini-2.0-flash"
  }
}
// Analyze image from URL
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_url": "https://example.com/screenshot.jpg",
    "prompt": "Focus on accessibility issues and color contrast"
  }
}
// Analyze base64 image
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_base64": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
    "model": "gemini-1.5-pro"
  }
}
```
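A client may want to check its arguments before calling the tool. The sketch below types the documented parameters and picks the image source. It assumes that exactly one of the three image parameters should be set, which this README does not state outright, and `resolveImageSource` is a hypothetical helper rather than part of the package.

```typescript
// Argument shape taken from the parameter list above.
interface AnalyzeArgs {
  image_path?: string;
  image_base64?: string;
  image_url?: string;
  prompt?: string;
  model?: string;
}

type ImageSource =
  | { kind: "path"; value: string }
  | { kind: "base64"; value: string; mimeType: string | null }
  | { kind: "url"; value: string };

// Hypothetical helper. Assumption: exactly one image source must be supplied.
function resolveImageSource(args: AnalyzeArgs): ImageSource {
  const sources: ImageSource[] = [];

  if (args.image_path) sources.push({ kind: "path", value: args.image_path });
  if (args.image_url) sources.push({ kind: "url", value: args.image_url });
  if (args.image_base64) {
    // Accept both a bare base64 payload and a "data:<mime>;base64,<payload>" URI.
    const raw = args.image_base64;
    const marker = ";base64,";
    const markerIndex = raw.indexOf(marker);
    if (raw.startsWith("data:") && markerIndex !== -1) {
      sources.push({
        kind: "base64",
        mimeType: raw.slice("data:".length, markerIndex),
        value: raw.slice(markerIndex + marker.length),
      });
    } else {
      sources.push({ kind: "base64", mimeType: null, value: raw });
    }
  }

  const [only] = sources;
  if (sources.length !== 1 || only === undefined) {
    throw new Error(
      `Provide exactly one of image_path, image_base64, image_url (got ${sources.length})`
    );
  }
  return only;
}
```

Returning a tagged union keeps the later file-read, decode, and download paths separate while leaving `prompt` and `model` untouched.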
**Response Format:**
```json
{
  "description": "Overall description of the interface and its purpose",
  "elements": [
    {
      "type": "button",
      "description": "Primary call-to-action button with 'Get Started' text",
      "position": { "x": 250, "y": 400, "width": 120, "height": 40 },
      "text": "Get Started",
      "interactive": true
    }
  ],
  "layout": {
    "structure": "Centered card layout",
    "organization": "Vertical flow with clear visual hierarchy",
    "responsiveness": "Appears to be responsive with adaptive containers",
    "visualHierarchy": "Clear with prominent headline and supporting elements"
  },
  "functionality": [
    "User registration/signup workflow",
    "Social login integration",
    "Form validation and error handling"
  ],
  "accessibility": {
    "colorContrast": "Good contrast ratios for text readability",
    "textReadability": "Clear fonts with appropriate sizing",
    "navigationClarity": "Logical tab order and keyboard navigation",
    "altTextStatus": "Images appear to have descriptive alt text"
  }
}
```
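For clients written in TypeScript, the response above can be given a type. The interfaces below are inferred from the sample alone, so which fields are optional is an assumption, and `interactiveLabels` is a hypothetical helper shown only to illustrate consuming the result.

```typescript
// Field names mirror the sample response; optionality is an assumption.
interface UIElement {
  type: string;
  description: string;
  position?: { x: number; y: number; width: number; height: number };
  text?: string;
  interactive: boolean;
}

interface UIAnalysis {
  description: string;
  elements: UIElement[];
  layout: {
    structure: string;
    organization: string;
    responsiveness: string;
    visualHierarchy: string;
  };
  functionality: string[];
  accessibility: {
    colorContrast: string;
    textReadability: string;
    navigationClarity: string;
    altTextStatus: string;
  };
}

// Hypothetical helper: labels of the elements flagged as interactive,
// falling back to the description when an element has no visible text.
function interactiveLabels(analysis: UIAnalysis): string[] {
  return analysis.elements
    .filter((element) => element.interactive)
    .map((element) => element.text ?? element.description);
}
```

A response received as text could be parsed with `JSON.parse(text) as UIAnalysis`, although a runtime schema check is safer for model-generated output.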
## Development
### Project Structure
```
src/
├── index.ts # Main MCP server entry point
├── gemini-client.ts # Gemini API integration
├── image-processor.ts # Image handling utilities
├── config.ts # Environment configuration
└── types.ts # TypeScript type definitions
```
### Scripts
```bash
# Development with auto-reload
npm run dev:watch
# Development without auto-reload
npm run dev
# Build for production
npm run build
# Start production server
npm start
```
### Environment Setup
1. Copy `.env.example` to `.env`
2. Add your Gemini API key
3. Install dependencies: `npm install`
4. Build the project: `npm run build`
## Supported Gemini Models
- `gemini-2.5-pro` - Latest high-quality model with advanced reasoning capabilities
- `gemini-2.0-flash` (default) - Fast, efficient for most UI analysis tasks
- `gemini-1.5-pro` - Higher quality analysis, slower processing
- `gemini-1.5-flash` - Balanced speed and quality
- `gemini-1.0-pro` - Legacy model support
### Model Selection via Environment Variables
You can set the default model using the `GEMINI_MODEL` environment variable:
```env
# Use the latest high-quality model
GEMINI_MODEL=gemini-2.5-pro
# Or use the fast default model
GEMINI_MODEL=gemini-2.0-flash
```
You can also specify a different model per request using the `model` parameter:
```javascript
{
  "tool": "analyze_ui_screenshot",
  "arguments": {
    "image_path": "/path/to/screenshot.png",
    "model": "gemini-2.5-pro" // Override default model for this request
  }
}
```
## Image Requirements
- **Formats**: PNG, JPEG, GIF, WebP, BMP
- **Maximum Size**: 10MB by default (configurable via `MAX_IMAGE_SIZE`)
- **Recommended Resolution**: 1920x1080 or higher for best results
- **Content**: Clear screenshots without excessive compression artifacts
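The format and size requirements above can be checked on the client before an image is sent. The following is a rough sketch under two assumptions: the format is judged by file extension only, and 10MB means 10 * 1024 * 1024 bytes. `checkImage` is a hypothetical helper; the server's own validation in `image-processor.ts` may work differently.

```typescript
const DEFAULT_FORMATS = ["png", "jpg", "jpeg", "gif", "webp", "bmp"];
const DEFAULT_MAX_BYTES = 10 * 1024 * 1024; // assumption: "10MB" means 10 MiB

// Hypothetical pre-flight check.
// Returns an error message, or null when the image passes both checks.
function checkImage(
  fileName: string,
  sizeInBytes: number,
  maxBytes: number = DEFAULT_MAX_BYTES,
  formats: string[] = DEFAULT_FORMATS
): string | null {
  // Extension-based detection only; file contents are not inspected here.
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();

  if (formats.indexOf(extension) === -1) {
    return `Unsupported image format: "${extension}"`;
  }
  if (sizeInBytes > maxBytes) {
    return `Image size exceeds maximum allowed size (${sizeInBytes} > ${maxBytes} bytes)`;
  }
  return null;
}
```

Checking the extension is cheap but can be fooled by a renamed file, so inspecting the file's leading bytes would be a stricter alternative.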
## Error Handling
The server provides detailed error messages for common issues:
- **Missing API Key**: Configure `GEMINI_API_KEY` environment variable
- **Invalid Image**: Unsupported format or corrupted file
- **Size Limits**: Image exceeds maximum allowed size
- **Network Errors**: Failed to download images from URLs
- **API Errors**: Gemini API quota limits or service issues
## API Rate Limits
- Gemini API has usage quotas and rate limits
- Consider implementing caching for repeated analysis
- Monitor your API usage in the Google Cloud Console
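The caching suggestion above could look like the following on the client side. This is a sketch only: `withCache` and the `Analyze` signature are hypothetical, the cache is in memory and unbounded, and `imageKey` is assumed to be a stable identifier for the image content supplied by the caller.

```typescript
// Hypothetical signature for a function that runs one analysis request.
type Analyze<T> = (imageKey: string, prompt: string, model: string) => Promise<T>;

// Wraps an analysis function so identical requests reuse the same promise.
function withCache<T>(analyze: Analyze<T>): Analyze<T> {
  const cache = new Map<string, Promise<T>>();

  return (imageKey, prompt, model) => {
    // JSON.stringify of a tuple gives an unambiguous composite key.
    const key = JSON.stringify([imageKey, prompt, model]);

    const hit = cache.get(key);
    if (hit !== undefined) {
      return hit;
    }

    const pending = analyze(imageKey, prompt, model).catch((error) => {
      // Do not keep failed requests, so a later call can retry.
      cache.delete(key);
      throw error;
    });
    cache.set(key, pending);
    return pending;
  };
}
```

Storing the promise rather than the resolved value also deduplicates concurrent requests for the same image. A file path alone is a weak key if the file can change, so a content hash is the safer choice, and a long-running process would need an eviction policy.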
## Troubleshooting
### Common Issues
1. **"GEMINI_API_KEY environment variable is required"**
   - Set the `GEMINI_API_KEY` in your `.env` file or environment variables
   - Get an API key from https://makersuite.google.com/app/apikey
2. **"Failed to connect to Gemini API"**
   - Verify your API key is valid and active
   - Check network connectivity
   - Ensure API is enabled in your Google Cloud project
3. **"Image size exceeds maximum allowed size"**
   - Reduce image size or increase `MAX_IMAGE_SIZE` limit
   - Compress images before analysis
4. **"Unsupported image format"**
   - Use supported formats: PNG, JPEG, GIF, WebP, BMP
   - Convert images to supported format before analysis
### Debug Mode
Enable debug logging by setting:
```bash
DEBUG=uivision:* auu-uivision-mcp
```
## License
MIT License - see LICENSE file for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## Support
- Create issues on GitHub for bug reports
- Check the documentation for common solutions
- Review the error messages for specific guidance
## Changelog
### v1.0.0
- Initial release
- UI screenshot analysis with Gemini API
- Support for multiple image input sources
- Configurable models and parameters
- Comprehensive error handling