Content
# MCP as a Judge ⚖️
mcp-name: io.github.OtherVibes/mcp-as-a-judge
<div align="left">
<img src="assets/mcp-as-a-judge.png" alt="MCP as a Judge Logo" width="200">
</div>
> MCP as a Judge acts as a validation layer between AI coding assistants and LLMs, helping ensure safer and higher-quality code.
[](https://opensource.org/license/mit/)
[](https://www.python.org/downloads/)
[](https://modelcontextprotocol.io/)
[](https://github.com/OtherVibes/mcp-as-a-judge/actions/workflows/ci.yml)
[](https://github.com/OtherVibes/mcp-as-a-judge/actions/workflows/release.yml)
[](https://pypi.org/project/mcp-as-a-judge/)
**MCP as a Judge** is a **behavioral MCP** that strengthens AI coding assistants by requiring explicit LLM evaluations for:
- Research, system design, and planning
- Code changes, testing, and task-completion verification
It enforces evidence-based research, reuse over reinvention, and human-in-the-loop decisions.
> If your IDE has rules/agents (Copilot, Cursor, Claude Code), keep using them—this Judge adds enforceable approval gates on plan, code diffs, and tests.
## Key problems with AI coding assistants and LLMs
- Treat LLM output as ground truth; skip research and use outdated information
- Reinvent the wheel instead of reusing libraries and existing code
- Cut corners: code below engineering standards and weak tests
- Make unilateral decisions when requirements are ambiguous or plans change
- Security blind spots: missing input validation, injection risks/attack vectors, least‑privilege violations, and weak defensive programming
## **Vibe coding doesn’t have to be frustrating**
### What it enforces
- Evidence‑based research and reuse (best practices, libraries, existing code)
- Plan‑first delivery aligned to user requirements
- Human‑in‑the‑loop decisions for ambiguity and blockers
- Quality gates on code and tests (security, performance, maintainability)
### Key capabilities
- Intelligent code evaluation via MCP [sampling](https://modelcontextprotocol.io/docs/learn/client-concepts#sampling); enforces software‑engineering standards and flags security/performance/maintainability risks
- Comprehensive plan/design review: validates architecture, research depth, requirements fit, and implementation approach
- User‑driven decisions via MCP [elicitation](https://modelcontextprotocol.io/docs/learn/client-concepts#elicitation): clarifies requirements, resolves obstacles, and keeps choices transparent
- Security validation in system design and code changes
### Tools and how they help
| Tool | What it solves |
|------|-----------------|
| `set_coding_task` | Creates/updates task metadata; classifies task_size; returns next-step workflow guidance |
| `get_current_coding_task` | Recovers the latest task_id and metadata to resume work safely |
| `judge_coding_plan` | Validates plan/design; requires library selection and internal reuse maps; flags risks |
| `judge_code_change` | Reviews unified Git diffs for correctness, reuse, security, and code quality |
| `judge_testing_implementation` | Validates tests using real runner output and optional coverage |
| `judge_coding_task_completion` | Final gate ensuring plan, code, and tests approvals before completion |
| `raise_missing_requirements` | Elicits missing details and decisions to unblock progress |
| `raise_obstacle` | Engages the user on trade‑offs, constraints, and enforced changes |
## 🚀 **Quick Start**
### **Requirements & Recommendations**
#### **MCP Client Prerequisites**
MCP as a Judge is heavily dependent on **MCP Sampling** and **MCP Elicitation** features for its core functionality:
- **[MCP Sampling](https://modelcontextprotocol.io/docs/learn/client-concepts#sampling)** - Required for AI-powered code evaluation and judgment
- **[MCP Elicitation](https://modelcontextprotocol.io/docs/learn/client-concepts#elicitation)** - Required for interactive user decision prompts
#### **System Prerequisites**
- **Docker Desktop** / **Python 3.13+** - Required for running the MCP server
#### **Supported AI Assistants**
| AI Assistant | Platform | MCP Support | Status | Notes |
|---------------|----------|-------------|---------|-------|
| **GitHub Copilot** | Visual Studio Code | ✅ Full | **Recommended** | Complete MCP integration with sampling and elicitation |
| **Claude Code** | - | ⚠️ Partial | Requires LLM API key | [Sampling Support feature request](https://github.com/anthropics/claude-code/issues/1785)<br>[Elicitation Support feature request](https://github.com/anthropics/claude-code/issues/2799) |
| **Cursor** | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| **Augment** | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| **Qodo** | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
**✅ Recommended setup:** GitHub Copilot + VS Code — full MCP sampling; no API key needed.
**⚠️ Critical:** For assistants without full MCP sampling (Cursor, Claude Code, Augment, Qodo), you MUST set `LLM_API_KEY`. Without it, the server cannot evaluate plans or code. See [LLM API Configuration](#-llm-api-configuration-optional).
**💡 Tip:** Prefer large context models (≥ 1M tokens) for better analysis and judgments.
### If the MCP server isn’t auto‑used
For troubleshooting, visit the [FAQs section](#faq).
## 🔧 **MCP Configuration**
Configure **MCP as a Judge** in your MCP-enabled client:
### **Method 1: Using Docker (Recommended)**
#### One‑click install for VS Code (MCP)
[](https://insiders.vscode.dev/redirect/mcp/install?name=mcp-as-a-judge&inputs=%5B%5D&config=%7B%22command%22%3A%22docker%22%2C%22args%22%3A%5B%22run%22%2C%22-i%22%2C%22--rm%22%2C%22--pull%3Dalways%22%2C%22ghcr.io%2Fothervibes%2Fmcp-as-a-judge%3Alatest%22%5D%7D)
Notes:
- VS Code controls the sampling model; select it via “MCP: List Servers → mcp-as-a-judge → Configure Model Access”.
1. **Configure MCP Settings:**
Add this to your MCP client configuration file:
```json
{
"command": "docker",
"args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-4o-mini"
}
}
```
**📝 Configuration Options (All Optional):**
- **LLM_API_KEY**: Optional for GitHub Copilot + VS Code (has built-in MCP sampling)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
- The `--pull=always` flag ensures you always get the latest version automatically
Then manually update when needed:
```bash
# Pull the latest version
docker pull ghcr.io/othervibes/mcp-as-a-judge:latest
```
### **Method 2: Using uv**
1. **Install the package:**
```bash
uv tool install mcp-as-a-judge
```
2. **Configure MCP Settings:**
The MCP server may be automatically detected by your MCP‑enabled client.
**📝 Notes:**
- **No additional configuration needed for GitHub Copilot + VS Code** (has built-in MCP sampling)
- LLM_API_KEY is optional and can be set via environment variable if needed
3. **To update to the latest version:**
```bash
# Update MCP as a Judge to the latest version
uv tool upgrade mcp-as-a-judge
```
### Select a sampling model in VS Code
- Open Command Palette (Cmd/Ctrl+Shift+P) → “MCP: List Servers”
- Select the configured server “mcp-as-a-judge”
- Choose “Configure Model Access”
- Check your preferred model(s) to enable sampling
## 🔑 **LLM API Configuration (Optional)**
For [AI assistants without full MCP sampling support](#supported-ai-assistants) you can configure an LLM API key as a fallback. This ensures MCP as a Judge works even when the client doesn't support MCP sampling.
- Set `LLM_API_KEY` (unified key). Vendor is auto-detected; optionally set `LLM_MODEL_NAME` to override the default.
### **Supported LLM Providers**
| Rank | Provider | API Key Format | Default Model | Notes |
|------|----------|----------------|---------------|-------|
| **1** | **OpenAI** | `sk-...` | `gpt-4.1` | Fast and reliable model optimized for speed |
| **2** | **Anthropic** | `sk-ant-...` | `claude-sonnet-4-20250514` | High-performance with exceptional reasoning |
| **3** | **Google** | `AIza...` | `gemini-2.5-pro` | Most advanced model with built-in thinking |
| **4** | **Azure OpenAI** | `[a-f0-9]{32}` | `gpt-4.1` | Same as OpenAI but via Azure |
| **5** | **AWS Bedrock** | AWS credentials | `anthropic.claude-sonnet-4-20250514-v1:0` | Aligned with Anthropic |
| **6** | **Vertex AI** | Service Account JSON | `gemini-2.5-pro` | Enterprise Gemini via Google Cloud |
| **7** | **Groq** | `gsk_...` | `deepseek-r1` | Best reasoning model with speed advantage |
| **8** | **OpenRouter** | `sk-or-...` | `deepseek/deepseek-r1` | Best reasoning model available |
| **9** | **xAI** | `xai-...` | `grok-code-fast-1` | Latest coding-focused model (Aug 2025) |
| **10** | **Mistral** | `[a-f0-9]{64}` | `pixtral-large` | Most advanced model (124B params) |
### **Client-Specific Setup**
#### **Cursor**
1. **Open Cursor Settings:**
- Go to `File` → `Preferences` → `Cursor Settings`
- Navigate to the `MCP` tab
- Click `+ Add` to add a new MCP server
2. **Add MCP Server Configuration:**
```json
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-4.1"
}
}
```
**📝 Configuration Options:**
- **LLM_API_KEY**: Required for Cursor (limited MCP sampling)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
#### **Claude Code**
1. **Add MCP Server via CLI:**
```bash
# Set environment variables first (optional model override)
export LLM_API_KEY="your_api_key_here"
export LLM_MODEL_NAME="claude-3-5-haiku" # Optional: faster/cheaper model
# Add MCP server
claude mcp add mcp-as-a-judge -- uv tool run mcp-as-a-judge
```
2. **Alternative: Manual Configuration:**
- Create or edit `~/.config/claude-code/mcp_servers.json`
```json
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-anthropic-api-key-here",
"LLM_MODEL_NAME": "claude-3-5-haiku"
}
}
```
**📝 Configuration Options:**
- **LLM_API_KEY**: Required for Claude Code (limited MCP sampling)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
#### **Other MCP Clients**
For other MCP-compatible clients, use the standard MCP server configuration:
```json
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-5"
}
}
```
**📝 Configuration Options:**
- **LLM_API_KEY**: Required for most MCP clients (except GitHub Copilot + VS Code)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
## 🔒 **Privacy & Flexible AI Integration**
### **🔑 MCP Sampling (Preferred) + LLM API Key Fallback**
**Primary Mode: MCP Sampling**
- All judgments are performed using **MCP Sampling** capability
- No need to configure or pay for external LLM API services
- Works directly with your MCP-compatible client's existing AI model
- **Currently supported by:** GitHub Copilot + VS Code
**Fallback Mode: LLM API Key**
- When MCP sampling is not available, the server can use LLM API keys
- Supports multiple providers via LiteLLM: OpenAI, Anthropic, Google, Azure, Groq, Mistral, xAI
- Automatic vendor detection from API key patterns
- Default model selection per vendor when no model is specified
### **🛡️ Your Privacy Matters**
- The server runs **locally** on your machine
- **No data collection** - your code and conversations stay private
- **No external API calls when using MCP Sampling**. If you set `LLM_API_KEY` for fallback, the server will call your chosen LLM provider only to perform judgments (plan/code/test) with the evaluation content you provide.
- Complete control over your development workflow and sensitive information
## 🤝 **Contributing**
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### **Development Setup**
```bash
# Clone the repository
git clone https://github.com/OtherVibes/mcp-as-a-judge.git
cd mcp-as-a-judge
# Install dependencies with uv
uv sync --all-extras --dev
# Install pre-commit hooks
uv run pre-commit install
# Run tests
uv run pytest
# Run all checks
uv run pytest && uv run ruff check && uv run ruff format --check && uv run mypy src
```
## © Concepts and Methodology
© 2025 OtherVibes and Zvi Fried. The "MCP as a Judge" concept, the "behavioral MCP" approach, the staged workflow (plan → code → test → completion), tool taxonomy/descriptions, and prompt templates are original work developed in this repository.
## Prior Art and Attribution
While “LLM‑as‑a‑judge” is a broadly known idea, this repository defines the original “MCP as a Judge” behavioral MCP pattern by OtherVibes and Zvi Fried. It combines task‑centric workflow enforcement (plan → code → test → completion), explicit LLM‑based validations, and human‑in‑the‑loop elicitation, along with the prompt templates and tool taxonomy provided here. Please attribute as: “OtherVibes – MCP as a Judge (Zvi Fried)”.
## ❓ FAQ
### How is “MCP as a Judge” different from rules/subagents in IDE assistants (GitHub Copilot, Cursor, Claude Code)?
| Feature | IDE Rules | Subagents | MCP as a Judge |
|---------|-----------|-----------|----------------|
| Static behavior guidance | ✓ | ✓ | ✗ |
| Custom system prompts | ✓ | ✓ | ✓ |
| Project context integration | ✓ | ✓ | ✓ |
| Specialized task handling | ✗ | ✓ | ✓ |
| Active quality gates | ✗ | ✗ | ✓ |
| Evidence-based validation | ✗ | ✗ | ✓ |
| Approve/reject with feedback | ✗ | ✗ | ✓ |
| Workflow enforcement | ✗ | ✗ | ✓ |
| Cross-assistant compatibility | ✗ | ✗ | ✓ |
- References: [GitHub Copilot Custom Instructions](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions), [Cursor Rules](https://docs.cursor.com/en/context/@-symbols/@-cursor-rules), [Claude Code Subagents](https://docs.anthropic.com/en/docs/claude-code/sub-agents)
### How does the Judge workflow relate to the tasklist? Why do we need both?
- Tasklist = planning/organization: tracks tasks, priorities, and status. It doesn’t guarantee engineering quality or readiness.
- Judge workflow = quality gates: enforces approvals for plan/design, code diffs, tests, and final completion. It demands real evidence (e.g., unified Git diffs and raw test output) and returns structured approvals and required improvements.
- Together: Use the tasklist to organize work; use the Judge to decide when each stage is actually ready to proceed. The server also emits next_tool guidance to keep progress moving through the gates.
### If the Judge isn’t used automatically, how do I force it?
- In your prompt: "use mcp-as-a-judge" or "Evaluate plan/code/test using the MCP server mcp-as-a-judge".
- VS Code: Command Palette → "MCP: List Servers" → ensure "mcp-as-a-judge" is listed and enabled.
- Ensure the MCP server is running and, in your client, the judge tools are enabled/approved.
### How do I select models for sampling in VS Code?
- Open Command Palette (Cmd/Ctrl+Shift+P) → "MCP: List Servers"
- Select "mcp-as-a-judge" → "Configure Model Access"
- Check your preferred model(s) to enable sampling
## 📄 **License**
This project is licensed under the MIT License (see [LICENSE](LICENSE)).
## 🙏 **Acknowledgments**
- [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic
- [LiteLLM](https://github.com/BerriAI/litellm) for unified LLM API integration
---
Connection Info
You Might Also Like
markitdown
MarkItDown-MCP is a lightweight server for converting URIs to Markdown.
servers
Model Context Protocol Servers
Time
A Model Context Protocol server for time and timezone conversions.
Filesystem
Node.js MCP Server for filesystem operations with dynamic access control.
Sequential Thinking
A structured MCP server for dynamic problem-solving and reflective thinking.
git
A Model Context Protocol server for Git automation and interaction.