Content
# AgentShield
Security auditor for AI agent configurations. Scans Claude Code setups for vulnerabilities, misconfigs, and injection risks.
The AI agent ecosystem is growing fast — but security isn't keeping pace. 12% of one major agent marketplace contains malicious skills. A CVSS 8.8 CVE affected 17,500+ internet-facing instances. Developers install community skills, connect MCP servers, and configure hooks without any automated way to audit the security of their setup.
AgentShield scans your `.claude/` directory and agent configuration files to detect vulnerabilities before they become exploits. It also includes **MiniClaw** — a minimal, sandboxed AI agent runtime that demonstrates what a secure-by-default agent looks like.
Built at the [Claude Code Hackathon](https://cerebralvalley.ai/e/claude-code-hackathon) (Cerebral Valley x Anthropic, Feb 2026).
## Quick Start
```bash
# Scan your Claude Code config (no install required)
npx agentshield scan
# Or install globally
npm install -g agentshield
agentshield scan
# Scan a specific directory
agentshield scan --path /path/to/.claude
# Auto-fix safe issues
agentshield scan --fix
# Generate an HTML security report
agentshield scan --format html > report.html
# Run Opus 4.6 adversarial analysis with real-time streaming
agentshield scan --opus --stream
# Generate a secure baseline config
agentshield init
```
## Features
### Static Analysis (`agentshield scan`)
Rule-based scanning with 16 rules across 5 categories, graded A-F with a 0-100 numeric score.
### Auto-Fix Engine (`--fix`)
Automatically applies safe fixes for detected issues:
- Replaces hardcoded secrets with `${ENV_VAR}` references
- Tightens wildcard permissions (`Bash(*)` → scoped `Bash(git *)`, `Bash(npm *)`)
- Generic string-replacement transforms for other fixable patterns
Only `auto: true` fixes are applied. Files are never overwritten without a matching pattern.
### Secure Init (`agentshield init`)
Generates a hardened `.claude/` directory with:
- **settings.json** — scoped permissions (no `Bash(*)`) and safety hooks
- **CLAUDE.md** — security best practices for the AI agent
- **mcp.json** — empty MCP config placeholder
Existing files are never overwritten.
### Opus 4.6 Deep Analysis (`--opus`)
Three-agent adversarial pipeline powered by Claude Opus 4.6:
1. **Red Team (Attacker)** — finds exploitable attack vectors and multi-step attack chains
2. **Blue Team (Defender)** — recommends concrete hardening measures with exact config changes
3. **Auditor** — synthesizes both perspectives into a final risk assessment with a numeric score
In non-streaming mode, Red Team and Blue Team run in **parallel** via `Promise.all` for speed. In streaming mode (`--stream`), agents run sequentially with real-time token output — live spinners show progress, verbose mode (`-v`) streams every token to stdout.
```bash
# Run with Opus analysis
agentshield scan --opus
# Stream Opus analysis in real-time
agentshield scan --opus --stream
# Verbose streaming (see full agent reasoning)
agentshield scan --opus --stream -v
```
Requires `ANTHROPIC_API_KEY` environment variable.
### MiniClaw — Secure Agent Runtime
A minimal, sandboxed AI agent that demonstrates secure-by-default design. Single HTTP endpoint, isolated filesystem, whitelist-based tool authorization, prompt injection filtering. See the [MiniClaw section](#miniclaw) below for full documentation.
```typescript
import { startMiniClaw } from 'ecc-agentshield/miniclaw';
const { server, stop } = startMiniClaw();
// Listening on http://localhost:3847
```
### GitHub Action
Add AgentShield to your CI pipeline:
```yaml
- name: AgentShield Security Scan
uses: affaan-m/agentshield@v1
with:
path: "."
min-severity: "medium"
fail-on-findings: "true"
```
**Inputs:**
| Input | Default | Description |
|-------|---------|-------------|
| `path` | `.` | Path to scan |
| `min-severity` | `medium` | Minimum severity: critical, high, medium, low, info |
| `fail-on-findings` | `true` | Fail the action if findings are detected |
| `format` | `terminal` | Output format |
**Outputs:**
| Output | Description |
|--------|-------------|
| `score` | Numeric security score (0-100) |
| `grade` | Letter grade (A-F) |
| `total-findings` | Total number of findings |
| `critical-count` | Number of critical findings |
The action writes a markdown job summary and emits GitHub annotations (warnings/errors) inline on affected files.
## What It Catches
### Secrets Detection
- Hardcoded API keys (Anthropic, OpenAI, AWS, GitHub, Slack)
- Exposed database connection strings
- Bearer tokens and private key material
- Environment variables echoed to terminal
### Permission Audit
- Overly permissive allow rules (`Bash(*)`, `Write(*)`)
- Missing deny lists for dangerous operations
- `--dangerously-skip-permissions` usage
- Contradictory allow/deny entries
### MCP Server Security
- High-risk servers (filesystem, shell, database, browser)
- Hardcoded secrets in MCP environment configs
- `npx -y` supply chain risks (auto-install without confirmation)
- Missing server descriptions
### Hook Analysis
- Command injection via variable interpolation (`${file}`)
- Data exfiltration through external HTTP requests
- Silent error suppression (`2>/dev/null`, `|| true`)
- Missing PreToolUse security hooks
### Agent Config Review
- Agents with unnecessary Bash access or write permissions
- Prompt injection surface in agent definitions that process external content
- Auto-run instructions in CLAUDE.md (prompt injection vector)
## Example Output
```
AgentShield Security Report
2026-02-11
Grade: F (0/100)
Score Breakdown
Secrets ░░░░░░░░░░░░░░░░░░░░ 0
Permissions █████░░░░░░░░░░░░░░░ 23
Hooks ░░░░░░░░░░░░░░░░░░░░ 0
MCP Servers ██░░░░░░░░░░░░░░░░░░ 10
Agents ████████████████░░░░ 80
Summary
Files scanned: 4
Findings: 35 total — 10 critical, 12 high, 8 medium, 1 low, 4 info
Auto-fixable: 1 (use --fix)
```
## Output Formats
| Format | Flag | Use Case |
|--------|------|----------|
| Terminal | `--format terminal` (default) | Interactive use, demos |
| JSON | `--format json` | CI pipelines, programmatic use |
| Markdown | `--format markdown` | Documentation, PRs |
| HTML | `--format html` | Shareable self-contained report |
### HTML Report (`--format html`)
Generates a single self-contained HTML file with all CSS inlined — no external dependencies. Dark theme inspired by GitHub dark mode. Includes grade badge, score breakdown bars, categorized findings with severity indicators, and auto-fix status.
```bash
agentshield scan --format html > report.html
open report.html
```
## CLI Reference
```
agentshield scan [options] Scan a configuration directory
-p, --path <path> Path to scan (default: ~/.claude or cwd)
-f, --format <format> Output: terminal, json, markdown, html
--fix Auto-apply safe fixes
--opus Enable Opus 4.6 multi-agent analysis
--stream Stream Opus analysis in real-time
--min-severity <severity> Filter: critical, high, medium, low, info
-v, --verbose Show detailed output
agentshield init [options] Generate secure baseline config
```
## Architecture
```
src/
├── index.ts CLI entry point (commander)
├── action.ts GitHub Action entry point
├── types.ts Type system + Zod schemas
├── scanner/
│ ├── discovery.ts Config file discovery
│ └── index.ts Scan orchestrator
├── rules/
│ ├── index.ts Rule registry
│ ├── secrets.ts Secret detection (11 patterns)
│ ├── permissions.ts Permission audit (3 rules)
│ ├── mcp.ts MCP server security (4 rules)
│ ├── hooks.ts Hook security analysis (4 rules)
│ └── agents.ts Agent config review (3 rules)
├── reporter/
│ ├── score.ts Scoring engine (A-F grades)
│ ├── terminal.ts Color terminal output
│ ├── json.ts JSON + Markdown output
│ └── html.ts Self-contained HTML report
├── fixer/
│ ├── transforms.ts Fix transforms (secret, permission, generic)
│ └── index.ts Fix engine orchestrator
├── init/
│ └── index.ts Secure config generator
├── opus/
│ ├── prompts.ts Attacker/Defender/Auditor system prompts
│ ├── pipeline.ts Three-agent Opus 4.6 pipeline
│ └── render.ts Opus analysis rendering
└── miniclaw/
├── types.ts Core type system (immutable, readonly)
├── sandbox.ts Sandbox lifecycle + path validation
├── router.ts Prompt sanitization + output filtering
├── tools.ts Whitelist-based tool authorization
├── server.ts HTTP server with rate limiting + CORS
├── dashboard.tsx React dashboard component
└── index.ts Entry point and re-exports
```
## Security Rules
| Category | Rules | Severity Range |
|----------|-------|----------------|
| Secrets | 2 (11 patterns) | Critical - High |
| Permissions | 3 | Critical - Medium |
| MCP Servers | 4 | Critical - Info |
| Hooks | 4 | Critical - Medium |
| Agents | 3 | High - Info |
| **Total** | **16** | |
## MiniClaw
MiniClaw is a minimal, secure, sandboxed AI agent runtime — the antithesis of multi-channel orchestration platforms. Where platforms like OpenClaw expose many attack surfaces (Telegram, Discord, email, community plugins), MiniClaw presents a **single HTTP endpoint** backed by an **isolated sandbox**.
**Design mantra**: Minimal attack surface, maximum security, simple to deploy.
| Principle | Typical Agent Platform | MiniClaw |
|-----------|----------------------|----------|
| Access points | Many (Telegram, X, Discord, email) | One (single HTTP endpoint) |
| Execution | Host machine, broad access | Containerized, sandboxed |
| Skills | Unvetted community marketplace | Manually audited, local only |
| Network exposure | Multiple ports, services | Minimal, single entry point |
| Blast radius | Everything agent can access | Sandboxed to session directory |
### Quick Start
```typescript
import { startMiniClaw } from 'ecc-agentshield/miniclaw';
// Start with secure defaults (localhost:3847, no network, safe tools only)
const { server, stop } = startMiniClaw();
// Or customize configuration
const { server, stop } = startMiniClaw({
sandbox: { networkPolicy: 'localhost' },
server: { port: 4000, rateLimit: 20 },
});
// Embed in an existing app (no HTTP server)
import { createMiniClawSession, routePrompt } from 'ecc-agentshield/miniclaw';
const session = await createMiniClawSession();
const response = await routePrompt({ sessionId: session.id, prompt: 'Read index.ts' }, session);
```
### Security Model
**Defense in depth** — four layers, each independently enforced:
1. **Server Layer** — Rate limiting (10 req/min per IP), CORS restriction, request size cap (10KB), security headers, localhost-only binding
2. **Prompt Router** — Strips 12+ injection pattern categories: system prompt overrides, identity reassignment, jailbreak attempts, tool invocation syntax, data exfiltration URLs, zero-width Unicode, base64-encoded payloads. Output filtering removes leaked system prompt content.
3. **Tool Whitelist** — Three-tier authorization:
- **Safe** (auto-approved): read, search, list
- **Guarded** (session opt-in): write, edit, glob
- **Restricted** (disabled by default): bash, network, external API
- Unknown tools are **blocked by default** (fail-closed)
4. **Sandbox** — Isolated filesystem per session. Path traversal blocked, symlink escape detection, allowed extensions whitelist, 10MB file size limit, 5-minute session timeout. No network access by default.
### API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/prompt` | Send a prompt, receive a response |
| `POST` | `/api/session` | Create a new sandboxed session |
| `GET` | `/api/session` | Get current session info |
| `DELETE` | `/api/session/:id` | Destroy session and cleanup sandbox |
| `GET` | `/api/events/:sessionId` | Retrieve security audit events |
| `GET` | `/api/health` | Health check |
### Dashboard
MiniClaw includes a React dashboard component for interactive use:
```tsx
import { MiniClawDashboard } from 'ecc-agentshield/miniclaw/dashboard';
<MiniClawDashboard endpoint="http://localhost:3847" />
```
Features: dark theme, prompt input, streaming response display, session status indicator, security events panel, tool whitelist categorized by risk level. Requires React 18+ as a peer dependency.
### Configuration Defaults
```typescript
// Sandbox defaults — maximum security posture
{
rootPath: '/tmp/miniclaw-sandboxes',
maxFileSize: 10_485_760, // 10MB
allowedExtensions: ['.ts', '.tsx', '.js', '.jsx', '.json', '.md', '.txt', ...],
networkPolicy: 'none', // No network access
maxDuration: 300_000, // 5 minutes
}
// Server defaults — locked to localhost
{
port: 3847,
hostname: 'localhost', // Never 0.0.0.0 by default
corsOrigins: ['http://localhost:3847', 'http://localhost:3000'],
rateLimit: 10, // 10 req/min per IP
maxRequestSize: 10_240, // 10KB
}
```
## Development
```bash
# Install dependencies
npm install
# Development mode
npm run dev scan --path examples/vulnerable
# Run tests (202 tests)
npm test
# Run tests with coverage
npm run test:coverage
# Type check
npm run typecheck
# Build
npm run build
# Demo scan (vulnerable examples)
npm run scan:demo
```
## License
MIT
Connection Info
You Might Also Like
markitdown
MarkItDown-MCP is a lightweight server for converting URIs to Markdown.
servers
Model Context Protocol Servers
Time
A Model Context Protocol server for time and timezone conversions.
Filesystem
Node.js MCP Server for filesystem operations with dynamic access control.
Sequential Thinking
A structured MCP server for dynamic problem-solving and reflective thinking.
git
A Model Context Protocol server for Git automation and interaction.