<p align="center">
<img src="https://img.shields.io/badge/OWASP_WSTG-109_Tests-blue?style=for-the-badge" alt="WSTG Tests">
<img src="https://img.shields.io/badge/PortSwigger-31_Technique_Guides-FF6633?style=for-the-badge" alt="PortSwigger Guides">
<img src="https://img.shields.io/badge/MCP_Tools-68+-00B4D8?style=for-the-badge" alt="MCP Tools">
<img src="https://img.shields.io/badge/Security_Tools-27-orange?style=for-the-badge" alt="Security Tools">
<img src="https://img.shields.io/badge/WAF_Bypass-12_Vendors-red?style=for-the-badge" alt="WAF Bypass">
<img src="https://img.shields.io/badge/Zero_False_Positives-Evidence_Based-brightgreen?style=for-the-badge" alt="Evidence Based">
<img src="https://img.shields.io/badge/License-Apache_2.0-green?style=for-the-badge" alt="License">
</p>
# AutoPentest
**An agentic pentesting MCP server that automates web application penetration testing using the full OWASP Web Security Testing Guide and PortSwigger Web Security Academy technique references.**
Point it at a target — it crawls your app, maps every endpoint, then spawns role-specialized agents (Scout, Analyzer, Exploiter, Reporter) to test for XSS, SQLi, SSRF, SSTI, IDOR and more. No false positives — every finding is backed by real, reproducible evidence with quality gates enforcing proof at every phase. Includes 31 PortSwigger technique guides, adaptive WAF evasion for 12 vendors, cross-phase vulnerability chaining, and risk-weighted endpoint prioritization. Run it with Claude Code, the API, or go fully offline using Ollama models.
> **Think of it as:** A senior pentester's methodology encoded into an MCP server — 109 OWASP tests, 31 PortSwigger attack technique guides, 68+ MCP tools, 27 security tools, 4 specialized agent roles, 7 structured phases, automated quality assurance, and a zero-context final review.
---
<p align="center">
<img src="cli-output.gif" alt="AutoPentest CLI Output" width="800">
</p>
## Table of Contents
- [Why AutoPentest?](#why-autopentest)
- [Architecture](#architecture)
- [Features](#features)
- [Agent Role System](#agent-role-system)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Testing Phases](#testing-phases)
- [Security Tools](#security-tools)
- [WSTG Knowledge Base](#wstg-knowledge-base)
- [PortSwigger Technique Guides](#portswigger-technique-guides)
- [Quality Assurance System](#quality-assurance-system)
- [Benchmarking](#benchmarking)
- [Example Report](#example-report)
- [Configuration](#configuration)
- [Multi-Domain Testing](#multi-domain-testing)
- [Crash Recovery](#crash-recovery)
- [Project Structure](#project-structure)
- [Requirements](#requirements)
- [FAQ](#faq)
- [Disclaimer](#disclaimer)
---
## Why AutoPentest?
Manual penetration testing is thorough but slow. Automated scanners are fast but shallow. AutoPentest bridges the gap:
| Capability | Manual Pentest | Automated Scanner | AutoPentest |
|------------|:-:|:-:|:-:|
| Full OWASP WSTG coverage | Depends on tester | Partial | **109 tests** |
| Business logic testing | Yes | No | **Yes** |
| Multi-step exploitation | Yes | Limited | **Yes** |
| Vulnerability chaining | Yes | No | **Yes** |
| Evidence-based findings | Yes | Template output | **Reproducible curl commands** |
| Consistent quality | Varies | Yes | **Phase gates + Final Judge** |
| Speed | Days | Minutes | **Hours** |
| Cross-domain auth (SSO/OIDC) | Manual setup | Usually fails | **Automated handling** |
---
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ LLM Orchestrator (Claude) │
│ │
│ Reads CLAUDE.md workflow, manages phases, │
│ spawns role-specialized subagents │
└──────────┬──────────┬──────────┬──────────┬─────────────────┘
│ │ │ │
┌─────▼────┐ ┌───▼─────┐ ┌──▼───────┐ ┌▼─────────┐
│ Scout │ │Analyzer │ │Exploiter │ │ Reporter │
│ (recon) │ │ (vuln │ │ (proof) │ │ (QA / │
│ │ │ disc.) │ │ │ │ judge) │
└──────────┘ └─────────┘ └──────────┘ └──────────┘
│ │ │ │
│ MCP │ │ MCP │
▼ ▼ ▼ ▼
┌──────────────────────────┐ ┌──────────────────────┐
│ WSTG MCP Server │ │ Playwright MCP │
│ (68+ tools) │ │ (Browser Testing) │
│ │ │ │
│ ◦ 109 WSTG tests │ │ ◦ DOM XSS proof │
│ ◦ 31 technique guides │ │ ◦ Clickjacking │
│ ◦ Task tree │ │ ◦ JS-rendered auth │
│ ◦ Knowledge graph │ └──────────────────────┘
│ ◦ WAF evasion │
│ ◦ Tool output parser │
│ ◦ Results verification │ docker exec
│ ◦ Context compression │ │
│ ◦ Endpoint priority │ ▼
│ ◦ Quality gates │ ┌──────────────────────┐
│ ◦ Report generation │ │ autopentest-tools │
└──────────────────────────┘ │ (Docker Container) │
│ │
│ 27 security tools: │
│ nuclei, sqlmap, │
│ dalfox, katana, │
│ ffuf, nmap ... │
│ │
│ Burp proxy │
│ passthrough │
└──────────────────────┘
```
**How it works:**
1. **Claude Code** reads `CLAUDE.md` for the complete pentest methodology and orchestrates the 7-phase workflow
2. **Role-specialized subagents** (Scout, Analyzer, Exploiter, Reporter) execute focused tasks with dedicated prompt templates, tool guidance, and anti-patterns
3. **WSTG MCP Server** (68+ tools) provides OWASP test procedures, 31 PortSwigger technique guides, hierarchical task tree, knowledge graph, WAF evasion, endpoint prioritization, results verification, context compression, quality gates, and report generation
4. **Docker Container** runs all 27 security tools — traffic optionally routes through Burp Suite for passive monitoring
5. **Playwright MCP** handles browser-based testing (DOM XSS, clickjacking, JS-rendered login pages)
---
## Features
### Comprehensive OWASP Coverage
- **109 WSTG test cases** across 12 categories — from information gathering to API testing
- Each test includes step-by-step CLI procedures, context-specific payloads, detection criteria, and severity rubrics
- Tests are prioritized (MUST/SHOULD) with conditional triggers so nothing relevant is skipped
### 31 PortSwigger Attack Technique Guides
- Sourced from [PortSwigger Web Security Academy](https://portswigger.net/web-security) — detection methods, exploitation techniques, payloads, cheat sheets, and WAF bypass patterns
- Organized by vulnerability class (SQLi, XSS, SSRF, JWT, OAuth, etc.) for direct use during testing
- Integrated into every testing phase — agents automatically load the relevant technique guide before testing each vulnerability class
- Database/platform-specific payload tables (Oracle vs MySQL vs PostgreSQL vs MSSQL for SQLi, Jinja2 vs Twig vs Freemarker for SSTI, etc.)
- WAF bypass patterns organized by bypass level (basic → intermediate → advanced)
### 27 Pre-Configured Security Tools
- All tools pre-installed in a single Docker image — `make setup` and you're ready
- Tools organized by phase: discovery, injection testing, authentication, cryptography, API testing
- Automatic Burp Suite proxy integration for passive traffic monitoring
### Structured 7-Phase Workflow
- **Phase 0:** Application Discovery & Mapping
- **Phase 1:** Information Gathering & Reconnaissance
- **Phase 2:** Configuration & Deployment Testing
- **Phase 3:** Identity, Authentication, Authorization & Session Management
- **Phase 4:** Input Validation Testing (parallel XSS, injection, and SSRF/SSTI pipelines)
- **Phase 5:** Error Handling, Cryptography, Business Logic, Client-Side & API Testing
- **Phase 6:** Coverage Verification & Reporting
- **Phase 7:** Final Judge Review & Remediation
### Quality Assurance System
- **Automated phase gates** — each phase must pass quality checks before proceeding
- **Quality Reviewer** subagent at every phase transition identifies gaps and suggests improvements
- **Final Judge** — a zero-context agent reviews the entire engagement cold, like an external QA reviewer
- **Exhaustion gates** — "not vulnerable" requires proof of sufficient testing effort (minimum techniques and bypass attempts)
### Evidence-Based Findings
- Every finding requires reproducible curl commands and full request/response evidence
- **Three-tier classification:** EXPLOITED (impact proven with evidence), POTENTIAL (vulnerability indicated but exploitation blocked by a control), FALSE_POSITIVE (the control verifiably holds)
- **Anti-hallucination framework** — "no exploit = no finding" enforced at every level
- Evidence checklists per vulnerability class verified before any finding is logged
### Role-Specialized Subagents
- **4 dedicated roles** with focused prompt templates, tool guidance, and anti-patterns:
- **Scout** — reconnaissance only, maps attack surface without sending payloads (Phase 0-1)
- **Analyzer** — identifies potential sinks with canary/witness payloads, builds exploitation queues (Phase 2-5 analysis)
- **Exploiter** — consumes Analyzer output, proves exploitation with evidence, logs confirmed findings (Phase 4 exploitation)
- **Reporter** — quality review and Final Judge, reviews data without sending requests (QA + post-report)
- Validation checkpoint between analysis and exploitation prevents wasted effort
- Each role has explicit allowed/restricted tool lists and input/output contracts
### Pipelined Exploitation (Phase 4)
- 3 independent **two-stage pipelines** run in parallel: XSS, Injection (SQLi/CMDi), SSRF/SSTI
- Each pipeline: Analyzer (discover → analyze → queue) → validation checkpoint → Exploiter (exploit → log)
- Each pipeline loads its PortSwigger technique guide for detection methods, cheat sheets, and WAF bypass patterns
- WAF intelligence shared across all pipelines
- Context-aware witness payloads for 13 sink types
### Adaptive WAF Evasion
- **Automatic WAF fingerprinting** from response headers, body, and status codes — identifies 12 WAF vendors (Cloudflare, AWS WAF, Akamai, Imperva, ModSecurity, F5, FortiWeb, Sucuri, Barracuda, Wordfence, NAXSI, Citrix)
- **Vendor-specific bypass payloads** organized by complexity level (basic → intermediate → advanced)
- WAF intelligence shared across all agents via deliverable system
- Agents automatically identify WAF on first block response and switch to tailored bypass payloads
### Cross-Phase Knowledge Graph
- **Entity-relationship graph** tracks endpoints, parameters, technologies, findings, cookies, domains, and user roles
- **Automated vulnerability chaining** via BFS path finding with 7 predefined chain patterns:
- XSS + missing CSP, XSS + weak cookie (no HttpOnly), Open redirect + OAuth callback
- IDOR + admin role, SSRF + cloud metadata, No lockout + no MFA, CORS + sensitive endpoint
- Severity upgrades when chaining materially increases impact
- Populated throughout testing, queried after Phase 4 for chain discovery
### Hierarchical Task Tree
- Persistent tree structure (phases as branches, tests as leaves) prevents LLM depth-first bias and context loss
- Main agent maintains strategic macro view; subagents update only their assigned leaf nodes
- Auto-propagation: when all children complete, parent auto-completes
- Phase-level completion percentages for informed decision-making
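The auto-propagation and completion-percentage behavior can be sketched as a small recursive structure. Class and method names here are illustrative, not the server's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    done: bool = False
    children: list["Node"] = field(default_factory=list)

    def complete(self) -> None:
        self.done = True

    def refresh(self) -> None:
        """Auto-propagate: a parent completes when all children have."""
        for child in self.children:
            child.refresh()
        if self.children:
            self.done = all(c.done for c in self.children)

    def leaves(self) -> list["Node"]:
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def percent_complete(self) -> float:
        leaves = self.leaves()
        return 100 * sum(leaf.done for leaf in leaves) / len(leaves)

phase = Node("Phase 3", children=[Node("SESS-05"), Node("SESS-10")])
phase.children[0].complete()   # subagent finishes its assigned leaf
phase.refresh()
print(phase.done, phase.percent_complete())
# False 50.0
```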
### Endpoint Risk Prioritization
- Score and sort endpoints by risk for prioritized testing — highest risk tested first
- Scoring factors: parameter count, technology risk indicators, taint chain confidence, tool convergence, auth requirements, injectable parameter names
- Integrated into Phase 0 endpoint map generation
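A weighted sum over the factors above is one plausible shape for the scorer. The weights and the risky-parameter list below are made up for illustration:

```python
# Hypothetical injectable-looking parameter names.
RISKY_PARAM_NAMES = {"id", "file", "url", "q", "redirect", "path"}

def risk_score(endpoint: dict) -> float:
    score = 0.0
    score += 2.0 * len(endpoint.get("params", []))                          # parameter count
    score += 5.0 * sum(p in RISKY_PARAM_NAMES for p in endpoint["params"])  # injectable names
    score += 3.0 if endpoint.get("tech_risk") else 0.0                      # risky technology
    score -= 1.0 if endpoint.get("requires_auth") else 0.0                  # harder to reach
    return score

endpoints = [
    {"url": "/about", "params": []},
    {"url": "/search", "params": ["q", "page"]},
    {"url": "/file", "params": ["path"], "tech_risk": True},
]
for ep in sorted(endpoints, key=risk_score, reverse=True):
    print(ep["url"], risk_score(ep))
# /file 10.0
# /search 9.0
# /about 0.0
```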
### Tool Output Parsing
- **13 built-in parsers** for common CLI tools (nmap, nuclei, sqlmap, ffuf, httpx, whatweb, testssl, nikto, dalfox, katana, gau, wapiti, commix)
- Condenses raw tool output 3-5x while preserving key findings, endpoints, and errors
- Configurable verbosity: summary (~15 lines), detailed (~50 lines), full (complete parsed output)
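As a toy illustration of the condensing idea (the real parsers are tool-aware rather than keyword-based), a summary-level filter might keep only finding and error lines:

```python
# Illustrative keep-list; the actual parsers understand each tool's format.
KEEP_MARKERS = ("[critical]", "[high]", "[medium]", "error", "vulnerable")

def condense(raw: str, max_lines: int = 15) -> str:
    """Summary verbosity: keep finding/error lines, drop progress noise."""
    kept = [line for line in raw.splitlines()
            if any(m in line.lower() for m in KEEP_MARKERS)]
    return "\n".join(kept[:max_lines])

raw = """\
[INF] Loaded 5000 templates
[INF] Targets loaded: 1
[high] CVE-2021-44228 detected at /api/log
[INF] Scan finished
"""
print(condense(raw))
# [high] CVE-2021-44228 detected at /api/log
```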
### CLI Tool Results Verification
- Automatic validation of CLI tool output quality — detects empty output, proxy errors, permission issues, and suspicious results
- **10 per-tool validators** (nmap, nuclei, sqlmap, ffuf, feroxbuster, testssl, dalfox, wapiti, katana, httpx) with corrected command suggestions
- When a tool produces empty or suspicious output, the validator suggests fixes (e.g., add `-Pn` for nmap, remove proxy env vars, try different flags)
- Integrated into the tool execution workflow — agents call `verify_tool_result()` after every CLI tool run
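One per-tool validator might look like this sketch for nmap. The heuristics, return shape, and function name are assumptions for illustration, not the server's actual validator:

```python
def verify_nmap_result(output: str, command: str) -> dict:
    """Flag empty/suspicious nmap output and suggest a corrected command."""
    issues, suggestion = [], None
    if not output.strip():
        issues.append("empty output")
    if "Host seems down" in output:
        issues.append("host reported down (possibly ICMP-filtered)")
        suggestion = command + " -Pn"   # skip host discovery, scan anyway
    return {"ok": not issues, "issues": issues, "suggestion": suggestion}

result = verify_nmap_result(
    "Note: Host seems down. If it is really up, try -Pn",
    "nmap -sV target.example.com",
)
print(result["ok"], result["suggestion"])
# False nmap -sV target.example.com -Pn
```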
### Progressive Context Compression
- **Phase summaries** (~500-800 words) auto-generated when phase gates pass — capturing findings, coverage, tool results, and attack surface in compressed form
- Prevents context degradation in long-running engagements by replacing raw historical data with structured summaries
- `get_engagement_summary()` combines all phase summaries into a single overview for injecting into new subagent prompts
- Summaries stored as deliverables — accessible by any downstream agent without requiring full engagement history
### Counterfactual Analysis (Second-Pass Discovery)
- After an Analyzer completes with vulnerabilities found, a **second Analyzer** is spawned with instructions to "assume those vulns are patched"
- The counterfactual Analyzer searches for **additional** vulnerabilities: different endpoints, different parameters, different injection contexts, logic flaws
- Results are appended to the existing exploitation queue (automatic merge with deduplication by endpoint+parameter and auto-incrementing IDs)
- Based on PenHeal ablation research showing +71% vulnerability coverage with counterfactual prompting
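The merge-with-deduplication step can be sketched as follows, assuming each queue entry carries `id`, `endpoint`, and `parameter` fields (field names are assumptions):

```python
def merge_queues(existing: list[dict], new: list[dict]) -> list[dict]:
    """Append counterfactual results, deduped by endpoint+parameter."""
    seen = {(item["endpoint"], item["parameter"]) for item in existing}
    next_id = max((item["id"] for item in existing), default=0) + 1
    merged = list(existing)
    for item in new:
        key = (item["endpoint"], item["parameter"])
        if key in seen:
            continue  # counterfactual pass rediscovered a known target
        seen.add(key)
        merged.append({**item, "id": next_id})  # auto-incrementing ID
        next_id += 1
    return merged

q = merge_queues(
    [{"id": 1, "endpoint": "/search", "parameter": "q"}],
    [{"endpoint": "/search", "parameter": "q"},      # duplicate, dropped
     {"endpoint": "/profile", "parameter": "bio"}],  # new, gets id 2
)
print([(i["id"], i["endpoint"]) for i in q])
# [(1, '/search'), (2, '/profile')]
```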
### Multi-Domain Support
- Automatic SSO/OAuth/OIDC/SAML detection and handling
- Per-domain scope registration, crawling, and testing
- Cookie jar management for cross-domain session persistence
- 6-level authentication failure escalation (alternative grants → PKCE → headless browser → token extraction → user provision → unauthenticated)
### Crash-Safe Engagement Management
- Append-only `findings.md` and `progress.log` survive crashes
- Git workspace checkpointing with rollback capability
- **Auto-resume on interruption** — `resume-prompt.md` auto-generated at every checkpoint with full context (target, credentials, current phase, remaining tests, scope). Paste into a new session to continue exactly where you left off
- Mid-phase checkpoint granularity — tracks which tests within a phase are completed, not just phase-level state
- Full audit trail of every MCP tool call with timestamps
### Professional Reporting
- Markdown reports with executive summary, findings by severity, test coverage matrix, and tool coverage
- Per-category coverage percentages and gap analysis
- Vulnerability chaining analysis documented
- Final Judge observations and quality notes included
---
## Agent Role System
AutoPentest uses 4 specialized agent roles instead of generic subagents. Each role has a dedicated prompt template with focused tool guidance, input/output contracts, and anti-patterns.
| Role | Template | Purpose | Phases |
|------|----------|---------|--------|
| **Scout** | `templates/agent-roles/scout.md` | Reconnaissance and attack surface mapping | Phase 0-1, source code discovery |
| **Analyzer** | `templates/agent-roles/analyzer.md` | Vulnerability discovery with canary/witness payloads | Phase 2-5 analysis |
| **Exploiter** | `templates/agent-roles/exploiter.md` | Exploitation proof with evidence | Phase 4 exploitation |
| **Reporter** | `templates/agent-roles/reporter.md` | Quality review and Final Judge | Phase transitions, post-report |
### How the Pipeline Works
Phase 4 (highest-impact testing) uses a two-stage pipeline per vulnerability class:
```
┌──────────────────────────────────────────────────────────────┐
│ Pipeline 1: XSS │
│ │
│ Analyzer (75 turns) Exploiter (75 turns) │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Discover endpoints │ │ Load Analyzer queue │ │
│ │ Send canary payloads│─────▶│ Attempt exploitation│ │
│ │ Build exploit queue │ gate │ Prove impact │ │
│ │ Save deliverable │ │ Log findings │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ ▲ │
│ validate_exploitation_queue() │
└──────────────────────────────────────────────────────────────┘
```
Three pipelines (XSS, Injection, SSRF/SSTI) run in parallel. The validation checkpoint between Analyzer and Exploiter ensures only well-formed exploitation queues proceed.
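Conceptually, the checkpoint rejects malformed queue entries before an Exploiter is spawned. The required fields below are assumptions for illustration, not the server's actual queue schema:

```python
# Hypothetical required fields for each exploitation-queue entry.
REQUIRED = {"endpoint", "parameter", "sink_type", "witness_payload"}

def validate_exploitation_queue(queue: list[dict]) -> tuple[bool, list[str]]:
    """Gate between Analyzer and Exploiter: every entry must be complete."""
    errors = []
    for i, entry in enumerate(queue):
        missing = REQUIRED - entry.keys()
        if missing:
            errors.append(f"entry {i}: missing {sorted(missing)}")
    return (not errors, errors)

ok, errors = validate_exploitation_queue([
    {"endpoint": "/search", "parameter": "q",
     "sink_type": "html_body", "witness_payload": "c4n4ry"},
    {"endpoint": "/login"},  # incomplete entry, fails the gate
])
print(ok, errors)
```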
### Role Boundaries
Each role has explicit tool restrictions enforced through prompts:
- **Scouts** cannot call `log_finding()` or send attack payloads
- **Analyzers** can log configuration findings (missing headers, weak cookies) but not injection-class findings
- **Exploiters** cannot create new queues — they consume what the Analyzer produced
- **Reporters** cannot send HTTP requests to the target — they review data only
For CTF challenges and small apps (fewer than 3 input endpoints), a legacy monolithic pipeline is available as a fallback.
---
## Quick Start
### Prerequisites
- [Docker](https://docs.docker.com/get-docker/) (Docker Desktop on macOS/Windows, Docker Engine on Linux)
- [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code) with an active Anthropic API key
- [uv](https://docs.astral.sh/uv/) (Python package manager for the MCP server)
- [Node.js](https://nodejs.org/) (for Playwright MCP server)
- **Optional:** [Burp Suite Professional](https://portswigger.net/burp/pro) for passive traffic monitoring
### Installation
```bash
# 1. Clone the repository
git clone https://github.com/bhavsec/autopentest-ai.git
cd autopentest-ai
# 2. Install Python dependencies for the MCP server
cd server && uv sync && cd ..
# 3. Build Docker image and start the tools container
make setup
```
That's it. All 27 security tools are now installed and ready inside the Docker container.
### Verify Installation
```bash
# Check all tools are installed
make verify-tools
# Expected output:
# [+] nuclei: installed
# [+] httpx: installed
# [+] katana: installed
# ... (27 tools total)
```
### Start Testing
```bash
# Launch Claude Code in the project directory
claude
```
Then tell Claude what to test:
```
Run a full WSTG assessment against https://target.example.com
```
---
## Usage
### Option A: Interactive Mode
Launch Claude Code and provide the target:
```
Run a full pentest against https://app.example.com
Credentials: admin / P@ssw0rd123
```
Claude will ask for any missing information (like credentials) and begin the 7-phase workflow.
### Option B: Config-Driven Mode (Recommended)
Create a YAML config file for repeatable, consistent assessments:
```yaml
# configs/my-target.yaml
target:
  url: https://app.example.com
  scope:
    - app.example.com
    - api.example.com
  exclude:
    - cdn.example.com

authentication:
  login_type: form
  login_url: https://app.example.com/login
  credentials:
    username: testuser@example.com
    password: secret123
  login_flow:
    - "Type $username into the email field"
    - "Type $password into the password field"
    - "Click the 'Sign In' button"
  success_condition:
    type: url_contains
    value: "/dashboard"

rules:
  avoid:
    - description: "Do not test logout"
      type: path
      url_path: "/logout"
  focus:
    - description: "Prioritize API endpoints"
      type: path
      url_path: "/api"

reporting:
  tester_name: "Security Team"
```
Then in Claude Code:
```
Load the config from configs/my-target.yaml and run the pentest
```
### Option C: Targeted Testing
Run specific WSTG tests against specific endpoints:
```
Run WSTG-INPV-05 (SQL Injection) against https://app.example.com/search?q=
```
```
Test https://app.example.com for CORS misconfiguration (WSTG-CONF-13)
```
```
Run all authentication tests (WSTG-ATHN) against https://app.example.com
```
### Option D: Resume an Interrupted Engagement
```
Resume engagement pentest-2026-02-11-myapp
```
---
## Testing Phases
### Phase 0: Application Discovery & Mapping
The critical foundation phase. Claude autonomously:
1. **Pre-flight checks** — verifies target reachability, detects redirects and cross-domain auth
2. **Launches 10+ background tools** in parallel (katana, ffuf, nuclei, whatweb, gau, nmap, feroxbuster, wapiti, httpx, and more)
3. **Recursive crawling** — follows links to depth 2-3, parses HTML/JS for endpoints
4. **Directory brute-forcing** — common paths + technology-specific wordlists
5. **Tool result ingestion** — reads all background tool outputs and merges into unified endpoint map
6. **Builds structured endpoint inventory** with parameters, auth requirements, and priority rankings
**Output:** A complete endpoint map organized by domain, ready for systematic testing.
### Phase 1-2: Reconnaissance & Configuration
- Server fingerprinting, technology detection, metadata review
- Security header analysis (HSTS, CSP, CORS, X-Frame-Options)
- TLS configuration testing, admin interface discovery
- HTTP methods testing, file extension handling
### Phase 3: Authentication, Authorization & Session Management
- **Role/privilege lattice** built before testing (maps guards, middleware, and bypass tests)
- IDOR testing with multiple alternate IDs per endpoint
- CSRF testing on every state-changing endpoint
- Session fixation, hijacking, and token analysis
- JWT vulnerability testing (if applicable)
- OAuth/OIDC weakness testing (if applicable)
### Phase 4: Input Validation (Highest Impact)
Three independent two-stage pipelines run in parallel, each using the Analyzer→Exploiter role split:
| Pipeline | Vulnerability Classes | Tools | Technique Guides |
|----------|----------------------|-------|-----------------|
| XSS Pipeline | Reflected XSS, Stored XSS, DOM XSS | dalfox, Playwright | XSS, DOM |
| Injection Pipeline | SQL Injection, Command Injection, NoSQL Injection | sqlmap, commix, nosqli | SQLI, CMDI, NOSQLI |
| SSRF/SSTI Pipeline | SSRF, SSTI, Path Traversal | sstimap, ssrfmap | SSRF, SSTI, PTRAV |
Each pipeline: **Analyzer** (discover → analyze → build exploitation queue) → validation checkpoint → **Exploiter** (attempt exploitation → prove impact → log findings). WAF evasion intelligence is shared across all pipelines.
### Phase 5: Error Handling, Crypto, Business Logic, Client-Side & APIs
- Stack trace and error message disclosure
- TLS/SSL testing via testssl.sh
- Business logic bypass (workflow circumvention, request forgery)
- Client-side testing (clickjacking, open redirects, DOM manipulation)
- GraphQL and REST API testing
- Vulnerability chaining analysis across all findings
### Phase 6: Reporting
- Coverage verification (test coverage + tool coverage)
- Finding deduplication and severity calibration
- Markdown report generation with executive summary, findings, coverage matrices
### Phase 7: Final Judge Review
A zero-context agent reviews the entire engagement cold — no knowledge of testing decisions or difficulties. It examines:
- **Coverage integrity** — rubber-stamped tests, missing endpoints
- **N/A cascade detection** — categories with excessive "not applicable" markings
- **Finding quality** — evidence completeness, severity consistency, chaining opportunities
- **Tool utilization** — tools run but output never reviewed, lazy skip reasons
- **Missed attack surface** — untested endpoints, untested parameters, untested domains
The verdict (PASS/CONDITIONAL_PASS/FAIL) triggers specific remediation actions before the report is delivered.
---
## Security Tools
### Discovery & Reconnaissance (Phase 0)
| Tool | Purpose | Key Flags |
|------|---------|-----------|
| **katana** | Web crawler with JS rendering | `-jc` for JavaScript crawling |
| **httpx** | HTTP probing, tech detection | `-tech-detect -status-code -title` |
| **ffuf** | Directory/parameter fuzzing | `-w wordlist -mc all -fc 404` |
| **feroxbuster** | Recursive directory enumeration | `--smart --auto-tune` |
| **nuclei** | Template-based vuln scanner | `-t cves/ -t misconfigurations/` |
| **nikto** | Web server misconfiguration | `-Tuning 1234567890` |
| **whatweb** | Technology fingerprinting | `--aggression 3` |
| **nmap** | Port and service scanning | `-sV -sC --top-ports 1000` |
| **gau** | Historical URL discovery | `--blacklist png,jpg,gif` |
| **subfinder** | Subdomain enumeration | `-silent -all` |
### Injection Testing (Phase 4)
| Tool | Purpose | Key Flags |
|------|---------|-----------|
| **sqlmap** | SQL injection (all techniques) | `--batch --risk 3 --level 5` |
| **dalfox** | XSS scanning & exploitation | `--skip-bav --deep-domxss` |
| **commix** | Command injection | `--batch --all` |
| **sstimap** | Server-Side Template Injection | `-u <url>` |
| **ssrfmap** | SSRF exploitation | `-r request.txt` |
| **nosqli** | NoSQL injection | `-u <url>` |
| **crlfuzz** | CRLF injection / HTTP splitting | `-u <url>` |
| **smuggler** | HTTP request smuggling | `-u <url>` |
### Authentication & Session (Phase 3)
| Tool | Purpose | Key Flags |
|------|---------|-----------|
| **hydra** | Credential brute-force | `-L users.txt -P pass.txt` |
| **jwt_tool** | JWT token analysis & exploitation | `-t <token> -M at` |
### Cryptography & APIs (Phase 5)
| Tool | Purpose | Key Flags |
|------|---------|-----------|
| **testssl.sh** | TLS/SSL configuration testing | `--severity HIGH --sneaky` |
| **graphql-cop** | GraphQL security testing | `-t <url>` |
| **websocat** | WebSocket testing | `ws://<url>` |
### Infrastructure (Phase 2)
| Tool | Purpose |
|------|---------|
| **corscanner** | CORS misconfiguration scanning |
| **dnsreaper** | Subdomain takeover detection |
### Browser Automation
| Tool | Purpose |
|------|---------|
| **Playwright** | DOM XSS proof, clickjacking, JS-rendered login, client-side storage inspection |
---
## WSTG Knowledge Base
109 test cases across 12 OWASP categories, each with CLI-specific procedures:
| Code | Category | Tests | Examples |
|------|----------|:-----:|---------|
| **INFO** | Information Gathering | 10 | Search engine discovery, server fingerprinting, metadata review |
| **CONF** | Configuration & Deployment | 14 | Security headers, CORS, CSP, HSTS, admin interfaces |
| **IDNT** | Identity Management | 5 | Role definitions, registration, account enumeration |
| **ATHN** | Authentication | 11 | Default creds, lockout, auth bypass, MFA, password policy |
| **ATHZ** | Authorization | 5 | Directory traversal, auth bypass, privilege escalation, IDOR |
| **SESS** | Session Management | 11 | Cookie attributes, CSRF, session fixation/hijacking, JWT |
| **INPV** | Input Validation | 20 | XSS, SQLi, CMDi, SSTI, SSRF, path traversal, XXE, LDAP |
| **ERRH** | Error Handling | 2 | Error messages, stack traces |
| **CRYP** | Cryptography | 4 | TLS config, padding oracle, weak encryption |
| **BUSL** | Business Logic | 10 | Workflow bypass, request forgery, file upload, rate limits |
| **CLNT** | Client-Side | 14 | DOM XSS, clickjacking, open redirects, WebSockets, storage |
| **APIT** | API Testing | 3 | GraphQL, REST, SOAP |
Each test file includes:
- Step-by-step CLI procedures (curl commands, tool invocations)
- Payloads organized by bypass level (basic, intermediate, advanced)
- Detection criteria with severity assessment rubrics
- Remediation guidance with references
---
## PortSwigger Technique Guides
31 attack technique reference guides sourced from [PortSwigger Web Security Academy](https://portswigger.net/web-security), organized by vulnerability class for direct use during real pentesting engagements.
### What's Included
| Code | Category | WSTG Mapping | Key Content |
|------|----------|-------------|-------------|
| **SQLI** | SQL Injection | INPV-05 | UNION/blind/error/time-based/OOB techniques, database-specific cheat sheets (Oracle, MySQL, PostgreSQL, MSSQL), WAF bypass |
| **XSS** | Cross-Site Scripting | INPV-01, INPV-02, CLNT-01 | Reflected/stored/DOM contexts, tag & event handler payloads, CSP bypass, filter evasion |
| **CMDI** | OS Command Injection | INPV-12 | Separator characters, blind techniques (time-delay, OOB), OS-specific payloads |
| **SSTI** | Server-Side Template Injection | INPV-18 | Jinja2/Twig/Freemarker/Velocity/ERB detection & exploitation, sandbox escapes |
| **SSRF** | Server-Side Request Forgery | INPV-19 | URL scheme tricks, IP obfuscation, DNS rebinding, cloud metadata, filter bypass |
| **PTRAV** | Path Traversal | INPV-04 | Encoding variations, null byte injection, wrapper bypass |
| **XXE** | XML External Entities | INPV-07 | File retrieval, SSRF via XXE, blind XXE with OOB, parameter entities |
| **AUTHN** | Authentication | ATHN-01 to ATHN-07 | Brute force, 2FA bypass, password reset poisoning, credential stuffing |
| **AUTHZ** | Access Control | ATHZ-01 to ATHZ-04 | IDOR, privilege escalation, horizontal/vertical bypass, referer-based controls |
| **JWT** | JSON Web Tokens | SESS-10 | Algorithm confusion (none/HS256→RS256), kid injection, JWK/JKU exploitation |
| **OAUTH** | OAuth 2.0 | ATHZ-05 | Authorization code theft, open redirect, scope upgrade, CSRF on OAuth flows |
| **CSRF** | Cross-Site Request Forgery | SESS-05 | Token bypass, SameSite bypass, referer validation bypass |
| **SMUGGLE** | HTTP Request Smuggling | INPV-15 | CL.TE, TE.CL, TE.TE, HTTP/2 downgrade, request tunneling |
| **DOM** | DOM-Based Vulnerabilities | CLNT-01 | Sources/sinks, DOM clobbering, prototype pollution gadgets |
| **CORS** | Cross-Origin Resource Sharing | CONF-13, CLNT-07 | Origin reflection, null origin, subdomain trust exploitation |
| **NOSQLI** | NoSQL Injection | INPV-05 | MongoDB operator injection, JavaScript injection, blind extraction |
| **GRAPHQL** | GraphQL | APIT-01 | Introspection, field suggestion, batching attacks, authorization bypass |
| **RACE** | Race Conditions | BUSL-04 | Limit overrun, TOCTOU, single-endpoint races, last-frame sync |
| **UPLOAD** | File Upload | BUSL-08, BUSL-09 | Extension bypass, content-type manipulation, web shells, polyglot files |
| **HOST** | Host Header Injection | INPV-17 | Password reset poisoning, cache poisoning, routing-based SSRF |
Plus 11 more: CLICK, WS, CACHEPOIS, CACHEDEC, DESER, INFO, BUSL, PROTO, API, LLM, SKILLS.
### How They're Used
Technique guides are integrated into every testing phase via the `get_technique_guide()` MCP tool:
```
Phase 2 → CORS guide for CONF-13 testing
Phase 3 → AUTHN, AUTHZ, CSRF, JWT, OAUTH guides for auth/session testing
Phase 4 → SQLI, XSS, CMDI, SSTI, SSRF, PTRAV, XXE guides for input validation
Phase 5 → DOM, CLICK, GRAPHQL, RACE, UPLOAD guides for client-side & business logic
```
Each parallel testing agent automatically loads its relevant technique guide before testing, providing:
- **Detection payloads** — what to inject to identify the vulnerability
- **Exploitation techniques** — organized by attack method with step-by-step procedures
- **Cheat sheets** — database/platform-specific syntax tables for quick reference
- **WAF bypass patterns** — encoding, obfuscation, and filter evasion strategies
### Adding Custom Guides
See [`docs/adding-knowledge-base-resources.md`](docs/adding-knowledge-base-resources.md) for instructions on adding new technique guides to the knowledge base.
---
## Quality Assurance System
AutoPentest has a multi-layered QA system that prevents shallow testing:
### 1. Phase Gates (Automated)
After each phase, `phase_gate_check()` validates:
- All MUST-priority tests were executed
- Minimum coverage thresholds are met
- Tool coverage is adequate
- No critical gaps exist
**Blocked phases cannot proceed** until all issues are resolved.
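In simplified form, the gate check boils down to the conditions above. The threshold value and field names in this sketch are illustrative, not the server's actual implementation:

```python
def phase_gate_check(phase: dict) -> dict:
    """Block phase transition until MUST tests and coverage are satisfied."""
    blockers = []
    pending_must = [t for t in phase["tests"]
                    if t["priority"] == "MUST" and not t["done"]]
    if pending_must:
        blockers.append(f"{len(pending_must)} MUST tests not executed")
    if phase["coverage"] < 0.8:          # assumed minimum coverage threshold
        blockers.append(f"coverage {phase['coverage']:.0%} below minimum")
    return {"passed": not blockers, "blockers": blockers}

print(phase_gate_check({
    "tests": [{"priority": "MUST", "done": True},
              {"priority": "SHOULD", "done": False}],
    "coverage": 0.92,
}))
# {'passed': True, 'blockers': []}
```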
### 2. Quality Reviewer (Per-Phase)
A subagent spawned at every phase transition that:
- Checks for 16 known anti-patterns (rubber-stamping, N/A cascades, finding inflation)
- Identifies untested endpoints and parameters
- Suggests vulnerability chaining opportunities
- Recommends alternative approaches for blocked tests
### 3. Final Judge (Post-Report)
A zero-context agent that reviews the completed engagement with fresh eyes:
- Analyzes coverage integrity across all domains
- Detects N/A cascades and their root causes
- Validates finding quality and evidence completeness
- Identifies missed attack surface
- Issues a verdict: **PASS**, **CONDITIONAL_PASS**, or **FAIL**
### 4. Exhaustion Gates
Marking a vulnerability as "not exploitable" requires proof of effort:
| Vuln Class | Min Techniques | Min Bypass Attempts |
|------------|:-:|:-:|
| XSS | 3 | 5 |
| SQL Injection | 3 | 5 |
| Command Injection | 3 | 5 |
| SSTI | 2 | 3 |
| SSRF | 3 | 5 |
| Path Traversal | 3 | 5 |
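The table above translates directly into a lookup-and-compare check. This sketch uses short class keys as an assumption; the server's actual identifiers may differ:

```python
# Minimums from the table: (min techniques, min bypass attempts).
EXHAUSTION_MINIMUMS = {
    "xss": (3, 5), "sqli": (3, 5), "cmdi": (3, 5),
    "ssti": (2, 3), "ssrf": (3, 5), "ptrav": (3, 5),
}

def can_mark_not_exploitable(vuln: str, techniques: int, bypasses: int) -> bool:
    """Accept 'not exploitable' only after minimum demonstrated effort."""
    min_t, min_b = EXHAUSTION_MINIMUMS[vuln]
    return techniques >= min_t and bypasses >= min_b

print(can_mark_not_exploitable("ssti", techniques=2, bypasses=3))  # True
print(can_mark_not_exploitable("xss", techniques=3, bypasses=2))   # False
```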
### 5. Evidence Checklists
Before logging any finding, evidence requirements are verified:
- Reproducible curl command
- Full HTTP request and response
- Proof of actual exploitation (not theoretical impact)
- Correct classification tier (EXPLOITED vs POTENTIAL)
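A checklist like this reduces naturally to a required-fields validation. The field names below are illustrative; the real checklist lives in the server's finding workflow:

```python
# Required evidence fields before a finding may be logged (illustrative names)
REQUIRED_EVIDENCE = ("curl_command", "http_request", "http_response",
                     "exploitation_proof", "tier")

def evidence_complete(finding: dict):
    """Return (ok, missing_fields) for a candidate finding."""
    missing = [k for k in REQUIRED_EVIDENCE if not finding.get(k)]
    return (not missing), missing
```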
### 6. Live Engagement Logging
Every MCP tool call is automatically logged to `engagements/<eid>/logs.txt` with full arguments, results, and execution duration. Run `tail -f logs.txt` in a separate terminal to watch all agent activity in real time. 100% coverage via automatic tool wrapper — no manual instrumentation needed.
### 7. Phase Gate Timing
Phase gates enforce minimum 60-second intervals between calls (15s in CTF mode), preventing premature phase completion. Inter-gate work verification warns if fewer than 3 work events occur between consecutive gates.
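The timing and work-verification rules above might look like this in code. `GateTimer` and its injectable clock are assumptions for illustration:

```python
import time

class GateTimer:
    """Enforce a minimum interval between gate calls (60s normal, 15s CTF)."""
    def __init__(self, min_interval: float = 60.0, ctf_mode: bool = False,
                 clock=time.monotonic):
        self.min_interval = 15.0 if ctf_mode else min_interval
        self.clock = clock
        self.last_gate = None
        self.work_events = 0

    def record_work(self):
        self.work_events += 1

    def check(self):
        """Return (allowed, warning) for a gate call right now."""
        now = self.clock()
        if self.last_gate is not None and now - self.last_gate < self.min_interval:
            return False, (f"gate called {now - self.last_gate:.0f}s after the "
                           f"previous one (min {self.min_interval:.0f}s)")
        warning = None
        if self.last_gate is not None and self.work_events < 3:
            warning = "fewer than 3 work events since last gate"
        self.last_gate, self.work_events = now, 0
        return True, warning
```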
---
## Benchmarking
AutoPentest includes integration with the [XBOW Validation Benchmarks](https://github.com/xbow-engineering/validation-benchmarks) — 104 CTF-style Docker challenges widely used for benchmarking AI pentest agents.
### Benchmark Scores (Reference)
| Agent | Score | Source |
|-------|:-----:|--------|
| Shannon | 96.2% | KeygraphHQ (2024) |
| PentestGPT | 86.5% | USENIX Sec 2024 |
### Usage
```bash
# Setup (one-time)
cd benchmarks/xbow && make setup
# Solve with AutoPentest (MCP server + CLAUDE.md + CTF mode)
make solve ID=XBEN-001-24
# Solve with raw Claude (baseline — no MCP, no methodology)
make solve ID=XBEN-001-24 RAW=1
# Solve by vulnerability tag
make solve-tag TAG=sqli
# Solve all 104 challenges
make solve-all
# Full baseline run for comparison
make solve-all RAW=1
# Score the latest run
make score
# Compare autopentest vs raw runs side-by-side
make compare
```
The solver has two modes:
- **autopentest** (default): Runs Claude Code from the project root, loading `.mcp.json` (MCP server with 68+ tools) and `CLAUDE.md` (pentest methodology). Measures AutoPentest's full capability.
- **raw** (`RAW=1`): Runs bare Claude Code with no MCP server or methodology. Baseline for measuring AutoPentest's value-add over raw LLM capability.
Each challenge is a Docker Compose app with a flag injected at build time. Flag extraction from Claude's output determines pass/fail. Results are scored per-challenge, per-tag, and per-difficulty-level.
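The pass/fail step reduces to scanning the agent's output for the injected flag. This sketch assumes a `FLAG{...}` format; the actual benchmark's flag format and scorer may differ:

```python
import re

def challenge_passed(agent_output: str, expected_flag: str) -> bool:
    """True if the expected flag appears anywhere in the agent's output.
    Assumes flags look like FLAG{...} — an illustrative format."""
    found = re.findall(r"FLAG\{[^}]+\}", agent_output)
    return expected_flag in found
```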
### CTF Mode
For CTF challenges and small apps, enable CTF mode for relaxed quality gates:
```yaml
mode: ctf
target:
url: https://target.com
```
CTF mode reduces phase gate timing (15s vs 60s), skips QA Reviewer requirements, and halves completion thresholds — while maintaining finding quality and evidence standards.
---
## Example Report
A complete example report from a pentest against [PortSwigger's Gin & Juice Shop](https://ginandjuice.shop) (a deliberately vulnerable application) is included in the repository:
**[View Full Report](engagements/pentest-2026-ginandjuice/report.md)**
### What the Report Includes
The report demonstrates AutoPentest's output against a real target with 23 findings across all severity levels:
| Severity | Count | Examples |
|----------|:-----:|---------|
| Critical | 2 | UNION-based SQL injection with full data extraction, access control bypass via X-Original-URL header |
| High | 5 | Reflected XSS via JS string escape bypass, IDOR on order details, XXE with local file read, DOM XSS via prototype pollution |
| Medium | 6 | Missing security headers, no account lockout, missing CSP, CRLF injection, DOM-based open redirect |
| Low | 5 | Infrastructure info disclosure, EOL AngularJS, insecure ALB cookies, weak TLS config |
| Informational | 5 | Consolidated duplicates and secondary evidence for primary findings |
### Report Structure
```
1. Executive Summary — Target scope, finding summary, domain architecture
2. Detailed Findings — Each finding with description, evidence (curl commands), and remediation
3. Vulnerability Chaining — Cross-finding analysis (e.g., XSS + no CSP = severity upgrade)
4. Test Coverage Matrix — Per-category WSTG coverage (100% across 12 categories)
5. Tool Coverage Matrix — 27/27 tools tracked, 8 actively run
```
### Sample Finding (SQL Injection)
From the report — a Critical SQL injection finding with full exploitation evidence:
```
FINDING-017: SQL Injection in /catalog category parameter — Full Data Extraction
Severity: Critical
WSTG Reference: WSTG-INPV-05
The category parameter is vulnerable to UNION-based SQL injection.
The attacker can:
1. Inject a single quote to cause a 500 error (confirming injection)
2. Use UNION SELECT with 8 columns to extract arbitrary data
3. Enumerate tables: PRODUCTS, TRACKING, USERS
4. Extract credentials from the USERS table
Evidence (reproducible curl command):
curl -sk "https://ginandjuice.shop/catalog?category='+UNION+SELECT+1,USERNAME,PASSWORD,
1,1,USERNAME,1,USERNAME+FROM+USERS+LIMIT+10--"
```
Every finding includes reproducible curl commands, full request/response evidence, and actionable remediation guidance.
---
## Configuration
### Engagement Config (YAML)
Config-driven pentests skip interactive questions and ensure consistency:
```yaml
target:
url: https://app.example.com
scope: [app.example.com, api.example.com]
authentication:
login_type: sso # form | sso | api | manual | none
login_url: https://app.example.com/login
credentials:
username: testuser
password: secret123
sso:
provider: keycloak # keycloak | auth0 | okta | azure_ad
auth_domain: auth.example.com
realm: myrealm
client_id: my-app
rules:
avoid:
- { type: path, url_path: "/logout", description: "Skip logout" }
- { type: endpoint, method: DELETE, url_path: "/api/admin/*", description: "No destructive admin ops" }
focus:
- { type: path, url_path: "/api", description: "Prioritize API" }
reporting:
tester_name: "Security Team"
```
### MCP Server Configuration
The `.mcp.json` file registers two MCP servers:
```json
{
"mcpServers": {
"wstg-pentest": {
"command": "uv",
"args": ["--directory", "./server", "run", "server.py"]
},
"playwright": {
"command": "npx",
"args": ["-y", "@playwright/mcp"]
}
}
}
```
### Burp Suite Integration (Optional)
For passive traffic monitoring through Burp Suite Professional:
1. Start Burp Suite and enable the proxy on **all interfaces** (`0.0.0.0:8080`)
2. The Docker container automatically routes traffic through `host.docker.internal:8080`
3. All HTTP requests appear in Burp's proxy history for manual review
---
## Multi-Domain Testing
AutoPentest has first-class support for applications with multiple domains (e.g., a SPA frontend + API backend + SSO provider):
### Automatic Detection
During Phase 0, AutoPentest detects cross-domain authentication by following login redirects:
```
app.example.com → redirects to → auth.example.com/login
→ after login → app.example.com/callback
```
All domains are automatically registered in scope with their type (app, auth_provider, api, cdn).
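Conceptually, the detection step is just extracting every distinct hostname from the login redirect chain. A minimal sketch (the real server also classifies each domain's type):

```python
from urllib.parse import urlparse

def discover_scope_domains(redirect_chain: list[str]) -> list[str]:
    """Collect each new hostname seen while following login redirects,
    in first-seen order."""
    seen, scope = set(), []
    for url in redirect_chain:
        host = urlparse(url).hostname
        if host and host not in seen:
            seen.add(host)
            scope.append(host)
    return scope
```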
### Per-Domain Testing
Every WSTG test is evaluated per domain — not just the primary:
- Discovery tools (katana, ffuf, nuclei) run against **all** domains
- Input validation tools (sqlmap, dalfox) target endpoints on **every** domain with server-side processing
- A test is "not applicable" only when **no** domain has the tested feature
### Cross-Domain Authentication
Supported SSO protocols:
- **OAuth 2.0 / OIDC** (Authorization Code, PKCE, Password Grant, Client Credentials)
- **SAML** (SP-initiated flow)
- **Keycloak**, **Auth0**, **Okta**, **Azure AD**
- **Custom SSO** (redirect chain following with cookie jar)
A six-level authentication escalation procedure ensures testing can proceed even through complex auth flows.
---
## Crash Recovery
AutoPentest is designed to survive interruptions:
### Automatic Checkpointing
- Phase gates auto-save checkpoints on PASS
- `git_checkpoint()` creates git snapshots of the engagement workspace
- Append-only logs (`findings.md`, `progress.log`) survive crashes
### Auto-Resume via resume-prompt.md (Recommended)
Every checkpoint and phase gate automatically generates `engagements/<eid>/resume-prompt.md` — a complete, self-contained prompt with everything a fresh session needs:
- Target URL, authentication credentials, and scope domains
- Current phase and which specific tests remain (mid-phase precision)
- Cookie jar status and re-authentication instructions
- Avoid/focus rules and endpoint map references
**To resume after an interruption:**
1. Open a new Claude Code session
2. Paste the contents of `engagements/<eid>/resume-prompt.md`
3. Claude picks up exactly where it left off — no manual context needed
### Resume from Checkpoint (Alternative)
```
Resume engagement pentest-2026-02-11-myapp
```
This restores:
- All findings and test tracking data
- Coverage statistics and phase gate results
- Scope registrations and deliverables
- Mid-phase remaining tests (not just phase-level state)
- Instructions for what to do next
### Manual Checkpoints
Save at any time:
```
Save a checkpoint before starting Phase 4 exploitation
```
### Rollback on Failure
If a phase produces bad results, roll back to the previous checkpoint:
```
Roll back the engagement to the last checkpoint
```
---
## Project Structure
```
autopentest-ai/
├── CLAUDE.md # Master pentest workflow (drives Claude Code)
├── .mcp.json # MCP server configuration
├── Dockerfile # Multi-stage Docker build (27 tools)
├── docker-compose.yml # Docker Compose alternative
├── Makefile # setup, start, stop, verify-tools, shell
│
├── server/
│ ├── server.py # FastMCP server (68+ MCP tools)
│ ├── task_tree.py # Hierarchical task tree (6 MCP tools)
│ ├── tool_parsers.py # Tool output parsing (2 MCP tools, 13 parsers)
│ ├── endpoint_priority.py # Endpoint risk prioritization (2 MCP tools)
│ ├── waf_evasion.py # Adaptive WAF evasion (3 MCP tools, 12 vendors)
│ ├── knowledge_graph.py # Cross-phase knowledge graph (5 MCP tools)
│ ├── tool_verification.py # CLI tool results verification (1 MCP tool, 10 validators)
│ ├── context_compression.py # Progressive context compression (2 MCP tools)
│ └── pyproject.toml # Python dependencies
│
├── knowledge-base/
│ ├── web-security-testing-guide/ # OWASP WSTG knowledge base (109 test procedures)
│ │ ├── 01-information-gathering/ # 10 tests (WSTG-INFO-01 → 10)
│ │ ├── 02-configuration/ # 14 tests (WSTG-CONF-01 → 14)
│ │ ├── 03-identity-management/ # 5 tests (WSTG-IDNT-01 → 05)
│ │ ├── 04-authentication/ # 11 tests (WSTG-ATHN-01 → 11)
│ │ ├── 05-authorization/ # 5 tests (WSTG-ATHZ-01 → 05)
│ │ ├── 06-session-management/ # 11 tests (WSTG-SESS-01 → 11)
│ │ ├── 07-input-validation/ # 20 tests (WSTG-INPV-01 → 20)
│ │ ├── 08-error-handling/ # 2 tests (WSTG-ERRH-01 → 02)
│ │ ├── 09-cryptography/ # 4 tests (WSTG-CRYP-01 → 04)
│ │ ├── 10-business-logic/ # 10 tests (WSTG-BUSL-01 → 10)
│ │ ├── 11-client-side/ # 14 tests (WSTG-CLNT-01 → 14)
│ │ └── 12-api-testing/ # 3 tests (WSTG-APIT-01 → 03)
│ └── portswigger-academy/ # 31 PortSwigger attack technique guides
│ ├── sql-injection.md # UNION, blind, error-based, OOB, WAF bypass
│ ├── cross-site-scripting.md # Reflected, stored, DOM, CSP bypass, filter evasion
│ ├── ssrf.md # URL schemes, cloud metadata, DNS rebinding
│ ├── ssti.md # Jinja2, Twig, Freemarker sandbox escapes
│ ├── jwt.md # Algorithm confusion, kid injection, JWK exploitation
│ ├── oauth.md # Auth code theft, redirect exploitation, scope upgrade
│ └── ... (31 total) # One per vulnerability class
│
├── templates/ # Testing guides and procedures
│ ├── input-validation-guide.md # Phase 4 step-by-step procedures
│ ├── testing-strategies.md # Test matrices, chaining, parallel strategy
│ ├── cli-tools-guide.md # Tool setup and Docker management
│ ├── tools.md # Per-tool command reference
│ ├── quality-gates.md # Phase quality checklists and anti-patterns
│ ├── cross-domain-auth-guide.md # SSO/OIDC/SAML procedures
│ ├── source-code-analysis.md # Security-focused code review template
│ ├── pipelined-testing.md # Phase 4 pipelined exploitation strategy
│ ├── agent-roles/ # Role-specialized subagent templates
│ │ ├── README.md # Role index and selection guide
│ │ ├── scout.md # Reconnaissance role (Phase 0-1)
│ │ ├── analyzer.md # Vulnerability discovery role (Phase 2-5)
│ │ ├── exploiter.md # Exploitation proof role (Phase 4)
│ │ └── reporter.md # QA review + Final Judge role
│ ├── shared/
│ │ ├── honesty-framework.md # Anti-hallucination guardrails
│ │ ├── exploit-classification.md # Three-tier finding classification
│ │ ├── reproducibility.md # Evidence format requirements
│ │ └── scope-rules.md # Avoid/focus rule templates
│ └── wordlists/ # Tech-specific fuzzing wordlists
│
├── benchmarks/
│ └── xbow/ # XBOW benchmark suite (104 CTF challenges)
│ ├── runner.py # Challenge orchestration
│ ├── solver.py # Automated solver (Claude Code CLI)
│ ├── Makefile # solve, solve-all, score, compare
│ └── results/ # Run reports
│
├── docs/
│ ├── ROADMAP.md # Competitive analysis + improvement roadmap
│ └── adding-knowledge-base-resources.md # Guide for adding new technique guides
│
├── configs/
│ ├── example-config.yaml # Example engagement configuration
│ └── config-schema.md # YAML schema documentation
│
├── scripts/
│ ├── install-tools.sh # Docker build + container start
│ ├── browser-auth.py # Headless Chromium auth (JS-rendered logins)
│ ├── pkce-auth.py # OAuth 2.0 PKCE flow automation
│ └── status.sh # Engagement status dashboard
│
└── engagements/ # Runtime output (git-ignored)
└── <engagement-id>/
├── logs.txt # Live engagement log (tail -f to watch)
├── findings.md # Append-only findings log
├── progress.log # Timestamped event log
├── resume-prompt.md # Auto-resume prompt (paste into new session)
├── report.md # Final pentest report
├── cookies.txt # Cross-domain cookie jar
└── tool-output/ # Raw CLI tool outputs
```
---
## Requirements
| Requirement | Version | Notes |
|-------------|---------|-------|
| Docker | 20.10+ | Docker Desktop on macOS/Windows |
| Claude Code | Latest | `npm install -g @anthropic-ai/claude-code` |
| uv | 0.1+ | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Node.js | 18+ | For Playwright MCP server |
| Python | 3.10+ | Managed by uv (no manual install needed) |
| Burp Suite Pro | Latest | **Optional** — for passive traffic monitoring |
**Supported platforms:** macOS (Apple Silicon & Intel), Linux (x86_64 & ARM64)
---
## FAQ
**Q: Does this replace a human penetration tester?**
No. AutoPentest automates the systematic, methodology-driven parts of a pentest. It excels at coverage (ensuring nothing is missed) and consistency (every test follows the same procedure). However, complex business logic, creative exploitation chains, and context-dependent risk assessment still benefit from human expertise. Think of it as a force multiplier.
**Q: How long does a full assessment take?**
It depends on the application's size and complexity. A typical medium-sized web app (50-100 endpoints) takes a few hours. Multi-domain applications with SSO take longer. The pipelined Phase 4 architecture parallelizes the most time-intensive testing.
**Q: Can I run this without Burp Suite?**
Yes. Burp Suite is optional and used only for passive traffic monitoring. All HTTP requests go through `docker exec curl` and all security tools run inside the Docker container. Without Burp, you lose the ability to review traffic in Burp's proxy history, but all testing functionality works.
**Q: What are the PortSwigger technique guides?**
31 attack reference guides covering detection, exploitation techniques, payloads, cheat sheets, and WAF bypass patterns — sourced from PortSwigger Web Security Academy. During testing, agents automatically load the relevant guide (e.g., the SQLi guide when testing for SQL injection) for comprehensive technique and payload reference. See [`docs/adding-knowledge-base-resources.md`](docs/adding-knowledge-base-resources.md) to add your own guides.
**Q: How do I add custom wordlists or payloads?**
Place wordlists in `templates/wordlists/` and they'll be available inside the Docker container via the volume mount. The WSTG test files in `knowledge-base/` can also be customized with additional payloads. To add new attack technique guides, follow the instructions in [`docs/adding-knowledge-base-resources.md`](docs/adding-knowledge-base-resources.md).
**Q: Can I test applications behind a VPN?**
Yes. The Docker container inherits your host's network (on Linux with `--network host`) or reaches the host via `host.docker.internal` (on macOS/Windows). If your VPN is running on the host, the container can reach VPN-protected targets.
**Q: What happens if a pentest is interrupted (crash, usage limit, timeout)?**
AutoPentest automatically generates a `resume-prompt.md` file at every checkpoint with everything needed to continue. Open a new Claude Code session, paste the contents of `engagements/<eid>/resume-prompt.md`, and testing resumes exactly where it left off — including mid-phase progress, credentials, scope, and remaining tests.
**Q: What about rate limiting?**
AutoPentest includes three-tier error classification (Transient/Rate Limit/Permanent) with automatic backoff. If the target rate-limits requests, tools automatically slow down. You can also set avoid rules in the config to skip specific endpoints.
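The three-tier classification with backoff might be sketched like this. The status-code buckets and delay values are illustrative assumptions:

```python
TRANSIENT = {502, 503, 504}   # retry quickly with exponential backoff
RATE_LIMIT = {429}            # back off for a longer fixed interval

def classify_error(status_code: int) -> str:
    """Bucket an HTTP status into Transient / Rate Limit / Permanent / ok."""
    if status_code in RATE_LIMIT:
        return "rate_limit"
    if status_code in TRANSIENT:
        return "transient"
    return "permanent" if status_code >= 400 else "ok"

def backoff_delay(attempt: int, kind: str) -> float:
    """Seconds to wait before the next attempt (illustrative values)."""
    if kind == "rate_limit":
        return 30.0
    if kind == "transient":
        return float(min(2 ** attempt, 60))
    return 0.0  # permanent errors are not retried
```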
**Q: What are the agent roles?**
AutoPentest uses 4 specialized roles (Scout, Analyzer, Exploiter, Reporter) instead of generic subagents. Each role has a dedicated prompt template with focused tool guidance, restricted tool lists, and anti-patterns. This prevents agents from conflating reconnaissance, analysis, exploitation, and reporting — improving focus and failure isolation. See [`templates/agent-roles/README.md`](templates/agent-roles/README.md) for the full role index.
**Q: How does WAF evasion work?**
When a payload is blocked (an HTTP 403 or a vendor block page), AutoPentest automatically fingerprints the WAF vendor from response characteristics, then loads vendor-specific bypass payloads organized by complexity level. 12 WAF vendors are supported (Cloudflare, AWS WAF, Akamai, Imperva, ModSecurity, F5, and more). WAF intelligence is shared across all agents via the deliverable system.
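Fingerprinting works by matching response headers and body text against per-vendor signatures. The signature table below is a deliberately tiny, illustrative subset — real fingerprints use many more response traits:

```python
# Illustrative signatures only: (vendor, predicate over lowercased headers/body)
WAF_SIGNATURES = {
    "cloudflare":  lambda h, b: "cf-ray" in h or "cloudflare" in b,
    "akamai":      lambda h, b: "akamaighost" in h.get("server", ""),
    "modsecurity": lambda h, b: "mod_security" in b or "modsecurity" in b,
}

def fingerprint_waf(headers: dict, body: str):
    """Return the first matching WAF vendor name, or None if unrecognized."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    b = body.lower()
    for vendor, match in WAF_SIGNATURES.items():
        if match(h, b):
            return vendor
    return None
```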
**Q: What is counterfactual analysis?**
After the first analysis pass finds vulnerabilities, AutoPentest can spawn a second Analyzer that assumes all known vulnerabilities are patched. This forces the agent to look for different attack vectors — different endpoints, parameters, injection contexts, and logic flaws. The results are merged into the existing exploitation queue with automatic deduplication. This technique is based on academic research (PenHeal ablation study) showing +71% vulnerability coverage improvement.
**Q: How does results verification work?**
When CLI tools (nmap, nuclei, sqlmap, etc.) produce empty or suspicious output, the `verify_tool_result()` tool detects common issues (proxy errors, permission denied, wrong flags) and suggests corrected commands. This prevents agents from silently counting broken tool runs as "completed" — a common failure mode in automated pentesting.
**Q: How does vulnerability chaining work?**
The knowledge graph tracks entities (endpoints, parameters, findings, cookies, domains) and relationships discovered during testing. After Phase 4, `find_chains()` uses BFS to discover multi-hop attack paths and checks 7 predefined chain patterns (e.g., XSS + missing CSP, SSRF + cloud metadata, IDOR + admin role). Chains that increase impact trigger automatic severity upgrades.
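The BFS over the knowledge graph can be sketched as a path search over an adjacency dict. This is an illustrative simplification — the real tool also matches the 7 predefined chain patterns and scores impact:

```python
from collections import deque

def find_chains(graph: dict, start: str, goal: str, max_hops: int = 4):
    """BFS for all acyclic paths from start to goal, up to max_hops edges.
    graph maps an entity ID to the entity IDs it relates to."""
    queue = deque([[start]])
    chains = []
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            chains.append(path)
            continue
        if len(path) > max_hops:
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # avoid cycles
                queue.append(path + [nxt])
    return chains
```

For example, an `xss_finding → missing_csp → session_hijack` path corresponds to the "XSS + missing CSP" pattern mentioned above and would trigger a severity upgrade.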
---
## Disclaimer
**This tool is intended for authorized security testing only.** Only use AutoPentest against applications you have explicit permission to test. Unauthorized access to computer systems is illegal. The authors are not responsible for any misuse of this tool.
Always ensure you have:
- Written authorization from the application owner
- A clearly defined scope of what can and cannot be tested
- An understanding of the testing environment (production vs staging)
- Appropriate avoid rules configured for destructive or sensitive endpoints
---
<p align="center">
Built with <a href="https://modelcontextprotocol.io">Model Context Protocol</a>
</p>