# ZeroToken
<!-- mcp-name: io.github.AMOS144/zerotoken -->
**ZeroToken - Record once, automate forever.**
> Lightweight MCP for AI agent browser automation. Record once, replay forever — cut token cost and speed up repetitive tasks.
## Tool List
## Using ZeroToken with OpenClaw
ZeroToken serves as the browser execution layer for OpenClaw and suits **record-once-and-replay-forever** automation tasks (e.g., daily logins, scheduled crawling). The complete integration process is described below.
### Why is HTTP mode necessary?
When OpenClaw calls an MCP server through MCPorter in stdio (command) mode, **a new process is created for each tool call**, so the browser instance is destroyed and its state is lost between calls. **Streamable HTTP mode** is therefore required: ZeroToken runs as a resident HTTP service, and OpenClaw connects to it by URL, which keeps the browser state alive within the same session.
### Integration Steps (Complete Process)
#### 1. Install ZeroToken
```bash
# via pip or uv
pip install zerotoken
# or
uv add zerotoken
# Install Playwright browser (required)
playwright install chromium
```
To install into OpenClaw via MCPorter instead:
```bash
npm install -g mcporter
mcporter install zerotoken --target openclaw --configure
```
Run `playwright install chromium` after this installation as well.
#### 2. Start the HTTP service and keep it running
Run the following in a terminal and leave it open:
```bash
zerotoken-mcp-http
```
By default the service listens on `http://0.0.0.0:8000/mcp`. To use a different port:
```bash
zerotoken-mcp-http --port 8001
# or
ZEROTOKEN_HTTP_PORT=8001 zerotoken-mcp-http
```
#### 3. Configure openclaw.json
In `~/.openclaw/openclaw.json` (or `openclaw.json` in the project), configure ZeroToken as a **URL** in `mcpServers`, not as a command:
```json
{
  "mcpServers": {
    "zerotoken": {
      "url": "http://localhost:8000/mcp"
    }
  }
}
```
If a non-default port is used, modify the port number in the URL.
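For setups that manage `openclaw.json` from a script, the entry above could be merged in programmatically. This is a minimal sketch that assumes the file layout shown above; the helper name `add_zerotoken_server` is hypothetical and is not part of ZeroToken or OpenClaw.

```python
import json
from pathlib import Path


def add_zerotoken_server(config_path: Path, port: int = 8000) -> dict:
    """Merge a URL-based `zerotoken` entry into an openclaw.json-style file."""
    # Start from the existing config if present, otherwise from an empty one.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Use `url`, not `command`, so the browser state survives between tool calls.
    servers["zerotoken"] = {"url": f"http://localhost:{port}/mcp"}
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Existing entries under `mcpServers` are left untouched; only the `zerotoken` key is added or overwritten.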
#### 4. Install zerotoken-openclaw Skill
Place the Skill in one of OpenClaw's skills directories, either via ClawHub (`clawhub install zerotoken-openclaw`) or by copying it from this repository:
```bash
cp -r skills/zerotoken-openclaw ~/.openclaw/skills/
```
#### 5. Enable in OpenClaw
Enable the MCP server named `zerotoken` in OpenClaw and ensure the `zerotoken-openclaw` Skill is loaded. The Agent can then call browser tools through MCP.
### Typical Workflow
1. **Record a trajectory**: The user describes a task (e.g., "log in to a site daily and pull a report"), and the Agent calls `browser_init` → `browser_open` / `browser_click` / `browser_input`, etc. → `trajectory_complete` to finish one recording.
2. **Generate a script**: For repetitive or scheduled tasks, the Agent calls `trajectory_to_script(task_id)` to convert the trajectory into a replayable script.
3. **Bind a scheduled task**: The Agent calls `script_binding_set(binding_key=job_id, script_task_id=task_id)` to bind OpenClaw's job_id to the script.
4. **Run on schedule**: When OpenClaw triggers the scheduled task, the Agent calls `run_script_by_job_id(binding_key=job_id)` to execute the whole script in one call, with no step-by-step LLM reasoning and low token consumption.
See `skills/zerotoken-openclaw/SKILL.md` and `docs/skills.md` for more details.
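Steps 3 and 4 amount to a lookup table from a scheduler's job ID to a recorded script. The sketch below is a toy in-memory model of that idea, not ZeroToken's implementation; `BindingRegistry` and the `runner` callback are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict, Optional


class BindingRegistry:
    """Toy model: map an external job_id (binding_key) to a script task_id."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Dict[str, Any]] = {}

    def bind(self, binding_key: str, script_task_id: str,
             default_vars: Optional[Dict[str, Any]] = None) -> None:
        self._bindings[binding_key] = {
            "script_task_id": script_task_id,
            "default_vars": dict(default_vars or {}),
        }

    def run(self, binding_key: str,
            runner: Callable[[str, Dict[str, Any]], Any],
            **overrides: Any) -> Any:
        # Resolve the binding once, then hand off to a deterministic runner;
        # no step-by-step LLM reasoning is involved at this point.
        binding = self._bindings[binding_key]
        variables = {**binding["default_vars"], **overrides}
        return runner(binding["script_task_id"], variables)


registry = BindingRegistry()
registry.bind("daily-report", "report_v3", {"date": "today"})
result = registry.run("daily-report", lambda task_id, variables: (task_id, variables))
```

Per-run keyword arguments override the defaults stored with the binding, which is one plausible way to let the scheduler pass a specific date or account.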
### Frequently Asked Questions
- **Browser state is lost and every operation behaves like the first one**: This indicates that command mode is still in use. Make sure that (1) `zerotoken-mcp-http` is running in the background and (2) `zerotoken` is configured as a `url` rather than a `command` in `openclaw.json`.
- **Connection failed / MCP unavailable**: Confirm that `zerotoken-mcp-http` has been started, that the port is correct (default 8000), and that the URL in the configuration matches it.
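A quick way to rule out the "service not started" case is to check whether anything accepts TCP connections on the configured port. The sketch below uses only the Python standard library; `is_port_open` is a hypothetical helper, and it checks TCP reachability only, not whether the `/mcp` endpoint itself responds correctly.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("zerotoken-mcp-http reachable:", is_port_open())
```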
## Core Philosophy
### Problem
When AI Agents control a browser directly to run repetitive tasks, they spend a large number of tokens on reasoning every time, which is both costly and slow.
### ZeroToken Solution
1. **Operation execution**: The AI reasons step by step in ReAct mode and calls atomic MCP capabilities to perform browser operations.
2. **Trajectory recording**: The system records the complete operation trajectory (including page state, screenshots, execution results, and fuzzy point markers).
3. **AI prompt export**: Trajectories can be exported in an AI-friendly format that includes descriptions of the fuzzy points requiring judgment, for further analysis by Skills or other modules.
## Core Features
- **Layered architecture** - Five layers (Transport → Handler → Service → Domain → Repository/Infrastructure) with strongly typed Pydantic v2 models.
- **Complete trajectory recording** - Each operation records the step, page state, screenshot, and a structured OperationRecord.
- **Multi-tab and iframe** - The built-in BrowserContextManager supports multi-tab switching, iframe entry/exit, and independent fingerprint and state.
- **Rich browser atomic operations** - 26 browser_* tools: navigation, click/input/hover/keyboard/scroll/drag, file upload/download, JS evaluation, screenshot, etc.
- **Script Engine** - Nested step trees plus VarsEnvironment support if/loop/assign flow control, variable passing, whitelisted-AST secure expressions, and loop protection.
- **Step-as-Unit error handling** - Any step that fails or needs judgment pauses execution; the AI then decides to retry/skip/patch/abort through a resolution.
- **Recording exploration mode** - `trajectory_explore_start/stop` isolates the AI's trial-and-error paths so they do not pollute the trajectory.
- **Token optimization** - Intelligent DOM pruning, screenshot compression/cropping/quality reduction, and page state summarization reduce token consumption at multiple levels.
- **SQLite storage** - scripts/trajectories/sessions/fingerprints/bindings/runtime are split into repositories by responsibility, with versioned migrations.
- **Script lifecycle management** - script_deprecate/restore/health automatically track consecutive failures and calculate success rates.
- **Task binding** - script_bind maps an external job_id to a script, so OpenClaw and other schedulers can trigger it.
- **Stability enhancement** - SmartSelector (multiple alternatives + unstable pattern filtering), SmartWait (cascading wait conditions), ErrorRecovery (exponential backoff + selector variation + iframe search).
- **Adaptive element positioning** - Saves an element fingerprint on the first hit (auto_save) and, when the selector fails after a page redesign, relocates the element by similarity (adaptive) with no code changes.
- **Anti-crawling/cloud shield response** - `browser_init(stealth=true)` enables stealth startup and fingerprint disguise, reducing the probability of being identified as an automated browser.
- **MCP protocol** - Dual transport (stdio + Streamable HTTP) with modular registration under handlers/.
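The "whitelisted-AST secure expressions" idea from the Script Engine item can be illustrated with Python's standard `ast` module: parse the expression, then evaluate only node types that are explicitly allowed. This is a generic sketch of the technique, not ZeroToken's actual evaluator; `safe_eval` and the set of permitted operators are assumptions.

```python
import ast
import operator

# Only these operators may appear in an expression; everything else is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_CMP_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne,
            ast.Lt: operator.lt, ast.Gt: operator.gt}


def safe_eval(expression: str, variables: dict):
    """Evaluate a small arithmetic/comparison expression over `variables`."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str, bool)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]  # KeyError for unknown names
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _CMP_OPS):
            return _CMP_OPS[type(node.ops[0])](visit(node.left), visit(node.comparators[0]))
        # Calls, attribute access, subscripts, lambdas, etc. all end up here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))
```

Because function calls and attribute access are never on the whitelist, an input such as `__import__('os')` is rejected before anything runs, while `count + 1 < limit` evaluates normally against the supplied variables.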
## Stability Enhancement
### Unstable Factor Analysis
```
Selector failure    (60%)  Dynamic IDs, class name changes, DOM structure changes
Timing issues       (25%)  Elements not loaded, pending network requests, unfinished animations
Environment changes (10%)  Viewport changes, user state, cookie effects
Other factors       (5%)   Pop-up interference, resource loading failures
```
### Solution
**1. SmartSelector - Intelligent selector generation**
- Automatically generate multiple alternative selectors
- Priority: data-testid > id > aria > CSS > XPath
- Detect and filter unstable class names (e.g., `el-*`, `ant-*`, `Mui-*`)
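The priority order and class-name filter above can be sketched in a few lines. This is an illustration under assumptions, not SmartSelector's code: `candidate_selectors` is a hypothetical function, it takes an element's attributes as a plain dict, and it omits the XPath fallback.

```python
import re
from typing import Dict, List

# Class prefixes the list above names as unstable (generated by UI frameworks).
UNSTABLE_CLASS = re.compile(r"^(el-|ant-|Mui-)")


def candidate_selectors(attrs: Dict[str, str], tag: str = "button") -> List[str]:
    """Return selectors ordered data-testid > id > aria > CSS class."""
    selectors: List[str] = []
    if "data-testid" in attrs:
        selectors.append(f'[data-testid="{attrs["data-testid"]}"]')
    if "id" in attrs:
        selectors.append(f'#{attrs["id"]}')
    if "aria-label" in attrs:
        selectors.append(f'{tag}[aria-label="{attrs["aria-label"]}"]')
    # Drop framework-generated classes before building a class selector.
    stable = [c for c in attrs.get("class", "").split() if not UNSTABLE_CLASS.match(c)]
    if stable:
        selectors.append(tag + "".join(f".{c}" for c in stable))
    return selectors


print(candidate_selectors(
    {"id": "submit-btn", "aria-label": "Submit", "class": "el-button primary"}
))
```

A replay engine would try the returned selectors in order and fall back to the next one when the current one no longer matches.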
**2. SmartWait - Intelligent waiting strategy**
- Multiple waiting conditions: selector, visible, networkidle, text, function
- Cascading waiting support
- Page stability detection
**3. ErrorRecovery - Error recovery mechanism**
- Automatically detect error types
- Selector variation attempts
- Exponential backoff retries
- Iframe element search
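Selector variation combined with exponential backoff could look roughly like the following. It is a generic retry sketch, not the ErrorRecovery implementation; `retry_with_backoff`, its parameters, and the injectable `sleep` callback are assumptions made for the example.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def retry_with_backoff(action: Callable[[str], T], selectors: Sequence[str],
                       max_attempts: int = 4, base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Try `action` with rotating selectors, doubling the delay after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        selector = selectors[attempt % len(selectors)]  # rotate through alternatives
        try:
            return action(selector)
        except Exception as exc:  # a real implementation would classify error types
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in production the default `time.sleep` applies.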
## System Architecture
The architecture has five layers, and dependencies point **strictly downward**: Handler → Service → Domain / Repository / Infrastructure, never in reverse.
```
┌──────────────────────────────────────────────────────────────────────┐
│ Transport   MCP stdio / Streamable HTTP                              │
├──────────────────────────────────────────────────────────────────────┤
│ Handler     handlers/{browser,trajectory,script}_handlers            │
│             (tool registration + parameter validation + scheduling)  │
├──────────────────────────────────────────────────────────────────────┤
│ Service     BrowserService  TrajectoryService  ScriptService         │
│             (business orchestration, no framework dependency)        │
├──────────────────────────────────────────────────────────────────────┤
│ Domain      Pydantic models (OperationRecord, Trajectory,            │
│             Script, Session, Resolution, ...)                        │
├────────────────────────────────────┬─────────────────────────────────┤
│ Repository                         │ Infrastructure                  │
│ Protocol abstraction +             │ browser/    ActionPipeline      │
│ SQLite implementation              │             + actions/          │
│ ScriptRepo / TrajectoryRepo /      │             + stability/        │
│ SessionRepo / RuntimeRepo /        │ engine/     ScriptEngine        │
│ FingerprintRepo / BindingRepo      │             + flow_control      │
│                                    │             + data_flow         │
│                                    │ optimizers/ DOM/Screenshot/     │
│                                    │             StateSummary        │
└────────────────────────────────────┴─────────────────────────────────┘
```
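The downward-only dependency rule is commonly realized by having the service depend on a repository protocol rather than on a concrete store. The sketch below shows that pattern with `typing.Protocol`; the class names echo the diagram, but the method shapes (`save`, `get`, `generate`) and the in-memory store are assumptions, not ZeroToken's actual signatures.

```python
from typing import Dict, Optional, Protocol


class ScriptRepo(Protocol):
    """Repository abstraction the service depends on (hypothetical shape)."""

    def save(self, task_id: str, script: dict) -> None: ...
    def get(self, task_id: str) -> Optional[dict]: ...


class InMemoryScriptRepo:
    """Stand-in for a SQLite implementation; satisfies ScriptRepo structurally."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def save(self, task_id: str, script: dict) -> None:
        self._rows[task_id] = script

    def get(self, task_id: str) -> Optional[dict]:
        return self._rows.get(task_id)


class ScriptService:
    """Service layer: orchestrates, and knows the repository only by its protocol."""

    def __init__(self, repo: ScriptRepo) -> None:
        self._repo = repo

    def generate(self, task_id: str, steps: list) -> dict:
        script = {"task_id": task_id, "steps": steps}
        self._repo.save(task_id, script)
        return script


service = ScriptService(InMemoryScriptRepo())
service.generate("login_demo", [{"action": "open", "url": "https://example.com/login"}])
```

Swapping the in-memory class for a database-backed one requires no change to the service, which is the practical benefit of keeping the dependency one-directional.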
### Mermaid Architecture Diagram
```mermaid
graph TB
subgraph "Client"
A[AI Agent / OpenClaw / Cursor]
end
subgraph "Transport"
T1[MCP stdio]
T2[Streamable HTTP]
end
subgraph "Handler"
H1[browser_handlers]
H2[trajectory_handlers]
H3[script_handlers]
end
subgraph "Service"
S1[BrowserService]
S2[TrajectoryService]
S3[ScriptService]
end
subgraph "Infrastructure"
I1[ActionPipeline + actions/]
I2[Stability: Selector/Wait/Recovery/Adaptive]
I3[ScriptEngine: flow_control + data_flow]
I4[Optimizers: DOM / Screenshot / StateSummary]
end
subgraph "Repository"
R1[ScriptRepo]
R2[TrajectoryRepo]
R3[SessionRepo]
R4[RuntimeRepo]
R5[FingerprintRepo]
R6[BindingRepo]
end
subgraph "Storage / Browser"
DB[(SQLite zerotoken.db)]
PW[Playwright / Chromium]
end
A --> T1
A --> T2
T1 --> H1 & H2 & H3
T2 --> H1 & H2 & H3
H1 --> S1
H2 --> S2
H3 --> S3
S1 --> I1 --> I2
S2 --> R2
S3 --> I3 --> I4
S3 --> R1 & R3 & R4 & R6
S1 --> R5
I1 --> PW
R1 & R2 & R3 & R4 & R5 & R6 --> DB
```
## Installation
**OpenClaw users**: See the complete steps in "[Using ZeroToken with OpenClaw](#using-zerotoken-with-openclaw)" above.
**Cursor and other IDEs**: Install ZeroToken and use stdio mode; configure `command: "zerotoken-mcp"` or `command: "uv", args: ["run", "zerotoken-mcp"]` in the client.
### Local development / pip installation
```bash
# Clone the project
git clone https://github.com/AMOS144/zerotoken.git
cd zerotoken
# Install dependencies
uv sync
# or pip install
pip install zerotoken
# Install Playwright browser
playwright install chromium
```
## Quick Start
### 1. Start MCP Server
| Scenario | Command | Description |
|------|------|------|
| **OpenClaw** | `zerotoken-mcp-http` | Runs as a resident background service; set `url: "http://localhost:8000/mcp"` in `openclaw.json`. See "[Using ZeroToken with OpenClaw](#using-zerotoken-with-openclaw)". |
| **Cursor and other IDEs** | `zerotoken-mcp`, or launched by the client | stdio mode; configure `command: "zerotoken-mcp"`. |
```bash
# OpenClaw: HTTP mode (keep running in the background)
zerotoken-mcp-http
# Cursor: stdio mode
zerotoken-mcp
```
### 2. AI Agent calls browser tools through MCP
Example process:
```
# Initialize the browser (pass stealth=true for sites with anti-crawling / cloud shield protection)
→ browser_init(headless=true)
← {"success": true, "config": {...}}
# Start trajectory recording
→ trajectory_start(task_id="login_task", goal="login system")
← {"success": true, "task_id": "login_task"}
# Execute Browser Actions (Automatically Recorded to Trajectory)
→ browser_open(url="https://example.com/login")
← {
"step": 1,
"action": "open",
"params": {"url": "https://example.com/login"},
"result": {"success": true, "title": "Login"},
"page_state": {"url": "...", "title": "..."},
"screenshot": "base64..."
}
→ browser_input(selector="#username", text="testuser")
→ browser_input(selector="#password", text="secret123")
→ browser_click(selector="#submit-btn")
# Complete Trajectory and Get AI Prompt (with Fuzzy Point Marking)
→ trajectory_complete(export_for_ai=true)
← {
"success": true,
"ai_prompt": "Task Goal: Login System\n\nOperation History:\n[Step 1] open(...)\n[Step 2] click(...) [Requires Judgment: CAPTCHA Needs Recognition]"
}
```
After receiving `ai_prompt`, the AI can handle steps marked "Requires Judgment" using Skills or custom logic. Use `trajectory_list` to view saved trajectories and `trajectory_delete(task_id)` to remove ones that are no longer needed, so recordings do not pile up. Browser tools accept `include_screenshot: false` to reduce response size. On failure, a structured error with `code` and `retryable` is returned so the model can decide whether to retry. For key elements, pass `auto_save: true` to save fingerprints; after the page is refactored, pass `adaptive: true` for automatic repositioning.
### 3. More Browser Actions
**Multi-Tabs and Iframes**:
```
→ browser_new_tab(url="https://second.example.com") # Returns tab_id
→ browser_list_tabs() # List all tabs
→ browser_switch_tab(tab_id=1)
→ browser_enter_iframe(selector="iframe#payment")
→ browser_click(selector="#confirm") # Within iframe
→ browser_exit_iframe()
→ browser_close_tab(tab_id=1)
```
**Files / Keyboard / Advanced Interactions**:
```
→ browser_upload(selector="input[type=file]", file_path="/tmp/a.pdf")
→ browser_download(selector="a.export") # Returns save path
→ browser_keyboard(key="Control+S")
→ browser_hover(selector=".menu")
→ browser_drag_drop(source="#item", target="#bin")
→ browser_scroll(direction="down", amount=800)
→ browser_evaluate(expression="document.title")
```
**Exploration Mode (Trial and Error without Polluting Trajectory)**:
```
→ trajectory_explore_start(reason="Try Different Entrances")
→ browser_click(...) # These steps won't enter the formal trajectory
→ browser_click(...)
→ trajectory_explore_stop(keep="none") # none / last / all
→ browser_click("#correct-entry") # Back to formal recording
```
**Script Generation and Playback**:
```
→ script_generate(task_id="login_demo") # Generate script from trajectory
→ script_run(task_id="login_demo", vars={"user": "..."}) # Deterministic playback
# A failure or a step that needs judgment pauses the run and yields a session
→ script_resume(session_id="...", resolution={"type": "retry"})
# or skip / patch / abort
→ script_health(task_id="login_demo") # Consecutive failures / success rate
→ script_deprecate(task_id="login_demo", reason="Refactored and Invalid")
→ script_restore(task_id="login_demo")
```
**Binding External job_id (OpenClaw Scheduled Tasks)**:
```
→ script_bind(binding_key="daily-report", script_task_id="report_v3",
default_vars={"date": "today"})
→ script_run(... ) # Indirectly trigger via binding_key
```
## Python API
Primary usage is through MCP tool calls. You can also directly use the underlying service in Python:
```python
import asyncio

from zerotoken import BrowserService, TrajectoryService, ScriptService
from zerotoken.repository.sqlite import open_sqlite_repos


async def main() -> None:
    # Open repositories split by responsibility (sharing one SQLite file)
    repos = open_sqlite_repos("zerotoken.db")
    browser = BrowserService()
    await browser.init(headless=True)
    # Record every browser action into a trajectory
    trajectory = TrajectoryService(trajectory_repo=repos.trajectory)
    trajectory.start("task_001", goal="Login System")
    trajectory.bind(browser)
    await browser.open("https://example.com/login")
    await browser.click("#submit")
    # Finish recording and print the AI-friendly export
    result = trajectory.complete(export_for_ai=True)
    print(result["ai_prompt"])
    await browser.close()


# `await` is only valid inside a coroutine, so run everything through asyncio
asyncio.run(main())
```
> Public exports: `BrowserService` / `TrajectoryService` / `ScriptService`, and all Pydantic models (`OperationRecord` / `Trajectory` / `Script` / `Resolution` etc.). For specific function signatures, see `zerotoken/services/`.
## OperationRecord Structure
Each browser action returns a detailed OperationRecord:
```json
{
  "step": 1,
  "action": "click",
  "params": {
    "selector": "#submit-btn",
    "timeout": 30000
  },
  "result": {
    "success": true,
    "navigated": true,
    "new_url": "https://example.com/dashboard"
  },
  "page_state": {
    "url": "https://example.com/dashboard",
    "title": "Dashboard",
    "timestamp": "2024-01-01T12:00:00"
  },
  "screenshot": "base64_encoded_image_data",
  "fuzzy_point": {
    "requires_judgment": true,
    "reason": "CAPTCHA Needs Recognition",
    "hint": "AI Vision"
  },
  "timestamp": "2024-01-01T12:00:00"
}
```
`fuzzy_point` is an optional field, present only on steps that require AI or human judgment. When an AI prompt is exported, a marker of the form `[Requires Judgment: {reason}]` is appended to such steps.
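To make the marker rule concrete, the sketch below renders OperationRecord-like dicts into prompt lines. The line format is inferred from the `ai_prompt` sample shown in Quick Start and is an assumption; `render_step` and `render_prompt` are hypothetical helpers, not ZeroToken functions.

```python
from typing import Any, Dict, List


def render_step(record: Dict[str, Any]) -> str:
    """Format one OperationRecord-like dict as a prompt line."""
    params = ", ".join(f"{k}={v!r}" for k, v in record.get("params", {}).items())
    line = f"[Step {record['step']}] {record['action']}({params})"
    fuzzy = record.get("fuzzy_point")
    # The marker is appended only when the optional fuzzy_point asks for judgment.
    if fuzzy and fuzzy.get("requires_judgment"):
        line += f" [Requires Judgment: {fuzzy['reason']}]"
    return line


def render_prompt(goal: str, records: List[Dict[str, Any]]) -> str:
    """Join the goal and all step lines into one exportable prompt string."""
    history = "\n".join(render_step(r) for r in records)
    return f"Task Goal: {goal}\n\nOperation History:\n{history}"
```

Steps without a `fuzzy_point` render as plain history lines, so a downstream Skill can find the judgment points by searching for the marker text.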
## Project Structure
```
zerotoken/
├── zerotoken/  # Library main body
│   ├── __init__.py  # Public exports of services and Pydantic models
│   ├── models/  # Domain layer - Pydantic v2 models
│   │   ├── operation.py  # OperationRecord, PageState, ActionType, ...
│   │   ├── trajectory.py  # Trajectory, TrajectoryMetadata
│   │   ├── script.py  # Script, ScriptStep, StepHint
│   │   └── session.py  # PauseEvent, Resolution, RuntimeState
│   ├── services/  # Service layer - business orchestration
│   │   ├── browser_service.py
│   │   ├── trajectory_service.py  # Including RecordingMode (exploration mode)
│   │   └── script_service.py  # Including deprecate/restore/health
│   ├── repository/  # Repository layer - split by responsibility
│   │   ├── protocols.py  # Protocol abstractions
│   │   ├── sqlite.py  # SQLite implementation
│   │   └── migrations.py  # Versioned migrations
│   ├── browser/  # Infrastructure - browser
│   │   ├── context.py  # BrowserContextManager (multi-tabs)
│   │   ├── pipeline.py  # ActionPipeline
│   │   ├── actions/  # navigate/interact/extract/page_mgmt/iframe/file_ops
│   │   ├── stability/  # middleware/selector/wait/recovery/adaptive
│   │   └── stealth.py
│   ├── engine/  # Infrastructure - script engine
│   │   ├── script_engine_v2.py  # Executor
│   │   ├── flow_control.py  # if/loop/assign
│   │   ├── data_flow.py  # VarsEnvironment + secure expressions
│   │   └── script_generator.py  # Trajectory to script
│   ├── optimizers/  # Infrastructure - token optimization
│   │   ├── dom_pruner.py
│   │   ├── screenshot_opt.py
│   │   └── state_summary.py
│   └── benchmark/  # Performance benchmark suite
├── handlers/  # Handler layer - MCP tool registration
│   ├── browser_handlers.py  # 26 browser_* tools
│   ├── trajectory_handlers.py  # trajectory_* tools
│   └── script_handlers.py  # script_* / run / resume / session / binding
├── mcp_server.py  # Entry: MCP stdio
├── mcp_server_http.py  # Entry: Streamable HTTP
├── benchmark_cli.py  # Benchmark CLI
├── tests/  # unit + integration
└── zerotoken.db  # SQLite database (generated at runtime)
```
## Usage Scenarios
1. **AI Agent Browser Automation** - OpenClaw, LLM Agent, etc.
2. **RPA Process Automation** - Repetitive web operation recording and playback
3. **Data Collection** - Scheduled web data scraping
4. **Automated Testing** - Record test steps and playback
**Companion Skill for OpenClaw**: See [docs/skills.md](docs/skills.md) for how to use trajectories for scheduled or repeated tasks and reduce token consumption.
## Community
Join **ZT Agent Club** QQ group to discuss ZeroToken and AI Agent automation:

- Group ID: 942359087
- Scan QR code to join the group chat
## Contributing
Issues and PRs are welcome; see [CONTRIBUTING.md](CONTRIBUTING.md).
## License
MIT License, see [LICENSE](LICENSE).
---
**ZeroToken** - Record once, automate forever.