# ZeroToken

[![CI](https://github.com/AMOS144/zerotoken/actions/workflows/ci.yml/badge.svg)](https://github.com/AMOS144/zerotoken/actions/workflows/ci.yml)
[![PyPI version](https://img.shields.io/pypi/v/zerotoken.svg)](https://pypi.org/project/zerotoken/)
[![Python versions](https://img.shields.io/pypi/pyversions/zerotoken.svg)](https://pypi.org/project/zerotoken/)
[![Downloads](https://static.pepy.tech/badge/zerotoken/month)](https://pepy.tech/project/zerotoken)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**ZeroToken - Record once, automate forever.**

> Lightweight MCP for AI agent browser automation. Record once, replay forever: cut token cost and speed up repetitive tasks.

## Tool List

## Using ZeroToken with OpenClaw

ZeroToken is the browser execution layer for OpenClaw. It suits **record-once, replay-forever** automation tasks (e.g., daily logins, scheduled crawling). The following describes the complete integration process.

### Why is HTTP mode necessary?

When OpenClaw calls MCP through MCPorter in stdio (command) mode, **a new process is created for each tool call**, so the browser instance is destroyed and its state is lost. The **Streamable HTTP mode** must therefore be used: ZeroToken runs as a resident HTTP service and OpenClaw connects to it by URL, which keeps the browser state within the same session.

### Integration Steps (Complete Process)

#### 1. Install ZeroToken

```bash
# via pip or uv
pip install zerotoken
# or
uv add zerotoken

# Install Playwright browser (required)
playwright install chromium
```

If installing into OpenClaw via MCPorter:

```bash
npm install -g mcporter
mcporter install zerotoken --target openclaw --configure
```

Run `playwright install chromium` after this installation as well.

#### 2. Start the HTTP service in the background and keep it resident

Run in a terminal (do not close it):

```bash
zerotoken-mcp-http
```

The service listens on `http://0.0.0.0:8000/mcp` by default. To specify the port:

```bash
zerotoken-mcp-http --port 8001
# or
ZEROTOKEN_HTTP_PORT=8001 zerotoken-mcp-http
```

#### 3. Configure openclaw.json

In `~/.openclaw/openclaw.json` (or `openclaw.json` in the project), configure ZeroToken under `mcpServers` as a **URL**, not as a command:

```json
{
  "mcpServers": {
    "zerotoken": {
      "url": "http://localhost:8000/mcp"
    }
  }
}
```

If you use a non-default port, change the port number in the URL.

#### 4. Install the zerotoken-openclaw Skill

Place the Skill in one of OpenClaw's skills directories, install it via ClawHub (`clawhub install zerotoken-openclaw`), or copy it from this repository:

```bash
cp -r skills/zerotoken-openclaw ~/.openclaw/skills/
```

#### 5. Enable in OpenClaw

Enable the MCP server named `zerotoken` in OpenClaw and make sure the `zerotoken-openclaw` Skill is loaded. The Agent can then call browser tools through MCP.

### Typical Workflow

1. **Record trajectory**: The user describes a task (e.g., "log in to a site daily and pull a report"), and the Agent calls `browser_init` → `browser_open` / `browser_click` / `browser_input`, etc. → `trajectory_complete` to complete a recording.
2. **Generate script**: For repetitive or scheduled tasks, the Agent calls `trajectory_to_script(task_id)` to convert the trajectory into a replayable script.
3. **Bind scheduled task**: The Agent calls `script_binding_set(binding_key=job_id, script_task_id=task_id)` to bind OpenClaw's job_id to the script.
4. **Execute on schedule**: When OpenClaw triggers the scheduled task, the Agent calls `run_script_by_job_id(binding_key=job_id)` to run the script in one step, with no step-by-step LLM reasoning and low token consumption.

See `skills/zerotoken-openclaw/SKILL.md` and `docs/skills.md` for more details.
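
The `mcpServers` entry from integration step 3 can also be written by a small script rather than by hand. The following is a minimal sketch, not part of ZeroToken: the helper name `add_zerotoken_server` is hypothetical, and it assumes `openclaw.json` is plain JSON. It merges a URL-based `zerotoken` entry into the file and leaves other servers untouched.

```python
import json
import tempfile
from pathlib import Path


def add_zerotoken_server(config_path: Path, url: str = "http://localhost:8000/mcp") -> dict:
    """Merge a URL-based `zerotoken` entry into an openclaw.json file.

    Hypothetical helper for illustration; assumes the file is plain JSON.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Use "url", not "command": command (stdio) mode loses browser state between calls.
    servers["zerotoken"] = {"url": url}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    # Demonstrate against a temporary file instead of the real ~/.openclaw/openclaw.json.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "openclaw.json"
        print(json.dumps(add_zerotoken_server(path), indent=2))
```

To apply it to a real installation, pass `Path.home() / ".openclaw" / "openclaw.json"`, and pass a different `url` if the service runs on a non-default port.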

### Frequently Asked Questions

- **Browser state is lost and every operation behaves like the first one**: This indicates that command mode is still in use. Make sure that (1) `zerotoken-mcp-http` is running in the background, and (2) `zerotoken` is configured as a `url` rather than a `command` in `openclaw.json`.
- **Connection failed / MCP unavailable**: Confirm that `zerotoken-mcp-http` is running, that the port is correct (default 8000), and that the URL matches the configuration.

## Core Philosophy

### Problem

When AI Agents control a browser directly to execute repetitive tasks, they spend a large number of tokens on reasoning every time, which is costly and slow.

### ZeroToken Solution

1. **Operation execution**: The AI reasons step by step in ReAct mode and calls atomic MCP capabilities to complete browser operations.
2. **Trajectory recording**: The system records the complete operation trajectory (including page state, screenshots, execution results, and fuzzy point markers).
3. **AI prompt export**: Trajectories can be exported in an AI-friendly format that includes descriptions of the fuzzy points requiring judgment, for further analysis by Skills or other modules.

## Core Features

- **Layered architecture** - Five layers (Transport → Handler → Service → Domain → Repository/Infrastructure) with strongly typed Pydantic v2 models.
- **Complete trajectory recording** - Each operation records its step, page state, screenshot, and a structured OperationRecord.
- **Multi-tab and iframe** - The built-in BrowserContextManager supports multi-tab switching, iframe entry/exit, and independent fingerprint and state.
- **Rich browser atomic operations** - 26 browser_* tools: navigation, click/input/hover/keyboard/scroll/drag, file upload/download, JS evaluation, screenshot, etc.
- **Script Engine** - Nested step trees plus VarsEnvironment support if/loop/assign flow control, variable passing, secure expressions via a whitelisted AST, and loop protection.
- **Step-as-Unit error handling** - When any step fails or needs judgment, execution pauses, and the AI chooses retry/skip/patch/abort through a resolution.
- **Recording exploration mode** - `trajectory_explore_start/stop` isolates the AI's trial-and-error paths so they do not pollute trajectories.
- **Token optimization** - Intelligent DOM pruning, screenshot compression/cropping/quality reduction, and page state summarization reduce token consumption at multiple levels.
- **SQLite storage** - scripts/trajectories/sessions/fingerprints/bindings/runtime are split into repositories by responsibility, with versioned migrations.
- **Script lifecycle management** - script_deprecate/restore/health automatically track consecutive failures and calculate success rates.
- **Task binding** - script_bind maps an external job_id to a script, which simplifies scheduling from OpenClaw and other schedulers.
- **Stability enhancement** - SmartSelector (multiple alternatives + unstable-pattern filtering), SmartWait (cascading wait conditions), ErrorRecovery (exponential backoff + selector variation + iframe search).
- **Adaptive element positioning** - Saves an element fingerprint on the first hit (auto_save); when a selector fails after a page redesign, relocates the element by similarity (adaptive), with no code changes.
- **Anti-crawling/cloud shield response** - `browser_init(stealth=true)` enables stealth startup and fingerprint disguise, reducing the probability of being identified as an automated browser.
- **MCP protocol** - Dual transport (stdio + Streamable HTTP) with modular registration under handlers/.

## Stability Enhancement

### Unstable Factor Analysis

```
Selector failure (60%)       Dynamic IDs, class name changes, DOM structure changes
Timing issues (25%)          Element not loaded, network requests, animation not finished
Environment changes (10%)    Viewport changes, user state, cookie influence
Other factors (5%)           Pop-up interference, resource loading failure
```

### Solution

**1. SmartSelector - Intelligent selector generation**

- Automatically generates multiple alternative selectors
- Priority: data-testid > id > aria > CSS > XPath
- Detects and filters unstable class names (e.g., `el-*`, `ant-*`, `Mui-*`)

**2. SmartWait - Intelligent waiting strategy**

- Multiple waiting conditions: selector, visible, networkidle, text, function
- Cascading waiting support
- Page stability detection

**3. ErrorRecovery - Error recovery mechanism**

- Automatically detects error types
- Selector variation attempts
- Exponential backoff retries
- Iframe element search

## System Architecture

A five-layer architecture in which dependencies point **strictly downward**: Handler → Service → Domain / Repository / Infrastructure, never the reverse.

```
┌─────────────────────────────────────────────────────────────────┐
│ Transport   MCP stdio / Streamable HTTP                         │
├─────────────────────────────────────────────────────────────────┤
│ Handler     handlers/{browser,trajectory,script}_handlers       │
│             (tool registration, param validation, scheduling)  │
├─────────────────────────────────────────────────────────────────┤
│ Service     BrowserService  TrajectoryService  ScriptService    │
│             (business orchestration, no framework dependency)  │
├─────────────────────────────────────────────────────────────────┤
│ Domain      Pydantic models (OperationRecord, Trajectory,      │
│             Script, Session, Resolution, ...)                   │
├──────────────────────────────────┬──────────────────────────────┤
│ Repository                       │ Infrastructure               │
│ Protocol + SQLite implementation │ browser/    ActionPipeline   │
│ ScriptRepo / TrajectoryRepo /    │             + actions/       │
│ SessionRepo / RuntimeRepo /      │             + stability/     │
│ FingerprintRepo / BindingRepo    │ engine/     ScriptEngine     │
│                                  │             + flow_control   │
│                                  │             + data_flow      │
│                                  │ optimizers/ DOM/Screenshot/  │
│                                  │             StateSummary     │
└──────────────────────────────────┴──────────────────────────────┘
```

### Mermaid Architecture Diagram

```mermaid
graph TB
    subgraph "Client"
        A[AI Agent / OpenClaw / Cursor]
    end

    subgraph "Transport"
        T1[MCP stdio]
        T2[Streamable HTTP]
    end

    subgraph "Handler"
        H1[browser_handlers]
        H2[trajectory_handlers]
        H3[script_handlers]
    end

    subgraph "Service"
        S1[BrowserService]
        S2[TrajectoryService]
        S3[ScriptService]
    end

    subgraph "Infrastructure"
        I1[ActionPipeline + actions/]
        I2[Stability: Selector/Wait/Recovery/Adaptive]
        I3[ScriptEngine: flow_control + data_flow]
        I4[Optimizers: DOM / Screenshot / StateSummary]
    end

    subgraph "Repository"
        R1[ScriptRepo]
        R2[TrajectoryRepo]
        R3[SessionRepo]
        R4[RuntimeRepo]
        R5[FingerprintRepo]
        R6[BindingRepo]
    end

    subgraph "Storage / Browser"
        DB[(SQLite zerotoken.db)]
        PW[Playwright / Chromium]
    end

    A --> T1
    A --> T2
    T1 --> H1 & H2 & H3
    T2 --> H1 & H2 & H3
    H1 --> S1
    H2 --> S2
    H3 --> S3
    S1 --> I1 --> I2
    S2 --> R2
    S3 --> I3 --> I4
    S3 --> R1 & R3 & R4 & R6
    S1 --> R5
    I1 --> PW
    R1 & R2 & R3 & R4 & R5 & R6 --> DB
```

## Installation

**OpenClaw users**: For the complete steps, see "[Using ZeroToken with OpenClaw](#using-zerotoken-with-openclaw)" above.

**Cursor and other IDEs**: Install ZeroToken and use stdio mode; configure `command: "zerotoken-mcp"` or `command: "uv", args: ["run", "zerotoken-mcp"]` on the client.

### Local development / pip installation

```bash
# Clone the project
git clone https://github.com/AMOS144/zerotoken.git
cd zerotoken

# Install dependencies
uv sync
# or pip install
pip install zerotoken

# Install Playwright browser
playwright install chromium
```

## Quick Start

### 1. Start MCP Server

| Scenario | Command | Description |
|------|------|------|
| **OpenClaw** | `zerotoken-mcp-http` | Runs resident in the background; configure `url: "http://localhost:8000/mcp"` in `openclaw.json`. See "[Using ZeroToken with OpenClaw](#using-zerotoken-with-openclaw)". |
| **Cursor and other IDEs** | `zerotoken-mcp` or launched by the client | stdio mode; configure `command: "zerotoken-mcp"`. |

```bash
# OpenClaw: HTTP mode (background resident)
zerotoken-mcp-http

# Cursor: stdio mode
zerotoken-mcp
```

### 2. AI Agent calls browser tools through MCP

Example process:

```
# Initialize browser (if anti-crawling/cloud shield, pass stealth=true)
→ browser_init(headless=true)
← {"success": true, "config": {...}}

# Start trajectory recording
→ trajectory_start(task_id="login_task", goal="Login System")
← {"success": true, "task_id": "login_task"}

# Execute browser actions (automatically recorded to trajectory)
→ browser_open(url="https://example.com/login")
← {
    "step": 1,
    "action": "open",
    "params": {"url": "https://example.com/login"},
    "result": {"success": true, "title": "Login"},
    "page_state": {"url": "...", "title": "..."},
    "screenshot": "base64..."
  }

→ browser_input(selector="#username", text="testuser")
→ browser_input(selector="#password", text="secret123")
→ browser_click(selector="#submit-btn")

# Complete trajectory and get AI prompt (with fuzzy point marking)
→ trajectory_complete(export_for_ai=true)
← {
    "success": true,
    "ai_prompt": "Task Goal: Login System\n\nOperation History:\n[Step 1] open(...)\n[Step 2] click(...) [Requires Judgment: CAPTCHA Needs Recognition]"
  }
```

After receiving `ai_prompt`, the AI can handle the steps marked "Requires Judgment" using Skills or custom logic.

Use `trajectory_list` to view saved trajectories and `trajectory_delete(task_id)` to remove ones you no longer need, so recordings do not pile up. Browser tools accept `include_screenshot: false` to reduce response size. On failure, a structured error is returned with `code` and `retryable` so the model can decide whether to retry. For key elements, pass `auto_save: true` to save fingerprints; after a page refactor, pass `adaptive: true` for automatic repositioning.

### 3. More Browser Actions

**Multi-Tabs and Iframes**:

```
→ browser_new_tab(url="https://second.example.com")   # Returns tab_id
→ browser_list_tabs()                                 # List all tabs
→ browser_switch_tab(tab_id=1)
→ browser_enter_iframe(selector="iframe#payment")
→ browser_click(selector="#confirm")                  # Within iframe
→ browser_exit_iframe()
→ browser_close_tab(tab_id=1)
```

**Files / Keyboard / Advanced Interactions**:

```
→ browser_upload(selector="input[type=file]", file_path="/tmp/a.pdf")
→ browser_download(selector="a.export")               # Returns save path
→ browser_keyboard(key="Control+S")
→ browser_hover(selector=".menu")
→ browser_drag_drop(source="#item", target="#bin")
→ browser_scroll(direction="down", amount=800)
→ browser_evaluate(expression="document.title")
```

**Exploration Mode (Trial and Error without Polluting Trajectory)**:

```
→ trajectory_explore_start(reason="Try Different Entrances")
→ browser_click(...)                                  # These steps won't enter the formal trajectory
→ browser_click(...)
→ trajectory_explore_stop(keep="none")                # none / last / all
→ browser_click(selector="#correct-entry")            # Back to formal recording
```

**Script Generation and Playback**:

```
→ script_generate(task_id="login_demo")               # Generate script from trajectory
→ script_run(task_id="login_demo", vars={"user": "..."})   # Deterministic playback
# A failure or a step requiring judgment pauses the run → session
→ script_resume(session_id="...", resolution={"type": "retry"})   # or skip / patch / abort
→ script_health(task_id="login_demo")                 # Consecutive failures / success rate
→ script_deprecate(task_id="login_demo", reason="Refactored and Invalid")
→ script_restore(task_id="login_demo")
```

**Binding External job_id (OpenClaw Scheduled Tasks)**:

```
→ script_bind(binding_key="daily-report", script_task_id="report_v3", default_vars={"date": "today"})
→ script_run(... )                                    # Indirectly trigger via binding_key
```

## Python API

The primary usage is through MCP tool calls. You can also use the underlying services directly in Python:

```python
import asyncio

from zerotoken import BrowserService, TrajectoryService, ScriptService
from zerotoken.repository.sqlite import open_sqlite_repos


async def main() -> None:
    # Open repositories split by responsibility (sharing a SQLite file)
    repos = open_sqlite_repos("zerotoken.db")

    browser = BrowserService()
    await browser.init(headless=True)

    trajectory = TrajectoryService(trajectory_repo=repos.trajectory)
    trajectory.start("task_001", goal="Login System")
    trajectory.bind(browser)

    await browser.open("https://example.com/login")
    await browser.click("#submit")

    result = trajectory.complete(export_for_ai=True)
    print(result["ai_prompt"])

    await browser.close()


# The browser calls are coroutines, so they must run inside an event loop
asyncio.run(main())
```

> Public exports: `BrowserService` / `TrajectoryService` / `ScriptService`, and all Pydantic models (`OperationRecord` / `Trajectory` / `Script` / `Resolution` etc.). For specific function signatures, see `zerotoken/services/`.
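
The `ai_prompt` text shown in the Quick Start can be approximated in plain Python to make its layout concrete. This is an illustrative sketch only, not ZeroToken's exporter: the function name `render_ai_prompt` is hypothetical, records are plain dicts rather than the library's Pydantic models, and the parameter formatting is an assumption.

```python
from typing import Any


def render_ai_prompt(goal: str, records: list[dict[str, Any]]) -> str:
    """Render step records as text resembling the ai_prompt example.

    Hypothetical helper; the real export format may differ in detail.
    """
    lines = [f"Task Goal: {goal}", "", "Operation History:"]
    for record in records:
        # Format params as key=value pairs, e.g. selector='#submit-btn'.
        params = ", ".join(f"{k}={v!r}" for k, v in record.get("params", {}).items())
        line = f"[Step {record['step']}] {record['action']}({params})"
        fuzzy = record.get("fuzzy_point")
        # Only steps flagged as needing AI/human judgment get a marker.
        if fuzzy and fuzzy.get("requires_judgment"):
            line += f" [Requires Judgment: {fuzzy['reason']}]"
        lines.append(line)
    return "\n".join(lines)


example_records = [
    {"step": 1, "action": "open", "params": {"url": "https://example.com/login"}},
    {
        "step": 2,
        "action": "click",
        "params": {"selector": "#submit-btn"},
        "fuzzy_point": {"requires_judgment": True, "reason": "CAPTCHA Needs Recognition"},
    },
]

if __name__ == "__main__":
    print(render_ai_prompt("Login System", example_records))
```

Steps without a `fuzzy_point` are rendered without a marker, so a consumer can find the steps that need attention by searching the text for `[Requires Judgment:`.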

## OperationRecord Structure

Each browser action returns a detailed OperationRecord:

```json
{
  "step": 1,
  "action": "click",
  "params": {
    "selector": "#submit-btn",
    "timeout": 30000
  },
  "result": {
    "success": true,
    "navigated": true,
    "new_url": "https://example.com/dashboard"
  },
  "page_state": {
    "url": "https://example.com/dashboard",
    "title": "Dashboard",
    "timestamp": "2024-01-01T12:00:00"
  },
  "screenshot": "base64_encoded_image_data",
  "fuzzy_point": {
    "requires_judgment": true,
    "reason": "CAPTCHA Needs Recognition",
    "hint": "AI Vision"
  },
  "timestamp": "2024-01-01T12:00:00"
}
```

`fuzzy_point` is an optional field that exists only in steps requiring AI or human judgment. When AI prompts are exported, markers such as `[Requires Judgment: {reason}]` are appended to those steps.

## Project Structure

```
zerotoken/
├── zerotoken/                      # Library main body
│   ├── __init__.py                 # Public exports of services and Pydantic models
│   ├── models/                     # Domain layer - Pydantic v2 models
│   │   ├── operation.py            # OperationRecord, PageState, ActionType, ...
│   │   ├── trajectory.py           # Trajectory, TrajectoryMetadata
│   │   ├── script.py               # Script, ScriptStep, StepHint
│   │   └── session.py              # PauseEvent, Resolution, RuntimeState
│   ├── services/                   # Service layer - business orchestration
│   │   ├── browser_service.py
│   │   ├── trajectory_service.py   # Including RecordingMode (exploration mode)
│   │   └── script_service.py       # Including deprecate/restore/health
│   ├── repository/                 # Repository layer - split by responsibility
│   │   ├── protocols.py            # Protocol abstractions
│   │   ├── sqlite.py               # SQLite implementation
│   │   └── migrations.py           # Versioned migrations
│   ├── browser/                    # Infrastructure - browser
│   │   ├── context.py              # BrowserContextManager (multi-tabs)
│   │   ├── pipeline.py             # ActionPipeline
│   │   ├── actions/                # navigate/interact/extract/page_mgmt/iframe/file_ops
│   │   ├── stability/              # middleware/selector/wait/recovery/adaptive
│   │   └── stealth.py
│   ├── engine/                     # Infrastructure - script engine
│   │   ├── script_engine_v2.py     # Executor
│   │   ├── flow_control.py         # if/loop/assign
│   │   ├── data_flow.py            # VarsEnvironment + secure expressions
│   │   └── script_generator.py     # Trajectory to script
│   ├── optimizers/                 # Infrastructure - token optimization
│   │   ├── dom_pruner.py
│   │   ├── screenshot_opt.py
│   │   └── state_summary.py
│   └── benchmark/                  # Performance benchmark suite
├── handlers/                       # Handler layer - MCP tool registration
│   ├── browser_handlers.py         # 26 browser_* tools
│   ├── trajectory_handlers.py      # trajectory_* tools
│   └── script_handlers.py          # script_* / run / resume / session / binding
├── mcp_server.py                   # Entry: MCP stdio
├── mcp_server_http.py              # Entry: Streamable HTTP
├── benchmark_cli.py                # Benchmark CLI
├── tests/                          # unit + integration
└── zerotoken.db                    # SQLite database (generated at runtime)
```

## Usage Scenarios

1. **AI Agent Browser Automation** - OpenClaw, LLM Agent, etc.
2. **RPA Process Automation** - Recording and playback of repetitive web operations
3. **Data Collection** - Scheduled web data scraping
4. **Automated Testing** - Record test steps and play them back

**OpenClaw Matching Skill**: See [docs/skills.md](docs/skills.md) for how to use trajectories for scheduled or repeated tasks and reduce token consumption.

## Community

Join the **ZT Agent Club** QQ group to discuss ZeroToken and AI Agent automation:

![ZT Agent Club QQ Group QR Code](assets/qq-group-qr.png)

- Group ID: 942359087
- Scan the QR code to join the group chat

## Contributing

Issues and PRs are welcome; see [CONTRIBUTING.md](CONTRIBUTING.md).

## License

MIT License, see [LICENSE](LICENSE).

---

**ZeroToken** - Record once, automate forever.
