Predicate Snapshot for OpenClaw: Cutting Browser Tokens by 90%
How ML-powered DOM pruning reduces OpenClaw agent token costs from 600K to 1.3K per page observation—without losing actionable elements.
The Problem: Accessibility Trees Explode on Real Sites
OpenClaw agents use accessibility trees (A11y) to observe web pages. This works well for simple sites. But on real-world pages—especially ad-heavy sites—the token count explodes.
We ran measurements on several sites using OpenClaw's default A11y tree:
| Site | Elements | Tokens |
|---|---|---|
| slickdeals.net | 24,567 | ~598K |
| news.ycombinator.com | 681 | ~16K |
| httpbin.org/html | 34 | ~1.5K |
| example.com | 12 | ~305 |
600K tokens just to observe slickdeals.net. At GPT-4 pricing ($0.03/1K tokens), that's $18 per page view. For an agent making 10 observations per task, you're looking at $180 per task—before the agent even does anything.
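The per-page figure follows directly from the token count. A quick check of the arithmetic, using the $0.03/1K input-token rate quoted above:

```typescript
// Cost check for the slickdeals.net observation at the quoted
// $0.03 per 1K input tokens.
const tokensPerObservation = 598_000;
const pricePer1kTokens = 0.03; // USD

const costPerObservation = (tokensPerObservation / 1000) * pricePer1kTokens;
const costPerTask = 10 * costPerObservation; // 10 observations per task

console.log(costPerObservation.toFixed(2)); // "17.94" (~$18)
console.log(costPerTask.toFixed(2));        // "179.40" (~$180)
```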
Where Do All Those Elements Come From?
Of those 24,567 elements on slickdeals.net, the vast majority are:
- Ad iframes and tracking pixels — invisible but present in the DOM
- Hidden overlays and modals — cookie banners, newsletter popups, etc.
- Decorative wrappers — `<div>` and `<span>` containers with no semantic meaning
- Non-interactive text nodes — paragraphs, spans, formatting elements
- Duplicate/redundant elements — multiple references to the same UI
For a task like "find the best laptop deal," the agent only needs maybe 20-30 actionable elements: the search box, category filters, deal cards, and pagination controls.
The Solution: ML-Powered Element Ranking
We built Predicate Snapshot, an OpenClaw skill that uses ML to rank DOM elements by actionability. Instead of sending everything to the LLM, it returns only the top N most relevant elements (default: 50).
How It Works
- DOM Extraction — The skill captures the full DOM tree via Chrome DevTools Protocol
- ML Ranking — Each element gets scored on actionability, visibility, semantic importance, and position
- Smart Filtering — Top-ranked elements are selected, preserving all interactive controls
- Compact Output — Results returned in pipe-delimited format optimized for LLM consumption
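The filtering step can be sketched roughly as follows. This is a minimal illustration, not the skill's actual internals: `ScoredElement` and `selectTopN` are invented names, and the real selection logic is not published.

```typescript
// Hypothetical shape of a scored element; the skill's real internals
// are not public.
interface ScoredElement {
  id: number;
  role: string;      // "button", "textbox", "link", ...
  score: number;     // actionability score in [0, 1]
  interactive: boolean;
}

// Select the top N by score, giving interactive controls priority so
// they are never crowded out by high-scoring decorative content.
function selectTopN(elements: ScoredElement[], n = 50): ScoredElement[] {
  const byScore = (a: ScoredElement, b: ScoredElement) => b.score - a.score;
  const interactive = elements.filter(e => e.interactive).sort(byScore);
  const context = elements.filter(e => !e.interactive).sort(byScore);
  // Interactive controls fill slots first; remaining slots go to context.
  return interactive.concat(context).slice(0, n).sort(byScore);
}
```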
The ranking model considers:
Element Scoring Factors
Interactive State — Is the element actually clickable? Disabled buttons get lower scores.
Visual Prominence — Primary CTAs (large, centered, high-contrast) score higher than footer links.
Semantic Role — Form inputs, buttons, and links outrank decorative containers.
Document Position — Elements in the main content area beat header/footer noise.
ARIA Labels — Elements with accessibility labels indicate developer-marked importance.
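The model itself isn't published, but as a rough illustration the factors above might combine like this. All feature names and weights here are invented, and a simple linear scorer stands in for the actual ML model:

```typescript
// Illustrative feature record; the real model and its weights are not public.
interface ElementFeatures {
  clickable: boolean;     // interactive state (disabled controls => false)
  prominence: number;     // visual prominence in [0, 1]: size, position, contrast
  semanticRole: number;   // e.g. 1.0 for button/input/link, 0.1 for div/span
  inMainContent: boolean; // main content area vs header/footer
  hasAriaLabel: boolean;  // developer-marked importance
}

// Toy linear combination of the scoring factors, clamped to [0, 1].
function scoreElement(f: ElementFeatures): number {
  const score =
    0.35 * (f.clickable ? 1 : 0) +
    0.25 * f.prominence +
    0.20 * f.semanticRole +
    0.10 * (f.inMainContent ? 1 : 0) +
    0.10 * (f.hasAriaLabel ? 1 : 0);
  return Math.min(1, Math.max(0, score));
}
```

Under this toy weighting, a prominent labeled button in the main content scores near 1.0, while an off-screen decorative wrapper scores near 0.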
Real Results
After applying Predicate Snapshot to the same sites:
| Site | A11y Tree | Predicate | Savings |
|---|---|---|---|
| slickdeals.net | 598K tokens | 1,283 tokens | 99.8% |
| news.ycombinator.com | 16K tokens | 587 tokens | 96% |
| httpbin.org/html | 1.5K tokens | 164 tokens | 90% |
| example.com | 305 tokens | 164 tokens | 46% |
The 99.8% reduction on slickdeals.net is not a typo. From 598K tokens down to 1.3K tokens. Same page, same actionable elements—just without the noise.
"But Aren't You Losing Information?"
This is the natural objection. If you're filtering 24,567 elements down to 50, you must be throwing away something important, right?
No. Here's why:
1. Most Elements Are Noise
The accessibility tree includes everything—tracking pixels, ad containers, invisible divs, cookie consent overlays. None of this helps the agent accomplish its task.
2. LLMs Need Actionable Elements
For browser automation, an agent needs to:
- Click buttons and links
- Fill form fields
- Read key content for decision-making
Predicate's ML ranking identifies exactly these elements while filtering noise. The top 50 elements contain all the interactive controls plus enough contextual text for the LLM to reason.
3. More Tokens = Worse Performance
Sending 600K tokens to an LLM causes:
- Higher latency — 10-15 seconds just to process the observation
- Higher cost — $11K+/month vs $5/month at scale
- Context overflow — Complex pages can exceed context window limits
- More hallucinations — Irrelevant context increases error rates
Quality Over Quantity
The goal isn't to preserve all elements—it's to preserve the right elements. Predicate Snapshot gives the agent exactly what it needs to act, nothing more.
Installation & Setup
Step 1: Install the Skill
```shell
# Via ClawHub (recommended)
npx clawdhub@latest install predicate-snapshot

# Or from source
git clone https://github.com/PredicateSystems/openclaw-predicate-skill ~/.openclaw/skills/predicate-snapshot
cd ~/.openclaw/skills/predicate-snapshot
npm install && npm run build
```

Step 2: Configure API Key (Optional)
The skill works in two modes:
- With API key: ML-powered ranking (~95-99% token reduction)
- Without API key: Local heuristic pruning (~80% reduction)—completely free
To enable ML ranking:
```shell
# Add to your shell profile (~/.bashrc, ~/.zshrc, etc.)
export PREDICATE_API_KEY="sk-your-key-here"
```

Get a free API key at PredicateSystems.ai; it includes 500 free snapshots/month.
Step 3: Use the Skill
```shell
# In OpenClaw:
/predicate-snapshot        # Get top 50 ranked elements
/predicate-act click 42    # Click element by ID
/predicate-snapshot-local  # Free local mode (no API)
```

Output Format
Predicate Snapshot returns elements in a compact pipe-delimited format optimized for LLM consumption:
```text
# Predicate Snapshot
# URL: https://example.com/login
# Elements: showing top 50
# Format: ID|role|text|imp|is_primary|docYq|ord|DG|href

42|button|Sign In|0.98|true|520|1|auth-form|
15|textbox|Username|0.95|true|480|1|auth-form|
23|textbox|Password|0.92|true|500|2|auth-form|
8|link|Forgot Password?|0.75|false|540|0|auth-form|/forgot
```

Each field:
- ID — Stable element identifier for actions
- role — Semantic role (button, textbox, link, etc.)
- text — Visible text content
- imp — ML importance score (0-1)
- is_primary — Whether element is a primary CTA
- docYq — Document Y position (for layout reasoning)
- ord — Ordinal within dominant group ("3rd item in list")
- DG — Dominant group name (for grouping related elements)
- href — Link URL if applicable
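On the consuming side, each line can be split on `|`. A minimal sketch, assuming the nine-field format above and that element text contains no literal pipes; `SnapshotElement` and `parseSnapshotLine` are illustrative names, not part of the skill:

```typescript
// Parsed form of one snapshot line in the nine-field format shown above.
interface SnapshotElement {
  id: number;
  role: string;
  text: string;
  imp: number;        // ML importance score, 0-1
  isPrimary: boolean;
  docY: number;       // document Y position
  ord: number;        // ordinal within its dominant group
  group: string;      // dominant group name
  href: string;       // empty when the element is not a link
}

// Assumes element text never contains a literal "|"; a real parser
// would need an escaping convention for that case.
function parseSnapshotLine(line: string): SnapshotElement {
  const [id, role, text, imp, isPrimary, docY, ord, group, href] = line.split("|");
  return {
    id: Number(id),
    role,
    text,
    imp: Number(imp),
    isPrimary: isPrimary === "true",
    docY: Number(docY),
    ord: Number(ord),
    group,
    href: href ?? "",
  };
}
```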
Using with Autonomous Agents
OpenClaw agents work autonomously—they don't wait for manual slash commands. Here's how to integrate Predicate Snapshot into autonomous workflows.
Option 1: Include in Task Instructions (Recommended)
Add snapshot instructions directly in your task prompt:
```text
Navigate to amazon.com and find the cheapest laptop under $500.

IMPORTANT: For page observation, use /predicate-snapshot instead of the
default accessibility tree. Use /predicate-act to interact with elements
by their ID from the snapshot.
```

Option 2: Modify Agent System Prompt
For consistent usage across all tasks, add to your agent's system prompt:
```text
## Browser Observation
When observing web pages, always use /predicate-snapshot instead of the
default accessibility tree. This provides ML-ranked elements optimized
for efficient decision-making (~500 tokens vs ~18,000 tokens).

To interact with page elements:
1. Call /predicate-snapshot to get ranked elements with IDs
2. Call /predicate-act <action> <element_id> to perform actions
```

Why This Matters Beyond Cost
1. Faster Inference
600K tokens vs 1.3K tokens isn't just about cost—it's about speed. Processing 600K tokens takes 10-15 seconds. Processing 1.3K tokens takes under a second. For multi-step tasks, this compounds dramatically.
2. Better Accuracy
Less noise means fewer hallucinations. When the LLM only sees relevant elements, it makes better decisions. We've observed significant accuracy improvements on complex navigation tasks.
3. Context Headroom
Multi-step browser tasks need room for conversation history. If each observation consumes 600K tokens, you hit context limits fast. With Predicate Snapshot, observations stay small, leaving room for history and reasoning.
4. Local LLMs Become Viable
We successfully ran complex browser automation tasks using a 3B parameter local model with Predicate Snapshots. This work hit the Hacker News front page. Small models can work when you give them clean, structured input.
Technical Architecture
Under the hood, Predicate Snapshot uses:
- Chrome DevTools Protocol (CDP) — Direct browser access for DOM extraction
- Playwright Adapter — Wraps Playwright pages for CDP session management
- Predicate Runtime SDK — ML ranking engine with cloud or local execution
- MCP Tool Interface — Standard OpenClaw skill protocol
The skill integrates with OpenClaw's browser session, requiring no changes to existing agent code beyond adding the skill commands.
Try It Yourself
Run the included demo to see the token comparison in action:
```shell
cd ~/.openclaw/skills/predicate-snapshot

# Run token comparison demo
npm run demo

# Or test in Docker
./docker-test.sh skill
```

The demo runs against multiple sites and shows side-by-side token counts.
Get Started with Predicate Snapshot
Install the skill and start saving tokens on your OpenClaw browser agents today.
View on ClawHub