Engineering
February 24, 2026 · 10 min read

Predicate Snapshot for OpenClaw: Cutting Browser Tokens by 90%

How ML-powered DOM pruning reduces OpenClaw agent token costs from 600K to 1.3K per page observation—without losing actionable elements.

The Problem: Accessibility Trees Explode on Real Sites

OpenClaw agents use accessibility (A11y) trees to observe web pages. This works well for simple sites. But on real-world pages—especially ad-heavy sites—the token count explodes.

We ran measurements on several sites using OpenClaw's default A11y tree:

| Site | Elements | Tokens |
| --- | --- | --- |
| slickdeals.net | 24,567 | ~598K |
| news.ycombinator.com | 681 | ~16K |
| httpbin.org/html | 34 | ~1.5K |
| example.com | 12 | ~305 |

600K tokens just to observe slickdeals.net. At GPT-4 pricing ($0.03/1K tokens), that's $18 per page view. For an agent making 10 observations per task, you're looking at $180 per task—before the agent even does anything.
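The cost math above is straightforward to reproduce. A quick sketch using the GPT-4 pricing quoted in this post:

```typescript
// Back-of-the-envelope observation cost at the pricing quoted above.
const PRICE_PER_1K_TOKENS = 0.03; // USD, GPT-4 input pricing used in this post

function observationCost(tokens: number): number {
  return (tokens / 1000) * PRICE_PER_1K_TOKENS;
}

const perPage = observationCost(600_000); // one slickdeals.net observation → $18
const perTask = perPage * 10;             // 10 observations per task → $180
```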

Where Do All Those Elements Come From?

Of those 24,567 elements on slickdeals.net, the vast majority are:

  • Ad iframes and tracking pixels — invisible but present in the DOM
  • Hidden overlays and modals — cookie banners, newsletter popups, etc.
  • Decorative wrappers — <div> and <span> containers with no semantic meaning
  • Non-interactive text nodes — paragraphs, spans, formatting elements
  • Duplicate/redundant elements — multiple references to the same UI

For a task like "find the best laptop deal," the agent only needs maybe 20-30 actionable elements: the search box, category filters, deal cards, and pagination controls.

The Solution: ML-Powered Element Ranking

We built Predicate Snapshot, an OpenClaw skill that uses ML to rank DOM elements by actionability. Instead of sending everything to the LLM, it returns only the top N most relevant elements (default: 50).

How It Works

  1. DOM Extraction — The skill captures the full DOM tree via Chrome DevTools Protocol
  2. ML Ranking — Each element gets scored on actionability, visibility, semantic importance, and position
  3. Smart Filtering — Top-ranked elements are selected, preserving all interactive controls
  4. Compact Output — Results returned in pipe-delimited format optimized for LLM consumption
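Steps 2 and 3 can be sketched as a small select-and-preserve pass. This is a minimal illustration with made-up field names, not the skill's actual internal types:

```typescript
// Sketch of "Smart Filtering": keep the top N scored elements, but never
// drop a visible interactive control. Field names are illustrative only.
interface DomElement {
  id: number;
  role: string;    // "button", "textbox", "link", "generic", ...
  visible: boolean;
  score: number;   // actionability score from the ranking step
}

const INTERACTIVE_ROLES = new Set(["button", "textbox", "link", "combobox", "checkbox"]);

function snapshot(elements: DomElement[], topN = 50): DomElement[] {
  const interactive = elements.filter(
    (e) => e.visible && INTERACTIVE_ROLES.has(e.role)
  );
  const rest = elements
    .filter((e) => e.visible && !INTERACTIVE_ROLES.has(e.role))
    .sort((a, b) => b.score - a.score);
  // Interactive controls are always preserved; contextual elements fill
  // the remaining slots in score order.
  return [...interactive, ...rest].slice(0, Math.max(topN, interactive.length));
}
```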

The ranking model considers:

Element Scoring Factors

  1. Interactive State — Is the element actually clickable? Disabled buttons get lower scores.

  2. Visual Prominence — Primary CTAs (large, centered, high-contrast) score higher than footer links.

  3. Semantic Role — Form inputs, buttons, and links outrank decorative containers.

  4. Document Position — Elements in the main content area beat header/footer noise.

  5. ARIA Labels — Elements with accessibility labels indicate developer-marked importance.
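A simple way to picture how these five factors combine is a weighted sum. The weights and role table below are entirely hypothetical—a sketch of how a local heuristic mode might score elements, not the actual ML model:

```typescript
// Hypothetical weighted combination of the five scoring factors above.
interface ElementFeatures {
  enabled: boolean;       // 1. interactive state
  prominence: number;     // 2. visual prominence, 0-1
  role: string;           // 3. semantic role
  inMainContent: boolean; // 4. document position
  hasAriaLabel: boolean;  // 5. ARIA labels
}

// Illustrative role weights; real models would learn these.
const ROLE_WEIGHT: Record<string, number> = {
  button: 1.0, textbox: 1.0, link: 0.8, generic: 0.1,
};

function heuristicScore(f: ElementFeatures): number {
  let score = 0;
  score += f.enabled ? 0.3 : 0;                    // disabled controls rank lower
  score += 0.25 * f.prominence;                    // primary CTAs beat footer links
  score += 0.25 * (ROLE_WEIGHT[f.role] ?? 0.1);    // semantic role
  score += f.inMainContent ? 0.1 : 0;              // main content beats chrome
  score += f.hasAriaLabel ? 0.1 : 0;               // developer-marked importance
  return score; // 0-1; higher means more actionable
}
```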

Real Results

After applying Predicate Snapshot to the same sites:

| Site | A11y Tree | Predicate | Savings |
| --- | --- | --- | --- |
| slickdeals.net | 598K tokens | 1,283 tokens | 99.8% |
| news.ycombinator.com | 16K tokens | 587 tokens | 96% |
| httpbin.org/html | 1.5K tokens | 164 tokens | 90% |
| example.com | 305 tokens | 164 tokens | 46% |

The 99.8% reduction on slickdeals.net is not a typo. From 598K tokens down to 1.3K tokens. Same page, same actionable elements—just without the noise.
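The savings column is just a percentage reduction from the A11y baseline, which you can verify from the raw token counts:

```typescript
// Percentage reduction from the A11y tree baseline to the Predicate snapshot.
function savings(before: number, after: number): number {
  return (1 - after / before) * 100;
}

console.log(savings(598_000, 1_283).toFixed(1)); // "99.8"
console.log(savings(305, 164).toFixed(0));       // "46"
```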

"But Aren't You Losing Information?"

This is the natural objection. If you're filtering 24,567 elements down to 50, you must be throwing away something important, right?

No. Here's why:

1. Most Elements Are Noise

The accessibility tree includes everything—tracking pixels, ad containers, invisible divs, cookie consent overlays. None of this helps the agent accomplish its task.

2. LLMs Need Actionable Elements

For browser automation, an agent needs to:

  • Click buttons and links
  • Fill form fields
  • Read key content for decision-making

Predicate's ML ranking identifies exactly these elements while filtering noise. The top 50 elements contain all the interactive controls plus enough contextual text for the LLM to reason.

3. More Tokens = Worse Performance

Sending 600K tokens to an LLM causes:

  • Higher latency — 10-15 seconds just to process the observation
  • Higher cost — $11K+/month vs $5/month at scale
  • Context overflow — Complex pages can exceed context window limits
  • More hallucinations — Irrelevant context increases error rates

Quality Over Quantity

The goal isn't to preserve all elements—it's to preserve the right elements. Predicate Snapshot gives the agent exactly what it needs to act, nothing more.

Installation & Setup

Step 1: Install the Skill

```shell
# Via ClawHub (recommended)
npx clawdhub@latest install predicate-snapshot

# Or from source
git clone https://github.com/PredicateSystems/openclaw-predicate-skill ~/.openclaw/skills/predicate-snapshot
cd ~/.openclaw/skills/predicate-snapshot
npm install && npm run build
```

Step 2: Configure API Key (Optional)

The skill works in two modes:

  • With API key: ML-powered ranking (~95-99% token reduction)
  • Without API key: Local heuristic pruning (~80% reduction)—completely free

To enable ML ranking:

```shell
# Add to your shell profile (~/.bashrc, ~/.zshrc, etc.)
export PREDICATE_API_KEY="sk-your-key-here"
```

Get a free API key at PredicateSystems.ai—includes 500 free snapshots/month.

Step 3: Use the Skill

```text
# In OpenClaw:
/predicate-snapshot              # Get top 50 ranked elements
/predicate-act click 42          # Click element by ID
/predicate-snapshot-local        # Free local mode (no API)
```

Output Format

Predicate Snapshot returns elements in a compact pipe-delimited format optimized for LLM consumption:

```text
# Predicate Snapshot
# URL: https://example.com/login
# Elements: showing top 50
# Format: ID|role|text|imp|is_primary|docYq|ord|DG|href

42|button|Sign In|0.98|true|520|1|auth-form|
15|textbox|Username|0.95|true|480|1|auth-form|
23|textbox|Password|0.92|true|500|2|auth-form|
8|link|Forgot Password?|0.75|false|540|0|auth-form|/forgot
```

Each field:

  • ID — Stable element identifier for actions
  • role — Semantic role (button, textbox, link, etc.)
  • text — Visible text content
  • imp — ML importance score (0-1)
  • is_primary — Whether element is a primary CTA
  • docYq — Document Y position (for layout reasoning)
  • ord — Ordinal within dominant group ("3rd item in list")
  • DG — Dominant group name (for grouping related elements)
  • href — Link URL if applicable
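Since each data row is nine pipe-delimited fields, it's cheap to parse outside the LLM too. A sketch of a parser for this format (the interface name and its property names are my own; `docYq` and `ord` are read as numbers):

```typescript
// Parse the pipe-delimited snapshot format above into structured records.
// Lines starting with "#" are header metadata; each data row has 9 fields.
interface SnapshotElement {
  id: number;
  role: string;
  text: string;
  imp: number;
  isPrimary: boolean;
  docYq: number;
  ord: number;
  group: string;
  href: string;
}

function parseSnapshot(raw: string): SnapshotElement[] {
  return raw
    .split("\n")
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .map((line) => {
      const [id, role, text, imp, isPrimary, docYq, ord, group, href] =
        line.split("|");
      return {
        id: Number(id),
        role,
        text,
        imp: Number(imp),
        isPrimary: isPrimary === "true",
        docYq: Number(docYq),
        ord: Number(ord),
        group,
        href, // empty string when the element has no link target
      };
    });
}
```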

Using with Autonomous Agents

OpenClaw agents work autonomously—they don't wait for manual slash commands. Here's how to integrate Predicate Snapshot into autonomous workflows.

Option 1: Add Instructions to the Task Prompt

Add snapshot instructions directly in your task prompt:

```text
Navigate to amazon.com and find the cheapest laptop under $500.

IMPORTANT: For page observation, use /predicate-snapshot instead of the
default accessibility tree. Use /predicate-act to interact with elements
by their ID from the snapshot.
```

Option 2: Modify Agent System Prompt

For consistent usage across all tasks, add to your agent's system prompt:

```text
## Browser Observation
When observing web pages, always use /predicate-snapshot instead of the
default accessibility tree. This provides ML-ranked elements optimized
for efficient decision-making (~500 tokens vs ~18,000 tokens).

To interact with page elements:
1. Call /predicate-snapshot to get ranked elements with IDs
2. Call /predicate-act <action> <element_id> to perform actions
```

Why This Matters Beyond Cost

1. Faster Inference

600K tokens vs 1.3K tokens isn't just about cost—it's about speed. Processing 600K tokens takes 10-15 seconds. Processing 1.3K tokens takes under a second. For multi-step tasks, this compounds dramatically.

2. Better Accuracy

Less noise means fewer hallucinations. When the LLM only sees relevant elements, it makes better decisions. We've observed significant accuracy improvements on complex navigation tasks.

3. Context Headroom

Multi-step browser tasks need room for conversation history. If each observation consumes 600K tokens, you hit context limits fast. With Predicate Snapshot, observations stay small, leaving room for history and reasoning.
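To make the headroom concrete, here is the budget math under an assumed 128K-token context window (the window size is a hypothetical for illustration; token counts are from the table above):

```typescript
// How many page observations fit in context before any history or reasoning?
// 128K is an assumed context window, not a specific model's limit.
const CONTEXT_WINDOW = 128_000;
const A11Y_OBSERVATION = 16_000;     // news.ycombinator.com, default A11y tree
const PREDICATE_OBSERVATION = 587;   // same page via Predicate Snapshot

const a11yFits = Math.floor(CONTEXT_WINDOW / A11Y_OBSERVATION);           // 8
const predicateFits = Math.floor(CONTEXT_WINDOW / PREDICATE_OBSERVATION); // 218

// Note: a single 600K-token slickdeals.net observation would not fit at all.
```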

4. Local LLMs Become Viable

We successfully ran complex browser automation tasks using a 3B parameter local model with Predicate Snapshots. This work hit the Hacker News front page. Small models can work when you give them clean, structured input.

Technical Architecture

Under the hood, Predicate Snapshot uses:

  1. Chrome DevTools Protocol (CDP) — Direct browser access for DOM extraction
  2. Playwright Adapter — Wraps Playwright pages for CDP session management
  3. Predicate Runtime SDK — ML ranking engine with cloud or local execution
  4. MCP Tool Interface — Standard OpenClaw skill protocol

The skill integrates with OpenClaw's browser session, requiring no changes to existing agent code beyond adding the skill commands.

Try It Yourself

Run the included demo to see the token comparison in action:

```shell
cd ~/.openclaw/skills/predicate-snapshot

# Run token comparison demo
npm run demo

# Or test in Docker
./docker-test.sh skill
```

The demo runs against multiple sites and shows side-by-side token counts.

Get Started with Predicate Snapshot

Install the skill and start saving tokens on your OpenClaw browser agents today.

View on ClawHub