Engineering
February 24, 2026 · 10 min read

Predicate Snapshot for OpenClaw: Cutting Browser Tokens by 90%

How ML-powered DOM pruning reduces OpenClaw agent token costs from 600K to 1.3K per page observation—without losing actionable elements.

The Problem: Accessibility Trees Explode on Real Sites

OpenClaw agents use accessibility (A11y) trees to observe web pages. This works well for simple sites. But on real-world pages—especially ad-heavy sites—the token count explodes.

We ran measurements on several sites using OpenClaw's default A11y tree:

| Site | Elements | Tokens |
| --- | --- | --- |
| slickdeals.net | 24,567 | ~598K |
| news.ycombinator.com | 681 | ~16K |
| httpbin.org/html | 34 | ~1.5K |
| example.com | 12 | ~305 |

600K tokens just to observe slickdeals.net. At GPT-4 pricing ($0.03/1K tokens), that's $18 per page view. For an agent making 10 observations per task, you're looking at $180 per task—before the agent even does anything.
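The cost math above is straightforward to reproduce. A quick sketch using the GPT-4 pricing quoted in this post:

```typescript
// Back-of-the-envelope observation cost at the pricing quoted above.
const PRICE_PER_1K_TOKENS = 0.03; // USD, GPT-4 input pricing used in this post

function observationCost(tokens: number): number {
  return (tokens / 1000) * PRICE_PER_1K_TOKENS;
}

const perPage = observationCost(600_000); // one slickdeals.net observation → $18
const perTask = perPage * 10;             // 10 observations per task → $180
```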

Where Do All Those Elements Come From?

Of those 24,567 elements on slickdeals.net, the vast majority are:

  • Ad iframes and tracking pixels — invisible but present in the DOM
  • Hidden overlays and modals — cookie banners, newsletter popups, etc.
  • Decorative wrappers — <div> and <span> containers with no semantic meaning
  • Non-interactive text nodes — paragraphs, spans, formatting elements
  • Duplicate/redundant elements — multiple references to the same UI

For a task like "find the best laptop deal," the agent only needs maybe 20-30 actionable elements: the search box, category filters, deal cards, and pagination controls.

The Solution: ML-Powered Element Ranking

We built Predicate Snapshot, an OpenClaw skill that uses ML to rank DOM elements by actionability. Instead of sending everything to the LLM, it returns only the top N most relevant elements (default: 50).

How It Works

  1. DOM Extraction — The skill captures the full DOM tree via Chrome DevTools Protocol
  2. ML Ranking — Each element gets scored on actionability, visibility, semantic importance, and position
  3. Smart Filtering — Top-ranked elements are selected, preserving all interactive controls
  4. Compact Output — Results returned in pipe-delimited format optimized for LLM consumption
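Steps 2 and 3 can be sketched as a small select-and-preserve pass. This is a minimal illustration with made-up field names, not the skill's actual internal types:

```typescript
// Sketch of "Smart Filtering": keep the top N scored elements, but never
// drop a visible interactive control. Field names are illustrative only.
interface DomElement {
  id: number;
  role: string;    // "button", "textbox", "link", "generic", ...
  visible: boolean;
  score: number;   // actionability score from the ranking step
}

const INTERACTIVE_ROLES = new Set(["button", "textbox", "link", "combobox", "checkbox"]);

function snapshot(elements: DomElement[], topN = 50): DomElement[] {
  const interactive = elements.filter(
    (e) => e.visible && INTERACTIVE_ROLES.has(e.role)
  );
  const rest = elements
    .filter((e) => e.visible && !INTERACTIVE_ROLES.has(e.role))
    .sort((a, b) => b.score - a.score);
  // Interactive controls are always preserved; contextual elements fill
  // the remaining slots in score order.
  return [...interactive, ...rest].slice(0, Math.max(topN, interactive.length));
}
```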

The ranking model considers:

Element Scoring Factors

  1. Interactive State — Is the element actually clickable? Disabled buttons get lower scores.

  2. Visual Prominence — Primary CTAs (large, centered, high-contrast) score higher than footer links.

  3. Semantic Role — Form inputs, buttons, and links outrank decorative containers.

  4. Document Position — Elements in the main content area beat header/footer noise.

  5. ARIA Labels — Elements with accessibility labels indicate developer-marked importance.
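A simple way to picture how these five factors combine is a weighted sum. The weights and role table below are entirely hypothetical—a sketch of how a local heuristic mode might score elements, not the actual ML model:

```typescript
// Hypothetical weighted combination of the five scoring factors above.
interface ElementFeatures {
  enabled: boolean;       // 1. interactive state
  prominence: number;     // 2. visual prominence, 0-1
  role: string;           // 3. semantic role
  inMainContent: boolean; // 4. document position
  hasAriaLabel: boolean;  // 5. ARIA labels
}

// Illustrative role weights; real models would learn these.
const ROLE_WEIGHT: Record<string, number> = {
  button: 1.0, textbox: 1.0, link: 0.8, generic: 0.1,
};

function heuristicScore(f: ElementFeatures): number {
  let score = 0;
  score += f.enabled ? 0.3 : 0;                    // disabled controls rank lower
  score += 0.25 * f.prominence;                    // primary CTAs beat footer links
  score += 0.25 * (ROLE_WEIGHT[f.role] ?? 0.1);    // semantic role
  score += f.inMainContent ? 0.1 : 0;              // main content beats chrome
  score += f.hasAriaLabel ? 0.1 : 0;               // developer-marked importance
  return score; // 0-1; higher means more actionable
}
```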

Real Results

After applying Predicate Snapshot to the same sites:

| Site | A11y Tree | Predicate | Savings |
| --- | --- | --- | --- |
| slickdeals.net | 598K tokens | 1,283 tokens | 99.8% |
| news.ycombinator.com | 16K tokens | 587 tokens | 96% |
| httpbin.org/html | 1.5K tokens | 164 tokens | 90% |
| example.com | 305 tokens | 164 tokens | 46% |

The 99.8% reduction on slickdeals.net is not a typo. From 598K tokens down to 1.3K tokens. Same page, same actionable elements—just without the noise.
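The savings column is just a percentage reduction from the A11y baseline, which you can verify from the raw token counts:

```typescript
// Percentage reduction from the A11y tree baseline to the Predicate snapshot.
function savings(before: number, after: number): number {
  return (1 - after / before) * 100;
}

console.log(savings(598_000, 1_283).toFixed(1)); // "99.8"
console.log(savings(305, 164).toFixed(0));       // "46"
```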

"But Aren't You Losing Information?"

This is the natural objection. If you're filtering 24,567 elements down to 50, you must be throwing away something important, right?

No. Here's why:

1. Most Elements Are Noise

The accessibility tree includes everything—tracking pixels, ad containers, invisible divs, cookie consent overlays. None of this helps the agent accomplish its task.

2. LLMs Need Actionable Elements

For browser automation, an agent needs to:

  • Click buttons and links
  • Fill form fields
  • Read key content for decision-making

Predicate's ML ranking identifies exactly these elements while filtering noise. The top 50 elements contain all the interactive controls plus enough contextual text for the LLM to reason.

3. More Tokens = Worse Performance

Sending 600K tokens to an LLM causes:

  • Higher latency — 10-15 seconds just to process the observation
  • Higher cost — $11K+/month vs $5/month at scale
  • Context overflow — Complex pages can exceed context window limits
  • More hallucinations — Irrelevant context increases error rates

Quality Over Quantity

The goal isn't to preserve all elements—it's to preserve the right elements. Predicate Snapshot gives the agent exactly what it needs to act, nothing more.

Installation & Setup

Step 1: Install the Skill

```shell
# Via ClawHub (recommended)
npx clawdhub@latest install predicate-snapshot

# Or from source
git clone https://github.com/PredicateSystems/openclaw-predicate-skill ~/.openclaw/skills/predicate-snapshot
cd ~/.openclaw/skills/predicate-snapshot
npm install && npm run build
```

Step 2: Configure API Key (Optional)

The skill works in two modes:

  • With API key: ML-powered ranking (~95-99% token reduction)
  • Without API key: Local heuristic pruning (~80% reduction)—completely free

To enable ML ranking:

```shell
# Add to your shell profile (~/.bashrc, ~/.zshrc, etc.)
export PREDICATE_API_KEY="sk-your-key-here"
```

Get a free API key at PredicateSystems.ai—includes 500 free snapshots/month.

Step 3: Use the Skill

```text
# In OpenClaw:
/predicate-snapshot              # Get top 50 ranked elements
/predicate-act click 42          # Click element by ID
/predicate-snapshot-local        # Free local mode (no API)
```

Output Format

Predicate Snapshot returns elements in a compact pipe-delimited format optimized for LLM consumption:

```text
# Predicate Snapshot
# URL: https://example.com/login
# Elements: showing top 50
# Format: ID|role|text|imp|is_primary|docYq|ord|DG|href

42|button|Sign In|0.98|true|520|1|auth-form|
15|textbox|Username|0.95|true|480|1|auth-form|
23|textbox|Password|0.92|true|500|2|auth-form|
8|link|Forgot Password?|0.75|false|540|0|auth-form|/forgot
```

Each field:

  • ID — Stable element identifier for actions
  • role — Semantic role (button, textbox, link, etc.)
  • text — Visible text content
  • imp — ML importance score (0-1)
  • is_primary — Whether element is a primary CTA
  • docYq — Document Y position (for layout reasoning)
  • ord — Ordinal within dominant group ("3rd item in list")
  • DG — Dominant group name (for grouping related elements)
  • href — Link URL if applicable
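Since each data row is nine pipe-delimited fields, it's cheap to parse outside the LLM too. A sketch of a parser for this format (the interface name and its property names are my own; `docYq` and `ord` are read as numbers):

```typescript
// Parse the pipe-delimited snapshot format above into structured records.
// Lines starting with "#" are header metadata; each data row has 9 fields.
interface SnapshotElement {
  id: number;
  role: string;
  text: string;
  imp: number;
  isPrimary: boolean;
  docYq: number;
  ord: number;
  group: string;
  href: string;
}

function parseSnapshot(raw: string): SnapshotElement[] {
  return raw
    .split("\n")
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .map((line) => {
      const [id, role, text, imp, isPrimary, docYq, ord, group, href] =
        line.split("|");
      return {
        id: Number(id),
        role,
        text,
        imp: Number(imp),
        isPrimary: isPrimary === "true",
        docYq: Number(docYq),
        ord: Number(ord),
        group,
        href, // empty string when the element has no link target
      };
    });
}
```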

Using with Autonomous Agents

OpenClaw agents work autonomously—they don't wait for manual slash commands. Here's how to integrate Predicate Snapshot into autonomous workflows.

Option 1: Add Instructions to the Task Prompt

Add snapshot instructions directly in your task prompt:

```text
Navigate to amazon.com and find the cheapest laptop under $500.

IMPORTANT: For page observation, use /predicate-snapshot instead of the
default accessibility tree. Use /predicate-act to interact with elements
by their ID from the snapshot.
```

Option 2: Modify Agent System Prompt

For consistent usage across all tasks, add to your agent's system prompt:

```text
## Browser Observation
When observing web pages, always use /predicate-snapshot instead of the
default accessibility tree. This provides ML-ranked elements optimized
for efficient decision-making (~500 tokens vs ~18,000 tokens).

To interact with page elements:
1. Call /predicate-snapshot to get ranked elements with IDs
2. Call /predicate-act <action> <element_id> to perform actions
```

Why This Matters Beyond Cost

1. Faster Inference

600K tokens vs 1.3K tokens isn't just about cost—it's about speed. Processing 600K tokens takes 10-15 seconds. Processing 1.3K tokens takes under a second. For multi-step tasks, this compounds dramatically.

2. Better Accuracy

Less noise means fewer hallucinations. When the LLM only sees relevant elements, it makes better decisions. We've observed significant accuracy improvements on complex navigation tasks.

3. Context Headroom

Multi-step browser tasks need room for conversation history. If each observation consumes 600K tokens, you hit context limits fast. With Predicate Snapshot, observations stay small, leaving room for history and reasoning.
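To make the headroom concrete, here is the budget math under an assumed 128K-token context window (the window size is a hypothetical for illustration; token counts are from the table above):

```typescript
// How many page observations fit in context before any history or reasoning?
// 128K is an assumed context window, not a specific model's limit.
const CONTEXT_WINDOW = 128_000;
const A11Y_OBSERVATION = 16_000;     // news.ycombinator.com, default A11y tree
const PREDICATE_OBSERVATION = 587;   // same page via Predicate Snapshot

const a11yFits = Math.floor(CONTEXT_WINDOW / A11Y_OBSERVATION);           // 8
const predicateFits = Math.floor(CONTEXT_WINDOW / PREDICATE_OBSERVATION); // 218

// Note: a single 600K-token slickdeals.net observation would not fit at all.
```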

4. Local LLMs Become Viable

We successfully ran complex browser automation tasks using a 3B parameter local model with Predicate Snapshots. This work hit the Hacker News front page. Small models can work when you give them clean, structured input.

Technical Architecture

Under the hood, Predicate Snapshot uses:

  1. Chrome DevTools Protocol (CDP) — Direct browser access for DOM extraction
  2. Playwright Adapter — Wraps Playwright pages for CDP session management
  3. Predicate Runtime SDK — ML ranking engine with cloud or local execution
  4. MCP Tool Interface — Standard OpenClaw skill protocol

The skill integrates with OpenClaw's browser session, requiring no changes to existing agent code beyond adding the skill commands.

Try It Yourself

Run the included demo to see the token comparison in action:

```shell
cd ~/.openclaw/skills/predicate-snapshot

# Run token comparison demo
npm run demo

# Or test in Docker
./docker-test.sh skill
```

The demo runs against multiple sites and shows side-by-side token counts.

Get Started with Predicate Snapshot

Install the skill and start saving tokens on your OpenClaw browser agents today.

View on ClawHub