Browser Harness and Predicate: Agent Freedom Needs Execution Boundaries

Browser Harness recently sparked a useful debate about how AI agents should control browsers.

The argument is simple: browser automation frameworks have accumulated a lot of deterministic glue. Target management, cross-origin iframes, file upload handling, alert watchdogs, element extractors, click helpers, and recovery paths all turn into thousands of lines of framework code. Sometimes that code helps. Sometimes it becomes the thing the model has to fight.

Browser Harness takes the opposite approach. It keeps a Chrome DevTools Protocol (CDP) websocket alive, gives the model a small set of Python helpers, explains the setup in a SKILL.md, and lets the model work close to the protocol. If a helper is missing, the model can inspect the helper file and add one.

That is a compelling experiment. It takes the current generation of models seriously: they have seen CDP, Python, DOM APIs, and debugging patterns. They can often reason below the abstraction layer better than a rigid wrapper can.

The real question

The open question is not whether that freedom is useful. It is. The question is where freedom should stop.

Predicate approaches the same problem from a different angle: give the model room to plan and adapt, but make execution observable, verifiable, and bounded by policy.

What Browser Harness Gets Right

Browser Harness is strongest as a critique of over-abstracted agent frameworks.

First, it reduces framework impedance. If a model knows how CDP works, hiding CDP behind a narrow click() or upload_file() wrapper can make the system less capable, not more. When the abstraction leaks, the model has no path down to the real substrate.

Second, it treats browser control as an open-ended debugging task. Cross-origin iframes, file choosers, popups, downloads, and target switching are not edge cases for browser agents. They are normal web behavior. A model that can inspect the protocol and write a missing helper may recover faster than a framework waiting for a maintainer to add another special case.

Third, it exposes a real failure mode in high-level wrappers: a tool can return "success" while the browser state did not move. The function call completed; the task did not.

Those are real strengths. Any serious browser-agent infrastructure should learn from them.

The Trade-Off: Capability vs. Ambient Authority

The hard part is that the same design that gives the model flexibility can also expand its authority.

If an agent can rewrite helper code mid-task, it can adapt to missing tools. It can also create execution paths the operator did not review. If that agent is reading untrusted web content at the same time, prompt injection becomes more than a bad instruction. It becomes a path from page content to code execution.

That does not mean Browser Harness is "wrong." It means the deployment model matters.

For research, evals, games, demos, and isolated sandboxes, a mutable helper layer is powerful. For enterprise workflows, regulated data, credentials, customer environments, or shared infrastructure, operators usually need additional answers:

What exact action is being requested?
Which principal is requesting it?
Which resource will it touch?
Is this action within the current task mandate?
Did the browser actually reach the expected state afterward?
Can we produce an audit trail that explains both the decision and the outcome?

Raw browser control does not answer those questions by itself. It gives the model a stronger actuator. Production systems also need a control plane around that actuator.
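One way to picture that control plane is a wrapper that records an answer to each of the questions above for every action. The sketch below is illustrative only: `ActionRecord`, `guarded`, and the toy `policy` function are hypothetical names, not part of Browser Harness or Predicate.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ActionRecord:
    """Hypothetical audit record: one field per operator question."""
    action: str                # what exact action is being requested
    principal: str             # which principal is requesting it
    resource: str              # which resource it will touch
    task: str                  # the task mandate it claims to fall under
    allowed: bool              # the policy decision
    verified: Optional[bool]   # expected state reached? None if never executed


def guarded(action, principal, resource, task, *, policy, execute, verify, audit_log):
    """Decide first, execute only if allowed, then verify and log the outcome."""
    allowed = policy(principal, action, resource, task)
    verified = None
    if allowed:
        execute()
        verified = bool(verify())
    record = ActionRecord(action, principal, resource, task, allowed, verified)
    audit_log.append(json.dumps(asdict(record)))  # decision and outcome together
    return record


# Toy policy (an assumption): only navigation on example.com is in scope.
def policy(principal, action, resource, task):
    return action == "browser.navigate" and resource.startswith("https://example.com/")


calls, audit_log = [], []
denied = guarded(
    "file.upload", "research-agent", "https://example.com/upload",
    "compare approved vendors",
    policy=policy, execute=lambda: calls.append("upload"),
    verify=lambda: True, audit_log=audit_log,
)
ok = guarded(
    "browser.navigate", "research-agent", "https://example.com/products/1",
    "compare approved vendors",
    policy=policy, execute=lambda: calls.append("navigate"),
    verify=lambda: True, audit_log=audit_log,
)
```

The denied upload never reaches `execute`, yet it still leaves a log entry, so the audit trail covers refusals as well as completed actions.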

Where Predicate Fits

Predicate is built around a different separation of concerns:

Let the model reason freely. Make execution explicit, authorized, and verified.

In practice, that means two complementary layers.

`predicate-runtime`: Browser Automation With Verification

predicate-runtime can drive browser automation, but its core value is not just issuing clicks and keystrokes. The important loop is:

snapshot -> plan/action -> execute -> snapshot -> verify -> trace

The runtime gives the agent a structured view of the browser through semantic snapshots rather than forcing every step to consume raw protocol noise. It can compact the page into element records such as role, text, labels, URLs, and interaction affordances. That gives smaller local models enough context to act without flooding the prompt with every CDP event or DOM node.

The verification step is the key difference. A click is not treated as successful because a mouse event was dispatched. It is successful when the expected state is observed afterward: url_contains(...), exists(...), not_exists(...), a changed cart state, a dismissed modal, or another predicate tied to the task.
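A minimal sketch of post-action verification follows. The predicate names mirror the ones mentioned above, but the `Snapshot` type, the signatures, and `verified_step` are assumptions made for illustration, not the predicate-runtime API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Snapshot:
    """Hypothetical compact view of browser state (not the real SDK type)."""
    url: str
    texts: List[str] = field(default_factory=list)  # visible element text


Predicate = Callable[[Snapshot], bool]


def url_contains(fragment: str) -> Predicate:
    return lambda snap: fragment in snap.url


def exists(text: str) -> Predicate:
    return lambda snap: any(text in t for t in snap.texts)


def not_exists(text: str) -> Predicate:
    return lambda snap: not any(text in t for t in snap.texts)


def verified_step(act, observe, expect, trace):
    """Run one action and judge it by the observed state, not by the call returning."""
    before = observe()
    act()
    after = observe()
    ok = all(predicate(after) for predicate in expect)
    trace.append({"before": before.url, "after": after.url, "ok": ok})
    return ok


# Simulated failure: the click is dispatched, but the overlay never closes.
state = Snapshot(url="https://example.com/cart", texts=["Accept cookies", "Checkout"])
trace = []
ok = verified_step(
    act=lambda: None,                       # stand-in for a dispatched click
    observe=lambda: state,                  # page state is unchanged afterward
    expect=[not_exists("Accept cookies")],
    trace=trace,
)
```

Here `act` returns without error, but the step is reported as failed because the expected state never appears in the follow-up snapshot.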

This matters because browser automation fails in ways that look locally healthy:

A click handler fires but the overlay stays open.
A button is visible but disabled by application state.
A form submits but returns inline validation.
A search box accepts text but does not navigate.
A product page opens in a new target the agent did not follow.

Browser Harness is right that wrappers can hide failures. Predicate's answer is to make state verification a first-class part of the run.

`predicate-authorityd`: Per-Action Authority

predicate-authorityd addresses a different boundary: whether an action is allowed to execute at all.

It runs as a sidecar: a local authorization daemon. An agent or runtime asks for permission to perform an action against a resource. The sidecar validates identity, evaluates policy, and can issue a short-lived mandate for that specific action. In deployments that use the execution endpoint, the sidecar can also execute supported operations on the agent's behalf so secrets and sensitive resources do not need to be handed directly to the model process.

The important distinction is between identity and authority.

An IdP token says who the agent is. It does not prove that this particular action is appropriate right now. A mandate narrows the question:

principal: research-agent
action: browser.navigate
resource: https://example.com/products/*
task: compare approved vendors
ttl: 5 minutes

That shape is useful for browser agents because web tasks often cross trust boundaries. The model may need to browse public pages, submit forms, download files, call APIs, or hand work to another agent. Each step can be evaluated as an action with scope, resource, time, and provenance rather than relying on one broad credential.
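To make the mandate shape concrete, here is a rough sketch of how a per-action check could work. The `Mandate` class, `issue_mandate`, and `permits` are hypothetical and do not reflect the predicate-authorityd wire format or API; resource matching uses Python's standard `fnmatch` globbing as a stand-in for a real policy engine.

```python
import time
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Mandate:
    """Hypothetical short-lived grant for one kind of action on one resource scope."""
    principal: str
    action: str
    resource: str      # glob pattern, e.g. "https://example.com/products/*"
    task: str
    expires_at: float  # epoch seconds


def issue_mandate(principal, action, resource, task, ttl_seconds, now=None):
    now = time.time() if now is None else now
    return Mandate(principal, action, resource, task, now + ttl_seconds)


def permits(mandate, principal, action, resource, now=None):
    """True only if principal, action, resource scope, and time window all match."""
    now = time.time() if now is None else now
    return (
        now < mandate.expires_at
        and principal == mandate.principal
        and action == mandate.action
        # Note: fnmatch's "*" also matches "/", so a real policy engine
        # would likely want stricter URL matching than this.
        and fnmatchcase(resource, mandate.resource)
    )


mandate = issue_mandate(
    "research-agent", "browser.navigate", "https://example.com/products/*",
    "compare approved vendors", ttl_seconds=300, now=1000.0,
)
allowed = permits(mandate, "research-agent", "browser.navigate",
                  "https://example.com/products/42", now=1100.0)
off_scope = permits(mandate, "research-agent", "browser.navigate",
                    "https://example.com/admin", now=1100.0)
expired = permits(mandate, "research-agent", "browser.navigate",
                  "https://example.com/products/42", now=1400.0)
```

The same identity is allowed in the first call and refused in the other two, once for leaving the resource scope and once for outliving the five-minute window. That is the sense in which a mandate is narrower than an identity token.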

Token Budget Is a Runtime Boundary Too

There is another practical difference: how much browser state the model has to read on every step.

A low-level harness gives the model broad access to CDP, DOM inspection, screenshots, helper code, logs, and protocol-level debugging context. That freedom is useful, but it can make the observation loop expensive. If the agent has to reason over raw DOM dumps or screenshots repeatedly, token cost and latency become part of the runtime trade-off.

predicate-runtime treats token budget as part of the system design. Browser state is converted into compact semantic snapshots: element IDs, roles, labels, text, URLs, visibility, and interaction affordances. The goal is not to hide the browser from the model. The goal is to give it the parts of browser state most likely to matter for the next action.

raw page state -> compact semantic snapshot -> model action -> verified state change

This changes the default loop. Instead of sending a full page representation and asking the model to discover structure from scratch, Predicate gives it a compressed action map, executes the selected action, and verifies the result with a fresh snapshot afterward.
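As a rough illustration of the compaction step, the sketch below reduces a list of raw node dictionaries to a small action map. The input format, the `INTERACTIVE_ROLES` set, and `compact_snapshot` are assumptions for this example; keeping only interactive elements is a simplification, and the real snapshot format may differ.

```python
import json
from typing import Dict, List

# Assumed set of roles worth showing to the model in this simplified sketch.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def compact_snapshot(nodes: List[Dict]) -> List[Dict]:
    """Keep visible, interactive nodes and only the fields needed to pick an action."""
    compact = []
    for node in nodes:
        if not node.get("visible") or node.get("role") not in INTERACTIVE_ROLES:
            continue
        record = {
            "id": node["id"],
            "role": node["role"],
            "text": node.get("text", "")[:80],  # truncate long labels
        }
        if node.get("href"):
            record["url"] = node["href"]
        compact.append(record)
    return compact


# Hypothetical raw page state with styling noise and a hidden element.
raw_nodes = [
    {"id": 1, "role": "generic", "visible": True, "text": "",
     "attrs": {"class": "wrapper flex"}},
    {"id": 2, "role": "button", "visible": True, "text": "Add to cart",
     "attrs": {"class": "btn btn-primary", "data-test": "add"}},
    {"id": 3, "role": "link", "visible": True, "text": "Reviews",
     "href": "/reviews", "attrs": {"class": "nav"}},
    {"id": 4, "role": "button", "visible": False, "text": "Hidden promo",
     "attrs": {}},
]

compact = compact_snapshot(raw_nodes)
raw_size = len(json.dumps(raw_nodes))       # crude proxy for prompt size
compact_size = len(json.dumps(compact))
```

Serialized length is only a crude proxy for token count, but it shows the direction of the trade-off: the compact map drops wrappers, hidden nodes, and styling attributes while keeping stable IDs the model can act on.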

The token trade-off

Browser Harness maximizes inspection freedom. Predicate optimizes the common path for lower token usage, lower latency, and viability on smaller local models, while still allowing richer inspection or vision fallback when the compact snapshot is not enough.

A More Useful Comparison

The difference is not "CDP freedom" versus "framework control." The more useful distinction is:

Layer	Browser Harness Emphasis	Predicate Emphasis
Browser actuation	Expose low-level CDP and let the model adapt.	Drive browser actions through runtime tools and structured snapshots.
Token budget	May require raw DOM, screenshots, logs, or protocol context when debugging.	Compact semantic snapshots keep the common observation loop small.
Tool surface	Mutable helpers the model can inspect and extend.	Stable action APIs plus planner/executor loops.
Failure handling	Let the model debug with protocol access.	Verify state after each action and trace failures.
Security boundary	Depends on sandboxing and operator discipline.	Evaluate per-action authority with mandates and policy.
Auditability	Flexible but harder to predefine.	Decision, action, snapshot, predicate, and trace artifacts.

These approaches can be complementary. A low-level harness can be a powerful actuator. Predicate can sit around browser execution as the verification and authority layer: what action is requested, whether it is allowed, what happened afterward, and what evidence explains the result.

The Design Principle

Browser Harness is a useful reminder that models are becoming better systems programmers. We should not assume every missing browser helper needs to become a permanent framework abstraction.

But production agent infrastructure has two jobs:

Give the model enough freedom to solve the task.
Prevent that freedom from becoming ambient authority.

For browser agents, that means the runtime should expose real browser control while preserving explicit execution boundaries. The model can plan, inspect, repair, and adapt. The infrastructure should authorize actions, verify outcomes, and leave evidence behind.

The principle

Give the model freedom of thought. Do not give it unbounded authority of execution.

Build browser agents with verification boundaries

Predicate Runtime gives browser agents structured snapshots, action traces, and post-execution predicates so automation failures become observable instead of silent.
