Predicate × PydanticAI (Python): User Manual
This guide shows:
- For Predicate users: how to use PydanticAI as the orchestration layer while keeping Predicate as the browser capability layer.
- For PydanticAI users: how to add Predicate as your typed, reliable browser toolset.
PydanticAI docs: Pydantic AI
Use PydanticAI as the orchestration layer while keeping Predicate as the browser capability layer with typed tools, bounded context, and action verification.
Table of Contents
- What You Get
- Installation
- Integration Surface
- Concept: Dependency Injection
- Tool Reference
- What Each Tool Is For
- Quickstart: PydanticAI User Adds Predicate
- Example: Typed Extraction
- Example: Self-Correcting Click with Guard
- Example: Navigate → Snapshot → Scroll → Click
- Example: Clicking by Text Coordinates
- Tracing & Observability
- Troubleshooting
What You Get
- Typed tools: Predicate returns structured data (elements with IDs/bboxes/roles), not raw HTML.
- Bounded context by default: Predicate `snapshot` uses `limit=50` by default.
- Action + verification: use stable primitives (`click`, `type_text`, `press_key`) plus lightweight guards (`verify_url_matches`, `verify_text_present`) to build reliable flows.
- Tracing: optional Predicate tracing works for both:
  - local JSONL traces
  - cloud traces (Pro/Enterprise, uploaded on `tracer.close()`)
Installation
From the Python SDK:
```bash
# Quotes keep shells such as zsh from treating the brackets as a glob pattern.
pip install "sentienceapi[pydanticai]"
```
Integration Surface
Predicate provides a small integration layer:
- `PredicatePydanticDeps`: deps container (DI) for PydanticAI
- `register_predicate_tools(agent)`: registers Predicate tools on your PydanticAI agent
Imports:
```python
from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools
```
Concept: Dependency Injection
PydanticAI passes dependencies through `ctx.deps`. We inject:
- `browser`: `AsyncPredicateBrowser`
- optionally, `tracer`: `sentience.tracing.Tracer`
```python
# Inside an async function, after creating `browser`, `tracer`, and `agent`:
deps = PredicatePydanticDeps(browser=browser, tracer=tracer)
result = await agent.run("...", deps=deps)
```
Tool Reference
Registered tools include:
Observe
| Tool | Description |
|---|---|
| `snapshot_state(limit=50, include_screenshot=False)` | Bounded `BrowserState(url, elements[])` |
| `read_page(format="text")` | Returns `ReadResult`; `format` is `"text"`, `"markdown"`, or `"raw"` |
Act
| Tool | Description |
|---|---|
| `click(element_id)` | Click a specific element by ID |
| `type_text(element_id, text)` | Type text into an element |
| `press_key(key)` | Send a keypress (e.g., `"Enter"`) |
| `scroll_to(element_id, behavior, block)` | Scroll an element into view |
| `navigate(url)` | Navigate to a URL |
| `click_rect(x, y, width, height, button, click_count)` | Click by pixel coordinates |
Locate by Text
| Tool | Description |
|---|---|
| `find_text_rect(text, case_sensitive=False, whole_word=False, max_results=10)` | Find text coordinates on the page |
Verify / Guard
| Tool | Description |
|---|---|
| `verify_url_matches(pattern)` | Check that the current URL contains `pattern` |
| `verify_text_present(text, format, case_sensitive)` | Check that `text` appears on the page |
| `assert_eventually_url_matches(pattern, timeout_s, poll_s)` | Wait until the URL matches `pattern`, re-checking every `poll_s` seconds up to `timeout_s` |
Notes:
- Keep `limit` capped unless you explicitly need more.
- `type_text` tracing intentionally avoids recording the full `text` payload to reduce accidental PII leakage.
What Each Tool Is For
Observe
snapshot_state(...)
- What it does: takes a Predicate snapshot (bounded by `limit`, default 50) and returns a typed summary of interactive elements.
- When to use it:
  - You want element IDs to drive actions like `click`, `type_text`, `scroll_to`.
  - You want a structured view of the UI (roles/text/bboxes) instead of parsing HTML.
- Typical flow:
  - call `snapshot_state()`
  - pick an element by role/text (or ask the LLM to pick)
  - act with the element id
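The "pick an element by role/text" step is plain data filtering once you have a snapshot. The sketch below uses a hypothetical stand-in `Element` type with `id`, `role`, and `text` fields; real snapshot elements carry IDs, roles, text, and bboxes, but the exact field names here are assumptions, not the Predicate schema.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for one entry of a snapshot's element list.
# Field names are assumptions, not the Predicate schema.
@dataclass
class Element:
    id: int
    role: str
    text: str


def pick_element(elements: list[Element], role: str, text: str) -> Optional[Element]:
    """Return the first element with `role` whose text contains `text` (case-insensitive)."""
    needle = text.casefold()
    for el in elements:
        if el.role == role and needle in el.text.casefold():
            return el
    return None


elements = [
    Element(id=3, role="link", text="Docs"),
    Element(id=7, role="button", text="Add to cart"),
    Element(id=9, role="button", text="Checkout"),
]
target = pick_element(elements, role="button", text="checkout")
print(target.id if target else None)  # 9
```

The returned `id` is what you would pass to `click(element_id)`. Returning `None` instead of guessing leaves room to re-snapshot or fall back to text-based location.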
read_page(...)
- What it does: extracts page content as text, markdown, or raw HTML.
- When to use it:
- You are doing extraction ("what's the price / status / table row data?").
- You want to verify a text-based condition ("Order confirmed", "Error", etc.).
- Recommended defaults:
  - `format="text"` for simple checks
  - `format="markdown"` for more structured extraction
Act
Async vs Sync (Important)
- In this PydanticAI integration, all tools are `async` because they drive a live browser session and often wait for navigation/DOM updates.
- Practically: the agent calls these tools as async tool calls. If you call the underlying functions yourself in your own code, you must use `await` (e.g., `await browser.goto(...)`, `await scroll_to_async(...)`).
- The core Predicate SDK also has sync equivalents (e.g., `click(...)`, `type_text(...)`, `scroll_to(...)`, `snapshot(...)`) for non-PydanticAI usage, but the PydanticAI toolset is designed to be async-first.
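The difference between calling and awaiting an async tool can be seen with a stand-in coroutine. `fake_click` below is hypothetical and only mimics the shape of an async browser tool; it is not part of Predicate.

```python
import asyncio
import inspect


# Hypothetical stand-in for an async browser tool (not the Predicate API).
async def fake_click(element_id: int) -> str:
    await asyncio.sleep(0)  # yield to the event loop, as real browser I/O would
    return f"clicked {element_id}"


async def main() -> str:
    pending = fake_click(42)  # calling without await only creates a coroutine object
    assert inspect.iscoroutine(pending)
    return await pending  # awaiting is what actually runs the tool body


result = asyncio.run(main())
print(result)  # clicked 42
```

Forgetting the `await` is a common bug: nothing happens in the browser, and Python typically reports a "coroutine ... was never awaited" `RuntimeWarning`.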
click(element_id)
- Clicks a specific element (by Predicate element id).
- Use it after `snapshot_state()` when you have a target button/link.
- Async: the tool call is async (PydanticAI will `await` it internally).
type_text(element_id, text)
- Types into a specific element (by id).
- Use it for search boxes, forms, login fields, etc.
- Async: the tool call is async (PydanticAI will `await` it internally).
press_key(key)
- Sends a keypress (e.g., `"Enter"`, `"Escape"`, `"Tab"`).
- Common pattern: type into a search box, then `press_key("Enter")`.
- Async: the tool call is async (PydanticAI will `await` it internally).
scroll_to(element_id, ...)
- Scrolls the element into view (useful when the next click fails because the element is off-screen).
- Use it when:
  - `snapshot_state()` contains your element but it's not in the viewport
  - the page is long / content is lazy-loaded
- Async: the tool call is async (PydanticAI will `await` it internally).
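Deciding whether a scroll is needed comes down to comparing the element's bounding box with the viewport. A minimal sketch, assuming a bbox of `(x, y, width, height)` in CSS pixels relative to the current viewport and a known viewport size; the `BBox` type and both helpers are hypothetical, not part of Predicate.

```python
from dataclasses import dataclass


# Hypothetical bbox type; coordinates are assumed to be relative to the viewport.
@dataclass
class BBox:
    x: float
    y: float
    width: float
    height: float


def fully_in_viewport(box: BBox, viewport_width: float, viewport_height: float) -> bool:
    """True if the whole box lies inside the visible viewport."""
    return (
        box.x >= 0
        and box.y >= 0
        and box.x + box.width <= viewport_width
        and box.y + box.height <= viewport_height
    )


def needs_scroll(box: BBox, viewport_width: float = 1280, viewport_height: float = 720) -> bool:
    # If any part of the target is off-screen, scroll_to(element_id) before clicking.
    return not fully_in_viewport(box, viewport_width, viewport_height)


print(needs_scroll(BBox(100, 200, 120, 40)))  # False
print(needs_scroll(BBox(100, 1900, 120, 40)))  # True
```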
navigate(url)
- Navigates the browser to a URL (uses Playwright `page.goto` through `AsyncPredicateBrowser.goto`).
- Use it at the start of a task or to force a known state.
- Async: the tool call is async (PydanticAI will `await` it internally).
click_rect(x, y, width, height, ...)
- Clicks a rectangle by pixel coordinates. This is the "bridge" tool for when you found text coordinates (via `find_text_rect`) but don't have a stable element id.
- Typical use: `find_text_rect("Sign In")` → click the first visible match's rectangle center.
- Async: the tool call is async (PydanticAI will `await` it internally).
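The "first visible match's rectangle center" step can be sketched with stand-in types whose fields (`rect.x`, `rect.y`, `rect.width`, `rect.height`, `in_viewport`) mirror the ones used in the text-coordinate example in this guide. The classes and helpers themselves are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins shaped like find_text_rect results.
@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float


@dataclass
class TextMatch:
    text: str
    rect: Rect
    in_viewport: bool


def first_visible(matches: list[TextMatch]) -> Optional[TextMatch]:
    """Return the first match inside the viewport, or None if there is none."""
    return next((m for m in matches if m.in_viewport), None)


def center(rect: Rect) -> tuple[float, float]:
    # The midpoint is the safest single point to click inside a text box.
    return rect.x + rect.width / 2, rect.y + rect.height / 2


matches = [
    TextMatch("Sign In", Rect(40, 1500, 60, 20), in_viewport=False),
    TextMatch("Sign In", Rect(900, 16, 64, 24), in_viewport=True),
]
m = first_visible(matches)
print(center(m.rect) if m else None)  # (932.0, 28.0)
```

This only shows the arithmetic; how `click_rect` picks the exact point inside the rectangle it receives is not specified here.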
Verify / Guard (How to Make Agents Reliable)
These are best used after an action to confirm the browser is now in the expected state.
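Both guards boil down to simple checks on the current browser state. A stand-in sketch: `url_matches` follows the "URL contains pattern" description from the tool table, while `text_present` and its case-insensitive default are assumptions about typical text matching, not Predicate's implementation.

```python
def url_matches(current_url: str, pattern: str) -> bool:
    # Mirrors the documented semantics of verify_url_matches: substring containment.
    return pattern in current_url


def text_present(page_text: str, text: str, case_sensitive: bool = False) -> bool:
    # Hypothetical text check; normalizes case unless case_sensitive is set.
    if case_sensitive:
        return text in page_text
    return text.casefold() in page_text.casefold()


print(url_matches("https://shop.example.com/checkout?step=1", "/checkout"))  # True
print(text_present("THANK YOU for your order!", "Thank you"))  # True
print(text_present("THANK YOU for your order!", "Thank you", case_sensitive=True))  # False
```

The value of these checks comes from where they sit: immediately after `click`, `navigate`, or a form submit, so a silent no-op is caught at the step that caused it.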
verify_url_matches(pattern)
- Use after navigation/click when the "success condition" is a URL change.
- Example: after clicking "Checkout", verify the URL contains `/checkout`.
verify_text_present(text, ...)
- Use when the success condition is a page message / label / heading.
- Example: after submitting a form, verify `"Thank you"` appears.
assert_eventually_url_matches(pattern, timeout_s=..., poll_s=...)
- What it does: retries `verify_url_matches` in a loop until:
  - it passes, or
  - `timeout_s` is reached.
- How retry works:
  - Every `poll_s` seconds, it re-checks the URL.
  - This is ideal for async navigation / SPA transitions where URL updates are not immediate.
- When to use it:
  - clicking a link triggers a delayed navigation
  - login redirects
  - multi-step flows where you need a robust "wait until" without writing custom waits
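The retry behaviour described above can be sketched as a generic async poll loop. This is an illustration, not Predicate's implementation: `get_url` is a hypothetical async callable standing in for reading the browser's current URL, and the match rule is the substring check that `verify_url_matches` is documented to perform.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def eventually_url_matches(
    get_url: Callable[[], Awaitable[str]],
    pattern: str,
    timeout_s: float = 10.0,
    poll_s: float = 0.25,
) -> bool:
    """Re-check the URL every poll_s seconds until it contains pattern or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if pattern in await get_url():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(poll_s)


async def demo() -> tuple[bool, bool]:
    # Hypothetical SPA: the URL only changes on the third read, then stays put.
    urls = iter(["https://example.com/", "https://example.com/", "https://example.com/dashboard"])
    last = {"url": ""}

    async def fake_get_url() -> str:
        last["url"] = next(urls, last["url"])
        return last["url"]

    ok = await eventually_url_matches(fake_get_url, "/dashboard", timeout_s=2, poll_s=0.01)
    timed_out = await eventually_url_matches(fake_get_url, "/settings", timeout_s=0.05, poll_s=0.01)
    return ok, timed_out


print(asyncio.run(demo()))  # (True, False)
```

Using `time.monotonic()` keeps the deadline immune to wall-clock changes, and checking before sleeping means an already-satisfied condition returns immediately.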
Quickstart: PydanticAI User Adds Predicate
This is the minimal working pattern:
```python
import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


class PageSummary(BaseModel):
    url: str
    headline: str


async def main():
    browser = AsyncPredicateBrowser(headless=False)
    await browser.start()
    try:
        await browser.page.goto("https://example.com")

        agent = Agent(
            "openai:gpt-5",
            deps_type=PredicatePydanticDeps,
            output_type=PageSummary,
            instructions="Use the Predicate tools to read the page and return a typed summary.",
        )
        register_predicate_tools(agent)

        deps = PredicatePydanticDeps(browser=browser)
        result = await agent.run("Return the url and the main headline.", deps=deps)
        print(result.output)
    finally:
        # Always release the browser, even if the agent run raises.
        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```
Example: Typed Extraction
This pattern is ideal when you care about validated structured data.
See also: sdk-python/examples/pydantic_ai/pydantic_ai_typed_extraction.py
High-level approach:
- use `read_page(format="markdown")` or `read_page(format="text")`
- return a strict Pydantic model
Example: Self-Correcting Click with Guard
See also: sdk-python/examples/pydantic_ai/pydantic_ai_self_correcting_click.py
Pattern:
- `snapshot_state()` → find element ID
- `click(element_id)`
- `assert_eventually_url_matches(...)` to confirm the click really navigated
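One way to read "self-correcting" is: if the guard fails, take a fresh snapshot and try the next candidate instead of repeating the same click. The sketch below is hypothetical; `snapshot`, `click`, and `verify` are stand-in async callables rather than the Predicate tool signatures, and the referenced example file may implement the pattern differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def click_until_verified(
    snapshot: Callable[[], Awaitable[list[int]]],
    click: Callable[[int], Awaitable[None]],
    verify: Callable[[], Awaitable[bool]],
    max_attempts: int = 3,
) -> Optional[int]:
    """Click candidate element IDs until verify() passes; return the ID that worked."""
    tried: set[int] = set()
    for _ in range(max_attempts):
        # Re-observe each attempt so stale IDs from a previous page state are not reused.
        candidates = [eid for eid in await snapshot() if eid not in tried]
        if not candidates:
            return None
        eid = candidates[0]
        tried.add(eid)
        await click(eid)
        if await verify():
            return eid
    return None


async def demo() -> Optional[int]:
    state = {"url": "https://example.com/"}

    async def fake_snapshot() -> list[int]:
        return [11, 12, 13]  # candidate IDs, best guess first

    async def fake_click(eid: int) -> None:
        if eid == 12:  # only element 12 really navigates
            state["url"] = "https://example.com/checkout"

    async def fake_verify() -> bool:
        return "/checkout" in state["url"]

    return await click_until_verified(fake_snapshot, fake_click, fake_verify)


print(asyncio.run(demo()))  # 12
```

Bounding the loop with `max_attempts` keeps a misbehaving page from turning one click into an unbounded series of actions.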
Example: Navigate → Snapshot → Scroll → Click
This is a common "reliable interaction" sequence when the target element is off-screen:
- `navigate(url)` to force a known starting state
- `snapshot_state()` to get element IDs
- `scroll_to(element_id)` to bring the target into view
- `click(element_id)` to interact
- optionally, `assert_eventually_url_matches(...)` to confirm the state transition
Concrete (copy/paste) example:
```python
import asyncio

from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


async def main():
    browser = AsyncPredicateBrowser(headless=False)
    await browser.start()
    try:
        agent = Agent(
            "openai:gpt-5",
            deps_type=PredicatePydanticDeps,
            output_type=str,
            instructions=(
                "Use these tools in order: "
                "navigate(url), snapshot_state(), scroll_to(element_id), click(element_id), "
                "then assert_eventually_url_matches(...) if navigation is expected."
            ),
        )
        register_predicate_tools(agent)

        deps = PredicatePydanticDeps(browser=browser)
        result = await agent.run(
            "Go to https://example.com, find a link, scroll to it if needed, click it, and confirm URL changed.",
            deps=deps,
        )
        print(result.output)
    finally:
        # Close the browser even if the run fails.
        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```
Example: Clicking by Text Coordinates
Use find_text_rect("Sign In") when the best handle is visible text.
```python
from pydantic_ai import Agent
# ... create browser + agent + register tools ...
# In your agent instructions, encourage:
# 1) find_text_rect("Sign In")
# 2) click_rect(...) using the returned coordinates
```
Concrete pattern:
- call `find_text_rect("Sign In")`
- pick the first match that's `in_viewport`
- call `click_rect(x=match.rect.x, y=match.rect.y, width=match.rect.width, height=match.rect.height)`
Concrete (copy/paste) example (direct tool calls, no LLM decision-making):
```python
import asyncio
from types import SimpleNamespace

from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


async def main():
    browser = AsyncPredicateBrowser(headless=False)
    await browser.start()
    try:
        # example.com has no "Sign In" text; point this at a page that does,
        # otherwise the RuntimeError below is raised.
        await browser.goto("https://example.com")

        agent = Agent(
            "openai:gpt-5",
            deps_type=PredicatePydanticDeps,
            output_type=str,
            instructions="You may call Predicate tools, but the Python code will also demonstrate direct tool usage.",
        )
        tools = register_predicate_tools(agent)

        # Minimal stand-in for PydanticAI's run context that only exposes `.deps`.
        ctx = SimpleNamespace(deps=PredicatePydanticDeps(browser=browser))

        # 1) Locate text on screen
        matches = await tools["find_text_rect"](ctx, "Sign In")
        if matches.status != "success" or not matches.results:
            raise RuntimeError(f"Text not found: {matches.error}")

        # 2) Click the first in-viewport match (else the first match) by rectangle
        m0 = next((m for m in matches.results if m.in_viewport), matches.results[0])
        await tools["click_rect"](
            ctx,
            x=m0.rect.x,
            y=m0.rect.y,
            width=m0.rect.width,
            height=m0.rect.height,
        )
    finally:
        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```
Notes:
- Prefer element-id-based actions when possible (`snapshot_state` → `click(element_id)`), since they are usually more stable.
- Use `find_text_rect` + `click_rect` when:
  - the element isn't in the Predicate registry (or you're operating purely from rendered text)
  - you need an immediate pixel-level click (e.g., canvas-like UIs)
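The preference order in these notes can be encoded as a small decision helper: use an element ID when the snapshot has a matching element, and only fall back to a visible text rectangle otherwise. Everything below is a hypothetical sketch with stand-in types; the returned tuples just name which tool you would call next.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Element:  # hypothetical snapshot entry
    id: int
    text: str


@dataclass
class TextHit:  # hypothetical find_text_rect result
    x: float
    y: float
    width: float
    height: float
    in_viewport: bool


Plan = Optional[Union[tuple[str, int], tuple[str, TextHit]]]


def plan_click(label: str, elements: list[Element], hits: list[TextHit]) -> Plan:
    # 1) Prefer a stable element ID from the snapshot.
    for el in elements:
        if el.text.strip().casefold() == label.casefold():
            return ("click", el.id)
    # 2) Fall back to a visible text rectangle.
    for hit in hits:
        if hit.in_viewport:
            return ("click_rect", hit)
    # 3) Nothing usable: re-observe instead of guessing.
    return None


print(plan_click("Sign In", [Element(5, " Sign in ")], []))  # ('click', 5)
print(plan_click("Sign In", [], [TextHit(900, 16, 64, 24, in_viewport=True)])[0])  # click_rect
print(plan_click("Sign In", [], []))  # None
```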
Tracing & Observability
How Tracing Works
When you pass a tracer via PredicatePydanticDeps(..., tracer=tracer), each tool call emits structured trace events:
- `run_start`: marks the beginning of an agent run
- `step_start`: before each tool invocation
- `step_end`: after each tool completes
- `error`: when exceptions occur
This gives you a clean, replayable timeline of what the agent actually did in the browser, separate from PydanticAI's orchestration layer.
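To make the event timeline concrete, here is a stand-in tracer that writes one JSON object per line for the four event types listed above, including the `type_text` redaction mentioned in the tool notes. The class, the field names (`type`, `run_id`, `ts`, `step`, `tool`, `args`), and the redaction rule are assumptions for illustration; this is not Predicate's trace schema or its `Tracer` class.

```python
import asyncio
import io
import json
import time
from typing import Any, Awaitable, Callable, TextIO


class ToyJsonlTracer:
    """Hypothetical JSONL tracer (not Predicate's Tracer): one JSON event per line."""

    def __init__(self, sink: TextIO, run_id: str) -> None:
        self.sink = sink
        self.run_id = run_id
        self.step = 0

    def emit(self, event_type: str, **fields: Any) -> None:
        record = {"type": event_type, "run_id": self.run_id, "ts": time.time(), **fields}
        self.sink.write(json.dumps(record) + "\n")


def redact(tool: str, args: dict[str, Any]) -> dict[str, Any]:
    # Keep only the length of typed text so secrets never reach the trace.
    if tool == "type_text" and "text" in args:
        return {**args, "text": f"<redacted {len(args['text'])} chars>"}
    return args


async def traced_call(
    tracer: ToyJsonlTracer, tool: str, fn: Callable[..., Awaitable[Any]], **args: Any
) -> Any:
    """Wrap one tool invocation with step_start / step_end, or error on failure."""
    tracer.step += 1
    tracer.emit("step_start", step=tracer.step, tool=tool, args=redact(tool, args))
    try:
        result = await fn(**args)
    except Exception as exc:
        tracer.emit("error", step=tracer.step, tool=tool, error=repr(exc))
        raise
    tracer.emit("step_end", step=tracer.step, tool=tool)
    return result


async def fake_type_text(element_id: int, text: str) -> None:
    await asyncio.sleep(0)  # stand-in for real browser work


async def demo() -> list[dict[str, Any]]:
    buf = io.StringIO()
    tracer = ToyJsonlTracer(buf, run_id="demo")
    tracer.emit("run_start")
    await traced_call(tracer, "type_text", fake_type_text, element_id=4, text="hunter2")
    return [json.loads(line) for line in buf.getvalue().splitlines()]


events = asyncio.run(demo())
print([e["type"] for e in events])  # ['run_start', 'step_start', 'step_end']
print(events[1]["args"]["text"])  # <redacted 7 chars>
```

With Predicate itself you don't write any of this: passing `tracer=` to `PredicatePydanticDeps` is what enables tracing, as described above.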
Local vs Cloud Tracing
Predicate tracing supports two modes:
Local tracing writes JSONL to disk (JsonlTraceSink) for debugging and development:
```python
from predicate import create_tracer
from predicate.integrations.pydanticai import PredicatePydanticDeps

# Create local tracer
tracer = create_tracer(run_id="pydanticai-demo")

# Inside an async function; `browser` and `agent` are created as in the Quickstart.
deps = PredicatePydanticDeps(browser=browser, tracer=tracer)
try:
    result = await agent.run("...", deps=deps)
finally:
    # Always close to flush events, even if the run fails
    tracer.close()
```
Cloud tracing (Pro/Enterprise) buffers JSONL locally and uploads once on `tracer.close()`:
```python
from predicate import create_tracer
from predicate.integrations.pydanticai import PredicatePydanticDeps

# Create cloud tracer
tracer = create_tracer(
    api_key="sk_pro_...",
    upload_trace=True,
    goal="PydanticAI + Predicate run",
    agent_type="PydanticAI",
)

deps = PredicatePydanticDeps(browser=browser, tracer=tracer)
try:
    result = await agent.run("...", deps=deps)
finally:
    # Uploads trace on close
    tracer.close()
```
Orchestration vs Browser Tracing
Key insight: Your framework (PydanticAI) owns LLM orchestration, while Predicate owns browser execution + structured state.
You can (and often should) instrument both:
- Use PydanticAI's built-in tracing/logging for agent decisions and LLM calls
- Use Predicate tracing for browser actions and verification outcomes
This dual-layer observability gives you complete visibility into both what the agent decided and what it actually did in the browser.
Troubleshooting
| Issue | Solution |
|---|---|
| `window.sentience` is not available | Ensure the Predicate extension is loaded and injected into the Playwright session. |
| Tool calls succeed but nothing changes | Add guards: verify_url_matches, verify_text_present, and/or assert_eventually_url_matches. |
| Extraction is flaky | Prefer read_page(format="markdown") for extraction and keep snapshot_state(limit=50) for interaction targeting. |
Last updated: January 2026