
Predicate × PydanticAI (Python): User Manual

This guide shows:

For Predicate users: how to use PydanticAI as the orchestration layer while keeping Predicate as the browser capability layer.
For PydanticAI users: how to add Predicate as your typed, reliable browser toolset.

PydanticAI docs: Pydantic AI

Use PydanticAI as the orchestration layer while keeping Predicate as the browser capability layer with typed tools, bounded context, and action verification.

What You Get
Installation
Integration Surface
Concept: Dependency Injection
Tool Reference
What Each Tool Is For
Quickstart: PydanticAI User Adds Predicate
Example: Typed Extraction
Example: Self-Correcting Click with Guard
Example: Navigate → Snapshot → Scroll → Click
Example: Clicking by Text Coordinates
Tracing & Observability
Troubleshooting

What You Get

Typed tools: Predicate returns structured data (elements with IDs/bboxes/roles), not raw HTML.
Bounded context by default: Predicate snapshot uses limit=50 by default.
Action + verification: use stable primitives (click, type_text, press_key) plus lightweight guards (verify_url_matches, verify_text_present) to build reliable flows.
Tracing: optional Predicate tracing works for both:
- local JSONL traces
- cloud traces (Pro/Enterprise, uploaded on tracer.close())

Installation

From the Python SDK:

pip install "sentienceapi[pydanticai]"

Integration Surface

Predicate provides a small integration layer:

PredicatePydanticDeps: deps container (DI) for PydanticAI
register_predicate_tools(agent): registers Predicate tools on your PydanticAI agent

Imports:

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools

Concept: Dependency Injection

PydanticAI passes dependencies through ctx.deps. We inject:

browser: AsyncPredicateBrowser
optionally tracer: sentience.tracing.Tracer

deps = PredicatePydanticDeps(browser=browser, tracer=tracer)
result = await agent.run("...", deps=deps)
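
To make the injection pattern concrete, here is a minimal sketch that runs without a browser or an LLM. `FakeBrowser`, `FakeDeps`, and `FakeRunContext` are hypothetical stand-ins for `AsyncPredicateBrowser`, `PredicatePydanticDeps`, and PydanticAI's run context; only the `ctx.deps` access pattern is taken from this guide.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


class FakeBrowser:
    """Hypothetical stand-in for AsyncPredicateBrowser (no real browser)."""

    def __init__(self) -> None:
        self.url = "about:blank"

    async def goto(self, url: str) -> None:
        self.url = url


@dataclass
class FakeDeps:
    """Hypothetical stand-in for PredicatePydanticDeps."""

    browser: FakeBrowser
    tracer: Optional[Any] = None  # tracing is optional


@dataclass
class FakeRunContext:
    """Mimics the one attribute the tools rely on: ctx.deps."""

    deps: FakeDeps


async def navigate(ctx: FakeRunContext, url: str) -> str:
    # A tool never builds its own browser; it uses the injected one.
    await ctx.deps.browser.goto(url)
    return ctx.deps.browser.url


async def demo() -> str:
    deps = FakeDeps(browser=FakeBrowser())
    return await navigate(FakeRunContext(deps=deps), "https://example.com")


final_url = asyncio.run(demo())
print(final_url)
```

Because every tool reads the browser from `ctx.deps`, the same agent definition can be reused across runs with different browser sessions.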

Tool Reference

Registered tools include:

Observe

| Tool | Description |
| --- | --- |
| `snapshot_state(limit=50, include_screenshot=False)` | Bounded `BrowserState(url, elements[])` |
| `read_page(format="text"\|"markdown"\|"raw")` | Returns `ReadResult` |

Act

| Tool | Description |
| --- | --- |
| `click(element_id)` | Click a specific element by ID |
| `type_text(element_id, text)` | Type text into element |
| `press_key(key)` | Send a keypress (e.g., "Enter") |
| `scroll_to(element_id, behavior, block)` | Scroll element into view |
| `navigate(url)` | Navigate to URL |
| `click_rect(x, y, width, height, button, click_count)` | Click by pixel coordinates |

Locate by Text

| Tool | Description |
| --- | --- |
| `find_text_rect(text, case_sensitive=False, whole_word=False, max_results=10)` | Find text coordinates on page |

Verify / Guard

| Tool | Description |
| --- | --- |
| `verify_url_matches(pattern)` | Check URL contains pattern |
| `verify_text_present(text, format, case_sensitive)` | Check text appears on page |
| `assert_eventually_url_matches(pattern, timeout_s, poll_s)` | Wait for URL to match pattern |

Notes:

Keep snapshot_state's limit at the default (50) unless you explicitly need more elements; larger snapshots put more tokens into the model's context.
type_text tracing intentionally avoids recording the full text payload to reduce accidental PII leakage.
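
As an illustration of the PII note above, a trace payload can record that typing happened without storing what was typed. This is only a sketch: the function name and the field names (`tool`, `element_id`, `text_length`) are assumptions, not the SDK's actual trace schema.

```python
from typing import Any, Dict


def type_text_trace_payload(element_id: int, text: str) -> Dict[str, Any]:
    """Build a trace payload that records the text length, not the text itself."""
    return {
        "tool": "type_text",
        "element_id": element_id,
        "text_length": len(text),  # assumed field name; the raw text is never stored
    }


payload = type_text_trace_payload(12, "hunter2-secret")
print(payload)
```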

What Each Tool Is For

Observe

`snapshot_state(...)`

What it does: takes a Predicate snapshot (bounded by limit, default 50) and returns a typed summary of interactive elements.
When to use it:
- You want element IDs to drive actions like click, type_text, scroll_to.
- You want a structured view of the UI (roles/text/bboxes) instead of parsing HTML.
Typical flow:
- call snapshot_state()
- pick an element by role/text (or ask the LLM to pick)
- act with the element id
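
The "pick an element by role/text" step can be sketched in plain Python. The `Element` record below is a hypothetical, simplified shape (real snapshot elements also carry bboxes and other fields), and `pick_element` is an illustrative helper, not part of the SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical, simplified element record."""

    id: int
    role: str
    text: str


def pick_element(elements: List[Element], role: str, text: str) -> Optional[Element]:
    """Return the first element with the given role whose text contains `text`."""
    needle = text.lower()
    for el in elements:
        if el.role == role and needle in el.text.lower():
            return el
    return None


elements = [
    Element(id=3, role="link", text="Pricing"),
    Element(id=7, role="button", text="Sign in"),
    Element(id=9, role="button", text="Sign up"),
]
target = pick_element(elements, role="button", text="sign in")
print(target.id if target else None)
```

The returned id is what you would then pass to click, type_text, or scroll_to.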

`read_page(...)`

What it does: extracts page content as text, markdown, or raw HTML.
When to use it:
- You are doing extraction ("what's the price / status / table row data?").
- You want to verify a text-based condition ("Order confirmed", "Error", etc.).
Recommended defaults:
- format="text" for simple checks
- format="markdown" for more structured extraction

Act

Async vs Sync (Important)

In this PydanticAI integration, all tools are async because they drive a live browser session and often wait for navigation/DOM updates.
Practically: the agent will call these tools as async tool calls. If you call the underlying functions yourself in your own code, you must use await (e.g., await browser.goto(...), await scroll_to_async(...)).
The core Predicate SDK also has sync equivalents (e.g. click(...), type_text(...), scroll_to(...), snapshot(...)) for non-PydanticAI usage, but the PydanticAI toolset is designed to be async-first.

`click(element_id)`

Clicks a specific element (by Predicate element id).
Use it after snapshot_state() when you have a target button/link.
Async: tool call is async (PydanticAI will await it internally).

`type_text(element_id, text)`

Types into a specific element (by id).
Use it for search boxes, forms, login fields, etc.
Async: tool call is async (PydanticAI will await it internally).

`press_key(key)`

Sends a keypress (e.g., "Enter", "Escape", "Tab").
Common pattern: type into a search box, then press_key("Enter").
Async: tool call is async (PydanticAI will await it internally).

`scroll_to(element_id, ...)`

Scrolls the element into view (useful when the next click fails because the element is off-screen).
Use it when:
- snapshot_state() contains your element but it's not in the viewport
- the page is long / content is lazy-loaded
Async: tool call is async (PydanticAI will await it internally).

`navigate(url)`

Navigates the browser to a URL (uses Playwright page.goto through AsyncPredicateBrowser.goto).
Use it at the start of a task or to force a known state.
Async: tool call is async (PydanticAI will await it internally).

`click_rect(x, y, width, height, ...)`

Clicks a rectangle by pixel coordinates. This is the "bridge" tool for when you found text coordinates (via find_text_rect) but don't have a stable element id.
Typical use: find_text_rect("Sign In") → click the first visible match's rectangle center.
Async: tool call is async (PydanticAI will await it internally).
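
For reference, the "rectangle center" mentioned above is simple arithmetic. The sketch below assumes `x` and `y` are the rectangle's top-left corner in pixels, which this guide does not state explicitly; `rect_center` is a hypothetical helper.

```python
from typing import Tuple


def rect_center(x: float, y: float, width: float, height: float) -> Tuple[float, float]:
    """Return the center of a rectangle given its top-left corner and size."""
    return (x + width / 2, y + height / 2)


center = rect_center(x=100, y=40, width=80, height=20)
print(center)
```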

Verify / Guard (How to Make Agents Reliable)

These are best used after an action to confirm the browser is now in the expected state.

`verify_url_matches(pattern)`

Use after navigation/click when the "success condition" is a URL change.
Example: after clicking "Checkout", verify the URL contains /checkout.

`verify_text_present(text, ...)`

Use when the success condition is a page message / label / heading.
Example: after submitting a form, verify "Thank you" appears.
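
The logic behind both guards can be sketched as plain string checks. This follows the tool reference, which describes verify_url_matches as "URL contains pattern"; the helper names and the case-insensitive default are assumptions made for illustration.

```python
def url_matches(url: str, pattern: str) -> bool:
    """Guard sketch: the URL passes if it contains the pattern."""
    return pattern in url


def text_present(page_text: str, text: str, case_sensitive: bool = False) -> bool:
    """Guard sketch: the page passes if the text appears in it."""
    if case_sensitive:
        return text in page_text
    return text.lower() in page_text.lower()


print(url_matches("https://shop.example/checkout?step=1", "/checkout"))
print(text_present("THANK YOU for your order", "Thank you"))
```

Both checks return a boolean, so the agent (or your own code) can branch on a failed guard rather than assume the action worked.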

`assert_eventually_url_matches(pattern, timeout_s=..., poll_s=...)`

What it does: retries verify_url_matches in a loop until:
- it passes, or
- timeout_s is reached.
How retry works:
- Every poll_s seconds, it re-checks the URL.
- This is ideal for async navigation / SPA transitions where URL updates are not immediate.
When to use it:
- clicking a link triggers a delayed navigation
- login redirects
- multi-step flows where you need a robust "wait until" without writing custom waits
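
The retry behaviour described above can be sketched as a generic polling loop. `eventually` and `FakePage` are hypothetical names; the real tool's internals may differ, but the timeout_s / poll_s semantics follow the description in this section.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def eventually(
    check: Callable[[], Awaitable[bool]], timeout_s: float, poll_s: float
) -> bool:
    """Re-run `check` every `poll_s` seconds until it passes or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(poll_s)


class FakePage:
    """Hypothetical page whose URL only 'matches' from the third check onward."""

    def __init__(self) -> None:
        self.checks = 0

    async def url_has_checkout(self) -> bool:
        self.checks += 1
        return self.checks >= 3


async def never() -> bool:
    return False


page = FakePage()
passed = asyncio.run(eventually(page.url_has_checkout, timeout_s=1.0, poll_s=0.01))
timed_out = asyncio.run(eventually(never, timeout_s=0.05, poll_s=0.01))
print(passed, page.checks, timed_out)
```

A short poll_s reacts faster to the URL change; a longer one makes fewer checks during slow redirects.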

Quickstart: PydanticAI User Adds Predicate

This is the minimal working pattern:

import asyncio
from pydantic import BaseModel
from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


class PageSummary(BaseModel):
  url: str
  headline: str


async def main():
  browser = AsyncPredicateBrowser(headless=False)
  await browser.start()
  await browser.goto("https://example.com")

  agent = Agent(
      "openai:gpt-5",
      deps_type=PredicatePydanticDeps,
      output_type=PageSummary,
      instructions="Use the Predicate tools to read the page and return a typed summary.",
  )
  register_predicate_tools(agent)

  deps = PredicatePydanticDeps(browser=browser)
  result = await agent.run("Return the url and the main headline.", deps=deps)
  print(result.output)

  await browser.close()


if __name__ == "__main__":
  asyncio.run(main())

Example: Typed Extraction

This pattern is ideal when you care about validated structured data.

See also: sdk-python/examples/pydantic_ai/pydantic_ai_typed_extraction.py

High-level approach:

use read_page(format="markdown") or read_page(format="text")
return a strict Pydantic model
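
A strict output model might look like the sketch below. `OrderStatus` and its fields are hypothetical; the point is that unexpected keys and out-of-range values are rejected instead of silently accepted. It uses only Pydantic v2 APIs (`ConfigDict`, `Field`, `model_validate`).

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class OrderStatus(BaseModel):
    """Hypothetical extraction target for an order confirmation page."""

    model_config = ConfigDict(extra="forbid")  # reject keys the model does not declare

    order_id: str
    status: str
    total: float = Field(gt=0)  # totals must be positive


ok = OrderStatus.model_validate(
    {"order_id": "A-1001", "status": "confirmed", "total": 42.5}
)

try:
    OrderStatus.model_validate(
        {"order_id": "A-1002", "status": "confirmed", "total": -1.0}
    )
    rejected = False
except ValidationError:
    rejected = True

print(ok, rejected)
```

You would pass such a model as `output_type=OrderStatus` when constructing the agent, as the Quickstart does with `PageSummary`.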

Example: Self-Correcting Click with Guard

See also: sdk-python/examples/pydantic_ai/pydantic_ai_self_correcting_click.py

Pattern:

snapshot_state() → find element ID
click(element_id)
assert_eventually_url_matches(...) to confirm the click really navigated
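
The click-then-guard loop can be sketched deterministically, without an LLM. In a real run the agent decides which candidate to try next and the guard would be assert_eventually_url_matches; here `FakeBrowser` and `click_with_guard` are hypothetical stand-ins, and the guard is a simple substring check.

```python
import asyncio
from typing import List, Optional


class FakeBrowser:
    """Hypothetical stand-in: only element 9 actually navigates."""

    def __init__(self) -> None:
        self.url = "https://shop.example/cart"

    async def click(self, element_id: int) -> None:
        if element_id == 9:
            self.url = "https://shop.example/checkout"


async def click_with_guard(
    browser: FakeBrowser, candidates: List[int], pattern: str
) -> Optional[int]:
    """Click candidates in order; return the first id whose click passes the guard."""
    for element_id in candidates:
        await browser.click(element_id)
        if pattern in browser.url:  # guard: did the click really navigate?
            return element_id
    return None


browser = FakeBrowser()
winner = asyncio.run(click_with_guard(browser, [4, 9], "/checkout"))
print(winner, browser.url)
```

The first candidate's click "succeeds" but changes nothing, which is exactly the case the guard is there to catch.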

Example: Navigate → Snapshot → Scroll → Click

This is a common "reliable interaction" sequence when the target element is off-screen:

navigate(url) to force a known starting state
snapshot_state() to get element IDs
scroll_to(element_id) to bring the target into view
click(element_id) to interact
optionally assert_eventually_url_matches(...) to confirm the state transition

Concrete (copy/paste) example:

import asyncio
from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


async def main():
  browser = AsyncPredicateBrowser(headless=False)
  await browser.start()

  agent = Agent(
      "openai:gpt-5",
      deps_type=PredicatePydanticDeps,
      output_type=str,
      instructions=(
          "Use these tools in order: "
          "navigate(url), snapshot_state(), scroll_to(element_id), click(element_id), "
          "then assert_eventually_url_matches(...) if navigation is expected."
      ),
  )
  register_predicate_tools(agent)

  deps = PredicatePydanticDeps(browser=browser)
  result = await agent.run(
      "Go to https://example.com, find a link, scroll to it if needed, click it, and confirm URL changed.",
      deps=deps,
  )
  print(result.output)

  await browser.close()


if __name__ == "__main__":
  asyncio.run(main())

Example: Clicking by Text Coordinates

Use find_text_rect("Sign In") when the best handle is visible text.

from pydantic_ai import Agent

# ... create browser + agent + register tools ...

# In your agent instructions, encourage:
# 1) find_text_rect("Sign In")
# 2) click_rect(...) using the returned coordinates

Concrete pattern:

call find_text_rect("Sign In")
pick the first match that's in_viewport
call click_rect(x=match.rect.x, y=match.rect.y, width=match.rect.width, height=match.rect.height)

Concrete (copy/paste) example (direct tool calls, no LLM decision-making):

import asyncio
from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


async def main():
  browser = AsyncPredicateBrowser(headless=False)
  await browser.start()
  await browser.goto("https://example.com")

  agent = Agent(
      "openai:gpt-5",
      deps_type=PredicatePydanticDeps,
      output_type=str,
      instructions="You may call Predicate tools, but the Python code will also demonstrate direct tool usage.",
  )
  tools = register_predicate_tools(agent)

  # Minimal stand-in for the run context; this demo assumes the tools only read ctx.deps.
  ctx = type("Ctx", (), {})()
  ctx.deps = PredicatePydanticDeps(browser=browser)

  # 1) Locate text on screen
  matches = await tools["find_text_rect"](ctx, "Sign In")
  if matches.status != "success" or not matches.results:
      raise RuntimeError(f"Text not found: {matches.error}")

  # 2) Click the first in-viewport match by rectangle
  m0 = next((m for m in matches.results if m.in_viewport), matches.results[0])
  await tools["click_rect"](
      ctx,
      x=m0.rect.x,
      y=m0.rect.y,
      width=m0.rect.width,
      height=m0.rect.height,
  )

  await browser.close()


if __name__ == "__main__":
  asyncio.run(main())

Notes:

Prefer element-id-based actions when possible (snapshot_state → click(element_id)), since it's usually more stable.
Use find_text_rect + click_rect when:
- the element isn't in the Predicate registry (or you're operating purely from rendered text)
- you need an immediate pixel-level click (e.g., canvas-like UIs)

Tracing & Observability

How Tracing Works

When you pass a tracer via PredicatePydanticDeps(..., tracer=tracer), each tool call emits structured trace events:

run_start — marks the beginning of an agent run
step_start — before each tool invocation
step_end — after each tool completes
error — when exceptions occur

This gives you a clean, replayable timeline of what the agent actually did in the browser, separate from PydanticAI's orchestration layer.
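
To show how such a timeline comes together, here is a small sketch that wraps tool calls and emits the four event types. The sink class, the `traced_call` helper, and the event field names (`type`, `tool`, `message`) are all assumptions for illustration, including the choice to emit no step_end after an error; the SDK's actual trace schema is not shown in this guide.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict, List


class ListSink:
    """Hypothetical in-memory sink; a JSONL sink would write one line per event."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def emit(self, event: Dict[str, Any]) -> None:
        self.lines.append(json.dumps(event))


async def traced_call(
    sink: ListSink, tool_name: str, tool: Callable[[], Awaitable[Any]]
) -> Any:
    """Emit step_start before the tool, then step_end on success or error on failure."""
    sink.emit({"type": "step_start", "tool": tool_name})
    try:
        result = await tool()
    except Exception as exc:
        sink.emit({"type": "error", "tool": tool_name, "message": str(exc)})
        raise
    sink.emit({"type": "step_end", "tool": tool_name})
    return result


async def ok_tool() -> str:
    return "clicked"


async def failing_tool() -> str:
    raise RuntimeError("element not found")


async def demo(sink: ListSink) -> None:
    sink.emit({"type": "run_start"})
    await traced_call(sink, "click", ok_tool)
    try:
        await traced_call(sink, "click", failing_tool)
    except RuntimeError:
        pass  # the error event is already recorded


sink = ListSink()
asyncio.run(demo(sink))
event_types = [json.loads(line)["type"] for line in sink.lines]
print(event_types)
```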

Local vs Cloud Tracing

Predicate tracing supports two modes:

Local tracing writes JSONL to disk (JsonlTraceSink) for debugging and development:

from predicate import create_tracer
from predicate.integrations.pydanticai import PredicatePydanticDeps

# Create local tracer
tracer = create_tracer(run_id="pydanticai-demo")
deps = PredicatePydanticDeps(browser=browser, tracer=tracer)

result = await agent.run("...", deps=deps)

# Always close to flush events
tracer.close()

Cloud tracing (Pro/Enterprise) buffers JSONL locally and uploads once on tracer.close():

from predicate import create_tracer
from predicate.integrations.pydanticai import PredicatePydanticDeps

# Create cloud tracer
tracer = create_tracer(
  api_key="sk_pro_...",
  upload_trace=True,
  goal="PydanticAI + Predicate run",
  agent_type="PydanticAI",
)
deps = PredicatePydanticDeps(browser=browser, tracer=tracer)

result = await agent.run("...", deps=deps)

# Uploads trace on close
tracer.close()

Orchestration vs Browser Tracing

Key insight: Your framework (PydanticAI) owns LLM orchestration, while Predicate owns browser execution + structured state.

You can (and often should) instrument both:

Use PydanticAI's built-in tracing/logging for agent decisions and LLM calls
Use Predicate tracing for browser actions and verification outcomes

This dual-layer observability gives you complete visibility into both what the agent decided and what it actually did in the browser.

Troubleshooting

| Issue | Solution |
| --- | --- |
| `window.sentience` is not available | Ensure the Predicate extension is loaded and injected into the Playwright session. |
| Tool calls succeed but nothing changes | Add guards: `verify_url_matches`, `verify_text_present`, and/or `assert_eventually_url_matches`. |
| Extraction is flaky | Prefer `read_page(format="markdown")` for extraction and keep `snapshot_state(limit=50)` for interaction targeting. |

Additional Resources

Last updated: January 2026

Browser-Use Integration

LangChain / LangGraph Integration
