Predicate × PydanticAI (Python): User Manual

This guide shows:

  • For Predicate users: how to use PydanticAI as the orchestration layer while keeping Predicate as the browser capability layer.
  • For PydanticAI users: how to add Predicate as your typed, reliable browser toolset.

PydanticAI docs: Pydantic AI

Use PydanticAI as the orchestration layer while keeping Predicate as the browser capability layer with typed tools, bounded context, and action verification.

Table of Contents

  1. What You Get
  2. Installation
  3. Integration Surface
  4. Concept: Dependency Injection
  5. Tool Reference
  6. What Each Tool Is For
  7. Quickstart: PydanticAI User Adds Predicate
  8. Example: Typed Extraction
  9. Example: Self-Correcting Click with Guard
  10. Example: Navigate → Snapshot → Scroll → Click
  11. Example: Clicking by Text Coordinates
  12. Tracing & Observability
  13. Troubleshooting

What You Get

  • Typed tools: Predicate returns structured data (elements with IDs/bboxes/roles), not raw HTML.
  • Bounded context by default: Predicate snapshot uses limit=50 by default.
  • Action + verification: use stable primitives (click, type_text, press_key) plus lightweight guards (verify_url_matches, verify_text_present) to build reliable flows.
  • Tracing: optional Predicate tracing works for both:
    • local JSONL traces
    • cloud traces (Pro/Enterprise, uploaded on tracer.close())

Installation

From the Python SDK:

pip install sentienceapi[pydanticai]

Integration Surface

Predicate provides a small integration layer:

  • PredicatePydanticDeps: deps container (DI) for PydanticAI
  • register_predicate_tools(agent): registers Predicate tools on your PydanticAI agent

Imports:

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools

Concept: Dependency Injection

PydanticAI passes dependencies through ctx.deps. We inject:

  • browser: AsyncPredicateBrowser
  • optionally tracer: sentience.tracing.Tracer

deps = PredicatePydanticDeps(browser=browser, tracer=tracer)
result = await agent.run("...", deps=deps)
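
To make the injection pattern concrete without a live browser, here is a minimal sketch of how a tool might read its dependencies from ctx.deps. FakeBrowser, FakeDeps, and FakeCtx are hypothetical stand-ins written for this illustration; they only mirror the shape described above and are not part of Predicate or PydanticAI.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for AsyncPredicateBrowser."""
    url: str = "about:blank"

    async def goto(self, url: str) -> None:
        self.url = url


@dataclass
class FakeDeps:
    """Mirrors the shape of the deps container: a browser plus an optional tracer."""
    browser: FakeBrowser
    tracer: Optional[Any] = None


@dataclass
class FakeCtx:
    """Hypothetical stand-in for the run context handed to each tool."""
    deps: FakeDeps


async def navigate(ctx: FakeCtx, url: str) -> str:
    # The tool does not construct a browser; it uses the injected one.
    await ctx.deps.browser.goto(url)
    return ctx.deps.browser.url


async def demo() -> str:
    deps = FakeDeps(browser=FakeBrowser())
    return await navigate(FakeCtx(deps=deps), "https://example.com")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the browser arrives through deps, the same tool code can run against a real session or a test double like the one above.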

Tool Reference

Registered tools include:

Observe

Tool                                                  Description
snapshot_state(limit=50, include_screenshot=False)    Bounded BrowserState(url, elements[])
read_page(format="text"|"markdown"|"raw")             Returns ReadResult

Act

Tool                                                    Description
click(element_id)                                       Click a specific element by ID
type_text(element_id, text)                             Type text into element
press_key(key)                                          Send a keypress (e.g., "Enter")
scroll_to(element_id, behavior, block)                  Scroll element into view
navigate(url)                                           Navigate to URL
click_rect(x, y, width, height, button, click_count)    Click by pixel coordinates

Locate by Text

Tool                                                                            Description
find_text_rect(text, case_sensitive=False, whole_word=False, max_results=10)    Find text coordinates on page

Verify / Guard

Tool                                                         Description
verify_url_matches(pattern)                                  Check URL contains pattern
verify_text_present(text, format, case_sensitive)            Check text appears on page
assert_eventually_url_matches(pattern, timeout_s, poll_s)    Wait for URL to match pattern

Notes:

  • Keep limit capped unless you explicitly need more.
  • type_text tracing intentionally avoids recording the full text payload to reduce accidental PII leakage.
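
The exact trace fields Predicate records for type_text are not listed in this guide. As a rough illustration of the idea, the sketch below builds a trace event that keeps only the length of the typed text. The function names and field names are hypothetical.

```python
def redact_text_payload(text: str) -> dict:
    """Describe typed text without storing it (hypothetical field names)."""
    return {"text_length": len(text), "text_redacted": True}


def build_type_text_event(element_id: int, text: str) -> dict:
    """Build a trace event for a type_text call that omits the raw payload."""
    return {"tool": "type_text", "element_id": element_id, **redact_text_payload(text)}


event = build_type_text_event(7, "hunter2")
```

An event shaped like this still shows which element received input and how much was typed, while a password or email address never reaches the trace file.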

What Each Tool Is For

Observe

snapshot_state(...)

  • What it does: takes a Predicate snapshot (bounded by limit, default 50) and returns a typed summary of interactive elements.
  • When to use it:
    • You want element IDs to drive actions like click, type_text, scroll_to.
    • You want a structured view of the UI (roles/text/bboxes) instead of parsing HTML.
  • Typical flow:
    • call snapshot_state()
    • pick an element by role/text (or ask the LLM to pick)
    • act with the element id

read_page(...)

  • What it does: extracts page content as text, markdown, or raw HTML.
  • When to use it:
    • You are doing extraction ("what's the price / status / table row data?").
    • You want to verify a text-based condition ("Order confirmed", "Error", etc.).
  • Recommended defaults:
    • format="text" for simple checks
    • format="markdown" for more structured extraction

Act

Async vs Sync (Important)

  • In this PydanticAI integration, all tools are async because they drive a live browser session and often wait for navigation/DOM updates.
  • Practically: the agent will call these tools as async tool calls. If you call the underlying functions yourself in your own code, you must use await (e.g., await browser.goto(...), await scroll_to_async(...)).
  • The core Predicate SDK also has sync equivalents (e.g. click(...), type_text(...), scroll_to(...), snapshot(...)) for non-PydanticAI usage, but the PydanticAI toolset is designed to be async-first.

click(element_id)

  • Clicks a specific element (by Predicate element id).
  • Use it after snapshot_state() when you have a target button/link.
  • Async: tool call is async (PydanticAI will await it internally).

type_text(element_id, text)

  • Types into a specific element (by id).
  • Use it for search boxes, forms, login fields, etc.
  • Async: tool call is async (PydanticAI will await it internally).

press_key(key)

  • Sends a keypress (e.g., "Enter", "Escape", "Tab").
  • Common pattern: type into a search box, then press_key("Enter").
  • Async: tool call is async (PydanticAI will await it internally).

scroll_to(element_id, ...)

  • Scrolls the element into view (useful when the next click fails because the element is off-screen).
  • Use it when:
    • snapshot_state() contains your element but it's not in the viewport
    • the page is long / content is lazy-loaded
  • Async: tool call is async (PydanticAI will await it internally).

navigate(url)

  • Navigates the browser to a URL (uses Playwright page.goto through AsyncPredicateBrowser.goto).
  • Use it at the start of a task or to force a known state.
  • Async: tool call is async (PydanticAI will await it internally).

click_rect(x, y, width, height, ...)

  • Clicks a rectangle by pixel coordinates. This is the "bridge" tool for when you found text coordinates (via find_text_rect) but don't have a stable element id.
  • Typical use: find_text_rect("Sign In") → click the first visible match's rectangle center.
  • Async: tool call is async (PydanticAI will await it internally).
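
If you need the click point yourself (for logging or debugging), the center of a match rectangle is simple arithmetic. The Rect class here is a hypothetical stand-in with the same x, y, width, and height fields that click_rect takes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Rect:
    """Hypothetical rectangle in page pixels (top-left origin)."""
    x: float
    y: float
    width: float
    height: float


def rect_center(rect: Rect) -> Tuple[float, float]:
    """Return the (x, y) point at the middle of the rectangle."""
    return (rect.x + rect.width / 2, rect.y + rect.height / 2)


center = rect_center(Rect(x=100, y=40, width=80, height=20))
```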

Verify / Guard (How to Make Agents Reliable)

These are best used after an action to confirm the browser is now in the expected state.

verify_url_matches(pattern)

  • Use after navigation/click when the "success condition" is a URL change.
  • Example: after clicking "Checkout", verify the URL contains /checkout.

verify_text_present(text, ...)

  • Use when the success condition is a page message / label / heading.
  • Example: after submitting a form, verify "Thank you" appears.

assert_eventually_url_matches(pattern, timeout_s=..., poll_s=...)

  • What it does: retries verify_url_matches in a loop until:
    • it passes, or
    • timeout_s is reached.
  • How retry works:
    • Every poll_s seconds, it re-checks the URL.
    • This is ideal for async navigation / SPA transitions where URL updates are not immediate.
  • When to use it:
    • clicking a link triggers a delayed navigation
    • login redirects
    • multi-step flows where you need a robust "wait until" without writing custom waits
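
The retry behavior described above can be sketched as a small polling loop. This is an illustration of the technique, not the SDK's implementation: eventually_url_matches and make_delayed_url are hypothetical names, and the match is a plain substring check, in line with "Check URL contains pattern" in the tool reference.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def eventually_url_matches(
    get_url: Callable[[], Awaitable[str]],
    pattern: str,
    timeout_s: float = 10.0,
    poll_s: float = 0.25,
) -> bool:
    """Re-check the URL every poll_s seconds until it matches or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if pattern in await get_url():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(poll_s)


def make_delayed_url(before: str, after: str, switch_after_calls: int) -> Callable[[], Awaitable[str]]:
    """Fake URL source that changes after a number of reads, like a slow redirect."""
    calls = 0

    async def get_url() -> str:
        nonlocal calls
        calls += 1
        return after if calls > switch_after_calls else before

    return get_url


async def demo() -> bool:
    get_url = make_delayed_url("https://shop.test/cart", "https://shop.test/checkout", 2)
    return await eventually_url_matches(get_url, "/checkout", timeout_s=1.0, poll_s=0.01)
```

A single immediate check would fail in this demo because the first two reads still return the old URL; polling until the deadline is what makes the guard tolerant of delayed navigation.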

Quickstart: PydanticAI User Adds Predicate

This is the minimal working pattern:

import asyncio
from pydantic import BaseModel
from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


class PageSummary(BaseModel):
  url: str
  headline: str


async def main():
  browser = AsyncPredicateBrowser(headless=False)
  await browser.start()
  await browser.page.goto("https://example.com")

  agent = Agent(
      "openai:gpt-5",
      deps_type=PredicatePydanticDeps,
      output_type=PageSummary,
      instructions="Use the Predicate tools to read the page and return a typed summary.",
  )
  register_predicate_tools(agent)

  deps = PredicatePydanticDeps(browser=browser)
  result = await agent.run("Return the url and the main headline.", deps=deps)
  print(result.output)

  await browser.close()


if __name__ == "__main__":
  asyncio.run(main())

Example: Typed Extraction

This pattern is ideal when you care about validated structured data.

See also: sdk-python/examples/pydantic_ai/pydantic_ai_typed_extraction.py

High-level approach:

  • use read_page(format="markdown") or read_page(format="text")
  • return a strict Pydantic model

Example: Self-Correcting Click with Guard

See also: sdk-python/examples/pydantic_ai/pydantic_ai_self_correcting_click.py

Pattern:

  • snapshot_state() → find element ID
  • click(element_id)
  • assert_eventually_url_matches(...) to confirm the click really navigated
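
The referenced example file is not reproduced here. As one possible reading of "self-correcting", the sketch below tries candidate element IDs in order and stops at the first click after which a guard passes. FakePage and click_until_guard_passes are hypothetical and use no Predicate APIs.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


async def click_until_guard_passes(
    candidate_ids: Iterable[int],
    click: Callable[[int], Awaitable[None]],
    guard: Callable[[], Awaitable[bool]],
) -> Optional[int]:
    """Click candidates in order; return the first ID whose click satisfies the guard."""
    for element_id in candidate_ids:
        await click(element_id)
        if await guard():
            return element_id
    return None


class FakePage:
    """Hypothetical page where only element 2 triggers navigation."""

    def __init__(self) -> None:
        self.url = "https://shop.test/cart"

    async def click(self, element_id: int) -> None:
        if element_id == 2:
            self.url = "https://shop.test/checkout"

    async def on_checkout(self) -> bool:
        return "/checkout" in self.url


async def demo() -> Optional[int]:
    page = FakePage()
    return await click_until_guard_passes([1, 2], page.click, page.on_checkout)
```

In an agent run, the LLM plays the role of this loop: a failed guard is the signal to take a new snapshot and choose a different element.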

Example: Navigate → Snapshot → Scroll → Click

This is a common "reliable interaction" sequence when the target element is off-screen:

  • navigate(url) to force a known starting state
  • snapshot_state() to get element IDs
  • scroll_to(element_id) to bring the target into view
  • click(element_id) to interact
  • optionally assert_eventually_url_matches(...) to confirm the state transition

Concrete (copy/paste) example:

import asyncio
from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


async def main():
  browser = AsyncPredicateBrowser(headless=False)
  await browser.start()

  agent = Agent(
      "openai:gpt-5",
      deps_type=PredicatePydanticDeps,
      output_type=str,
      instructions=(
          "Use these tools in order: "
          "navigate(url), snapshot_state(), scroll_to(element_id), click(element_id), "
          "then assert_eventually_url_matches(...) if navigation is expected."
      ),
  )
  register_predicate_tools(agent)

  deps = PredicatePydanticDeps(browser=browser)
  result = await agent.run(
      "Go to https://example.com, find a link, scroll to it if needed, click it, and confirm URL changed.",
      deps=deps,
  )
  print(result.output)

  await browser.close()


if __name__ == "__main__":
  asyncio.run(main())

Example: Clicking by Text Coordinates

Use find_text_rect("Sign In") when the best handle is visible text.

from pydantic_ai import Agent

# ... create browser + agent + register tools ...

# In your agent instructions, encourage:
# 1) find_text_rect("Sign In")
# 2) click_rect(...) using the returned coordinates

Concrete pattern:

  • call find_text_rect("Sign In")
  • pick the first match that's in_viewport
  • call click_rect(x=match.rect.x, y=match.rect.y, width=match.rect.width, height=match.rect.height)

Concrete (copy/paste) example (direct tool calls, no LLM decision-making):

import asyncio
from pydantic_ai import Agent

from predicate import AsyncPredicateBrowser
from predicate.integrations.pydanticai import PredicatePydanticDeps, register_predicate_tools


async def main():
  browser = AsyncPredicateBrowser(headless=False)
  await browser.start()
  await browser.goto("https://example.com")

  agent = Agent(
      "openai:gpt-5",
      deps_type=PredicatePydanticDeps,
      output_type=str,
      instructions="You may call Predicate tools, but the Python code will also demonstrate direct tool usage.",
  )
  tools = register_predicate_tools(agent)

  # Minimal stand-in for the run context: it only needs to expose .deps here.
  ctx = type("Ctx", (), {})()
  ctx.deps = PredicatePydanticDeps(browser=browser)

  # 1) Locate text on screen
  matches = await tools["find_text_rect"](ctx, "Sign In")
  if matches.status != "success" or not matches.results:
      raise RuntimeError(f"Text not found: {matches.error}")

  # 2) Click the first in-viewport match by rectangle
  m0 = next((m for m in matches.results if m.in_viewport), matches.results[0])
  await tools["click_rect"](
      ctx,
      x=m0.rect.x,
      y=m0.rect.y,
      width=m0.rect.width,
      height=m0.rect.height,
  )

  await browser.close()


if __name__ == "__main__":
  asyncio.run(main())

Notes:

  • Prefer element-id-based actions when possible (snapshot_state → click(element_id)), since they're usually more stable.
  • Use find_text_rect + click_rect when:
    • the element isn't in the Predicate registry (or you're operating purely from rendered text)
    • you need an immediate pixel-level click (e.g., canvas-like UIs)

Tracing & Observability

How Tracing Works

When you pass a tracer via PredicatePydanticDeps(..., tracer=tracer), each tool call emits structured trace events:

  • run_start — marks the beginning of an agent run
  • step_start — before each tool invocation
  • step_end — after each tool completes
  • error — when exceptions occur

This gives you a clean, replayable timeline of what the agent actually did in the browser, separate from PydanticAI's orchestration layer.
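
To show what such a timeline could look like on disk, here is a minimal sketch that writes the four event types as JSONL. The record fields (run_id, seq, type, ts, data) and the JsonlSketchTracer class are assumptions made for illustration; the real trace schema may differ.

```python
import io
import json
import time
from typing import Any, Callable, TextIO


class JsonlSketchTracer:
    """Hypothetical tracer that appends one JSON object per line."""

    def __init__(self, stream: TextIO, run_id: str) -> None:
        self.stream = stream
        self.run_id = run_id
        self.seq = 0

    def emit(self, event_type: str, **data: Any) -> None:
        self.seq += 1
        record = {
            "run_id": self.run_id,
            "seq": self.seq,
            "type": event_type,
            "ts": time.time(),
            "data": data,
        }
        self.stream.write(json.dumps(record) + "\n")


def traced_call(tracer: JsonlSketchTracer, tool_name: str, fn: Callable[..., Any], *args: Any) -> Any:
    """Wrap a tool call with step_start / step_end, emitting error on failure."""
    tracer.emit("step_start", tool=tool_name)
    try:
        result = fn(*args)
    except Exception as exc:
        tracer.emit("error", tool=tool_name, message=str(exc))
        raise
    tracer.emit("step_end", tool=tool_name)
    return result


buffer = io.StringIO()
tracer = JsonlSketchTracer(buffer, run_id="pydanticai-demo")
tracer.emit("run_start")
traced_call(tracer, "verify_url_matches", lambda url: "/checkout" in url, "https://shop.test/checkout")
event_types = [json.loads(line)["type"] for line in buffer.getvalue().splitlines()]
```

One JSON object per line means a trace can be read back and replayed line by line, even if the run was interrupted partway through.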

Local vs Cloud Tracing

Predicate tracing supports two modes:

Local tracing writes JSONL to disk (JsonlTraceSink) for debugging and development:

from predicate import create_tracer
from predicate.integrations.pydanticai import PredicatePydanticDeps

# Create local tracer
tracer = create_tracer(run_id="pydanticai-demo")
deps = PredicatePydanticDeps(browser=browser, tracer=tracer)

result = await agent.run("...", deps=deps)

# Always close to flush events
tracer.close()

Cloud tracing (Pro/Enterprise) buffers JSONL locally and uploads once on tracer.close():

from predicate import create_tracer
from predicate.integrations.pydanticai import PredicatePydanticDeps

# Create cloud tracer
tracer = create_tracer(
  api_key="sk_pro_...",
  upload_trace=True,
  goal="PydanticAI + Predicate run",
  agent_type="PydanticAI",
)
deps = PredicatePydanticDeps(browser=browser, tracer=tracer)

result = await agent.run("...", deps=deps)

# Uploads trace on close
tracer.close()

Orchestration vs Browser Tracing

Key insight: Your framework (PydanticAI) owns LLM orchestration, while Predicate owns browser execution + structured state.

You can (and often should) instrument both:

  • Use PydanticAI's built-in tracing/logging for agent decisions and LLM calls
  • Use Predicate tracing for browser actions and verification outcomes

This dual-layer observability gives you complete visibility into both what the agent decided and what it actually did in the browser.


Troubleshooting

Issue                                     Solution
window.sentience is not available         Ensure the Predicate extension is loaded and injected into the Playwright session.
Tool calls succeed but nothing changes    Add guards: verify_url_matches, verify_text_present, and/or assert_eventually_url_matches.
Extraction is flaky                       Prefer read_page(format="markdown") for extraction and keep snapshot_state(limit=50) for interaction targeting.

Last updated: January 2026