
Snapshot API

The snapshot() function captures the current rendered page state and returns a ranked, token-bounded set of interactive elements, plus runtime signals you can use for Jest-style verification: layout/ordinality (including dominant_group_key), modal/overlay detection (modal_detected, modal_grids), and diagnostics (stability/confidence, reason codes, and best-effort CAPTCHA / “requires vision” signals).

Basic Usage

from predicate import snapshot, SnapshotOptions, SnapshotFilter

# Basic snapshot (uses default options).
# `browser` is assumed to be an existing PredicateBrowser instance.
snap = snapshot(browser)

Important Notes

Credit Consumption:

When api_key is provided, snapshot() calls the server-side /v1/snapshot endpoint, which consumes 1 credit per call (metered billing).
Use use_api=False for local processing (no credits; ranking is best-effort/local-only).

Gateway timeout (API mode):

When use_api=true, the SDK sends raw_elements to the Gateway (POST /v1/snapshot) and waits for a refined response.
By default, the SDK uses a 30s timeout for this Gateway round trip.
If you see client-side timeouts on large/heavy pages (e.g. ReadTimeout), increase the Gateway timeout:
- Python: SnapshotOptions.gateway_timeout_s (seconds)
- TypeScript: SnapshotOptions.gatewayTimeoutMs (milliseconds)

Payload Size Limit:

The snapshot payload sent to the server is capped at 10MB to ensure reliable API performance.
If your page has more elements than fit in 10MB, use the limit option to reduce the number of elements, or use use_api=False for local processing (no size limit).

Screenshots:

If you pass screenshot: true, the screenshot is captured locally by the extension.
Even in use_api=true mode, the SDK does not receive screenshots from the server; it merges the server-ranked elements with the locally captured screenshot.

Parameters

Python:

browser (PredicateBrowser): Browser instance
options (SnapshotOptions, optional): Snapshot configuration options

SnapshotOptions fields:

screenshot (bool | ScreenshotConfig, optional): Capture screenshot. True for PNG, or {"format": "jpeg", "quality": 80}. Default: False.
limit (int, optional): Maximum number of elements to return. Default: 50. Range: 1-500 (SDK). In API mode, the server caps this value (default cap: 100).
filter (SnapshotFilter | dict, optional): Filter options:
- min_area: Minimum element area in pixels
- allowed_roles: List of roles to include (e.g., ["button", "link"])
- min_z_index: Minimum z-index value
use_api (bool, optional): Force server API (True) or local extension (False). Auto-detects if None.
gateway_timeout_s (float, optional): Gateway snapshot timeout in seconds (only relevant when use_api=true). Default: 30.
show_overlay (bool, optional): Display visual overlay in browser highlighting detected elements. Default: False.
goal (str, optional): Optional goal/task description for ML reranking.
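
The interaction between the SDK range for limit (1-500) and the server cap in API mode (default 100) can be sketched as follows. This is only an illustration: effective_limit is a hypothetical helper, not part of the SDK, and it assumes out-of-range values are clamped, whereas the SDK may instead reject them with a validation error.

```python
def effective_limit(requested, use_api, sdk_min=1, sdk_max=500, server_cap=100):
    """Estimate how many elements a snapshot can return for a requested limit.

    Assumption: out-of-range values are clamped to the SDK range (1-500);
    in API mode the server cap (default 100) is then applied on top.
    """
    clamped = max(sdk_min, min(requested, sdk_max))
    return min(clamped, server_cap) if use_api else clamped


print(effective_limit(250, use_api=True))   # 100
print(effective_limit(250, use_api=False))  # 250
```

In practice this means a limit above the server cap only has an effect in local mode.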

TypeScript:

browser (PredicateBrowser): Browser instance
options (object, optional):
- screenshot (boolean | object): Capture screenshot
- limit (number): Maximum elements to return
- filter (object): Filter options
- use_api (boolean): Force server API or local extension
- gatewayTimeoutMs (number): Gateway snapshot timeout in milliseconds (only relevant when use_api=true). Default: 30000.
- show_overlay (boolean): Display visual overlay (default: false)
- goal (string, optional): Optional goal/task description for ML reranking

Example: increase Gateway timeout

from predicate import snapshot, SnapshotOptions

# Large pages can take longer to refine server-side.
snap = snapshot(
    browser,
    SnapshotOptions(use_api=True, gateway_timeout_s=60),
)

Returns

Snapshot object with:

elements: List of Element objects (sorted by importance)
url: Current page URL
viewport: Viewport dimensions
timestamp: Snapshot timestamp
screenshot: Base64-encoded image (if requested)
dominant_group_key: Geometric group key for the main content area (may be null)
diagnostics: Stability/debug diagnostics (may be null)
modal_detected: True if a modal/overlay grid was detected (may be null)
modal_grids: Detected modal grids (may be null)
ml_rerank: ML reranking metadata (may be null)
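
Since screenshot is returned as a Base64-encoded string, saving it to disk requires a decode step. The sketch below uses only the standard library. Whether the string carries a data-URL prefix (data:image/png;base64,...) is not stated here, so the hypothetical helper decode_screenshot tolerates both forms.

```python
import base64


def decode_screenshot(b64_image):
    """Return raw image bytes from a Base64 string.

    Assumption: the value may or may not start with a data-URL prefix;
    if it does, everything up to the first comma is dropped.
    """
    if b64_image.startswith("data:") and "," in b64_image:
        b64_image = b64_image.split(",", 1)[1]
    return base64.b64decode(b64_image)


# Stand-in for snap.screenshot: the 8-byte PNG signature, Base64-encoded.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(png_signature).decode("ascii")
print(decode_screenshot(encoded) == png_signature)  # True
```

With a real snapshot, the returned bytes could be written to a file opened in "wb" mode.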

Diagnostics (`snapshot.diagnostics`)

diagnostics is best-effort runtime evidence about page stability and “how trustworthy the snapshot is right now”.

Use it to:

decide whether to retry (.eventually() / bounded retries)
explain failures (add the reason codes to artifacts/logs)
detect non-DOM blockers (CAPTCHA signals)
decide whether to escalate to a different executor when structure is insufficient

Diagnostics Fields

Field	Type	How to use it
`confidence`	`number \| null`	A 0..1 stability score. Low confidence typically means the page is still moving (navigation, hydration, modals, DOM churn). Use it as a signal to retry snapshots before acting.
`reasons`	`string[]`	Machine-readable reason codes explaining low confidence. Log these and include them in artifacts—this is often the fastest way to debug flaky runs.
`metrics`	`object \| null`	Best-effort browser-side metrics used to compute confidence. Useful for diagnosing “why was this unstable?” and for telemetry dashboards.
`captcha`	`object \| null`	Detection-only CAPTCHA signal (no solving). Use it to branch to your CAPTCHA handling strategy or fail fast with a clear reason.
`requires_vision`	`boolean \| null`	Best-effort recommendation that structure may be insufficient for this page state (e.g., heavy canvas / non-semantic UI). Use it as an escalation signal.
`requires_vision_reason`	`string \| null`	Human-readable explanation for why structure is likely insufficient. Include it in traces/artifacts to make failures explainable.
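
A bounded retry driven by confidence might look like the sketch below. It is not the SDK's retry mechanism: is_stable and snapshot_when_stable are hypothetical helpers, the 0.7 threshold is an arbitrary assumption, and SimpleNamespace objects stand in for real snapshots so the example runs without a browser.

```python
import time
from types import SimpleNamespace


def is_stable(diagnostics, min_confidence=0.7):
    """Treat missing diagnostics or confidence as 'unknown' rather than unstable."""
    if diagnostics is None or diagnostics.confidence is None:
        return True
    return diagnostics.confidence >= min_confidence


def snapshot_when_stable(take_snapshot, max_attempts=3, delay_s=0.5, min_confidence=0.7):
    """Call take_snapshot() until diagnostics look stable or attempts run out."""
    snap = None
    for attempt in range(max_attempts):
        snap = take_snapshot()
        if is_stable(snap.diagnostics, min_confidence):
            break
        if attempt < max_attempts - 1:
            time.sleep(delay_s)
    return snap


# Stand-in snapshots: the first is unstable, the second is stable.
# "example_reason" is a placeholder, not a real reason code.
fake_snaps = iter([
    SimpleNamespace(diagnostics=SimpleNamespace(confidence=0.2, reasons=["example_reason"])),
    SimpleNamespace(diagnostics=SimpleNamespace(confidence=0.9, reasons=[])),
])
snap = snapshot_when_stable(lambda: next(fake_snaps), delay_s=0)
print(snap.diagnostics.confidence)  # 0.9
```

With the real SDK, take_snapshot would be something like lambda: snapshot(browser). If the last attempt is still unstable, the helper returns that snapshot anyway, so callers can log its reasons.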

Diagnostics Metrics (`diagnostics.metrics`)

Metric	Meaning
`ready_state`	Document readyState (e.g., `"loading"`, `"interactive"`, `"complete"`).
`quiet_ms`	How long the page has been “quiet” (no major DOM churn), in milliseconds (best-effort).
`node_count`	Approximate DOM node count (best-effort). Useful for “page exploded” diagnostics.
`interactive_count`	How many interactive candidates were detected (best-effort).
`raw_elements_count`	How many raw elements were captured before filtering (best-effort).
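
The metrics above can be combined into a simple "has the page settled?" check. The sketch below is one possible heuristic, not how the SDK computes confidence: page_settled is a hypothetical helper and the 500 ms quiet threshold is an assumption to tune per application.

```python
from types import SimpleNamespace


def page_settled(metrics, min_quiet_ms=500):
    """Return True/False when metrics allow a decision, or None when they are missing."""
    if metrics is None or metrics.ready_state is None or metrics.quiet_ms is None:
        return None
    return metrics.ready_state == "complete" and metrics.quiet_ms >= min_quiet_ms


# Stand-ins for snap.diagnostics.metrics
loading = SimpleNamespace(ready_state="interactive", quiet_ms=40)
settled = SimpleNamespace(ready_state="complete", quiet_ms=1200)
print(page_settled(loading), page_settled(settled))  # False True
```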

CAPTCHA Diagnostics (`diagnostics.captcha`)

CAPTCHA diagnostics are detection-only signals:

Field	Meaning
`detected`	True if a CAPTCHA-like pattern was detected.
`provider_hint`	Best-effort provider hint (may be null).
`confidence`	0..1 confidence of detection.
`evidence`	Best-effort evidence hits (text/selector/iframe/url) to make detections explainable.
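
Because these signals are detection-only, the caller decides what to do with them. The sketch below shows one way to branch: captcha_blocked is a hypothetical helper, and the 0.5 confidence threshold is an assumption rather than an SDK default.

```python
from types import SimpleNamespace


def captcha_blocked(diagnostics, min_confidence=0.5):
    """Return True when a CAPTCHA was detected with at least min_confidence."""
    captcha = getattr(diagnostics, "captcha", None)
    if captcha is None or not captcha.detected:
        return False
    # A missing confidence is treated as 0.0 (assumption).
    return (captcha.confidence or 0.0) >= min_confidence


# Stand-in for snap.diagnostics
diag = SimpleNamespace(
    captcha=SimpleNamespace(detected=True, provider_hint=None, confidence=0.8, evidence=None)
)
if captcha_blocked(diag):
    print("CAPTCHA detected; handing off to CAPTCHA strategy")
```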

Element Properties

Each element in snapshot.elements has the following properties:

Core Properties

Property	Type	Description
`id`	`int`	Unique identifier for clicking/interacting
`role`	`str`	Semantic role (button, link, textbox, heading, etc.)
`text`	`str \| None`	Visible text content
`importance`	`int`	AI importance score (0-1000, higher = more important)
`bbox`	`BBox`	Bounding box with x, y, width, height
`visual_cues`	`VisualCues`	Visual analysis (is_primary, is_clickable, background_color_name)
`in_viewport`	`bool`	Whether element is visible in current viewport
`is_occluded`	`bool`	Whether element is covered by another element
`z_index`	`int`	CSS z-index value (default: 0)
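
As an illustration of how the core properties combine, the sketch below picks elements that are likely actionable right now: the right role, inside the viewport, and not occluded. The Element dataclass is a minimal stand-in for the SDK's element type (only a few fields are modelled), and actionable is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Minimal stand-in for an SDK element; only some core fields are modelled."""
    id: int
    role: str
    text: Optional[str]
    importance: int
    in_viewport: bool = True
    is_occluded: bool = False


def actionable(elements, roles=("button", "link")):
    """Keep visible, unobstructed elements with a matching role, most important first."""
    hits = [e for e in elements if e.role in roles and e.in_viewport and not e.is_occluded]
    return sorted(hits, key=lambda e: e.importance, reverse=True)


elements = [
    Element(1, "button", "Sign in", 900),
    Element(2, "link", "Help", 300),
    Element(3, "button", "Hidden", 950, is_occluded=True),
    Element(4, "heading", "Welcome", 500),
]
print([e.id for e in actionable(elements)])  # [1, 2]
```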

ML Reranking Properties (Optional)

These fields are present when goal is provided in SnapshotOptions:

Property	Type	Description
`fused_rank_index`	`int \| None`	0-based rank after sorting by `importance_fused`
`heuristic_index`	`int \| None`	0-based rank before ML reranking (original heuristic position)
`ml_probability`	`float \| None`	Confidence score from ONNX model (0.0 - 1.0)
`ml_score`	`float \| None`	Raw logit score from ONNX model (for debugging)

Ordinal / Layout Properties (Optional)

These fields support position-based selection ("first result", "top item"):

Property	Type	Description
`center_x`	`float \| None`	X coordinate of element center (viewport coords)
`center_y`	`float \| None`	Y coordinate of element center (viewport coords)
`doc_y`	`float \| None`	Y coordinate in document (center_y + scroll_y)
`group_key`	`str \| None`	Geometric bucket key for ordinal grouping
`group_index`	`int \| None`	Position within group (0-indexed, sorted by doc_y)
`in_dominant_group`	`bool \| None`	Whether element is in the dominant group (main content area)
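
A selection like "first result" can be expressed with in_dominant_group and group_index. The sketch below uses SimpleNamespace stand-ins for elements; nth_in_dominant_group is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace


def nth_in_dominant_group(elements, n=0):
    """Return the n-th element (0-based) of the dominant group, or None if absent."""
    group = [e for e in elements if e.in_dominant_group and e.group_index is not None]
    group.sort(key=lambda e: e.group_index)
    return group[n] if 0 <= n < len(group) else None


# Stand-ins for snap.elements (listed in importance order, not layout order)
results = [
    SimpleNamespace(id=7, text="Second result", in_dominant_group=True, group_index=1),
    SimpleNamespace(id=3, text="Sidebar link", in_dominant_group=False, group_index=0),
    SimpleNamespace(id=5, text="First result", in_dominant_group=True, group_index=0),
]
first = nth_in_dominant_group(results)
print(first.text)  # First result
```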

State-Aware Assertion Properties (Optional)

These fields enable Jest-style assertions for form controls:

Property	Type	Description
`name`	`str \| None`	Accessible name/label for controls (distinct from visible text)
`value`	`str \| None`	Current value for inputs/textarea/select (may be redacted for PII)
`input_type`	`str \| None`	Input type (e.g., "text", "email", "password")
`value_redacted`	`bool \| None`	Whether value was redacted for privacy (password/email/tel)
`checked`	`bool \| None`	Normalized checked state for checkboxes/radios
`disabled`	`bool \| None`	Normalized disabled state
`expanded`	`bool \| None`	Normalized expanded state for dropdowns/accordions
`aria_checked`	`str \| None`	Raw ARIA checked string (tri-state: "true"/"false"/"mixed")
`aria_disabled`	`str \| None`	Raw ARIA disabled string
`aria_expanded`	`str \| None`	Raw ARIA expanded string
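
The normalized and raw fields can disagree in what they are able to express: checked is a boolean, while aria_checked can also be "mixed". The sketch below shows one way to reconcile them for an assertion. The precedence chosen here ("mixed" first, then checked, then aria_checked) is an assumption, and checked_state is a hypothetical helper.

```python
from types import SimpleNamespace


def checked_state(element):
    """Return 'checked', 'unchecked', 'mixed', or None if the state is unknown."""
    if element.aria_checked == "mixed":
        return "mixed"
    if element.checked is not None:
        return "checked" if element.checked else "unchecked"
    if element.aria_checked in ("true", "false"):
        return "checked" if element.aria_checked == "true" else "unchecked"
    return None


# Stand-in for a checkbox element from snap.elements
terms_box = SimpleNamespace(role="checkbox", checked=True, aria_checked="true", disabled=False)
assert checked_state(terms_box) == "checked"
assert not terms_box.disabled
print(checked_state(terms_box))  # checked
```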

Additional Properties (Optional)

Property	Type	Description
`href`	`str \| None`	Hyperlink URL (for link elements)
`nearby_text`	`str \| None`	Nearby static text (best-effort, usually for top-ranked elements)
`diff_status`	`str \| None`	Diff status: "ADDED", "REMOVED", "MODIFIED", "MOVED" (for diff overlay)

Visual Overlay Feature

When show_overlay=True, Predicate displays a visual overlay in the browser highlighting all detected elements:

Color Coding:

Red: Target element (when specified in agent actions)
Blue: Primary elements (is_primary=true)
Green: Regular interactive elements

Visual Indicators:

Border thickness and opacity scale with importance score
Semi-transparent fill for better visibility
Importance badges showing scores
Star icon for primary elements
Target emoji for the target element
Auto-clear: Overlay automatically disappears after 5 seconds

Use Cases:

Debugging: Visualize what elements Predicate detects on the page
Learning: Understand how importance scoring works
Validation: Verify that critical buttons/links are being detected
Analysis: See which elements rank highest for your use case

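# Example: Debug why a button isn't being clicked
import time

from predicate import snapshot, SnapshotOptions

# `browser` is an existing PredicateBrowser instance
browser.goto("https://example.com")
snap = snapshot(browser, SnapshotOptions(show_overlay=True))  # See what's detected
time.sleep(6)  # Wait until the overlay auto-clears (after 5 seconds)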

ML Reranking (Goal-Based Optimization)

When you provide a goal in SnapshotOptions, the server uses an ONNX-based machine learning model to rerank elements by relevance to that goal. This is intended to improve element selection accuracy for agent tasks.

ML Rerank Metadata (`snapshot.ml_rerank`)

When ML reranking is enabled, snapshot.ml_rerank provides best-effort metadata about what happened in the server-side rerank pass.

Field	Type	Meaning
`enabled`	`boolean`	Whether ML reranking was enabled for this snapshot.
`applied`	`boolean`	Whether reranking actually ran (may be false if conditions were not met).
`reason`	`string \| null`	Why reranking was applied or skipped (best-effort).
`candidate_count`	`number`	How many elements were considered for reranking.
`top_probability`	`number \| null`	Confidence of the top-ranked element (0..1).
`min_confidence`	`number \| null`	Confidence threshold used (if any).
`is_high_confidence`	`boolean \| null`	True if top probability meets the high-confidence threshold.
`tags`	`string[]`	Internal labels for debugging and analysis.
`error`	`string \| null`	Error message if reranking failed (best-effort).
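
The metadata fields above can be folded into a single "should I trust the ML ordering?" check. The sketch below is one conservative policy, not SDK behavior: trust_ml_ranking is a hypothetical helper, and SimpleNamespace stands in for snapshot.ml_rerank.

```python
from types import SimpleNamespace


def trust_ml_ranking(ml_rerank):
    """Trust the ML order only if reranking ran, was high-confidence, and had no error."""
    if ml_rerank is None:
        return False
    return bool(ml_rerank.applied and ml_rerank.is_high_confidence and not ml_rerank.error)


# Stand-ins for snapshot.ml_rerank
confident = SimpleNamespace(enabled=True, applied=True, is_high_confidence=True, error=None)
skipped = SimpleNamespace(enabled=True, applied=False, is_high_confidence=None, error=None)
print(trust_ml_ranking(confident), trust_ml_ranking(skipped))  # True False
```

When the check fails, a caller could fall back to the heuristic order or log reason for later analysis.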

# Trigger ML reranking by providing a goal
snap = snapshot(browser, SnapshotOptions(
    goal="Click the login button",
    limit=50
))

# Elements are now sorted by ML relevance, not just heuristic importance
for element in snap.elements[:5]:
    print(element.id, element.role, element.text, element.ml_probability)

When ML fields are present:

Present: when goal is provided in SnapshotOptions
Present: when using agent methods like agent.act() (goals are passed automatically)
Absent: when goal is not specified (elements are ranked by heuristic importance only)

What the fields mean:

fused_rank_index: Final position after ML + heuristic fusion (0 = most relevant to goal)
heuristic_index: Original position before ML (shows how much ML changed the ranking)
ml_probability: Model's confidence that this element is relevant (0.0-1.0)
ml_score: Raw logit score before softmax (useful for debugging model behavior)
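
To see how much ML changed the ranking for a given element, compare heuristic_index with fused_rank_index. The sketch below uses SimpleNamespace stand-ins for elements; rank_shift is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace


def rank_shift(element):
    """Positions gained through ML reranking (positive = promoted), or None if unknown."""
    if element.heuristic_index is None or element.fused_rank_index is None:
        return None
    return element.heuristic_index - element.fused_rank_index


# Stand-ins for elements returned with a goal set
login = SimpleNamespace(text="Log in", heuristic_index=4, fused_rank_index=0)
banner = SimpleNamespace(text="Accept cookies", heuristic_index=0, fused_rank_index=3)
print(rank_shift(login), rank_shift(banner))  # 4 -3
```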

Ordinality & Layout

Query API