Snapshot API

The snapshot() function captures the current rendered page state and returns a ranked, token-bounded set of interactive elements, plus runtime signals you can use for Jest-style verification: layout/ordinality (including dominant_group_key), modal/overlay detection (modal_detected, modal_grids), and diagnostics (stability/confidence, reason codes, and best-effort CAPTCHA / “requires vision” signals).

Basic Usage

from predicate import snapshot, SnapshotOptions, SnapshotFilter

# Basic snapshot (uses default options)
snap = snapshot(browser)  # browser: an existing PredicateBrowser instance

Important Notes

Credit Consumption:

  • When api_key is provided, this calls the server-side /v1/snapshot endpoint which consumes 1 credit per call (metered billing).
  • Use use_api=False for local processing (no credits; ranking is best-effort/local-only).

Gateway timeout (API mode):

  • When use_api=true, the SDK sends raw_elements to the Gateway (POST /v1/snapshot) and waits for a refined response.
  • By default, the SDK uses a 30s timeout for this Gateway round trip.
  • If you see client-side timeouts on large/heavy pages (e.g. ReadTimeout), increase the Gateway timeout:
    • Python: SnapshotOptions.gateway_timeout_s (seconds)
    • TypeScript: SnapshotOptions.gatewayTimeoutMs (milliseconds)

Payload Size Limit:

  • The snapshot payload sent to the server is capped at 10MB to ensure reliable API performance.
  • If your page has more elements than fit in 10MB, use the limit option to reduce the number of elements, or use use_api=False for local processing (no size limit).

Screenshots:

  • If you pass screenshot: true, the screenshot is captured locally by the extension.
  • Even in use_api=true mode, the SDK does not receive screenshots from the server; it merges the server-ranked elements with the locally captured screenshot.

Parameters

Python:

  • browser (PredicateBrowser): Browser instance
  • options (SnapshotOptions, optional): Snapshot configuration options

SnapshotOptions fields:

  • screenshot (bool | ScreenshotConfig, optional): Capture screenshot. True for PNG, or {"format": "jpeg", "quality": 80}. Default: False.
  • limit (int, optional): Maximum number of elements to return. Default: 50. Range: 1-500 (SDK). In API mode, the server caps this value (default cap: 100).
  • filter (SnapshotFilter | dict, optional): Filter options:
    • min_area: Minimum element area in pixels
    • allowed_roles: List of roles to include (e.g., ["button", "link"])
    • min_z_index: Minimum z-index value
  • use_api (bool, optional): Force server API (True) or local extension (False). Auto-detects if None.
  • gateway_timeout_s (float, optional): Gateway snapshot timeout in seconds (only relevant when use_api=true). Default: 30.
  • show_overlay (bool, optional): Display visual overlay in browser highlighting detected elements. Default: False.
  • goal (str, optional): Optional goal/task description for ML reranking.

TypeScript:

  • browser (PredicateBrowser): Browser instance
  • options (object, optional):
    • screenshot (boolean | object): Capture screenshot
    • limit (number): Maximum elements to return
    • filter (object): Filter options
    • use_api (boolean): Force server API or local extension
    • gatewayTimeoutMs (number): Gateway snapshot timeout in milliseconds (only relevant when use_api=true). Default: 30000.
    • show_overlay (boolean): Display visual overlay (default: false)
    • goal (string, optional): Optional goal/task description for ML reranking
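To make the three filter fields (min_area, allowed_roles, min_z_index) concrete, here is a rough local sketch of the selection they describe, written against plain dictionaries rather than the SDK. The helper name apply_filter, the dictionary shape, and the inclusive thresholds are assumptions for illustration; the SDK's own filtering may differ in detail.

```python
from typing import Any, Dict, List, Optional, Sequence


def apply_filter(
    elements: Sequence[Dict[str, Any]],
    min_area: Optional[float] = None,
    allowed_roles: Optional[Sequence[str]] = None,
    min_z_index: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Keep elements that pass every filter that is set (None = not applied)."""
    kept = []
    for el in elements:
        # Area is derived from the bounding box (assumed: width * height).
        area = el["bbox"]["width"] * el["bbox"]["height"]
        if min_area is not None and area < min_area:
            continue
        if allowed_roles is not None and el["role"] not in allowed_roles:
            continue
        if min_z_index is not None and el.get("z_index", 0) < min_z_index:
            continue
        kept.append(el)
    return kept


# Hypothetical elements shaped like the documented element properties.
sample = [
    {"id": 1, "role": "button", "bbox": {"x": 0, "y": 0, "width": 120, "height": 40}, "z_index": 0},
    {"id": 2, "role": "link", "bbox": {"x": 0, "y": 50, "width": 8, "height": 8}, "z_index": 0},
    {"id": 3, "role": "heading", "bbox": {"x": 0, "y": 90, "width": 300, "height": 30}, "z_index": 5},
]
kept_ids = [el["id"] for el in apply_filter(sample, min_area=100, allowed_roles=["button", "link"])]
print(kept_ids)
```

In real use you would pass the same three values through SnapshotOptions.filter and let the SDK do the filtering; the sketch only shows which elements such a filter is meant to keep.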

Example: increase Gateway timeout

from predicate import snapshot, SnapshotOptions

# Large pages can take longer to refine server-side.
snap = snapshot(
    browser,
    SnapshotOptions(use_api=True, gateway_timeout_s=60),
)

Returns

Snapshot object with:

  • elements: List of Element objects (sorted by importance)
  • url: Current page URL
  • viewport: Viewport dimensions
  • timestamp: Snapshot timestamp
  • screenshot: Base64-encoded image (if requested)
  • dominant_group_key: Geometric group key for the main content area (may be null)
  • diagnostics: Stability/debug diagnostics (may be null)
  • modal_detected: True if a modal/overlay grid was detected (may be null)
  • modal_grids: Detected modal grids (may be null)
  • ml_rerank: ML reranking metadata (may be null)

Diagnostics (snapshot.diagnostics)

diagnostics is best-effort runtime evidence about page stability and “how trustworthy the snapshot is right now”.

Use it to:

  • decide whether to retry (.eventually() / bounded retries)
  • explain failures (add the reason codes to artifacts/logs)
  • detect non-DOM blockers (CAPTCHA signals)
  • decide whether to escalate to a different executor when structure is insufficient

Diagnostics Fields

Field                   Type            How to use it
confidence              number | null   A 0..1 stability score. Low confidence typically means the page is still moving (navigation, hydration, modals, DOM churn). Use it as a signal to retry snapshots before acting.
reasons                 string[]        Machine-readable reason codes explaining low confidence. Log these and include them in artifacts; this is often the fastest way to debug flaky runs.
metrics                 object | null   Best-effort browser-side metrics used to compute confidence. Useful for diagnosing “why was this unstable?” and for telemetry dashboards.
captcha                 object | null   Detection-only CAPTCHA signal (no solving). Use it to branch to your CAPTCHA handling strategy or fail fast with a clear reason.
requires_vision         boolean | null  Best-effort recommendation that structure may be insufficient for this page state (e.g., heavy canvas / non-semantic UI). Use it as an escalation signal.
requires_vision_reason  string | null   Human-readable explanation for why structure is likely insufficient. Include it in traces/artifacts to make failures explainable.
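The retry decision described above can be sketched as a bounded loop. This is a minimal sketch and not part of the SDK: the helper name snapshot_until_stable, the 0.7 threshold, and the "dom_churn" reason code are hypothetical. take_snapshot is any zero-argument callable (for example lambda: snapshot(browser)) that returns an object with a diagnostics attribute as documented here.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def snapshot_until_stable(
    take_snapshot: Callable[[], Any],
    min_confidence: float = 0.7,  # hypothetical threshold, tune per site
    max_attempts: int = 3,
    delay_s: float = 0.5,
) -> Any:
    """Retry until diagnostics.confidence reaches min_confidence.

    Returns the last snapshot taken, even if it never became stable,
    so the caller can still log its reason codes.
    """
    snap = None
    for attempt in range(max_attempts):
        snap = take_snapshot()
        diag = getattr(snap, "diagnostics", None)
        confidence = getattr(diag, "confidence", None)
        # Diagnostics are best-effort: a missing score is treated as "no objection".
        if confidence is None or confidence >= min_confidence:
            return snap
        if attempt < max_attempts - 1:
            time.sleep(delay_s)
    return snap


# Stand-in snapshots for illustration; real ones come from snapshot(browser).
_fake = iter([
    SimpleNamespace(diagnostics=SimpleNamespace(confidence=0.3, reasons=["dom_churn"])),
    SimpleNamespace(diagnostics=SimpleNamespace(confidence=0.9, reasons=[])),
])
stable = snapshot_until_stable(lambda: next(_fake), delay_s=0.0)
print(stable.diagnostics.confidence, stable.diagnostics.reasons)
```

Returning the last snapshot instead of raising keeps the reason codes available for logs; a stricter caller could raise when the final confidence is still below the threshold.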

Diagnostics Metrics (diagnostics.metrics)

Metric              Meaning
ready_state         Document readyState (e.g., "loading", "interactive", "complete").
quiet_ms            How long the page has been “quiet” (no major DOM churn), in milliseconds (best-effort).
node_count          Approximate DOM node count (best-effort). Useful for “page exploded” diagnostics.
interactive_count   How many interactive candidates were detected (best-effort).
raw_elements_count  How many raw elements were captured before filtering (best-effort).

CAPTCHA Diagnostics (diagnostics.captcha)

CAPTCHA diagnostics are detection-only signals:

Field          Meaning
detected       True if a CAPTCHA-like pattern was detected.
provider_hint  Best-effort provider hint (may be null).
confidence     0..1 confidence of detection.
evidence       Best-effort evidence hits (text/selector/iframe/url) to make detections explainable.
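Because these signals are detection-only, a caller typically turns them into a branch or a fail-fast reason. The following is a minimal sketch under assumptions: captcha_blocker and its 0.5 threshold are hypothetical names and values, and the diagnostics object is a stand-in carrying only the documented fields.

```python
from types import SimpleNamespace
from typing import Any, Optional


def captcha_blocker(diagnostics: Any, min_confidence: float = 0.5) -> Optional[str]:
    """Return a short failure reason if a CAPTCHA was detected, else None."""
    captcha = getattr(diagnostics, "captcha", None)
    if captcha is None or not getattr(captcha, "detected", False):
        return None
    confidence = getattr(captcha, "confidence", None)
    if confidence is not None and confidence < min_confidence:
        return None  # weak signal: ignored under this (hypothetical) threshold
    provider = getattr(captcha, "provider_hint", None) or "unknown provider"
    return f"captcha detected ({provider}, confidence={confidence})"


# Stand-in diagnostics; real ones come from snapshot(...).diagnostics.
diag = SimpleNamespace(
    captcha=SimpleNamespace(detected=True, provider_hint=None, confidence=0.92)
)
reason = captcha_blocker(diag)
if reason is not None:
    print("blocked:", reason)
```

The returned string is meant for logs and artifacts; what happens next (manual handoff, abort, alternate flow) is up to your own CAPTCHA handling strategy.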

Element Properties

Each element in snapshot.elements has the following properties:

Core Properties

Property     Type        Description
id           int         Unique identifier for clicking/interacting
role         str         Semantic role (button, link, textbox, heading, etc.)
text         str | None  Visible text content
importance   int         AI importance score (0-1000, higher = more important)
bbox         BBox        Bounding box with x, y, width, height
visual_cues  VisualCues  Visual analysis (is_primary, is_clickable, background_color_name)
in_viewport  bool        Whether element is visible in current viewport
is_occluded  bool        Whether element is covered by another element
z_index      int         CSS z-index value (default: 0)
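A common use of the core properties is picking the most important element that is actually reachable. The sketch below is illustrative only: El is a stand-in dataclass with a subset of the documented properties, and best_match is a hypothetical helper, not an SDK function.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class El:  # stand-in with a subset of the documented core properties
    id: int
    role: str
    text: Optional[str]
    importance: int
    in_viewport: bool = True
    is_occluded: bool = False


def best_match(elements: Iterable[El], role: str, text: str) -> Optional[El]:
    """Highest-importance element that is in view, not occluded, and matches."""
    needle = text.lower()
    candidates = [
        el for el in elements
        if el.role == role
        and el.in_viewport
        and not el.is_occluded
        and needle in (el.text or "").lower()
    ]
    return max(candidates, key=lambda el: el.importance, default=None)


elements = [
    El(1, "button", "Sign in", 900),
    El(2, "button", "Sign in", 950, is_occluded=True),   # covered, skipped
    El(3, "link", "Sign in help", 400),                  # wrong role
    El(4, "button", "Sign up", 800, in_viewport=False),  # off-screen
]
target = best_match(elements, "button", "sign in")
print(target.id if target else None)
```

Filtering on in_viewport and is_occluded before comparing importance avoids selecting a high-scoring element that a click could not reach.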

ML Reranking Properties (Optional)

These fields are present when goal is provided in SnapshotOptions:

Property          Type          Description
fused_rank_index  int | None    0-based rank after sorting by importance_fused
heuristic_index   int | None    0-based rank before ML reranking (original heuristic position)
ml_probability    float | None  Confidence score from ONNX model (0.0 - 1.0)
ml_score          float | None  Raw logit score from ONNX model (for debugging)

Ordinal / Layout Properties (Optional)

These fields support position-based selection ("first result", "top item"):

Property           Type          Description
center_x           float | None  X coordinate of element center (viewport coords)
center_y           float | None  Y coordinate of element center (viewport coords)
doc_y              float | None  Y coordinate in document (center_y + scroll_y)
group_key          str | None    Geometric bucket key for ordinal grouping
group_index        int | None    Position within group (0-indexed, sorted by doc_y)
in_dominant_group  bool | None   Whether element is in the dominant group (main content area)
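Position-based selection such as "first result" can be expressed with in_dominant_group and group_index. The following is a minimal sketch, assuming elements expose those fields as documented; El is a stand-in dataclass and nth_in_dominant_group is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class El:  # stand-in with the documented ordinal fields
    id: int
    text: Optional[str] = None
    group_key: Optional[str] = None
    group_index: Optional[int] = None
    in_dominant_group: Optional[bool] = None


def nth_in_dominant_group(elements: Sequence[El], n: int = 0) -> Optional[El]:
    """Return the n-th element (0-based) of the dominant group, or None."""
    group = [
        el for el in elements
        if el.in_dominant_group and el.group_index is not None
    ]
    # group_index is already ordered by doc_y, so sorting on it gives page order.
    group.sort(key=lambda el: el.group_index)
    return group[n] if 0 <= n < len(group) else None


elements = [
    El(10, "Header link", group_key="nav", group_index=0, in_dominant_group=False),
    El(11, "Result B", group_key="main", group_index=1, in_dominant_group=True),
    El(12, "Result A", group_key="main", group_index=0, in_dominant_group=True),
]
first = nth_in_dominant_group(elements, 0)
print(first.text if first else None)
```

Elements whose ordinal fields are None are skipped, since these properties are optional and may be missing on some snapshots.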

State-Aware Assertion Properties (Optional)

These fields enable Jest-style assertions for form controls:

Property        Type         Description
name            str | None   Accessible name/label for controls (distinct from visible text)
value           str | None   Current value for inputs/textarea/select (may be redacted for PII)
input_type      str | None   Input type (e.g., "text", "email", "password")
value_redacted  bool | None  Whether value was redacted for privacy (password/email/tel)
checked         bool | None  Normalized checked state for checkboxes/radios
disabled        bool | None  Normalized disabled state
expanded        bool | None  Normalized expanded state for dropdowns/accordions
aria_checked    str | None   Raw ARIA checked string (tri-state: "true"/"false"/"mixed")
aria_disabled   str | None   Raw ARIA disabled string
aria_expanded   str | None   Raw ARIA expanded string
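As an illustration of assertions built on these fields, the sketch below checks a checkbox and an input value while refusing to compare a redacted value. It is not the SDK's assertion API: El is a stand-in dataclass, and assert_value and assert_checked are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class El:  # stand-in with a subset of the documented state fields
    id: int
    role: str
    name: Optional[str] = None
    value: Optional[str] = None
    value_redacted: Optional[bool] = None
    checked: Optional[bool] = None
    disabled: Optional[bool] = None


def assert_value(el: El, expected: str) -> None:
    """Compare the control's value, failing loudly if it was redacted."""
    if el.value_redacted:
        raise AssertionError(f"value of {el.name!r} is redacted; cannot compare")
    assert el.value == expected, f"{el.name!r}: expected {expected!r}, got {el.value!r}"


def assert_checked(el: El, expected: bool = True) -> None:
    """Check the normalized checked state (None counts as a mismatch)."""
    assert el.checked is expected, f"{el.name!r}: checked={el.checked!r}"


city = El(1, "textbox", name="City", value="Berlin", value_redacted=False)
terms = El(2, "checkbox", name="Accept terms", checked=True)
assert_value(city, "Berlin")
assert_checked(terms)
print("assertions passed")
```

Checking value_redacted first matters for password, email, and tel inputs: a comparison against a redacted value would fail for the wrong reason.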

Additional Properties (Optional)

Property     Type        Description
href         str | None  Hyperlink URL (for link elements)
nearby_text  str | None  Nearby static text (best-effort, usually for top-ranked elements)
diff_status  str | None  Diff status: "ADDED", "REMOVED", "MODIFIED", "MOVED" (for diff overlay)

Visual Overlay Feature

When show_overlay=True, Predicate displays a visual overlay in the browser highlighting all detected elements:

Color Coding:

  • Red: Target element (when specified in agent actions)
  • Blue: Primary elements (is_primary=true)
  • Green: Regular interactive elements

Visual Indicators:

  • Border thickness and opacity scale with importance score
  • Semi-transparent fill for better visibility
  • Importance badges showing scores
  • Star icon for primary elements
  • Target emoji for the target element
  • Auto-clear: Overlay automatically disappears after 5 seconds

Use Cases:

  • Debugging: Visualize what elements Predicate detects on the page
  • Learning: Understand how importance scoring works
  • Validation: Verify that critical buttons/links are being detected
  • Analysis: See which elements rank highest for your use case

# Example: Debug why a button isn't being clicked
import time

from predicate import snapshot, SnapshotOptions

browser.goto("https://example.com")
snap = snapshot(browser, SnapshotOptions(show_overlay=True))  # See what's detected
time.sleep(6)  # Pause so the overlay can be inspected until it auto-clears (5 seconds)

ML Reranking (Goal-Based Optimization)

When you provide a goal parameter in SnapshotOptions, the server uses an ONNX-based machine learning model to rerank elements by relevance to your goal. This can improve element selection accuracy for agent tasks, because ranking then reflects the task as well as heuristic importance.

ML Rerank Metadata (snapshot.ml_rerank)

When ML reranking is enabled, snapshot.ml_rerank provides best-effort metadata about what happened in the server-side rerank pass.

Field               Type            Meaning
enabled             boolean         Whether ML reranking was enabled for this snapshot.
applied             boolean         Whether reranking actually ran (may be false if conditions were not met).
reason              string | null   Why reranking was applied or skipped (best-effort).
candidate_count     number          How many elements were considered for reranking.
top_probability     number | null   Confidence of the top-ranked element (0..1).
min_confidence      number | null   Confidence threshold used (if any).
is_high_confidence  boolean | null  True if top probability meets the high-confidence threshold.
tags                string[]        Internal labels for debugging and analysis.
error               string | null   Error message if reranking failed (best-effort).
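
# Trigger ML reranking by providing a goal
snap = snapshot(browser, SnapshotOptions(
    goal="Click the login button",
    limit=50
))

# Elements are now sorted by ML relevance, not just heuristic importance
for element in snap.elements[:5]:
    # Illustrative loop body: print the documented ranking fields
    print(element.fused_rank_index, element.heuristic_index, element.ml_probability, element.text)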

When ML fields are present:

  • Present: when goal is provided in SnapshotOptions
  • Present: when using agent methods like agent.act() (goals are passed automatically)
  • Absent: when goal is not specified (elements are ranked by heuristic importance only)

What the fields mean:

  • fused_rank_index: Final position after ML + heuristic fusion (0 = most relevant to goal)
  • heuristic_index: Original position before ML (shows how much ML changed the ranking)
  • ml_probability: Model's confidence that this element is relevant (0.0-1.0)
  • ml_score: Raw logit score before softmax (useful for debugging model behavior)
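To show how these fields and snapshot.ml_rerank can be read together, here is a minimal sketch that measures how far ML moved each element and decides whether to trust the top result. The helper names rank_shifts and should_trust_top are hypothetical, and the objects are stand-ins carrying only the documented fields.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def rank_shifts(elements: Iterable[Any]) -> List[Tuple[int, int]]:
    """(element id, positions gained through ML) for elements with both indexes.

    A positive number means ML moved the element up the list.
    """
    shifts = []
    for el in elements:
        h = getattr(el, "heuristic_index", None)
        f = getattr(el, "fused_rank_index", None)
        if h is not None and f is not None:
            shifts.append((el.id, h - f))
    return shifts


def should_trust_top(ml_rerank: Any) -> bool:
    """True only when reranking ran and reported a high-confidence top element."""
    if ml_rerank is None:
        return False
    return bool(ml_rerank.applied and ml_rerank.is_high_confidence)


# Stand-in data; real values come from a snapshot taken with a goal.
elements = [
    SimpleNamespace(id=7, heuristic_index=4, fused_rank_index=0, ml_probability=0.91),
    SimpleNamespace(id=3, heuristic_index=0, fused_rank_index=1, ml_probability=0.40),
    SimpleNamespace(id=9, heuristic_index=None, fused_rank_index=None, ml_probability=None),
]
ml_rerank = SimpleNamespace(enabled=True, applied=True, is_high_confidence=True, top_probability=0.91)
shifts = rank_shifts(elements)
print(shifts, should_trust_top(ml_rerank))
```

When should_trust_top returns False, a cautious caller might fall back to heuristic importance or take another snapshot before acting.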