Agent Quick Start

NEW in v0.3.0+ - The Agent Abstraction Layer provides natural language automation with 4 levels of control, from simple commands to full conversational AI.

Overview

The Predicate SDK offers multiple levels of abstraction for browser automation:

| Level | Use Case | Code Reduction | Requirements |
|---|---|---|---|
| Level 1: Raw Playwright | Maximum control, edge cases | 0% | None |
| Level 2: Direct SDK | Precise control, debugging | 80% | Predicate API key |
| Level 3: PredicateAgent | Quick automation, step-by-step | 95% | LLM API key |
| Level 4: ConversationalAgent | Complex tasks, chatbots | 99% | LLM API key |

Quick Tip: Start with Level 3 (PredicateAgent) for most automation tasks. Upgrade to Level 4 (ConversationalAgent) when you need multi-step planning or conversational interfaces.

Level 1: Raw Playwright - Maximum Control

Use Playwright directly with CSS selectors - no Predicate SDK and no LLM required:

from playwright.sync_api import sync_playwright

# Pure Playwright - no Predicate SDK
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    browser.close()

When to use Level 1:

  • Maximum control over every action
  • No external API dependencies
  • Debugging complex edge cases
  • Building reusable libraries

Limitations:

  • Requires brittle CSS selectors (breaks when HTML changes)
  • No semantic understanding of page elements
  • Manual waiting and error handling
  • More code to write and maintain
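
To make these limitations concrete, here is a minimal sketch of the search task written against Playwright alone. The function name, the Amazon selectors, and the timeout are illustrative assumptions rather than anything from the SDK; real selectors drift whenever the page's HTML changes, which is the brittleness described above.

```python
# Hypothetical selectors: assumptions for illustration, likely to drift over time.
SELECTORS = {
    "search_box": "input#twotabsearchtextbox",
    "first_result": "div[data-component-type='s-search-result'] h2 a",
}


def search_and_open_first_result(query: str, timeout_ms: int = 10_000) -> str:
    """Search Amazon for `query`, open the first result, and return its URL."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page()
            page.goto("https://amazon.com")

            # Manual interaction: every step names an exact CSS selector.
            page.fill(SELECTORS["search_box"], query)
            page.press(SELECTORS["search_box"], "Enter")

            # Manual waiting: nothing proceeds until this selector appears.
            page.wait_for_selector(SELECTORS["first_result"], timeout=timeout_ms)
            page.locator(SELECTORS["first_result"]).first.click()
            page.wait_for_load_state("domcontentloaded")
            return page.url
        except PlaywrightTimeoutError as exc:
            # Manual error handling: a changed selector surfaces as a timeout.
            raise RuntimeError(f"Selector not found within {timeout_ms} ms") from exc
        finally:
            browser.close()
```

Calling `search_and_open_first_result("wireless mouse")` requires Playwright and its Chromium build to be installed. If either selector no longer matches, the call fails with the `RuntimeError` above instead of adapting to the new layout.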

Level 2: Direct SDK - Semantic Queries

Use Predicate SDK for semantic element finding without LLMs:

from predicate import PredicateBrowser, snapshot, find, click, type_text, press

# Predicate SDK - semantic queries, no LLM
with PredicateBrowser(api_key="your_key") as browser:
    browser.page.goto("https://amazon.com")

Benefits over Level 1:

  • Semantic queries instead of CSS selectors (80% less code)
  • Importance-based element ranking
  • Built-in waiting and error handling
  • Works across different page layouts

When to use Level 2:

  • Precise control over each action
  • Performance-critical applications (minimize LLM calls)
  • Debugging specific element interactions
  • Building reusable automation libraries

Level 3: PredicateAgent - Natural Language Commands

Use single natural language commands - the agent handles the rest:

from predicate import PredicateBrowser, PredicateAgent
from predicate.llm import OpenAIProvider

# 1. Create browser and LLM provider
browser = PredicateBrowser(api_key="your_predicate_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

Level 4: ConversationalAgent - Full Automation (Maximum Convenience)

ONE command does everything - automatic planning and execution:

from predicate import PredicateBrowser, ConversationalAgent
from predicate.llm import OpenAIProvider

# 1. Setup
browser = PredicateBrowser(api_key="your_predicate_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

Available LLM Providers

# OpenAI (GPT-4, GPT-4o, etc.)
from predicate.llm import OpenAIProvider
llm = OpenAIProvider(api_key="sk_...", model="gpt-4o")

# Anthropic (Claude)
from predicate.llm import AnthropicProvider
llm = AnthropicProvider(api_key="your_anthropic_key")

When to Use Each Level

Use Raw Playwright (Level 1) when:

  • You need maximum control over every action
  • No external API dependencies are allowed
  • You are debugging complex edge cases
  • Performance is absolutely critical (no API calls)

Use Direct SDK (Level 2) when:

  • You need precise control over each action
  • You are debugging specific element interactions
  • You are building reusable automation libraries
  • Performance is critical (minimize LLM calls)

Use PredicateAgent (Level 3) when:

  • You want quick automation
  • You prefer step-by-step natural language commands
  • You want to see each action as it happens
  • Your task is a typical one (this is the most common use case)

Use ConversationalAgent (Level 4) when:

  • The task is complex and multi-step
  • You are building conversational interfaces
  • You want the agent to plan and execute automatically
  • Maximum convenience is the priority
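
The guidance above can be condensed into a small decision helper. This is only a sketch: `recommend_level` and its three flags are hypothetical names chosen for illustration, and real projects will weigh more factors than these.

```python
def recommend_level(
    allow_external_apis: bool = True,
    allow_llm: bool = True,
    needs_auto_planning: bool = False,
) -> int:
    """Return the abstraction level (1-4) suggested by the guidance above."""
    if not allow_external_apis:
        return 1  # Raw Playwright: no Predicate or LLM calls at all
    if not allow_llm:
        return 2  # Direct SDK: semantic queries without an LLM
    if needs_auto_planning:
        return 4  # ConversationalAgent: plans and executes on its own
    return 3  # PredicateAgent: the default for most tasks
```

With no arguments the helper returns 3, matching the advice to start with PredicateAgent and move up or down only when a constraint requires it.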

Cost Comparison

Understanding the cost and complexity tradeoffs between levels:

Lines of Code Comparison

Same task: "Search Amazon for wireless mouse and click first result"

| Level | Lines of Code | Complexity | Credits Used | LLM Tokens |
|---|---|---|---|---|
| Level 1 | ~15 lines | High (CSS selectors) | 0 | 0 |
| Level 2 | ~10 lines | Medium (semantic queries) | ~2-4 | 0 |
| Level 3 | ~5 lines | Low (natural language) | ~2-4 | ~1,500 |
| Level 4 | ~3 lines | Very Low (one command) | ~2-4 | ~2,500 |

Token Cost Analysis (Level 3 vs Level 4)

Level 3: PredicateAgent - Manual step-by-step commands

  • 4 actions × ~400 tokens = ~1,600 tokens ($0.006 with GPT-4o)
  • You control each step explicitly
  • Predictable token usage

Level 4: ConversationalAgent - Automatic planning

  • Initial planning: ~800 tokens
  • 4 actions × ~400 tokens = ~1,600 tokens
  • Response generation: ~200 tokens
  • Total: ~2,600 tokens ($0.010 with GPT-4o)
  • Agent plans and executes automatically
  • Slightly higher token usage for convenience
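
The arithmetic above can be reproduced with a short estimator. The per-action, planning, and response token counts come from the figures quoted in this section; the function names are hypothetical, and the rate of $3.75 per million tokens is only the blended price implied by "$0.006 for ~1,600 tokens", not a published GPT-4o price.

```python
TOKENS_PER_ACTION = 400  # approximate per-action figure quoted above
PLANNING_TOKENS = 800    # Level 4 only: initial planning
RESPONSE_TOKENS = 200    # Level 4 only: response generation


def estimate_tokens(level: int, actions: int) -> int:
    """Estimate the LLM tokens for a task at Level 3 or Level 4."""
    if level not in (3, 4):
        raise ValueError("only Levels 3 and 4 use an LLM")
    tokens = actions * TOKENS_PER_ACTION
    if level == 4:
        tokens += PLANNING_TOKENS + RESPONSE_TOKENS
    return tokens


def estimate_llm_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at an assumed blended rate."""
    return tokens * usd_per_million_tokens / 1_000_000


IMPLIED_RATE = 3.75  # assumption: derived from $0.006 / 1,600 tokens

level3 = estimate_tokens(3, actions=4)  # 1600
level4 = estimate_tokens(4, actions=4)  # 2600
print(level3, round(estimate_llm_cost(level3, IMPLIED_RATE), 3))
print(level4, round(estimate_llm_cost(level4, IMPLIED_RATE), 3))
```

Because the planning and response overhead is fixed, the gap between the two levels stays at about 1,000 tokens per task regardless of how many actions the task needs.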

Credit Cost Breakdown

Predicate API Credits (same across all SDK levels):

  • Snapshot with server ranking: 1 credit per call
  • Local extension only: 0 credits
  • Typical task (4 snapshots): 4 credits ≈ $0.004

LLM Costs (Level 3 & 4 only):

  • GPT-4o: ~$0.006-0.010 per task
  • Claude Sonnet: ~$0.009-0.015 per task
  • Local LLM (Qwen/Llama): $0 (free!)

Total Cost Per Task

| Level | Predicate Credits | LLM Cost | Total Cost |
|---|---|---|---|
| Level 1 | $0 | $0 | $0 |
| Level 2 | $0.004 | $0 | $0.004 |
| Level 3 | $0.004 | $0.006 | $0.010 |
| Level 4 | $0.004 | $0.010 | $0.014 |
| Level 3 (Local LLM) | $0.004 | $0 | $0.004 |
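
The totals in the table follow from adding credit cost to LLM cost. A minimal sketch of that sum, assuming $0.001 per credit (implied by "4 credits ≈ $0.004") and one credit per server-ranked snapshot; `total_cost` and its parameters are hypothetical helpers, not part of the SDK.

```python
USD_PER_CREDIT = 0.001  # assumption: implied by "4 credits ≈ $0.004"


def total_cost(snapshots: int, llm_cost_usd: float = 0.0, use_api: bool = True) -> float:
    """Per-task cost in dollars: Predicate credits plus LLM spend."""
    # Local-extension snapshots (use_api=False) consume no credits.
    credits = snapshots if use_api else 0
    return round(credits * USD_PER_CREDIT + llm_cost_usd, 4)


rows = {
    "Level 1": total_cost(0),
    "Level 2": total_cost(4),
    "Level 3": total_cost(4, llm_cost_usd=0.006),
    "Level 4": total_cost(4, llm_cost_usd=0.010),
    "Level 3 (Local LLM)": total_cost(4, llm_cost_usd=0.0),
}
for level, cost in rows.items():
    print(f"{level}: ${cost:.3f}")
```

Setting `use_api=False` in this sketch models the local-extension case, where a Level 2 task or a Level 3 task with a local LLM would cost nothing.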

Cost Optimization Tips:

  1. Use Level 2 for repetitive tasks (no LLM costs)
  2. Use Level 3 with local LLM for zero LLM costs
  3. Set use_api=False in snapshots to avoid credit usage (free tier)
  4. Batch similar tasks to minimize LLM context switching

Next Steps