Agent Quick Start

NEW in v0.3.0+ - The Agent Abstraction Layer provides natural language automation. The SDK now spans four levels of control, from raw Playwright calls to a fully conversational agent.

Overview

The Predicate SDK offers multiple levels of abstraction for browser automation:

Level	Use Case	Code Reduction	Requirements
Level 1: Raw Playwright	Maximum control, edge cases	0%	None
Level 2: Direct SDK	Precise control, debugging	80%	Predicate API key
Level 3: PredicateAgent	Quick automation, step-by-step	95%	Predicate API key + LLM API key (or a local LLM)
Level 4: ConversationalAgent	Complex tasks, chatbots	99%	Predicate API key + LLM API key (or a local LLM)

Quick Tip: Start with Level 3 (PredicateAgent) for most automation tasks. Upgrade to Level 4 (ConversationalAgent) when you need multi-step planning or conversational interfaces.

Level 1: Raw Playwright - Maximum Control

Use Playwright directly with CSS selectors - no LLM and no Predicate API key required:

from playwright.sync_api import sync_playwright

# Pure Playwright - no Predicate SDK
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # Drive the page with CSS selectors from here on
    browser.close()

When to use Level 1:

Maximum control over every action
No external API dependencies
Debugging complex edge cases
Building reusable libraries

Limitations:

Requires brittle CSS selectors (breaks when HTML changes)
No semantic understanding of page elements
Manual waiting and error handling
More code to write and maintain
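
The "manual waiting and error handling" limitation can be sketched as follows. `click_with_retry` is a hypothetical helper, not part of Playwright or the Predicate SDK; it relies only on `page.wait_for_selector` and `page.click` from Playwright's sync API, and the attempt count, timeout, and pause are illustrative assumptions.

```python
import time


def click_with_retry(page, selector, attempts=3, timeout_ms=5000, pause_s=1.0):
    """Wait for `selector` and click it, retrying on failure.

    `page` is expected to be a Playwright sync Page. This helper is
    hypothetical: it only illustrates the boilerplate Level 1 code needs.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Playwright raises if the element does not appear in time.
            page.wait_for_selector(selector, timeout=timeout_ms)
            page.click(selector)
            return attempt  # number of attempts it took
        except Exception as exc:  # broad catch keeps the sketch import-free
            last_error = exc
            if attempt < attempts:
                time.sleep(pause_s)
    raise RuntimeError(
        f"could not click {selector!r} after {attempts} attempts"
    ) from last_error
```

With raw Playwright, every interaction that can race against page loading needs a wrapper like this, and the selector passed in still breaks whenever the page markup changes.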

Level 2: Direct SDK - Semantic Queries

Use the Predicate SDK for semantic element finding without an LLM:

from predicate import PredicateBrowser, snapshot, find, click, type_text, press

# Predicate SDK - semantic queries, no LLM
with PredicateBrowser(api_key="your_key") as browser:
    browser.page.goto("https://amazon.com")

Benefits over Level 1:

Semantic queries instead of CSS selectors (80% less code)
Importance-based element ranking
Built-in waiting and error handling
Works across different page layouts

When to use Level 2:

Precise control over each action
Performance-critical applications (no LLM calls)
Debugging specific element interactions
Building reusable automation libraries

Level 3: PredicateAgent - Natural Language Commands (Recommended)

Use single natural language commands - the agent handles the rest:

from predicate import PredicateBrowser, PredicateAgent
from predicate.llm import OpenAIProvider

# 1. Create browser and LLM provider
browser = PredicateBrowser(api_key="your_predicate_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

Level 4: ConversationalAgent - Full Automation (Maximum Convenience)

ONE command does everything - automatic planning and execution:

from predicate import PredicateBrowser, ConversationalAgent
from predicate.llm import OpenAIProvider

# 1. Setup
browser = PredicateBrowser(api_key="your_predicate_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

Available LLM Providers

# OpenAI (GPT-4, GPT-4o, etc.)
from predicate.llm import OpenAIProvider
llm = OpenAIProvider(api_key="sk-...", model="gpt-4o")

# Anthropic (Claude)
from predicate.llm import AnthropicProvider
llm = AnthropicProvider(api_key="your_anthropic_key")

When to Use Each Level

Use Raw Playwright (Level 1) when:

You need maximum control over every action
External API dependencies are not allowed
You are debugging complex edge cases
Performance is absolutely critical (no API calls)

Use Direct SDK (Level 2) when:

You need precise control over each action
You are debugging specific element interactions
You are building reusable automation libraries
Performance is critical (no LLM calls)

Use PredicateAgent (Level 3) when:

You want quick automation with little code
You prefer step-by-step natural language commands
You want to see each action as it happens
Your task is a typical automation job (the most common use case)

Use ConversationalAgent (Level 4) when:

The task is complex and involves multiple steps
You are building a conversational interface
You want the agent to plan and execute automatically
Maximum convenience is the priority
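
The guidance above can be condensed into a small decision function. This is only a sketch: `recommend_level` and its three boolean parameters are hypothetical names chosen for illustration, not part of the Predicate SDK, and real projects will weigh more factors than these.

```python
def recommend_level(allow_external_apis, allow_llm, needs_auto_planning):
    """Map three yes/no questions to an abstraction level (1-4).

    Hypothetical helper that mirrors the bullet lists above.
    """
    if not allow_external_apis:
        return 1  # Raw Playwright: no API dependencies at all
    if not allow_llm:
        return 2  # Direct SDK: semantic queries, no LLM calls
    if needs_auto_planning:
        return 4  # ConversationalAgent: plans and executes automatically
    return 3      # PredicateAgent: step-by-step natural language commands


print(recommend_level(allow_external_apis=True, allow_llm=True,
                      needs_auto_planning=False))
```

The ordering of the checks reflects the constraints: dependency restrictions rule levels out first, and the choice between Level 3 and Level 4 only arises once an LLM is available.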

Cost Comparison

Understanding the cost and complexity tradeoffs between levels:

Lines of Code Comparison

Same task: "Search Amazon for wireless mouse and click first result"

Level	Lines of Code	Complexity	Credits Used	LLM Tokens
Level 1	~15 lines	High (CSS selectors)	0	0
Level 2	~10 lines	Medium (semantic queries)	~2-4	0
Level 3	~5 lines	Low (natural language)	~2-4	~1,600
Level 4	~3 lines	Very Low (one command)	~2-4	~2,600

Token Cost Analysis (Level 3 vs Level 4)

Level 3: PredicateAgent - Manual step-by-step commands

4 actions × ~400 tokens = ~1,600 tokens ($0.006 with GPT-4o)
You control each step explicitly
Predictable token usage

Level 4: ConversationalAgent - Automatic planning

Initial planning: ~800 tokens
4 actions × ~400 tokens = ~1,600 tokens
Response generation: ~200 tokens
Total: ~2,600 tokens ($0.010 with GPT-4o)
Agent plans and executes automatically
Slightly higher token usage for convenience
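
The arithmetic above can be reproduced with a short script. The per-action, planning, and response token counts are the approximate figures from this section; the rate of $3.75 per million tokens is an assumption back-calculated from "~1,600 tokens ($0.006)" and is not an official GPT-4o price. The script also assumes planning overhead does not grow with task length.

```python
# Approximate figures taken from the token analysis above.
TOKENS_PER_ACTION = 400
PLANNING_TOKENS = 800
RESPONSE_TOKENS = 200

# Assumption: blended rate implied by "~1,600 tokens ($0.006)".
USD_PER_MILLION_TOKENS = 3.75


def level3_tokens(actions):
    """Level 3: one LLM call per explicit step."""
    return actions * TOKENS_PER_ACTION


def level4_tokens(actions):
    """Level 4: the same per-action calls plus planning and a final response."""
    return PLANNING_TOKENS + actions * TOKENS_PER_ACTION + RESPONSE_TOKENS


def usd(tokens):
    """Convert a token count to dollars at the assumed blended rate."""
    return tokens * USD_PER_MILLION_TOKENS / 1_000_000


for actions in (4, 10):
    l3, l4 = level3_tokens(actions), level4_tokens(actions)
    print(f"{actions} actions: Level 3 {l3} tokens (${usd(l3):.4f}), "
          f"Level 4 {l4} tokens (${usd(l4):.4f})")
```

Under these assumptions the Level 4 overhead is a fixed ~1,000 tokens per task, so its relative cost shrinks as tasks get longer.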

Credit Cost Breakdown

Predicate API Credits (same across all SDK levels):

Snapshot with server ranking: 1 credit per call
Local extension only: 0 credits
Typical task (4 snapshots): 4 credits ≈ $0.004

LLM Costs (Level 3 & 4 only):

GPT-4o: ~$0.006-0.010 per task
Claude Sonnet: ~$0.009-0.015 per task
Local LLM (Qwen/Llama): $0 in API fees (you supply the hardware)

Total Cost Per Task

Level	Predicate Credits	LLM Cost	Total Cost
Level 1	$0	$0	$0
Level 2	$0.004	$0	$0.004
Level 3	$0.004	$0.006	$0.010
Level 4	$0.004	$0.010	$0.014
Level 3 (Local LLM)	$0.004	$0	$0.004
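
The totals in the table are the sum of credit cost and LLM cost. As a sketch, the script below recomputes them; the price of $0.001 per credit is an assumption implied by "4 credits ≈ $0.004", and `task_cost` is a hypothetical helper name.

```python
# Assumption: implied by "4 credits ≈ $0.004" in the credit breakdown.
CREDIT_PRICE_USD = 0.001


def task_cost(credits, llm_cost_usd):
    """Total cost of one task: Predicate credits plus LLM spend."""
    return round(credits * CREDIT_PRICE_USD + llm_cost_usd, 3)


costs = {
    "Level 1": task_cost(0, 0.0),
    "Level 2": task_cost(4, 0.0),
    "Level 3": task_cost(4, 0.006),
    "Level 4": task_cost(4, 0.010),
    "Level 3 (Local LLM)": task_cost(4, 0.0),
}

for level, cost in costs.items():
    print(f"{level}: ${cost:.3f} per task")
```

At volume the differences add up: 10,000 tasks cost roughly $140 at Level 4 versus roughly $40 at Level 2, using the per-task figures above.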

Cost Optimization Tips:

Use Level 2 for repetitive tasks (no LLM costs)
Use Level 3 with local LLM for zero LLM costs
Set use_api=False in snapshots to avoid credit usage (free tier)
Batch similar tasks to minimize LLM context switching

Next Steps

Tracing & Debugging - Learn how to debug and monitor agent behavior
Browser Setup - Configure browser options
Examples - See more agent examples
