
Agent Quick Start

NEW in v0.3.0+ - The Agent Abstraction Layer provides natural language automation with 4 levels of control, from simple commands to full conversational AI.

Overview

The Predicate SDK offers multiple levels of abstraction for browser automation:

| Level | Use Case | Code Reduction | Requirements |
| --- | --- | --- | --- |
| Level 1: Raw Playwright | Maximum control, edge cases | 0% | None |
| Level 2: Direct SDK | Precise control, debugging | 80% | Predicate API key |
| Level 3: PredicateAgent | Quick automation, step-by-step | 95% | Predicate + LLM API keys |
| Level 4: ConversationalAgent | Complex tasks, chatbots | 99% | Predicate + LLM API keys |

Quick Tip: Start with Level 3 (PredicateAgent) for most automation tasks. Upgrade to Level 4 (ConversationalAgent) when you need multi-step planning or conversational interfaces.

Level 1: Raw Playwright - Maximum Control

Use Playwright directly with semantic element finding - no LLM required:

from playwright.sync_api import sync_playwright

# Pure Playwright - no Predicate SDK
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://amazon.com")
    # Brittle CSS selectors: these break whenever the page markup changes
    page.fill("input[name='field-keywords']", "wireless mouse")
    page.press("input[name='field-keywords']", "Enter")
    browser.close()

When to use Level 1:

  - You need maximum control over every browser interaction
  - You are handling edge cases the higher levels don't cover
  - You want zero API costs (no Predicate or LLM keys)

Limitations:

  - Brittle CSS selectors that break when page markup changes
  - The most code to write and maintain (0% code reduction)

Level 2: Direct SDK - Semantic Queries

Use Predicate SDK for semantic element finding without LLMs:

from predicate import PredicateBrowser, snapshot, find, click, type_text, press

# Predicate SDK - semantic queries, no LLM (call signatures illustrative)
with PredicateBrowser(api_key="your_key") as browser:
    browser.page.goto("https://amazon.com")
    snap = snapshot(browser)
    search_box = find(snap, "search box")  # query by meaning, not CSS
    type_text(browser, search_box, "wireless mouse")
    press(browser, "Enter")

Benefits over Level 1:

  - Semantic queries ("search box") instead of brittle CSS selectors
  - Roughly 80% less code for the same task

When to use Level 2:

  - You need precise, step-by-step control or are debugging a flow
  - You want to avoid LLM costs on repetitive tasks

Level 3: PredicateAgent - Natural Language Commands

Use single natural language commands - the agent handles the rest:

from predicate import PredicateBrowser, PredicateAgent
from predicate.llm import OpenAIProvider

# 1. Create browser and LLM provider
browser = PredicateBrowser(api_key="your_sentience_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

# 2. Create the agent and issue one command per step (method name illustrative)
agent = PredicateAgent(browser, llm)
agent.do("Search for wireless mouse")
agent.do("Click the first result")

Level 4: ConversationalAgent - Full Automation (Maximum Convenience)

ONE command does everything - automatic planning and execution:

from predicate import PredicateBrowser, ConversationalAgent
from predicate.llm import OpenAIProvider

# 1. Setup
browser = PredicateBrowser(api_key="your_sentience_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

# 2. One command - the agent plans and executes every step (method name illustrative)
agent = ConversationalAgent(browser, llm)
agent.chat("Search Amazon for wireless mouse and click the first result")

Available LLM Providers

# OpenAI (GPT-4, GPT-4o, etc.)
from predicate.llm import OpenAIProvider
llm = OpenAIProvider(api_key="sk_...", model="gpt-4o")

# Anthropic (Claude)
from predicate.llm import AnthropicProvider
llm = AnthropicProvider(api_key="sk-ant-...", model="claude-3-5-sonnet-latest")
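As a standalone (non-SDK) sketch, provider selection can be driven by whichever API key is present in the environment. The helper name and returned model strings below are illustrative assumptions, not part of the SDK:

```python
import os

def pick_provider_config() -> tuple:
    """Choose a provider name and default model from environment keys
    (hypothetical helper, not an SDK function)."""
    if os.environ.get("OPENAI_API_KEY"):
        return ("openai", "gpt-4o")
    if os.environ.get("ANTHROPIC_API_KEY"):
        return ("anthropic", "claude-3-5-sonnet-latest")
    raise RuntimeError("No LLM API key found in the environment")
```

This keeps keys out of source code and lets the same script run against either provider.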

When to Use Each Level

Use Raw Playwright (Level 1) when:

  - You need maximum control or are handling edge cases
  - You want zero API costs

Use Direct SDK (Level 2) when:

  - You need precise control or are debugging a flow
  - The task is repetitive and you want to avoid LLM costs

Use PredicateAgent (Level 3) when:

  - You want quick automation with step-by-step natural language commands

Use ConversationalAgent (Level 4) when:

  - You need multi-step planning, complex tasks, or a conversational interface
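The guidance above can be condensed into a small decision helper. This is a hypothetical sketch of the selection logic, not an SDK function:

```python
def recommend_level(max_control: bool = False,
                    avoid_llm_costs: bool = False,
                    multi_step_planning: bool = False) -> str:
    """Pick an abstraction level from the criteria above (illustrative)."""
    if max_control:
        return "Level 1: Raw Playwright"
    if avoid_llm_costs:
        return "Level 2: Direct SDK"
    if multi_step_planning:
        return "Level 4: ConversationalAgent"
    # Default recommendation for most automation tasks
    return "Level 3: PredicateAgent"
```

The default branch mirrors the Quick Tip: start with Level 3 unless a stronger constraint applies.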

Cost Comparison

Understanding the cost and complexity tradeoffs between levels:

Lines of Code Comparison

Same task: "Search Amazon for wireless mouse and click first result"

| Level | Lines of Code | Complexity | Credits Used | LLM Tokens |
| --- | --- | --- | --- | --- |
| Level 1 | ~15 lines | High (CSS selectors) | 0 | 0 |
| Level 2 | ~10 lines | Medium (semantic queries) | ~2-4 | 0 |
| Level 3 | ~5 lines | Low (natural language) | ~2-4 | ~1,500 |
| Level 4 | ~3 lines | Very Low (one command) | ~2-4 | ~2,500 |

Token Cost Analysis (Level 3 vs Level 4)

Level 3: PredicateAgent - Manual step-by-step commands. Each command is a separate LLM call, totaling roughly 1,500 tokens for this task.

Level 4: ConversationalAgent - Automatic planning. The agent first generates a plan, then executes it, using roughly 2,500 tokens for the same task.
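The token figures map to dollars via a blended per-token rate. The ~$4-per-million rate below is an assumption chosen to reproduce the cost table, not an official price:

```python
BLENDED_RATE = 4e-6  # assumed ~$4 per million tokens (blended input/output)

def llm_cost(tokens: int) -> float:
    """Estimate per-task LLM spend from token usage."""
    return round(tokens * BLENDED_RATE, 4)

level3_cost = llm_cost(1500)  # ~1,500 tokens for step-by-step commands
level4_cost = llm_cost(2500)  # ~2,500 tokens with automatic planning
```

At that rate, Level 3's ~1,500 tokens come to about $0.006 and Level 4's ~2,500 tokens to about $0.010 per task.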

Credit Cost Breakdown

Predicate API Credits (same across all SDK levels):

  - Snapshots cost roughly 2-4 credits per task (about $0.004)
  - Level 1 uses no credits because it never calls the Predicate API

LLM Costs (Level 3 & 4 only):

  - Roughly $0.006 (Level 3) to $0.010 (Level 4) per task with gpt-4o
  - $0 when running a local LLM

Total Cost Per Task

| Level | Predicate Credits | LLM Cost | Total Cost |
| --- | --- | --- | --- |
| Level 1 | $0 | $0 | $0 |
| Level 2 | $0.004 | $0 | $0.004 |
| Level 3 | $0.004 | $0.006 | $0.010 |
| Level 4 | $0.004 | $0.010 | $0.014 |
| Level 3 (Local LLM) | $0.004 | $0 | $0.004 |
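The per-task totals are simply Predicate credits plus LLM spend, which a few lines can verify (the level keys are illustrative labels, not SDK identifiers):

```python
PREDICATE_COST = 0.004  # ~2-4 credits per task

LLM_COST = {
    "level1": 0.000,        # no LLM
    "level2": 0.000,        # no LLM
    "level3": 0.006,        # gpt-4o, step-by-step
    "level4": 0.010,        # gpt-4o, automatic planning
    "level3_local": 0.000,  # local LLM
}

def total_cost(level: str) -> float:
    """Predicate credits plus LLM spend; Level 1 never calls the Predicate API."""
    predicate = 0.0 if level == "level1" else PREDICATE_COST
    return round(predicate + LLM_COST[level], 3)
```

The "Level 3 (Local LLM)" row matches Level 2's total because a local model removes the only cost difference between them.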

Cost Optimization Tips:

  1. Use Level 2 for repetitive tasks (no LLM costs)
  2. Use Level 3 with local LLM for zero LLM costs
  3. Use use_api=False in snapshots to avoid credit usage (free tier)
  4. Batch similar tasks to minimize LLM context switching
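Tip 4 can be as simple as grouping commands by kind before handing them to an agent, so the LLM works through one sort of context at a time. A minimal sketch with hypothetical task tuples:

```python
from collections import defaultdict

tasks = [
    ("search", "wireless mouse"),
    ("click", "first result"),
    ("search", "usb hub"),
    ("click", "add to cart"),
]

# Group similar commands so the LLM handles one kind of context at a time
batches = defaultdict(list)
for kind, arg in tasks:
    batches[kind].append(arg)
```

Each batch can then be sent to the agent as one run of related commands instead of alternating task types.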

Next Steps