Agent Quick Start
NEW in v0.3.0+ - The Agent Abstraction Layer provides natural language automation with 4 levels of control, from simple commands to full conversational AI.
Overview
The Predicate SDK offers multiple levels of abstraction for browser automation:
| Level | Use Case | Code Reduction | Requirements |
|---|---|---|---|
| Level 1: Raw Playwright | Maximum control, edge cases | 0% | None |
| Level 2: Direct SDK | Precise control, debugging | 80% | Predicate API key |
| Level 3: PredicateAgent | Quick automation, step-by-step | 95% | Predicate API key + LLM API key |
| Level 4: ConversationalAgent | Complex tasks, chatbots | 99% | Predicate API key + LLM API key |
Quick Tip: Start with Level 3 (PredicateAgent) for most automation tasks. Upgrade to Level 4 (ConversationalAgent) when you need multi-step planning or conversational interfaces.
Level 1: Raw Playwright - Maximum Control
Use Playwright directly with CSS selectors - no LLM or Predicate API key required:
from playwright.sync_api import sync_playwright
# Pure Playwright - no Predicate SDK
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # ...
When to use Level 1:
- Maximum control over every action
- No external API dependencies
- Debugging complex edge cases
- Building reusable libraries
Limitations:
- Requires brittle CSS selectors (breaks when HTML changes)
- No semantic understanding of page elements
- Manual waiting and error handling
- More code to write and maintain
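The limitations above are easier to see in a complete script. The following is a minimal sketch of a Level 1 flow for the "search Amazon and click the first result" task, not the SDK's own example. The two CSS selectors are assumptions about Amazon's markup and may already be out of date, which is exactly the brittleness described above. The function name is hypothetical.

```python
# Selectors are ASSUMPTIONS about Amazon's markup; they break when the HTML changes.
SEARCH_BOX = "#twotabsearchtextbox"
FIRST_RESULT = "div[data-component-type='s-search-result'] h2 a"


def search_and_open_first_result(query: str) -> str:
    """Search Amazon for `query`, click the first result, and return its URL."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page()
            page.goto("https://amazon.com")

            # Manual waiting: every step must wait for its own selector.
            page.wait_for_selector(SEARCH_BOX)
            page.fill(SEARCH_BOX, query)
            page.press(SEARCH_BOX, "Enter")

            page.wait_for_selector(FIRST_RESULT)
            page.locator(FIRST_RESULT).first.click()
            page.wait_for_load_state("domcontentloaded")
            return page.url
        finally:
            # Manual cleanup: close the browser even if a step above fails.
            browser.close()
```

Calling `search_and_open_first_result("wireless mouse")` opens a visible Chromium window and needs network access. If either selector no longer matches, `wait_for_selector` raises a timeout error, and the selector has to be updated by hand.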
Level 2: Direct SDK - Semantic Queries
Use Predicate SDK for semantic element finding without LLMs:
from predicate import PredicateBrowser, snapshot, find, click, type_text, press
# Predicate SDK - semantic queries, no LLM
with PredicateBrowser(api_key="your_key") as browser:
    browser.page.goto("https://amazon.com")
    # ...
Benefits over Level 1:
- Semantic queries instead of CSS selectors (80% less code)
- Importance-based element ranking
- Built-in waiting and error handling
- Works across different page layouts
When to use Level 2:
- Precise control over each action
- Performance-critical applications (minimize LLM calls)
- Debugging specific element interactions
- Building reusable automation libraries
Level 3: PredicateAgent - Natural Language Commands (Recommended)
Use single natural language commands - the agent handles the rest:
from predicate import PredicateBrowser, PredicateAgent
from predicate.llm import OpenAIProvider
# 1. Create browser and LLM provider
browser = PredicateBrowser(api_key="your_predicate_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")
# ...
Level 4: ConversationalAgent - Full Automation (Maximum Convenience)
ONE command does everything - automatic planning and execution:
from predicate import PredicateBrowser, ConversationalAgent
from predicate.llm import OpenAIProvider
# 1. Setup
browser = PredicateBrowser(api_key="your_predicate_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")
# ...
Available LLM Providers
# OpenAI (GPT-4, GPT-4o, etc.)
from predicate.llm import OpenAIProvider
llm = OpenAIProvider(api_key="sk-...", model="gpt-4o")
# Anthropic (Claude)
from predicate.llm import AnthropicProvider
llm = AnthropicProvider(api_key=...)
When to Use Each Level
Use Raw Playwright (Level 1) when:
- You need maximum control over every action
- No external API dependencies are allowed
- You are debugging complex edge cases
- Performance is absolutely critical (no API calls)
Use Direct SDK (Level 2) when:
- You need precise control over each action
- You are debugging specific element interactions
- You are building reusable automation libraries
- Performance is critical (minimize LLM calls)
Use PredicateAgent (Level 3) when:
- You need quick automation for a single task
- You want to issue step-by-step natural language commands
- You want to see each action as it happens
- You have no special constraints (this level fits the most common use cases)
Use ConversationalAgent (Level 4) when:
- The task is complex and spans multiple steps
- You are building a conversational interface
- You want the agent to plan and execute automatically
- Maximum convenience is the priority
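The guidance above can be condensed into a small decision helper. This is only an illustrative sketch: `Needs` and `recommend_level` are hypothetical names, not part of the Predicate SDK, and the order of the checks is one reasonable reading of the lists above (hard constraints first, convenience last).

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Constraints for an automation task (hypothetical, for illustration)."""
    no_external_apis: bool = False   # no Predicate or LLM API calls allowed
    no_llm_calls: bool = False       # LLM usage must be avoided or minimized
    auto_planning: bool = False      # the agent should plan multi-step work itself


def recommend_level(needs: Needs) -> int:
    """Return the abstraction level (1-4) suggested by the lists above."""
    if needs.no_external_apis:
        return 1  # Raw Playwright: no API dependencies at all
    if needs.no_llm_calls:
        return 2  # Direct SDK: semantic queries without an LLM
    if needs.auto_planning:
        return 4  # ConversationalAgent: automatic planning and execution
    return 3      # PredicateAgent: the default for most tasks


print(recommend_level(Needs()))                    # 3
print(recommend_level(Needs(auto_planning=True)))  # 4
```

Hard constraints are checked first, so a task that forbids external APIs is sent to Level 1 even if automatic planning would be convenient.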
Cost Comparison
Understanding the cost and complexity tradeoffs between levels:
Lines of Code Comparison
Same task: "Search Amazon for wireless mouse and click first result"
| Level | Lines of Code | Complexity | Credits Used | LLM Tokens |
|---|---|---|---|---|
| Level 1 | ~15 lines | High (CSS selectors) | 0 | 0 |
| Level 2 | ~10 lines | Medium (semantic queries) | ~2-4 | 0 |
| Level 3 | ~5 lines | Low (natural language) | ~2-4 | ~1,600 |
| Level 4 | ~3 lines | Very Low (one command) | ~2-4 | ~2,600 |
Token Cost Analysis (Level 3 vs Level 4)
Level 3: PredicateAgent - Manual step-by-step commands
- 4 actions × ~400 tokens = ~1,600 tokens ($0.006 with GPT-4o)
- You control each step explicitly
- Predictable token usage
Level 4: ConversationalAgent - Automatic planning
- Initial planning: ~800 tokens
- 4 actions × ~400 tokens = ~1,600 tokens
- Response generation: ~200 tokens
- Total: ~2,600 tokens ($0.010 with GPT-4o)
- Agent plans and executes automatically
- Slightly higher token usage for convenience
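The token totals above follow from simple arithmetic. Here is a minimal sketch that reproduces them from the per-step estimates quoted in this section. The constant and function names are illustrative and not part of the SDK, and real token counts will vary with page size and prompt length.

```python
# Rough per-step estimates quoted in this section (not measured values).
TOKENS_PER_ACTION = 400
PLANNING_TOKENS = 800    # Level 4 only: initial planning
RESPONSE_TOKENS = 200    # Level 4 only: response generation


def level3_tokens(actions: int) -> int:
    """PredicateAgent: one LLM call per explicit command."""
    return actions * TOKENS_PER_ACTION


def level4_tokens(actions: int) -> int:
    """ConversationalAgent: planning + per-action calls + final response."""
    return PLANNING_TOKENS + actions * TOKENS_PER_ACTION + RESPONSE_TOKENS


print(level3_tokens(4))  # 1600
print(level4_tokens(4))  # 2600
# Fixed Level 4 overhead, independent of the number of actions:
print(level4_tokens(4) - level3_tokens(4))  # 1000
```

Under these estimates the Level 4 overhead is a fixed ~1,000 tokens per task, so its relative cost shrinks as the number of actions grows.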
Credit Cost Breakdown
Predicate API Credits (same across all SDK levels):
- Snapshot with server ranking: 1 credit per call
- Local extension only: 0 credits
- Typical task (4 snapshots): 4 credits ≈ $0.004
LLM Costs (Level 3 & 4 only):
- GPT-4o: ~$0.006-0.010 per task
- Claude Sonnet: ~$0.009-0.015 per task
- Local LLM (Qwen/Llama): $0 (free!)
Total Cost Per Task
| Level | Predicate Credits | LLM Cost | Total Cost |
|---|---|---|---|
| Level 1 | $0 | $0 | $0 |
| Level 2 | $0.004 | $0 | $0.004 |
| Level 3 | $0.004 | $0.006 | $0.010 |
| Level 4 | $0.004 | $0.010 | $0.014 |
| Level 3 (Local LLM) | $0.004 | $0 | $0.004 |
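The totals in the table can be recomputed with a few lines of code, which is handy when estimating a different number of snapshots. This is a sketch under stated assumptions: the $0.001 per-credit price is inferred from "4 credits ≈ $0.004" above, the LLM costs are the GPT-4o figures from this section, and `total_cost` is a hypothetical helper, not an SDK function.

```python
from decimal import Decimal

# Inferred from "4 credits ≈ $0.004"; check current pricing before relying on it.
CREDIT_PRICE = Decimal("0.001")

# Approximate GPT-4o cost per task for each level, as quoted in this section.
LLM_COST = {
    1: Decimal("0"),
    2: Decimal("0"),
    3: Decimal("0.006"),
    4: Decimal("0.010"),
}


def total_cost(level: int, snapshots: int = 4, local_llm: bool = False) -> Decimal:
    """Estimated dollar cost of one task at the given level."""
    # Level 1 never calls the Predicate API, so it uses no credits.
    credits = 0 if level == 1 else snapshots
    llm = Decimal("0") if local_llm else LLM_COST[level]
    return credits * CREDIT_PRICE + llm


for level in (1, 2, 3, 4):
    print(level, total_cost(level))
print("3 (local LLM)", total_cost(3, local_llm=True))
```

`Decimal` is used instead of `float` so that sums such as 0.004 + 0.006 come out as exactly 0.010 rather than a rounded binary approximation.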
Cost Optimization Tips:
- Use Level 2 for repetitive tasks (no LLM costs)
- Use Level 3 with local LLM for zero LLM costs
- Use `use_api=False` in snapshots to avoid credit usage (free tier)
- Batch similar tasks to minimize LLM context switching
Next Steps
- Tracing & Debugging: learn how to debug and monitor agent behavior
- Browser Setup: configure browser options
- Examples: see more agent examples