Building Trustworthy AI Agents: Demo with Complete Agent Loop - Pre-Execution Authorization and Post-Execution Deterministic Verification
Deep dive into Predicate Secure - a drop-in security wrapper that adds pre-execution authorization and post-execution verification to AI browser agents using local LLM verification.
AI agents are powerful, but how do you ensure they don't go rogue? Today we're releasing Predicate Secure - a drop-in security wrapper that adds enterprise-grade authorization and verification to browser automation agents. Think of it as a safety harness for your AI agents.
📦 Open Source: The complete demo is available on GitHub at PredicateSystems/predicate-secure (see the demo/ folder). Get started in 5 minutes with local LLM verification.
Predicate Secure integrates with your existing AI agent frameworks in just 3-5 lines of code - including browser-use, LangChain, PydanticAI, raw Playwright, and OpenClaw. This frictionless adoption means you can add robust security without rewriting your agents.
This post walks through our comprehensive demo that showcases the complete agent security loop: pre-execution authorization, browser automation, and post-execution verification using local LLMs.
The Challenge: Trustworthy Agent Automation
When AI agents interact with browsers and web services, they need guardrails. A misconfigured prompt or unexpected model behavior could lead to:
- Navigating to unauthorized domains
- Clicking sensitive buttons or forms
- Exposing credentials or API keys
- Performing unauthorized actions (e.g., deleting all emails)
- Executing actions outside policy boundaries
Traditional approaches rely on prompt engineering or hope for the best. Predicate Secure takes a different approach: enforce policy before execution, verify outcomes after.
The Solution: Complete Deterministic Agent Loop
Predicate Secure implements a complete three-phase agent loop that combines:
- Pre-execution authorization - Deterministic policy-based decisions
- Action execution - Controlled browser automation
- Post-execution verification - Deterministic assertion checking
This is not a probabilistic safety approach. Every action is governed by explicit policy rules (deterministic authorization) and validated against concrete predicates (deterministic verification). The LLM's role is constrained to generating verification predicates based on observed state changes - the actual verification execution is deterministic.
- Pre-Execution Authorization: Policy-based decision - is this action allowed?
- Action Execution: Browser automation with snapshot capture
- Post-Execution Verification: LLM-generated assertions validate outcomes
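The three phases above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Predicate Secure API: `Action`, `run_secured`, and both exception classes are hypothetical names, and the authorizer, executor, and checks are stand-in callables.

```python
from dataclasses import dataclass
from typing import Callable


class AuthorizationDenied(Exception):
    """Raised when the policy check rejects an action before it runs."""


class VerificationFailed(Exception):
    """Raised when a post-execution check evaluates to False."""


@dataclass
class Action:
    name: str      # e.g. "browser.navigate"
    resource: str  # e.g. "https://www.example.com"


def run_secured(
    action: Action,
    authorize: Callable[[Action], bool],
    execute: Callable[[Action], dict],
    checks: list[Callable[[dict], bool]],
) -> dict:
    # Phase 1: pre-execution authorization. Nothing runs if this fails.
    if not authorize(action):
        raise AuthorizationDenied(f"{action.name} on {action.resource}")
    # Phase 2: action execution, returning the observed post-action state.
    state = execute(action)
    # Phase 3: post-execution verification with plain boolean checks.
    for check in checks:
        if not check(state):
            raise VerificationFailed(f"check failed after {action.name}")
    return state


# Stand-in callables; a real agent would plug in a policy engine and a browser.
state = run_secured(
    Action("browser.navigate", "https://www.example.com"),
    authorize=lambda a: a.resource.startswith("https://www.example.com"),
    execute=lambda a: {"url": a.resource},
    checks=[lambda s: "example.com" in s["url"]],
)
print(state["url"])
```

The point of the shape is ordering: the executor is never called for a denied action, and a failed check surfaces as an exception rather than a silently ignored result.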
Demo Architecture
The demo showcases a complete end-to-end implementation with:
- 0 external dependencies
- 100% offline capable
- Free local LLM verification
Core Components
1. Predicate Runtime SDK (predicate-runtime==1.1.2)
- Browser automation via AsyncPredicateBrowser
- Semantic element detection with the find() DSL
- Visual overlay for element highlighting
- Automatic Chrome extension injection
2. Predicate Authority (predicate-authority>=0.1.0)
- YAML-based policy enforcement
- Fail-closed authorization (deny by default)
- Optional Rust-based sidecar for production
- Flexible identity: Local IdP, Okta, Entra ID (Azure AD), OIDC
3. Local LLM Verification (Qwen 2.5 7B Instruct)
- Generates verification predicates from page state changes
- Runs completely offline on Apple Silicon (MPS)
- ~14GB model, 5-second cold start after initial download
4. Cloud Tracing (Optional)
- Upload authorization and verification events to Predicate Studio
- Visualize execution timeline in web UI
- Track decisions across agent runs
Frictionless Framework Integration
Predicate Secure wraps your existing agent code in 3-5 lines - no rewrites needed:
| Framework | Adapter | Integration Effort |
|---|---|---|
| browser-use | BrowserUseAdapter | 3 lines |
| LangChain | SentienceLangChainCore | 4 lines |
| PydanticAI | predicate.integrations.pydanticai | 3 lines |
| Raw Playwright | AgentRuntime.from_playwright_page() | 5 lines |
| OpenClaw | OpenClawAdapter | 3 lines |
All adapters are production-ready and maintained in the predicate-runtime SDK. Drop-in security for any agent framework.
What the Demo Does
The demo executes a simple but complete browser task:
- Navigate to https://www.example.com with policy check
- Take snapshot with visual element overlay
- Find and click "Learn more" link using semantic query
- Verify URL contains "example-domains" after navigation
- Upload trace to Predicate Studio (if API key provided)
Each action goes through the full authorization + verification loop.
Code Walkthrough
1. Semantic Element Finding
Instead of brittle CSS selectors, we use semantic queries:
```python
from predicate import find

# Find link by semantic properties, not CSS
element = find(snapshot, "role=link text~'Learn more'")

if element:
    print(f"Found: {element.text} (ID: {element.id})")
    print(f"Clickable: {element.visual_cues.is_clickable}")
    await click_element(element)
```

The find() function understands:
- ARIA roles (role=link, role=button)
- Text content matching (text~'substring')
- Visual cues (clickability, visibility)
- Element importance ranking
2. Authorization Policy
Authorization rules are declarative YAML:
```yaml
# Allow navigation to safe domains
- name: allow-navigation-safe-domains
  effect: ALLOW
  principals:
    - "agent:demo-browser"
  actions:
    - "browser.navigate"
  resources:
    - "https://www.example.com*"
    - "https://www.google.com*"
  conditions:
    required_labels:
      - "browser_initialized"

# Allow clicks on safe element types
- name: allow-browser-click-safe-elements
  effect: ALLOW
  principals:
    - "agent:demo-browser"
  actions:
    - "browser.click"
  resources:
    - "element:role=link[*"
    - "element:role=button[*"
    - "element#*"  # By snapshot ID
  conditions:
    required_labels:
      - "element_visible"
      - "snapshot_captured"

# Default deny (fail-closed)
- name: default-deny
  effect: DENY
  principals:
    - "*"
  actions:
    - "*"
  resources:
    - "*"
```

The policy is fail-closed: any action not explicitly allowed is denied. This prevents agents from taking unexpected actions.
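The fail-closed behavior can be illustrated with a small evaluator. This is a simplified sketch, not Predicate Authority's engine: `Rule` and `is_allowed` are hypothetical, a trailing `*` is assumed to mean prefix match, and the first matching rule is assumed to win.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    effect: str  # "ALLOW" or "DENY"
    principals: list[str]
    actions: list[str]
    resources: list[str]
    required_labels: list[str] = field(default_factory=list)


def _matches(pattern: str, value: str) -> bool:
    # Assumed wildcard semantics: a trailing "*" is a prefix match.
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return pattern == value


def _any(patterns: list[str], value: str) -> bool:
    return any(_matches(p, value) for p in patterns)


def is_allowed(rules, principal, action, resource, labels=frozenset()) -> bool:
    for rule in rules:  # assumed: first matching rule decides
        if (
            _any(rule.principals, principal)
            and _any(rule.actions, action)
            and _any(rule.resources, resource)
            and set(rule.required_labels) <= set(labels)
        ):
            return rule.effect == "ALLOW"
    return False  # fail-closed: no matching rule means deny


RULES = [
    Rule(
        "allow-navigation-safe-domains", "ALLOW",
        ["agent:demo-browser"], ["browser.navigate"],
        ["https://www.example.com*", "https://www.google.com*"],
        ["browser_initialized"],
    ),
    Rule("default-deny", "DENY", ["*"], ["*"], ["*"]),
]

labels = {"browser_initialized"}
print(is_allowed(RULES, "agent:demo-browser", "browser.navigate", "https://www.example.com/", labels))
print(is_allowed(RULES, "agent:demo-browser", "browser.navigate", "https://evil.test/", labels))
```

One caveat under these assumed prefix semantics: a pattern such as `https://www.example.com*` would also match `https://www.example.com.evil.test/`. How the real engine interprets wildcards is not stated here, so ending host patterns with a `/` before the wildcard is a cautious habit.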
3. LLM-Generated Verification Predicates
After each action, the local LLM analyzes the state changes and generates deterministic verification predicates (assertions to check):
Important: The LLM is NOT doing visual verification. Instead, it generates structured assertions (like url_contains, element_exists) based on observed state changes. The actual verification execution is deterministic - predicates are evaluated as true/false checks.
```python
# Capture pre and post snapshots
pre_snapshot = await get_page_summary()
result = await execute_action()
post_snapshot = await get_page_summary()

# LLM generates verification plan (what to check, not the check itself)
verification_plan = verifier.generate_verification_plan(
    action="click",
    action_target="element#6",
    pre_snapshot_summary=pre_snapshot,
    post_snapshot_summary=post_snapshot,
    context={"task": "Find and click Learn more link"}
)

# Execute generated predicates deterministically
for verification in verification_plan.verifications:
    passed = execute_predicate(
        verification.predicate,  # e.g., "url_contains"
        verification.args        # e.g., ["example-domains"]
    )

    if not passed:
        raise AssertionError("Post-execution verification failed")
```

The LLM sees both snapshots and generates a structured verification plan:
```json
{
  "verifications": [
    {
      "predicate": "url_contains",
      "args": ["example-domains"]
    },
    {
      "predicate": "snapshot_changed",
      "args": []
    }
  ],
  "reasoning": "Verify navigation by checking URL change and snapshot difference."
}
```

For Production Workflows:
For well-understood web flows (like QA testing flows or regular business processes), you can skip LLM generation and use human-defined predicates directly:
```python
# Predefined verification for known workflows
verification_plan = VerificationPlan(
    action="click",
    verifications=[
        VerificationSpec(predicate="url_contains", args=["example-domains"]),
        VerificationSpec(predicate="element_exists", args=["h1"]),
        VerificationSpec(predicate="snapshot_changed", args=[]),
    ],
    reasoning="Predefined checks for 'Learn more' click flow",
)

# Execute the same way - deterministic evaluation
all_passed = execute_verifications(verification_plan)
```

This approach is faster (no LLM inference), more predictable (explicit assertions), and ideal for regression testing of known workflows. Use LLM-generated predicates for exploratory tasks or novel scenarios.
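Whether a plan comes from the LLM or from a human, the evaluation step can be a plain lookup into a fixed table of boolean functions. The sketch below shows one way to do that; `PageState`, `PREDICATES`, and `run_plan` are hypothetical names, the page state is reduced to three fields, and the post-click URL is illustrative only.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class PageState:
    url: str
    element_tags: set[str]  # hypothetical: tags present on the page
    fingerprint: str        # hypothetical: hash of the snapshot contents


PredicateFn = Callable[[PageState, PageState, list[str]], bool]

# Fixed allow-list: the plan can only name checks that exist here.
PREDICATES: dict[str, PredicateFn] = {
    "url_contains": lambda pre, post, args: args[0] in post.url,
    "url_changed": lambda pre, post, args: pre.url != post.url,
    "element_exists": lambda pre, post, args: args[0] in post.element_tags,
    "snapshot_changed": lambda pre, post, args: pre.fingerprint != post.fingerprint,
}


def run_plan(plan: dict, pre: PageState, post: PageState) -> list[tuple[str, bool]]:
    results = []
    for item in plan["verifications"]:
        name = item["predicate"]
        if name not in PREDICATES:
            # Unknown names are rejected rather than interpreted.
            raise ValueError(f"unknown predicate: {name}")
        results.append((name, PREDICATES[name](pre, post, item.get("args", []))))
    return results


plan = json.loads("""
{"verifications": [
    {"predicate": "url_contains", "args": ["example-domains"]},
    {"predicate": "snapshot_changed", "args": []}
 ],
 "reasoning": "Verify navigation by checking URL change and snapshot difference."}
""")
pre = PageState("https://www.example.com/", {"h1", "a"}, "aaa")
post = PageState("https://www.iana.org/help/example-domains", {"h1", "p"}, "bbb")
results = run_plan(plan, pre, post)
print(results)
```

Keeping the table closed is what makes the LLM's role safe to constrain: a malformed or invented predicate name fails loudly instead of being executed.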
4. Visual Element Overlay
Enable visual debugging with snapshot overlays:
```python
from predicate.snapshot import snapshot_async
from predicate.models import SnapshotOptions

snap = await snapshot_async(
    browser,
    SnapshotOptions(
        show_overlay=True,  # Highlights detected elements in browser
        screenshot=False,
    ),
)

print(f"Captured {len(snap.elements)} elements")
# Watch the browser - you'll see colored boxes around detected elements!
```

This is invaluable for debugging why an agent can't find an element.
Real Demo Output
Here's what the demo produces when run:
```
╭──────────────── Demo Configuration ─────────────────╮
│ Predicate Secure Browser Automation Demo            │
│ Task: Navigate to example.com and verify page loads │
│ Start URL: https://www.example.com                  │
│ Principal: agent:demo-browser                       │
╰─────────────────────────────────────────────────────╯

Initializing Local LLM Verifier...
⠋ Loading Qwen 2.5 7B model...
✓ Verifier initialized

Initializing Cloud Tracer...
☁️ Cloud tracing enabled (Pro tier)
✓ Cloud tracer initialized
Run ID: 777c0308-82c8-454d-98df-5a603d12d418
View trace: https://studio.predicatesystems.dev/runs/...

Step 1: Initializing Browser...
✓ Browser started

Step 2: Executing Browser Task...

→ Action: navigate (https://www.example.com)
Pre-execution: Checking authorization...
✓ Action authorized
Executing action...
✓ Action executed
Post-execution: Generating verification plan...
i Generated 1 verifications
  Reasoning: Fallback: verify URL changed after navigation
Executing verifications...
  [1] url_changed()
    ✓ Passed
✓ All verifications passed

→ Action: snapshot (current_page)
Pre-execution: Checking authorization...
✓ Action authorized
Executing action...
  Snapshot captured: 2 elements
  (Watch the browser - elements are highlighted!)
✓ Action executed
Post-execution: Generating verification plan...
i Generated 1 verifications
  Reasoning: Verify page load by checking URL contains domain.
Executing verifications...
  [1] url_contains(example.com)
    ✓ Passed
✓ All verifications passed

→ Finding link with text: 'Learn more'
✓ Found element: Learn more (ID: 6)
  Role: link, Clickable: True

→ Action: click (element#6)
Pre-execution: Checking authorization...
✓ Action authorized
Executing action...
  Clicked at coordinates: (256.0, 198.078125)
✓ Action executed
Post-execution: Generating verification plan...
i Generated 2 verifications
  Reasoning: Verify navigation and page load.
Executing verifications...
  [1] url_contains(example.com)
    ✓ Passed
  [2] snapshot_changed()
    ✓ Passed
✓ All verifications passed

✓ Task completed successfully

Cleaning up...
✓ Browser closed
Uploading trace to Predicate Studio...
✅ Trace uploaded successfully
View in Studio: https://studio.predicatesystems.dev/runs/...
```

Setup Instructions
Prerequisites
- Python 3.11+ (Python 3.11.9 recommended)
- 16GB+ RAM (for 7B model) or 8GB+ (for 3B model)
- Apple Silicon Mac (MPS support) or CUDA GPU
- ~15GB free disk space for the 7B model files (the download alone is ~14GB)
Installation (5 minutes)
```bash
# Change into your local clone of the repository
cd /path/to/Sentience/predicate-secure/py-predicate-secure

# Install SDK
pip install -e .

# Install demo dependencies
cd demo
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium
```

Configuration
Create a .env file in the demo directory:
```bash
# Browser display (false = show browser)
BROWSER_HEADLESS=false

# LLM model for verification
LLM_MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
LLM_DEVICE=auto  # Automatically detects MPS/CUDA/CPU
LLM_MAX_TOKENS=512
LLM_TEMPERATURE=0.0

# Optional: Predicate API key for cloud tracing
# PREDICATE_API_KEY=your-api-key-here

# Demo configuration
DEMO_START_URL=https://www.example.com
DEMO_TASK_DESCRIPTION=Navigate to example.com and verify page loads
DEMO_PRINCIPAL_ID=agent:demo-browser
```

The demo works completely offline (after initial model download). No API key required!
Running the Demo
```bash
# Simple mode with in-process authorization
python secure_browser_demo.py

# First run: Model downloads automatically (~14GB, 2-5 minutes)
# Subsequent runs: Fast startup (~5 seconds)
```

Performance Characteristics
Based on real demo runs on Apple Silicon (M-series):
| Metric | Value | Notes |
|---|---|---|
| Model Load Time | ~5 seconds | After initial download |
| LLM Inference Time | ~3-5 seconds | Per verification plan generation |
| Snapshot Capture | ~1 second | With API or local extension |
| Authorization Check | <1ms | In-process policy evaluation |
| Total Action Loop | ~5-10 seconds | Including verification |
| Memory Usage | ~8GB | 7B model on MPS |
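To collect comparable per-phase numbers on your own hardware, a small timing helper is enough. This is a generic sketch using only the standard library; `timed` and `timings` are hypothetical names, and the sleeps stand in for the real authorization, execution, and verification calls.

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}


@contextmanager
def timed(phase: str):
    """Accumulate wall-clock seconds spent inside the block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = timings.get(phase, 0.0) + time.perf_counter() - start


with timed("authorize"):
    allowed = True    # stand-in for the policy check
with timed("execute"):
    time.sleep(0.01)  # stand-in for the browser action
with timed("verify"):
    time.sleep(0.01)  # stand-in for plan generation and predicate checks

for phase, seconds in timings.items():
    print(f"{phase:<10} {seconds * 1000:8.2f} ms")
```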
Production Deployment
Sidecar Mode
For production, use the Rust-based predicate-authorityd sidecar. The sidecar is optional but recommended for enterprise deployments.
Option 1: Local IdP (Demo/Testing)
```bash
# Start sidecar with local IdP mode
export LOCAL_IDP_SIGNING_KEY="your-production-secret-key"

predicate-authorityd run \
  --host 127.0.0.1 \
  --port 8787 \
  --mode local_only \
  --policy-file policies/browser_automation.yaml \
  --identity-mode local-idp \
  --local-idp-issuer "http://localhost/predicate-local-idp" \
  --local-idp-audience "api://predicate-authority"

# Verify sidecar is running
curl http://127.0.0.1:8787/health
```

Option 2: Bring Your Own IdP (Enterprise)
The sidecar integrates with your existing identity provider:
Okta:
```bash
predicate-authorityd run \
  --identity-mode oidc \
  --oidc-issuer https://your-domain.okta.com \
  --oidc-client-id <client-id> \
  --oidc-client-secret <secret> \
  --policy-file policies/browser_automation.yaml
```

Entra ID (Azure AD):

```bash
predicate-authorityd run \
  --identity-mode entra \
  --entra-tenant-id <tenant-id> \
  --entra-client-id <client-id> \
  --entra-client-secret <secret> \
  --policy-file policies/browser_automation.yaml
```

Generic OIDC:

```bash
predicate-authorityd run \
  --identity-mode oidc \
  --oidc-issuer https://your-idp.com \
  --oidc-client-id <client-id> \
  --oidc-client-secret <secret> \
  --policy-file policies/browser_automation.yaml
```

Benefits of sidecar mode:
- Centralized authorization across multiple agents
- Production-grade audit logging
- Hot-reload policy changes without agent restart
- Fleet management and monitoring
- Higher performance (Rust vs Python)
- Enterprise identity integration (Okta, Entra ID, OIDC)
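When an agent process starts alongside the sidecar, it helps to wait for the `/health` endpoint used in the curl check above before sending any authorization requests. The sketch below polls it with the standard library; `wait_for_sidecar` is a hypothetical helper, and treating HTTP 200 as "ready" is an assumption, since the response format is not described here.

```python
import time
import urllib.error
import urllib.request


def wait_for_sidecar(
    base_url: str = "http://127.0.0.1:8787",
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll {base_url}/health until it answers 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or returned an error status; retry
        time.sleep(interval_s)
    return False
```

A caller would typically invoke `wait_for_sidecar()` once at startup and refuse to run the agent if it returns `False`, which keeps the overall system fail-closed when the authorization service is unavailable.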
Cloud-Connected Mode
For enterprise deployments with Predicate Cloud:
```bash
export PREDICATE_API_KEY="your-api-key"

predicate-authorityd run \
  --mode cloud_connected \
  --control-plane-url https://api.predicatesystems.dev \
  --tenant-id your-tenant \
  --project-id your-project \
  --predicate-api-key "$PREDICATE_API_KEY"
```

This enables:
- Centralized policy management
- Real-time monitoring dashboard
- Historical audit trails
- Team collaboration on policies
Key Takeaways
1. Defense in Depth
Don't rely on prompt engineering alone. Combine policy-based authorization before each action with deterministic checks of LLM-generated predicates after it.
2. Local LLMs Are Viable
Qwen 2.5 7B provides sufficient reasoning for verification predicates while running completely offline on consumer hardware.
3. Semantic Queries Beat CSS
The find() DSL with role-based and text-based matching is more resilient than brittle CSS selectors.
4. Visual Debugging Matters
Snapshot overlays that highlight detected elements make debugging agent behavior dramatically faster.
What's Next?
We're actively developing Predicate Secure with upcoming features:
- Multi-step verification chains - Complex assertion flows
- Replay killswitches - Emergency agent shutdown
- Vision fallback - Handle CAPTCHAs and complex UIs
- Permission recovery - Graceful handling of authorization failures
- Temporal integration - Durable execution for long-running agents
The demo is open source and available in the predicate-secure SDK repository.
Try Predicate Secure Today
SDK Quickstart
Technical Deep Dive Resources
Want to go deeper? Check out these resources:
- Demo README - Complete setup guide
- Architecture Doc - System design details
- Predicate Authority User Manual - Policy language reference
- SDK Python Docs - Browser automation API
Have questions or feedback? Reach out to us on GitHub.
Built with ❤️ by the Predicate team.