Building Trustworthy AI Agents: Demo with Complete Agent Loop - Pre-Execution Authorization and Post-Execution Deterministic Verification

AI agents are powerful, but how do you ensure they don't go rogue? Today we're releasing Predicate Secure - a drop-in security wrapper that adds enterprise-grade authorization and verification to browser automation agents. Think of it as a safety harness for your AI agents.

📦 Open Source: The complete demo is available on GitHub at PredicateSystems/predicate-secure (see the demo/ folder). Get started in 5 minutes with local LLM verification.

Predicate Secure integrates with your existing AI agent frameworks in just 3-5 lines of code - including browser-use, LangChain, PydanticAI, raw Playwright, and OpenClaw. This frictionless adoption means you can add robust security without rewriting your agents.

This post walks through our comprehensive demo that showcases the complete agent security loop: pre-execution authorization, browser automation, and post-execution verification using local LLMs.

The Challenge: Trustworthy Agent Automation

When AI agents interact with browsers and web services, they need guardrails. A misconfigured prompt or unexpected model behavior could lead to:

Navigating to unauthorized domains
Clicking sensitive buttons or forms
Exposing credentials or API keys
Performing unauthorized actions (e.g., deleting all emails)
Executing actions outside policy boundaries

Traditional approaches rely on prompt engineering or hope for the best. Predicate Secure takes a different approach: enforce policy before execution, verify outcomes after.

The Solution: Complete Deterministic Agent Loop

Predicate Secure implements a complete three-phase agent loop that combines:

Pre-execution authorization - Deterministic policy-based decisions
Action execution - Controlled browser automation
Post-execution verification - Deterministic assertion checking

This is not a probabilistic safety approach. Every action is governed by explicit policy rules (deterministic authorization) and validated against concrete predicates (deterministic verification). The LLM's role is constrained to generating verification predicates based on observed state changes - the actual verification execution is deterministic.

Phase 1 - Pre-Execution Authorization: a policy-based decision on whether the action is allowed
Phase 2 - Action Execution: browser automation with snapshot capture
Phase 3 - Post-Execution Verification: LLM-generated (or predefined) assertions, evaluated deterministically, validate the outcome
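To make the three phases concrete, here is a minimal, framework-free sketch of one step of the loop. Everything in it (`run_step`, `StepResult`, the callback parameters and the fake in-memory browser state) is hypothetical and is not the Predicate Secure API. It only shows the ordering: authorize first, execute between two state captures, then evaluate plain boolean checks.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical types for illustration; not the Predicate Secure API.
Snapshot = dict[str, Any]
Check = Callable[[Snapshot, Snapshot], bool]


@dataclass
class StepResult:
    action: str
    target: str
    authorized: bool
    failed_checks: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return self.authorized and not self.failed_checks


def run_step(
    action: str,
    target: str,
    authorize: Callable[[str, str], bool],
    capture: Callable[[], Snapshot],
    execute: Callable[[str, str], None],
    plan: Callable[[Snapshot, Snapshot], list[tuple[str, Check]]],
) -> StepResult:
    # Phase 1: pre-execution authorization. A denial returns before the
    # action runs, so nothing touches the browser (fail-closed).
    if not authorize(action, target):
        return StepResult(action, target, authorized=False)

    # Phase 2: execute the action between two state captures.
    pre = capture()
    execute(action, target)
    post = capture()

    # Phase 3: post-execution verification. `plan` decides WHAT to check
    # (an LLM or a human may produce it); each check is a plain boolean
    # function of the two snapshots, so evaluation is deterministic.
    failed = [name for name, check in plan(pre, post) if not check(pre, post)]
    return StepResult(action, target, authorized=True, failed_checks=failed)


# Usage with a fake, in-memory "browser" (illustration only).
state = {"url": "about:blank"}


def fake_execute(action: str, target: str) -> None:
    if action == "navigate":
        state["url"] = target


def allow_example(action: str, target: str) -> bool:
    return action == "navigate" and target.startswith("https://www.example.com")


def url_changed_plan(pre: Snapshot, post: Snapshot) -> list[tuple[str, Check]]:
    return [("url_changed", lambda p, q: p["url"] != q["url"])]


result = run_step("navigate", "https://www.example.com", allow_example,
                  lambda: dict(state), fake_execute, url_changed_plan)
print(result.ok)  # True
```

Keeping the plan separate from the checks mirrors the split described above. Whoever writes the plan only chooses which checks to run. The pass/fail answer always comes from ordinary code.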

Demo Architecture

The demo showcases a complete end-to-end implementation with:

No required external services (cloud tracing is optional)
100% offline capable (after the initial model download)
Free local LLM verification

Core Components

1. Predicate Runtime SDK (predicate-runtime==1.1.2)

Browser automation via AsyncPredicateBrowser
Semantic element detection with find() DSL
Visual overlay for element highlighting
Automatic Chrome extension injection

2. Predicate Authority (predicate-authority>=0.1.0)

YAML-based policy enforcement
Fail-closed authorization (deny by default)
Optional Rust-based sidecar for production
Flexible identity: Local IdP, Okta, Entra ID (Azure AD), OIDC

3. Local LLM Verification (Qwen 2.5 7B Instruct)

Generates verification predicates from page state changes
Runs completely offline on Apple Silicon (MPS) or a CUDA GPU
~14GB model download; ~5-second model load on subsequent runs

4. Cloud Tracing (Optional)

Upload authorization and verification events to Predicate Studio
Visualize execution timeline in web UI
Track decisions across agent runs

Frictionless Framework Integration

Predicate Secure wraps your existing agent code in 3-5 lines - no rewrites needed:

Framework	Adapter	Integration Effort
browser-use	`BrowserUseAdapter`	3 lines
LangChain	`SentienceLangChainCore`	4 lines
PydanticAI	`predicate.integrations.pydanticai`	3 lines
Raw Playwright	`AgentRuntime.from_playwright_page()`	5 lines
OpenClaw	`OpenClawAdapter`	3 lines

All adapters are production-ready and maintained in the predicate-runtime SDK. Drop-in security for any agent framework.
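The adapter APIs live in the predicate-runtime SDK and are not reproduced here. As a framework-agnostic illustration of why wrapping can take only a few lines, the sketch below gates an existing async tool with a decorator. `guarded`, `AuthorizationDenied` and the toy `allow_example_only` policy are hypothetical names, not SDK classes.

```python
import asyncio
import functools
from typing import Callable


class AuthorizationDenied(Exception):
    """Raised when the policy check rejects an action (hypothetical)."""


def guarded(action: str, authorize: Callable[[str, str], bool]):
    """Decorate an async tool so it is authorized before it runs.

    The wrapped function's first positional argument is treated as the
    resource being acted on (a URL, an element ID, ...).
    """
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(resource, *args, **kwargs):
            if not authorize(action, resource):
                raise AuthorizationDenied(f"{action} on {resource!r} denied")
            return await fn(resource, *args, **kwargs)
        return wrapper
    return decorator


def allow_example_only(action: str, resource: str) -> bool:
    # Toy policy for illustration: only navigation within example.com.
    return action == "browser.navigate" and resource.startswith("https://www.example.com/")


@guarded("browser.navigate", allow_example_only)
async def navigate(url: str) -> str:
    # Existing tool body, unchanged; only the decorator line was added.
    return f"navigated to {url}"


print(asyncio.run(navigate("https://www.example.com/")))  # navigated to https://www.example.com/
```

The tool body stays untouched. The only change is the decorator line, which is roughly the scale of change a wrapper-based integration asks for.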

What the Demo Does

The demo executes a simple but complete browser task:

Navigate to https://www.example.com with policy check
Take snapshot with visual element overlay
Find and click "Learn more" link using semantic query
Verify URL contains "example-domains" after navigation
Upload trace to Predicate Studio (if API key provided)

Each action goes through the full authorization + verification loop.

Code Walkthrough

1. Semantic Element Finding

Instead of brittle CSS selectors, we use semantic queries:

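from predicate import find


async def click_learn_more(snapshot, click_element):
    # `snapshot` is a previously captured page snapshot; `click_element`
    # is assumed to be the demo's own click helper (passed in for illustration).
    # Find the link by semantic properties, not CSS
    element = find(snapshot, "role=link text~'Learn more'")

    if element:
        print(f"Found: {element.text} (ID: {element.id})")
        print(f"Clickable: {element.visual_cues.is_clickable}")
        await click_element(element)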

The find() function understands:

ARIA roles (role=link, role=button)
Text content matching (text~'substring')
Visual cues (clickability, visibility)
Element importance ranking
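The sketch below shows one way a query like role=link text~'Learn more' could be parsed and matched. It is not the SDK's `find()`. The `Element` fields, the case-insensitive substring rule for `~`, and the choice of the highest-`importance` match are all assumptions made for illustration. It also searches a plain list rather than a real snapshot.

```python
import re
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    # Hypothetical element record; real snapshot elements carry more fields.
    id: int
    role: str
    text: str
    is_clickable: bool = False
    importance: float = 0.0


def _matches(element: Element, attr: str, op: str, value: str) -> bool:
    actual = str(getattr(element, attr, ""))
    if op == "=":
        return actual == value  # exact match, e.g. role=link
    return value.lower() in actual.lower()  # "~": case-insensitive substring


def find_sketch(elements: list[Element], query: str) -> Optional[Element]:
    clauses = []
    # shlex keeps quoted values together: text~'Learn more' -> "text~Learn more"
    for token in shlex.split(query):
        m = re.fullmatch(r"(\w+)([=~])(.*)", token)
        if m is None:
            raise ValueError(f"cannot parse clause: {token!r}")
        clauses.append(m.groups())
    hits = [el for el in elements if all(_matches(el, *c) for c in clauses)]
    # Several candidates can match; prefer the most important one.
    return max(hits, key=lambda el: el.importance, default=None)


elements = [
    Element(1, "heading", "Example Domain", importance=0.9),
    Element(6, "link", "Learn more", is_clickable=True, importance=0.5),
]
hit = find_sketch(elements, "role=link text~'Learn more'")
print(hit.id, hit.is_clickable)  # 6 True
```

Because the query names a role and visible text rather than a DOM path, it keeps working when the page's class names or nesting change.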

2. Authorization Policy

Authorization rules are declarative YAML:

# Allow navigation to safe domains
- name: allow-navigation-safe-domains
  effect: ALLOW
  principals:
    - "agent:demo-browser"
  actions:
    - "browser.navigate"
  resources:
    - "https://www.example.com*"
    - "https://www.google.com*"
  conditions:
    required_labels:
      - "browser_initialized"

# Allow clicks on safe element types
- name: allow-browser-click-safe-elements
  effect: ALLOW
  principals:
    - "agent:demo-browser"
  actions:
    - "browser.click"
  resources:
    - "element:role=link[*"
    - "element:role=button[*"
    - "element#*"  # By snapshot ID
  conditions:
    required_labels:
      - "element_visible"
      - "snapshot_captured"

# Default deny (fail-closed)
- name: default-deny
  effect: DENY
  principals:
    - "*"
  actions:
    - "*"
  resources:
    - "*"

The policy is fail-closed: any action not explicitly allowed is denied. This prevents agents from taking unexpected actions.
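Here is a minimal evaluator over the same rules, expressed as Python dicts, to show how fail-closed evaluation can work. Several semantics are assumptions for this sketch: the first matching rule wins, patterns are matched as globs with `fnmatch`, and every entry in `required_labels` must be present. The Predicate Authority user manual is the reference for the real semantics. For brevity, the click rule keeps only the `element#*` pattern.

```python
from fnmatch import fnmatchcase

# The rules above as plain data (click rule trimmed to one pattern).
POLICY = [
    {
        "name": "allow-navigation-safe-domains",
        "effect": "ALLOW",
        "principals": ["agent:demo-browser"],
        "actions": ["browser.navigate"],
        "resources": ["https://www.example.com*", "https://www.google.com*"],
        "required_labels": ["browser_initialized"],
    },
    {
        "name": "allow-browser-click-safe-elements",
        "effect": "ALLOW",
        "principals": ["agent:demo-browser"],
        "actions": ["browser.click"],
        "resources": ["element#*"],
        "required_labels": ["element_visible", "snapshot_captured"],
    },
    {"name": "default-deny", "effect": "DENY",
     "principals": ["*"], "actions": ["*"], "resources": ["*"]},
]


def _matches_any(patterns: list[str], value: str) -> bool:
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def evaluate(policy, principal, action, resource, labels):
    """Return (effect, rule_name). Rules are checked in order; the first
    rule whose principal, action, resource and labels all match decides.
    If nothing matches, the answer is still DENY (fail-closed)."""
    for rule in policy:
        if (
            _matches_any(rule["principals"], principal)
            and _matches_any(rule["actions"], action)
            and _matches_any(rule["resources"], resource)
            and set(rule.get("required_labels", [])) <= set(labels)
        ):
            return rule["effect"], rule["name"]
    return "DENY", "<no matching rule>"


print(evaluate(POLICY, "agent:demo-browser", "browser.navigate",
               "https://www.example.com/", {"browser_initialized"}))
# ('ALLOW', 'allow-navigation-safe-domains')
```

The sketch also exposes a pitfall. If resources are matched as plain globs, `https://www.example.com*` also matches a lookalike host such as `https://www.example.com.attacker.test`. If the real matcher behaves the same way, anchoring the pattern with a path separator (`https://www.example.com/*`) is the safer form.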

3. LLM-Generated Verification Predicates

After each action, the local LLM analyzes the state changes and generates deterministic verification predicates (assertions to check):

Important: The LLM is NOT doing visual verification. Instead, it generates structured assertions (like url_contains, element_exists) based on observed state changes. The actual verification execution is deterministic - predicates are evaluated as true/false checks.

# Runs inside an async function. get_page_summary(), execute_action(),
# verifier and execute_predicate() are assumed to be defined elsewhere in the demo.

# Capture pre and post snapshots
pre_snapshot = await get_page_summary()
result = await execute_action()
post_snapshot = await get_page_summary()

# LLM generates verification plan (what to check, not the check itself)
verification_plan = verifier.generate_verification_plan(
    action="click",
    action_target="element#6",
    pre_snapshot_summary=pre_snapshot,
    post_snapshot_summary=post_snapshot,
    context={"task": "Find and click Learn more link"},
)

# Execute generated predicates deterministically
for verification in verification_plan.verifications:
    passed = execute_predicate(
        verification.predicate,  # e.g., "url_contains"
        verification.args,       # e.g., ["example-domains"]
    )
    if not passed:
        raise AssertionError(
            f"Post-execution verification failed: {verification.predicate}"
        )
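The post does not show `execute_predicate()`. One way such a deterministic evaluator could look is a fixed registry of named checks over the before/after page summaries. The summary shape (`url` and `selectors` keys), the registry and the name `evaluate_predicate` are assumptions for this sketch. The predicate names are the ones used in this post.

```python
from typing import Any, Callable

Summary = dict[str, Any]  # assumed shape: {"url": str, "selectors": [str, ...]}
PredicateFn = Callable[[Summary, Summary, list[str]], bool]

# Every check is a pure function of (pre, post, args): same inputs, same answer.
PREDICATES: dict[str, PredicateFn] = {
    "url_contains": lambda pre, post, args: args[0] in post["url"],
    "url_changed": lambda pre, post, args: pre["url"] != post["url"],
    "snapshot_changed": lambda pre, post, args: pre != post,
    "element_exists": lambda pre, post, args: args[0] in post.get("selectors", []),
}


def evaluate_predicate(name: str, args: list[str], pre: Summary, post: Summary) -> bool:
    check = PREDICATES.get(name)
    if check is None:
        return False  # unknown predicate: fail closed instead of skipping it
    try:
        return bool(check(pre, post, args))
    except (IndexError, KeyError, TypeError):
        return False  # malformed args or summary also count as a failure


# Illustrative summaries, not captured from a real run.
pre = {"url": "https://www.example.com/", "selectors": ["h1", "a"]}
post = {"url": "https://www.iana.org/help/example-domains", "selectors": ["h1"]}
print(evaluate_predicate("url_contains", ["example-domains"], pre, post))  # True
```

Because the registry is closed, the LLM can only choose among checks that already exist. It cannot introduce new logic into the verification step.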

The LLM sees both snapshot summaries and generates a structured verification plan:

{
  "verifications": [
    {
      "predicate": "url_contains",
      "args": ["example-domains"]
    },
    {
      "predicate": "snapshot_changed",
      "args": []
    }
  ],
  "reasoning": "Verify navigation by checking URL change and snapshot difference."
}
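The plan arrives as model output, so it is worth validating before anything is evaluated. The sketch below parses JSON in the format shown above and rejects malformed output, unknown predicate names, non-string arguments and empty plans. A bad generation therefore fails closed instead of silently passing. The allowlist, the `Check` type and `parse_plan` are assumptions for illustration, not the demo's code.

```python
import json
from dataclasses import dataclass

# Allowlist of predicate names the evaluator knows how to run (assumed set).
KNOWN_PREDICATES = {"url_contains", "url_changed", "snapshot_changed", "element_exists"}


@dataclass(frozen=True)
class Check:
    predicate: str
    args: tuple[str, ...]


def parse_plan(raw: str) -> tuple[list[Check], str]:
    """Parse an LLM-produced plan; raise ValueError on anything unexpected."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("verifications"), list):
        raise ValueError("plan must be an object with a 'verifications' list")
    checks = []
    for item in data["verifications"]:
        name = item.get("predicate") if isinstance(item, dict) else None
        if not isinstance(name, str) or name not in KNOWN_PREDICATES:
            raise ValueError(f"unknown or malformed verification: {item!r}")
        args = item.get("args", [])
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            raise ValueError(f"args must be a list of strings: {item!r}")
        checks.append(Check(name, tuple(args)))
    if not checks:
        # An empty plan would "pass" trivially, so treat it as an error.
        raise ValueError("plan contains no verifications")
    return checks, str(data.get("reasoning", ""))


raw = """{"verifications": [
  {"predicate": "url_contains", "args": ["example-domains"]},
  {"predicate": "snapshot_changed", "args": []}],
 "reasoning": "Verify navigation by checking URL change and snapshot difference."}"""
checks, reasoning = parse_plan(raw)
print([c.predicate for c in checks])  # ['url_contains', 'snapshot_changed']
```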

For Production Workflows:

For well-understood web flows (like QA testing flows or regular business processes), you can skip LLM generation and use human-defined predicates directly:

# Predefined verification for known workflows
# (VerificationPlan, VerificationSpec and execute_verifications are assumed
# to come from the demo code; import them from wherever your project defines them)
verification_plan = VerificationPlan(
    action="click",
    verifications=[
        VerificationSpec(predicate="url_contains", args=["example-domains"]),
        VerificationSpec(predicate="element_exists", args=["h1"]),
        VerificationSpec(predicate="snapshot_changed", args=[]),
    ],
    reasoning="Predefined checks for 'Learn more' click flow",
)

# Execute the same way - deterministic evaluation
all_passed = execute_verifications(verification_plan)

This approach is faster (no LLM inference), more predictable (explicit assertions), and ideal for regression testing of known workflows. Use LLM-generated predicates for exploratory tasks or novel scenarios.

4. Visual Element Overlay

Enable visual debugging with snapshot overlays:

from predicate.snapshot import snapshot_async
from predicate.models import SnapshotOptions

# Runs inside an async function; `browser` is assumed to be the demo's browser instance
snap = await snapshot_async(
    browser,
    SnapshotOptions(
        show_overlay=True,  # Highlights detected elements in browser
        screenshot=False,
    ),
)

print(f"Captured {len(snap.elements)} elements")
# Watch the browser - you'll see colored boxes around detected elements!

This is invaluable for debugging why an agent can't find an element.

Real Demo Output

Here's what the demo produces when run:

╭──────────────── Demo Configuration ─────────────────╮
│ Predicate Secure Browser Automation Demo            │
│ Task: Navigate to example.com and verify page loads │
│ Start URL: https://www.example.com                  │
│ Principal: agent:demo-browser                       │
╰─────────────────────────────────────────────────────╯

Initializing Local LLM Verifier...
⠋ Loading Qwen 2.5 7B model...
✓ Verifier initialized

Initializing Cloud Tracer...
☁️  Cloud tracing enabled (Pro tier)
✓ Cloud tracer initialized
Run ID: 777c0308-82c8-454d-98df-5a603d12d418
View trace: https://studio.predicatesystems.dev/runs/...

Step 1: Initializing Browser...
✓ Browser started

Step 2: Executing Browser Task...

→ Action: navigate (https://www.example.com)
Pre-execution: Checking authorization...
✓ Action authorized
Executing action...
✓ Action executed
Post-execution: Generating verification plan...
i Generated 1 verifications
  Reasoning: Fallback: verify URL changed after navigation
Executing verifications...
  [1] url_changed()
      ✓ Passed
✓ All verifications passed

→ Action: snapshot (current_page)
Pre-execution: Checking authorization...
✓ Action authorized
Executing action...
  Snapshot captured: 2 elements
  (Watch the browser - elements are highlighted!)
✓ Action executed
Post-execution: Generating verification plan...
i Generated 1 verifications
  Reasoning: Verify page load by checking URL contains domain.
Executing verifications...
  [1] url_contains(example.com)
      ✓ Passed
✓ All verifications passed

→ Finding link with text: 'Learn more'
✓ Found element: Learn more (ID: 6)
  Role: link, Clickable: True

→ Action: click (element#6)
Pre-execution: Checking authorization...
✓ Action authorized
Executing action...
  Clicked at coordinates: (256.0, 198.078125)
✓ Action executed
Post-execution: Generating verification plan...
i Generated 2 verifications
  Reasoning: Verify navigation and page load.
Executing verifications...
  [1] url_contains(example.com)
      ✓ Passed
  [2] snapshot_changed()
      ✓ Passed
✓ All verifications passed

✓ Task completed successfully

Cleaning up...
✓ Browser closed
Uploading trace to Predicate Studio...
✅ Trace uploaded successfully
View in Studio: https://studio.predicatesystems.dev/runs/...

Setup Instructions

Prerequisites

Python 3.11+ (Python 3.11.9 recommended)
16GB+ RAM (for 7B model) or 8GB+ (for 3B model)
Apple Silicon Mac (MPS support) or CUDA GPU
Enough free disk space for the model files (the 7B model download is ~14GB)

Installation (5 minutes)

# From your local clone of the repository
cd /path/to/Sentience/predicate-secure/py-predicate-secure

# Install the SDK (editable mode)
pip install -e .

# Install demo dependencies
cd demo
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

Configuration

Create a .env file in the demo directory:

# Browser display (false = show the browser window)
BROWSER_HEADLESS=false

# LLM model for verification
LLM_MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
LLM_DEVICE=auto  # Automatically detects MPS/CUDA/CPU
LLM_MAX_TOKENS=512
LLM_TEMPERATURE=0.0

# Optional: Predicate API key for cloud tracing
# PREDICATE_API_KEY=your-api-key-here

# Demo configuration
DEMO_START_URL=https://www.example.com
DEMO_TASK_DESCRIPTION=Navigate to example.com and verify page loads
DEMO_PRINCIPAL_ID=agent:demo-browser

The demo works completely offline (after initial model download). No API key required!
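For reference, here is a dependency-free sketch of turning that file into a typed config object. The demo may well load it differently (for example with python-dotenv). `parse_env`, `DemoConfig` and `load_config` are hypothetical names. The inline-comment handling is deliberately simple: it cuts each value at the first " #".

```python
from dataclasses import dataclass
from typing import Optional


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and full-line comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Drop a trailing inline comment such as "auto  # Automatically detects ..."
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


@dataclass(frozen=True)
class DemoConfig:
    headless: bool
    model_name: str
    device: str
    max_tokens: int
    temperature: float
    start_url: str
    task: str
    principal_id: str
    api_key: Optional[str]  # None keeps cloud tracing off


def load_config(env: dict[str, str]) -> DemoConfig:
    # Defaults mirror the example .env above.
    return DemoConfig(
        headless=env.get("BROWSER_HEADLESS", "false").lower() == "true",
        model_name=env.get("LLM_MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct"),
        device=env.get("LLM_DEVICE", "auto"),
        max_tokens=int(env.get("LLM_MAX_TOKENS", "512")),
        temperature=float(env.get("LLM_TEMPERATURE", "0.0")),
        start_url=env.get("DEMO_START_URL", "https://www.example.com"),
        task=env.get("DEMO_TASK_DESCRIPTION", ""),
        principal_id=env.get("DEMO_PRINCIPAL_ID", "agent:demo-browser"),
        api_key=env.get("PREDICATE_API_KEY") or None,
    )


sample = """
BROWSER_HEADLESS=false
LLM_DEVICE=auto  # Automatically detects MPS/CUDA/CPU
LLM_MAX_TOKENS=512
# PREDICATE_API_KEY=your-api-key-here
DEMO_TASK_DESCRIPTION=Navigate to example.com and verify page loads
"""
config = load_config(parse_env(sample))
print(config.device, config.headless, config.api_key)  # auto False None
```

Leaving `PREDICATE_API_KEY` commented out yields `api_key=None`, which matches the offline, no-key mode described above.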

Running the Demo

# Simple mode with in-process authorization
python secure_browser_demo.py

# First run: Model downloads automatically (~14GB, 2-5 minutes)
# Subsequent runs: Fast startup (~5 seconds)

Performance Characteristics

Based on real demo runs on Apple Silicon (M-series):

Metric	Value	Notes
Model Load Time	~5 seconds	After initial download
LLM Inference Time	~3-5 seconds	Per verification plan generation
Snapshot Capture	~1 second	With API or local extension
Authorization Check	<1ms	In-process policy evaluation
Total Action Loop	~5-10 seconds	Including verification
Memory Usage	~8GB	7B model on MPS

Production Deployment

Sidecar Mode

For production, run policy checks through the Rust-based predicate-authorityd sidecar. The sidecar is optional (the demo evaluates policy in-process), but recommended for enterprise deployments.

Option 1: Local IdP (Demo/Testing)

# Start sidecar with local IdP mode
export LOCAL_IDP_SIGNING_KEY="your-production-secret-key"

predicate-authorityd run \
  --host 127.0.0.1 \
  --port 8787 \
  --mode local_only \
  --policy-file policies/browser_automation.yaml \
  --identity-mode local-idp \
  --local-idp-issuer "http://localhost/predicate-local-idp" \
  --local-idp-audience "api://predicate-authority"

# Verify sidecar is running
curl http://127.0.0.1:8787/health
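If a script starts the agent, it can help to wait until the sidecar answers before running any actions. The sketch below polls the `/health` endpoint used above with the standard library. It only checks for an HTTP 200 and makes no assumption about the response body. `sidecar_healthy` and `wait_for_sidecar` are hypothetical helper names.

```python
import time
import urllib.request


def sidecar_healthy(base_url: str = "http://127.0.0.1:8787", timeout: float = 2.0) -> bool:
    """True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (both OSError subclasses), refused connections
        # and timeouts all mean "not healthy".
        return False


def wait_for_sidecar(base_url: str = "http://127.0.0.1:8787",
                     attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll the health endpoint a few times before giving up."""
    for _ in range(attempts):
        if sidecar_healthy(base_url):
            return True
        time.sleep(delay)
    return False
```

A startup script could then refuse to launch the agent when `wait_for_sidecar()` returns False. That keeps the deployment fail-closed when the authorizer is down.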

Option 2: Bring Your Own IdP (Enterprise)

The sidecar integrates with your existing identity provider:

Okta:

predicate-authorityd run \
  --identity-mode oidc \
  --oidc-issuer https://your-domain.okta.com \
  --oidc-client-id <client-id> \
  --oidc-client-secret <secret> \
  --policy-file policies/browser_automation.yaml

Entra ID (Azure AD):

predicate-authorityd run \
  --identity-mode entra \
  --entra-tenant-id <tenant-id> \
  --entra-client-id <client-id> \
  --entra-client-secret <secret> \
  --policy-file policies/browser_automation.yaml

Generic OIDC:

predicate-authorityd run \
  --identity-mode oidc \
  --oidc-issuer https://your-idp.com \
  --oidc-client-id <client-id> \
  --oidc-client-secret <secret> \
  --policy-file policies/browser_automation.yaml

Benefits of sidecar mode:

Centralized authorization across multiple agents
Production-grade audit logging
Hot-reload policy changes without agent restart
Fleet management and monitoring
Higher performance (Rust vs Python)
Enterprise identity integration (Okta, Entra ID, OIDC)

Cloud-Connected Mode

For enterprise deployments with Predicate Cloud:

export PREDICATE_API_KEY="your-api-key"

predicate-authorityd run \
  --mode cloud_connected \
  --control-plane-url https://api.predicatesystems.dev \
  --tenant-id your-tenant \
  --project-id your-project \
  --predicate-api-key $PREDICATE_API_KEY

This enables:

Centralized policy management
Real-time monitoring dashboard
Historical audit trails
Team collaboration on policies

Key Takeaways

1. Defense in Depth

Don't rely on prompt engineering alone. Combine policy-based authorization before each action with deterministic verification after it.

2. Local LLMs Are Viable

Qwen 2.5 7B provides sufficient reasoning for verification predicates while running completely offline on consumer hardware.

3. Semantic Queries Beat CSS

The find() DSL with role-based and text-based matching is more resilient than brittle CSS selectors.

4. Visual Debugging Matters

Snapshot overlays that highlight detected elements make debugging agent behavior dramatically faster.

What's Next?

We're actively developing Predicate Secure with upcoming features:

Multi-step verification chains - Complex assertion flows
Replay killswitches - Emergency agent shutdown
Vision fallback - Handle CAPTCHAs and complex UIs
Permission recovery - Graceful handling of authorization failures
Temporal integration - Durable execution for long-running agents

The demo is open source and available in the predicate-secure SDK repository.

Try Predicate Secure Today

SDK Quickstart

Technical Deep Dive Resources

Want to go deeper? Check out these resources:

Demo README - Complete setup guide
Architecture Doc - System design details
Predicate Authority User Manual - Policy language reference
SDK Python Docs - Browser automation API

Have questions or feedback? Reach out to us on GitHub.

Built with ❤️ by the Predicate team.