CAPTCHA Handling
Built-in CAPTCHA detection with configurable handling strategies for browser automation agents.
Why Predicate Does Not Solve CAPTCHAs
Predicate is infrastructure for a token-efficient browser automation runtime, not a full-service automation provider. We deliberately limit our scope to detection and verification for two reasons:
Separation of Concerns
As an infrastructure layer, Predicate focuses on providing reliable, efficient browser automation primitives. CAPTCHA resolution involves domain-specific policies, third-party integrations, and compliance considerations that vary significantly across organizations and use cases.
Policy Compliance
Different organizations have distinct security policies, legal requirements, and acceptable use guidelines governing CAPTCHA handling. By delegating resolution to customer-controlled systems, Predicate avoids imposing a one-size-fits-all approach and enables customers to implement solutions that align with their specific compliance obligations.
Predicate detects CAPTCHAs, pauses execution, and verifies clearance. You decide how to resolve them using your own workflows or external systems that comply with your organization's security policies.
How CAPTCHA Handling Works
When the Predicate SDK detects a CAPTCHA during browser automation, the following flow occurs:
- Detection: The Chrome extension scans for CAPTCHA signals (provider iframes, container selectors, keywords) and reports snapshot.diagnostics.captcha with a confidence score.
- Policy check: If captcha.detected == true and confidence >= minConfidence, the runtime invokes your configured policy.
- Action dispatch: Based on the policy, the runtime either aborts, retries with a new session, or waits for clearance.
- Verification loop: For wait_until_cleared, the runtime continuously re-snapshots until captcha.detected == false or timeout is reached.
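The policy check and verification loop in the flow above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the SDK's implementation: `take_snapshot` is a hypothetical callable that returns the current CAPTCHA diagnostics as a dict, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Dict


def poll_until_cleared(
    take_snapshot: Callable[[], Dict],
    min_confidence: float = 0.7,
    timeout_ms: int = 120_000,
    poll_ms: int = 1_000,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the CAPTCHA is no longer detected, False on timeout."""
    deadline = clock() + timeout_ms / 1000
    while True:
        captcha = take_snapshot()  # hypothetical: yields the captcha diagnostics dict
        # Policy check: only detections at or above the threshold count as blocking.
        blocked = captcha["detected"] and captcha["confidence"] >= min_confidence
        if not blocked:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_ms / 1000)
```

A caller would treat a False result as a timeout and handle it explicitly rather than resuming the run.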
Recommended Hybrid Flow (Token → OCR → Vision)
CAPTCHAs vary by site and session. A practical approach is a hybrid fallback chain that starts with token-based solvers (best coverage for standard providers), then tries OCR for image-only challenges, and finally uses a vision LLM as a last resort.
This design keeps the SDK responsibilities focused on detection + orchestration, while you control the actual solving logic and providers (e.g., a token service like 2Captcha, or an internal OCR pipeline).
High-level behavior:
- Token solver first: Works for reCAPTCHA/hCaptcha/Turnstile when a site key is available.
- OCR second: Works for simple image captchas with a visible <img> and nearby input.
- Vision last: Best-effort fallback for custom or unusual layouts.
Signal source: CAPTCHA detection is surfaced on the current page via snapshot.diagnostics.captcha. This signal is only reliable on the page where the CAPTCHA is actually rendered (e.g., after a modal opens or a challenge iframe appears).
Pseudocode (Python):
```python
from predicate.captcha import CaptchaContext

def solve_captcha(ctx: CaptchaContext):
    if ctx.captcha.provider_hint in {"recaptcha", "hcaptcha", "turnstile"}:
        if try_token_solver(ctx):  # e.g., external provider
            return wait_until_cleared(ctx)
    if has_image_captcha(ctx):  # visible <img> + nearby input
        if try_ocr_solver(ctx):  # image → text
            return wait_until_cleared(ctx)
    # Best-effort fallback
    if try_vision_solver(ctx):  # vision LLM on captured image
        return wait_until_cleared(ctx)
    return abort("captcha_unresolved")
```

For questions or guidance, contact Predicate Systems Support.
Page-control Hook for External Solvers
When you run a CAPTCHA handler, the runtime provides a page-control hook so your solver can read or write small page state inside the same live browser session. This is what makes token injection or callback triggering possible without reloading the page.
Use it sparingly and keep the JS payload small and bounded.
```python
from predicate.captcha import CaptchaContext

async def external_solver(ctx: CaptchaContext):
    # Minimal, bounded JS: read a site key and inject a token.
    sitekey = await ctx.page_control.evaluate_js("/* read sitekey from DOM */")
    token = await request_token(sitekey)  # your external system
    await ctx.page_control.evaluate_js("/* write token into response field */")
    return {"action": "wait_until_cleared"}
```

Concepts and Terminology
Policies
Policies determine what happens when a CAPTCHA is detected:
| Policy | Behavior |
|---|---|
| abort | Stop execution immediately when CAPTCHA is detected. Safest default for workflows where CAPTCHA indicates an unexpected state. |
| callback | Invoke your custom handler function once per CAPTCHA incident, allowing you to decide the appropriate action dynamically. |
Actions
Actions are the runtime behaviors your handler can request:
| Action | Behavior | Details |
|---|---|---|
| abort | Terminate the run | Sets reason_code to captcha_policy_abort |
| retry_new_session | Reset and retry | Closes current browser session, opens fresh one, retries from beginning. Bounded by maxRetriesNewSession (default: 3) |
| wait_until_cleared | Pause and poll | Suspends execution, periodically re-snapshots until CAPTCHA is no longer detected or timeoutMs expires |
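A callback handler ultimately has to pick one of the three actions in the table. The decision logic might look like the sketch below. The `{"action": ...}` return shape follows the page-control example earlier on this page. The `retries_used` counter and the rule for choosing between actions are hypothetical and shown only for illustration, since the runtime applies its own maxRetriesNewSession bound.

```python
from typing import Dict

# Providers that the hybrid flow treats as candidates for token-based solving.
TOKEN_PROVIDERS = {"recaptcha", "hcaptcha", "turnstile"}


def choose_action(
    captcha: Dict,
    retries_used: int,
    max_retries_new_session: int = 3,
) -> Dict[str, str]:
    """Pick one of the documented actions for a detected CAPTCHA."""
    provider = captcha.get("provider_hint", "unknown")
    if provider in TOKEN_PROVIDERS:
        # Assumption: an external solver handles these, so pause and poll.
        return {"action": "wait_until_cleared"}
    if retries_used < max_retries_new_session:
        # Assumption: a fresh session may avoid the challenge.
        return {"action": "retry_new_session"}
    return {"action": "abort"}
```

Keeping the decision in a pure function like this makes the handler easy to unit test without a live browser session.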
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| minConfidence | number | 0.7 | Minimum confidence threshold (0-1) to trigger CAPTCHA handling |
| timeoutMs | number | 120000 | Maximum time (ms) to wait for CAPTCHA clearance |
| pollMs | number | 1000 | Interval (ms) between re-snapshot attempts during wait_until_cleared |
| maxRetriesNewSession | number | 3 | Maximum retry_new_session attempts before aborting |
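To see how these options interact, the following sketch mirrors the documented defaults in a small dataclass and derives the approximate number of polls in one wait. `CaptchaSettings` is a hypothetical stand-in written for this example, not the SDK's CaptchaOptions class, and the validation rules are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptchaSettings:
    """Hypothetical mirror of the documented options and their defaults."""

    min_confidence: float = 0.7
    timeout_ms: int = 120_000
    poll_ms: int = 1_000
    max_retries_new_session: int = 3

    def __post_init__(self) -> None:
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be within 0..1")
        if self.timeout_ms <= 0 or self.poll_ms <= 0:
            raise ValueError("timeout_ms and poll_ms must be positive")
        if self.max_retries_new_session < 0:
            raise ValueError("max_retries_new_session must be non-negative")

    def max_polls(self) -> int:
        """Approximate number of re-snapshots that fit in one wait."""
        return math.ceil(self.timeout_ms / self.poll_ms)
```

With the defaults, `CaptchaSettings().max_polls()` gives 120, which is a rough ceiling that ignores the time each snapshot takes.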
Default Strategies (Helpers, Not Solvers)
Predicate provides three built-in strategy helpers. These are not solvers; they configure the runtime's response to CAPTCHA detection and rely on external systems or humans for actual resolution.
| Strategy | Purpose | Use Case |
|---|---|---|
| HumanHandoffSolver | Pause execution and signal a human operator | Live sessions, monitoring dashboards, manual intervention workflows |
| VisionSolver | Use vision to confirm clearance (no clicking/typing) | Automated verification after external resolution |
| ExternalSolver | Call your webhook/service, then wait for clearance | Integration with third-party CAPTCHA services or custom internal systems |
Quick Start
Option 1: Abort Policy (Safest Default)
Immediately stop when CAPTCHA is detected. No resolution attempted.
```python
from predicate import AgentRuntime, CaptchaOptions

runtime.set_captcha_options(
    CaptchaOptions(
        policy="abort",
        min_confidence=0.7,
    )
)
```

Option 2: Human-in-Loop (Recommended for Live Sessions)
When CAPTCHA is detected, the runtime pauses and waits for a human to solve it in the live browser session.
```python
from predicate import CaptchaOptions, HumanHandoffSolver

runtime.set_captcha_options(
    CaptchaOptions(
        policy="callback",
        handler=HumanHandoffSolver(),
        min_confidence=0.7,
        timeout_ms=120_000,  # Wait up to 2 minutes
    )
)
```

Option 3: Vision-Only Verification
Uses vision to confirm the CAPTCHA has cleared. Does not click or type. Useful after an external system has already solved the CAPTCHA.
```python
from predicate import CaptchaOptions, VisionSolver

runtime.set_captcha_options(
    CaptchaOptions(policy="callback", handler=VisionSolver())
)
```

Option 4: External Resolver Orchestration
Calls your webhook/service when CAPTCHA is detected, then waits for clearance. Your external system performs the actual resolution.
```python
from predicate import CaptchaOptions, ExternalSolver

async def notify_webhook(ctx) -> None:
    """
    Example hook: send context to your external system.
    Replace with your own client / queue / webhook call.
    Predicate does NOT implement solver logic.
    """
    print(f"[captcha] external resolver notified: url=...")
```

Complete Example
```python
import asyncio
import os

from predicate import (
    AgentRuntime,
    AsyncPredicateBrowser,
    CaptchaOptions,
    ExternalSolver,
    HumanHandoffSolver,
    VisionSolver,
)
from predicate.tracing import JsonlTraceSink
```

CAPTCHA Detection
Detected Providers
Predicate detects the following CAPTCHA providers with high confidence:
| Provider | provider_hint | Detection Signals |
|---|---|---|
| Google reCAPTCHA | recaptcha | iframe src, .g-recaptcha, [data-sitekey] |
| hCaptcha | hcaptcha | iframe src, .h-captcha |
| Cloudflare Turnstile | turnstile | iframe src, .cf-turnstile, [data-cf-turnstile-sitekey] |
| Arkose Labs (FunCaptcha) | arkose | iframe src, #FunCaptcha, [data-arkose-public-key] |
| AWS WAF CAPTCHA | awswaf | iframe src, [data-awswaf-captcha], script src |
| Generic/Unknown | unknown | Text keywords: "verify you are human", "unusual traffic", "security check" |
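When you post-process evidence.selector_hits yourself, the container selectors in the table can be mapped back to a provider_hint. The sketch below is an illustrative lookup and not the extension's detection logic. It assumes the hits arrive as the exact selector strings listed above and ignores iframe and script signals.

```python
from typing import Iterable

# Container selectors copied from the table above.
# [data-sitekey] is left out because hCaptcha widgets also carry that attribute.
SELECTOR_TO_PROVIDER = {
    ".g-recaptcha": "recaptcha",
    ".h-captcha": "hcaptcha",
    ".cf-turnstile": "turnstile",
    "[data-cf-turnstile-sitekey]": "turnstile",
    "#FunCaptcha": "arkose",
    "[data-arkose-public-key]": "arkose",
    "[data-awswaf-captcha]": "awswaf",
}


def provider_from_selectors(selector_hits: Iterable[str]) -> str:
    """Return the first provider whose selector appears in the hits."""
    for selector in selector_hits:
        provider = SELECTOR_TO_PROVIDER.get(selector)
        if provider is not None:
            return provider
    return "unknown"
```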
CaptchaDiagnostics Schema
The snapshot.diagnostics.captcha object contains:
```typescript
type CaptchaDiagnostics = {
  detected: boolean;
  provider_hint?: "recaptcha" | "hcaptcha" | "turnstile" | "arkose" | "awswaf" | "unknown";
  confidence: number; // 0..1
  evidence: {
    text_hits: string[]; // Matched text keywords
    selector_hits: string[]; // Matched CSS selectors
    iframe_src_hits: string[]; // Matched iframe sources
    url_hits: string[]; // Matched URL patterns
  };
};
```

Confidence Scoring
The detection uses a multi-signal scoring system:
- Provider iframe hit: +0.7 confidence
- Provider container selector hit: +0.5 confidence
- Keyword text hit: +0.3 confidence
- URL pattern hit: +0.2 confidence
A CAPTCHA is considered detected when confidence >= 0.7 (configurable via minConfidence).
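The weights above can be combined as in the following sketch. It assumes that each signal category contributes its weight at most once, that the contributions add up, and that the total is capped at 1.0 to stay within the documented 0..1 range. The source does not spell out these rules, so treat this as an illustration rather than the detector's actual formula.

```python
from typing import Dict, List

# Weights taken from the list above, keyed by the evidence field names.
WEIGHTS = {
    "iframe_src_hits": 0.7,
    "selector_hits": 0.5,
    "text_hits": 0.3,
    "url_hits": 0.2,
}


def score_evidence(evidence: Dict[str, List[str]]) -> float:
    """Sum the weight of every category with at least one hit, capped at 1.0."""
    total = sum(weight for key, weight in WEIGHTS.items() if evidence.get(key))
    return round(min(total, 1.0), 2)  # rounding avoids float drift such as 0.8999


def is_detected(evidence: Dict[str, List[str]], min_confidence: float = 0.7) -> bool:
    return score_evidence(evidence) >= min_confidence
```

Under these assumptions a keyword hit alone (0.3) stays below the default threshold, while a container selector plus a keyword (0.8) crosses it.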
External Resolution Guidance
If you integrate an external provider (e.g., 2Captcha, Anti-Captcha) or your own internal system:
1. Predicate Only Detects and Verifies
The external system performs the actual resolution. Predicate monitors for clearance.
2. Webhook Payload Structure
Your handler receives a context object with:
| Field (Python) | Field (TypeScript) | Description |
|---|---|---|
| run_id | runId | Current run identifier |
| step_index | stepIndex | Current step number |
| url | url | Page URL where CAPTCHA was detected |
| captcha | captcha | CAPTCHA diagnostics (provider_hint, confidence, evidence) |
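If a Python handler forwards this context to a service that expects the TypeScript-style field names, the conversion can be sketched as follows. `HandlerContext` is a hypothetical stand-in carrying only the four documented fields, and the JSON payload shape is an assumption about what your external system accepts.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class HandlerContext:
    """Hypothetical stand-in with the four documented Python field names."""

    run_id: str
    step_index: int
    url: str
    captcha: Dict[str, Any]


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. step_index -> stepIndex."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_payload(ctx: HandlerContext) -> str:
    # Only top-level keys are renamed; the captcha diagnostics keep their own keys.
    body = {to_camel(key): value for key, value in asdict(ctx).items()}
    return json.dumps(body, sort_keys=True)
```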
3. Policy Compliance
Keep audit logs and ensure your resolution approach complies with your organization's policies (consent, allowed domains, rate limits).
4. Handler Return Value
Your handler should either return the wait_until_cleared action explicitly or leave the default wait_until_cleared action in place. The runtime then confirms clearance before resuming.
5. Timeout Configuration
Set timeouts based on your external service's SLA. External services may take 30-180 seconds to resolve, so the default timeoutMs of 120000 (120 seconds) can be too short for slower providers.
Trace Integration
CAPTCHA events are automatically emitted as verification events to the tracer:
```json
{
  "type": "verification",
  "data": {
    "kind": "captcha",
    "label": "captcha_detected",
    "passed": false,
    "reason": "CAPTCHA detected: recaptcha (confidence: 0.85)",
    "details": {
      "provider_hint": "recaptcha",
      "confidence": 0.85,
      "evidence": {
        "selector_hits": [".g-recaptcha"],
        "iframe_src_hits": ["https://www.google.com/recaptcha/..."]
      }
    }
  },
  "step_id": "abc-123"
}
```

Best Practices
Production Recommendations
- Start with abort policy - Use policy: "abort" in production until you understand your CAPTCHA patterns
- Set appropriate timeouts - Match timeouts to your external service SLA or human operator availability
- Monitor CAPTCHA frequency - High CAPTCHA rates may indicate blocked IPs or aggressive rate limiting
- Use meaningful run IDs - Helps correlate CAPTCHA incidents across your logging/monitoring systems
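One way to act on the monitoring recommendation is to count CAPTCHA verification events per provider in a trace file. The sketch below assumes the trace is JSONL with one event per line in the shape shown under Trace Integration. The function name and the file handling around it are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def captcha_counts(lines: Iterable[str]) -> Counter:
    """Count captcha verification events per provider_hint in JSONL trace lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        data = event.get("data", {})
        if event.get("type") == "verification" and data.get("kind") == "captcha":
            provider = data.get("details", {}).get("provider_hint", "unknown")
            counts[provider] += 1
    return counts
```

Passing an open trace file to `captcha_counts` and comparing the totals across runs gives a simple signal for rising CAPTCHA rates.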
Common Pitfalls
- Don't set minConfidence too low - Values below 0.5 may cause false positives
- Don't ignore timeouts - Always handle timeout scenarios gracefully
- Don't assume clearance - Always verify the CAPTCHA is actually cleared before resuming
Related Documentation
- Agent Runtime - Full AgentRuntime API reference
- Tracing & Debugging - Trace events and debugging
- Debugging Agent Failures - Post-mortem analysis