Ordinality & Layout Support
Semantic Geometry is the foundational feature of Predicate SDK that helps AI agents perceive and interact with web pages. Ordinality and layout support extend this by enabling agents to understand element positions when users specify goals like "click the first result", "select the 2nd item", or "choose the first product with rating >= 4".
Overview
When users give AI agents positional instructions, the agent needs to know:
- Which elements belong together (e.g., all search results vs navigation links)
- What order they're in (1st, 2nd, 3rd, last)
- Where they are on the page (row/column position in a grid)
Predicate SDK solves this with two complementary features:
- Ordinality - Assigns position indices to elements within groups ("1st", "2nd", "last")
- Layout Detection - Identifies grids, lists, and page regions (header, nav, main, footer)
Why This Matters for LLMs
Layout detection is the critical missing link that allows an LLM to reliably answer "list + constraint" queries like "Find the first product with rating >= 4".
Without layout detection, an LLM sees a flat soup of text nodes and has to "guess" which rating belongs to which product based on proximity. With layout support, that soup transforms into structured objects, making the task trivial.
The Association Problem
The biggest challenge for an LLM on a web page is knowing boundaries between items.
Without Layout: The LLM sees a text sequence like:
...[Product A]... [Price]... [3.5 Stars]... [Product B]...
It might incorrectly assume "3.5 stars" belongs to Product B if the DOM structure is messy.
With Layout: You can present the LLM with structured objects:
{
  "grid_id": 101,
  "label": "Product Card",
  "children": [
    { "text": "Wireless Headphones", "role": "title" },
    { "text": "4.0", "role": "rating" }
  ]
}
The rating is explicitly linked to its parent product - no guessing required.
The Ordering Problem
"First" is ambiguous in a responsive grid.
Without Layout: DOM order often differs from visual order (e.g., in masonry layouts or flex-direction columns). The "first" product in the DOM might actually be in the top-right corner visually.
With Layout: The grid detection algorithm sorts items by visual rows first, then columns. This guarantees that "first" means "top-left," matching human intuition.
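
The row-then-column ordering described above can be sketched in a few lines. This is an illustration only, not the SDK's algorithm: the `Box` class, the `visual_order` helper, and the 20-pixel `row_tolerance` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for an element: a name plus its center point."""
    name: str
    center_x: float
    center_y: float


def visual_order(boxes, row_tolerance=20.0):
    """Sort boxes by visual row (top to bottom), then left to right.

    A box joins the current row when its center_y is within row_tolerance
    of the first box placed in that row (an assumed threshold).
    """
    rows = []  # each entry is a list of boxes on one visual row
    for box in sorted(boxes, key=lambda b: b.center_y):
        if rows and abs(box.center_y - rows[-1][0].center_y) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b.center_x))
    return ordered


# DOM order starts with the top-right card; visual order starts top-left.
dom_order = [
    Box("C", 500, 102),
    Box("A", 100, 100),
    Box("B", 300, 98),
    Box("D", 100, 300),
]
print([b.name for b in visual_order(dom_order)])  # ['A', 'B', 'C', 'D']
```

The small differences in `center_y` (98, 100, 102) mimic cards that are not perfectly aligned; grouping into rows before sorting by x keeps them on the same row.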
Summary of Benefits
| Feature | Benefit for "List + Constraints" |
|---|---|
| Dominant Group | Filters out noise (nav, footer) so the LLM only checks the relevant list |
| Container Inference | Solves the "Association Problem" - knowing which price/rating belongs to which item |
| Grid Sorting | Solves the "Ordering Problem" - correctly identifying the "first" item visually |
How It Works
Dominant Group Detection
The SDK automatically identifies the main content group on a page. This is typically the primary list or grid that users want to interact with (search results, product listings, article feeds).
Each element gets a group_key that indicates which visual group it belongs to. The key of the most common group is recorded as dominant_group_key in the snapshot.
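
The "most common group" idea can be illustrated without the SDK. The sketch below assumes nothing more than a list of group keys; the `dominant_group_key` helper and the sample keys are hypothetical and only mimic the x{bucket}-h{bucket} format.

```python
from collections import Counter


def dominant_group_key(group_keys):
    """Return the most frequent non-empty group key, or None if there is none."""
    counts = Counter(k for k in group_keys if k)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Illustrative keys: three elements share "x1-h2", so that group dominates.
keys = ["x1-h2", "x1-h2", "x0-h1", "x1-h2", None, "x0-h1"]
print(dominant_group_key(keys))  # x1-h2
```

In the SDK this work is already done for you; the snapshot carries the result, as in the following snippet.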
from predicate import PredicateBrowser, snapshot

with PredicateBrowser() as browser:
    browser.page.goto("https://news.ycombinator.com")
    snap = snapshot(browser)

    # The dominant group is the main content area
    print(snap.dominant_group_key)

Ordinal Selection
Each element in a group has a group_index (0-based position). This enables selecting elements by ordinal position:
# Get elements sorted by position in the dominant group
dominant_elements = sorted(
    [e for e in snap.elements if e.in_dominant_group],
    key=lambda e: e.group_index or 0
)

Element Position Fields
Each element includes position data for ordinal selection:
| Field | Type | Description |
|---|---|---|
| center_x | number | X coordinate of element center (viewport-relative) |
| center_y | number | Y coordinate of element center (viewport-relative) |
| doc_y | number | Absolute Y position in document (includes scroll offset) |
| group_key | string | Geometric bucket key for grouping (format: x{bucket}-h{bucket}) |
| group_index | number | Position within group (0-indexed, sorted by doc_y) |
| in_dominant_group | boolean | Whether element is in the main content group |
| href | string | Hyperlink URL (for link elements) |
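
To show how these fields combine, here is a minimal sketch that resolves an instruction such as "first", "last", or "2nd" to an element. The `Item` class and the `pick` helper are hypothetical stand-ins that carry only the fields listed above; they are not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in carrying the position fields from the table above."""
    text: str
    group_index: Optional[int]
    in_dominant_group: bool


def pick(elements, ordinal):
    """Resolve 'first', 'last', or a 1-based number to an element, or None."""
    group = sorted(
        (e for e in elements if e.in_dominant_group and e.group_index is not None),
        key=lambda e: e.group_index,
    )
    if not group:
        return None
    if ordinal == "first":
        return group[0]
    if ordinal == "last":
        return group[-1]
    index = int(ordinal) - 1  # "2nd item" maps to group_index 1
    return group[index] if 0 <= index < len(group) else None


items = [
    Item("Home", 0, False),       # navigation link, outside the dominant group
    Item("Result B", 1, True),
    Item("Result A", 0, True),
    Item("Result C", 2, True),
]
print(pick(items, "first").text)  # Result A
print(pick(items, 2).text)        # Result B
print(pick(items, "last").text)   # Result C
```

Because group_index is 0-based while people count from 1, the conversion happens in exactly one place (`int(ordinal) - 1`), which keeps off-by-one mistakes out of the calling code.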
Layout Detection
Layout detection provides detailed grid and region information for complex page structures.
Layout Fields on Elements
Elements may include a layout field with geometric metadata:
| Field | Type | Description |
|---|---|---|
| grid_id | number | Unique ID for the grid this element belongs to |
| grid_pos | GridPosition | Row and column indices (0-based) |
| parent_index | number | Index of inferred parent element in the elements array |
| children_indices | number[] | List of child element indices (capped at 30) |
| region | string | Page region: header, nav, main, aside, or footer |
| grid_confidence | number | Confidence score for grid assignment (0.0-1.0) |
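
The parent and child indices are what make "list + constraint" queries tractable. The sketch below assumes simplified stand-ins (`Layout`, `Node`, and `to_cards` are hypothetical, and the rating is assumed to be the second child) to show how a flat element array could be regrouped into cards and then filtered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Layout:
    """Hypothetical subset of the layout fields listed above."""
    parent_index: Optional[int] = None
    children_indices: List[int] = field(default_factory=list)


@dataclass
class Node:
    text: str
    layout: Optional[Layout] = None


def to_cards(elements):
    """Group child texts under each element that has inferred children."""
    cards = []
    for elem in elements:
        if elem.layout and elem.layout.children_indices:
            children = [
                elements[i].text
                for i in elem.layout.children_indices
                if 0 <= i < len(elements)  # ignore out-of-range indices
            ]
            cards.append({"container": elem.text, "children": children})
    return cards


elements = [
    Node("Product Card A", Layout(children_indices=[1, 2])),
    Node("Wireless Headphones", Layout(parent_index=0)),
    Node("4.0", Layout(parent_index=0)),
    Node("Product Card B", Layout(children_indices=[4, 5])),
    Node("USB Cable", Layout(parent_index=3)),
    Node("3.5", Layout(parent_index=3)),
    Node("Footer link"),  # no layout data at all
]
cards = to_cards(elements)

# "First product with rating >= 4": the rating is assumed to be the second child.
first_match = next((c for c in cards if float(c["children"][1]) >= 4), None)
print(first_match["container"])  # Product Card A
```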
Grid Coordinates API
Get bounding boxes and metadata for detected grids:
from predicate import PredicateBrowser, snapshot

with PredicateBrowser() as browser:
    browser.page.goto("https://example.com/products")
    snap = snapshot(browser)

    # Get all detected grids
    all_grids = ...

GridInfo Properties
| Property | Type | Description |
|---|---|---|
| grid_id | number | Unique identifier for the grid |
| bbox | BBox | Bounding box (x, y, width, height) in document coordinates |
| row_count | number | Number of rows in the grid |
| col_count | number | Number of columns in the grid |
| item_count | number | Total number of items in the grid |
| label | string or null | Inferred semantic label (see below) |
| is_dominant | boolean | Whether this is the main content grid |
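
Because bbox is given in document coordinates, it usually has to be shifted by the scroll offset before it can be compared with the viewport. The following sketch assumes that document coordinates are measured from the top-left of the page; the `BBox` class here is a local stand-in with the four fields named above, and `to_viewport` and `is_visible` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    """Local stand-in with the four bbox fields named in the table above."""
    x: float
    y: float
    width: float
    height: float


def to_viewport(bbox, scroll_x, scroll_y):
    """Shift a document-space box into viewport space."""
    return BBox(bbox.x - scroll_x, bbox.y - scroll_y, bbox.width, bbox.height)


def is_visible(bbox, scroll_x, scroll_y, viewport_width, viewport_height):
    """True if any part of the document-space box overlaps the viewport."""
    v = to_viewport(bbox, scroll_x, scroll_y)
    return (
        v.x < viewport_width and v.x + v.width > 0
        and v.y < viewport_height and v.y + v.height > 0
    )


grid_bbox = BBox(x=0, y=1200, width=960, height=800)
print(is_visible(grid_bbox, 0, 0, 1280, 720))     # False: the grid starts below the fold
print(is_visible(grid_bbox, 0, 1000, 1280, 720))  # True once the page is scrolled 1000px
```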
Grid Labels
The SDK automatically infers grid labels based on content patterns:
| Label | Detected When |
|---|---|
| product_grid | Price patterns ($, €, £), "Add to cart", ratings |
| search_results | Snippets, ellipses, mostly links |
| article_feed | Timestamps ("2 hours ago"), bylines, dates |
| navigation | Short text, homogeneous links, nav keywords |
| button_grid | All elements are buttons |
| link_list | 80%+ of elements are links |
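
To give a feel for how such content-based rules might look, here is a rough sketch covering three of the labels. It is not the SDK's implementation: the `guess_label` function, the `(role, text)` input shape, the price regex, and the order in which rules are checked are all assumptions made for illustration.

```python
import re

# Assumed price pattern: a currency symbol followed by a digit.
PRICE = re.compile(r"[$€£]\s?\d")


def guess_label(items):
    """Guess a grid label from (role, text) pairs; returns None if unsure."""
    if not items:
        return None
    roles = [role for role, _ in items]
    texts = [text for _, text in items]
    if all(role == "button" for role in roles):
        return "button_grid"
    if any(PRICE.search(t) or "add to cart" in t.lower() for t in texts):
        return "product_grid"
    link_share = sum(role == "link" for role in roles) / len(roles)
    if link_share >= 0.8:
        return "link_list"
    return None


print(guess_label([("button", "OK"), ("button", "Cancel")]))       # button_grid
print(guess_label([("link", "Headphones"), ("text", "$59.99")]))   # product_grid
print(guess_label([("link", "a")] * 4 + [("text", "b")]))          # link_list
```

Heuristics like these can misfire (a pricing FAQ also contains currency symbols), which is why the labels should be treated as hints rather than ground truth.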
Working with Grid Positions
Access individual element positions within a grid:
# Access element layout data
for elem in snap.elements:
    if elem.layout and elem.layout.grid_id is not None:
        print(f"Element in grid {elem.layout.grid_id}: position {elem.layout.grid_pos}")

Practical Examples
Example 1: Click the First Search Result
from predicate import PredicateBrowser, snapshot, click

with PredicateBrowser() as browser:
    browser.page.goto("https://google.com")
    # ... perform search ...
    snap = snapshot(browser)

Example 2: Select Product in Grid by Row/Column
# Find product at row 1, column 2 (0-indexed)
target_row, target_col = 1, 2
for elem in snap.elements:
    if elem.layout and elem.layout.grid_pos:
        pos = elem.layout.grid_pos

Example 3: Filter by Region
# Get only elements in the main content area
main_elements = [
    e for e in snap.elements
    if e.layout and e.layout.region == "main"
]

# Get navigation links
nav_links = [
    e for e in snap.elements
    if e.layout and e.layout.region == "nav"
]

Important Notes
- The layout field is optional and may not be present in all snapshots
- Grid labels are best-effort heuristics and may not always be accurate
- children_indices is capped at 30 elements to prevent large payloads
- Confidence scores (grid_confidence, region_confidence) indicate detection reliability
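
Taken together, these notes suggest reading layout data defensively. The sketch below uses local stand-in classes (`Layout`, `Elem`) and a hypothetical `confident_grid_members` helper; the 0.7 threshold is an arbitrary choice for the example, not a recommended value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layout:
    """Hypothetical subset of the layout fields."""
    grid_id: Optional[int] = None
    grid_confidence: float = 0.0


@dataclass
class Elem:
    text: str
    layout: Optional[Layout] = None  # may be absent, per the notes above


def confident_grid_members(elements, min_confidence=0.7):
    """Keep elements that have layout data, a grid, and enough confidence."""
    return [
        e for e in elements
        if e.layout is not None
        and e.layout.grid_id is not None
        and e.layout.grid_confidence >= min_confidence
    ]


elements = [
    Elem("A", Layout(grid_id=1, grid_confidence=0.9)),
    Elem("B", Layout(grid_id=1, grid_confidence=0.4)),      # low confidence
    Elem("C"),                                              # no layout field
    Elem("D", Layout(grid_id=None, grid_confidence=0.95)),  # not in a grid
]
print([e.text for e in confident_grid_members(elements)])  # ['A']
```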