
Content Reading API

Extract page content as text, markdown, or raw HTML using the read() function.

Basic Usage

from predicate import read

# `browser` is an existing PredicateBrowser instance with a page already loaded
# Get the page content as Markdown
result = read(browser, format="markdown")
print(result["content"])

Parameters

Python:

  • browser (PredicateBrowser): Browser instance
  • format (str): Output format - "raw" (default), "text", or "markdown"
  • enhance_markdown (bool): Use the markdownify library for higher-quality HTML-to-Markdown conversion when format="markdown" (default: True)

TypeScript:

  • browser (PredicateBrowser): Browser instance
  • options (object, optional):
    • format (string): "raw", "text", or "markdown"

Returns

A dict (Python) or object (TypeScript) with the following keys:

  • status: "success" or "error"
  • url: Page URL
  • format: Output format
  • content: Extracted content (string)
  • length: Content length in characters
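Because read() reports failures through the status field rather than only through exceptions, callers may want to check it before using content. The helper below is a minimal sketch, not part of the SDK: get_content is a hypothetical name, and it relies only on the keys listed above. The docs do not name an error-message key, so the sketch reports the URL only.

```python
from typing import Any, Mapping


def get_content(result: Mapping[str, Any]) -> str:
    """Return the extracted content from a read() result, or raise on error.

    Hypothetical helper: assumes only the documented keys
    (status, url, format, content, length).
    """
    if result.get("status") != "success":
        # No error-message key is documented, so only the URL is reported
        raise RuntimeError(f"read() failed for {result.get('url', '<unknown URL>')}")
    return result["content"]


# Usage with a dict shaped like the documented return value
sample = {
    "status": "success",
    "url": "https://example.com",
    "format": "text",
    "content": "Hello",
    "length": 5,
}
print(get_content(sample))  # Hello
```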

Format Options

"raw" (default):

  • Returns the raw HTML content of the page
  • Useful for custom processing or parsing with external libraries
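As an example of custom processing, the raw HTML string can be handed to any HTML parser. The sketch below uses only Python's standard-library html.parser to collect link targets. LinkCollector and extract_links are illustrative names, and the sample HTML stands in for read(browser, format="raw")["content"].

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw_html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(raw_html)
    parser.close()
    return parser.links


# In practice: raw_html = read(browser, format="raw")["content"]
raw_html = '<p><a href="/docs">Docs</a> and <a href="https://example.com">Example</a></p>'
print(extract_links(raw_html))  # ['/docs', 'https://example.com']
```

For heavier parsing, an external library such as BeautifulSoup or lxml can consume the same string.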

"text":

  • Extracts plain text content
  • Strips HTML tags and formatting
  • Useful for text analysis or NLP tasks

"markdown":

  • Converts HTML to Markdown format
  • Preserves structure (headings, lists, links)
  • In Python, uses markdownify for higher-quality conversion when enhance_markdown is True
  • Useful for documentation or content extraction
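Since the Markdown output preserves headings, a page outline can be recovered from it. The sketch below assumes ATX-style headings (lines starting with `#`) and backtick code fences. Depending on how markdownify is configured, top-level headings may instead use the underlined (setext) style, which this sketch does not detect. heading_outline is a hypothetical helper, not an SDK function.

```python
import re

FENCE = "`" * 3  # a backtick code fence marker

# ATX heading: 1-6 '#' characters, whitespace, text, optional closing '#'s
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")


def heading_outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for ATX-style headings in Markdown text."""
    outline = []
    in_fence = False
    for line in markdown.splitlines():
        # Lines inside fenced code blocks are never headings
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2)))
    return outline


# In practice: md = read(browser, format="markdown")["content"]
md = "\n".join(["# Title", "", "Intro text.", "## Section A",
                 FENCE, "# not a heading", FENCE, "### Detail"])
for level, title in heading_outline(md):
    print("  " * (level - 1) + title)
```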

Example Use Cases

Extract article content:

browser.page.goto("https://example.com/article")
result = read(browser, format="markdown")
article_content = result["content"]
print(f"Article length: {result['length']} characters")

Extract text for analysis:

result = read(browser, format="text")
text_content = result["content"]
# Use with NLP libraries or text analysis tools
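Before reaching for a full NLP library, a quick word-frequency count is often enough to sanity-check the extracted text. This is a stdlib-only stand-in, not an SDK feature; top_words is a hypothetical helper, and its tokenization simply treats runs of letters as words (so "don't" becomes "don" and "t").

```python
import re
from collections import Counter


def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common lowercase words in plain text."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits or underscores)
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(n)


# In practice: text_content = read(browser, format="text")["content"]
text_content = "The cat sat. The cat ran. A dog sat."
print(top_words(text_content, 3))
```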

Save content to file:

result = read(browser, format="markdown")
with open("page_content.md", "w", encoding="utf-8") as f:
    f.write(result["content"])
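When saving pages in different formats, the file name and extension can be derived from the url and format fields of the result. The sketch below is one possible approach, not SDK behavior: save_result and the format-to-extension mapping are hypothetical, and the URL is reduced to a filesystem-safe name.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical mapping from read() format to file extension
EXTENSIONS = {"raw": ".html", "text": ".txt", "markdown": ".md"}


def save_result(result: dict, directory: str = ".") -> Path:
    """Write result["content"] to a file named after the page URL."""
    parsed = urlparse(result["url"])
    # Replace characters unsafe in file names, e.g. example.com/article -> example.com_article
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", parsed.netloc + parsed.path).strip("_") or "page"
    path = Path(directory) / (stem + EXTENSIONS.get(result["format"], ".txt"))
    path.write_text(result["content"], encoding="utf-8")
    return path


# Usage: path = save_result(read(browser, format="markdown"))
```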