Content Reading API
Extract page content as text, markdown, or raw HTML using the read() function.
Basic Usage
from predicate import read
# Get markdown content
result = read(browser, format="markdown")
print(result["content"])
Parameters
Python:
- browser (PredicateBrowser): Browser instance
- format (str): Output format: "raw" (default), "text", or "markdown"
- enhance_markdown (bool): Use markdownify for better conversion (default: True)
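The Python parameters above can be exercised through a thin wrapper that rejects an unknown format before any call is made. This is only a minimal sketch: `read_page` and `VALID_FORMATS` are hypothetical names that are not part of the library, and the read function is passed in as an argument so the block runs without a browser.

```python
# Hypothetical helper; only the parameter names come from the list above.
VALID_FORMATS = ("raw", "text", "markdown")


def read_page(read_fn, browser, format="raw", enhance_markdown=True):
    """Check `format` locally, then delegate to the supplied read function."""
    if format not in VALID_FORMATS:
        raise ValueError(
            f"Unsupported format {format!r}; expected one of {VALID_FORMATS}"
        )
    # Forward the documented keyword arguments unchanged.
    return read_fn(browser, format=format, enhance_markdown=enhance_markdown)
```

With the real library, this would presumably be called as read_page(read, browser, format="markdown", enhance_markdown=False).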
TypeScript:
- browser (PredicateBrowser): Browser instance
- options (object, optional):
  - format (string): "raw", "text", or "markdown"
Returns
Dict/object with:
- status: "success" or "error"
- url: Page URL
- format: Output format
- content: Extracted content (string)
- length: Content length in characters
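Because the result carries a status field, callers may want to check it before using content. The following is a minimal sketch of such a check; `ReadError` and `require_success` are hypothetical names, and since the shape of an error result is not specified here, only the fields listed above are referenced.

```python
class ReadError(RuntimeError):
    """Raised when a read() result reports a non-success status."""


def require_success(result):
    """Return the extracted content, or raise if status is not "success"."""
    status = result.get("status")
    if status != "success":
        # .get() is used because an error result may not carry every field.
        raise ReadError(
            f"read() returned status {status!r} for {result.get('url')!r}"
        )
    return result["content"]
```

A call such as content = require_success(read(browser, format="text")) would then either yield the text or fail loudly.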
Format Options
"raw" (default):
- Returns the raw HTML content of the page
- Useful for custom processing or parsing with external libraries
"text":
- Extracts plain text content
- Strips HTML tags and formatting
- Useful for text analysis or NLP tasks
"markdown":
- Converts HTML to Markdown format
- Preserves structure (headings, lists, links)
- Enhanced with markdownify for better conversion quality
- Useful for documentation or content extraction
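As an illustration of the custom processing that the "raw" format allows, the sketch below collects link targets from an HTML string using Python's standard html.parser module. `LinkCollector` and `extract_links` are hypothetical helpers, and `sample_html` stands in for the result["content"] that a raw read would return.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every anchor in `html`, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Stand-in for result["content"] from read(browser, format="raw")
sample_html = (
    '<h1>Title</h1>'
    '<p>See <a href="https://example.com/a">A</a> and <a href="/b">B</a>.</p>'
)
print(extract_links(sample_html))
```

Any other HTML parser could be substituted here; the standard library is used only to keep the example free of extra dependencies.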
Example Use Cases
Extract article content:
browser.page.goto("https://example.com/article")
result = read(browser, format="markdown")
article_content = result["content"]
print(f"Article length: {result['length']}")
Extract text for analysis:
result = read(browser, format="text")
text_content = result["content"]
# Use with NLP libraries or text analysis tools
Save content to file:
result = read(browser, format="markdown")
with open("page_content.md", "w", encoding="utf-8") as f:
    f.write(result["content"])
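The same pattern can be wrapped in a helper that picks a file extension from the result's format field. This is a sketch under assumptions: `save_result` and the `EXTENSIONS` mapping are hypothetical and not provided by the library.

```python
from pathlib import Path

# Assumed mapping from the documented format names to file extensions.
EXTENSIONS = {"raw": ".html", "text": ".txt", "markdown": ".md"}


def save_result(result, stem, directory="."):
    """Write result["content"] to <directory>/<stem><ext> and return the path."""
    # Fall back to .txt if the format field is missing or unrecognized.
    ext = EXTENSIONS.get(result.get("format"), ".txt")
    path = Path(directory) / f"{stem}{ext}"
    path.write_text(result["content"], encoding="utf-8")
    return path
```

For example, save_result(read(browser, format="markdown"), "page_content") would write page_content.md in the current directory.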