## What Stagehand Does Differently
Traditional browser automation breaks when the page changes: a button moves, a class name updates, and your CSS selector stops working. Stagehand uses an LLM to interpret the page and translate natural-language instructions into browser actions, making automations resilient to UI changes.
The tradeoff: LLM calls on every action add latency (typically 1-3 seconds per act() or extract() call). For interactive workflows that run once per user request, this is fine; for high-volume batch scraping, raw Playwright is faster.
## The Three Core Methods
| Method | What it does | Returns |
|---|---|---|
| `page.act(instruction)` | Performs an action on the page (click, fill, navigate) | None (side effect only) |
| `page.extract(instruction)` | Extracts structured data from the current page | Instance of your schema model |
| `page.observe(instruction)` | Returns a list of actionable elements without acting | List of element descriptors |
## act(): Performing Page Actions

```python
from stagehand import Stagehand
import asyncio

async def main():
    stagehand = Stagehand()
    await stagehand.init()
    page = stagehand.page
    await page.goto('https://news.ycombinator.com')

    # Click actions
    await page.act('Click on the first story link')

    # Form filling
    await page.goto('https://github.com/login')
    await page.act('Fill in the username field with myuser@example.com')
    await page.act('Fill in the password field with my_password')
    await page.act('Click the Sign in button')

    await stagehand.close()

asyncio.run(main())
```

Write act() instructions as specific commands, not descriptions. 'Click the blue Submit button in the checkout form' is better than 'Submit the form': the model has more context to locate the correct element.

## extract(): Getting Structured Data
extract() returns data matching a schema you describe. Use Zod (TypeScript) or Pydantic (Python) schemas for structured output.
```python
from pydantic import BaseModel
from typing import List

from stagehand import Stagehand

class Product(BaseModel):
    name: str
    price: str
    rating: str
    in_stock: bool

class ProductList(BaseModel):
    products: List[Product]

async def scrape_products(url: str):
    stagehand = Stagehand()
    await stagehand.init()
    page = stagehand.page
    await page.goto(url)

    # Extract with schema validation
    result = await page.extract(
        'Extract all product listings visible on this page',
        schema=ProductList,
    )
    print(result.products)  # List[Product]

    await stagehand.close()
    return result.products
```
## observe(): Planning Before Acting

observe() lets you check what a page offers before deciding what to do: it returns actionable elements without executing anything, which makes it a good fit for agents that reason about their options.
```python
async def smart_navigation(page, goal: str):
    # First observe what options are available
    options = await page.observe(
        f'What navigation links or buttons are available to help accomplish: {goal}'
    )
    print('Available actions:')
    for option in options:
        print(f'  - {option.description}')

    # Then decide and act based on observations
    if options:
        await page.act(options[0].description)
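Rather than always acting on the first observed option, an agent can rank descriptions against the goal. A minimal sketch of such a ranking step; `pick_option` is a hypothetical helper (not part of Stagehand) that scores each description by keyword overlap:

```python
def pick_option(descriptions, goal_keywords):
    """Pick the observed element description sharing the most words with
    the goal. Hypothetical ranking helper, not a Stagehand API."""
    keywords = {w.lower() for w in goal_keywords}

    def score(desc):
        return len(keywords & {w.lower() for w in desc.split()})

    best = max(descriptions, key=score, default=None)
    # Only return an option that matched at least one goal keyword
    return best if best is not None and score(best) > 0 else None
```

With observe() results you would pass `[o.description for o in options]` and fall back to a plain act() call when nothing matches.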
## Session Management and Resumption
For long-running tasks, save your session ID and resume it if interrupted. This preserves cookies, local storage, and authentication state.
```python
import json
import os

from stagehand import Stagehand

async def run_with_session_persistence(task_id: str):
    stagehand = Stagehand()

    # Try to resume an existing session
    session_file = f'sessions/{task_id}.json'
    if os.path.exists(session_file):
        with open(session_file) as f:
            saved = json.load(f)
        await stagehand.init(session_id=saved['session_id'])
    else:
        await stagehand.init()

    # Save session for potential resumption
    os.makedirs('sessions', exist_ok=True)
    with open(session_file, 'w') as f:
        json.dump({'session_id': stagehand.session_id}, f)

    # ... perform task ...

    await stagehand.close()
```
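For tasks that outlive a single browser session, intermediate progress can be checkpointed outside the browser entirely. A minimal sketch using SQLite from the standard library; the table layout and class name here are illustrative choices, not part of Stagehand:

```python
import json
import sqlite3

class CheckpointStore:
    """Minimal store so a long task can resume across browser sessions.
    Table and column names are illustrative, not a Stagehand API."""

    def __init__(self, path='checkpoints.db'):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS checkpoints '
            '(task_id TEXT PRIMARY KEY, state TEXT)'
        )

    def save(self, task_id, state):
        # Upsert the JSON-serialized state for this task
        self.conn.execute(
            'INSERT INTO checkpoints (task_id, state) VALUES (?, ?) '
            'ON CONFLICT(task_id) DO UPDATE SET state = excluded.state',
            (task_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, task_id):
        row = self.conn.execute(
            'SELECT state FROM checkpoints WHERE task_id = ?', (task_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

After each unit of work (a page scraped, a form submitted), call `save()` with whatever is needed to resume; on startup, `load()` tells you where to pick up.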
Sessions created in Browserbase have a maximum duration (typically 15 minutes on the free tier). For longer tasks, break them into multiple sessions and use a database to store intermediate state rather than relying on browser session persistence.

## Common Failure Patterns
| Problem | Cause | Fix |
|---|---|---|
| `act()` does nothing | Instruction too vague for model to locate element | Be more specific: include element type, location, or unique text |
| `extract()` returns empty or wrong data | Page uses heavy JS rendering, content not yet visible | Add `await page.wait_for_load_state('networkidle')` before `extract()` |
| High latency (5+ seconds per action) | LLM call overhead on every `act()` | Batch related actions; use raw Playwright for known-stable selectors |
| Authentication breaks after session resume | Session expired or cookies invalidated | Re-authenticate at start of each session rather than relying on resumption |