## What Stagehand Does Differently
Traditional browser automation breaks when the page changes: a button moves, a class name updates, and your CSS selector stops working. Stagehand uses an LLM to interpret the page and translate natural-language instructions into browser actions, making automations resilient to UI changes.
The tradeoff: LLM calls on every action add latency (typically 1-3 seconds per act() or extract() call). For interactive workflows that run once per user request, this is fine; for high-volume batch scraping, raw Playwright is faster.
## The Three Core Methods
| Method | What it does | Returns |
|---|---|---|
| `page.act(instruction)` | Performs an action on the page (click, fill, navigate) | None (side effect only) |
| `page.extract(instruction)` | Extracts structured data from the current page | Instance of your schema model |
| `page.observe(instruction)` | Returns a list of actionable elements without acting | List of element descriptors |
## act(): Performing Page Actions

```python
from stagehand import Stagehand
import asyncio

async def main():
    stagehand = Stagehand()
    await stagehand.init()
    page = stagehand.page
    await page.goto('https://news.ycombinator.com')

    # Click actions
    await page.act('Click on the first story link')

    # Form filling
    await page.goto('https://github.com/login')
    await page.act('Fill in the username field with myuser@example.com')
    await page.act('Fill in the password field with my_password')
    await page.act('Click the Sign in button')

    await stagehand.close()

asyncio.run(main())
```

Write act() instructions as specific commands, not descriptions. 'Click the blue Submit button in the checkout form' is better than 'Submit the form': the model has more context to locate the correct element.

## extract(): Getting Structured Data
extract() returns data matching a schema you describe. Use Zod (TypeScript) or Pydantic (Python) schemas for structured output.
```python
from pydantic import BaseModel
from typing import List

from stagehand import Stagehand

class Product(BaseModel):
    name: str
    price: str
    rating: str
    in_stock: bool

class ProductList(BaseModel):
    products: List[Product]

async def scrape_products(url: str):
    stagehand = Stagehand()
    await stagehand.init()
    page = stagehand.page
    await page.goto(url)

    # Extract with schema validation
    result = await page.extract(
        'Extract all product listings visible on this page',
        schema=ProductList,
    )
    print(result.products)  # List[Product]

    await stagehand.close()
    return result.products
```
## observe(): Planning Before Acting

observe() lets you check what a page offers before deciding what to do: it returns actionable elements without executing anything, which makes it a good fit for agents that reason about their options.
```python
async def smart_navigation(page, goal: str):
    # First observe what options are available
    options = await page.observe(
        f'What navigation links or buttons are available to help accomplish: {goal}'
    )
    print('Available actions:')
    for option in options:
        print(f'  - {option.description}')

    # Then decide and act based on observations
    if options:
        await page.act(options[0].description)
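Rather than always acting on the first observed option, an agent can rank descriptions against the goal. A minimal sketch of such a ranking step; `pick_option` is a hypothetical helper (not part of Stagehand) that scores each description by keyword overlap:

```python
def pick_option(descriptions, goal_keywords):
    """Pick the observed element description sharing the most words with
    the goal. Hypothetical ranking helper, not a Stagehand API."""
    keywords = {w.lower() for w in goal_keywords}

    def score(desc):
        return len(keywords & {w.lower() for w in desc.split()})

    best = max(descriptions, key=score, default=None)
    # Only return an option that matched at least one goal keyword
    return best if best is not None and score(best) > 0 else None
```

With observe() results you would pass `[o.description for o in options]` and fall back to a plain act() call when nothing matches.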
## Session Management and Resumption
For long-running tasks, save your session ID and resume it if interrupted. This preserves cookies, local storage, and authentication state.
```python
import json
import os

from stagehand import Stagehand

async def run_with_session_persistence(task_id: str):
    stagehand = Stagehand()

    # Try to resume an existing session
    session_file = f'sessions/{task_id}.json'
    if os.path.exists(session_file):
        with open(session_file) as f:
            saved = json.load(f)
        await stagehand.init(session_id=saved['session_id'])
    else:
        await stagehand.init()

    # Save session for potential resumption
    os.makedirs('sessions', exist_ok=True)
    with open(session_file, 'w') as f:
        json.dump({'session_id': stagehand.session_id}, f)

    # ... perform task ...

    await stagehand.close()
```
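For tasks that outlive a single browser session, intermediate progress can be checkpointed outside the browser entirely. A minimal sketch using SQLite from the standard library; the table layout and class name here are illustrative choices, not part of Stagehand:

```python
import json
import sqlite3

class CheckpointStore:
    """Minimal store so a long task can resume across browser sessions.
    Table and column names are illustrative, not a Stagehand API."""

    def __init__(self, path='checkpoints.db'):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS checkpoints '
            '(task_id TEXT PRIMARY KEY, state TEXT)'
        )

    def save(self, task_id, state):
        # Upsert the JSON-serialized state for this task
        self.conn.execute(
            'INSERT INTO checkpoints (task_id, state) VALUES (?, ?) '
            'ON CONFLICT(task_id) DO UPDATE SET state = excluded.state',
            (task_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, task_id):
        row = self.conn.execute(
            'SELECT state FROM checkpoints WHERE task_id = ?', (task_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

After each unit of work (a page scraped, a form submitted), call `save()` with whatever is needed to resume; on startup, `load()` tells you where to pick up.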
Sessions created in Browserbase have a maximum duration (typically 15 minutes on the free tier). For longer tasks, break them into multiple sessions and use a database to store intermediate state rather than relying on browser session persistence.

## Common Failure Patterns
| Problem | Cause | Fix |
|---|---|---|
| `act()` does nothing | Instruction too vague for model to locate element | Be more specific: include element type, location, or unique text |
| `extract()` returns empty or wrong data | Page uses heavy JS rendering, content not yet visible | Add `await page.wait_for_load_state('networkidle')` before `extract()` |
| High latency (5+ seconds per action) | LLM call overhead on every `act()` | Batch related actions; use raw Playwright for known-stable selectors |
| Authentication breaks after session resume | Session expired or cookies invalidated | Re-authenticate at start of each session rather than relying on resumption |