OpenAI Agents SDK: Building Multi-Agent Systems with Handoffs

The Agents SDK handoff system lets agents delegate to specialists. Here is how it works and the patterns that hold up in production.

What Handoffs Are

The OpenAI Agents SDK evolved from Swarm, which introduced the handoff concept: one agent can transfer control of a conversation to another agent when the task is better handled by a specialist. The handoff passes the full conversation context to the receiving agent, which then continues as if it had been running the whole time.

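Handoffs are the SDK's main primitive for letting one agent take over a conversation from another. The alternative, exposing an agent as a tool with as_tool(), keeps the calling agent in control. Understanding handoffs properly is the key to building systems where agents collaborate without losing context.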

Basic Handoff Setup

from agents import Agent, Runner
 
# Specialist agents
billing_agent = Agent(
    name="Billing Agent",
    instructions=(
        "You handle billing questions: refunds, invoices, payment methods. "
        "Be precise and reference specific amounts and dates when discussing charges."
    ),
    model="gpt-4o",
)
 
technical_agent = Agent(
    name="Technical Agent",
    instructions=(
        "You handle technical support: bugs, errors, configuration issues. "
        "Ask for error messages and steps to reproduce."
    ),
    model="gpt-4o",
)
 
# Triage agent with handoffs declared
triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "You are a customer support triage agent. "
        "Route billing questions to the Billing Agent. "
        "Route technical questions to the Technical Agent. "
        "Handle simple general questions yourself."
    ),
    model="gpt-4o-mini",  # cheaper model is fine for routing
    handoffs=[billing_agent, technical_agent],
)
 
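# Run -- the SDK handles the handoff automatically.
# Runner.run_sync blocks until the run finishes; inside async code (or a
# notebook) use `result = await Runner.run(...)` instead.
result = Runner.run_sync(
    triage_agent,
    input="I was charged twice for my subscription last month",
)
print(result.last_agent.name)  # the agent that produced the final answer
print(result.final_output)
# If triage routed correctly, last_agent is the Billing Agent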

How Handoffs Actually Work

Under the hood, a handoff is presented to the model as a tool. For every entry in your handoffs list, the SDK generates a tool named after the target agent, such as transfer_to_billing_agent() for "Billing Agent". When the triage model calls that tool, the SDK switches the active agent and continues the run loop with the receiving agent producing the next response.
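To make the naming concrete, here is a rough, pure-Python approximation of how a default tool name can be derived from an agent's name. It is an assumption modelled on the behaviour described above ("Billing Agent" becomes transfer_to_billing_agent), not the SDK's own code, and default_transfer_tool_name is a hypothetical helper.

```python
import re


def default_transfer_tool_name(agent_name: str) -> str:
    """Approximate the default handoff tool name for an agent (hypothetical helper).

    Assumption: the name gets a "transfer_to_" prefix, every non-alphanumeric
    character (including spaces) becomes an underscore, and the result is lowercased.
    """
    slug = re.sub(r"[^a-zA-Z0-9]", "_", agent_name)
    return f"transfer_to_{slug}".lower()


# One generated tool per entry in the triage agent's handoffs list
handoff_targets = ["Billing Agent", "Technical Agent"]
tool_names = [default_transfer_tool_name(name) for name in handoff_targets]
print(tool_names)  # ['transfer_to_billing_agent', 'transfer_to_technical_agent']
```

Because the derived name follows the agent's display name, renaming an agent also renames its tool; passing tool_name_override to handoff() keeps the name stable.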

By default, the full conversation history is preserved. The receiving agent sees everything the user said and everything the triage agent said or did, including the handoff tool call itself. This is what makes handoffs feel seamless from the user's perspective.
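The switch-and-continue behaviour can be modelled with a toy loop. This is a simplified, SDK-free sketch, not the real Runner: each hypothetical "agent" is a plain function that reads the shared history and returns either a reply or a Handoff naming the next agent, and a marker message stands in for the tool-call items the SDK actually records. The point it illustrates is that switching the active agent never resets the history.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Handoff:
    """Hypothetical stand-in for the transfer_to_* tool call a model would emit."""
    target: str


Reply = str
AgentFn = Callable[[list[dict]], Union[Reply, Handoff]]


def triage(history: list[dict]) -> Union[Reply, Handoff]:
    # Toy keyword rule standing in for the LLM's routing decision
    text = history[-1]["content"].lower()
    if "charged" in text or "refund" in text:
        return Handoff("billing")
    return "How can I help?"


def billing(history: list[dict]) -> Union[Reply, Handoff]:
    # The specialist reads the same history the triage agent saw
    first_user_msg = next(m["content"] for m in history if m["role"] == "user")
    return f"Looking into: {first_user_msg}"


AGENTS: dict[str, AgentFn] = {"triage": triage, "billing": billing}


def run(start: str, history: list[dict], max_turns: int = 5) -> tuple[str, str]:
    """Run until some agent replies; return (last_agent_name, reply)."""
    active = start
    for _ in range(max_turns):
        outcome = AGENTS[active](history)
        if isinstance(outcome, Handoff):
            # Record the handoff in the shared history, then switch agents
            history.append({"role": "assistant", "content": f"[handoff to {outcome.target}]"})
            active = outcome.target
            continue
        history.append({"role": "assistant", "content": outcome})
        return active, outcome
    raise RuntimeError("max_turns exceeded")


history = [{"role": "user", "content": "I was charged twice last month"}]
print(run("triage", history))
```

The max_turns guard mirrors the idea behind Runner.run's own max_turns parameter, which stops runaway loops such as two agents handing off to each other indefinitely.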

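import dataclasses

from agents import Agent, HandoffInputData, handoff
 
# You can customise handoff behaviour with an input filter
def billing_handoff_filter(handoff_input: HandoffInputData) -> HandoffInputData:
    # Drop messages tagged "[INTERNAL]" before the Billing Agent sees them.
    # input_history is whatever was passed to Runner.run (a string or a tuple of
    # items); items generated during this run live in pre_handoff_items/new_items.
    history = handoff_input.input_history
    if isinstance(history, str):
        return handoff_input  # plain-string input: nothing to filter

    def is_internal(item) -> bool:
        content = item.get("content", "") if isinstance(item, dict) else ""
        if isinstance(content, list):  # content may be a list of content parts
            content = "".join(p.get("text", "") for p in content if isinstance(p, dict))
        return isinstance(content, str) and content.startswith("[INTERNAL]")

    # HandoffInputData is a frozen dataclass, so return a modified copy
    return dataclasses.replace(
        handoff_input,
        input_history=tuple(item for item in history if not is_internal(item)),
    )
 
billing_agent_handoff = handoff(
    agent=billing_agent,
    input_filter=billing_handoff_filter,
)
 
triage_agent = Agent(
    name="Triage Agent",
    instructions="Route to the appropriate specialist agent.",
    handoffs=[billing_agent_handoff, technical_agent],
)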
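Because a filter is plain Python, its matching logic can be unit-tested without calling a model. The sketch below pulls that logic into standalone functions. It assumes history items are dicts whose content is either a string or a list of content parts carrying a "text" key, keeps this article's "[INTERNAL]" prefix convention, and uses hypothetical helper names (message_text, strip_internal).

```python
from typing import Any

INTERNAL_PREFIX = "[INTERNAL]"


def message_text(item: dict[str, Any]) -> str:
    """Return an item's text whether content is a string or a list of parts."""
    content = item.get("content", "")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(part.get("text", "") for part in content if isinstance(part, dict))
    return ""


def strip_internal(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop items whose text starts with the internal-note prefix."""
    return [it for it in items if not message_text(it).startswith(INTERNAL_PREFIX)]


history = [
    {"role": "user", "content": "I was charged twice"},
    {"role": "assistant", "content": "[INTERNAL] classified as billing"},
    {"role": "assistant", "content": [{"type": "output_text", "text": "[INTERNAL] urgency high"}]},
    {"type": "function_call", "name": "lookup", "arguments": "{}"},  # no content: kept
]
print(strip_internal(history))
```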

Custom Handoff Messages

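By default, the receiving agent sees the whole conversation history, including the triage agent's handoff tool call. If you want the triage agent to pass along structured context, such as the issue type or urgency, give the handoff an input_type. The model then has to fill in that schema when it calls the handoff tool; the SDK validates the arguments and passes them to your on_handoff callback, which is required whenever input_type is set. The arguments also stay in the history as part of the tool call, so the receiving agent can read them instead of inferring everything from the conversation.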

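from typing import Literal

from pydantic import BaseModel
from agents import Agent, RunContextWrapper, handoff
 
class BillingHandoffData(BaseModel):
    issue_type: Literal["duplicate_charge", "refund_request", "invoice_query", "payment_method"]
    amount_mentioned: float | None  # None when the user gave no amount
    urgency: Literal["high", "medium", "low"]
 
billing_agent = Agent(
    name="Billing Agent",
    instructions="Handle the billing issue described in the handoff context.",
    model="gpt-4o",
)
 
# Receives the validated payload when the handoff fires (required with input_type)
async def on_billing_handoff(ctx: RunContextWrapper[None], data: BillingHandoffData) -> None:
    # e.g. log or open a ticket; the payload has already been validated
    print(f"Billing handoff: {data.issue_type}, urgency={data.urgency}")
 
# Handoff with structured input type
billing_handoff = handoff(
    agent=billing_agent,
    on_handoff=on_billing_handoff,
    input_type=BillingHandoffData,
    tool_name_override="transfer_to_billing",
    tool_description_override=(
        "Transfer to the Billing Agent for: duplicate charges, refund requests, "
        "invoice questions, or payment method issues. "
        "Provide the issue_type, any amount mentioned, and urgency level."
    ),
)
 
triage_agent = Agent(
    name="Triage",
    instructions="Route billing issues to the Billing Agent using transfer_to_billing.",
    handoffs=[billing_handoff],
    model="gpt-4o-mini",
)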
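Because the handoff is still a tool call, the model's structured payload arrives as a JSON string of arguments, which the SDK parses and validates against input_type (with Pydantic) before calling on_handoff. The stdlib-only sketch below imitates that validation step so it runs without the SDK. BillingArgs and parse_billing_args are hypothetical names, and the allowed values are assumptions based on the categories named in the tool description.

```python
import json
from dataclasses import dataclass
from typing import Optional

ISSUE_TYPES = {"duplicate_charge", "refund_request", "invoice_query", "payment_method"}
URGENCIES = {"high", "medium", "low"}


@dataclass(frozen=True)
class BillingArgs:
    issue_type: str
    amount_mentioned: Optional[float]
    urgency: str


def parse_billing_args(raw: str) -> BillingArgs:
    """Parse and validate the JSON arguments of a transfer_to_billing call."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("arguments must be a JSON object")
    if data.get("issue_type") not in ISSUE_TYPES:
        raise ValueError(f"bad issue_type: {data.get('issue_type')!r}")
    if data.get("urgency") not in URGENCIES:
        raise ValueError(f"bad urgency: {data.get('urgency')!r}")
    if "amount_mentioned" not in data:  # required but nullable, like the model field
        raise ValueError("missing amount_mentioned")
    amount = data["amount_mentioned"]
    if amount is not None and (isinstance(amount, bool) or not isinstance(amount, (int, float))):
        raise ValueError(f"bad amount_mentioned: {amount!r}")
    return BillingArgs(
        issue_type=data["issue_type"],
        amount_mentioned=None if amount is None else float(amount),
        urgency=data["urgency"],
    )


# Arguments shaped like what the triage model might send (illustrative)
args = parse_billing_args(
    '{"issue_type": "duplicate_charge", "amount_mentioned": 29.99, "urgency": "high"}'
)
print(args)
```

Constraining issue_type and urgency to fixed values means a malformed handoff fails at validation time instead of reaching the specialist as free text.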

Multi-Turn Handoff Conversations

Handoffs work across multiple conversation turns, not just single messages. After a handoff, start the next turn from result.last_agent so the user keeps talking to the specialist, and pass result.to_input_list() back in so the history carries over.

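from agents import Runner
 
# Runner is stateless: you carry the conversation forward yourself by feeding
# result.to_input_list() back in, and resume from result.last_agent so the
# user stays with the specialist instead of re-entering triage every turn.
async def run_conversation(user_messages: list[str]) -> None:
    current_agent = triage_agent
    current_input = []
 
    for user_message in user_messages:
        current_input.append({"role": "user", "content": user_message})
 
        result = await Runner.run(current_agent, input=current_input)
 
        # Add the full output (including handoff items) to history
        current_input = result.to_input_list()
        # Continue next turn with whichever agent finished this one
        current_agent = result.last_agent
 
        print(f"Agent: {result.last_agent.name}")
        print(f"Response: {result.final_output}")
        print()

# Usage: asyncio.run(run_conversation(["I was charged twice", "It was in March"]))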
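Resuming from the last agent also saves model calls. The toy, SDK-free sketch below counts calls under two policies: resuming from the agent that finished the previous turn, or sending every turn back through triage. Conversation and fake_turn are hypothetical stand-ins for the state you keep and for one Runner.run call, and the rule that triage always hands off to "billing" is an assumption of the toy.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical per-user state carried between turns."""
    resume_last_agent: bool
    active_agent: str = "triage"
    history: list[dict] = field(default_factory=list)
    model_calls: int = 0


def fake_turn(convo: Conversation, text: str) -> str:
    """Stand-in for one Runner.run call; returns the agent that answered."""
    convo.history.append({"role": "user", "content": text})
    agent = convo.active_agent if convo.resume_last_agent else "triage"
    if agent == "triage":
        convo.model_calls += 1  # routing call that emits the handoff
        agent = "billing"       # toy assumption: every turn here is billing
    convo.model_calls += 1      # the specialist's own call
    convo.history.append({"role": "assistant", "content": f"({agent}) ..."})
    convo.active_agent = agent  # plays the role of result.last_agent
    return agent


turns = ["I was charged twice", "It was on my March invoice", "Can you refund one?"]
for resume in (True, False):
    convo = Conversation(resume_last_agent=resume)
    for t in turns:
        fake_turn(convo, t)
    print(f"resume_last_agent={resume}: {convo.model_calls} model calls")
```

In this toy, three billing turns cost 4 model calls when resuming versus 6 when every turn re-enters triage.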

Common Mistakes

Not specifying handoff criteria clearly -- if the triage agent's instructions are vague, it is more likely to hand off at the wrong time, or not at all
Giving all specialists the same broad instructions -- defeats the purpose of specialisation
Expecting handoffs to be instantaneous -- each handoff is at least one additional LLM call
Not testing specialist agents in isolation before wiring up the triage layer -- bugs in specialists are hard to debug through a triage agent
Using an expensive model for the triage agent when a cheaper model (gpt-4o-mini) usually handles routing well
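A cheap way to act on the testing advice above is a small table of example inputs and the agent each should end up with. In a real suite, the route callable would run the triage agent (for example with Runner.run) and return result.last_agent.name; here stub_route is a hypothetical keyword stub so the sketch runs offline, and the cases are illustrative assumptions.

```python
from typing import Callable

# (user message, agent expected to end up handling it) -- illustrative cases
ROUTING_CASES = [
    ("I was charged twice for my subscription", "Billing Agent"),
    ("The app crashes with error 500 on login", "Technical Agent"),
    ("What are your opening hours?", "Triage Agent"),
]


def stub_route(message: str) -> str:
    """Hypothetical offline stand-in for running the triage agent."""
    text = message.lower()
    if any(w in text for w in ("charged", "refund", "invoice")):
        return "Billing Agent"
    if any(w in text for w in ("error", "crash", "bug")):
        return "Technical Agent"
    return "Triage Agent"


def check_routing(route: Callable[[str], str]) -> list[str]:
    """Return a description of every case routed to the wrong agent."""
    failures = []
    for message, expected in ROUTING_CASES:
        actual = route(message)
        if actual != expected:
            failures.append(f"{message!r}: expected {expected}, got {actual}")
    return failures


print(check_routing(stub_route) or "all routing cases passed")
```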

Quick Reference

Declare handoffs=[agent_a, agent_b] on the triage agent -- the SDK generates transfer tools automatically
By default, the full conversation history is passed to the receiving agent on handoff
Use input_filter to strip internal notes before handing off
Use input_type with a Pydantic model, plus the on_handoff callback it requires, to pass structured context on handoff
Use result.to_input_list() to maintain conversation state across turns
Use cheap models (gpt-4o-mini) for triage/routing, stronger models for specialist tasks