Group chat is AutoGen's most powerful and most opaque feature. Here is a toolkit for when it goes wrong.
The Group Chat Debugging Problem
AutoGen's GroupChat is powerful: multiple agents collaborate, debate, and delegate in a shared conversation. It is also opaque. When something goes wrong -- agents looping, the wrong agent picking up a task, costs exploding, the group getting stuck -- it is hard to know why without the right tools.
This article covers the five most common group chat failure modes and exactly how to debug and fix each one.
Failure 1: Agents Loop Forever
The group chat runs, agents keep responding to each other, and the conversation never terminates. This is almost always a missing or broken termination condition.
The fix: always set max_round and a termination string
```python
import ag2

# Define the user proxy first so it can be passed into the GroupChat below
# (the original ordering referenced user_proxy before it existed).
user_proxy = ag2.UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    # Terminate when any agent says TERMINATE
    is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
    max_consecutive_auto_reply=3,  # prevent any single agent from dominating
)

groupchat = ag2.GroupChat(
    agents=[assistant, critic, user_proxy],
    messages=[],
    max_round=12,  # hard cap on total turns
    speaker_selection_method="auto",
)

manager = ag2.GroupChatManager(
    groupchat=groupchat,
    llm_config={"model": "gpt-4o", "api_key": "..."},
)
```

Add explicit instructions in each agent's system prompt: 'When the task is complete, end your message with TERMINATE.' Without this instruction, agents will keep elaborating indefinitely.
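The pairing of instruction and check can be sketched as follows. The constant name, the example system message, and the `or ""` guard are illustrative additions, not AutoGen APIs:

```python
# The completion convention lives in two places: the agent is told to
# emit the marker, and the user proxy checks for it.
COMPLETION_INSTRUCTION = "When the task is complete, end your message with TERMINATE."

assistant_system_message = "You are a helpful assistant. " + COMPLETION_INSTRUCTION

def is_termination_msg(msg: dict) -> bool:
    # `or ""` guards against messages whose content is missing or None
    # (e.g. messages that carry only a tool call, no text).
    return "TERMINATE" in (msg.get("content") or "")
```

Pass this named function as `is_termination_msg=is_termination_msg` instead of the inline lambda if you want the None-safety.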
Set max_round conservatively -- lower than you think you need. If the task genuinely requires more turns, increase it incrementally. A runaway group chat with GPT-4o can accumulate significant cost in minutes.

Failure 2: The Wrong Agent Gets Selected
In auto speaker selection mode, the GroupChatManager LLM decides which agent should speak next based on the conversation context. If the LLM makes a poor selection (e.g. always picking the same agent, or picking an agent that is clearly wrong for the task), the group chat produces bad outputs.
Diagnosis
```python
# Re-create the manager, then inspect the transcript to see which
# agent was selected each round.
manager = ag2.GroupChatManager(
    groupchat=groupchat,
    llm_config={"model": "gpt-4o", "api_key": "..."},
    is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
)

# Print the full conversation to trace which agent spoke each round
for message in groupchat.messages:
    print(f"[{message['role']} / {message.get('name', 'unknown')}]: "
          f"{message['content'][:200]}")
```

Fix Option 1: Use round_robin for predictable turn order
```python
groupchat = ag2.GroupChat(
    agents=[researcher, writer, reviewer],
    messages=[],
    max_round=9,
    speaker_selection_method="round_robin",  # guaranteed order, no LLM selection
)
```

Fix Option 2: Use a custom speaker selection function
```python
def custom_speaker_selector(last_speaker, groupchat):
    messages = groupchat.messages
    if not messages:
        return researcher  # always start with researcher
    last_content = messages[-1].get("content", "")
    if "RESEARCH COMPLETE" in last_content:
        return writer
    if "DRAFT COMPLETE" in last_content:
        return reviewer
    return last_speaker  # default: same agent continues

groupchat = ag2.GroupChat(
    agents=[researcher, writer, reviewer],
    messages=[],
    max_round=12,
    speaker_selection_method=custom_speaker_selector,
)
```

Failure 3: Cost Explosion
Group chat passes the full conversation history to every agent on every turn. In a 10-round group chat with 3 agents, each using GPT-4o, you are paying for the growing context window 30 times. Costs grow roughly quadratically with round count, because each new round re-sends an ever-longer history.
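A back-of-the-envelope model makes the growth concrete. The numbers below are illustrative, and the model ignores system prompts, speaker-selection calls, and completion tokens:

```python
def total_prompt_tokens(rounds: int, tokens_per_message: int) -> int:
    # Each round re-sends the entire history so far as the prompt.
    total = 0
    history = 0
    for _ in range(rounds):
        total += history
        history += tokens_per_message  # the new reply joins the history
    return total

# 10 rounds of ~500-token messages: 22,500 prompt tokens billed.
# Doubling to 20 rounds more than quadruples the bill: 95,000.
```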
Fixes
- Set max_round aggressively -- start low (6-8 rounds) and increase only if needed
- Use a cheaper model for the GroupChatManager (it only selects speakers, not the task work)
- Use a cheaper model for lower-stakes agents (e.g. critic, reviewer) -- reserve GPT-4o for the primary worker
- Summarise long tool outputs before they enter the conversation (a 5000-token search result becomes 200 tokens)
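The last fix can be sketched as a wrapper around any tool function. The head-and-tail truncation below is a cheap stand-in for a real summarisation call to a small model; both helper names are hypothetical, not AutoGen APIs:

```python
def summarise(text: str, limit: int = 800) -> str:
    # Stand-in for a cheap-model summarisation call: keep the head
    # and tail of an over-long tool output.
    if len(text) <= limit:
        return text
    half = limit // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

def with_summary(tool):
    # Wrap a tool so its raw output is compressed before it enters
    # the shared conversation history.
    def wrapped(*args, **kwargs):
        return summarise(str(tool(*args, **kwargs)))
    return wrapped
```

Register the wrapped function with the agent instead of the raw tool, so the full output never reaches the shared transcript.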
```python
# Cost-optimised group chat: cheap manager, mixed models
manager = ag2.GroupChatManager(
    groupchat=groupchat,
    llm_config={"model": "gpt-4o-mini", "api_key": "..."},  # cheap selector
)

researcher = ag2.AssistantAgent(
    name="researcher",
    llm_config={"model": "gpt-4o", "api_key": "..."},  # best model for research
)

reviewer = ag2.AssistantAgent(
    name="reviewer",
    llm_config={"model": "gpt-4o-mini", "api_key": "..."},  # cheaper for review
)
```

Failure 4: Agents Produce Outputs the Next Agent Cannot Use
Agent A produces a JSON blob. Agent B expects plain text. Agent C produces a numbered list when the downstream code expects a dict. The group chat 'works' but produces garbage that fails silently downstream.
The fix: enforce output contracts in system prompts
```python
researcher = ag2.AssistantAgent(
    name="researcher",
    system_message=(
        "You are a research analyst. When your research is complete, output "
        "EXACTLY this structure and nothing else:\n"
        "FINDINGS:\n"
        "- [finding 1]\n"
        "- [finding 2]\n"
        "SOURCES:\n"
        "- [url 1]\n"
        "Then write RESEARCH COMPLETE on a new line."
    ),
    llm_config={"model": "gpt-4o", "api_key": "..."},
)
```

Alternatively, use structured outputs (if your model supports them) to enforce a schema at the API level rather than relying on prompt instructions alone.
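On the receiving side, a small guard can check a reply against the contract before the next agent consumes it. This is a hypothetical helper, not an AutoGen API; it simply requires the three markers from the system prompt above to appear in order:

```python
def follows_contract(text: str) -> bool:
    # Require the contract markers in order:
    # FINDINGS, then SOURCES, then RESEARCH COMPLETE.
    markers = ["FINDINGS:", "SOURCES:", "RESEARCH COMPLETE"]
    pos = -1
    for marker in markers:
        pos = text.find(marker, pos + 1)
        if pos == -1:
            return False
    return True
```

Failing fast here turns a silent downstream failure into a visible one: reject the reply (or re-prompt the researcher) instead of passing malformed output to the writer.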
Failure 5: Human Input Mode Blocks Automation
In development, human_input_mode='ALWAYS' is useful -- it lets you step through the conversation. In production automation, it blocks execution waiting for terminal input that never comes. This is a common reason why group chats that work locally hang silently in production.
```python
# Development:
user_proxy = ag2.UserProxyAgent(
    human_input_mode="ALWAYS",  # prompts for input each turn
    ...
)

# Production:
user_proxy = ag2.UserProxyAgent(
    human_input_mode="NEVER",  # fully automated
    max_consecutive_auto_reply=5,
    is_termination_msg=lambda m: "TERMINATE" in m.get("content", ""),
    ...
)
```

Quick Reference
- Always set max_round -- start at 6-8, increase only if needed
- Always set is_termination_msg with a TERMINATE string convention
- Print groupchat.messages after a run to trace which agent was selected each round
- Use round_robin or a custom function for deterministic turn order
- Use cheaper models for GroupChatManager and lower-stakes agents
- Summarise long tool outputs before injecting them into the conversation
- Switch human_input_mode to NEVER in all production deployments