Loops, stuck states, and invisible failures are LangGraph's hardest debugging problems. Here's a toolkit to solve them.
The Debugging Problem
LangGraph's cyclic graphs are powerful -- agents can loop, retry, reflect, and route dynamically. They're also much harder to debug than linear pipelines. When something goes wrong in a cycle, the symptoms are often indirect: the agent gets stuck, the output is wrong, or the workflow runs forever and hits a token limit.
The core problem is visibility. Without the right tools, you can't tell which node executed, in what order, what state looked like at each step, or why the conditional edge took a particular path.
This article covers five debugging techniques that turn LangGraph from a black box into a traceable system.
Technique 1: Add Recursion Limits
The fastest win. LangGraph ships with a default recursion limit of 25 node executions per invocation, but relying on the default means loops die with an unhandled exception. Set the limit explicitly and catch the resulting error -- this turns infinite loops into recoverable, observable failures.
```python
from langgraph.errors import GraphRecursionError

# Set a recursion limit to prevent infinite loops
config = {
    "configurable": {"thread_id": "session-1"},
    "recursion_limit": 25,  # max node executions before raising an error
}

try:
    result = graph.invoke(input_state, config)
except GraphRecursionError as e:
    print(f"Graph hit recursion limit: {e}")
    # Handle gracefully -- log, notify, return a partial result
```

A good starting limit for most agents is 25–50. If a legitimate task requires more than 50 node executions, your graph design probably needs review -- break it into sub-graphs.

Technique 2: Stream Node Execution in Real Time
Instead of waiting for the final output, stream each node's execution as it happens. This shows you exactly which nodes ran, in what order, and what state they produced -- without any external tooling.
# stream_mode="updates" emits state changes after each node
for chunk in graph.stream(input_state, config, stream_mode="updates"):
for node_name, state_update in chunk.items():
print(f"Node: {node_name}")
print(f"State update: {state_update}")
print("---")
# stream_mode="values" emits the full state after each node (more verbose)
for state in graph.stream(input_state, config, stream_mode="values"):
print(f"Full state: {state}")This is the single most useful debugging tool for LangGraph. Run it locally whenever an agent behaves unexpectedly -- you'll see exactly where the problem is within seconds.
Technique 3: Inspect State at Any Point
With a checkpointer configured, you can fetch the current state of a thread at any time -- even after the graph has finished running. This lets you inspect what the agent decided, what tools it called, and what the final state was.
```python
# Get the current state of a thread
state = graph.get_state(config)
print("Current state:", state.values)
print("Next nodes to execute:", state.next)
print("Metadata:", state.metadata)

# Get the full history of checkpoints for a thread
history = list(graph.get_state_history(config))
for checkpoint in history:
    print(f"Step {checkpoint.metadata.get('step')}: {checkpoint.values}")

# Rewind to a specific checkpoint and re-run from there
# (useful for testing different paths without re-running from scratch)
target_config = history[2].config
result = graph.invoke(None, target_config)  # resume from checkpoint
```

Technique 4: Add Explicit Debug Nodes
For persistent issues, add dedicated debug nodes to your graph that log state details to stdout or a file. These are temporary -- remove them before production -- but they're invaluable when streaming alone doesn't give you enough context.
```python
from datetime import datetime

def debug_node(state: dict) -> dict:
    """Drop this node anywhere in your graph to inspect state."""
    print(f"\n{'='*50}")
    print(f"DEBUG NODE @ {datetime.now().isoformat()}")
    print(f"Messages count: {len(state.get('messages', []))}")
    print(f"Last message: {state.get('messages', [{}])[-1]}")
    print(f"Tool calls pending: {state.get('tool_calls', [])}")
    print(f"{'='*50}\n")
    return state  # pass state through unchanged

# Add to your graph wherever you need visibility
builder.add_node("debug", debug_node)
builder.add_edge("suspicious_node", "debug")
builder.add_edge("debug", "next_node")
```

Technique 5: LangSmith Tracing
For production observability, LangSmith gives you a web UI showing every graph execution as a full trace: each node, its inputs and outputs, latency, token usage, and the path taken through the graph. It's the closest thing LangGraph has to a debugger.
```python
# Enable LangSmith tracing -- just set environment variables
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your_langsmith_key"
os.environ["LANGSMITH_PROJECT"] = "my-agent-project"

# That's it -- all graph invocations now appear in the LangSmith UI
result = graph.invoke(input_state, config)
```

LangSmith has a free tier. Even if you don't use it in production, running it locally during development cuts debugging time significantly.
Diagnosing Common Failure Patterns
| Symptom | Likely cause | Fix |
|---|---|---|
| Agent loops forever | Conditional edge condition never returns END | Check edge logic; add recursion limit |
| Agent stops after 1 step | Edge condition returns END prematurely | Inspect state after step 1 with get_state() |
| Wrong tool called | Tool descriptions are ambiguous or overlapping | Rewrite tool descriptions; use stream to trace |
| State appears empty | Node returning None instead of updated state | Ensure every node returns a dict or TypedDict |
| Works locally, fails in prod | InMemorySaver used in prod -- state wipes on restart | Switch to SqliteSaver or PostgresSaver |
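The first two rows usually come down to the routing function on a conditional edge. Here's a minimal sketch of the pattern as plain Python -- the node names, state keys, and turn cap are hypothetical, so adapt them to your own state schema:

```python
from typing import Literal

MAX_AGENT_TURNS = 10  # hypothetical cap, a second line of defense after recursion_limit

def route_after_agent(state: dict) -> Literal["tools", "end"]:
    """Routing function for a conditional edge. When wiring it up, map the
    "end" label to END, e.g.:
    builder.add_conditional_edges("agent", route_after_agent,
                                  {"tools": "tools", "end": END})
    """
    last = state["messages"][-1]
    # Terminate when the model produced no tool calls...
    if not last.get("tool_calls"):
        return "end"
    # ...or when the turn cap is hit, so the loop can never run forever.
    if state.get("turns", 0) >= MAX_AGENT_TURNS:
        return "end"
    return "tools"
```

If your agent loops forever, check that every branch of this function can actually reach "end"; if it stops after one step, check that the termination branch isn't firing on the first pass.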
Quick Reference
- Set recursion_limit in config to prevent infinite loops -- start at 25
- Use graph.stream(stream_mode='updates') to trace execution node by node
- Use graph.get_state() and get_state_history() to inspect checkpointed state
- Add temporary debug_node functions to inspect state at specific points
- Enable LangSmith tracing with two environment variables for production observability
- Every node must return a dict -- returning None silently corrupts state