High latency is Voiceflow's most-reported production pain point. Here are the root causes and the fixes that actually work.

The Problem

Voiceflow agents regularly exhibit response times above 600-700ms -- enough to create a noticeable pause that disrupts natural conversation, especially in voice interfaces. Users notice anything above roughly 400ms in chat and 300ms in voice. This latency is the top complaint across Voiceflow reviews on G2, Reddit, and the community forum.

Latency in a Voiceflow agent comes from four stacked sources: the LLM call, the knowledge base retrieval, the integration call (if any), and Voiceflow's own platform overhead. You cannot eliminate all of it, but you can significantly reduce it by optimising each layer.

Layer 1: LLM Model Selection

The biggest single factor in response time is which LLM you use. GPT-4o and Claude Sonnet are significantly slower than GPT-4o-mini or Claude Haiku for most conversational turns. For chatbots where speed matters more than maximum reasoning depth, switching models is the fastest win.

Model                         Typical latency   Best for
GPT-4o / Claude Sonnet        600-1200ms        Complex reasoning, nuanced responses
GPT-4o-mini / Claude Haiku    150-400ms         Most conversational turns, FAQ, routing
GPT-3.5-turbo                 100-300ms         Very simple responses -- quality trade-off

For most Voiceflow chatbots, a hybrid approach works best: use a fast model (GPT-4o-mini or Haiku) for the majority of turns, and route only complex questions to the slower, more capable model.

Audit your flows to identify which steps actually need GPT-4o-level reasoning and which are simple FAQ or routing. You will likely find that 70-80% of turns can run on a faster model with no quality difference the user can detect.
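The routing logic for a hybrid setup is simple to sketch. In the snippet below, the keyword heuristic is a stand-in for a fast-model classification call, and the model names mirror the table above -- swap in whatever your provider exposes:

```python
# Model names mirror the table above; adjust for your provider.
FAST_MODEL = "gpt-4o-mini"
SLOW_MODEL = "gpt-4o"

# Crude stand-in for a fast-model classification call: treat turns that
# ask for reasoning or comparison as "complex", everything else as simple.
COMPLEX_MARKERS = ("compare", "why", "explain", "trade-off", "recommend")

def pick_model(user_turn: str) -> str:
    """Route most turns to the fast model; escalate complex questions."""
    text = user_turn.lower()
    if any(marker in text for marker in COMPLEX_MARKERS):
        return SLOW_MODEL
    return FAST_MODEL
```

In production you would replace the keyword check with a one-token YES/NO call to the fast model itself -- that adds a small classification cost per turn but saves far more on the turns it keeps off the slow model.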

Layer 2: Knowledge Base Retrieval

If your agent searches a knowledge base on every turn, retrieval adds 200-500ms. Two optimisations reduce this significantly:

Only search when necessary

Add a classification step before the knowledge base search: ask the LLM (using a fast model) whether this query requires a knowledge base lookup. Simple chitchat, clarification, and acknowledgement turns do not need retrieval.

// In your Voiceflow flow: add a Condition block before the KB step
// Route to KB search ONLY if the query is a knowledge question
 
Condition:
  If [AI Step: Is this question answerable without documentation? YES]
    --> Direct response (no KB search)
  Else
    --> Knowledge Base Search --> Response
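The classification step in the flow above boils down to a yes/no prompt against a fast model. A sketch of that gate, with the model call abstracted as a callable so it can be wired to any provider (the prompt wording is illustrative, not a Voiceflow built-in):

```python
CLASSIFIER_PROMPT = (
    "Answer YES or NO only. Does this user message require looking up "
    "product documentation to answer?\n\nMessage: {query}"
)

def needs_kb_search(query: str, ask_fast_model) -> bool:
    """Gate the knowledge base: only search when the classifier says YES.

    ask_fast_model is any callable that sends a prompt to a fast LLM and
    returns its text reply (hypothetical; supply your own client).
    """
    reply = ask_fast_model(CLASSIFIER_PROMPT.format(query=query))
    return reply.strip().upper().startswith("YES")
```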

Limit the number of chunks retrieved

The default Voiceflow knowledge base retrieves 3-5 chunks per query. For most queries, 2 chunks is sufficient and retrieves faster. Reduce this in your Knowledge Base settings and monitor whether answer quality changes.

Layer 3: Integration Calls

Every API call to an external system (CRM, database, third-party service) adds its own latency. Two rules:

  • Only call integrations when the agent actually needs the data -- not pre-emptively at conversation start
  • Cache results that are unlikely to change mid-conversation (e.g. user account details fetched at session start) in Voiceflow variables rather than re-fetching on every turn
// Pattern: fetch user data once at session start, store in variables
// Do NOT re-fetch on every agent step
 
[Start of conversation]
  --> API Call: Get User Account --> Store in {user_name}, {user_plan}, {user_id}
  --> Begin conversation (use stored variables throughout -- no more API calls)
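The same pattern expressed as code: treat the session's variable store as a cache and make the fetch conditional on it. A minimal sketch, with the variable store modelled as a plain dict and fetch_account standing in for whatever CRM or API call your integration makes:

```python
def get_account(session_vars: dict, fetch_account) -> dict:
    """Return the cached account for this session, fetching at most once.

    session_vars models the per-session variable store as a dict;
    fetch_account is a placeholder for your CRM/API call.
    """
    if "account" not in session_vars:
        session_vars["account"] = fetch_account()  # the only network call
    return session_vars["account"]
```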

Layer 4: Flow Architecture

Poorly structured flows add latency through unnecessary AI steps, redundant condition checks, and sequential calls that could be avoided. Review your flows for:

  • AI steps that run on every turn but only add value on specific turns -- gate them with conditions
  • Sequential API calls that are independent of each other -- Voiceflow does not support true parallel calls, but you can restructure to avoid unnecessary sequential ones
  • Long system prompts -- trim aggressively. Every token in the system prompt is processed on every turn, so a 2000-token prompt adds noticeably more latency than a 400-token one.

Trim your system prompt

// Before: 2000+ token system prompt with everything in it
You are a helpful customer service assistant for Acme Corp. Acme Corp was founded in...
[500 words of company history]
[200 words of product descriptions]
[300 words of policy details]
 
// After: tight system prompt, knowledge in the KB not the prompt
You are Acme's customer service assistant. Answer questions about products and policy
using the knowledge base. Be concise. Escalate billing disputes to human agents.
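To sanity-check how heavy a prompt is, a rough rule of thumb is about four characters per token for English prose. This is a crude approximation (use your provider's tokenizer for exact counts), but it is good enough to tell a 2000-token prompt from a 400-token one:

```python
def rough_token_count(text: str) -> int:
    """Approximate token count: ~4 characters per token for English.

    A heuristic, not a real tokenizer -- intended only for quick
    comparisons of prompt sizes.
    """
    return max(1, len(text) // 4)
```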

Measuring Your Latency

Before optimising, establish a baseline. Voiceflow's analytics panel shows response times for each step. Identify which steps are the slowest contributors and focus your optimisation there rather than guessing.

  • Open your project and go to Analytics
  • Look at the Step Duration breakdown for your most-used flows
  • The AI Response step and any API Integration steps are usually the top contributors
  • Set a target: most users tolerate up to 400ms for chat, 300ms for voice
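Voiceflow's analytics gives per-step timings; for an end-to-end number you can also time requests yourself. A minimal harness that wraps any request function and reports milliseconds (the function you pass -- for example, a call to your agent's endpoint -- is up to you):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Example: time a stand-in for an agent request, then compare the
# number against the 400ms chat / 300ms voice targets above.
_, ms = timed(time.sleep, 0.05)
print(f"turn latency: {ms:.0f}ms (target: 400ms chat, 300ms voice)")
```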

Quick Reference

  • Switch to GPT-4o-mini or Claude Haiku for most conversational turns -- 3-5x faster
  • Gate knowledge base searches with a classifier -- only retrieve when necessary
  • Reduce chunk retrieval to 2 per query and monitor quality
  • Cache per-session data in Voiceflow variables -- do not re-fetch on every turn
  • Trim system prompts aggressively -- move knowledge to the KB, not the prompt
  • Use Analytics > Step Duration to find which steps are slowest before optimising