Comparing agents, managing versions, and running regression tests in Voiceflow all require workarounds. Here they are.
The Testing Gap
Voiceflow has a built-in prototype tester -- you can chat with your agent in the canvas and see responses. What it does not have is a structured testing workflow: no test suites, no regression testing, no side-by-side version comparison, no automated conversation playback.
When you change a knowledge base, update a prompt, or add a new flow path, you have no way to verify the change did not break existing behaviour -- other than manually testing every scenario again. This article covers the workarounds that Voiceflow power users have built.
Version Management with Backups
Voiceflow's version control is basic: you can publish a version, but rolling back or comparing versions requires manual work. The community workaround is to create a named backup before each significant change.
- Before making significant changes, duplicate your project (Project Settings > Duplicate). Name the duplicate with a version label and date: 'MyBot v2.3 -- 2026-03-15'.
- Make your changes in the original project.
- If the changes cause problems, you have a named backup to reference or restore from.
- Delete old backups monthly to avoid accumulating dozens of stale copies.
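The naming convention above is easy to get inconsistent by hand. A small helper can generate the labels -- this is a sketch, and `backupName` is a hypothetical function for illustration, not part of Voiceflow:

```javascript
// Generate a consistent backup label like "MyBot v2.3 -- 2026-03-15"
function backupName(projectName, version, date = new Date()) {
  const iso = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `${projectName} v${version} -- ${iso}`;
}

console.log(backupName('MyBot', '2.3', new Date('2026-03-15')));
// → MyBot v2.3 -- 2026-03-15
```

Paste the generated label into the duplicate's name so backups sort predictably by project, version, and date.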
Voiceflow does not sync duplicated projects -- they diverge immediately. A duplicate is a snapshot, not a branch. Do not use it as a long-term parallel development environment.

A/B Testing Two Versions
To compare two versions of your agent, the most reliable approach is to run them as separate Voiceflow projects and use the API to route a percentage of traffic to each.
// In your backend: route traffic between two Voiceflow projects
const AGENT_A_API_KEY = process.env.VOICEFLOW_AGENT_A_KEY;
const AGENT_B_API_KEY = process.env.VOICEFLOW_AGENT_B_KEY;
function getApiKeyForUser(userId) {
// Stable routing: same user always goes to same variant
const hash = userId.split('').reduce((acc, c) => acc + c.charCodeAt(0), 0);
return hash % 100 < 50 ? AGENT_A_API_KEY : AGENT_B_API_KEY;
}
async function sendMessage(userId, message) {
const apiKey = getApiKeyForUser(userId);
const response = await fetch(
`https://general-runtime.voiceflow.com/state/user/${userId}/interact`,
{
method: 'POST',
headers: { Authorization: apiKey, 'Content-Type': 'application/json' },
body: JSON.stringify({ action: { type: 'text', payload: message } }),
}
  );
  if (!response.ok) {
    throw new Error(`Voiceflow API error: ${response.status}`);
  }
  // Log which variant was used for analysis
  console.log({ userId, variant: apiKey === AGENT_A_API_KEY ? 'A' : 'B' });
  return response.json();
}

Building a Regression Test Suite
Because Voiceflow has no built-in test runner, the community approach is to build a simple script that replays a set of test conversations via the Voiceflow Runtime API and checks the responses against expected outputs.
// regression-test.js
// Run after every significant change to catch regressions
const TEST_CASES = [
{
name: "Greeting response",
input: "Hello",
expectedContains: ["hi", "hello", "welcome"], // any of these should appear
},
{
name: "Refund policy question",
input: "What is your refund policy?",
expectedContains: ["30 day", "refund", "return"],
},
{
name: "Escalation trigger",
input: "I want to speak to a human",
expectedContains: ["transfer", "agent", "representative"],
},
];
async function runTests() {
let passed = 0, failed = 0;
for (const test of TEST_CASES) {
// Start a fresh session for each test
const userId = `test-${Date.now()}-${Math.random()}`;
const response = await fetch(
`https://general-runtime.voiceflow.com/state/user/${userId}/interact`,
{
method: 'POST',
headers: {
Authorization: process.env.VOICEFLOW_API_KEY,
'Content-Type': 'application/json',
},
body: JSON.stringify({ action: { type: 'text', payload: test.input } }),
}
);
const data = await response.json();
const responseText = data
.filter(t => t.type === 'text')
.map(t => t.payload.message.toLowerCase())
.join(' ');
const pass = test.expectedContains.some(keyword => responseText.includes(keyword));
console.log(`${pass ? 'PASS' : 'FAIL'}: ${test.name}`);
if (!pass) {
console.log(` Expected one of: ${test.expectedContains.join(', ')}`);
console.log(` Got: ${responseText.substring(0, 200)}`);
failed++;
} else {
passed++;
}
}
console.log(`\nResults: ${passed} passed, ${failed} failed`);
process.exit(failed > 0 ? 1 : 0);
}
runTests();

Run your regression test suite via the Voiceflow Runtime API, not through the canvas tester. The API gives you programmatic access and lets you run many test cases quickly without manual clicking. Add it to your CI/CD pipeline or run it manually before publishing a new version.

Fixing the Transcript Export Gap
A common issue: when exporting transcripts to Google Sheets via the Voiceflow integration, only the start message appears instead of the full conversation. This is a known limitation of the native integration. The fix is to use the Transcripts API directly.
// Fetch full transcripts via the Voiceflow API instead of native export
const API_KEY = process.env.VOICEFLOW_API_KEY;
const PROJECT_ID = process.env.VOICEFLOW_PROJECT_ID;
const response = await fetch(
`https://api.voiceflow.com/v2/transcripts/${PROJECT_ID}`,
{
headers: { Authorization: `Bearer ${API_KEY}` }
}
);
const transcripts = await response.json();
// Each transcript has the full conversation history
for (const transcript of transcripts.data) {
const turns = transcript.turns.map(turn => ({
role: turn.type,
message: turn.payload?.message || '',
timestamp: turn.startTime,
}));
// Write to your own storage, Google Sheets via API, or analytics tool
await writeToSheets(turns);
}

Quick Reference
- Version management: duplicate the project before significant changes, name with version + date
- A/B testing: run as two separate projects, route traffic via your backend using the Runtime API
- Regression testing: use the Runtime API to replay test cases programmatically after each change
- Transcript export: use the Transcripts API directly -- the native Google Sheets integration only captures the start message
- Run regression tests before every publish, not just before major releases