LlamaIndex has at least five ways to query your data. Most tutorials only show one. Here is when to use each.

The Problem

LlamaIndex has a rich API surface: QueryEngine, ChatEngine, RouterQueryEngine, SubQuestionQueryEngine, AgentRunner, and more. The docs explain each individually, but they do not tell you which to use for your specific use case. Most tutorials default to the simplest option (VectorStoreIndex.as_query_engine()) without explaining when that is the wrong choice.

This guide gives you the decision framework.

The Options at a Glance

Abstraction | Best for | Remembers conversation? | Uses tools?
QueryEngine | Single-turn Q&A over documents | No | No
ChatEngine | Multi-turn conversation over documents | Yes | No
RouterQueryEngine | Routing queries to different indexes/sources | No | No
SubQuestionQueryEngine | Complex multi-part questions | No | No
AgentRunner (ReAct/Function) | Tasks requiring reasoning, tool use, or multi-step logic | Optional | Yes

Option 1: QueryEngine -- Single-Turn Q&A

QueryEngine is the simplest abstraction. You ask a question, it retrieves relevant chunks, synthesises an answer, and returns it. No conversation history, no tool use, no multi-step reasoning.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
 
# Load documents and build index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
 
# Create a query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,       # retrieve top 5 chunks
    response_mode="compact",  # concise answers
)
 
# Single-turn query
response = query_engine.query("What is the refund policy?")
print(response)

Use QueryEngine when: you are building a search feature, a document Q&A endpoint, or any use case where each question is independent.

Option 2: ChatEngine -- Multi-Turn Conversation

ChatEngine wraps a QueryEngine with conversation memory. It maintains chat history and uses it to resolve follow-up questions contextually: after 'What is the refund policy?', a follow-up like 'How long does it take?' will correctly resolve 'it' to the refund.

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",  # best for most use cases
    similarity_top_k=5,
    verbose=True,
)
 
# Multi-turn conversation
response1 = chat_engine.chat("What is the refund policy?")
response2 = chat_engine.chat("How long does it take?")   # resolves 'it' correctly
response3 = chat_engine.chat("Can I get a partial refund?")
 
# Reset conversation for a new session
chat_engine.reset()

chat_mode option | When to use it
condense_plus_context | Best for most conversational RAG: condenses history and retrieves fresh context
context | Simple: always retrieves and injects context, no history condensation
simple | No retrieval: pure conversation with the LLM, no document context
react | Uses ReAct reasoning with the index as a tool; closest to an agent

If users ask follow-up questions ('tell me more', 'what about X?', 'and for Y?'), you need ChatEngine. QueryEngine will treat each follow-up as an isolated question and lose the context.

Option 3: RouterQueryEngine -- Multiple Sources

RouterQueryEngine lets you build indexes over multiple data sources (or the same data with different configurations) and route each query to the most relevant source. The router LLM reads the query and chooses which engine to use.

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
 
# Build separate indexes for different document types
policy_engine = policy_index.as_query_engine()
product_engine = product_index.as_query_engine()
pricing_engine = pricing_index.as_query_engine()
 
# Wrap as tools with descriptions
tools = [
    QueryEngineTool.from_defaults(
        query_engine=policy_engine,
        description="Use for questions about policies, terms, and legal documents",
    ),
    QueryEngineTool.from_defaults(
        query_engine=product_engine,
        description="Use for questions about product features and specifications",
    ),
    QueryEngineTool.from_defaults(
        query_engine=pricing_engine,
        description="Use for questions about pricing, plans, and billing",
    ),
]
 
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=tools,
)
 
# Router automatically picks the right source
response = router_engine.query("How much does the Pro plan cost?")

Option 4: AgentRunner -- Reasoning and Tool Use

AgentRunner is LlamaIndex's agent abstraction. Use it when your use case requires multi-step reasoning, combining information from multiple queries, calling external APIs, or doing computation alongside retrieval.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool
 
# Your index as a tool
knowledge_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="knowledge_base",
    description="Search the company knowledge base for policies, products, and pricing",
)
 
# A custom function tool
def calculate_discount(base_price: float, percent: float) -> str:
    discounted = base_price * (1 - percent / 100)
    return f"Discounted price: ${discounted:.2f}"
 
calc_tool = FunctionTool.from_defaults(fn=calculate_discount)
 
# Build agent with both tools
agent = ReActAgent.from_tools(
    [knowledge_tool, calc_tool],
    verbose=True,
    max_iterations=10,
)
 
# Agent can reason across multiple steps
response = agent.chat(
    "What is the Pro plan price, and what would it cost with a 20% discount?"
)

Decision Guide

If your use case is... | Use this
Single Q&A over documents, no history needed | QueryEngine
Conversational assistant with follow-up questions | ChatEngine (condense_plus_context)
Multiple document sources, route by topic | RouterQueryEngine
Complex multi-part questions ('Compare X and Y and then...') | SubQuestionQueryEngine
Needs tool use, calculation, or multi-step reasoning | AgentRunner (ReAct or Function)
Fully autonomous task completion | AgentRunner with broad toolset

Quick Reference

  • QueryEngine: single-turn, document Q&A, no history
  • ChatEngine: multi-turn conversation, use condense_plus_context mode for most cases
  • RouterQueryEngine: multiple indexes or topics, LLM picks the right source
  • AgentRunner: tool use, multi-step reasoning, external API calls alongside retrieval
  • When in doubt: start with ChatEngine and upgrade to AgentRunner only when you need tools or reasoning