LlamaIndex has at least five ways to query your data. Most tutorials only show one. Here is when to use each.

The Problem

LlamaIndex has a rich API surface: QueryEngine, ChatEngine, RouterQueryEngine, SubQuestionQueryEngine, AgentRunner, and more. The docs explain each individually, but they do not tell you which to use for your specific use case. Most tutorials default to the simplest option (VectorStoreIndex.as_query_engine()) without explaining when that is the wrong choice.

This guide gives you the decision framework.

The Options at a Glance

Abstraction | Best for | Remembers conversation? | Uses tools?
QueryEngine | Single-turn Q&A over documents | No | No
ChatEngine | Multi-turn conversation over documents | Yes | No
RouterQueryEngine | Routing queries to different indexes/sources | No | No
SubQuestionQueryEngine | Complex multi-part questions | No | No
AgentRunner (ReAct/Function) | Tasks requiring reasoning, tool use, or multi-step logic | Optional | Yes

Option 1: QueryEngine -- Single-Turn Q&A

QueryEngine is the simplest abstraction. You ask a question, it retrieves relevant chunks, synthesises an answer, and returns it. No conversation history, no tool use, no multi-step reasoning.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
 
# Load documents and build index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
 
# Create a query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,       # retrieve top 5 chunks
    response_mode="compact",  # concise answers
)
 
# Single-turn query
response = query_engine.query("What is the refund policy?")
print(response)

Use QueryEngine when: you are building a search feature, a document Q&A endpoint, or any use case where each question is independent.

Option 2: ChatEngine -- Multi-Turn Conversation

ChatEngine wraps a QueryEngine with conversation memory. It maintains chat history and uses it to resolve follow-up questions contextually: after 'What is the refund policy?', a follow-up like 'How long does it take?' will correctly resolve 'it' to the refund.

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",  # best for most use cases
    similarity_top_k=5,
    verbose=True,
)
 
# Multi-turn conversation
response1 = chat_engine.chat("What is the refund policy?")
response2 = chat_engine.chat("How long does it take?")   # resolves 'it' correctly
response3 = chat_engine.chat("Can I get a partial refund?")
 
# Reset conversation for a new session
chat_engine.reset()

chat_mode option | When to use it
condense_plus_context | Best for most conversational RAG: condenses history and retrieves fresh context
context | Simple: always retrieves and injects context, no history condensation
simple | No retrieval: pure conversation with the LLM, no document context
react | Uses ReAct reasoning with the index as a tool; closest to an agent

If users ask follow-up questions ('tell me more', 'what about X?', 'and for Y?'), you need ChatEngine. QueryEngine will treat each follow-up as an isolated question and lose the context.

Option 3: RouterQueryEngine -- Multiple Sources

RouterQueryEngine lets you build indexes over multiple data sources (or the same data with different configurations) and route each query to the most relevant source. The router LLM reads the query and chooses which engine to use.

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
 
# Build separate indexes for different document types
policy_engine = policy_index.as_query_engine()
product_engine = product_index.as_query_engine()
pricing_engine = pricing_index.as_query_engine()
 
# Wrap as tools with descriptions
tools = [
    QueryEngineTool.from_defaults(
        query_engine=policy_engine,
        description="Use for questions about policies, terms, and legal documents",
    ),
    QueryEngineTool.from_defaults(
        query_engine=product_engine,
        description="Use for questions about product features and specifications",
    ),
    QueryEngineTool.from_defaults(
        query_engine=pricing_engine,
        description="Use for questions about pricing, plans, and billing",
    ),
]
 
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=tools,
)
 
# Router automatically picks the right source
response = router_engine.query("How much does the Pro plan cost?")

Option 4: AgentRunner -- Reasoning and Tool Use

AgentRunner is LlamaIndex's agent abstraction. Use it when your use case requires multi-step reasoning, combining information from multiple queries, calling external APIs, or doing computation alongside retrieval.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool
 
# Your index as a tool
knowledge_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="knowledge_base",
    description="Search the company knowledge base for policies, products, and pricing",
)
 
# A custom function tool
def calculate_discount(base_price: float, percent: float) -> str:
    discounted = base_price * (1 - percent / 100)
    return f"Discounted price: ${discounted:.2f}"
 
calc_tool = FunctionTool.from_defaults(fn=calculate_discount)
 
# Build agent with both tools
agent = ReActAgent.from_tools(
    [knowledge_tool, calc_tool],
    verbose=True,
    max_iterations=10,
)
 
# Agent can reason across multiple steps
response = agent.chat(
    "What is the Pro plan price, and what would it cost with a 20% discount?"
)

Decision Guide

If your use case is... | Use this
Single Q&A over documents, no history needed | QueryEngine
Conversational assistant with follow-up questions | ChatEngine (condense_plus_context)
Multiple document sources, route by topic | RouterQueryEngine
Complex multi-part questions ('Compare X and Y and then...') | SubQuestionQueryEngine
Needs tool use, calculation, or multi-step reasoning | AgentRunner (ReAct or Function)
Fully autonomous task completion | AgentRunner with broad toolset

Quick Reference

  • QueryEngine: single-turn, document Q&A, no history
  • ChatEngine: multi-turn conversation, use condense_plus_context mode for most cases
  • RouterQueryEngine: multiple indexes or topics, LLM picks the right source
  • AgentRunner: tool use, multi-step reasoning, external API calls alongside retrieval
  • When in doubt: start with ChatEngine and upgrade to AgentRunner only when you need tools or reasoning