Dify's default knowledge base setup works for demos. Here's what you need to change before it's production-ready.
The Gap
Dify's built-in knowledge base is one of its strongest features: upload documents, connect the knowledge base to an agent or workflow, and it retrieves relevant chunks on demand. The default setup gets you to a working demo in minutes.
But the default settings -- auto-chunking, no metadata, no filtering -- are not production defaults. They're onboarding defaults. When your knowledge base grows beyond a handful of documents, retrieval quality drops, stale content surfaces, and you have no way to restrict what gets retrieved for a given query.
This article walks through the four areas that matter most for production RAG in Dify: chunking, metadata, document updates, and retrieval mode.
1. Chunking Strategy
Dify offers three indexing modes: High Quality, Economy, and QA Pairing. Most tutorials use High Quality without explaining what it actually does, or how to tune it.
| Mode | What it does | When to use it |
|---|---|---|
| High Quality | Uses an embedding model to turn each chunk into a vector -- better semantic understanding, higher cost | Most production use cases |
| Economy | Uses keyword-based indexing -- fast and cheap, lower recall quality | Large document sets where cost matters more than precision |
| QA Pairing | Automatically generates Q&A pairs from your documents -- very high precision for FAQ-style retrieval | Support docs, product manuals, structured knowledge |
Within High Quality mode, the key settings to tune are chunk size and overlap. Dify's defaults (1000 characters, 200 overlap) work for general text. Adjust them for your content type:
| Content type | Recommended chunk size | Overlap |
|---|---|---|
| General text / articles | 800–1000 chars | 150–200 |
| Technical documentation / code | 400–600 chars | 80–100 |
| Legal / dense structured text | 1200–1500 chars | 200–300 |
| Short FAQ entries | 200–400 chars | 50 |
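To make the two settings concrete, here is a minimal Python sketch of fixed-size character chunking with overlap. It is an illustration only, not Dify's actual splitter (which also takes delimiters into account); `chunk_text` and its parameter names are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 2500
print([len(c) for c in chunk_text(sample)])  # [1000, 1000, 900]
```

A larger overlap repeats more text between neighbouring chunks, which preserves context across boundaries at the cost of more chunks to embed and store.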
If retrieval returns chunks that seem off-topic, your chunks are probably too large -- the similarity score is being diluted by irrelevant surrounding text. If retrieved chunks are missing critical context, they're too small. Tune chunk size before touching anything else.

2. Metadata: The Feature Most Builders Skip
Every document uploaded to a Dify knowledge base can have custom metadata attached: source, author, date, category, product version, department -- whatever dimensions matter for filtering. Almost nobody sets this up during onboarding, and almost everybody wishes they had when the knowledge base grows.
Adding metadata to documents
- Go to your Knowledge Base in Dify.
- Click on a document to open its settings.
- Add metadata key-value pairs under Document Metadata.
- Repeat for all documents, or use the API to batch-set metadata.
```json
{
  "source": "help-center",
  "category": "billing",
  "product_version": "3.2",
  "last_updated": "2026-01-15",
  "audience": "admin"
}
```

Using metadata in retrieval
Once metadata is set, you can filter retrieval in two ways: statically (always filter by a fixed value) or dynamically (filter based on the user's context, detected by the agent or workflow).
In a Dify workflow, use the Knowledge Retrieval node and set a Metadata Filter condition. For example, only retrieve documents where category equals the value passed in from the user query classifier.
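The difference between a static and a dynamic filter can be sketched in plain Python. This is a conceptual illustration, not Dify's API: in Dify the filter is configured on the Knowledge Retrieval node, and the names `filter_by_metadata`, `STATIC_FILTER`, and `dynamic_filter` are hypothetical.

```python
from typing import Any

Chunk = dict[str, Any]


def filter_by_metadata(chunks: list[Chunk], conditions: dict[str, str]) -> list[Chunk]:
    """Keep only chunks whose metadata matches every key/value in `conditions`."""
    return [
        chunk
        for chunk in chunks
        if all(chunk.get("metadata", {}).get(key) == value
               for key, value in conditions.items())
    ]


chunks = [
    {"text": "Updating a payment card", "metadata": {"category": "billing", "audience": "admin"}},
    {"text": "Resetting a password", "metadata": {"category": "account", "audience": "user"}},
]

# Static filter: the same fixed condition on every query.
STATIC_FILTER = {"audience": "admin"}


def dynamic_filter(detected_category: str) -> dict[str, str]:
    """Dynamic filter: built per query from an upstream classifier's output."""
    return {"category": detected_category}


print([c["text"] for c in filter_by_metadata(chunks, STATIC_FILTER)])
print([c["text"] for c in filter_by_metadata(chunks, dynamic_filter("account"))])
```

Filtering before similarity ranking keeps out-of-scope chunks from ever competing for the top results, which is why missing metadata is hard to compensate for later.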
Dify's metadata filter support varies by the vector store backend you've configured. Weaviate and Qdrant have full filter support. If you're using the built-in vector store, check the current filter capabilities in the Dify docs -- they expand with each release.

3. Handling Document Updates
When a document in your knowledge base changes, you need to re-index it. Dify does not automatically detect source changes -- you have to trigger re-indexing manually or via the API.
The re-indexing workflow
- In the Knowledge Base, select the outdated document and click Re-Index.
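- For automated re-indexing, use the Dify API: POST /datasets/{dataset_id}/documents/{document_id}/reindex. Confirm the exact path against the API reference for your Dify version before automating against it.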
- Build a scheduled Dify workflow (or external cron job) that calls this endpoint for time-sensitive documents.
```bash
# Re-index a specific document via Dify API
curl -X POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/reindex' \
  -H 'Authorization: Bearer {api_key}'

# List documents to find stale ones (check updated_at vs your source)
curl 'https://api.dify.ai/v1/datasets/{dataset_id}/documents' \
  -H 'Authorization: Bearer {api_key}'
```

For knowledge bases that pull from external sources (Notion, Google Drive, web URLs), Dify's Knowledge Pipeline feature (released 2025) provides a visual ETL pipeline that can be scheduled to sync and re-index automatically -- this is the cleanest production option if you're on a recent Dify version.
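For the API-driven route, the stale-document check mentioned in the curl comments can be sketched in Python. The sketch assumes each listed document is a dict with `id` and `updated_at` (epoch seconds) and that you keep your own mapping of document IDs to source modification times; both shapes are assumptions, so adjust them to what your Dify version actually returns. `find_stale_documents` is a hypothetical helper.

```python
def find_stale_documents(
    indexed_docs: list[dict],
    source_modified: dict[str, int],
) -> list[str]:
    """Return IDs of documents whose source changed after they were last indexed.

    indexed_docs:    assumed shape [{"id": str, "updated_at": int}, ...]
    source_modified: assumed mapping of document ID -> source mtime (epoch seconds)
    """
    stale = []
    for doc in indexed_docs:
        modified = source_modified.get(doc["id"])
        # Documents with no known source timestamp are skipped, not flagged.
        if modified is not None and modified > doc["updated_at"]:
            stale.append(doc["id"])
    return stale


indexed = [
    {"id": "doc-a", "updated_at": 1_700_000_000},
    {"id": "doc-b", "updated_at": 1_700_000_000},
]
sources = {"doc-a": 1_700_500_000, "doc-b": 1_699_000_000}
print(find_stale_documents(indexed, sources))  # ['doc-a']
```

A scheduled job would then call the re-index endpoint once per returned ID, which avoids re-embedding documents that have not changed.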
4. Retrieval Mode: Vector vs Full-Text vs Hybrid
Dify supports three retrieval modes. Most demos use vector search only. For production, hybrid search almost always outperforms either mode alone.
| Mode | How it works | Best for |
|---|---|---|
| Vector search | Semantic similarity -- finds conceptually related chunks | Natural language queries, fuzzy matching |
| Full-text search | Keyword matching -- finds exact terms | Structured queries, product names, IDs, codes |
| Hybrid (recommended) | Runs vector and keyword search together and merges the results, optionally re-scored by a reranker | Most production use cases |
To enable hybrid search in Dify: go to Knowledge Base Settings > Retrieval Settings > select Hybrid Search. You can also enable a Reranker model (Cohere Rerank, BGE Reranker, etc.) to re-score the top results before returning them to the agent. This consistently improves answer quality.
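As a rough illustration of what combining vector and keyword scores can mean, the sketch below normalizes both score sets and blends them with a weight. It is a toy under stated assumptions, not Dify's internal implementation; `hybrid_rank`, `vector_weight`, and the sample scores are hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two score types become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.7,
) -> list[tuple[str, float]]:
    """Blend normalized vector and keyword scores; highest combined score first."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        chunk_id: vector_weight * vec.get(chunk_id, 0.0)
        + (1 - vector_weight) * kw.get(chunk_id, 0.0)
        for chunk_id in set(vec) | set(kw)
    }
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}   # cosine-style similarities
keyword_scores = {"c": 12.0, "b": 3.0}           # keyword-match scores
print([cid for cid, _ in hybrid_rank(vector_scores, keyword_scores, vector_weight=0.7)])
print([cid for cid, _ in hybrid_rank(vector_scores, keyword_scores, vector_weight=0.3)])
```

With the weight favouring vectors, chunk `a` ranks first; shifting the weight toward keywords promotes chunk `c`, the one that matched the exact term. A reranker would then re-score the top of this merged list.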
If your users regularly ask about specific product names, version numbers, or IDs, pure vector search will miss them -- it finds concepts, not exact strings. Hybrid search fixes this.

Production RAG Checklist for Dify
- Indexing mode set to High Quality (not Economy) for production use cases
- Chunk size tuned for your document type -- not left at default
- Metadata added to all documents (source, category, date at minimum)
- Metadata filters configured in Knowledge Retrieval nodes
- Re-indexing pipeline set up for documents that change -- manual or automated via API
- Retrieval mode set to Hybrid Search with a reranker enabled
- Knowledge Pipeline used for external sources (Notion, Drive, URLs) if on recent Dify version