Dify's default knowledge base setup works for demos. Here's what you need to change before it's production-ready.

The Gap

Dify's built-in knowledge base is one of its strongest features: upload documents, connect the knowledge base to an agent or workflow, and it retrieves relevant chunks on demand. The default setup gets you to a working demo in minutes.

But the default settings -- auto-chunking, no metadata, no filtering -- are not production defaults. They're onboarding defaults. When your knowledge base grows beyond a handful of documents, retrieval quality drops, stale content surfaces, and you have no way to restrict what gets retrieved for a given query.

This article walks through the four settings that matter most for production RAG in Dify.

1. Chunking Strategy

Dify offers three indexing modes: High Quality, Economy, and QA Pairing. Most tutorials use High Quality without explaining what it actually does, or how to tune it.

| Mode | What it does | When to use it |
| --- | --- | --- |
| High Quality | Embeds each chunk with an embedding model for semantic (vector) search; better recall, higher cost | Most production use cases |
| Economy | Uses keyword-based indexing; fast and cheap, lower recall quality | Large document sets where cost matters more than precision |
| QA Pairing | Automatically generates Q&A pairs from your documents; very high precision for FAQ-style retrieval | Support docs, product manuals, structured knowledge |

Within High Quality mode, the key settings to tune are chunk size and overlap. Dify's defaults (1000 characters, 200 overlap) work for general text. Adjust them for your content type:

| Content type | Recommended chunk size | Overlap |
| --- | --- | --- |
| General text / articles | 800–1000 chars | 150–200 chars |
| Technical documentation / code | 400–600 chars | 80–100 chars |
| Legal / dense structured text | 1200–1500 chars | 200–300 chars |
| Short FAQ entries | 200–400 chars | 50 chars |

If retrieval returns chunks that seem off-topic, your chunks are probably too large: the similarity score is being diluted by irrelevant surrounding text. If retrieved chunks are missing critical context, they are probably too small. Tune chunk size before touching anything else.
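To make the size/overlap trade-off concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is an illustration only: `chunk_text` is a hypothetical helper, and Dify's own splitter may behave differently (for example, by honoring delimiters).

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 2,500-character stand-in document (digits repeat so overlaps are visible).
sample = "".join(str(i % 10) for i in range(2500))

general = chunk_text(sample, size=1000, overlap=200)   # "general text" settings
technical = chunk_text(sample, size=500, overlap=100)  # "technical docs" settings
print(len(general), len(technical))
```

Halving the chunk size roughly doubles the number of chunks to embed and store, which is the cost side of tighter, more focused retrieval.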

2. Metadata: The Feature Most Builders Skip

Every document uploaded to a Dify knowledge base can have custom metadata attached: source, author, date, category, product version, department -- whatever dimensions matter for filtering. Almost nobody sets this up during onboarding, and almost everybody wishes they had when the knowledge base grows.

Adding metadata to documents

  1. Go to your Knowledge Base in Dify.
  2. Click on a document to open its settings.
  3. Add metadata key-value pairs under Document Metadata.
  4. Repeat for all documents, or use the API to batch-set metadata.

Example metadata for one document:

{
  "source": "help-center",
  "category": "billing",
  "product_version": "3.2",
  "last_updated": "2026-01-15",
  "audience": "admin"
}
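Before batch-setting metadata through the API, it can help to check each record locally so that filters don't silently miss documents. The sketch below is one way to do that. The required keys (source, category, and a date) and the `metadata_problems` helper are assumptions for illustration, not a Dify requirement.

```python
from datetime import date

# Assumed minimum set of keys; adjust to the dimensions you filter on.
REQUIRED_KEYS = {"source", "category", "last_updated"}


def metadata_problems(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record looks usable."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(meta))]
    for key, value in meta.items():
        if not isinstance(value, str) or not value.strip():
            problems.append(f"empty or non-string value for: {key}")
    if isinstance(meta.get("last_updated"), str):
        try:
            date.fromisoformat(meta["last_updated"])
        except ValueError:
            problems.append("last_updated is not an ISO date (YYYY-MM-DD)")
    return problems


good = {
    "source": "help-center",
    "category": "billing",
    "product_version": "3.2",
    "last_updated": "2026-01-15",
    "audience": "admin",
}
bad = {"source": "help-center", "last_updated": "15/01/2026"}

print(metadata_problems(good))
print(metadata_problems(bad))
```

Running a check like this over the whole batch first keeps inconsistent keys and date formats out of the knowledge base, where they are harder to spot.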

Using metadata in retrieval

Once metadata is set, you can filter retrieval in two ways: statically (always filter by a fixed value) or dynamically (filter based on the user's context, detected by the agent or workflow).

In a Dify workflow, use the Knowledge Retrieval node and set a Metadata Filter condition. For example, only retrieve documents where category equals the value passed in from the user query classifier.

Dify's metadata filter support varies by the vector store backend you've configured. Weaviate and Qdrant have full filter support. If you're using the built-in vector store, check the current filter capabilities in the Dify docs -- they expand with each release.
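Conceptually, a metadata filter narrows the candidate set to chunks whose document metadata matches the conditions. The sketch below shows that idea with plain dictionaries. It is not Dify's implementation, and `filter_chunks`, the chunk shape, and `detected_category` are hypothetical names used only for illustration.

```python
def matches(meta: dict, conditions: dict) -> bool:
    """True when every condition key is present in meta with an equal value."""
    return all(meta.get(key) == value for key, value in conditions.items())


def filter_chunks(chunks: list[dict], conditions: dict) -> list[dict]:
    """Keep only chunks whose document metadata satisfies all conditions."""
    return [chunk for chunk in chunks if matches(chunk["metadata"], conditions)]


chunks = [
    {"text": "How refunds are issued", "metadata": {"category": "billing", "audience": "admin"}},
    {"text": "Resetting a password", "metadata": {"category": "account", "audience": "user"}},
    {"text": "Invoice schedule", "metadata": {"category": "billing", "audience": "user"}},
]

# Dynamic filter: the value comes from an upstream step such as a query classifier.
detected_category = "billing"
billing_only = filter_chunks(chunks, {"category": detected_category})

# Static filter combined with the dynamic one.
billing_admin = filter_chunks(chunks, {"category": detected_category, "audience": "admin"})

print([chunk["text"] for chunk in billing_only])
print([chunk["text"] for chunk in billing_admin])
```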

3. Handling Document Updates

When a document in your knowledge base changes, you need to re-index it. Dify does not automatically detect source changes -- you have to trigger re-indexing manually or via the API.

The re-indexing workflow

  1. In the Knowledge Base, select the outdated document and click Re-Index.
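  2. For automated re-indexing, use the Dify API: POST /datasets/{dataset_id}/documents/{document_id}/reindex. Confirm this path against the Knowledge API reference for your Dify version before relying on it; the reference also documents update-by-text and update-by-file endpoints, and updating a document through them re-indexes its content.
  3. Build a scheduled Dify workflow (or external cron job) that calls this endpoint for time-sensitive documents.

# Re-index a specific document via Dify API
# (path as given above; verify it for your Dify version)
curl -X POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/reindex' \
  -H 'Authorization: Bearer {api_key}'

# List documents to find stale ones (check updated_at vs your source)
curl 'https://api.dify.ai/v1/datasets/{dataset_id}/documents' \
  -H 'Authorization: Bearer {api_key}'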
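The staleness check mentioned in the second command can be sketched as a small comparison step in a scheduled job. The document shape used here (`id`, `name`, and `updated_at` as a Unix timestamp) is an assumption about the list response, and `source_modified` is a hypothetical mapping you would build from your own source system.

```python
def find_stale(documents: list[dict], source_modified: dict[str, int]) -> list[str]:
    """Return IDs of documents whose source changed after they were last indexed."""
    stale = []
    for doc in documents:
        changed_at = source_modified.get(doc["name"])
        # Documents with no known source timestamp are skipped, not flagged.
        if changed_at is not None and changed_at > doc["updated_at"]:
            stale.append(doc["id"])
    return stale


# Assumed shape of entries returned by the document list call.
documents = [
    {"id": "doc-1", "name": "billing-faq.md", "updated_at": 1_700_000_000},
    {"id": "doc-2", "name": "refund-policy.md", "updated_at": 1_700_000_000},
    {"id": "doc-3", "name": "legacy-notes.md", "updated_at": 1_700_000_000},
]

# Last-modified times from the source of truth (for example, a CMS export).
source_modified = {
    "billing-faq.md": 1_700_500_000,    # edited after indexing
    "refund-policy.md": 1_699_000_000,  # unchanged since indexing
}

print(find_stale(documents, source_modified))
```

The returned IDs are the ones to pass to whichever re-index or update call your Dify version supports.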

For knowledge bases that pull from external sources (Notion, Google Drive, web URLs), Dify's Knowledge Pipeline feature (released 2025) provides a visual ETL pipeline that can be scheduled to sync and re-index automatically -- this is the cleanest production option if you're on a recent Dify version.

4. Retrieval Mode: Vector vs Full-Text vs Hybrid

Dify supports three retrieval modes. Most demos use vector search only. For production, hybrid search almost always outperforms either mode alone.

| Mode | How it works | Best for |
| --- | --- | --- |
| Vector search | Semantic similarity; finds conceptually related chunks | Natural language queries, fuzzy matching |
| Full-text search | Keyword matching; finds exact terms | Structured queries, product names, IDs, codes |
| Hybrid (recommended) | Runs both searches and merges the results, using weighted scores or a rerank model | Most production use cases |

To enable hybrid search in Dify, go to Knowledge Base Settings > Retrieval Settings and select Hybrid Search. You can also enable a rerank model (Cohere Rerank, BGE Reranker, etc.) to re-score the top results before they are returned to the agent. Reranking usually improves the ordering of the top results, at the cost of an extra model call per query.

If your users regularly ask about specific product names, version numbers, or IDs, pure vector search can miss them because it matches meaning, not exact strings. Hybrid search adds the keyword match that catches them.
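A small sketch of weighted score fusion shows why an exact-ID query benefits from hybrid search. This illustrates the general technique, not Dify's internal scoring. The min-max normalization, the `hybrid_rank` helper, the chunk IDs, and the scores are all assumptions made up for the example.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def hybrid_rank(vector: dict[str, float], keyword: dict[str, float], weight: float = 0.5) -> list[str]:
    """Rank chunk IDs by weight * vector score + (1 - weight) * keyword score."""
    v, k = normalize(vector), normalize(keyword)
    combined = {
        chunk_id: weight * v.get(chunk_id, 0.0) + (1 - weight) * k.get(chunk_id, 0.0)
        for chunk_id in set(v) | set(k)
    }
    # Highest combined score first; ties broken by ID for a stable order.
    return sorted(combined, key=lambda chunk_id: (-combined[chunk_id], chunk_id))


# Made-up scores for a query such as "how do I fix error E-4012".
vector_scores = {"general-errors": 0.82, "troubleshooting": 0.80, "e4012-fix": 0.78}
keyword_scores = {"e4012-fix": 7.5, "general-errors": 1.2}

print(hybrid_rank(vector_scores, keyword_scores, weight=1.0))  # vector only
print(hybrid_rank(vector_scores, keyword_scores, weight=0.5))  # hybrid
```

With vector scores alone, the chunk that names the exact error code ranks last of the three; with equal weighting, its keyword match lifts it to the top. The weight is the knob to tune if your queries lean more toward exact terms or toward natural language.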

Production RAG Checklist for Dify

  • Indexing mode set to High Quality (not Economy) for production use cases
  • Chunk size tuned for your document type -- not left at default
  • Metadata added to all documents (source, category, date at minimum)
  • Metadata filters configured in Knowledge Retrieval nodes
  • Re-indexing pipeline set up for documents that change -- manual or automated via API
  • Retrieval mode set to Hybrid Search with a reranker enabled
  • Knowledge Pipeline used for external sources (Notion, Drive, URLs) if on recent Dify version