Managing Your Knowledge Base

Last updated: May 20, 2025

Effectively Managing Your ConvoDocs Knowledge Base

Your Knowledge Base is the core of your ConvoDocs agent's intelligence. It's the repository of information your agent uses to understand queries and formulate accurate responses. This guide details how to build, manage, and optimize your knowledge base for peak performance.

Understanding Content Types

ConvoDocs supports several content types to build a comprehensive knowledge base:

Plain Text Content:
- Ideal for FAQs, short answers, definitions, or specific pieces of information not already in a document.
- You can directly type or paste text into the system.
- Supports basic formatting if you enter HTML directly. Markdown is often easier to type, provided your system converts it to HTML before saving.
Website URLs:
- Allows your agent to learn from existing online content like blog posts, FAQ pages, product documentation, or any public webpage.
- The system will typically crawl the provided URL, extract relevant text content, and process it.
- Ensure the URLs are publicly accessible and contain meaningful textual content. Pages that render most of their content with client-side JavaScript, with little server-rendered text, may yield little or no extractable content.
- For multi-page documentation, you might need to add several key URLs.
Documents (e.g., PDF, DOCX, TXT):
- Upload your existing documents directly. This is perfect for manuals, policy documents, product specifications, legal texts, and more.
- Supported formats commonly include: PDF (Portable Document Format), DOCX (Microsoft Word), TXT (Plain Text). Check your ConvoDocs version for a full list of supported types.
- The system extracts text from these documents for processing. For PDFs, ensure they are text-based and not image-only scans for best results.
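
Before a bulk upload, it can help to sort files into those with a supported extension and those without. The following is a minimal sketch, not part of ConvoDocs itself: the function name split_by_support is hypothetical, and the extension set is assumed from the formats named above, so adjust it to match your ConvoDocs version.

```python
from pathlib import Path

# Assumption: taken from the formats named in this guide; your version may support more.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def split_by_support(paths):
    """Return (supported, unsupported) lists of paths, judged by file extension only."""
    supported, unsupported = [], []
    for p in paths:
        # Compare case-insensitively so "manual.PDF" counts as a PDF.
        if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(p)
        else:
            unsupported.append(p)
    return supported, unsupported
```

This check looks only at the file name. It cannot tell whether a PDF is text-based or an image-only scan, so that still needs to be verified separately.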

Adding and Processing Content

The process of adding content is straightforward:

Navigate to Your Agent's Knowledge Base: From the agent's settings or dashboard, find the "Knowledge Base" or "Sources" section.
Choose Content Type: Select whether you're adding text, a URL, or uploading a document.
Provide Content:
- For text, type or paste it in.
- For URLs, enter the full web address.
- For documents, use the upload interface to select files from your computer.
Initiate Processing: After adding the content, the system will need to process it. This involves several steps:
- Text Extraction: Getting the raw text from documents or URLs.
- Cleaning: Removing irrelevant elements like HTML tags (if not desired), excessive whitespace, or boilerplate text.
- Chunking: Breaking down large pieces of text into smaller, manageable segments suitable for AI processing.
- Embedding Generation: Each text chunk is converted into a numerical representation (an "embedding" or vector). Embeddings capture the semantic meaning of the text, so the agent can find relevant information even when a query doesn't use the same keywords as the source.
Wait for Completion: Processing time can vary depending on the size and type of content. The system usually indicates the status (e.g., "Processing," "Completed," "Error").
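
The cleaning and chunking steps above can be sketched as follows. This is an illustration under stated assumptions, not the ConvoDocs implementation: the function names, the regex-based tag stripping, and the word-based chunk size and overlap are all hypothetical choices. Production systems often chunk by tokens or sentences instead.

```python
import re
from html import unescape


def clean_text(raw):
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()


def chunk_words(text, max_words=200, overlap=20):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.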

Understanding Document Processing & Embeddings

PDF Text Extraction: Prioritizes extracting actual text layers. OCR (Optical Character Recognition) might be applied to scanned PDFs if available, but results can vary.
DOCX Parsing: Extracts text content while generally preserving structure like headings and paragraphs better than a simple copy-paste.
Markdown Support (if applicable): If you enter content as Markdown and the system is configured to process it (as our template filter does), the Markdown is converted to HTML for consistent display and processing.
Automatic Embedding Generation: Embeddings are generated automatically for each processed chunk. High-quality embeddings are key to your agent's ability to find relevant information, and the quality of the source text directly affects embedding quality.
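
To make the role of embeddings concrete, here is a small sketch of how a query vector might be compared against chunk vectors using cosine similarity. The measure itself is standard, but whether ConvoDocs uses it is an assumption. The three-dimensional vectors and chunk names below are hand-written for illustration only; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs):
    """Return chunk ids ordered from most to least similar to the query vector."""
    return sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )


# Hypothetical toy vectors, not real embeddings.
chunk_vecs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-limits": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # imagined embedding of "How do I get my money back?"
ranking = rank_chunks(query_vec, chunk_vecs)
```

In this toy setup the refund chunk ranks first even though the imagined query never uses the word "refund", which is the behavior keyword matching alone cannot provide.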

Best Practices for Knowledge Base Management

Organize Content Logically: The AI can find information across disparate sources, but a well-organized knowledge base (clear document titles, well-structured text) makes maintenance easier and helps you keep track of what information the agent has.
Keep Documents Up-to-Date: Regularly review and update your knowledge base content to ensure accuracy. Outdated information leads to incorrect agent responses. Establish a content review cycle.
Prioritize Quality over Quantity: Clear, concise, and accurate information is more valuable than large volumes of poorly written or irrelevant content.
Monitor Processing Status: After uploading new content, check that it processed successfully. Address any errors promptly.
Test Embedding Quality (Iterative Process):
- After adding new content, test your agent with various questions related to that content.
- If the agent struggles to find relevant information, the source text might need to be clearer, or the content might need to be broken down or rephrased.
- Consider using different phrasing or adding synonyms for key concepts within your source documents.
Use Specific and Descriptive Titles for Text Entries/Documents: This helps you manage your sources more easily.
Avoid Redundancy Where Possible: Some overlap is fine, but heavy redundancy can confuse the agent and leads to inconsistent answers when one copy is updated and the other isn't.
Consider the User's Perspective: Structure information in a way that aligns with how your users are likely to ask questions.
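
For the redundancy point above, one way to spot overlapping text entries before they cause inconsistent answers is a pairwise similarity check. The sketch below uses difflib from the Python standard library. The function name find_near_duplicates, the titles-to-text dictionary, and the 0.8 threshold are assumptions for illustration, not a ConvoDocs feature.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_near_duplicates(sources, threshold=0.8):
    """Return (title_a, title_b, ratio) for source pairs at or above the threshold.

    `sources` maps a source title to its text. The comparison is character-based
    and case-insensitive.
    """
    pairs = []
    for (title_a, text_a), (title_b, text_b) in combinations(sources.items(), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((title_a, title_b, round(ratio, 2)))
    return pairs
```

Because every pair is compared character by character, this approach suits short text entries and small knowledge bases. For large documents, comparing chunk embeddings would likely scale better.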

A well-maintained knowledge base is the foundation of a successful ConvoDocs agent, enabling it to provide timely, accurate, and helpful support.