Training Chatbots on Custom Data: Complete Knowledge Base Guide

AI chatbots are only as good as their training data. Learn how to audit, structure, and maintain a knowledge base that produces accurate, helpful responses.

Nedim Mehic

8 min read

AI chatbots learn from your content. Give them good content, they give good answers. Give them incomplete or outdated content, they'll confidently deliver incomplete or outdated answers.

This guide covers how to audit, structure, and maintain a knowledge base that produces genuinely helpful chatbot responses.

The Content Quality Problem

| Content Quality | Chatbot Result |
| --- | --- |
| Complete answers | "Here's everything you need to know..." |
| Partial answers | "Returns accepted within 30 days." (unhelpful) |
| Missing answers | "I don't have information about that." |
| Conflicting answers | Wrong answer 50% of the time |
| Outdated answers | Confidently incorrect |

Most chatbot accuracy issues aren't AI problems—they're content problems.

Typical Failure Analysis

Analyzing conversations from a chatbot trained on poor content typically reveals:

| Issue | Frequency | Root Cause |
| --- | --- | --- |
| "That doesn't fully answer my question" | 30-40% | Incomplete content |
| "That's not what I asked" | 15-20% | Wrong content matched |
| "Is that still accurate?" | 10-15% | Outdated content |
| "But what about [edge case]?" | 20-25% | Edge cases not documented |
| Working correctly | 10-15% | Content was good |

Only 10-15% of conversations work well when content quality is poor.

The Knowledge Base Retraining Process

Step 1: Export Failure Data

Download all conversations where:

  • Customer rated unhelpful
  • Customer asked follow-up questions
  • Conversation escalated to human
  • Bot said "I don't have information"

This gives you the raw list of conversations that went wrong.
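The four filters above could be sketched in Python roughly as follows. The `Conversation` record and its field names are hypothetical, since export formats differ between platforms, and treating "more than one customer message" as a follow-up is an assumption you may want to tighten.

```python
from dataclasses import dataclass

# Assumed wording of the bot's fallback reply; adjust to match your bot.
NO_ANSWER_PHRASE = "i don't have information"


@dataclass
class Conversation:
    """Hypothetical export record; real field names depend on your platform."""
    id: str
    rated_unhelpful: bool
    customer_messages: int
    escalated: bool
    bot_replies: list[str]


def failure_reasons(conv: Conversation) -> list[str]:
    """Return every failure signal that applies to one conversation."""
    reasons = []
    if conv.rated_unhelpful:
        reasons.append("rated_unhelpful")
    if conv.customer_messages > 1:  # assumption: a second message is a follow-up
        reasons.append("follow_up")
    if conv.escalated:
        reasons.append("escalated")
    if any(NO_ANSWER_PHRASE in reply.lower() for reply in conv.bot_replies):
        reasons.append("no_answer")
    return reasons


def export_failures(conversations: list[Conversation]) -> dict[str, list[str]]:
    """Map conversation id to its failure reasons, skipping clean conversations."""
    failures = {}
    for conv in conversations:
        reasons = failure_reasons(conv)
        if reasons:
            failures[conv.id] = reasons
    return failures
```

Keeping the reasons alongside each conversation makes the next step easier, because a conversation that failed for several reasons at once is usually the best place to start.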

Step 2: Categorize the Failures

Group failures by type:

| Category | Example Questions |
| --- | --- |
| Returns/refunds | "What if I opened it?" "What about sale items?" |
| Pricing details | "Does that include tax?" "Annual discount?" |
| Shipping specifics | "How long to California?" "International rates?" |
| Product comparisons | "Which one is better for X?" |
| Account issues | "How do I cancel?" "Change my password?" |
| Technical specs | Various product-specific questions |

Now you know which topics to fix first.
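For a first pass, the grouping can be approximated with simple keyword matching. The sketch below is only a starting point: the keyword lists are illustrative assumptions, the first matching category wins, and anything unmatched lands in "Uncategorized" for manual review.

```python
from collections import Counter

# Illustrative keywords only; extend these from your own failure data.
# Order matters: the first category with a matching keyword wins.
CATEGORY_KEYWORDS = {
    "Returns/refunds": ["return", "refund", "opened", "sale item"],
    "Pricing details": ["price", "tax", "discount", "cost"],
    "Shipping specifics": ["ship", "deliver", "international"],
    "Account issues": ["cancel", "password", "account"],
}


def categorize(question: str) -> str:
    """Assign a question to the first category whose keyword it contains."""
    text = question.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorized"


def tally(questions: list[str]) -> Counter:
    """Count failed questions per category to see where to focus."""
    return Counter(categorize(q) for q in questions)
```

A large "Uncategorized" count is itself useful: it usually points at a topic that needs a category of its own.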

Step 3: Rewrite Content for Completeness

Before (typical website content):

Returns accepted within 30 days of purchase.

After (complete knowledge base entry):

RETURN POLICY - COMPLETE GUIDE

Standard Returns:
- 30-day return window from delivery date
- Items must be unused and in original packaging
- Include original receipt or order confirmation email
- Refunds processed within 5-7 business days

How to Return:
1. Email returns@company.com with your order number
2. We'll send a prepaid shipping label within 24 hours
3. Pack the item securely and drop at any carrier location
4. Track your return with the provided tracking number

Exceptions:
- Sale items: 14-day return window (not 30)
- Opened software: No returns after seal is broken
- Custom/personalized items: Non-refundable
- Gift purchases: Store credit issued (no cash refunds
  without original purchaser)

Common Questions:
Q: What if it's day 31?
A: Contact us—we often approve slightly late returns
   for first-time issues.

Q: What if I lost my receipt?
A: We can look up orders by email address or credit
   card last 4 digits.

Q: Can I exchange instead of refund?
A: Yes! Same process, just let us know what you want instead.

Q: Do you refund shipping?
A: Original shipping is non-refundable unless we made an error.

One topic goes from 1 sentence to comprehensive coverage.

Step 4: Add Custom Q&A Pairs

For questions your website doesn't explicitly answer, add custom training:

Question variations (the AI matches any of these):

- How much does shipping cost?
- What's shipping?
- Shipping rates?
- How much to ship?

Answer:

Shipping rates depend on your location:

US Standard (5-7 days):
- Orders under $50: $5.99
- Orders $50-$99.99: $3.99
- Orders $100+: FREE

US Express (2-3 days):
- Flat rate: $12.99

International:
- Calculated at checkout based on destination
- Typically $15-35 for most countries
- 7-14 business days delivery

Step 5: Add Edge Cases

Search support history for "but what about" and "what if" phrases. Every edge case answered manually becomes training data:

Q: What if my package says delivered but I never got it?
A: Here's what to do:

1. Check with neighbors and around your property
2. Wait 24 hours (sometimes tracking updates before
   actual delivery)
3. If still not found, contact us with your order number

For verified missing packages, we'll either reship at
no charge or issue a full refund—your choice. No need
to file claims yourself; we handle it.
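Searching support history for those phrases can be automated with a small script. This is a minimal sketch that assumes the history is available as a list of plain-text customer messages; the sentence splitting is deliberately naive.

```python
import re

# The two phrases named above; add more as you spot them in your own tickets.
EDGE_CASE_PATTERN = re.compile(r"\b(but what about|what if)\b", re.IGNORECASE)

# Naive sentence splitter: break after '.', '?' or '!' followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.?!])\s+")


def find_edge_case_questions(messages: list[str]) -> list[str]:
    """Return each sentence that contains an edge-case phrase."""
    hits = []
    for message in messages:
        for sentence in SENTENCE_SPLIT.split(message):
            if EDGE_CASE_PATTERN.search(sentence):
                hits.append(sentence.strip())
    return hits
```

Each hit still needs a human to pair it with the answer the support team actually gave, but the script narrows thousands of messages down to the ones worth reading.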

Step 6: Clean Up Conflicting Information

Common issue: the website says different things in different places:

  • Homepage: "Free shipping on orders over $50"
  • Cart page: "Free shipping at $75"
  • Footer: "Free shipping on orders $50+"

Fix: Designate one source as authoritative by assigning content priorities:

Priority 1: Policy pages (returns, shipping, terms)
Priority 2: Product pages
Priority 3: FAQ page
Priority 4: Blog posts (lowest priority)

Then fix the website to be consistent.
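The priority order can also be used to detect conflicts before customers do. The following sketch assumes you have already extracted statements per topic into `Claim` records; that record, its fields, and the `source_type` labels are hypothetical.

```python
from dataclasses import dataclass

# Lower number wins, mirroring the priority list above.
SOURCE_PRIORITY = {"policy": 1, "product": 2, "faq": 3, "blog": 4}


@dataclass
class Claim:
    """Hypothetical record: one statement about one topic from one page."""
    topic: str
    text: str
    source_type: str
    url: str


def priority(claim: Claim) -> int:
    """Unknown source types rank below every known one."""
    return SOURCE_PRIORITY.get(claim.source_type, len(SOURCE_PRIORITY) + 1)


def resolve(claims: list[Claim]) -> tuple[dict[str, Claim], list[str]]:
    """Pick the highest-priority claim per topic and list topics that conflict."""
    by_topic: dict[str, list[Claim]] = {}
    for claim in claims:
        by_topic.setdefault(claim.topic, []).append(claim)

    authoritative = {}
    conflicts = []
    for topic, group in by_topic.items():
        authoritative[topic] = min(group, key=priority)
        if len({c.text for c in group}) > 1:  # sources disagree on the wording
            conflicts.append(topic)
    return authoritative, conflicts
```

The `conflicts` list doubles as a to-do list for the website cleanup: every topic in it has at least one page that needs correcting.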

Step 7: Exclude Outdated Content

Remove pages that confuse the AI:

Excluded URLs:

/blog/2023-pricing-announcement (old pricing)
/holiday-sale-2023 (expired promotion)
/press-releases/* (marketing speak, not helpful)
/careers/* (not relevant to customers)
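If your crawl is scripted, the exclusion list can be applied with glob-style patterns. This is a minimal sketch using Python's standard `fnmatch` module; how a given chatbot platform actually interprets exclusion patterns is an assumption to verify in its settings.

```python
from fnmatch import fnmatchcase

# Patterns copied from the exclusion list above; '*' matches any characters,
# including further '/' segments.
EXCLUDED_PATTERNS = [
    "/blog/2023-pricing-announcement",
    "/holiday-sale-2023",
    "/press-releases/*",
    "/careers/*",
]


def is_excluded(path: str) -> bool:
    """True when the URL path matches any exclusion pattern (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in EXCLUDED_PATTERNS)


def crawlable(paths: list[str]) -> list[str]:
    """Keep only the paths that should be used for training."""
    return [path for path in paths if not is_excluded(path)]
```

Note that `/careers/*` does not match the bare `/careers` page, so add that path explicitly if it should be excluded too.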

Conversation Improvement Examples

Before vs. After: Return Question

Before:

Customer: Can I return this if I opened it?
Bot: Returns accepted within 30 days of purchase.
Customer: But I opened the package
Bot: Returns accepted within 30 days of purchase.
Customer: [Escalates to human]

After:

Customer: Can I return this if I opened it?
Bot: Yes, you can return opened items as long as:

- The item itself is unused (packaging can be open)
- You're within the 30-day window (14 days for sale items)
- You have your receipt or order confirmation

The only exceptions are software (no returns after
seal is broken) and custom/personalized items.

Want me to start the return process for you?
Customer: Yes please

Before vs. After: Shipping Question

Before:

Customer: How long does shipping take to California?
Bot: Shipping times vary by location.
Customer: I'm in California
Bot: Shipping times vary by location. Please see our
     shipping page for details.
Customer: [Leaves frustrated]

After:

Customer: How long does shipping take to California?
Bot: For California (from our warehouse in Texas):

- Standard shipping: 4-5 business days
- Express shipping: 2-3 business days

If you order before 2 PM CT today, it ships same day.
Express would arrive by [specific date].

Want me to check availability on something specific?

Expected Results After Retraining

| Metric | Before | After |
| --- | --- | --- |
| Resolution rate | 30-40% | 75-85% |
| "Unhelpful" ratings | 35-45% | 5-10% |
| Follow-up questions | 2-3 per conversation | Under 1 |
| Human escalations | 40-50% | 10-20% |
| Customer satisfaction | 2.5-3/5 | 4-4.5/5 |

Same AI technology. Dramatically better results from better content.
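To compare your own before and after numbers, the metrics in the table can be computed from conversation outcomes. The sketch below assumes a hypothetical `Outcome` record per conversation; in particular, what counts as "resolved" is a definition you need to settle for your own data.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical per-conversation summary."""
    resolved: bool          # assumption: answered without a human stepping in
    rated_unhelpful: bool
    follow_ups: int
    escalated: bool


def summarize(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute rates (0.0 to 1.0) and the average follow-ups per conversation."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("no conversations to summarize")
    return {
        "resolution_rate": sum(o.resolved for o in outcomes) / n,
        "unhelpful_rate": sum(o.rated_unhelpful for o in outcomes) / n,
        "avg_follow_ups": sum(o.follow_ups for o in outcomes) / n,
        "escalation_rate": sum(o.escalated for o in outcomes) / n,
    }
```

Run it on the same kind of sample before and after retraining (for example, one week of conversations each) so the two sets of numbers are comparable.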

Ongoing Training Routine

Retraining isn't one-time. Establish a maintenance schedule:

Daily (5 minutes)

  • Skim conversation log for obvious failures
  • Quick-fix anything urgent

Weekly (30 minutes)

  • Review all "unhelpful" rated conversations
  • Identify patterns (same question failing repeatedly?)
  • Add 2-3 custom Q&A pairs for common gaps
  • Update anything that changed this week
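Spotting the same question failing repeatedly is easy to script. This sketch counts failed questions after light normalization; it assumes you already have the week's failed questions as a list of strings, and it only groups near-identical wording, not paraphrases.

```python
import re
from collections import Counter


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so trivially different wording groups together."""
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()


def repeated_failures(failed_questions: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (question, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(normalize(q) for q in failed_questions)
    return [(q, n) for q, n in counts.most_common() if n >= min_count]
```

Anything this returns is a strong candidate for one of the week's new custom Q&A pairs.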

Monthly (2 hours)

  • Full performance review by topic
  • Analyze most common questions
  • Update pricing, policies, seasonal info
  • Refresh website crawl

Quarterly (Half day)

  • Comprehensive content audit
  • Competitor comparison content
  • New feature documentation
  • Strategy review

Common Training Mistakes

Training on marketing copy: "Our revolutionary solution transforms your business!" isn't helpful when customers ask "what does it do?" Train on specific, factual content.

Missing follow-up questions: You answer the main question but not the obvious next questions. Anticipate what they'll ask after.

Internal jargon: Your team calls it "SKU discontinuation." Customers call it "out of stock." Use customer language.

Forgetting edge cases: The 80% case is documented. The 20% edge cases create 80% of escalations. Document the exceptions.

Not refreshing content: Prices change. Policies update. Products launch. Set a reminder to re-crawl monthly.
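A reminder is easier to act on when it names what is stale. The sketch below flags knowledge base entries that have not been reviewed within a chosen window; the `last_reviewed` mapping is a hypothetical structure you would maintain yourself, and 30 days simply mirrors the monthly cadence suggested above.

```python
from datetime import date, timedelta


def stale_entries(
    last_reviewed: dict[str, date],
    today: date,
    max_age_days: int = 30,
) -> list[str]:
    """Return the names of entries last reviewed more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items() if reviewed < cutoff)
```

For example, `stale_entries(reviews, date.today())` could run as part of the monthly review, with the result becoming that month's update list.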

Getting Started

Start free with Kya to improve your chatbot's accuracy:

  1. Export your last 100 conversations from Analytics
  2. Categorize: What questions fail? What topics?
  3. For each failing topic, ask: Is the content complete? Current? Clear?
  4. Rewrite content to be actually helpful
  5. Add custom Q&A for gaps your website doesn't cover
  6. Re-test with the same questions
  7. Watch resolution rate climb

The improvement is often visible within a week. Not because the AI got smarter, but because you gave it better material.
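Re-testing with the same questions can be turned into a repeatable check. In this sketch, `ask` is a hypothetical stand-in for however you query your chatbot (an API call, for instance), and each test case lists phrases a good answer must contain.

```python
from typing import Callable


def retest(
    ask: Callable[[str], str],
    cases: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return each question whose answer is missing a required phrase.

    `ask` is any function that takes a question and returns the bot's answer.
    The result maps failing questions to the phrases that were missing.
    """
    failures = {}
    for question, required_phrases in cases.items():
        answer = ask(question).lower()
        missing = [p for p in required_phrases if p.lower() not in answer]
        if missing:
            failures[question] = missing
    return failures
```

An empty result means every previously failing question now gets an answer containing the key facts. Phrase checks are crude, so it is still worth reading a sample of the answers by hand.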


Chatbot accuracy isn't about AI technology.

It's about content quality.

Complete, current, clear content produces complete, current, clear answers.

Fix the input, and the output fixes itself.

About the Author

Nedim Mehic

Founder of Kya. Building AI tools that help businesses grow.
