2026-06-12 — knowledge-base prep toolkit
Working companions to เตรียมข้อมูลให้ AI Chatbot: คู่มือทำ Knowledge Base ฉบับ SME ไทย 2026 ("Preparing Data for an AI Chatbot: A Knowledge Base Guide for Thai SMEs, 2026 Edition"). MIT licensed.
- faq_sheet_to_jsonl.py — FAQ CSV → clean JSONL chunks (validates 1-1-1 rule, drops expired promos, dedupes)
- thai_chunker.py — Thai-aware chunker: structure-first, sentence-safe splits, overlap (no deps)
- kb_freshness_audit.py — flag KB entries untouched for more than 60 days, plus expired promos (cron-ready exit codes)
- kb_coverage_gap.py — mine real chat logs for questions your KB doesn't cover yet (difflib, no embeddings)
- kb_lint.js — CI linter for the 1-1-1 rule: dangling refs, price answers that state no price, duplicates, promos without an expiry date
- qdrant_upsert_kb.py — idempotent kb.jsonl → Qdrant upsert via any OpenAI-compatible embeddings endpoint
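
To give a feel for the "difflib, no embeddings" approach mentioned for kb_coverage_gap.py, here is a minimal sketch. It is not the actual script: the function names `best_match_ratio` and `find_coverage_gaps`, the 0.6 threshold, and the sample data are all assumptions made for illustration. The idea is to compare each logged question against every KB question with `difflib.SequenceMatcher` and report the ones whose best match is weak.

```python
from difflib import SequenceMatcher


def best_match_ratio(question: str, kb_questions: list[str]) -> float:
    """Return the highest similarity ratio between a question and any KB question."""
    q = question.strip().lower()
    return max(
        (SequenceMatcher(None, q, kb.strip().lower()).ratio() for kb in kb_questions),
        default=0.0,  # an empty KB covers nothing
    )


def find_coverage_gaps(
    chat_questions: list[str],
    kb_questions: list[str],
    threshold: float = 0.6,  # assumed cutoff; tune it against your own logs
) -> list[str]:
    """List logged questions whose best KB match falls below the threshold."""
    gaps: list[str] = []
    seen: set[str] = set()
    for question in chat_questions:
        key = question.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and repeated questions
        seen.add(key)
        if best_match_ratio(question, kb_questions) < threshold:
            gaps.append(question)
    return gaps


# Hypothetical sample data, not taken from any real KB or chat log.
kb = ["What are your opening hours?", "Do you ship to Chiang Mai?"]
logs = [
    "what are your opening hours",
    "Can I pay with PromptPay?",
    "Can I pay with PromptPay?",
]
print(find_coverage_gaps(logs, kb))
```

Because `SequenceMatcher` compares character sequences, this sketch needs no word tokenizer, which is convenient for Thai text. It compares every logged question with every KB question, so it suits small SME-sized knowledge bases rather than very large logs.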
Interactive: KB readiness checker — paste your FAQ sheet, get a score + fix list