AI RAG Source Map
Scope
This document maps the real knowledge and retrieval surfaces already present in the repository that could support an AI assistant or RAG workflow.
It distinguishes between:
- already queryable
- partially queryable
- not yet queryable
- only conceptually present
Source Map
| Source | Location | Verified | State | Queryability | Surface Type | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Managed docs corpus | docs/** via apps/web/src/lib/docs/documentRegistry.ts and apps/web/lib/docsRagBootstrap.ts | VERIFIED | IMPLEMENTED | already queryable | KNOWLEDGE SURFACE | Current assistant retrieval already uses this source. |
| Document metadata registry | apps/web/src/lib/docs/documentRegistry.ts | VERIFIED | IMPLEMENTED | already queryable | KNOWLEDGE SURFACE | Strong metadata for category, visibility, audience, status, tags. |
| Governance/document manifests | docs/_manifest/documents.manifest.json, docs/_manifest/governance.manifest.json | VERIFIED | IMPLEMENTED | partially queryable | KNOWLEDGE SURFACE / DOC-ONLY | Structured metadata exists and is RAG-friendly, but not all manifest structure is yet used by the assistant path. |
| AI governance policy corpus | docs/10_normative/KVARY_AI_* | VERIFIED | IMPLEMENTED | already queryable | KNOWLEDGE SURFACE / DOC-ONLY | Strong source for guardrails, memory, audit, and policy-grounded answers. |
| Architecture and procedure docs | docs/CURRENT_ARCHITECTURE.md, docs/EVENT_DRIVEN_ARCHITECTURE.md, governance procedures | VERIFIED | IMPLEMENTED | already queryable | KNOWLEDGE SURFACE / DOC-ONLY | Already inside the docs knowledge surface if registered/read by the docs bootstrap. |
| Butkhuzi norms rows | services/svc-butkhuzi/src/butkhuzi/repository.ts and API/gateway routes | VERIFIED | IMPLEMENTED | already queryable | KNOWLEDGE SURFACE | Structured norms list/browse surface. |
| Butkhuzi suggest | GET /butkhuzi/suggest | VERIFIED | IMPLEMENTED | already queryable | KNOWLEDGE SURFACE | Useful for assistant autocomplete and intent narrowing. |
| Butkhuzi chunk search | GET /butkhuzi/search | VERIFIED | IMPLEMENTED | already queryable | KNOWLEDGE SURFACE | Strongest domain-semantic retrieval surface in the repo. |
| Butkhuzi ingestion/chunk rebuild | POST /butkhuzi/upsert, POST /butkhuzi/chunks/rebuild | VERIFIED | IMPLEMENTED | operationally queryable | INFRA SURFACE / KNOWLEDGE SURFACE | Real corpus maintenance path for assistant-quality norms retrieval. |
| Docs access/acknowledgement history | apps/web/src/lib/docs/documentAccessLog.ts | VERIFIED | IMPLEMENTED | partially queryable | KNOWLEDGE SURFACE / INFRA SURFACE | Could support assistant memory or compliance awareness later. |
| AI audit history | apps/web/var/log/ai-audit.jsonl and AI audit routes | VERIFIED | IMPLEMENTED | partially queryable | KNOWLEDGE SURFACE / INFRA SURFACE | Good for traceability and policy review, not yet broad answer grounding. |
| Founder chat memory | packages/memory-layer/* | VERIFIED | IMPLEMENTED | partially queryable | INFRA SURFACE / KNOWLEDGE SURFACE | Recent conversation recall exists, but not advanced semantic memory retrieval. |
| Evidence attachments / declaration evidence | services/svc-tenders/src/evidenceStorage.ts and evidence APIs | VERIFIED | IMPLEMENTED | partially queryable | KNOWLEDGE SURFACE | Operational evidence exists, but no assistant-oriented indexing or retrieval layer was verified. |
| Structured service APIs | services/* and services/api/* | VERIFIED | PARTIAL | partially queryable | KNOWLEDGE SURFACE | Rich operational data exists, but not normalized into assistant context retrieval. |
| KES traceability surface | apps/web/src/features/kesTrace/* | VERIFIED | PARTIAL | partially queryable | UI-ONLY / KNOWLEDGE SURFACE | Product surface exists, but current trace data is not a verified backend knowledge substrate. |
| Event catalog docs | docs/80_chain/EVENT_CATALOG.md | VERIFIED | IMPLEMENTED | already queryable through docs | KNOWLEDGE SURFACE / DOC-ONLY | Useful as documented event knowledge, not as live event-history retrieval. |
| Kafka/outbox/event history | services/*/src/kafka/* | VERIFIED | PARTIAL | not yet queryable for assistant use | INFRA SURFACE / KNOWLEDGE SURFACE | Real backbone exists, but no assistant-facing retrieval abstraction was verified. |
| Persistent vector knowledge store | repo-wide | VERIFIED | MISSING | not yet queryable | INFRA SURFACE | No generalized persisted vector DB for assistant knowledge was verified. |
Retrieval Paths Already Present
1. Docs RAG path
- docs/**
- documentRegistry.ts
- readDocumentMarkdown(...)
- docsRagBootstrap.ts
- InMemoryRagVectorStore
- retrieveRagMatches(...)
- /api/ai/ask
- implemented
- already queryable
- currently the main assistant retrieval substrate
- in-memory runtime store
- not multi-source
- not persisted across broader assistant corpora
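The in-memory retrieval step in this path can be pictured with a minimal sketch. It assumes chunks are stored with precomputed embeddings and ranked by cosine similarity; the class, type, and method names below are hypothetical and are not the signatures of InMemoryRagVectorStore or retrieveRagMatches(...), which this document does not specify.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the repo's API.
type DocChunk = { docId: string; text: string; embedding: number[] };
type ChunkMatch = { docId: string; text: string; score: number };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// A store that lives only in process memory: contents are lost on restart.
class SketchInMemoryStore {
  private chunks: DocChunk[] = [];

  add(chunk: DocChunk): void {
    this.chunks.push(chunk);
  }

  // Score every chunk against the query embedding and keep the best topK.
  query(queryEmbedding: number[], topK: number): ChunkMatch[] {
    return this.chunks
      .map((c) => ({
        docId: c.docId,
        text: c.text,
        score: cosineSimilarity(queryEmbedding, c.embedding),
      }))
      .sort((x, y) => y.score - x.score)
      .slice(0, Math.max(0, topK));
  }
}
```

Because a store of this kind is rebuilt at runtime, it illustrates why the docs path is described above as in-memory and not persisted.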
2. Butkhuzi norms retrieval path
- Butkhuzi persistence
- svc-butkhuzi routes
- API gateway route
- web portal client
- KES and SNIP product surfaces
- implemented
- already queryable
- already used in product workflows
- this is the strongest existing domain-semantic retrieval surface outside docs
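As a rough illustration of how a client might address the two read endpoints, the sketch below builds request URLs for GET /butkhuzi/suggest and GET /butkhuzi/search. Only the two route paths come from this document; the query parameter names `q` and `limit`, the helper name, and the base URL in the usage line are assumptions made for the example.

```typescript
// Assumption: the parameter names `q` and `limit` are illustrative only;
// the real routes may expect different parameters.
type ButkhuziEndpoint = "suggest" | "search";

function buildButkhuziUrl(
  baseUrl: string,
  endpoint: ButkhuziEndpoint,
  query: string,
  limit: number = 5
): string {
  // Drop trailing slashes so the path is joined with exactly one "/".
  const trimmedBase = baseUrl.replace(/\/+$/, "");
  const params = `q=${encodeURIComponent(query)}&limit=${limit}`;
  return `${trimmedBase}/butkhuzi/${endpoint}?${params}`;
}

// Usage with a hypothetical gateway address:
const exampleSearchUrl = buildButkhuziUrl("http://localhost:3000/", "search", "fire safety", 3);
```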
3. Assistant memory path
- /api/ai/ask
- createSession(...)
- appendMessage(...)
- getRecentMessages(...)
- implemented
- partially queryable
- founder-chat scoped naming and policy model
- recent-message recall only
- no verified retrieval/ranking over embeddings despite schema support
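The "recent-message recall only" behavior can be sketched as follows. The three method names mirror the functions listed above, but their parameters, return types, and the in-memory storage are assumptions for illustration; the real memory layer may persist sessions and use different signatures.

```typescript
// Hypothetical sketch: signatures and storage are assumed, not taken from the repo.
type ChatRole = "user" | "assistant";
type ChatMessage = { role: ChatRole; content: string };

class SketchSessionMemory {
  private sessions = new Map<string, ChatMessage[]>();
  private nextId = 1;

  // Create an empty session and return its identifier.
  createSession(): string {
    const id = `session-${this.nextId++}`;
    this.sessions.set(id, []);
    return id;
  }

  // Append one message to the end of an existing session.
  appendMessage(sessionId: string, role: ChatRole, content: string): void {
    const messages = this.sessions.get(sessionId);
    if (!messages) {
      throw new Error(`Unknown session: ${sessionId}`);
    }
    messages.push({ role, content });
  }

  // Return the last `limit` messages in chronological order.
  // This is recency-based recall: no embeddings, no ranking.
  getRecentMessages(sessionId: string, limit: number): ChatMessage[] {
    const messages = this.sessions.get(sessionId) ?? [];
    if (limit <= 0) {
      return [];
    }
    return messages.slice(-limit);
  }
}
```

Recall here depends only on message order, which is the gap the last bullet above points at: semantic retrieval over stored embeddings would need a separate ranking step.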
4. Audit and document history path
- AI audit logger and verification
- document access log
- governance dashboard snapshots
- implemented
- partially queryable
- strong for traceability, compliance, and later assistant memory/context features
- not yet a primary answer-retrieval pipeline
Knowledge Domains By Practical Assistant Readiness
Already queryable
Management-system docs, procedures, and policies
- directly loaded into docs RAG
- well-tagged and registered
- already the main assistant corpus
Butkhuzi norms and semantic chunks
- real search and suggestion endpoints
- real product consumption
- explicit semantic-search intent in repo docs
Partially queryable
AI audit and governance traceability
- readable audit chain exists
- verification route exists
- not yet normalized into assistant retrieval
Document access and acknowledgement history
- structured records exist
- potentially useful for operational context
- not currently part of assistant context assembly
Evidence and records
- storage and service paths exist
- no verified chunking/indexing/ranking path for assistant use
Structured platform data
- many APIs and read models exist
- no unified assistant access layer over them
Not yet queryable for assistant use
Live event backbone history
- event infra exists operationally
- no verified assistant retrieval interface over Kafka/outbox/idempotency/DLQ history
Generic persistent vector knowledge layer
- current docs vector store is in-memory
- no repo-wide persisted cross-corpus vector substrate was verified
Only conceptually present
Broad operational memory assistant
- several good ingredients exist
- no verified unified architecture currently joins them into a general operational memory layer
Current RAG Shape
The current RAG foundation is best described as:
- real and usable
- docs-first
- norms-capable through Butkhuzi, but not yet fused into the main assistant path
- policy- and audit-aware
- not yet unified across all platform knowledge domains
This means the platform is currently closer to:
- search-ready and docs-RAG-ready
- norms-ready in a neighboring domain
than to:
- full multi-source enterprise RAG
Best Next Technical Direction
Recommended next step
Normalize the first multi-source assistant retrieval layer around:
- docs/governance corpus
- Butkhuzi norms corpus
Then make /api/ai/ask source-aware across both.
Minimum credible direction:
- add Butkhuzi retrieval into the assistant route as a second source
- preserve explicit source labels in citations
- keep current audit and policy controls
- keep current memory use as limited recent conversational context
- defer evidence/event-history retrieval until indexing and trust boundaries are clearer
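One way to picture the minimum direction above is a small merge step that queries each source and keeps an explicit source label on every match so citations stay attributable. This is a sketch under stated assumptions: the types, function names, and the synchronous retriever shape are hypothetical, and real retrievers would most likely be asynchronous calls into the docs store and the Butkhuzi search route.

```typescript
// Hypothetical sketch: types, names, and retriever shape are illustrative only.
type SourceLabel = "docs" | "butkhuzi";

interface SourceMatch {
  source: SourceLabel;
  id: string;
  text: string;
  score: number;
}

type Retriever = (query: string) => SourceMatch[];

// Take the best `perSourceLimit` matches from each retriever, in retriever order.
// Scores from different retrievers may not be comparable, so this sketch applies
// a per-source quota instead of one global sort.
function retrieveAcrossSources(
  query: string,
  retrievers: Retriever[],
  perSourceLimit: number
): SourceMatch[] {
  const merged: SourceMatch[] = [];
  for (const retrieve of retrievers) {
    const best = [...retrieve(query)]
      .sort((a, b) => b.score - a.score)
      .slice(0, Math.max(0, perSourceLimit));
    merged.push(...best);
  }
  return merged;
}

// Render a citation tag that keeps the source label visible.
function formatCitation(match: SourceMatch): string {
  return `[${match.source}:${match.id}]`;
}
```

The per-source quota is a deliberate simplification: it guarantees that both corpora appear in the assembled context without requiring score calibration between them.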
Why this is the strongest direction
- it reuses working retrieval surfaces instead of inventing new ones
- it connects the strongest structured domain knowledge source to the current assistant surface
- it avoids pretending that evidence and event history are already assistant-ready
- it turns the repo from “fragmented assistant foundation” into “real multi-source assistant foundation”
Bottom Line
The repo already has real RAG ingredients, but the usable assistant-ready sources are concentrated in:
- management-system documentation
- governance procedures and policies
- Butkhuzi norms and chunk search
Everything else is either partial, infrastructural, or only conceptually ready.