RAG Bots in Regulated Industries: 4 Key Lessons Learned

Key Implementation Details

This case study covers deployment of a RAG-powered AI assistant for Australian workplace compliance use cases across construction sites, aged care facilities, and mining operations.

Technical Lessons Learned

Query expansion matters more than chunk size: Instead of obsessing over chunk size (400 words? 512 tokens?), the developer found that generating 4 alternative phrasings of each query via Haiku, running all 4 against ChromaDB, then merging and deduplicating results significantly improved retrieval quality. This was particularly effective for domain-specific jargon where users phrase things differently than document authors.
Source boost for named documents: If a user's query contains words that match an indexed document title, force-include chunks from that document regardless of semantic similarity. For example, "What does our FIFO policy say about R&R flights?" should always pull from the FIFO policy — not just semantically similar chunks that happen to mention flights.
Layer your prompts — don't let clients break Layer 1: Implemented a three-layer system: core security/safety rules (immutable), vertical personality (swappable per industry), client custom instructions (additive only). Clients cannot override Layer 1 via their custom instructions. This prevented "ignore previous instructions" attacks and clients accidentally jailbreaking their own bots.
Local embeddings are good enough: Used sentence-transformers all-MiniLM-L6-v2 running locally on ChromaDB with no external embedding API. For document Q&A in a specific domain, it performs close enough to ada-002 that the cost and latency savings are worth it. The LLM quality (Claude Haiku) is doing more work than the embeddings anyway.
One droplet per client: Tried shared infrastructure first but found the operational overhead of keeping ChromaDB collections isolated, managing API keys, and preventing cross-contamination was worse than just spinning a $6/mo VM per client. Each client owns their vector store, and their documents never touch shared infrastructure.

The developer has made the RAG engine available on GitHub for others to examine.

📖 Read the full source: r/LocalLLaMA

Practical Lessons from Deploying RAG Bots in Regulated Industries

Key Implementation Details

Technical Lessons Learned

👀 See Also

OpenClaw Agent Automates AI News Pipeline with LLM Curation

Non-developer builds healthcare SaaS in 3 weeks using Claude and Gemini: lessons learned

Building a Steam Game in 10 Days Using Claude Code: Technical Challenges and Workflow

Karis CLI Architecture: Using Claude for Planning, Not Execution