Practical Lessons from Deploying RAG Bots in Regulated Industries

Key Implementation Details
This case study covers deployment of a RAG-powered AI assistant for Australian workplace compliance use cases across construction sites, aged care facilities, and mining operations.
Technical Lessons Learned
- Query expansion matters more than chunk size: Instead of obsessing over chunk size (400 words? 512 tokens?), the developer found that generating 4 alternative phrasings of each query via Haiku, running all 4 against ChromaDB, then merging and deduplicating results significantly improved retrieval quality. This was particularly effective for domain-specific jargon where users phrase things differently than document authors.
- Source boost for named documents: If a user's query contains words that match an indexed document title, force-include chunks from that document regardless of semantic similarity. For example, "What does our FIFO policy say about R&R flights?" should always pull from the FIFO policy — not just semantically similar chunks that happen to mention flights.
- Layer your prompts — don't let clients break Layer 1: Implemented a three-layer system: core security/safety rules (immutable), vertical personality (swappable per industry), client custom instructions (additive only). Clients cannot override Layer 1 via their custom instructions. This prevented "ignore previous instructions" attacks and clients accidentally jailbreaking their own bots.
- Local embeddings are good enough: Used sentence-transformers all-MiniLM-L6-v2 running locally on ChromaDB with no external embedding API. For document Q&A in a specific domain, it performs close enough to ada-002 that the cost and latency savings are worth it. The LLM quality (Claude Haiku) is doing more work than the embeddings anyway.
- One droplet per client: Tried shared infrastructure first but found the operational overhead of keeping ChromaDB collections isolated, managing API keys, and preventing cross-contamination was worse than just spinning a $6/mo VM per client. Each client owns their vector store, and their documents never touch shared infrastructure.
The developer has made the RAG engine available on GitHub for others to examine.
📖 Read the full source: r/LocalLLaMA
👀 See Also

OpenClaw Agent Automates AI News Pipeline with LLM Curation
An OpenClaw agent runs a fully automated AI news pipeline that scans 25 RSS feeds, 13 Reddit subreddits, Twitter, GitHub, and web searches, then uses Gemini Flash for editorial curation and Claude Sonnet for writing. The system costs about $5/month and publishes to a Telegram channel.

Non-developer builds healthcare SaaS in 3 weeks using Claude and Gemini: lessons learned
A medical device sales rep with no coding background built FastCredentials.com, a healthcare compliance credentialing platform, in three weeks using AI coding assistants. The project used Python/Django, Gunicorn, Nginx, Stripe, WeasyPrint, SQLite, and the Claude API for automated blog content.

Building a Steam Game in 10 Days Using Claude Code: Technical Challenges and Workflow
A developer built and released a game on Steam in 10 days using Claude Code without writing any code personally, but encountered significant challenges with logic design and debugging AI-generated code.

Karis CLI Architecture: Using Claude for Planning, Not Execution
Karis CLI uses a three-layer architecture where Claude handles planning and reasoning while pure code executes tasks reliably, creating a stable agent setup that separates LLM capabilities from execution.