llmLibrarian: Local RAG Engine with MCP Integration for File-Based AI Search

What This Is
llmLibrarian is a local RAG (Retrieval-Augmented Generation) engine that exposes retrieval capabilities through the Model Context Protocol (MCP). It allows you to index folders into silos (ChromaDB collections), then query them from any MCP client—including Claude—to get grounded, cited answers.
Key Features and Architecture
Each silo is a ChromaDB collection built from an indexed folder. When you want direct answers instead of raw chunks, Ollama synthesizes them from the retrieved context. Everything runs locally on your machine.
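To make the synthesis step concrete, here is a minimal sketch of how retrieved chunks might be turned into a grounded prompt and sent to a local Ollama server. This is not llmLibrarian's actual code: the prompt wording, the chunk dictionary shape (`source`, `text`), and the helper names `build_prompt`, `build_payload`, and `ask` are assumptions for illustration. The endpoint is Ollama's default local `/api/generate` route.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(question, chunks):
    """Number each chunk so the model can cite sources as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] ({c['source']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def build_payload(question, chunks, model="llama3.1:8b"):
    """Assemble a non-streaming request body for /api/generate."""
    return {
        "model": model,
        "prompt": build_prompt(question, chunks),
        "stream": False,
    }


def ask(question, chunks, model="llama3.1:8b"):
    """Send the request to a running Ollama server and return its answer text."""
    data = json.dumps(build_payload(question, chunks, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Numbering the chunks in the prompt is one simple way to get cited answers: the model can refer to `[1]` or `[2]`, and the caller can map those numbers back to file paths.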
The developer highlights the multi-silo capability as particularly powerful: combining silos allows patterns to surface across domains that would be difficult to catch manually. For example, a journal folder becomes a thinking partner that remembers what you've written, and a codebase becomes an agent that knows your actual files.
MCP Tools Exposed
- retrieve: hybrid RRF vector search that returns raw chunks with confidence scores for Claude to reason over
- retrieve_bulk: multi-angle queries in one call, useful when aggregating across document types
- ask: Ollama-synthesized answer directly from retrieved context (defaults to llama3.1:8b, but you can swap in whatever model you have pulled)
- list_silos, inspect_silo, trigger_reindex: index management tools
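For readers unfamiliar with RRF (Reciprocal Rank Fusion), the sketch below shows the general technique: each chunk receives 1 / (k + rank) from every ranked list it appears in, and the sums decide the final order. How llmLibrarian produces its individual rankings and confidence scores is not described in the post, so the two input lists here (one from vector search, one from keyword search) and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Returns (chunk_id, fused_score) pairs sorted from best to worst.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from two retrievers over the same silo.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]

fused = rrf_fuse([vector_hits, keyword_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear near the top of both lists (`c1`, `c3`) outrank chunks that appear in only one, which is the main appeal of RRF: it needs only ranks, not comparable raw scores.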
Technical Stack
- ChromaDB for vector storage
- Ollama for model synthesis
- sentence-transformers (all-mpnet-base-v2, MPS-accelerated) for embeddings
- fastmcp for the MCP layer
The developer mentions that the multi-silo metadata tagging in ChromaDB took several iterations to get right and is open to discussing the architecture.
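The post does not show how the metadata tagging works, so the following is only one plausible sketch. It assumes each chunk carries a `silo` key in its metadata and a stable ID derived from silo, path, and chunk index; the helper names `tag_chunk` and `silo_filter` are hypothetical. The filter dictionaries follow ChromaDB's `where` syntax (plain equality and the `$in` operator), as would be passed to `collection.query(where=...)`.

```python
import hashlib


def tag_chunk(silo, path, chunk_index, text):
    """Return (id, document, metadata) for one chunk, tagged with its silo."""
    # A stable ID means re-indexing the same chunk overwrites it
    # instead of creating a duplicate.
    key = f"{silo}:{path}:{chunk_index}".encode("utf-8")
    chunk_id = hashlib.sha1(key).hexdigest()[:16]
    metadata = {"silo": silo, "source": path, "chunk_index": chunk_index}
    return chunk_id, text, metadata


def silo_filter(silos):
    """Build a ChromaDB-style `where` filter restricting a query to some silos."""
    silos = list(silos)
    if len(silos) == 1:
        return {"silo": silos[0]}
    return {"silo": {"$in": silos}}


chunk_id, document, metadata = tag_chunk("journal", "notes/2024.md", 0, "First entry")
print(metadata["silo"], silo_filter(["journal", "codebase"]))
```

Keeping the silo name on every chunk means results merged from several silos still show where each passage came from, which is what makes cross-domain answers citable.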
This type of tool is useful for developers who want to build AI agents that can reference and reason over their local files without sending data to external services.
📖 Read the full source: r/LocalLLaMA