llmLibrarian: Local RAG Engine with MCP Integration for File-Based AI Search

✍️ OpenClawRadar · 📅 Published: March 30, 2026

What This Is

llmLibrarian is a local RAG (Retrieval-Augmented Generation) engine that exposes its retrieval capabilities through the Model Context Protocol (MCP). You index folders into silos (ChromaDB collections), then query them from any MCP client, including Claude, to get grounded, cited answers.

Key Features and Architecture

Indexing and retrieval are backed by ChromaDB, with each silo stored as a collection. When you want a direct answer instead of raw chunks, Ollama synthesizes one from the retrieved context. Everything runs locally on your machine.
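To make the synthesis step concrete, here is a minimal sketch of how retrieved chunks could be turned into a grounded, citable prompt and sent to a local Ollama server. It is not llmLibrarian's actual code: the chunk dictionaries, file paths, prompt wording, and the helper names `build_prompt` and `ask_ollama` are assumptions made for illustration. The HTTP call targets Ollama's standard `/api/generate` endpoint on its default port.

```python
import json
import urllib.request


def build_prompt(question, chunks):
    """Number each chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def ask_ollama(prompt, model="llama3.1:8b", host="http://localhost:11434"):
    """Send the prompt to a locally running Ollama server (non-streaming)."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Hypothetical retrieved chunks; in practice these would come from the index.
chunks = [
    {"source": "notes/setup.md", "text": "The index lives in a local ChromaDB directory."},
    {"source": "notes/models.md", "text": "Synthesis uses a model pulled through Ollama."},
]
prompt = build_prompt("Where is the index stored?", chunks)
```

With Ollama running and the model pulled, `ask_ollama(prompt)` would return the generated answer as a string. Numbering the chunks in the prompt is one simple way to keep the answer traceable to source files.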

The developer highlights the multi-silo capability as particularly powerful: combining silos allows patterns to surface across domains that would be difficult to catch manually. For example, a journal folder becomes a thinking partner that remembers what you've written, and a codebase becomes an agent that knows your actual files.

MCP Tools Exposed

  • retrieve: hybrid vector search using Reciprocal Rank Fusion (RRF); returns raw chunks with confidence scores for Claude to reason over
  • retrieve_bulk: runs multi-angle queries in one call, useful when aggregating across document types
  • ask: returns an Ollama-synthesized answer built directly from retrieved context (defaults to llama3.1:8b, but you can swap in any model you have pulled)
  • list_silos, inspect_silo, trigger_reindex: index management tools
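The RRF in `retrieve` refers to Reciprocal Rank Fusion, a general technique for merging several ranked result lists into one. The sketch below shows the technique itself, not llmLibrarian's implementation. The chunk IDs, the pairing of a vector ranking with a keyword ranking, and the constant `k=60` (a commonly used default) are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs, each ordered best-first.

    Every list contributes 1 / (k + rank) to an ID's score, so items
    that rank well in several lists rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from two retrievers over the same silo.
vector_hits = ["chunk-a", "chunk-b", "chunk-c"]
keyword_hits = ["chunk-b", "chunk-d", "chunk-a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `chunk-b` ends up first because it ranks near the top of both lists, while `chunk-c` and `chunk-d`, which each appear in only one list, fall to the bottom. The same fusion function could in principle merge the per-query results of a multi-angle call like `retrieve_bulk`, though the source does not say whether it does.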

Technical Stack

  • ChromaDB for vector storage
  • Ollama for model synthesis
  • sentence-transformers (all-mpnet-base-v2, MPS-accelerated) for embeddings
  • fastmcp for the MCP layer

The developer mentions that the multi-silo metadata tagging in ChromaDB took several iterations to get right and is open to discussing the architecture.
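The source does not describe how the metadata tagging works, so the following is only a sketch of the general idea: each chunk carries metadata naming its silo and source file, which lets a query span several silos while keeping every result attributable. The `Chunk` class, the field names `silo`, `source`, and `chunk_index`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def tag_chunks(silo, source_path, texts):
    """Attach silo, source path, and position to each chunk of one file."""
    return [
        Chunk(
            text=text,
            metadata={"silo": silo, "source": source_path, "chunk_index": i},
        )
        for i, text in enumerate(texts)
    ]


def filter_by_silos(chunks, silos):
    """Keep only the chunks whose silo tag is in the requested set."""
    wanted = set(silos)
    return [chunk for chunk in chunks if chunk.metadata["silo"] in wanted]


# A toy combined index built from two hypothetical silos.
index = (
    tag_chunks("journal", "journal/entry.md", ["First paragraph.", "Second paragraph."])
    + tag_chunks("code", "src/parser.py", ["def parse(tokens): ..."])
)

journal_hits = filter_by_silos(index, ["journal"])
all_hits = filter_by_silos(index, ["journal", "code"])
```

In a real vector store the filter would typically be pushed down into the query rather than applied in Python, but the tagging scheme is the part that makes cross-silo results citable.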

This type of tool is useful for developers who want to build AI agents that can reference and reason over their local files without sending data to external services.

📖 Read the full source: r/LocalLLaMA
