Local semantic search for AI conversations with fastembed and LanceDB

✍️ OpenClawRadar📅 Published: March 20, 2026🔗 Source
Local semantic search for AI conversations with fastembed and LanceDB
Ad

A developer has implemented a local semantic search system for AI conversation history, processing 368K messages without cloud dependencies or API keys. The project uses fastembed with the BAAI/bge-small-en-v1.5 model for CPU-based embeddings and LanceDB as a vector store that operates as a single directory without a server process.

Technical Stack

  • Embeddings: fastembed with BAAI/bge-small-en-v1.5 model (384 dimensions)
  • Vector store: LanceDB - single directory, no server process, append-friendly
  • Ingest: Pulls from JSONL session transcripts (Claude Code, any chat export)
  • Embedding performance: ~500 docs/sec on M4 CPU

Key Implementation Details

The developer learned several practical lessons during the 4-month iteration:

  • Selective embedding: Early versions embedded every message, which reduced signal-to-noise. The current implementation only embeds user messages and assistant messages with substance (skipping responses like "sure, here's that code"), cutting vector count by 60% while improving search quality.
  • Chunking strategy: Switching from fixed-size chunks to conversation-turn chunks made a massive difference in retrieval relevance. Model choice (tried nomic-embed-text, bge-large, all-MiniLM) showed marginal differences compared to chunking approach.
  • LanceDB advantages: The developer found LanceDB "stupidly underrated for personal-scale" - no server, no Docker, just a directory with instant appending of new vectors, replacing an overengineered pgvector setup.
  • Re-embedding workflow: The bge-small-en-v1.5 model at 384 dimensions is fast enough to re-embed hourly as a cron job. A full re-index of 117K vectors takes approximately 4 minutes on M2 hardware.
Ad

Performance Metrics

  • Total messages ingested: 407K
  • Vectors indexed: 87K
  • Search latency (p50): 12ms across 117K vectors
  • Full re-index time: ~4 minutes (M2)
  • Storage: ~180MB on disk
  • API keys needed: 0

The project is open source under MIT license and available at github.com/mordechaipotash/brain-mcp. Installation is via pipx install brain-mcp && brain-mcp setup.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also

ETL-D MCP Server: Deterministic CSV Parsing for Claude to Prevent Financial Hallucinations
Tools

ETL-D MCP Server: Deterministic CSV Parsing for Claude to Prevent Financial Hallucinations

A developer built ETL-D, an open-source MCP server for Claude Desktop that processes CSVs in three deterministic layers to prevent decimal point hallucinations in financial data. It uses Python parsers for known formats, achieves ~70ms response times with 0 LLM calls for 200 parallel requests, and only uses LLMs as a fallback for high-entropy text.

OpenClawRadar
OpenClaw Kubernetes Operator with Embedded Ollama Support
Tools

OpenClaw Kubernetes Operator with Embedded Ollama Support

A community member has created an OpenClaw Kubernetes operator that includes embedded Ollama support, allowing AI agents to run with local models in the same namespace. The setup includes installation commands, configuration details for both local and cloud Ollama models, and dashboard access instructions.

OpenClawRadar
Measuring Off-Task Token Spend in Claude Code: The 'Undeclared-Intent' Metric
Tools

Measuring Off-Task Token Spend in Claude Code: The 'Undeclared-Intent' Metric

A developer built a metric to quantify compute spent on unintended execution paths in Claude Code sessions, finding that 22.8% of tokens went to off-task work.

OpenClawRadar
NaNMesh MCP checks GitHub issues before Claude recommends libraries
Tools

NaNMesh MCP checks GitHub issues before Claude recommends libraries

NaNMesh MCP is an open-source Model Context Protocol server that crawls GitHub Issues, Stack Overflow, and Reddit for known bugs in development tools. When Claude recommends a library, it can check for real problems before integration.

OpenClawRadar