Local MCP Memory System with Consolidation for AI Conversations

What This Is
A developer created a local memory system for AI conversations that consolidates and synthesizes information rather than just storing it. Built as an MCP server, it works with compatible clients like Claude Desktop and Claude Code, running 100% locally with no data leaving your hardware.
How It Works
The key differentiator from standard RAG systems is the consolidation process. Every 6 hours, a local LLM (Qwen 2.5-7B running in LM Studio) clusters recent memories by topic and consolidates them into structured knowledge documents. It extracts facts, solutions, and preferences, merging them with existing knowledge and versioning everything.
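The consolidation pass described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes clustering is done greedily on embedding similarity, and it stands in a simple fact merge for the LLM extraction step. The names `cluster_by_topic` and `KnowledgeDoc`, the 0.8 threshold, and the sample episodes are all hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_topic(episodes, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed is similar enough.

    `episodes` is a list of (text, embedding) pairs. The threshold is an
    illustrative value, not one taken from the project.
    """
    clusters = []  # each cluster is a list of (text, embedding); item 0 is the seed
    for text, emb in episodes:
        for cluster in clusters:
            if cosine(emb, cluster[0][1]) >= threshold:
                cluster.append((text, emb))
                break
        else:
            clusters.append([(text, emb)])
    return clusters


@dataclass
class KnowledgeDoc:
    """Hypothetical versioned knowledge document for one topic."""
    topic: str
    facts: list = field(default_factory=list)
    version: int = 0

    def merge(self, new_facts):
        """Add unseen facts and bump the version only if something changed."""
        added = [f for f in new_facts if f not in self.facts]
        if added:
            self.facts.extend(added)
            self.version += 1
        return added


# Made-up episodes with toy 3-dimensional embeddings.
episodes = [
    ("PC has a dedicated GPU", [1.0, 0.0, 0.1]),
    ("Prefers tabs over spaces", [0.0, 1.0, 0.0]),
    ("PC has 64 GB of RAM", [0.9, 0.1, 0.0]),
]
clusters = cluster_by_topic(episodes)
doc = KnowledgeDoc(topic="pc-hardware")
doc.merge([text for text, _ in clusters[0]])
```

In the real system the merge step would be driven by the local LLM's extracted facts, solutions, and preferences; the version counter here only illustrates the "versioning everything" idea.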
Technical Stack
- Embeddings: nomic-embed-text-v1.5 via LM Studio
- Vector search: FAISS for semantic similarity, combined with keyword matching for hybrid retrieval
- Consolidation LLM: Qwen 2.5-7B (Q4) via LM Studio
- Storage: SQLite for episodes, FAISS for vectors
- Protocol: MCP — works with anything that supports it
- Config: TOML
Features
- Semantic deduplication at a cosine similarity threshold of 0.95
- Adaptive surprise scoring — frequently accessed memories get boosted, stale ones decay
- Atomic writes with tempfile + os.replace for crash protection
- Tombstone-based FAISS deletion — O(1) instead of rebuilding the whole index
- Graceful degradation — if LM Studio goes down, storage still works, consolidation pauses
- 88 tests passing
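Three of the features above (semantic dedup, tombstone deletion, and atomic writes) can be sketched in a few lines. This is an illustration under assumptions, not the project's implementation: a plain Python list stands in for the FAISS index, a set of indices stands in for the tombstone list, and the function names are hypothetical. Only the 0.95 threshold and the `tempfile` plus `os.replace` pattern come from the feature list.

```python
import json
import math
import os
import tempfile

DEDUP_THRESHOLD = 0.95  # cosine similarity threshold quoted in the feature list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(candidate, stored_vectors, tombstones=frozenset()):
    """True if any live (non-tombstoned) vector is at least threshold-similar."""
    return any(
        cosine(candidate, vec) >= DEDUP_THRESHOLD
        for idx, vec in enumerate(stored_vectors)
        if idx not in tombstones
    )


def atomic_write_json(path, payload):
    """Write to a temp file in the target directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Toy 2-dimensional vectors standing in for real embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0]]
tombstones = set()

near_copy = [0.99, 0.05]
before = is_duplicate(near_copy, vectors, tombstones)

# "Deleting" vector 0 is a constant-time set insert; nothing is rebuilt.
tombstones.add(0)
after = is_duplicate(near_copy, vectors, tombstones)
```

The trade-off with tombstones is that deleted vectors still occupy space until some later compaction, in exchange for avoiding an index rebuild on every delete.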
MCP Tools
- memory_store: save an episode with type, tags, and surprise score
- memory_recall: semantic search across episodes and consolidated knowledge
- memory_forget: mark an episode for removal
- memory_correct: update a knowledge doc
- memory_export: full JSON backup
- memory_status: health check
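To make the tool list above more concrete, here is a minimal sketch of what the storage side of `memory_store`, `memory_forget`, and `memory_status` could look like on top of SQLite. The table schema, column names, and return shapes are assumptions made for illustration, and the MCP wiring itself is omitted; only the tool names and the stored fields (type, tags, surprise score) come from the description.

```python
import json
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a hypothetical episode table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS episodes (
               id INTEGER PRIMARY KEY,
               content TEXT NOT NULL,
               type TEXT NOT NULL,
               tags TEXT NOT NULL,
               surprise REAL NOT NULL,
               created_at REAL NOT NULL,
               forgotten INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def memory_store(conn, content, episode_type, tags, surprise):
    """Save one episode and return its row id. Tags are stored as JSON text."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO episodes (content, type, tags, surprise, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (content, episode_type, json.dumps(tags), surprise, time.time()),
        )
    return cur.lastrowid


def memory_forget(conn, episode_id):
    """Mark an episode for removal instead of deleting the row outright."""
    with conn:
        cur = conn.execute(
            "UPDATE episodes SET forgotten = 1 WHERE id = ? AND forgotten = 0",
            (episode_id,),
        )
    return cur.rowcount == 1


def memory_status(conn):
    """Return simple counts as a stand-in for a health check."""
    total, forgotten = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(forgotten), 0) FROM episodes"
    ).fetchone()
    return {"live": total - forgotten, "forgotten": forgotten}


conn = open_store()
first_id = memory_store(conn, "Prefers tabs over spaces", "preference", ["coding"], 0.4)
memory_store(conn, "Uses a VR headset for sim racing", "fact", ["vr"], 0.7)
memory_forget(conn, first_id)
status = memory_status(conn)
```

Marking rows rather than deleting them mirrors the "mark an episode for removal" wording and keeps the operation cheap and reversible until a later cleanup.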
Why MCP Was Chosen
Models get replaced frequently, but accumulated knowledge shouldn't disappear with them. MCP makes the memory portable — one store, many interfaces. The memory layer becomes more valuable than any individual model.
Practical Results
After about a week of use, the system built knowledge documents about PC hardware, VR setup, coding preferences, and project architectures — all synthesized from normal conversation. When starting new chats, the AI already knows the user's context without re-explaining.
Requirements
- Python 3.11+
- LM Studio with Qwen 2.5-7B and nomic-embed-text-v1.5 loaded
- Any MCP client
📖 Read the full source: r/LocalLLaMA