Local MCP Memory System with Consolidation for AI Conversations

✍️ OpenClawRadar | 📅 Published: February 26, 2026 | 🔗 Source

What This Is

A developer has built a local memory system for AI conversations that consolidates and synthesizes information rather than just storing it. Built as an MCP (Model Context Protocol) server, it works with compatible clients such as Claude Desktop and Claude Code and runs entirely locally, so no data leaves the user's hardware.

How It Works

The key differentiator from standard RAG systems is the consolidation process. Every 6 hours, a local LLM (Qwen 2.5-7B running in LM Studio) clusters recent memories by topic and consolidates them into structured knowledge documents. It extracts facts, solutions, and preferences, merging them with existing knowledge and versioning everything.
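The clustering step can be pictured with a small sketch. The source does not say which clustering algorithm the project uses, so the greedy approach, the 0.8 threshold, the episode field names, and the prompt wording below are all assumptions made for illustration. The toy two-dimensional vectors stand in for real nomic-embed-text-v1.5 embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_topic(episodes, threshold=0.8):
    """Greedy single-pass clustering (an assumed strategy, not the project's).

    Each episode joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []
    for episode in episodes:
        for cluster in clusters:
            if cosine(episode["embedding"], cluster[0]["embedding"]) >= threshold:
                cluster.append(episode)
                break
        else:
            clusters.append([episode])
    return clusters


def build_consolidation_prompt(cluster):
    """Build the text a local LLM could be asked to consolidate (hypothetical wording)."""
    notes = "\n".join(f"- {episode['text']}" for episode in cluster)
    return "Extract the facts, solutions, and preferences from these notes:\n" + notes


# Made-up episodes with toy embeddings, for illustration only.
episodes = [
    {"text": "Note about PC hardware", "embedding": [1.0, 0.0]},
    {"text": "Another note about PC hardware", "embedding": [0.9, 0.1]},
    {"text": "Note about coding preferences", "embedding": [0.0, 1.0]},
]
clusters = cluster_by_topic(episodes)
prompts = [build_consolidation_prompt(cluster) for cluster in clusters]
```

In a real run, each prompt would be sent to the consolidation model and the response merged into a versioned knowledge document; that part is omitted here because the source gives no detail on it.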

Technical Stack

Embeddings: nomic-embed-text-v1.5 via LM Studio
Vector search: FAISS for semantic search, combined with keyword matching (hybrid)
Consolidation LLM: Qwen 2.5-7B (Q4) via LM Studio
Storage: SQLite for episodes, FAISS for vectors
Protocol: MCP — works with anything that supports it
Config: TOML

Features

Semantic dedup with cosine similarity 0.95 threshold
Adaptive surprise scoring — frequently accessed memories get boosted, stale ones decay
Atomic writes with tempfile + os.replace for crash protection
Tombstone-based FAISS deletion — O(1) instead of rebuilding the whole index
Graceful degradation — if LM Studio goes down, storage still works, consolidation pauses
88 tests passing
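Three of the features above are simple enough to sketch with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: the function and class names are invented, plain Python lists stand in for embedding vectors, and the tombstone set is a generic version of the idea (mark an ID as deleted and filter it out of search results) rather than anything FAISS-specific.

```python
import json
import math
import os
import tempfile

DEDUP_THRESHOLD = 0.95  # cosine similarity threshold from the feature list


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_duplicate(candidate, stored_vectors, threshold=DEDUP_THRESHOLD):
    """True if the candidate is at least `threshold`-similar to any stored vector."""
    return any(cosine(candidate, vector) >= threshold for vector in stored_vectors)


def atomic_write_json(path, payload):
    """Write JSON to a temp file in the same directory, then swap it into place.

    os.replace is atomic when source and target are on the same filesystem,
    so a crash leaves either the old file or the new one, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


class TombstoneSet:
    """Marks vector IDs as deleted in O(1) and hides them from search results."""

    def __init__(self):
        self._dead = set()

    def forget(self, vector_id):
        self._dead.add(vector_id)

    def filter(self, hit_ids):
        return [vector_id for vector_id in hit_ids if vector_id not in self._dead]
```

The trade-off with tombstones is that deleted vectors still occupy space in the index until it is eventually rebuilt or compacted.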

MCP Tools

memory_store — save an episode with type, tags, surprise score
memory_recall — semantic search across episodes + consolidated knowledge
memory_forget — mark an episode for removal
memory_correct — update a knowledge doc
memory_export — full JSON backup
memory_status — health check
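For orientation, MCP clients invoke tools like these by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds two such requests. The envelope follows the MCP specification, but the argument names (content, type, tags, surprise, query) and the example values are guesses based on the descriptions above, not the server's actual schema.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names and values below are hypothetical.
store_request = make_tool_call(
    1,
    "memory_store",
    {
        "content": "Example episode text",
        "type": "preference",
        "tags": ["example"],
        "surprise": 0.7,
    },
)
recall_request = make_tool_call(2, "memory_recall", {"query": "example query"})

print(json.dumps(store_request, indent=2))
```

In practice an MCP client such as Claude Desktop constructs these messages itself; the user only sees the tool being called.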

Why MCP Was Chosen

Models get replaced frequently, but accumulated knowledge shouldn't disappear with them. MCP makes the memory portable — one store, many interfaces. The memory layer becomes more valuable than any individual model.

Practical Results

After about a week of use, the system had built knowledge documents about PC hardware, VR setup, coding preferences, and project architectures, all synthesized from normal conversation. When a new chat starts, the AI already has the user's context, so the user does not need to explain it again.

Requirements

Python 3.11+
LM Studio with Qwen 2.5-7B and nomic-embed-text-v1.5 loaded
Any MCP client

📖 Read the full source: r/LocalLLaMA
