Local LLM Memory System: LTP & Selective Oblivion Guide

Bio-Inspired Memory Architecture for Local LLMs

A developer has built a local MCP (Model Context Protocol) server that simulates human memory mechanics to keep context clean for local LLMs. Instead of a static RAG pipeline, the system implements three bio-inspired mechanisms in Python/TypeScript: reinforcement, selective oblivion, and consolidation.

Core Memory Mechanics

Reinforcement (Long-Term Potentiation): Each time a topic is queried, its access_count increases, strengthening frequently accessed memories.
Selective Oblivion: Unused connections decay over time, and the system automatically archives weak memory entries ("atoms") to prevent context pollution.
Consolidation: A weekly "sleep" cycle distills recent logs into core knowledge atoms using a lightweight SLM.
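
The reinforcement and decay rules above can be illustrated with a minimal sketch. Only access_count comes from the post; the MemoryAtom class, the exponential half-life formula, and the constants HALF_LIFE_DAYS and ARCHIVE_THRESHOLD are assumptions chosen for illustration, and the project may score and archive atoms differently.

```python
import time
from dataclasses import dataclass, field

# Assumed tuning constants for illustration; not taken from the project.
HALF_LIFE_DAYS = 14.0
ARCHIVE_THRESHOLD = 0.5
SECONDS_PER_DAY = 86_400.0


@dataclass
class MemoryAtom:
    """One stored memory. Fields other than access_count are hypothetical."""
    text: str
    access_count: int = 0
    last_access: float = field(default_factory=time.time)
    archived: bool = False


def reinforce(atom: MemoryAtom, now: float) -> None:
    """Reinforcement: each query hit bumps access_count and resets the idle clock."""
    atom.access_count += 1
    atom.last_access = now


def strength(atom: MemoryAtom, now: float) -> float:
    """Access count halved for every HALF_LIFE_DAYS of idleness (assumed formula)."""
    idle_days = max(0.0, now - atom.last_access) / SECONDS_PER_DAY
    return atom.access_count * 0.5 ** (idle_days / HALF_LIFE_DAYS)


def archive_weak(atoms: list[MemoryAtom], now: float) -> list[MemoryAtom]:
    """Selective oblivion: flag atoms below the threshold, return the active rest."""
    active = []
    for atom in atoms:
        if atom.archived:
            continue
        if strength(atom, now) < ARCHIVE_THRESHOLD:
            atom.archived = True  # archived, not deleted
        else:
            active.append(atom)
    return active
```

Under these assumptions, an atom queried twice and then left idle for one half-life scores the same as an atom queried once just now, so frequently used memories survive longer before they are archived.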

Technical Implementation Details

Hybrid Search: Uses sqlite-vec for semantic search and falls back to plain text search, so queries still return instead of timing out when embeddings fail.
Non-Blocking MCP: Wraps synchronous database and embedding operations in asyncio executors to keep LM Studio responsive.
Identity Layer: Uses a persistent "Soul" file (soul.md) to maintain state and persona across sessions.
Access-Based Reinforcement: The access_count mechanism lets the memory store adapt to interaction patterns, so the system does more than retrieve static facts.
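
The fallback and non-blocking ideas above might look roughly like the following sketch. It does not reproduce the sqlite-vec API: the semantic path is passed in as a hypothetical vector_search callable, and the atoms table with text and access_count columns is an assumed schema. Only the standard-library sqlite3 and asyncio calls are real.

```python
import asyncio
import sqlite3
from typing import Callable, Optional

# Hypothetical signature for the semantic path: (query, limit) -> matching texts.
# In the project this role is played by sqlite-vec; its API is not shown here.
VectorSearch = Callable[[str, int], list[str]]


def text_search(conn: sqlite3.Connection, query: str, limit: int) -> list[str]:
    """Plain substring match, used when the semantic path is unavailable."""
    rows = conn.execute(
        "SELECT text FROM atoms WHERE text LIKE ? "
        "ORDER BY access_count DESC LIMIT ?",
        (f"%{query}%", limit),
    ).fetchall()
    return [row[0] for row in rows]


def hybrid_search(
    conn: sqlite3.Connection,
    query: str,
    limit: int = 5,
    vector_search: Optional[VectorSearch] = None,
) -> list[str]:
    """Try semantic search first; fall back to text search on error or no hits."""
    if vector_search is not None:
        try:
            hits = vector_search(query, limit)
            if hits:
                return hits
        except Exception:
            pass  # embedding or index failure: fall through to text search
    return text_search(conn, query, limit)


async def hybrid_search_async(
    db_path: str,
    query: str,
    limit: int = 5,
    vector_search: Optional[VectorSearch] = None,
) -> list[str]:
    """Run the blocking search in the default executor so the event loop stays free."""

    def work() -> list[str]:
        # Open the connection inside the worker thread: sqlite3 connections
        # are bound to the thread that created them by default.
        conn = sqlite3.connect(db_path)
        try:
            return hybrid_search(conn, query, limit, vector_search)
        finally:
            conn.close()

    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, work)
```

An MCP tool handler could then await hybrid_search_async(...) without stalling other requests while the database or embedding call is running.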

Development Context and Validation

The project was developed to address context limits in standard RAG implementations for local AI. As a check on the architecture, the developer had an LLM (named in the summary as Gemini) analyze the codebase. That review highlighted three points: access-based reinforcement and decay, which it characterized as a step toward "true cognitive agents"; robust hybrid search with fallbacks; and a non-blocking architecture for responsiveness.

The goal is a system that remembers what matters and forgets noise, much as human memory is consolidated during sleep. The developer is exploring whether bio-inspired memory architectures can solve context limitations locally, without cloud dependencies or black boxes.

📖 Read the full source: r/LocalLLaMA