Approach to Self-Improving Memory in Local AI Agents

Memory Architecture for Persistent AI Agents
A developer on r/LocalLLaMA has shared their approach to building AI agents that don't repeat mistakes across sessions. The core problem is that every session starts from zero: the context window resets, and corrections made in one session are lost by the next.
Memory Implementation
The system uses markdown as the source of truth instead of a database. MEMORY.md is human-editable: delete a line in vim and the agent forgets it. SQLite and FAISS (HNSW, 768-dim) are derived caches that can be rebuilt from the markdown at any time. This lets users version-control their agent's memory with git.
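A minimal sketch of the "derived cache" idea, assuming MEMORY.md stores one memory per bullet line (the post does not describe the file layout). The `memories` table and the `rebuild_cache` function are hypothetical names, and the FAISS index is left out; it would be rebuilt from the same entries.

```python
import sqlite3


def rebuild_cache(markdown_text: str, conn: sqlite3.Connection) -> int:
    """Drop and rebuild the derived table from the markdown source.

    Assumption: each memory is a single "- " bullet line; headings and
    blank lines are ignored. Returns the number of entries cached.
    """
    conn.execute("DROP TABLE IF EXISTS memories")
    conn.execute(
        "CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    entries = [
        line.strip()[2:].strip()
        for line in markdown_text.splitlines()
        if line.strip().startswith("- ")
    ]
    conn.executemany(
        "INSERT INTO memories (text) VALUES (?)", [(entry,) for entry in entries]
    )
    conn.commit()
    return len(entries)
```

Because the table is dropped on every rebuild, deleting a bullet from the file and rebuilding is enough to make the entry disappear from the cache.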
Episode Scoring and Rule Learning
Each execution gets scored +1/-1 and saved as an episode. On similar future tasks, relevant episodes get pulled into context. When the same error signature (SHA256 of tool name + normalized error) shows up twice within 7 days, a rule learner generates a one-line prevention rule.
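The signature check could look roughly like the sketch below. The post only says the hash covers the tool name plus a normalized error; the normalization steps here (lowercasing, masking numbers and hex addresses) and all function names are assumptions for illustration.

```python
import hashlib
import re
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # from the post: same signature twice within 7 days


def normalize_error(message: str) -> str:
    # Assumed normalization: mask volatile details so repeats hash identically.
    text = message.lower()
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)
    text = re.sub(r"\d+", "<n>", text)
    return re.sub(r"\s+", " ", text).strip()


def error_signature(tool: str, message: str) -> str:
    payload = f"{tool}\n{normalize_error(message)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def should_learn_rule(seen: dict, signature: str, now: datetime) -> bool:
    """Record the sighting; return True if the previous one was within WINDOW."""
    previous = seen.get(signature)
    seen[signature] = now
    return previous is not None and now - previous <= WINDOW
```

Masking line numbers and addresses means "SyntaxError at line 12" and "SyntaxError at line 99" from the same tool count as one error.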
Rules start at 0.40 confidence and need to reach 0.60 before they are injected into future prompts. Each success raises confidence by 0.03; each failure lowers it by 0.05. Rules that don't help eventually decay away.
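To make the numbers concrete, here is a small sketch of the update rule. The constants come from the post; the `Rule` class, the clamping to [0, 1], and the rounding to two decimals are assumptions added to keep the example tidy.

```python
from dataclasses import dataclass

START, THRESHOLD = 0.40, 0.60
SUCCESS_STEP, FAILURE_STEP = 0.03, 0.05


@dataclass
class Rule:
    text: str
    confidence: float = START

    def record(self, success: bool) -> None:
        delta = SUCCESS_STEP if success else -FAILURE_STEP
        # Assumption: clamp to [0, 1] and round to avoid float drift.
        self.confidence = round(min(1.0, max(0.0, self.confidence + delta)), 2)

    @property
    def active(self) -> bool:
        # Only rules at or above the threshold get injected into prompts.
        return self.confidence >= THRESHOLD
```

Starting from 0.40, a new rule needs seven consecutive successes to cross 0.60 (six only reach 0.58). With these step sizes a rule gains confidence on net only if it succeeds more than 62.5% of the time, since 0.03p = 0.05(1 - p) at p = 0.625.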
Trust Escalation System
Instead of configuring permission levels upfront, the agent tracks approval patterns. Five approvals at an approval rate of 90% or higher trigger an automatic promotion, and a single revert demotes the agent back. There's also a shadow mode for auditing.
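One way to read the promotion rule is sketched below. The post does not say how the rate is measured or what a demotion resets, so this version assumes the rate is approvals over all decisions since the last level change, and that both promotion and revert reset the counters. `TrustTracker` is a hypothetical name.

```python
from dataclasses import dataclass

MIN_APPROVALS = 5   # from the post
MIN_RATE = 0.90     # from the post


@dataclass
class TrustTracker:
    level: int = 0
    approvals: int = 0
    decisions: int = 0

    def record_decision(self, approved: bool) -> None:
        self.decisions += 1
        if approved:
            self.approvals += 1
        rate = self.approvals / self.decisions
        if self.approvals >= MIN_APPROVALS and rate >= MIN_RATE:
            self.level += 1
            self.approvals = self.decisions = 0  # assumption: start a fresh window

    def record_revert(self) -> None:
        # Assumption: one revert drops a single level and clears the window.
        self.level = max(0, self.level - 1)
        self.approvals = self.decisions = 0
```

Under this reading, one early rejection means the agent needs nine approvals out of ten decisions before it is promoted.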
Task Decomposition and Safety
Complex goals become a DAG (directed acyclic graph). Circular dependencies are caught via topological sort, and a failure cascades to dependent tasks via DFS (depth-first search). A completion gate checks 18 requirements (R01-R18): did the agent actually read files, write changes, verify results, and stay in the workspace?
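The two graph operations can be sketched as follows, using Kahn's algorithm for the topological sort (the post does not say which variant is used). The sketch assumes `deps` maps every task to the set of tasks it depends on, and that every dependency also appears as a key; the function names are illustrative.

```python
from collections import deque


def _children(deps):
    # Invert "task -> its dependencies" into "task -> tasks that depend on it".
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    return children


def topological_order(deps):
    """Return a valid execution order; raise ValueError on a cycle."""
    children = _children(deps)
    indegree = {task: len(parents) for task, parents in deps.items()}
    ready = deque(sorted(task for task, n in indegree.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(children[task]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        # Tasks left with unmet dependencies can only be part of a cycle.
        raise ValueError("circular dependency detected")
    return order


def cascade_failure(deps, failed):
    """Iterative DFS: every task that transitively depends on `failed`."""
    children = _children(deps)
    blocked, stack = set(), [failed]
    while stack:
        task = stack.pop()
        for child in children[task]:
            if child not in blocked:
                blocked.add(child)
                stack.append(child)
    return blocked
```

For example, with `{"read": set(), "edit": {"read"}, "test": {"edit"}, "docs": {"read"}}`, a failure in `edit` blocks only `test`, while a failure in `read` blocks everything else.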
Safety features include 43 bash risk patterns, dual-pass analysis (raw + decoded), a fail-closed design (a Guardian crash means deny), and a minimum writable path depth of 3 to block commands such as rm -rf /.
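A rough sketch of how these checks might fit together. Everything specific here is an assumption: the three patterns stand in for the 43 that the post does not list, "decoded" is interpreted as base64 decoding, and depth is counted as the number of path components below the root.

```python
import base64
import posixpath
import re

# Illustrative stand-ins only; the post's 43 patterns are not published here.
RISK_PATTERNS = [
    re.compile(r"\brm\s+-(?:rf|fr)\b"),
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bdd\s+if="),
]
MIN_WRITABLE_DEPTH = 3


def decoded_views(command):
    """Pass 1 is the raw command; pass 2 adds base64 tokens that decode to text."""
    views = [command]
    for token in re.findall(r"[A-Za-z0-9+/]{8,}={0,2}", command):
        try:
            views.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            continue
    return views


def is_risky(command):
    return any(p.search(v) for v in decoded_views(command) for p in RISK_PATTERNS)


def guardian_allows(command, analyze=is_risky):
    try:
        return not analyze(command)
    except Exception:
        return False  # fail closed: an analyzer crash means deny


def writable(path):
    if not path.startswith("/"):
        return False  # sketch handles absolute paths only; deny the rest
    normalized = posixpath.normpath(path)  # collapses ".." before counting
    depth = len([part for part in normalized.split("/") if part])
    return depth >= MIN_WRITABLE_DEPTH
```

Normalizing before counting matters: `/home/user/project/../../..` collapses to `/` and is rejected, even though the raw string looks deep enough.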
The developer is seeking feedback on whether the confidence decay on rules feels right and whether the +0.03/-0.05 asymmetry is optimal. They're also wondering if there are better alternatives to HNSW for this scale (typically <10k episodes).
📖 Read the full source: r/LocalLLaMA