Vektori's Memory Architecture: Principles from Claude's Leaked System

Memory Architecture Principles
The Claude Code team shared how their memory system works, and a few principles stand out. Memory is an index, not storage: MEMORY.md contains only pointers (150 characters per line), and the real knowledge lives in separate files fetched on demand. Raw transcripts are never loaded in full; they are only grepped when needed. There are three layers, each with a different access cost. The sharpest principle: if something is derivable, do not store it. Retrieval is also skeptical: memory is a hint, not truth, and the model verifies it before use.
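To make the "index, not storage" idea concrete, here is a minimal sketch of a pointer index with on-demand loading. The `topic: relative/path` line format and the function names `load_index` and `fetch` are assumptions made for illustration; the source does not describe the actual layout of MEMORY.md beyond the per-line budget.

```python
from pathlib import Path
from typing import Optional

MAX_POINTER_CHARS = 150  # per-line budget quoted in the text


def load_index(index_path: Path) -> dict:
    """Parse hypothetical 'topic: relative/file.md' pointer lines.

    Blank, overlong, or malformed lines are skipped rather than stored.
    """
    index = {}
    for line in index_path.read_text().splitlines():
        line = line.strip()
        if not line or len(line) > MAX_POINTER_CHARS or ":" not in line:
            continue
        topic, target = line.split(":", 1)
        index[topic.strip()] = target.strip()
    return index


def fetch(root: Path, index: dict, topic: str) -> Optional[str]:
    """Read the pointed-to file only when a topic is actually requested."""
    target = index.get(topic)
    if target is None:
        return None
    path = root / target
    return path.read_text() if path.exists() else None
```

Only the small index is read up front; the larger files stay on disk until a topic is asked for, which is the access pattern the paragraph above describes.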
Vektori's Implementation
Vektori applies the same principles with a different shape. While Claude uses a file hierarchy, Vektori implements a hierarchical sentence graph with three layers:
- FACT LAYER (L0) — Crisp statements serving as the search surface. Cheap and always queryable.
- EPISODE LAYER (L1) — Episodes across conversations, auto-discovered.
- SENTENCE LAYER (L2) — Raw conversation, only fetched when explicitly needed.
The same access model applies: L0 is the index, and L2 is the transcript (grepped, not dumped). You pay only for what you need.
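The three layers can be pictured as a small data model in which each deeper layer is read only on request. This is a sketch under assumptions: the class names, fields, and the `expand` method with its `depth` parameter are hypothetical and are not taken from Vektori's code.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:  # L2: raw conversation text
    id: str
    text: str


@dataclass
class Fact:  # L0: crisp statement, the search surface
    id: str
    text: str
    sentence_ids: list = field(default_factory=list)


@dataclass
class Episode:  # L1: groups facts across conversations
    id: str
    fact_ids: list = field(default_factory=list)


class MemoryGraph:
    def __init__(self):
        self.facts, self.episodes, self.sentences = {}, {}, {}

    def expand(self, fact_id: str, depth: int = 0) -> dict:
        """Return the fact, adding episodes at depth 1 and raw sentences at depth 2."""
        fact = self.facts[fact_id]
        out = {"fact": fact.text}
        if depth >= 1:
            out["episodes"] = [
                e.id for e in self.episodes.values() if fact_id in e.fact_ids
            ]
        if depth >= 2:
            out["sentences"] = [self.sentences[s].text for s in fact.sentence_ids]
        return out
```

A caller that only needs the fact stays at depth 0; the raw sentences are touched only when depth 2 is requested explicitly.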
Strict Write Discipline
Nothing enters L0 without passing quality filters: a minimum character count, a content density check, and a pronoun ratio check. If a sentence is too vague or purely filler, it never becomes a fact. This matches Claude's principle of not storing derivable things.
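A write filter of this kind might look like the sketch below. The thresholds, the word lists, and the definition of "content density" (the share of words that are not filler) are all assumptions chosen for illustration; the source names the three checks but not how they are computed.

```python
import re

# Illustrative word lists, not Vektori's actual ones.
PRONOUNS = {"he", "she", "it", "they", "this", "that", "these", "those",
            "him", "her", "them"}
FILLER = {"the", "a", "an", "um", "uh", "well", "so", "like", "okay", "ok",
          "yeah", "just", "really"}


def passes_write_filter(sentence: str, min_chars: int = 20,
                        min_density: float = 0.5,
                        max_pronoun_ratio: float = 0.3) -> bool:
    """Return True only if the sentence is long, dense, and specific enough."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if len(sentence.strip()) < min_chars or not words:
        return False
    density = sum(w not in FILLER for w in words) / len(words)
    pronoun_ratio = sum(w in PRONOUNS for w in words) / len(words)
    return density >= min_density and pronoun_ratio <= max_pronoun_ratio
```

With these settings, a sentence such as "The user deploys the billing service on Kubernetes." passes, while "Well, it was like that, so yeah." is rejected for low density.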
Retrieval Mechanics
Retrieval works as Claude describes: scored, thresholded, skeptical. A result needs a minimum score of 0.3 before it surfaces. Results are ranked by vector similarity combined with temporal decay, not retrieved blindly.
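The scoring step can be sketched as follows. The 0.3 threshold comes from the text; everything else is an assumption. In particular, the source does not say how similarity and decay are combined, so this sketch multiplies cosine similarity by an exponential half-life decay, and the 30-day half-life is a made-up default.

```python
import math

MIN_SCORE = 0.3  # threshold quoted in the text


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query_vec, fact_vec, age_days, half_life_days=30.0):
    # Assumption: similarity is scaled by a half-life decay on the fact's age.
    return cosine(query_vec, fact_vec) * 0.5 ** (age_days / half_life_days)


def retrieve(query_vec, facts, min_score=MIN_SCORE):
    """facts: iterable of (fact_id, vector, age_days).

    Returns [(fact_id, score)] for facts at or above the threshold, best first.
    """
    scored = [(fid, score(query_vec, vec, age)) for fid, vec, age in facts]
    kept = [(fid, s) for fid, s in scored if s >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions a perfectly matching fact that is 90 days old scores 0.125 and is dropped, so stale matches fall out of the results rather than being returned with a low rank.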
Architectural Divergence on Corrections
Claude's approach optimizes for single-user project contexts where the latest state matters. Vektori, designed for agents working across hundreds of sessions, preserves correction history. When a user changes their mind, the old fact stays in the graph with its sentence links, allowing tracing back to what was said before the change and why it got superseded.
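One way to keep correction history is to link the old and new facts instead of overwriting. The sketch below is an illustration under assumptions: the `supersedes` and `superseded_by` fields and the `FactStore` class are hypothetical names, not Vektori's schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Fact:
    id: str
    text: str
    sentence_ids: list = field(default_factory=list)
    supersedes: Optional[str] = None     # id of the fact this one replaced
    superseded_by: Optional[str] = None  # id of the fact that replaced this one


class FactStore:
    def __init__(self):
        self.facts = {}

    def add(self, fact: Fact) -> None:
        self.facts[fact.id] = fact

    def correct(self, old_id: str, new_fact: Fact) -> None:
        """Record a correction without deleting the old fact or its sentence links."""
        new_fact.supersedes = old_id
        self.facts[old_id].superseded_by = new_fact.id
        self.add(new_fact)

    def active(self) -> list:
        """Facts that have not been superseded."""
        return [f for f in self.facts.values() if f.superseded_by is None]

    def history(self, fact_id: str) -> list:
        """Walk back from a fact to its oldest version; returns oldest first."""
        chain = []
        current = self.facts.get(fact_id)
        while current is not None:
            chain.append(current)
            current = self.facts.get(current.supersedes) if current.supersedes else None
        return list(reversed(chain))
```

Normal retrieval would read only `active()`, while `history()` recovers what was said before a change, including the sentence ids attached to the superseded fact.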
Performance and Future
On LongMemEval-S, Vektori achieved 73% accuracy at L1 depth using BGE-M3 + Gemini 2.5 Flash-Lite. Multi-hop conflict resolution, where you reason about how a fact changed over time, is where triple-based systems (subject-predicate-object) collapse. The next layer involves storing why: causal edges between events ("user corrected X, agent updated Y, user disputed again"), extracted asynchronously and queryable as a graph. Agent trajectories become memory: the agent's own behavior becomes part of what it can reason about.
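Since the causal layer is described as future work, the following is only a guess at its simplest possible shape: a directed graph of events that can be walked from a starting event. The `CausalGraph` class and its methods are hypothetical, and the asynchronous extraction step is not modeled here.

```python
from collections import defaultdict


class CausalGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # cause -> [effect, ...]

    def add_edge(self, cause: str, effect: str) -> None:
        self.edges[cause].append(effect)

    def trace(self, start: str) -> list:
        """Depth-first walk; returns events reachable from start in visit order."""
        seen, order, stack = set(), [], [start]
        while stack:
            event = stack.pop()
            if event in seen:
                continue
            seen.add(event)
            order.append(event)
            # Reverse so effects are visited in the order they were added.
            stack.extend(reversed(self.edges.get(event, [])))
        return order
```

Using the example from the text, adding the edges "user corrected X" to "agent updated Y" and "agent updated Y" to "user disputed again" lets `trace("user corrected X")` return the three events in order.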
📖 Read the full source: r/ClaudeAI