Opus 4.6 vs GPT 5.4: Peer-Reviewing a Memory Stack

A developer documented how they designed a memory stack for OpenClaw by having two AI models review each other's work. Claude Opus 4.6, accessed through the API, served as the primary model and designed the architecture; the complete design then went to GPT 5.4 for quality assurance.

The AI peer-review process

The developer researched multiple memory plugins including Mem0, Supermemory, Cognee, Hindsight, QMD, Lossless Claw, LanceDB, and MemOS before concluding that no single plugin solves every memory problem. Opus 4.6 was used to design a full implementation prompt for OpenClaw, which GPT 5.4 then reviewed.

GPT 5.4 identified several issues during peer review: feedback loop risks, a cron job with excessive authority, FTS5 verification gaps, version pinning concerns, and token overhead problems. After three rounds of feedback between the models, they converged on a final design both approved.

The developer noted that Opus was stronger on architecture and plugin-level details, while GPT excelled at identifying operational risks, edge cases, and failure scenarios.

The three-layer memory stack

Layer 1: Lossless Claw (LCM) – Replaces default compaction entirely. Instead of summarizing old messages and deleting them, it preserves every message in a SQLite database and builds a directed acyclic graph (DAG) of progressively compressed summaries. The model sees the summaries plus the most recent messages, and can drill back into full detail using tools such as lcm_grep and lcm_expand. Summarization runs on Haiku to control costs.
Layer 2: SQLite Hybrid Search – Not a plugin, just a configuration change. It enables BM25 keyword matching alongside the default vector search, so exact terms (project names, error codes, IDs) are found in addition to semantically similar content. It also enables MMR (maximal marginal relevance) for more diverse results and temporal decay so that recent notes rank higher. The feature is built into OpenClaw but disabled by default.
Layer 3: Mem0 Cloud – Provides cross-session persistent memory. Auto-recall injects relevant facts before every response, while auto-capture extracts facts after every response. Configured with topK=3 and a higher search threshold (0.45) to reduce token overhead.
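
As a rough illustration of how the Layer 2 ranking ideas fit together, the sketch below blends a vector score with a BM25 score, applies temporal decay, and then selects results with MMR. It is not OpenClaw's implementation: the Hit fields, the 0.7 weights, and the 30-day half-life are assumed values chosen for the example, and both input scores are assumed to be already normalized to [0, 1].

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hit:
    """One search candidate. All field names are hypothetical."""
    doc_id: str
    vector_score: float          # semantic similarity, assumed in [0, 1]
    bm25_score: float            # keyword score, assumed normalized to [0, 1]
    age_days: float              # age of the note in days
    embedding: Sequence[float]   # used only to measure redundancy in MMR


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(hit: Hit, vector_weight: float = 0.7,
                 half_life_days: float = 30.0) -> float:
    """Weighted blend of vector and BM25 scores, halved every half_life_days."""
    blended = vector_weight * hit.vector_score + (1 - vector_weight) * hit.bm25_score
    decay = 0.5 ** (hit.age_days / half_life_days)
    return blended * decay


def mmr_select(hits: List[Hit], k: int = 3, lambda_: float = 0.7) -> List[Hit]:
    """Greedy MMR: trade relevance against similarity to already-picked hits."""
    remaining = list(hits)
    selected: List[Hit] = []
    while remaining and len(selected) < k:
        def mmr(h: Hit) -> float:
            redundancy = max(
                (cosine(h.embedding, s.embedding) for s in selected),
                default=0.0,
            )
            return lambda_ * hybrid_score(h) - (1 - lambda_) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Under these assumptions, a near-duplicate of an already selected note loses out to a different note that matches exact keywords, and a very old note barely registers even when its raw scores are strong.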

Supporting configuration

7-day session idle timeout to prevent unnecessary session resets
Anthropic cache-ttl context pruning aligned with prompt cache retention
Pre-compaction memory flush allowing the agent to write durable notes before compaction events
Nightly consolidation cron at 3 AM that reads the past 7 days of daily logs and writes a consolidated summary to a dated file. The job is summarize-only and idempotent: it cannot delete, trim, or modify existing files, and it cannot write to MEMORY.md.
Deterministic archive script at 4 AM (system cron, not OpenClaw) that moves daily logs older than 30 days to an archive directory outside the indexed memory path
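
The 4 AM archive step is simple enough to sketch. The version below is a guess at what such a deterministic script could look like, not the developer's actual script: it assumes daily logs are named YYYY-MM-DD.md, and the directory arguments are hypothetical. Any file whose name is not a date (MEMORY.md, for example) is left alone, and existing archive files are never overwritten, so rerunning it is safe.

```python
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List


def archive_old_logs(memory_dir: Path, archive_dir: Path, today: date,
                     max_age_days: int = 30) -> List[Path]:
    """Move daily logs older than max_age_days out of the indexed memory path.

    Assumes (hypothetically) that daily logs are named YYYY-MM-DD.md.
    Returns the new locations of the files that were moved.
    """
    cutoff = today - timedelta(days=max_age_days)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    for path in sorted(memory_dir.glob("*.md")):
        try:
            log_date = datetime.strptime(path.stem, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a daily log (e.g. MEMORY.md); leave it untouched
        if log_date >= cutoff:
            continue  # still within the retention window
        target = archive_dir / path.name
        if target.exists():
            continue  # never overwrite an already archived file
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Passing today as an argument rather than reading the clock inside the function keeps the behavior reproducible, which makes the script easy to test before it is wired into system cron.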

Excluded plugins and reasoning

QMD – Excluded due to open bugs including gateway restart loops, memory_search not calling QMD, and permanent fallback after timeout. SQLite hybrid search provides similar benefits without the instability.
Cognee – Knowledge graph functionality considered overkill for a single-user personal setup. Deferred for potential later implementation if needed.
Supermemory – Excluded because most of its performance claims come from the vendor, while Mem0 is more battle-tested.

Key risks identified

During peer review, the models identified feedback loop risks between Mem0 and both LCM and the cron jobs. The source text cuts off before detailing the remaining risks.

📖 Read the full source: r/openclaw