agentmemory V4 achieves 96.2% on LongMemEval benchmark, outperforms commercial AI memory systems

agentmemory V4 is an open-source memory system for AI agents that scored 96.2% on LongMemEval, a standard benchmark for long-term memory in AI agents. Its author describes the result as a world record.
Benchmark Performance
The reported score exceeds the published scores of several funded AI memory companies:
- PwC Chronos: 95.6%
- Mastra: 94.87%
- OMEGA: 93.2% (raw)
- Emergence AI: 86%
- Supermemory: 85.86%
- Zep: 71.2%
Development Details
It was built solo in 16 days on a mid-range gaming PC (i3-12100F) at a total cost of $1,000. The benchmark setup uses Claude Opus as the answer generator and GPT-4o as the judge, but the core innovation is the retrieval architecture.
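The post does not describe the evaluation harness itself. As a rough illustration under our own assumptions, a generator/judge setup of this kind usually produces an accuracy figure like this: the memory system retrieves context, a generator model answers, and a judge model marks each answer correct or incorrect against a reference. The `Question` type, the `evaluate` function, and the toy callables are hypothetical; in a real run, `generate` and `judge` would wrap calls to Claude Opus and GPT-4o respectively.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:  # hypothetical benchmark item
    question: str
    reference_answer: str


def evaluate(
    questions: list[Question],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    judge: Callable[[str, str, str], bool],
) -> float:
    """Fraction of questions whose generated answer the judge marks correct."""
    if not questions:
        return 0.0
    correct = 0
    for q in questions:
        context = retrieve(q.question)  # memory system under test
        answer = generate(q.question, context)  # generator model (Claude Opus in the post)
        if judge(q.question, q.reference_answer, answer):  # judge model (GPT-4o in the post)
            correct += 1
    return correct / len(questions)


# Toy stand-ins: a dict "memory", an echoing generator, a substring judge.
store = {"Where did I park?": ["You parked on level 3."]}
qs = [Question("Where did I park?", "level 3"), Question("What is my cat's name?", "Miso")]
acc = evaluate(
    qs,
    retrieve=lambda q: store.get(q, []),
    generate=lambda q, ctx: ctx[0] if ctx else "I don't know",
    judge=lambda q, ref, ans: ref.lower() in ans.lower(),
)
print(acc)  # 0.5
```

Because the judge is itself a model, the final percentage depends on both the retrieval quality and the judge's grading behavior, which is why the post separates the two roles.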
Technical Architecture
agentmemory combines several retrieval techniques in a single SQLite-backed store:
- HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search
- BM25 for traditional text retrieval
- Cross-encoder for relevance scoring
- Knowledge graph integration
- Temporal grounding for time-aware memory retrieval
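The post does not say how agentmemory merges these signals. As an illustration only, the following sketch fuses a BM25 ranking with a dense-vector ranking using reciprocal rank fusion (RRF), after a simple timestamp filter. Several pieces are assumptions rather than the project's code: exact brute-force cosine similarity stands in for HNSW (which approximates this same search), the `since` filter is a hypothetical simplification of temporal grounding, and the cross-encoder and knowledge-graph stages are omitted. `Memory`, `hybrid_search`, and the two-dimensional toy embeddings are made up for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Memory:  # hypothetical memory record
    text: str
    embedding: list[float]
    timestamp: datetime


def tokenize(text: str) -> list[str]:
    # Lowercase and split on any non-alphanumeric character.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    # Okapi BM25 score of every document for the query.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def ranking(scores: list[float]) -> list[int]:
    # Candidate indices ordered from best to worst score (stable on ties).
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query: str, query_emb: list[float], memories: list[Memory],
                  since: Optional[datetime] = None, k: int = 60, top_n: int = 3) -> list[Memory]:
    # Hypothetical temporal filter: keep memories recorded at or after `since`.
    candidates = [m for m in memories if since is None or m.timestamp >= since]
    if not candidates:
        return []
    lexical = ranking(bm25_scores(query, [m.text for m in candidates]))
    # Exact cosine search; an HNSW index would approximate this ranking at scale.
    dense = ranking([cosine(query_emb, m.embedding) for m in candidates])
    # Reciprocal rank fusion: each list contributes 1 / (k + rank).
    fused: Counter = Counter()
    for order in (lexical, dense):
        for pos, idx in enumerate(order):
            fused[idx] += 1.0 / (k + pos + 1)
    best = sorted(fused, key=lambda i: fused[i], reverse=True)[:top_n]
    return [candidates[i] for i in best]


memories = [
    Memory("User adopted a cat named Miso", [0.9, 0.1], datetime(2024, 3, 1)),
    Memory("User moved to Berlin", [0.1, 0.9], datetime(2024, 5, 1)),
    Memory("User's cat Miso likes tuna", [0.8, 0.2], datetime(2024, 6, 1)),
]
hits = hybrid_search("cat tuna", [0.8, 0.2], memories, top_n=2)
print([m.text for m in hits])
```

RRF is a common choice for this kind of fusion because it uses only rank positions, so BM25 scores and cosine similarities never need to be calibrated against each other. In a fuller pipeline, a cross-encoder would typically rescore the fused shortlist before it reaches the generator.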
Availability
The system is open source under the MIT license and available at: github.com/JordanMcCann/agentmemory
📖 Read the full source: r/LocalLLaMA