MemAware Benchmark Tests AI Memory Beyond Keyword Search

MemAware is an open-source benchmark designed to test whether AI assistants with memory can surface relevant context from past conversations when current queries don't explicitly hint at that information.
How the Benchmark Works
The benchmark contains 900 questions across three difficulty levels. It tests scenarios where relevant context exists in memory but the current question doesn't contain keywords that would trigger a search match. For example: you told your AI assistant about your 45-minute commute months ago, then later ask "What time should I set my alarm for my 8:30 AM meeting?" The assistant should factor in your commute, but searching "alarm 8:30 meeting" won't find conversations about commuting.
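The commute example above can be sketched with a toy keyword matcher. This is a minimal illustration, not the benchmark's own code: the token-overlap scorer is a simplified stand-in for a keyword search such as BM25, and the `memories` list, the stopword set, and the function names are all hypothetical.

```python
import re

# Small hypothetical stopword list so that words like "my" do not count as matches.
STOPWORDS = {"a", "an", "the", "my", "i", "is", "to", "for", "what",
             "should", "about", "of", "and", "in", "on", "at", "me", "it"}

def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def keyword_search(query: str, memories: list[str]) -> list[tuple[int, str]]:
    """Return (shared_token_count, memory) pairs with at least one shared token, best first."""
    q = tokenize(query)
    scored = [(len(q & tokenize(m)), m) for m in memories]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)

# Hypothetical memory store; only the commute fact mirrors the article's example.
memories = [
    "My commute to the office takes 45 minutes each way.",
    "I prefer oat milk in my coffee.",
]

# The alarm question shares no content words with the commute memory.
hits = keyword_search("What time should I set my alarm for my 8:30 AM meeting?", memories)
print(hits)

# A question that names the topic directly does find it.
direct_hits = keyword_search("How long is my commute?", memories)
print(direct_hits)
```

The first query returns nothing even though the commute memory is exactly what the assistant needs, which is the gap the benchmark is built to measure.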
Key Findings
- Search barely helps: BM25 search scored 2.8%, versus 0.8% with no memory. That is a gain of 2 percentage points at 5x the token cost.
- Vector search fails on hard questions: It helps when keywords overlap (6%) but drops to 0.7% on cross-domain connections — the same as no memory. Example hard question: "How should I bid at the charity auction?" should recall a past $800 handbag purchase as a spending baseline, but embedding similarity can't connect these concepts.
- Searching when you shouldn't is expensive: The "always search" pattern reads ~4.7K tokens of results per question regardless of whether they help. Most of the time, the results are irrelevant noise.
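To put the "always search" cost in perspective, here is a rough back-of-the-envelope sketch. It uses only the 900-question count and the approximate 4.7K tokens per question quoted above. The `search_rate` parameter and its 0.25 value are hypothetical, included only to show how the total would scale if an assistant searched on a fraction of questions.

```python
NUM_QUESTIONS = 900        # benchmark size reported in the article
TOKENS_PER_SEARCH = 4_700  # "~4.7K tokens" of results per question, an approximation

def search_token_cost(num_questions: int, tokens_per_search: int,
                      search_rate: float = 1.0) -> int:
    """Tokens spent reading search results when a fraction of questions trigger a search."""
    if not 0.0 <= search_rate <= 1.0:
        raise ValueError("search_rate must be between 0 and 1")
    searched = round(num_questions * search_rate)
    return searched * tokens_per_search

# "Always search": every question pays the full result-reading cost.
always = search_token_cost(NUM_QUESTIONS, TOKENS_PER_SEARCH)

# Hypothetical gated policy that searches on a quarter of the questions.
gated = search_token_cost(NUM_QUESTIONS, TOKENS_PER_SEARCH, search_rate=0.25)

print(f"always search: {always:,} tokens")
print(f"search on 25% of questions: {gated:,} tokens")
```

Under these assumptions, a full run reads about 4.2 million tokens of search results, most of which the findings above describe as irrelevant noise.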
The Core Problem
Current AI memory implementations are essentially just search systems. True memory awareness — knowing what information is stored and proactively surfacing relevant context — is a different problem that search alone can't solve.
The benchmark is available for testing different approaches at: https://github.com/kevin-hs-sohn/memaware
📖 Read the full source: r/ClaudeAI