Measuring a Claude Code MCP Stack: Cache Friendliness vs. Byte Savings, and a 2-Line Fix for the Prompt Cache

When optimizing a Claude Code MCP stack, it's easy to focus on one metric: byte savings. But Greg Shevchenko's new analysis shows that a single-axis benchmark can recommend a system that's strictly worse in production. The missing axis is cache friendliness: whether the same input produces byte-identical output across runs, so that Anthropic's prompt cache can hit.
Shevchenko's biggest byte-saver, a retrieval MCP that cut context by 60–70%, was defeating the prompt cache (5-minute TTL) on every call. Two runs of the same query produced different bytes because the output order of rg --files-with-matches leaked through a Map's insertion order into the final context. The fix was two lines: sort the rg hits before slicing, and sort the Map entries by path. After the change, byte savings were unchanged, but cache_friendly_score went from ~0% to 100%.
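The ordering bug and its fix can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Shevchenko's code: the original evidently used a JavaScript-style Map, so a Python dict (which also preserves insertion order) stands in for it, and build_context, unique_digests, the file paths, and the snippet bodies are all hypothetical. A seeded shuffle plays the role of rg's run-to-run output order.

```python
import hashlib
import random


def build_context(hits, snippets, limit, stable):
    """Hypothetical retrieval step: keep `limit` files, then join their snippets."""
    # Fix 1: sort the hits before slicing so the same files survive the cut.
    chosen = (sorted(hits) if stable else list(hits))[:limit]
    # A dict keeps insertion order, much like a JavaScript Map.
    by_path = {path: snippets[path] for path in chosen}
    # Fix 2: emit entries sorted by path instead of in insertion order.
    entries = sorted(by_path.items()) if stable else list(by_path.items())
    return "\n".join(f"{path}\n{text}" for path, text in entries)


# Made-up fixture: six files with placeholder bodies.
snippets = {f"src/file_{i}.py": f"# body {i}" for i in range(6)}
rng = random.Random(0)


def unique_digests(stable, runs=20):
    """Count distinct MD5 digests of the context across repeated runs."""
    digests = set()
    for _ in range(runs):
        hits = list(snippets)
        rng.shuffle(hits)  # stand-in for a search tool's unstable output order
        context = build_context(hits, snippets, limit=3, stable=stable)
        digests.add(hashlib.md5(context.encode("utf-8")).hexdigest())
    return len(digests)


print("unstable:", unique_digests(stable=False))
print("stable:  ", unique_digests(stable=True))
```

In this sketch the second sort is redundant once the first is in place, because the dict is then filled in sorted order. In a real pipeline the Map might be populated from several sources or asynchronously, which is presumably why both sorts were needed.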
What the Harness Measures
Shevchenko released an open-source benchmark harness (stdlib-only Python, offline) that measures:
- Mean ratio + coefficient of variation (CV) across N≥5 runs per fixture → byte-saving axis
- Unique MD5 count == 1 check → cache-friendliness axis (0–100%)
- 12-anti-pattern audit on tool definitions (DSA reference)
Any compressor with the signature (str) -> str can be plugged in. The harness uses cluster-bootstrap CIs, Wilson CIs, preregistration, and real-data Cohen's κ.
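A minimal sketch of the two axes, assuming "ratio" means output bytes divided by input bytes and that cache_friendly_score is the percentage of fixtures whose repeated runs all hash to a single MD5. Neither definition is confirmed by the source, and measure_fixture, this cache_friendly_score function, halve, and the fixtures are hypothetical names and data, not the harness's actual API.

```python
import hashlib
import statistics
from typing import Callable, Dict, List


def measure_fixture(compress: Callable[[str], str], text: str, runs: int = 5) -> Dict[str, float]:
    """Run one fixture `runs` times; assumes non-empty `text` and runs >= 2."""
    outputs = [compress(text) for _ in range(runs)]
    original = len(text.encode("utf-8"))
    # Assumed definition: ratio = compressed bytes / original bytes.
    ratios = [len(out.encode("utf-8")) / original for out in outputs]
    mean_ratio = statistics.mean(ratios)
    # Coefficient of variation: sample standard deviation over the mean.
    cv = statistics.stdev(ratios) / mean_ratio if mean_ratio else 0.0
    unique_md5 = len({hashlib.md5(out.encode("utf-8")).hexdigest() for out in outputs})
    return {"mean_ratio": mean_ratio, "cv": cv, "unique_md5": unique_md5}


def cache_friendly_score(compress: Callable[[str], str], fixtures: List[str], runs: int = 5) -> float:
    """Percentage of fixtures whose outputs are byte-identical across all runs."""
    stable = sum(measure_fixture(compress, text, runs)["unique_md5"] == 1 for text in fixtures)
    return 100.0 * stable / len(fixtures)


def halve(text: str) -> str:
    """Toy deterministic compressor: keep the first half of the input."""
    return text[: len(text) // 2]


fixtures = ["abcdefgh" * 50, "0123456789" * 40]
print(measure_fixture(halve, fixtures[0]))
print(cache_friendly_score(halve, fixtures))  # 100.0 for this deterministic toy
```

Two compressors with the same mean ratio can sit at opposite ends of the second axis, which is why a byte-savings benchmark alone cannot tell them apart.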
Public Alternatives Surveyed
Shevchenko surveyed public docs (as of May 2026) for Cursor codebase index, Sourcegraph Cody, Aider repo-map, Microsoft LLMLingua/LLMLingua-2, Firecrawl/Jina Reader, and RouteLLM/Martian. None disclosed cache-friendliness metrics.
Limitations
He hypothesized that the prep layer triggers more downstream cache hits on subsequent turns, but the effect did not reach significance (Welch p=0.32, Cohen's d≈0.18, N=137). Two-judge Cohen's κ on the corpus was 0.5955 (moderate, below the 0.7 threshold), with 4 of 5 disagreements on one ambiguous task; fixing the spec would push κ to ~0.83.
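For readers unfamiliar with the agreement statistic, here is a sketch of the standard two-rater Cohen's κ: observed agreement corrected for the agreement expected by chance. The cohen_kappa helper and the labels are illustrative only. They are not the harness's code or Shevchenko's data, and they do not reproduce the 0.5955 figure.

```python
from collections import Counter


def cohen_kappa(judge_a, judge_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must label the same non-empty set of items")
    n = len(judge_a)
    # Observed agreement: share of items where the judges gave the same label.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement: product of each judge's marginal label frequencies.
    count_a, count_b = Counter(judge_a), Counter(judge_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels: the judges disagree on one item out of six.
judge_a = ["pass", "pass", "fail", "fail", "pass", "fail"]
judge_b = ["pass", "pass", "fail", "pass", "pass", "fail"]
print(round(cohen_kappa(judge_a, judge_b), 4))  # 0.6667
```

Because κ discounts chance agreement, a few disagreements concentrated on one ambiguous task can pull the score down noticeably, which fits the observation that tightening that task's spec would raise it.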
The harness is MIT-licensed. If you're running a Claude Code MCP stack, measuring cache_friendly_score is now a concrete, actionable step.
📖 Read the full source: r/ClaudeAI