Fixing Claude Code's KV Cache Invalidation with Local Backends

Claude Code versions 2.1.36 and above inject dynamic content into system prompts on every request, causing KV cache invalidation when using local inference backends like llama.cpp, llama-server, or LM Studio. This forces hardware to reprocess 20K+ token system prompts from scratch for minor tool calls.
The Problem
llama.cpp reuses the KV cache only for the leading part of a prompt that exactly matches what is already cached. When the beginning of a prompt changes, nothing after the change can be reused, so the entire cache is flushed and the full prompt must be reprocessed. Claude Code introduces two dynamic elements that mutate prompts on every turn:
- Telemetry Hash: Injects a billing/telemetry header (x-anthropic-billing-header: cch=xxxxx) with a hash that changes on every request
- Git Snapshot: Injects git status output into the environment block, changing the prompt whenever files are modified
This results in server logs showing "forcing full prompt re-processing due to lack of cache data" and 60+ second processing times for what should be minor operations.
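To see why a single changing value near the top of the prompt is so costly, here is a minimal sketch of prefix-based cache reuse. It is a toy model rather than llama.cpp's actual implementation: prompts are plain Python lists, and the token values ("sys", "cch=aaaaa", "turn1") are hypothetical placeholders.

```python
def common_prefix_len(cached, new):
    """Count how many leading tokens the two sequences share."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached, new):
    """Tokens that fall outside the shared prefix and must be recomputed."""
    return len(new) - common_prefix_len(cached, new)


# Hypothetical stand-ins: a 20,000-token system prompt and two earlier turns.
system = ["sys"] * 20000
history = ["turn1", "turn2"]

# Dynamic case: a per-request hash sits at the very start of the prompt.
cached_dynamic = ["cch=aaaaa"] + system + history
new_dynamic = ["cch=bbbbb"] + system + history + ["turn3"]

# Stable case: the same prompt without the changing hash.
cached_static = system + history
new_static = system + history + ["turn3"]

dynamic_cost = tokens_to_reprocess(cached_dynamic, new_dynamic)
static_cost = tokens_to_reprocess(cached_static, new_static)
print(dynamic_cost, static_cost)  # 20004 1
```

In this model the changing hash forces all 20,004 tokens to be recomputed, while the stable prompt only needs the one new token.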
The Solution
Configure Claude Code to disable the dynamic prompt elements and route requests to your local hardware. Open ~/.claude/settings.json (or your project's local config) and make sure it contains the following configuration:
{
  "includeGitInstructions": false,
  "env": {
    "ANTHROPIC_BASE_URL": "<your-llama-server-here>",
    "ANTHROPIC_API_KEY": "<any-string>",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
    "DISABLE_TELEMETRY": "1",
    "DISABLE_ERROR_REPORTING": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}

After restarting Claude Code, llama-server logs should show improved cache recognition. Instead of processing 24,000 tokens, you'll see messages like "selected slot by LCP similarity, sim_best = 0.973" followed by "prompt processing progress, n_tokens = 24270, batch.n_tokens = 4" - indicating only 600 tokens of delta processing instead of full reprocessing.
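If the settings file already holds other options, the keys above can be merged in rather than pasted over the top. The following is a hedged sketch of one way to do that; `merge_settings`, `apply_to_file`, and the default key value "local" are hypothetical names chosen for illustration and are not part of Claude Code.

```python
import json
from pathlib import Path

# The cache-related keys from the configuration above.
CACHE_FRIENDLY = {
    "includeGitInstructions": False,
    "env": {
        "CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
        "DISABLE_TELEMETRY": "1",
        "DISABLE_ERROR_REPORTING": "1",
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    },
}


def merge_settings(existing, overrides):
    """Return a copy of `existing` with `overrides` merged in recursively."""
    merged = dict(existing)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_to_file(path, base_url, api_key="local"):
    """Merge the cache-friendly keys into the JSON file at `path`."""
    path = Path(path)
    existing = json.loads(path.read_text()) if path.exists() else {}
    endpoint = {"env": {"ANTHROPIC_BASE_URL": base_url,
                        "ANTHROPIC_API_KEY": api_key}}
    merged = merge_settings(existing, merge_settings(CACHE_FRIENDLY, endpoint))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n")
    return merged


# Example call (assumes llama-server is listening on localhost:8080):
# apply_to_file(Path.home() / ".claude" / "settings.json", "http://localhost:8080")
```

Existing keys that the configuration does not mention are left untouched. Back up the file before running anything like this against a real config.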
This reduces local tool call times from over a minute to approximately 4 seconds on hardware like a Turing-era Quadro RTX 8000.
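The log figures quoted above can be sanity-checked with a little arithmetic. This sketch assumes that sim_best is the fraction of the prompt's tokens covered by the cached prefix; that reading is an assumption, not something confirmed from the llama.cpp source, and `estimated_delta` is a hypothetical helper.

```python
def estimated_delta(n_prompt_tokens, sim_best):
    """Estimate tokens left to process, assuming sim_best is the shared fraction."""
    shared = round(n_prompt_tokens * sim_best)
    return n_prompt_tokens - shared


# Values taken from the log lines quoted above.
delta = estimated_delta(24270, 0.973)
print(delta)  # 655
```

Under that assumption the result is in line with the roughly 600 tokens of delta processing reported above.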
📖 Read the full source: r/LocalLLaMA