Claude Code Token Waste Fix: Disable Attribution Header for Better Cache Hits

Claude Code has been wasting tokens on every new session since version 2.1.69 due to a billing attribution header that breaks prompt caching. The issue is documented in multiple GitHub issues (#40652, #34629, #40524) with no official response from Anthropic as of the source publication.
What's Happening
Since v2.1.69, Claude Code inserts a billing attribution string into the first block of your system prompt: x-anthropic-billing-header: cc_version=2.1.88.a3f; cc_entrypoint=cli; cch=00000;
The .a3f part is a 3-character hash computed from your first message in each conversation using this function:
function computeHash(firstUserMessage, version) {
const chars = [4, 7, 20].map(i => firstUserMessage[i] || "0").join("");
return sha256("59cf53e54c78" + chars + version).slice(0, 3);
}Different conversations with different first messages generate different hashes every time.
Why This Breaks Caching
Anthropic's caching requires 100% identical prompt segments. The cache is shared across your entire Organization or Workspace, not per session. The billing header sits at the front of the ~23K token system prompt, and since it changes per conversation, the prefix never matches, causing cache misses on every new chat.
Benchmark Results
A controlled A/B test showed:
- Header ON (default): 48% cache hit rate, ~12K tokens rebuilt per session
- Header OFF: 99.98% cache hit rate, zero cache creation on 3 out of 4 sessions
The result is 7x cheaper per session on system prompt processing.
The Fix
Add this to your shell configuration:
export CLAUDE_CODE_ATTRIBUTION_HEADER=falseFor zsh users:
echo 'export CLAUDE_CODE_ATTRIBUTION_HEADER=false' >> ~/.zshrc
source ~/.zshrcNew sessions pick it up automatically. Existing sessions don't need restarting—the hash doesn't change mid-conversation, and they don't interfere with new sessions.
Safety and Background
This is not a hack—the environment variable exists in the source code as a proper feature toggle. claude-code-router and CLIProxyAPI have been shipping with this disabled in production with no reported issues.
Anthropic likely implemented this to track which version and entrypoint (CLI vs SDK vs GitHub Action) made each API call, placing it in the system prompt because Bedrock/Vertex don't forward custom headers.
📖 Read the full source: r/ClaudeAI
👀 See Also

Using project narratives to manage memory in large OpenClaw projects
A developer shares a process where after each major milestone, they spawn a separate OpenClaw worker to analyze the codebase and write a 'project narrative' document, which helps identify broken pipelines, redundancies, and missing pieces that the main worker might overlook.

Claude Prompt Codes Retested: L99 Sharper, OODA Narrower, ARTIFACTS Faded, and 3 New Codes to Use
A 6-month retest of L99, OODA, and ARTIFACTS prompt codes on Claude shows L99 sharper on Sonnet 4.6/Opus 4.7, OODA failing on strategic prompts, ARTIFACTS unnecessary for code, and three new codes (/skeptic, /blindspots, /decompose) earning daily use. Stack no more than 2 codes.

Governance Layer for Claude Agents: Hard Safety Boundaries and Live Traces in Production
A Claude API user built a lightweight governance layer below the agent to add hard safety boundaries, real-time traces, human-in-the-loop control via Telegram, and automatic checkpointing — solving silent failures and runaway token costs in long-running agent loops.

Claude Code's tendency to validate flawed assumptions and prompting workarounds
A developer reports Claude Code will enthusiastically implement flawed architectures without questioning incorrect assumptions, leading to wasted debugging time. The workaround is to explicitly add "assume I might be wrong about the framing" to complex requests.