Token Waste in Claude Code: A User's Self-Audit Shows Behavioral Fixes Beat Model Switching

A Reddit user spent a week measuring where their Claude Code tokens actually went, rather than just complaining about the May price changes. Their conclusion: most burn was self-inflicted, and behavioral changes bought back more headroom than switching models would have.
Biggest Wins
- /clear between unrelated tasks: a stale 200k-token context riding along for a one-line fix was the single most expensive habit.
- Make it plan before it touches files. One planning pass, then execute: cheaper and better than explore-edit-explore in a loop.
- Stop letting it re-read files it just touched. If it just edited a file, it does not need to reopen it to "verify." Say so once in your rules.
- Search with a subagent, not the main thread. Grep-and-read across a repo dumps the whole haystack into your main context permanently. A subagent returns just the answer.
- Kill always-on and -p loops you are not watching. Background agents burning tokens while you sleep are most of the horror-story bills.
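
To see why a stale context is so costly, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model in which every turn re-sends the entire conversation as input tokens and ignores prompt caching, which would lower the billed cost of repeated context. Only the 200k figure comes from the post; the turn count, the per-turn growth, and the function name `input_tokens_sent` are hypothetical choices made for illustration.

```python
# Simplified model (an assumption, not Claude Code's actual billing logic):
# each turn re-sends the whole accumulated context as input tokens.

def input_tokens_sent(starting_context: int, turns: int, added_per_turn: int) -> int:
    """Total input tokens sent over `turns` requests when the context only grows."""
    total = 0
    context = starting_context
    for _ in range(turns):
        context += added_per_turn  # new prompt and tool output join the context
        total += context           # the full context goes out again with this turn
    return total

STALE_CONTEXT = 200_000  # figure quoted in the post
TURNS = 3                # hypothetical: a one-line fix that takes three turns
ADDED_PER_TURN = 2_000   # hypothetical growth per turn

stale = input_tokens_sent(STALE_CONTEXT, TURNS, ADDED_PER_TURN)
fresh = input_tokens_sent(0, TURNS, ADDED_PER_TURN)

print(f"with stale context: {stale:,} input tokens")
print(f"after clearing:     {fresh:,} input tokens")
```

Under these assumptions the stale session sends the old 200k tokens again on every turn, so the overhead grows with the number of turns rather than being paid once. The same reasoning applies to the subagent tip: whatever a repo-wide search dumps into the main thread is carried along on each later turn.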
None of these fixes required a new subscription, a wrapper, or an MCP server. They only required discipline that the user admits they were too lazy to apply while limits felt infinite.
The post acknowledges that none of this fixes the actual price hikes — it just stops you burning extra on top of them.
📖 Read the full source: r/ClaudeAI