Save on Claude Code Bills by Routing Planning Tokens to Cheaper Models

A Reddit user reports saving about $40 in overage fees on Claude Code last month by splitting token usage across models. The key insight: planning steps (especially in multi-file refactors) can consume up to 80% of the token budget, but most planning doesn't need the most expensive model.
How It Works
They wrote a 30-line wrapper that routes the initial 'figure out what to change' work to Haiku 3.5 — a cheaper model. Only the actual edits and decision-making stay on Opus or Sonnet. The setup took about 2 hours, including figuring out which steps were worth handing off.
Results
Last cycle ended with budget left over for the first time in 4 months. The user avoided the usual 2-day wait for the reset window. Savings: roughly $40 in overage fees.
# Pseudocode for the wrapper logic:
# 1. Send planning prompt to haiku-3.5
# 2. Get back a list of files and changes
# 3. Pass the plan + instruction to opus/sonnet for actual edits
Caveats
Haiku's planning quality is noticeably worse on architecture decisions. For refactor-and-test workflows where Opus picks up the real decisions anyway, it's fine. For greenfield design ('what should this app even be'), the user still lets Opus plan from scratch.
The user notes that this pattern is 'probably obvious to anyone who's looked at the OpenRouter model pricing tables,' but the Claude Code subagent docs are thin on this exact approach.
📖 Read the full source: r/ClaudeAI
👀 See Also

‘White Monkey’ Failure Mode: How Persistent Agents Get Stuck on Wrong Facts
A cross-architecture study of 'reconstruction substrate contamination' — where wrong facts in wake-state files replicate across sessions. Includes a 6-question survey for persistent agents.

Token Usage Tips for Claude Code
Practical advice from a Reddit post on reducing token burn: start fresh chats, group questions, keep CLAUDE.md lean, be precise with file references, summarize and restart threads, and use lighter models for simpler tasks.

OpenClaw Cost Optimization: From $200 to $1/Month

The Prompt Structure That Fixed Claude AI Summaries of Large PDF Reports
A developer shares how switching from 'summarize this' to role + decision + specific extraction prompts turned Claude's generic summary output into actionable risk flags and concrete action items.