How to Save Claude Code Tokens: Route Simple Questions to Haiku

One developer on r/ClaudeAI was hitting their $20 Claude Code weekly cap by Thursday every week. After auditing the last 50 prompts, they realized most were simple chat questions that didn't need an agent: “what's this stack trace saying”, “regex to match X”, “explain what this bash one-liner does”, “convert this curl to httpie”, and “what's the jq for pulling field Y out of this”.

Every one of those prompts in Claude Code was paying the full agent tax — context loading, tool definitions, planning tokens — for a one-line answer. The fix: route all chat-shaped questions to a regular chat window using a cheap model (Haiku or GPT-mini). Reserve Claude Code for multi-file edits, refactors, and debugging that actually needs codebase reading.

Results after ~3 weeks

Went from hitting the weekly cap by Thursday to not hitting it at all, doing the same amount of work.
Extra spend on cheap-model API calls: roughly $3–4/week — negligible.
Side benefit: cheap-model answers come back faster than Claude Code spinning up its agent loop, so quick questions feel quicker too.

Workflow note

To avoid alt-tabbing between the terminal (Claude Code) and a chat window, they now use a terminal called yaw.sh that puts a multi-provider chat at the prompt next to Claude Code. But any chat tool in another window works — the workflow change is what saves the tokens.

TL;DR: If you're hitting the Claude Code weekly cap, audit your last 50 prompts. Most probably don't need an agent. Move those off and you'll likely stop hitting the cap.

📖 Read the full source: r/ClaudeAI

Stop Burning Claude Code Tokens on Chat Questions

Results after ~3 weeks

Workflow note

👀 See Also

Claude Code Headless Mode with --print Flag

Governance Layer for Claude Agents: Hard Safety Boundaries and Live Traces in Production

Loading Every MCP Server on Every Prompt Quietly Destroys Token Budget

Claude's /btw Command Enables Parallel Communication During Tasks