Routing Agent Subtasks to Cheaper Models Dropped Cost from $18 to $4 on Same Refactor

One developer on r/ClaudeAI describes a practical cost-optimization strategy for agent loops: route routine subtasks to cheap models and reserve the expensive model (Opus 4.7) for complex reasoning. Their refactoring agent, which handles CSS variable renames, YAML config updates, and linter runs via MCP, originally sent every step to Opus 4.7 for a total of about $18. After they added routing logic, 178 of 212 steps went to cheap models, cutting the cost to roughly $4 with no observable quality difference on routine changes.
Routing Logic
- Hard subtasks → Opus 4.7: Component architecture, debugging code written at 2 a.m., and anything requiring sustained reasoning across long conversations. The author considers Opus unmatched at this kind of work: an earlier attempt to route an auth middleware bug to a cheaper model silently broke session handling and took an hour to trace.
- Routine subtasks → cheaper models: Lint, rename, config edits, tool orchestration. The author settled on DeepSeek V4 Pro for general coding chores and Tencent Hunyuan Hy3 preview for heavy tool calling. According to the author, Hunyuan Hy3 ranked #1 on OpenRouter by tool call volume as of late April, and it almost never botches a function call when the schema is clean.
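The routing rules above can be sketched as a lookup table. This is a minimal illustration, not the author's implementation: the subtask category names, the `route` function, and the model label strings are all hypothetical placeholders (they are not real API model IDs), and falling back to the premium tier for unknown subtasks is an assumption made here for safety.

```python
from enum import Enum


class Tier(Enum):
    # Placeholder labels for illustration only, not real API model identifiers.
    PREMIUM = "opus-4.7"
    CODING = "deepseek-v4-pro"
    TOOLS = "hunyuan-hy3-preview"


# Hypothetical subtask categories, mapped the way the post describes.
ROUTES = {
    "architecture": Tier.PREMIUM,
    "debugging": Tier.PREMIUM,
    "lint": Tier.CODING,
    "rename": Tier.CODING,
    "config_edit": Tier.CODING,
    "tool_orchestration": Tier.TOOLS,
}


def route(kind: str) -> Tier:
    """Return the model tier for a subtask category.

    Unknown categories go to the premium tier (an assumption of this
    sketch: when unsure, pay for the stronger model).
    """
    return ROUTES.get(kind, Tier.PREMIUM)


if __name__ == "__main__":
    for kind in ("lint", "tool_orchestration", "debugging", "something_new"):
        print(f"{kind} -> {route(kind).value}")
```

Keeping the mapping in data rather than in branching code makes it easy to move a category between tiers after a bad experience, such as the auth middleware bug mentioned above.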
Cost Comparison
- Opus 4.7: roughly $5 per million input tokens (inferred, not quoted: 28 × $0.18 ≈ $5.04).
- Tencent Hunyuan Hy3: $0.18 per million input tokens and $0.59 per million output tokens, roughly 28x cheaper than Opus 4.7 on input.
- Same 212-step refactor: 178 steps to cheap tier, 34 steps to Opus. Cost dropped from $18 to ~$4.
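The arithmetic behind these figures can be checked in a few lines. The sketch below uses only numbers reported in the post; the Opus input price is derived from the stated 28x ratio rather than taken from a price list, so treat it as an estimate. Variable names are illustrative.

```python
# Figures as reported in the post.
HY3_INPUT_USD_PER_M = 0.18   # Hunyuan Hy3 input price per million tokens
INPUT_PRICE_RATIO = 28       # "roughly 28x cheaper than Opus 4.7 on input"
TOTAL_STEPS = 212
CHEAP_STEPS = 178
COST_BEFORE_USD = 18.0       # every step on Opus
COST_AFTER_USD = 4.0         # with routing (approximate)

# Derived values (estimates, not quoted prices).
opus_input_usd_per_m = HY3_INPUT_USD_PER_M * INPUT_PRICE_RATIO
cheap_share = CHEAP_STEPS / TOTAL_STEPS
savings = 1 - COST_AFTER_USD / COST_BEFORE_USD

print(f"Implied Opus input price: ~${opus_input_usd_per_m:.2f} per million tokens")
print(f"Steps on the cheap tier: {cheap_share:.0%}")
print(f"Cost reduction: {savings:.0%}")
```

Run as written, this prints an implied Opus input price of about $5.04, 84% of steps on the cheap tier, and a cost reduction of 78%.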
Failure Modes
- The tool-calling model hallucinates parameters when schemas are sloppy (author admits schemas were bad).
- DeepSeek V4 Pro occasionally writes syntactically perfect code that does the opposite of what was asked, and the mistake can survive a quick skim.
- Neither cheap model can match Opus for debugging deep issues (e.g., auth flow silently eating a cookie).
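One way to contain the first failure mode is to check a model's tool-call arguments against the declared parameters before executing anything. The post does not say the author did this; the sketch below is an assumption-laden illustration using a simplified, hypothetical schema format (`{name: (type, required)}`) rather than JSON Schema or any real MCP type, and the `rename_css_variable`-style parameters are invented for the example.

```python
def validate_tool_call(args: dict, schema: dict) -> list[str]:
    """Return a list of problems with a tool call's arguments.

    `schema` maps parameter name -> (expected_type, required).
    An empty list means the call looks safe to execute.
    """
    errors = []
    # Parameters the model invented that the tool does not declare.
    for name in args:
        if name not in schema:
            errors.append(f"unexpected parameter: {name}")
    # Declared parameters that are missing or have the wrong type.
    for name, (expected_type, required) in schema.items():
        if name not in args:
            if required:
                errors.append(f"missing required parameter: {name}")
        elif not isinstance(args[name], expected_type):
            errors.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return errors


# Hypothetical schema for a CSS-variable rename tool.
RENAME_SCHEMA = {
    "old_name": (str, True),
    "new_name": (str, True),
    "dry_run": (bool, False),
}

if __name__ == "__main__":
    good = {"old_name": "--primary", "new_name": "--brand"}
    bad = {"old_name": "--primary", "newName": "--brand"}  # hallucinated key
    print(validate_tool_call(good, RENAME_SCHEMA))
    print(validate_tool_call(bad, RENAME_SCHEMA))
```

A rejected call can be retried or escalated to the expensive tier. This catches invented or mistyped parameters only; it does nothing for code that is well-formed but does the opposite of what was asked.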
Decision Heuristic
The author's routing rule of thumb: "How expensive is a wrong answer to catch?" A bad lint fix costs a 2-second git revert; a bad architecture call costs the whole afternoon.
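The rule of thumb can be written as a threshold on how long a wrong answer takes to notice and undo. This is only a sketch of the idea: the function name, the five-minute threshold, and the four-hour stand-in for "the whole afternoon" are assumptions chosen for illustration, not values from the post.

```python
def pick_tier(seconds_to_catch_mistake: float,
              threshold_seconds: float = 300.0) -> str:
    """Choose a tier from the estimated cost of catching a wrong answer.

    The 300-second default is an arbitrary illustrative threshold.
    """
    if seconds_to_catch_mistake > threshold_seconds:
        return "premium"
    return "cheap"


if __name__ == "__main__":
    # A bad lint fix: about a 2-second git revert (figure from the post).
    print("lint fix:", pick_tier(2))
    # A bad architecture call: "the whole afternoon", taken here as 4 hours.
    print("architecture call:", pick_tier(4 * 3600))
```

In practice the estimate is a judgment call per subtask category, so it would likely be set once per category rather than computed per step.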
The savings also made previously skipped tasks affordable, such as writing and running tests for every CSS change or regenerating all Open Graph images: at fractions of a cent per tool call, there is no reason not to run them.
📖 Read the full source: r/ClaudeAI