Claude Code Sub-Agents: Save Tokens With 90% Cache Hit

Anthropic numbers often ignored in the "use sub-agents!" hype: multi-agent systems consume about 15× more tokens than a single chat, and they are "less effective for tightly interdependent tasks such as coding" (source). However, cached tokens cost only 10% of normal (90% discount) — but only if the content flagged for caching is identical across requests (source).

Multi-agent multiplies token use by 15. The cache divides it by 10. Whether sub-agents save or burn comes down to one thing: do all sub-agents share the same prefix?

Three Ways to Delegate, Ordered by Cost

1. Sub-agent with subagent_type set. Custom system prompt, custom tools, custom permissions (Anthropic). Different prompt = different cache. No sharing with the parent. Full price every spawn. Use when you actually need isolation.
2. Clone that inherits the parent. No subagent_type. Inherits the parent's prompt, tools, and history exactly. Children 2..N hit the cache at 10% price. Five clones reading files in parallel ≈ 5× speed at ~1.5× cost.
3. No sub-agent. Stay in the main agent. Cheapest per turn. Right answer when the work depends on itself — refactors where step 2 needs step 1's result.

When NOT to Delegate (Anthropic's Own Line)

"Best for tasks that can be divided into parallel strands of research." Translation:

Good: read 7 files in parallel, audit folders for a pattern, gather info from many sources.
Bad: refactor a module, fix a bug where each step depends on the previous. Main agent only.

If you slice tightly coupled work into sub-agents, you pay 15× and gain nothing.

What Breaks the Cache

Anthropic: editing tool definitions, switching models, adding or removing images, or changing the earlier prompt structure breaks the cached prefix (source). So:

Install your MCPs at session start, not mid-session.
Pick the model up front.
Don't edit CLAUDE.md or auto-memory mid-session — they live inside the cached prefix.

📖 Read the full source: r/ClaudeAI

Parallel Sub-Agents in Claude Code: When They Save vs. Burn Tokens

Three Ways to Delegate, Ordered by Cost

When NOT to Delegate (Anthropic's Own Line)

What Breaks the Cache

👀 See Also

Fine-Tuning Qwen 14B for Discord Autocomplete

Ghostbar: A ~5MB native macOS Swift AI client that hides from screen sharing

Claude Code Voice Mode: Hands-Free AI Conversations for Developers

Open Source Dashboard Reveals Actual Claude Code Compute Costs