Parallel Sub-Agents in Claude Code: When They Save vs. Burn Tokens

Two Anthropic figures tend to get lost in the "use sub-agents!" hype. Multi-agent systems consume about 15× more tokens than a single chat, and they are "less effective for tightly interdependent tasks such as coding" (source). On the other hand, cached tokens cost only 10% of the normal price (a 90% discount), but only if the content flagged for caching is identical across requests (source).
Multi-agent multiplies token use by about 15. The cache divides the price of the repeated prefix by 10. Whether sub-agents save or burn tokens comes down to one thing: do all sub-agents share the same prefix?
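To make the cache side of that trade-off concrete, here is a minimal arithmetic sketch. The token counts (a 50,000-token prefix plus 2,000 tokens unique to the request) are hypothetical, and the helper `billed_input` is an illustration, not part of any Anthropic tooling. It only applies the 10% cached-read figure quoted above.

```python
# Illustrative arithmetic only; the token counts below are hypothetical.
CACHE_READ_RATE = 0.10  # cached input billed at 10% of normal (figure quoted above)


def billed_input(prefix_tokens: int, unique_tokens: int, prefix_cached: bool) -> float:
    """Input billed for one request, in full-price token equivalents."""
    prefix_rate = CACHE_READ_RATE if prefix_cached else 1.0
    return prefix_tokens * prefix_rate + unique_tokens


# Same request, with and without a cache hit on the shared prefix.
cold = billed_input(50_000, 2_000, prefix_cached=False)
warm = billed_input(50_000, 2_000, prefix_cached=True)

print(f"cold: {cold:.0f} token-equivalents")
print(f"warm: {warm:.0f} token-equivalents")
```

The discount applies only to the prefix. The unique part of each request is always billed at full price, so the larger the shared prefix is relative to the unique part, the more the cache matters.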
Three Ways to Delegate, Ordered by Cost
- 1. Sub-agent with subagent_type set. Custom system prompt, custom tools, custom permissions (Anthropic). Different prompt = different cache. No sharing with the parent. Full price every spawn. Use when you actually need isolation.
- 2. Clone that inherits the parent. No subagent_type. Inherits the parent's prompt, tools, and history exactly. Children 2..N hit the cache at 10% price. Five clones reading files in parallel ≈ 5× speed at ~1.5× cost.
- 3. No sub-agent. Stay in the main agent. Cheapest per turn. Right answer when the work depends on itself, such as refactors where step 2 needs step 1's result.
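The difference between the first two options can be sketched as simple arithmetic on the shared prefix. The function `fanout_prefix_cost` and its parameters are hypothetical. It assumes, as described above, that the first child pays full price for the prefix and children 2..N read it from the cache at 10%. If cache writes are billed at a different rate, adjust `first_rate` accordingly.

```python
def fanout_prefix_cost(n_children: int, read_rate: float = 0.10,
                       first_rate: float = 1.0) -> float:
    """Prefix cost of n children, in multiples of one full-price prefix.

    Assumption: child 1 pays first_rate, children 2..N pay read_rate.
    Each child's own unique input and output tokens are NOT included.
    """
    if n_children < 1:
        return 0.0
    return first_rate + (n_children - 1) * read_rate


# Five clones that share the parent's prefix (option 2).
shared = fanout_prefix_cost(5)

# Five custom sub-agents with different prompts, so no cache sharing (option 1).
isolated = fanout_prefix_cost(5, read_rate=1.0)

print(f"shared prefix:   {shared:.1f}x one prefix")
print(f"isolated agents: {isolated:.1f}x one prefix")
```

Under these assumptions the shared case comes out at about 1.4 times one prefix, which is in the same ballpark as the ~1.5× figure above once each clone's own work is added on top. The isolated case pays for the prefix five times.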
When NOT to Delegate (Anthropic's Own Line)
"Best for tasks that can be divided into parallel strands of research." Translation:
- Good: read 7 files in parallel, audit folders for a pattern, gather info from many sources.
- Bad: refactor a module, fix a bug where each step depends on the previous. Main agent only.
If you slice tightly coupled work into sub-agents, you pay 15× and gain nothing.
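One rough way to apply that rule is to write down what each task needs and check whether anything in the batch depends on anything else in it. The sketch below is a hypothetical helper, not a Claude Code feature, and the task names are made up for illustration.

```python
def can_parallelize(deps: dict[str, set[str]]) -> bool:
    """True when no task in the batch depends on another task in the batch.

    deps maps each task name to the set of task names it needs first.
    """
    return all(needed.isdisjoint(deps) for needed in deps.values())


# Independent reads: no task needs another task's result.
research = {"read_a": set(), "read_b": set(), "read_c": set()}

# A refactor: step2 needs step1's result, so the work is coupled.
refactor = {"step1": set(), "step2": {"step1"}}

print(can_parallelize(research))  # independent strands, safe to fan out
print(can_parallelize(refactor))  # coupled steps, keep in the main agent
```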
What Breaks the Cache
Anthropic: editing tool definitions, switching models, adding or removing images, or changing the earlier prompt structure breaks the cached prefix (source). So:
- Install your MCPs at session start, not mid-session.
- Pick the model up front.
- Don't edit CLAUDE.md or auto-memory mid-session; they live inside the cached prefix.
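A toy model helps show why these rules follow from exact-prefix matching. The sketch below fingerprints a prefix by hashing the model name, tool definitions, and system prompt together. This is an assumption made for illustration, not how Anthropic implements caching, and the model and tool names are hypothetical placeholders.

```python
import hashlib
import json


def prefix_fingerprint(model: str, tools: list[dict], system_prompt: str) -> str:
    """Toy stand-in for a cache key: any change to the prefix changes the hash."""
    payload = json.dumps(
        {"model": model, "tools": tools, "system": system_prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


tools = [{"name": "read_file"}]  # hypothetical tool definition
base = prefix_fingerprint("model-a", tools, "Project rules v1")

# An identical prefix gives the same fingerprint (a cache hit in this toy model).
same = prefix_fingerprint("model-a", [{"name": "read_file"}], "Project rules v1")

# Each of these changes produces a different fingerprint (a miss in this toy model).
new_tool = prefix_fingerprint("model-a", tools + [{"name": "mcp_search"}], "Project rules v1")
new_model = prefix_fingerprint("model-b", tools, "Project rules v1")
new_memory = prefix_fingerprint("model-a", tools, "Project rules v2")

print(base == same)
print(base in {new_tool, new_model, new_memory})
```

In this model, adding a tool mid-session, switching models, or editing the memory text each yields a new fingerprint, which mirrors the list above: anything that sits inside the prefix has to stay byte-for-byte stable to keep the discount.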
📖 Read the full source: r/ClaudeAI