Analysis: Anthropic's actual compute costs for Claude Code users are far lower than the reported $5k figure

A recent Forbes article claimed Anthropic's $200/month Claude Code Max plan can consume about $5,000 in compute, suggesting the company is losing money on inference. This analysis examines why that figure is misleading.
API pricing vs actual compute costs
The $5,000 figure is calculated at Anthropic's retail API pricing: $5 per million input tokens and $25 per million output tokens for Opus 4.6. At these prices, a heavy user could indeed rack up $5,000/month in API-equivalent usage.
However, API pricing doesn't reflect what it actually costs Anthropic to serve those tokens. To estimate real inference costs, compare what providers on OpenRouter charge for similar models:
- Qwen 3.5 397B-A17B (comparable to Opus 4.6): $0.39 per million input tokens, $2.34 per million output tokens
- Kimi K2.5 (1T parameters, 32B active): $0.45 per million input tokens, $2.25 per million output tokens
- DeepInfra cache reads on Kimi K2.5: $0.07 per million tokens, versus Anthropic's $0.50 per million tokens
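As a rough check on the comparison above, the following sketch divides each quoted open-weight price by the corresponding Anthropic price. The dictionary layout and the `price_ratios` helper are illustrative names only, and the prices are the ones quoted in this article, not live data.

```python
# Prices in USD per million tokens, copied from the figures quoted above.
ANTHROPIC_OPUS = {"input": 5.00, "output": 25.00, "cache_read": 0.50}

OPEN_WEIGHT = {
    "Qwen 3.5 397B-A17B": {"input": 0.39, "output": 2.34},
    # The cache-read price is the DeepInfra figure quoted above.
    "Kimi K2.5": {"input": 0.45, "output": 2.25, "cache_read": 0.07},
}


def price_ratios(model_prices, reference=ANTHROPIC_OPUS):
    """Return each price as a fraction of the reference price for the same token type."""
    return {kind: price / reference[kind] for kind, price in model_prices.items()}


for name, prices in OPEN_WEIGHT.items():
    for kind, ratio in price_ratios(prices).items():
        print(f"{name} {kind}: {ratio:.1%} of Anthropic's price")
```

On the quoted numbers, the ratios fall between roughly 8% and 14%, which is where the "roughly 10%" estimate comes from.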
The real math
These OpenRouter providers are running businesses with margins, not taking enormous losses. If they can serve comparable models at roughly 10% of Anthropic's API price, Anthropic's actual compute costs are likely in that range too.
Therefore:
- A heavy user consuming $5,000 in API-equivalent tokens costs roughly $500 in actual compute
- Loss on extreme power users: about $300/month ($500 in compute minus the $200 subscription), not $4,800
- Most users don't approach the limits: Anthropic says fewer than 5% of subscribers would be affected by weekly caps
- Typical Max 20x plan usage, at around 50% of the weekly token budget, is roughly break-even or profitable for Anthropic
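The arithmetic behind these bullets can be sketched as follows. The 10% cost ratio is this article's estimate rather than a disclosed figure, and the function names and constants are hypothetical, chosen only for illustration.

```python
SUBSCRIPTION_PRICE = 200.0  # USD/month, the Max plan price cited above
COST_RATIO = 0.10           # ASSUMPTION: compute cost is ~10% of retail API price


def monthly_margin(api_equivalent_usd, cost_ratio=COST_RATIO,
                   subscription=SUBSCRIPTION_PRICE):
    """Subscription revenue minus estimated compute cost for one user-month."""
    compute_cost = api_equivalent_usd * cost_ratio
    return subscription - compute_cost


def break_even_usage(cost_ratio=COST_RATIO, subscription=SUBSCRIPTION_PRICE):
    """API-equivalent usage (USD) at which estimated compute cost equals the subscription."""
    return subscription / cost_ratio


# Extreme power user: $5,000 of API-equivalent usage in a month.
print(f"Margin at $5,000 usage: {monthly_margin(5000):+.0f} USD")
# Naive reading that treats retail API price as the true cost.
print(f"Margin if cost equals API price: {monthly_margin(5000, cost_ratio=1.0):+.0f} USD")
print(f"Break-even usage: {break_even_usage():.0f} USD API-equivalent")
```

Under the 10% assumption, the break-even point sits at about $2,000 of API-equivalent usage per month; the result is sensitive to that ratio, so a different cost estimate shifts the break-even proportionally.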
Who actually faces $5,000 costs?
The $5,000 figure traces back to Cursor's internal analysis. For Cursor, the number is roughly correct because it pays Anthropic's retail API prices (or close to them) for Opus 4.6 access.
Developers want Anthropic models in Cursor because of brand awareness and the models' current performance advantages over cheaper open-weight alternatives.
Broader implications
Anthropic isn't profitable overall because of training costs, researcher salaries, and compute commitments, not because of inference. On a per-user, per-token basis for inference, Anthropic is likely profitable on the average Claude Code subscriber.
The "AI inference is a money pit" narrative plays into frontier labs' hands by discouraging competition and making their moats appear deeper than they are.
📖 Read the full source: HN AI Agents