Anthropic API Billing Bug: Sonnet Model Charged at Opus Rates

Bug Details
A significant billing discrepancy has been identified in the Anthropic API for the claude-sonnet-4-6 model. While the API correctly reports the model as Sonnet in the response, the actual billing calculation uses Opus pricing, resulting in higher charges than expected.
Evidence from Raw Event Data
The bug was discovered through analysis of a high-token request with heavy prompt caching. The specific data points from the raw event are:
- Reported Model: claude-sonnet-4-6
- Input Tokens: 6
- Output Tokens: 4,034
- Cache Creation (Write): 61,920 tokens
- Cache Read: 171,391 tokens
- Billed totalCostUsd: $0.5735755
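According to the user who reported the issue, this total matches exactly what Opus pricing would produce for these token counts, not Sonnet pricing. Because the request is dominated by cache writes and cache reads, the per-token rate difference between the two tiers translates into a noticeably higher charge.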
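The comparison can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the token counts come from the raw event above, but the per-million-token rates are assumptions (they are not stated in the source and should be checked against Anthropic's current pricing page), as is the choice of the standard 5-minute cache-write rate. The names `USAGE`, `ASSUMED_RATES`, and `estimate_cost` are hypothetical helpers, not part of any Anthropic SDK.

```python
from decimal import Decimal

# Token counts taken from the raw event quoted above.
USAGE = {
    "input": 6,
    "output": 4034,
    "cache_write": 61920,
    "cache_read": 171391,
}

# ASSUMPTION: USD per million tokens. These rates are not in the source;
# verify them against the official pricing page before relying on them.
ASSUMED_RATES = {
    "sonnet": {
        "input": Decimal("3"),
        "output": Decimal("15"),
        "cache_write": Decimal("3.75"),
        "cache_read": Decimal("0.30"),
    },
    "opus": {
        "input": Decimal("5"),
        "output": Decimal("25"),
        "cache_write": Decimal("6.25"),
        "cache_read": Decimal("0.50"),
    },
}

BILLED = Decimal("0.5735755")  # totalCostUsd from the raw event


def estimate_cost(usage, rates):
    """Return the estimated USD cost for one request under a given rate card."""
    total = sum(Decimal(usage[kind]) * rates[kind] for kind in usage)
    return total / Decimal(1_000_000)


for tier, rates in ASSUMED_RATES.items():
    cost = estimate_cost(USAGE, rates)
    # Decimal arithmetic keeps the comparison exact, with no float rounding.
    print(f"{tier}: ${cost} (matches billed: {cost == BILLED})")
```

Under these assumed rates, the Opus estimate equals the billed $0.5735755 exactly, while the Sonnet estimate is $0.3441453, about $0.23 less for this single request. If the assumed rates differ from what Anthropic actually charges, the conclusion would need to be rechecked.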
Impact and Context
This bug affects developers using the Anthropic Claude API with the Sonnet model. Since Opus is Anthropic's most expensive model tier, this discrepancy could result in substantially higher costs than anticipated. The bug appears to be in the billing calculation logic rather than the model selection itself, as the API correctly identifies the model as Sonnet in responses.
For developers monitoring API costs, this means current billing reports may be inaccurate for Sonnet usage. The issue was reported on the ClaudeAI subreddit where users are discussing potential workarounds and monitoring for an official fix from Anthropic.
📖 Read the full source: r/ClaudeAI