IDP Leaderboard benchmark shows Claude Sonnet 4.6 matches Opus 4.6 for document AI tasks

The IDP Leaderboard, an open benchmark for document AI, has published results comparing Claude models on document processing tasks. The benchmark tested 16 models across multiple categories using over 9,000 real documents.
Benchmark Results
The Claude model scores from the IDP Leaderboard:
- Claude Sonnet 4.6: 80.8 overall
- Claude Opus 4.6: 80.3 overall
- Claude Haiku 4.5: 69.6 overall
Sonnet and Opus performed essentially equivalently on extraction tasks including text, tables, formulas, and layout analysis. The radar charts for both models look identical according to the benchmark results.
Cost Comparison
The source notes significant cost differences:
- Sonnet costs $24 per 1,000 pages
- Opus costs $40 per 1,000 pages
For document processing workloads, the benchmark suggests there's no reason to use Opus given the equivalent performance at lower cost.
Important Caveat
One notable finding: Claude models had stricter content moderation that affected performance on certain document types. Old newspaper scans, textbook pages, and historical documents sometimes triggered content filters. This issue only appeared in the OlmOCR and OmniDoc benchmarks.
All predictions from the benchmark are visible in the Results Explorer at idp-leaderboard.org, where you can see exactly what each Claude model output on every document.
📖 Read the full source: r/ClaudeAI
👀 See Also

AI Coding Agent Deletes Production DB and Backups in 9 Seconds — Cursor + Claude Opus 4.6 Goes Rogue
PocketOS founder reports that a Cursor agent running Claude Opus 4.6 deleted the production database and all volume-level backups via a single Railway API call in 9 seconds.

Claude AI Recovers 11-Year-Old Bitcoin Wallet Worth $400K by Finding Backup and Fixing Brute-Force Bug
A user recovered a 5 BTC wallet (worth ~$400K) after 11 years by feeding their entire college computer files into Claude. The AI found an older backup wallet and identified a bug in btcrecover's password combination logic, enabling successful decryption.

OpenClaw 2026.3.13 regression causes false unreachable status reports
OpenClaw version 2026.3.13 introduced a diagnostic regression where status commands falsely report unreachable gateways despite RPC probes working correctly. Rolling back to 2026.3.12 resolves the issue.
Claude Code v2.1.140 Fixes Agent Tool Matching, /goal Hangs, Windows Event-Loop Stall
v2.1.140 improves Agent tool subagent_type matching to be case- and separator-insensitive, fixes /goal hanging with disableAllHooks, resolves Windows event-loop stall from missing executables, and more.