Structured workflow beats plan mode and superpowers on AI DES benchmark

A Reddit post shares results from the new AI-assisted Discrete-Event Simulation (DES) benchmark. The submission using the Ouroboros workflow (ooo) inside Claude Code ranked #1, beating both Claude's built-in plan mode and the 'superpowers' fat-skill stacks.
Benchmark details
The benchmark tests full understanding of a real-world system — a mining haulage system with trucks, loading points, dumping points, routes, and queues. Submissions are judged on:
- Comprehension of system structure
- Abstracting into a discrete-event simulation model
- Designing events, state changes, and KPIs
- Producing executable simulation code
- Interpreting results (bottlenecks, throughput, waiting times)
- Generating human-readable artifacts (topology diagrams, animations)
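To make the judging criteria concrete, here is a minimal sketch of what a discrete-event model of a haulage loop can look like. It is not taken from the benchmark or from any submission. It assumes a single loader, a single dump point, fixed (deterministic) service and travel times, and first-come-first-served queues. The function name `simulate_haulage` and all parameter values are hypothetical.

```python
import heapq
from collections import deque


def simulate_haulage(n_trucks=3, load_time=5.0, haul_time=10.0,
                     dump_time=2.0, return_time=8.0, horizon=480.0):
    """Trucks cycle loader -> haul -> dump -> return until `horizon` minutes.

    Illustrative assumptions: one loader, one dump point, fixed times.
    """
    events = []  # min-heap of (time, seq, kind, station, truck)
    seq = 0      # tie-breaker so simultaneous events keep insertion order

    def schedule(time, kind, station, truck):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, station, truck))
        seq += 1

    # System state: one server and one FIFO queue per station.
    busy = {"loader": False, "dump": False}
    queues = {"loader": deque(), "dump": deque()}
    service_time = {"loader": load_time, "dump": dump_time}
    # After service at a station, the truck travels to the other station.
    travel_after = {"loader": ("dump", haul_time),
                    "dump": ("loader", return_time)}
    waits = {"loader": [], "dump": []}
    loads_dumped = 0

    def start_service(station, truck, now, arrived):
        busy[station] = True
        waits[station].append(now - arrived)
        schedule(now + service_time[station], "done", station, truck)

    for truck in range(n_trucks):
        schedule(0.0, "arrive", "loader", truck)

    while events:
        now, _, kind, station, truck = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "arrive":
            if busy[station]:
                queues[station].append((truck, now))
            else:
                start_service(station, truck, now, now)
        else:  # "done": service finished at this station
            if station == "dump":
                loads_dumped += 1
            next_station, travel = travel_after[station]
            schedule(now + travel, "arrive", next_station, truck)
            if queues[station]:
                waiting_truck, arrived = queues[station].popleft()
                start_service(station, waiting_truck, now, arrived)
            else:
                busy[station] = False

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    # KPIs of the kind the benchmark asks submissions to interpret.
    return {
        "loads_dumped": loads_dumped,
        "throughput_per_min": loads_dumped / horizon,
        "mean_loader_wait": mean(waits["loader"]),
        "mean_dump_wait": mean(waits["dump"]),
    }


if __name__ == "__main__":
    print(simulate_haulage(n_trucks=3, horizon=100.0))
```

Comparing runs with different truck counts shows where waiting time builds up, which is the bottleneck analysis the last criteria refer to. A realistic model would add stochastic times, multiple routes, and several loading and dumping points.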
Ouroboros performance
The Ouroboros submission included working DES code, a topology diagram of the mining system, and an animation of trucks hauling ore. Notably, when the MCP server failed mid-run, Ouroboros fell back to a skills-based path and still finished the task, the kind of recovery and rerouting that real deployments depend on.
Comparison
- Plan mode (lightweight planning) — decent baseline
- Superpowers / fat-skill stacks — worse than plan mode on this task
- Ouroboros (structured: clarify → plan → execute → evaluate → recover → iterate) — best
The result suggests that structuring the workflow around problem definition, planning, execution, evaluation, and recovery is more effective than piling on more instructions and bigger skills.
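The structured loop can be pictured as a small control flow. The sketch below is an illustration only, not Ouroboros's actual implementation. The function `run_workflow`, its callback parameters, and the use of `RuntimeError` to signal a failed primary path are all assumptions made for the example.

```python
def run_workflow(goal, execute, evaluate, fallback, max_iterations=3):
    """Hypothetical clarify -> plan -> execute -> evaluate -> recover -> iterate loop."""
    spec = {"goal": goal.strip()}             # clarify: pin down the task
    plan = [f"step for: {spec['goal']}"]      # plan: a one-step placeholder plan
    history = []

    for attempt in range(1, max_iterations + 1):
        try:
            result = execute(plan)            # execute on the primary path
            path = "primary"
        except RuntimeError:
            result = fallback(plan)           # recover: reroute to a fallback path
            path = "fallback"

        ok = evaluate(result)                 # evaluate the outcome
        history.append((attempt, path, ok))
        if ok:
            return result, history
        plan.append(f"revise after attempt {attempt}")  # iterate with a revised plan

    return None, history


if __name__ == "__main__":
    def failing_primary(plan):
        # Stand-in for a primary tool path that is unavailable.
        raise RuntimeError("primary path unavailable")

    result, history = run_workflow(
        "simulate haulage",
        execute=failing_primary,
        evaluate=lambda r: r == "done",
        fallback=lambda plan: "done",
    )
    print(result, history)
```

The point of the sketch is that evaluation and recovery are explicit steps in the loop rather than extra instructions added to a prompt.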
Ouroboros: https://github.com/Q00/ouroboros
Benchmark: https://simulation-bench.fly.dev/
📖 Read the full source: r/ClaudeAI