Structured workflow beats plan mode and superpowers on AI DES benchmark

✍️ OpenClawRadar | 📅 Published: May 1, 2026

A Reddit post shares results from a new benchmark for AI-assisted Discrete-Event Simulation (DES). The submission that used the Ouroboros workflow (ooo) inside Claude Code ranked #1, ahead of both Claude's built-in plan mode and the 'superpowers' fat-skill stacks.

Benchmark details

The benchmark tests end-to-end understanding of a real-world system: a mining haulage operation with trucks, loading points, dumping points, routes, and queues. Submissions are judged on:

  • Comprehending the system structure
  • Abstracting it into a discrete-event simulation model
  • Designing events, state changes, and KPIs
  • Producing executable simulation code
  • Interpreting results (bottlenecks, throughput, waiting times)
  • Generating human-readable artifacts (topology diagrams, animations)
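
To make the criteria above concrete, here is a minimal sketch of what "events, state changes, and KPIs" can look like for a haulage loop. It is not taken from the benchmark or from any submission: the single loader, the unconstrained dump, the fixed durations, and every name and parameter value are assumptions chosen for illustration.

```python
import heapq
from collections import deque


def simulate(n_trucks=3, load_time=5.0, haul_time=8.0, dump_time=2.0,
             return_time=6.0, payload=100.0, horizon=480.0):
    """Toy haulage DES: one loader with a FIFO queue, an unconstrained dump.

    All durations are in minutes and are illustrative assumptions.
    """
    events = []  # min-heap of (time, sequence, kind, truck)
    seq = 0      # tie-breaker so simultaneous events keep insertion order

    def schedule(time, kind, truck):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, truck))
        seq += 1

    queue = deque()      # trucks waiting at the loader: (truck, arrival time)
    loader_busy = False  # state of the single loader
    tonnes = 0.0         # KPI: ore dumped
    total_wait = 0.0     # KPI input: time spent queueing at the loader
    starts = 0           # number of loads started

    for truck in range(n_trucks):
        schedule(0.0, "arrive_loader", truck)

    while events:
        now, _, kind, truck = heapq.heappop(events)
        if now > horizon:
            break  # ignore anything scheduled past the end of the shift
        if kind == "arrive_loader":
            queue.append((truck, now))
        elif kind == "load_done":
            loader_busy = False
            schedule(now + haul_time, "arrive_dump", truck)
        elif kind == "arrive_dump":
            schedule(now + dump_time, "dump_done", truck)
        elif kind == "dump_done":
            tonnes += payload
            schedule(now + return_time, "arrive_loader", truck)

        # State change: start the next load whenever the loader is free.
        if not loader_busy and queue:
            next_truck, arrived = queue.popleft()
            total_wait += now - arrived
            starts += 1
            loader_busy = True
            schedule(now + load_time, "load_done", next_truck)

    return {
        "tonnes": tonnes,
        "throughput_tph": tonnes / (horizon / 60.0),
        "mean_wait": total_wait / starts if starts else 0.0,
    }
```

With one truck and a 42-minute horizon, `simulate(n_trucks=1, horizon=42.0)` completes two 21-minute cycles and reports 200 tonnes with no queueing. Adding trucks raises the mean wait at the loader, which is the kind of bottleneck signal the benchmark asks submissions to interpret.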

Ouroboros performance

The Ouroboros submission included working DES code, a topology diagram of the mining system, and an animation of trucks hauling ore. Notably, when the MCP server failed mid-run, Ouroboros fell back to a skills-based path and still finished the task, which shows the kind of recovery and rerouting that real deployments depend on.
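
The fallback behaviour described above can be sketched as an ordered list of execution paths that are tried in turn. This is only an illustration of the pattern, not Ouroboros's actual code: `PathFailed`, `run_with_fallback`, `via_mcp`, and `via_skills` are hypothetical names, and the simulated MCP failure is hard-coded.

```python
class PathFailed(Exception):
    """Raised by an execution path that cannot complete the task (hypothetical)."""


def run_with_fallback(task, paths):
    """Try each (name, callable) path in order; return the first success.

    Returns (path name, result, list of earlier failures).
    """
    errors = []
    for name, run in paths:
        try:
            return name, run(task), errors
        except PathFailed as exc:
            errors.append(f"{name}: {exc}")  # remember why this path was skipped
    raise PathFailed("; ".join(errors))      # every path failed


def via_mcp(task):
    # Stand-in for a tool-server path that fails mid-run.
    raise PathFailed("MCP server unavailable")


def via_skills(task):
    # Stand-in for a skills-based path that completes the same task.
    return f"completed {task!r} via skills"


name, result, errors = run_with_fallback(
    "haulage DES", [("mcp", via_mcp), ("skills", via_skills)]
)
```

Here `name` ends up as `"skills"` and `errors` records the one MCP failure, so the run finishes while still keeping a trace of what went wrong.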

Comparison

  • Plan mode (lightweight planning) — decent baseline
  • Superpowers / fat-skill stacks — worse than plan mode on this task
  • Ouroboros (structured: clarify → plan → execute → evaluate → recover → iterate) — best

The result suggests that structuring the workflow around problem definition, planning, execution, evaluation, and recovery is more effective than piling on more instructions and bigger skills.
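
As a rough illustration of that staged structure, the loop below wires the six stages together as plain callables. It is a sketch under assumptions, not the Ouroboros implementation: the `Run` record, the `workflow` function, its parameters, and the `max_iters` limit are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical record of one workflow run."""
    goal: str
    plan: list = field(default_factory=list)
    result: str = ""
    log: list = field(default_factory=list)


def workflow(goal, clarify, plan, execute, evaluate, recover, max_iters=3):
    """clarify -> plan -> execute -> evaluate -> recover -> iterate."""
    run = Run(goal=clarify(goal))              # pin down the problem first
    for i in range(max_iters):
        run.plan = plan(run.goal)
        try:
            run.result = execute(run.plan)
        except Exception as exc:               # execution broke: reroute
            run.log.append(f"iter {i}: recover from {exc}")
            run.result = recover(run.plan, exc)
        if evaluate(run.result):               # accept only checked output
            run.log.append(f"iter {i}: accepted")
            return run
        run.log.append(f"iter {i}: rejected, iterating")
    return run                                 # best effort after max_iters
```

The point of the shape is that evaluation and recovery are explicit steps in the loop rather than extra instructions in a longer prompt, which is the contrast the post draws with fat-skill stacks.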

Ouroboros: https://github.com/Q00/ouroboros
Benchmark: https://simulation-bench.fly.dev/

📖 Read the full source: r/ClaudeAI
