Opus 4.7 Reasoning Effort Benchmark: Medium Beats High and Max on Real Tasks
Reddit user ktane tested Claude Opus 4.7 in Claude Code across five reasoning effort settings (low, medium, high, xhigh, max) on 29 real tasks from the open-source GraphQL-go-tools repository. The result: medium reasoning effort consistently outperformed higher settings on test pass rate, semantic equivalence with human-authored patches, code-review pass rate, and aggregate craft/discipline scores.
Key Results
- All-task pass rate: Medium 28/29, Max 27/29, High 26/29, Xhigh 25/29, Low 23/29
- Equivalent patches: Medium 14/29, Max 13/29, High 12/29, Xhigh 11/29, Low 10/29
- Code-review pass rate: Medium 10/29, Max 8/29, High 7/29, Low 5/29, Xhigh 4/29
- Code-review rubric mean: Medium 2.716, High 2.509, Xhigh 2.482, Max 2.431, Low 2.426
- Footprint risk (lower is better): Low 0.155, Medium 0.189, High 0.206, Max 0.227, Xhigh 0.238
- Cost per task: Low $2.50, Medium $3.15, High $5.01, Xhigh $6.51, Max $8.84
- Duration per task: Low 383.8s, Medium 450.7s, High 716.4s, Xhigh 803.8s, Max 996.9s
- Equivalent passes per dollar: Low 4.0, Medium 4.4, High 2.4, Xhigh 1.7, Max 1.5
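The "equivalent passes per dollar" figures can be reproduced from the other numbers in this list. The sketch below assumes the metric is the count of equivalent patches divided by the mean cost per task. That formula is an inference from the published figures, not something the author states, and the variable and function names are illustrative only.

```python
# Figures copied from the results list above.
equivalent_patches = {"low": 10, "medium": 14, "high": 12, "xhigh": 11, "max": 13}
cost_per_task_usd = {"low": 2.50, "medium": 3.15, "high": 5.01, "xhigh": 6.51, "max": 8.84}


def passes_per_dollar(equivalent, cost):
    """Assumed formula: equivalent-patch count divided by mean cost per task."""
    return {effort: round(equivalent[effort] / cost[effort], 1) for effort in equivalent}


ratios = passes_per_dollar(equivalent_patches, cost_per_task_usd)
for effort, value in ratios.items():
    print(f"{effort:>6}: {value}")
```

Under that assumption the computed values match the reported ones for all five settings. The same inputs show that max costs roughly 2.8 times as much per task as medium while producing one fewer equivalent patch.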
The author notes that Opus 4.7 uses adaptive thinking: it already allocates reasoning budget per task. The effort knob therefore biases an already-adaptive policy rather than adding raw intelligence. In one PR (#1260), the high and xhigh settings spent their extra reasoning digging up commit hashes from prior PRs and concluded 'no work needed', while medium and max correctly read the control flow and produced a fix.
This contrasts with GPT-5.5 in Codex, which showed the intuitive monotonic curve where more reasoning improved quality. The full interactive report with per-task drilldowns is available at stet.sh.
📖 Read the full source: r/ClaudeAI