Using a smaller model as a runtime hygiene layer improves OpenClaw agent reliability

Problem: Sloppy outputs degrade long-running agents
When the developer ran OpenClaw locally as a household agent on a Mac Studio M4 (36GB) with Qwen 3.5 27B (4-bit, oMLX), the model did not become less capable over time; it became sloppy. Specific issues included:
- Tool calls leaking as raw text instead of structured tool use
- Planning thoughts bleeding into final replies
- Parroting tool results and policy text back to the user
- Malformed outputs poisoning the context, causing degradation with each subsequent turn
The core issue was not capability but runtime hygiene: the model knew what to do but failed to behave cleanly inside the OpenClaw runtime.
Solution: Four-layer architecture for runtime hygiene
The developer implemented a four-layer approach that proved more effective than simply using a larger model:
- Summarization: Context compaction via lossless-claw (DAG-based, freshTailCount=12, contextThreshold=0.60). This provided the single biggest improvement.
- Sheriff: Regex and heuristic checks that catch malformed replies before they enter OpenClaw. This prevents leaked tool markup, planner ramble, and raw JSON from becoming durable context.
- Judge: A smaller, cheaper model that classifies borderline outputs as "valid final answer" vs "junk." Its job is runtime hygiene, not intelligence: it is an immune system rather than a second brain. It also handles all summarization for lossless-claw.
- Ozempic (internal name): Aggressive memory scrubbing that ensures that, on future turns, the model re-reads only user requests, final answers, and compact tool-derived facts, not planner rambling, raw tool JSON, retry artifacts, or policy self-talk.
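To make the Summarization settings concrete, here is a minimal Python sketch of threshold-triggered compaction. It assumes that contextThreshold is the fraction of the context window at which compaction starts and that freshTailCount is the number of most recent messages kept verbatim; `maybe_compact`, `estimate_tokens`, and the `summarize` callback are hypothetical names. It replaces older messages with one flat summary, so it illustrates only the trigger logic, not lossless-claw's DAG-based, lossless structure.

```python
from __future__ import annotations

from typing import Callable

FRESH_TAIL_COUNT = 12      # mirrors freshTailCount=12 (assumed: recent messages kept verbatim)
CONTEXT_THRESHOLD = 0.60   # mirrors contextThreshold=0.60 (assumed: fraction of the window)


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token for English text."""
    return max(1, len(text) // 4)


def maybe_compact(
    messages: list[str],
    context_window: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Summarize everything except the fresh tail once usage crosses the threshold."""
    used = sum(estimate_tokens(m) for m in messages)
    if used <= CONTEXT_THRESHOLD * context_window or len(messages) <= FRESH_TAIL_COUNT:
        return messages  # under budget, or nothing old enough to compact
    head, tail = messages[:-FRESH_TAIL_COUNT], messages[-FRESH_TAIL_COUNT:]
    summary = summarize(head)  # in the described setup, the small Judge model writes this
    return [f"[summary] {summary}"] + tail
```

Keeping a verbatim tail means the main model always sees the most recent turns exactly as they happened, while older material arrives only in condensed form.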
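A Sheriff-style gate can be sketched in a few lines of Python. This assumes the leaked markup looks like Qwen-style `<tool_call>` and `<think>` tags or bare tool-call JSON; the patterns, the `Verdict` enum, and `sheriff_check` are hypothetical, not the developer's code. Replies the regexes are sure about are rejected outright, and ambiguous ones are marked borderline so a Judge model can decide.

```python
import json
import re
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    BORDERLINE = "borderline"  # hand off to the Judge model


# Hypothetical patterns; tune them to the markup your model actually leaks.
TOOL_MARKUP = re.compile(
    r"</?tool_call>"                                 # Hermes/Qwen-style tool tags
    r"|</?function_call>"
    r'|"name"\s*:\s*"[\w.-]+"\s*,\s*"arguments"'     # bare tool-call JSON
)
THINK_MARKUP = re.compile(r"</?think>")
PLANNER_OPENERS = re.compile(
    r"\s*(let me|i will|i'll|i should|first,? i|plan:|step \d+:)", re.IGNORECASE
)


def looks_like_raw_json(text: str) -> bool:
    """True if the whole reply parses as a JSON object or array."""
    stripped = text.strip()
    if not stripped.startswith(("{", "[")):
        return False
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return False
    return True


def sheriff_check(reply: str) -> Verdict:
    """Cheap regex/heuristic gate applied before a reply is stored as context."""
    if not reply.strip():
        return Verdict.REJECT
    if TOOL_MARKUP.search(reply) or THINK_MARKUP.search(reply):
        return Verdict.REJECT  # leaked tool call or reasoning markup
    if looks_like_raw_json(reply):
        return Verdict.REJECT  # raw tool output parroted back to the user
    if PLANNER_OPENERS.match(reply):
        return Verdict.BORDERLINE  # could be planner ramble or a legitimate answer
    return Verdict.ACCEPT
```

Regexes are cheap but blunt, which is why this sketch never rejects on the planner heuristic alone; it only escalates.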
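The Ozempic layer amounts to a filter over the stored history. A minimal Python sketch, assuming each message carries a `final` flag for replies actually delivered to the user and an optional `fact` distilled from tool output (both hypothetical fields, as is `scrub_history`):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    role: str                # "system", "user", "assistant" or "tool"
    content: str
    final: bool = False      # hypothetical: assistant reply actually delivered to the user
    fact: str | None = None  # hypothetical: compact fact distilled from a tool result


def scrub_history(history: list[Message]) -> list[Message]:
    """Keep only what future turns should re-read; drop everything else."""
    kept: list[Message] = []
    for msg in history:
        if msg.role in ("system", "user"):
            kept.append(msg)
        elif msg.role == "assistant" and msg.final:
            kept.append(msg)
        elif msg.role == "tool" and msg.fact:
            # Re-emit only the distilled fact as a short note. Which role your
            # chat template accepts for such notes is an assumption to check.
            kept.append(Message(role="system", content=f"[tool fact] {msg.fact}"))
        # Everything else (planner turns, raw tool JSON, retries) is dropped.
    return kept
```

The design choice here is to whitelist what survives rather than blacklist known junk, so new kinds of sloppy output are dropped by default.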
Why this beats using a bigger model
A single model must simultaneously solve tasks, maintain formatting discipline, keep its context coherent, avoid poisoning itself with its own outputs, and recover from bad outputs, which is especially hard at the quantization levels used for local inference. Splitting these responsibilities, with the main model doing the work and a smaller model maintaining runtime hygiene, proved more effective than adding parameters.
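This division of labor can be sketched as a single turn loop in which the main (worker) model produces replies but never judges them, while a cheap gate and a small judge model decide what gets committed to history. `run_turn`, the `gate` callback, the retry limit, and the judge prompt are all hypothetical; models are plain functions here so the control flow can be read without any inference library.

```python
from __future__ import annotations

from typing import Callable, Optional

Model = Callable[[str], str]  # prompt in, text out; stands in for a local model call
Gate = Callable[[str], str]   # returns "accept", "reject" or "borderline"

JUDGE_PROMPT = (
    "Is the reply below a clean final answer for the user? "
    "Answer with exactly one word, VALID or JUNK.\n\nReply:\n{reply}"
)


def run_turn(
    worker: Model,
    judge: Model,
    gate: Gate,
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """The worker does the task; the gate and judge only decide if its output is clean."""
    for _ in range(max_attempts):
        reply = worker(prompt)
        verdict = gate(reply)
        if verdict == "reject":
            continue  # discarded without ever entering the history
        if verdict == "borderline":
            label = judge(JUDGE_PROMPT.format(reply=reply)).strip().upper()
            if not label.startswith("VALID"):
                continue  # judge classified it as junk
        return reply  # only this reply is committed to history
    return None  # caller decides the fallback, e.g. a short apology
```

Only borderline replies cost a judge call, so the small model stays out of the hot path for replies the cheap gate can already accept or reject.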
Result: Sustained operation without resets
With this setup, the agent went from needing a /new reset every 20–30 minutes to running in a single sustained session on the same Mac Studio M4 with 36GB RAM, fully local with no API calls.
📖 Read the full source: r/LocalLLaMA