OpenClaw Implements Agent History Compression to Reduce Context Usage

Context Management Problem
When running OpenClaw inside Docker, direct code writing by the agent fills context with noise: reading files (5K tokens), writing edits (500 tokens), running tests (200 tokens), and receiving stack traces (3K tokens). A single debug cycle consumes 10K-15K tokens, mostly from console output and stack traces that become useless after bug fixes. With 20-30 debug cycles per session, the entire context window gets consumed by noise.
Brain/Worker Architecture
The solution involves separating responsibilities: OpenClawd (in Docker) acts as the brain for planning, breaking work into subtasks, delegating, and coordinating. A local worker on the macOS host, powered by Qwen3.5-27B running on Apple Silicon via MLX with zero cost, serves as the hands for reading files, writing code, running tests, and debugging. This keeps noisy back-and-forth in the worker's context, with the brain only seeing final results like "task done, here are the files that changed."
Compression Strategy
Even with the brain/worker split, the orchestrator's context still fills up with operating docs: AGENTS (~6.6K tokens), SOUL (~1.5K tokens), LESSONS (~10K tokens), and plans/walkthroughs (~13K tokens on disk), totaling 20K-30K tokens before any work begins. Sessions can reach 100K-200K tokens.
The key insight: finished work doesn't need raw detail. Once a subtask is completed, its raw history becomes dead weight. The agent only needs to know: what was the task, did it succeed, what files changed, and any errors.
Implementation Details
Step 1: Detect lifecycle boundaries - The orchestrator decomposes work into subtasks with lifecycles: Spawn (agent calls sessions_spawn or delegate_task), Execute (tool calls, reasoning), and Complete (System Message "subagent 'task_name' completed"). A 4-pass scanner walks the session JSONL:
- Pass 1: Find spawn events
- Pass 2: Find spawn errors
- Pass 3: Find completion markers
- Pass 4: Compute tokens count and duration per lifecycle
This identifies message ranges belonging to completed subtasks.
Step 2: Summarize in "agent-language" (masking) - Summaries are generated to look like normal agent output to maintain compatibility with the orchestrator's expected message format (roles, content blocks, tool call structures, parent-child ID chains). These masked summaries replace raw task history.
Example compacted task summary:
── COMPACTED TASK ── origin: agent task: Implement idle timeout for MLX server outcome: success result: Added 5-min idle timer to MlxServerManager. Server auto-unloads when no requests received. files+: src/services/mlx_idle_monitor.py files~: src/services/mlx_server.py, config.json errors: none tried_and_failed: threading.Timer — race condition must_remember: MLX server must only reload on explicit worker request, not any tool call ─────────────────
This ~100 token summary replaces 5K tokens of raw tool calls and reasoning (99.2% reduction). Summaries are generated by a cheap LLM (Gemini Flash Lite or local MLX), with fallback mechanisms if generation fails.
📖 Read the full source: r/openclaw
👀 See Also

Snip: Open-source tool reduces Claude Code token usage with YAML filters
Snip is a Go-based tool that sits between Claude Code and the shell, filtering verbose command output through declarative YAML pipelines to reduce token usage by 60-90%. It includes 16 composable pipeline actions and works with multiple AI coding agents.

Custom status line for Claude Code shows context usage, rate limits, and token counts at a glance
A custom script adds a persistent status line to Claude Code, displaying context %, 5-hour rate limit %, KV cache reads, cumulative input/output tokens, model name, and working directory — color-coded for dark terminals.

Alternative AI Coding Setup After Claude Price Increase
A developer shares their current AI coding setup using GPT 5.4 as the primary model, Codex as a fallback included in ChatGPT subscription, and Minimax 2.7 as a backup with coding plan pricing.

Rowboat: Open-Source AI Coworker with Knowledge Graph Memory
Rowboat is an open-source app that transforms your work into a living knowledge graph, storing data locally as Markdown, and offering AI-driven local assistance.