OpenClaw Agent History Compression: Reduce 1M Tokens to 30K

Context Management Problem

When running OpenClaw inside Docker, direct code writing by the agent fills context with noise: reading files (5K tokens), writing edits (500 tokens), running tests (200 tokens), and receiving stack traces (3K tokens). A single debug cycle consumes 10K-15K tokens, mostly from console output and stack traces that become useless after bug fixes. With 20-30 debug cycles per session, the entire context window gets consumed by noise.

Brain/Worker Architecture

The solution involves separating responsibilities: OpenClawd (in Docker) acts as the brain for planning, breaking work into subtasks, delegating, and coordinating. A local worker on the macOS host, powered by Qwen3.5-27B running on Apple Silicon via MLX with zero cost, serves as the hands for reading files, writing code, running tests, and debugging. This keeps noisy back-and-forth in the worker's context, with the brain only seeing final results like "task done, here are the files that changed."

Compression Strategy

Even with the brain/worker split, the orchestrator's context still fills up with operating docs: AGENTS (~6.6K tokens), SOUL (~1.5K tokens), LESSONS (~10K tokens), and plans/walkthroughs (~13K tokens on disk), totaling 20K-30K tokens before any work begins. Sessions can reach 100K-200K tokens.

The key insight: finished work doesn't need raw detail. Once a subtask is completed, its raw history becomes dead weight. The agent only needs to know: what was the task, did it succeed, what files changed, and any errors.

Implementation Details

Step 1: Detect lifecycle boundaries - The orchestrator decomposes work into subtasks with lifecycles: Spawn (agent calls sessions_spawn or delegate_task), Execute (tool calls, reasoning), and Complete (System Message "subagent 'task_name' completed"). A 4-pass scanner walks the session JSONL:

Pass 1: Find spawn events
Pass 2: Find spawn errors
Pass 3: Find completion markers
Pass 4: Compute tokens count and duration per lifecycle

This identifies message ranges belonging to completed subtasks.

Step 2: Summarize in "agent-language" (masking) - Summaries are generated to look like normal agent output to maintain compatibility with the orchestrator's expected message format (roles, content blocks, tool call structures, parent-child ID chains). These masked summaries replace raw task history.

Example compacted task summary:

── COMPACTED TASK ──
origin: agent
task: Implement idle timeout for MLX server
outcome: success
result: Added 5-min idle timer to MlxServerManager.
Server auto-unloads when no requests received.
files+: src/services/mlx_idle_monitor.py
files~: src/services/mlx_server.py, config.json
errors: none
tried_and_failed: threading.Timer — race condition
must_remember: MLX server must only reload on explicit worker request, not any tool call
─────────────────

This ~100 token summary replaces 5K tokens of raw tool calls and reasoning (99.2% reduction). Summaries are generated by a cheap LLM (Gemini Flash Lite or local MLX), with fallback mechanisms if generation fails.

📖 Read the full source: r/openclaw