Claw Compactor: 14-stage token compression engine for LLM pipelines

OpenClawRadar | Published: March 18, 2026

What is Claw Compactor?

Claw Compactor is an open-source LLM token compression engine built around a 14-stage Fusion Pipeline. Each stage is a specialized compressor — from AST-aware code analysis to JSON statistical sampling to simhash-based deduplication — chained through an immutable data flow architecture where each stage's output feeds the next.
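
To make the simhash-based deduplication mentioned above more concrete, here is a minimal sketch of the general technique. It is not taken from the Claw Compactor source: the function names, the 64-bit fingerprint size, the word tokenizer, and the distance threshold are all assumptions chosen for illustration.

```python
import hashlib
import re
from typing import List, Tuple

BITS = 64  # fingerprint width (assumption, not from the project)


def simhash(text: str) -> int:
    """Return a 64-bit simhash fingerprint of the text's word tokens."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        token_hash = int.from_bytes(digest, "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def dedupe(chunks: List[str], max_distance: int = 3) -> List[str]:
    """Keep the first chunk of every group of near-identical chunks."""
    kept: List[Tuple[int, str]] = []
    for chunk in chunks:
        fingerprint = simhash(chunk)
        if any(hamming_distance(fingerprint, seen) <= max_distance
               for seen, _ in kept):
            continue  # near-duplicate of something already kept
        kept.append((fingerprint, chunk))
    return [chunk for _, chunk in kept]
```

Chunks whose fingerprints differ in only a few bits are treated as near-duplicates, so repeated log lines or re-quoted passages would collapse to one copy. A real stage would likely weight tokens and index fingerprints rather than compare every pair.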

Architecture Details

The Fusion Pipeline includes these stages:

  • QuantumLock → Cortex → Photon → RLE → SemanticDedup → Ionizer
  • LogCrunch → SearchCrunch → DiffCrunch → StructuralCollapse
  • Neurosyntax → Nexus → TokenOpt → Abbrev

Key design principles:

  • Immutable data flow: FusionContext is a frozen dataclass. Every stage produces a new FusionResult; nothing is mutated in place.
  • Gate-before-compress: Each stage has a should_apply() method that inspects content type, language, and role before doing any work. Stages that don't apply are skipped at negligible cost.
  • Content-aware routing: Cortex auto-detects content type (code, JSON, logs, diffs, search results) and language (Python, Go, Rust, TypeScript, etc.); downstream stages then make type-aware compression decisions.
  • Reversible compression: Ionizer stores originals in a hash-addressed RewindStore. The LLM can call a tool to retrieve any compressed section by its marker ID.
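
The principles above can be sketched in a few dozen lines. This is an illustrative sketch, not the project's code: only the names FusionContext, FusionResult, should_apply() and RewindStore come from the description above. Their fields and signatures, the two toy stages, the marker format, and run_pipeline() are assumptions.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FusionContext:
    # Field names are assumptions based on the prose description.
    text: str
    content_type: str = "text"
    language: Optional[str] = None
    role: str = "user"


@dataclass(frozen=True)
class FusionResult:
    stage: str
    text: str
    skipped: bool = False


class RewindStore:
    """Hypothetical hash-addressed store for original text."""

    def __init__(self) -> None:
        self._originals: Dict[str, str] = {}

    def put(self, original: str) -> str:
        marker = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._originals[marker] = original
        return marker

    def get(self, marker: str) -> str:
        return self._originals[marker]


class BlankLineStage:
    """Toy stage: drops blank lines, but only for log content."""
    name = "blank_lines"

    def should_apply(self, ctx: FusionContext) -> bool:
        return ctx.content_type == "logs"

    def apply(self, ctx: FusionContext) -> FusionResult:
        kept = [line for line in ctx.text.splitlines() if line.strip()]
        return FusionResult(self.name, "\n".join(kept))


class StashStage:
    """Toy stage: truncates long text and leaves a retrievable marker."""
    name = "stash"

    def __init__(self, store: RewindStore, max_chars: int = 40) -> None:
        self.store = store
        self.max_chars = max_chars

    def should_apply(self, ctx: FusionContext) -> bool:
        return len(ctx.text) > self.max_chars

    def apply(self, ctx: FusionContext) -> FusionResult:
        marker = self.store.put(ctx.text)
        head = ctx.text[: self.max_chars]
        return FusionResult(self.name, f"{head} [rewind:{marker}]")


def run_pipeline(ctx: FusionContext, stages: list) -> Tuple[FusionContext, List[FusionResult]]:
    results: List[FusionResult] = []
    for stage in stages:
        if not stage.should_apply(ctx):  # gate before doing any work
            results.append(FusionResult(stage.name, ctx.text, skipped=True))
            continue
        result = stage.apply(ctx)
        results.append(result)
        ctx = replace(ctx, text=result.text)  # new context, old one untouched
    return ctx, results
```

Because both dataclasses are frozen, a stage cannot modify its input by accident. The only way forward is dataclasses.replace(), which returns a fresh context for the next stage, and the marker left by StashStage is enough to fetch the original back with RewindStore.get().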

Benchmark Results

Real-World Compression (FusionEngine v7 vs Legacy Regex):

  • Python source: 25.0% compression (3.4x improvement over legacy)
  • JSON (100 items): 81.9% compression (6.5x improvement)
  • Build logs: 24.1% compression (4.4x improvement)
  • Agent conversation: 31.0% compression (5.4x improvement)
  • Git diff: 15.0% compression (2.4x improvement)
  • Search results: 40.7% compression (7.7x improvement)
  • Weighted average: 53.9% compression (5.9x improvement)

SWE-bench Real Tasks:

  • django__django-11620 (4.5K): 14.5% compression
  • sympy__sympy-14396 (5.5K): 19.1% compression
  • scikit-learn-25747 (11.8K): 15.9% compression
  • scikit-learn-13554 (73K): 11.8% compression
  • scikit-learn-25308 (81K): 14.4% compression

vs LLMLingua-2 (ROUGE-L Fidelity):

  • Compression rate 0.3 (aggressive): Claw Compactor 0.653 vs LLMLingua-2 0.346 (+88.2%)
  • Compression rate 0.5 (balanced): Claw Compactor 0.723 vs LLMLingua-2 0.570 (+26.8%)
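
For readers unfamiliar with the metric, ROUGE-L scores a candidate text against a reference by the length of their longest common subsequence (LCS). The sketch below computes a token-level ROUGE-L F1 score. The benchmark's exact variant is not stated in the source, so whitespace tokenization and the F1 weighting here are assumptions, and the function names are illustrative.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using row-by-row DP."""
    prev: List[int] = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens (tokenization is an assumption)."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

In a fidelity comparison like the one above, the reference would presumably be the original text and the candidate the compressed output, so a higher score means more of the original token sequence survives compression in order.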

Quick Start

git clone https://github.com/open-compress/claw-compactor.git
cd claw-compactor
# Benchmark your workspace (non-destructive)
python3 scripts/mem_compress.py /path/to/workspace benchmark
# Full compression pipeline
python3 scripts/mem_compress.py /path/to/workspace full

Requirements: Python 3.9+. Optional: pip install tiktoken for exact token counts.
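
As a rough illustration of why tiktoken is optional, the sketch below counts tokens exactly when tiktoken is available and falls back to an approximation otherwise. The helper names, the cl100k_base encoding, the four-characters-per-token heuristic, and the definition of compression as the percentage of tokens removed are all assumptions. The project's own fallback and metric may differ.

```python
def count_tokens(text: str) -> int:
    """Exact count with tiktoken if usable, otherwise a rough estimate."""
    if not text:
        return 0
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.get_encoding("cl100k_base")  # assumed encoding
        return len(encoding.encode(text))
    except Exception:
        # tiktoken missing or unusable: assume about 4 characters per token.
        return max(1, len(text) // 4)


def compression_percent(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed (assumed meaning of 'compression')."""
    if tokens_before <= 0:
        return 0.0
    return (1.0 - tokens_after / tokens_before) * 100.0
```

Under this definition, shrinking 1,000 tokens to 461 would correspond to 53.9% compression.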

API Usage

from scripts.lib.fusion.engine import FusionEngine

engine = FusionEngine()
result = engine.compress(
    text="def hello(): \n # greeting function \n print('hello')",
    content_type="code",  # or let Cortex auto-detect
    language="python",  # optional hint
)
print(result["compressed"])  # compressed output
print(result["stats"])  # per-stage stats

📖 Read the full source: HN LLM Tools
