Project Headroom: Netflix Engineer's Open Source Tool Slashes AI Token Costs by 90%

✍️ OpenClawRadar · 📅 Published: June 2, 2026 · 🔗 Source

Netflix senior engineer Tejas Chopra has open-sourced Project Headroom, a local proxy that compresses context-window input before it reaches the LLM. Early estimates suggest that up to 90% of tokens are redundant, and since January 2026 the tool has saved users a combined $700,000 across 200 billion tokens.

How It Works

Headroom runs as a local proxy on port 8787 of the developer's machine. To route an LLM CLI through it, you launch the CLI with the headroom wrap command, for example:

headroom wrap codex
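The article doesn't say how headroom wrap hands a CLI's traffic to the proxy. One common pattern for this kind of wrapper, sketched below purely as an assumption, is to start the CLI as a child process with its API base-URL environment variable pointed at the local port. The environment variable names and the wrap/proxied_env helpers are illustrative, not Headroom's code.

```python
import os
import subprocess
import sys

# Port 8787 comes from the article; the loopback host is an assumption.
PROXY_URL = "http://127.0.0.1:8787"

# Assumption: the wrapped CLI honours one of these base-URL variables.
# The source does not say whether Headroom uses this mechanism.
BASE_URL_VARS = ("ANTHROPIC_BASE_URL", "OPENAI_BASE_URL")


def proxied_env(base_env: dict[str, str], proxy_url: str = PROXY_URL) -> dict[str, str]:
    """Return a copy of base_env with each base-URL variable pointed at the proxy."""
    env = dict(base_env)  # copy so the caller's environment is left untouched
    for var in BASE_URL_VARS:
        env[var] = proxy_url
    return env


def wrap(argv: list[str]) -> int:
    """Run argv as a child process whose API traffic targets the local proxy."""
    if not argv:
        print("usage: wrap <command> [args...]", file=sys.stderr)
        return 2
    completed = subprocess.run(argv, env=proxied_env(dict(os.environ)))
    return completed.returncode
```

Calling wrap(["codex"]) would launch the CLI with those variables set. Whether a particular CLI actually reads them is an assumption to check against that tool's own documentation.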

It parses all input (conversation history, logs, tool outputs, files, RAG chunks) and applies lossless, reversible compression. It is most effective on:

Server logs: 90% jettisoned
MCP tool outputs: 70% redundant JSON
Database outputs: repetitive schemas
File trees: repeated metadata
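The article doesn't describe Headroom's actual algorithms. As a minimal illustration of what "lossless, reversible" can mean for two of the categories above, the sketch below (hypothetical function names, not Headroom's API) run-length-encodes repeated log lines and stores same-shaped JSON records as a single header plus value rows. Both transforms can be undone exactly, and character count of compact JSON stands in for a real tokenizer.

```python
import json
from itertools import groupby
from typing import Any


def pack_log(lines: list[str]) -> list[list[Any]]:
    """Collapse runs of identical consecutive log lines into [line, count] pairs."""
    return [[line, len(list(run))] for line, run in groupby(lines)]


def unpack_log(packed: list[list[Any]]) -> list[str]:
    """Expand [line, count] pairs back into the original line sequence."""
    return [line for line, count in packed for _ in range(count)]


def pack_records(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Store same-shaped JSON objects as one column header plus value rows.

    Falls back to the raw list when the objects do not share identical keys
    in the same order, so the transform always stays lossless.
    """
    if not records:
        return {"raw": records}
    columns = list(records[0].keys())
    if any(list(r.keys()) != columns for r in records):
        return {"raw": records}
    return {"columns": columns, "rows": [[r[c] for c in columns] for r in records]}


def unpack_records(packed: dict[str, Any]) -> list[dict[str, Any]]:
    """Rebuild the original list of objects from pack_records output."""
    if "raw" in packed:
        return packed["raw"]
    return [dict(zip(packed["columns"], row)) for row in packed["rows"]]


def size_proxy(obj: Any) -> int:
    """Crude size measure: characters of compact JSON (not a real tokenizer)."""
    return len(json.dumps(obj, separators=(",", ":")))
```

For example, fifty identical "GET /health 200" lines pack into a single ["GET /health 200", 50] pair, and twenty database rows with the same three keys repeat those keys once instead of twenty times. The fallback to the raw list is the design choice that keeps the transform reversible when a schema is not uniform.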

Built in Python and Node, Headroom is currently at v0.22 and has 2,000 GitHub stars and 120 forks.

Why It Matters

Chopra built the tool after routine debugging and refactoring ran up a $287 Claude Sonnet bill. The culprit, he found, wasn't his instructions but boilerplate, JSON schemas, and machine metadata. "This isn’t prose. This isn’t creative writing. This is compressible data masquerading as text," he wrote.

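By default, Claude's prefix cache TTL is only five minutes; after that much inactivity the cache expires and the entire context has to be sent and cached again. You can opt into a longer TTL, but cache writes then cost double, in exchange for reads that are about 90% cheaper. Headroom sidesteps those tradeoffs by shrinking the context itself.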
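To make the tradeoff concrete, here is a back-of-the-envelope cost model. The 2x write and roughly 0.1x read multipliers come from the paragraph above; the $3-per-million-token base price, the 100,000-token prefix, the ten requests, and the 0.1 keep ratio (the article's "up to 90%" best case) are illustrative assumptions, and every request is assumed to land within the cache TTL.

```python
# Multipliers relative to the base input price, taken from the text:
# long-TTL cache writes cost double, cache reads are ~90% cheaper.
LONG_TTL_WRITE_MULT = 2.0
CACHE_READ_MULT = 0.1

# Assumption for illustration only: $3 per million input tokens.
PRICE_PER_TOKEN = 3.0 / 1_000_000


def uncached_cost(prefix_tokens: int, requests: int) -> float:
    """Every request re-sends the whole prefix at full price."""
    return requests * prefix_tokens * PRICE_PER_TOKEN


def long_ttl_cached_cost(prefix_tokens: int, requests: int) -> float:
    """First request writes the cache at 2x; the rest read it at 0.1x."""
    if requests < 1:
        raise ValueError("need at least one request")
    write = prefix_tokens * PRICE_PER_TOKEN * LONG_TTL_WRITE_MULT
    reads = (requests - 1) * prefix_tokens * PRICE_PER_TOKEN * CACHE_READ_MULT
    return write + reads


def compressed_cost(prefix_tokens: int, requests: int, keep_ratio: float) -> float:
    """No caching, but only keep_ratio of the prefix reaches the model."""
    return uncached_cost(round(prefix_tokens * keep_ratio), requests)
```

With those inputs, resending the prefix uncached costs about $3.00, the long-TTL cache about $0.87, and a 90%-compressed prefix with no caching about $0.30. The compressed figure doesn't depend on when the requests arrive, which is the sense in which compression avoids the TTL tradeoff; real savings depend on how compressible a given context is.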

Alternatives

Other tools exist. RTK (Rust Token Killer) trims verbose command output, and LeanCTX is a variant on the same idea. Commercial options such as Token Company (Y Combinator funded) offer compression as a service. Headroom's distinguishing features are its reversible compression and the fact that it stays inside the developer's existing workflow.

📖 Read the full source: HN AI Agents

👀 See Also

Tools

Holisto Seed: A Local LLM Framework with Persistent Identity and Consensual Memory Consolidation

Holisto Seed is a Relational Individuation Framework that gives LLM agents persistent identity, biographical memory, and co-evolutionary relationships with users. It runs fully local with a Git-based versioning system and features a consensual sleep cycle for memory consolidation.

Apr 15, 2026, 11:58 AM UTC

Tools

Essential OpenClaw plugins for developers using AI coding agents

A developer tested OpenClaw plugins and identified essential tools including env-guard for security, commit-guard for preventing bad commits, composio for connecting to 860+ tools, cortex-memory for long sessions, cost-tracker for spending visibility, and openclaw-better-gateway for fixing flaky connections.

Mar 28, 2026, 10:45 AM UTC

Tools

DocMason: Local Agent Knowledge Base for Complex Office Files

DocMason is a repo-native agent app that builds local knowledge bases from complex office documents like PPTX, DOCX, Excel, and PDFs. It runs entirely within Codex or Claude Code, maintaining document structure and providing traceable answers with provenance.

Apr 15, 2026, 05:04 PM UTC

Tools

Alfred Beta Launches: Simplified OpenClaw Alternative for Non-Technical Users

Alfred is a new beta tool that provides approximately 70% of OpenClaw's functionality with significantly reduced complexity, featuring simple defaults for app connections, memory, usage modes, and infrastructure while allowing customization.

Apr 16, 2026, 07:45 PM UTC
