Project Headroom: Netflix Engineer's Open Source Tool Slashes AI Token Costs by 90%

By OpenClawRadar | Published: June 2, 2026

Netflix senior engineer Tejas Chopra has open-sourced Project Headroom, a local proxy that compresses context-window input before it reaches the LLM. Early estimates suggest up to 90% of those tokens are redundant, and since January 2026 the tool has saved users an aggregate $700,000 across 200 billion tokens.

How It Works

Headroom runs as a proxy listening on port 8787 of the developer's machine. To route an LLM CLI through it, you wrap the CLI with the headroom wrap command, for example:

headroom wrap codex
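The article doesn't say how headroom wrap routes a CLI's traffic through the proxy. One common pattern, sketched below in Python under that assumption, is to launch the tool with its API base-URL environment variables pointed at the local port. ANTHROPIC_BASE_URL and OPENAI_BASE_URL are variables read by the official Anthropic and OpenAI SDKs; whether a particular CLI (Codex included) honors them, and what URL path a proxy expects, are assumptions. The helpers wrapped_env and wrap are hypothetical, not Headroom's code.

```python
import os
import subprocess

# Port taken from the article; the URL scheme and path are assumptions.
PROXY_URL = "http://localhost:8787"

# Hypothetical choice: base-URL overrides that many LLM SDKs read. Whether
# `headroom wrap` actually uses this mechanism is an assumption.
BASE_URL_VARS = ("ANTHROPIC_BASE_URL", "OPENAI_BASE_URL")


def wrapped_env(base_env: dict[str, str], proxy_url: str = PROXY_URL) -> dict[str, str]:
    """Return a copy of base_env with every base-URL override pointing at the proxy."""
    env = dict(base_env)  # copy so the caller's environment is left untouched
    for var in BASE_URL_VARS:
        env[var] = proxy_url
    return env


def wrap(command: list[str]) -> int:
    """Run command (e.g. ["codex"]) with its API calls aimed at the local proxy.

    Returns the child process's exit code.
    """
    return subprocess.call(command, env=wrapped_env(dict(os.environ)))
```

Calling wrap(["codex"]) would start Codex with both variables set to http://localhost:8787, so any client that respects them sends its requests to the proxy instead of the provider.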

It parses all input (conversation history, logs, tool outputs, files, RAG chunks) and applies lossless, reversible compression. It is most effective on:

  • Server logs: 90% jettisoned
  • MCP tool outputs: 70% redundant JSON
  • Database outputs: repetitive schemas
  • File trees: repeated metadata
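The article doesn't describe Headroom's compression algorithms. As a minimal illustration of what "lossless, reversible" means for inputs like these, the Python sketch below applies two simple transforms of its own (not Headroom's): collapsing runs of identical log lines into (line, count) pairs, and factoring the repeated keys out of a JSON array of same-shaped records. Each transform has an exact inverse, so the original input can always be reconstructed. All function names are hypothetical.

```python
import json
from itertools import groupby


def collapse_repeats(lines: list[str]) -> list[tuple[str, int]]:
    """Replace each run of identical consecutive lines with one (line, count) pair."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(lines)]


def expand_repeats(pairs: list[tuple[str, int]]) -> list[str]:
    """Exact inverse of collapse_repeats."""
    return [line for line, count in pairs for _ in range(count)]


def columnize(records: list[dict]) -> dict:
    """Store the shared key list once and each record as a row of values.

    Only safe when every record has the same keys in the same order, so any
    other input is rejected rather than silently altered.
    """
    keys = list(records[0]) if records else []
    if any(list(record) != keys for record in records):
        raise ValueError("records do not share a single schema")
    return {"keys": keys, "rows": [[record[k] for k in keys] for record in records]}


def decolumnize(table: dict) -> list[dict]:
    """Exact inverse of columnize: rebuild each record from the keys and its row."""
    return [dict(zip(table["keys"], row)) for row in table["rows"]]


if __name__ == "__main__":
    log = ["GET /health 200"] * 3 + ["GET /api/orders 500"]
    packed = collapse_repeats(log)
    print(packed)
    assert expand_repeats(packed) == log  # round trip restores every line

    records = [
        {"id": 1, "status": "ok", "region": "us-east-1"},
        {"id": 2, "status": "ok", "region": "us-east-1"},
        {"id": 3, "status": "error", "region": "eu-west-1"},
    ]
    table = columnize(records)
    print(len(json.dumps(records)), "->", len(json.dumps(table)), "characters")
    assert decolumnize(table) == records  # round trip restores every record
```

Both transforms only pay off when the input really is repetitive; a single record, for instance, gets slightly longer once its keys move into a separate list.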

Headroom is built in Python and Node. The current version is v0.22, and the repository has 2,000 GitHub stars and 120 forks.

Why It Matters

Chopra was prompted by a $287 Claude Sonnet bill run up during routine debugging and refactoring. The culprit, he found, wasn't his instructions but boilerplate, JSON schemas, and machine metadata. "This isn’t prose. This isn’t creative writing. This is compressible data masquerading as text," he wrote.

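By default, Claude's prefix-cache TTL is only five minutes; after that much inactivity the cached prefix expires and the entire context has to be processed and billed again. You can opt into a longer TTL, but cache writes then cost double in exchange for 90% savings on cache reads. Headroom sidesteps these tradeoffs by shrinking the context itself instead of depending on the cache.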
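The cost tradeoff above can be made concrete with a little arithmetic. The Python sketch below prices repeated sends of the same prefix in units of the base input-token price. The multipliers (1.25x for a cache write with the default five-minute TTL, 2x for a write with the one-hour TTL, 0.1x for a cache hit) are assumptions taken from Anthropic's published prompt-caching pricing at the time of writing and may change; the token counts are illustrative, not measurements.

```python
# Multipliers on the base input-token price. Assumed from Anthropic's published
# prompt-caching pricing at the time of writing; check current pricing.
WRITE_5MIN = 1.25  # cache write, default five-minute TTL
WRITE_1HOUR = 2.0  # cache write, one-hour TTL ("pay double for writes")
READ = 0.1         # cache hit ("save 90% on reads")


def prefix_cost(prefix_tokens: int, requests: int, misses: int, write_price: float) -> float:
    """Cost, in base-price token units, of sending the same prefix `requests` times.

    `misses` requests find no live cache entry (the first call, or any call after
    the TTL lapsed) and pay `write_price`; every other request is a cache hit.
    """
    if not 0 <= misses <= requests:
        raise ValueError("misses must be between 0 and requests")
    hits = requests - misses
    return prefix_tokens * (misses * write_price + hits * READ)


if __name__ == "__main__":
    # Ten requests whose idle gaps exceed five minutes but stay under an hour.
    full, compressed = 100_000, 10_000  # illustrative 90% reduction
    print(prefix_cost(full, 10, misses=10, write_price=WRITE_5MIN))        # every call re-writes
    print(prefix_cost(full, 10, misses=1, write_price=WRITE_1HOUR))        # one write, nine hits
    print(prefix_cost(compressed, 10, misses=10, write_price=WRITE_5MIN))  # smaller prefix, default TTL
```

For a 100,000-token prefix sent ten times with idle gaps between five minutes and an hour, this works out to roughly 1.25 million units with the default TTL (every call misses), about 290,000 with the one-hour TTL, and 125,000 if the prefix is first compressed by 90% and the TTL is left at its default. Shrinking the prefix lowers the bill under either cache policy, which is why compression makes the TTL choice matter less.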

Alternatives

Other tools exist. RTK (Rust Token Killer) trims verbose command output, and LeanCTX is a variant of the same approach. Commercial options such as Token Company (Y Combinator-funded) offer compression as a service. Headroom's distinguishing features are that its compression is reversible and that it stays inside the developer's existing workflow.

Source: HN AI Agents
