How to Compress AI Agent Context: Open-Source Context Gateway Proxy

What Context Gateway Does

Context Gateway is an agentic proxy that sits between AI coding agents (such as Claude Code, OpenClaw, or Cursor) and the LLM API. When tool outputs such as file reads or grep results dump thousands of tokens into the context window, the proxy compresses that content before it reaches the LLM. The motivation is that accuracy on long-context benchmarks drops steeply as context grows: OpenAI's GPT-5.4 evaluation reportedly falls from 97.2% at 32k tokens to 36.6% at 1M tokens.

How the Compression Works

The system uses small language models (SLMs): classifiers trained on model internals detect which parts of the context carry the most signal. When a tool returns output, that output is compressed conditioned on the intent of the tool call. For example, if an agent called grep to look for error-handling patterns, the SLM keeps the relevant matches and strips the rest. If the model later needs something that was removed, it can call expand() to fetch the original output.
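
To make the compress-then-expand flow concrete, here is a minimal Python sketch. It is not Context Gateway's implementation: plain keyword matching stands in for the SLM classifier, and the names ToolOutputStore, compress, and intent_terms are hypothetical. Only the idea of an expand() call that returns the original output comes from the description above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToolOutputStore:
    """Hypothetical store that keeps originals so removed text stays recoverable."""
    originals: dict = field(default_factory=dict)

    def compress(self, output: str, intent_terms: list[str]) -> tuple[str, str]:
        """Return (handle, compressed_text) for one tool output.

        Keyword matching is an assumed stand-in for the SLM classifier.
        """
        handle = uuid.uuid4().hex
        self.originals[handle] = output
        terms = [t.lower() for t in intent_terms]
        lines = output.splitlines()
        kept = [ln for ln in lines if any(t in ln.lower() for t in terms)]
        dropped = len(lines) - len(kept)
        # Tell the model how to get the rest back if it turns out to matter.
        kept.append(f"[{dropped} lines omitted; call expand('{handle}') for the full output]")
        return handle, "\n".join(kept)

    def expand(self, handle: str) -> str:
        """Return the original, uncompressed tool output."""
        return self.originals[handle]


store = ToolOutputStore()
grep_output = "\n".join([
    "src/api.py:12: except ValueError as err:",
    "src/api.py:40: return render(page)",
    "src/db.py:7: raise ConnectionError('db down')",
    "src/db.py:9: rows = cursor.fetchall()",
])
handle, compressed = store.compress(grep_output, ["except", "raise"])
print(compressed)
```

The marker line appended to the compressed text is a design choice of this sketch: it keeps the omission visible to the model, so a later expand() call is possible without any out-of-band bookkeeping.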

Key Features and Setup

Background compaction: Triggered at 85% of context window capacity, with summaries pre-computed so you don't wait for compaction to finish
Lazy-load tool descriptions: The model only sees tools relevant to the current step
Spending caps: Control costs with budget limits
Dashboard: Track running and past sessions
Slack notifications: Get pinged when an agent is waiting on you
Supported agents: Claude Code, Cursor, OpenClaw, or custom configurations
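
The background compaction feature above can be sketched as a threshold check with a summary prepared ahead of time. This is an illustration under assumptions, not the project's code: the CompactionState class, the on_usage method, and the 70% prepare_at point are hypothetical; only the 85% trigger comes from the feature list.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CompactionState:
    window_tokens: int                    # size of the model's context window
    trigger: float = 0.85                 # compaction trigger from the feature list
    prepare_at: float = 0.70              # assumed point to start summarizing early
    ready_summary: Optional[str] = None   # summary computed ahead of the trigger

    def on_usage(self, used_tokens: int, summarize: Callable[[], str]) -> Optional[str]:
        """Return a summary to swap in once the trigger is crossed, else None."""
        fill = used_tokens / self.window_tokens
        if fill >= self.prepare_at and self.ready_summary is None:
            # A real proxy would do this in the background; here it runs inline.
            self.ready_summary = summarize()
        if fill >= self.trigger:
            summary, self.ready_summary = self.ready_summary, None
            return summary
        return None


state = CompactionState(window_tokens=200_000)
for used in (100_000, 150_000, 180_000):
    result = state.on_usage(used, lambda: "summary of earlier turns")
    print(used, result)
```

Because the summary is already available when the trigger is reached, the swap itself does not have to wait on the summarizer, which is the point of pre-computing it.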

Getting Started

Install with:

curl -fsSL https://compresr.ai/api/install | sh

Then run context-gateway to launch an interactive TUI wizard that helps you:

Choose an agent (claude_code, cursor, openclaw, or custom)
Create or edit the configuration, including the summarizer model and API key
Enable Slack notifications if needed
Set the trigger threshold for compression (default: 75%)

The tool is open-source, built primarily in Go (90.9%), and maintained by Compresr, a YC-backed company. You can check compaction logs at logs/history_compaction.jsonl to see what's happening under the hood.
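
For inspecting the compaction log, a short Python sketch follows. The .jsonl extension suggests one JSON object per line, which is what this code assumes. The field names inside each entry are not documented here, so the sketch only counts entries and lists whatever keys it finds. The helper names read_jsonl and summarize_entries are hypothetical.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize_entries(lines: Iterable[str]) -> tuple[int, list[str]]:
    """Count entries and collect the field names that appear in them."""
    count, fields = 0, set()
    for entry in read_jsonl(lines):
        count += 1
        if isinstance(entry, dict):
            fields.update(entry.keys())
    return count, sorted(fields)
```

Since an open file is an iterable of lines, the log can be passed in directly: open logs/history_compaction.jsonl with encoding="utf-8" and call summarize_entries on the file object to see how many entries were written and which fields they contain.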

📖 Read the full source: HN LLM Tools