Agent frameworks waste 350,000+ tokens per session resending static files

Token waste benchmark results
Measurements on a local Qwen 3.5 122B setup revealed that agent frameworks waste more than 350,000 tokens per session by repeatedly resending static files. The source describes these numbers as "unreal."
Optimization approach
According to the post, a compile-time approach cuts the query context from 1,373 tokens to 73 tokens, a reduction of about 95% for that specific context.
The benchmark also found that naive JSON conversion makes the problem 30% worse than the baseline.
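
As a quick check on the arithmetic behind the 1,373 to 73 token figure, the reduction can be computed directly. The function name here is illustrative and not taken from the source.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of tokens removed when context shrinks from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Figures reported in the post: 1,373 tokens before, 73 tokens after.
print(f"{reduction_percent(1373, 73):.1f}%")  # prints 94.7%
```

The exact value is roughly 94.7%, which rounds to the reported 95%.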
Technical context
Agent frameworks typically include system prompts, tool definitions, and other configuration data that remain static across multiple interactions within a session. When this data is resent with every query, it consumes tokens without giving the model any new information. This is particularly costly with large models such as Qwen 3.5 122B, where every processed token adds to both latency and cost.
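
To see how resending static data could add up over a session, here is a minimal sketch. The 5,000-token static block and the 71-turn session are purely hypothetical values chosen for illustration; the source does not break down how its 350,000-token figure was reached.

```python
def resent_static_tokens(static_tokens: int, turns: int) -> int:
    """Tokens spent re-sending an unchanged static block after its first delivery.

    The first turn has to carry the static block, so only the
    remaining (turns - 1) deliveries are counted as waste.
    """
    if turns < 1:
        return 0
    return static_tokens * (turns - 1)


# Hypothetical session: 5,000 static tokens resent on each of 71 turns.
print(resent_static_tokens(static_tokens=5_000, turns=71))  # prints 350000
```

Under these assumed numbers, the waste grows linearly with the number of turns, which is why long agent sessions are hit hardest.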
The compile-time approach likely involves pre-processing static elements so they're referenced rather than resent, similar to how modern web applications cache static assets. For developers working with AI coding agents, reducing this overhead can significantly improve response times and reduce operational costs.
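
The source does not describe the implementation, so the following is only a sketch of the "reference rather than resend" idea under stated assumptions. `StaticContextRegistry`, its `render` method, and the `[file ...]` / `[ref ...]` markers are all hypothetical names invented for this example, and the approach assumes that earlier turns stay available to the model (for example through conversation history or server-side prompt caching).

```python
import hashlib


class StaticContextRegistry:
    """Hypothetical helper: send each static block in full once, then a short reference."""

    def __init__(self) -> None:
        # Digests of static blocks already delivered in this session.
        self._sent: set[str] = set()

    def render(self, name: str, content: str) -> str:
        """Return the full block on first use and a compact reference afterwards."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        if digest in self._sent:
            # Unchanged content: emit only a short marker instead of the body.
            return f"[ref {name}#{digest}]"
        self._sent.add(digest)
        return f"[file {name}#{digest}]\n{content}"


registry = StaticContextRegistry()
tool_defs = "tool: search(query)\ntool: read_file(path)\n" * 20  # stand-in static data

first = registry.render("tools.txt", tool_defs)
second = registry.render("tools.txt", tool_defs)
print(len(first), len(second))  # the second rendering is far shorter than the first
```

Because the reference is keyed on a content hash, an edited file produces a new digest and is sent in full again, so the model never sees a stale reference.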
📖 Read the full source: r/LocalLLaMA