OpenClaw v2026.3.13 adds per-agent cacheRetention config for OpenAI token cost savings

What changed in v2026.3.13
OpenClaw version 2026.3.13 added proper configuration validation for params.cacheRetention on per-agent entries. This allows you to set cache retention declaratively in your openclaw.json configuration file.
The problem with default cache behavior
OpenAI supports extended prompt cache retention (24 hours) via prompt_cache_retention: "24h" in their API, which keeps your prompt prefix cached for 24 hours instead of the default 5-10 minutes. Cached input tokens are billed at 50% off.
If you're running agents on heartbeat cycles longer than 10 minutes (which the source notes is "basically everyone"), your cache goes completely cold between every single turn. This means you're paying full price for the entire input context on every heartbeat.
The source describes a setup with 15 agents on GPT-5.2 with heartbeats every 60-90 minutes where every heartbeat was a guaranteed cold start. The system prompt, bootstrap context, HEARTBEAT.md, AGENTS.md, SOUL.md, tool definitions — all of it was re-sent at full price every cycle because the cache expired in the gap between heartbeats.
How to configure it
You can now set cache retention in your openclaw.json:
{
"agents": {
"list": [
{
"agentId": "my-agent",
"params": {
"cacheRetention": "long"
}
}
]
}
}The "long" value maps to OpenAI's prompt_cache_retention: "24h" through the pi-ai library.
Important caveat: runtime patch required
OpenClaw's resolveCacheRetention() function has a guard clause that blocks OpenAI providers by default. It only allows Anthropic and Bedrock through. So even with the config set, the value gets filtered out before it reaches the API.
You need the runtime patch from issue #27515 to make it work. The patch adds OpenAI to the allowed provider list in the guard clause. Without both the config AND the patch, nothing happens.
The source author notes they had the patch applied for weeks but never set the config value — meaning the patch was checking extraParams?.cacheRetention !== void 0, getting undefined, and still blocking OpenAI. The patch was doing nothing without the configuration.
Cost savings potential
With 15 agents doing heartbeats, each sending ~128K-170K input tokens per turn:
- Without 24h cache: 100% of input tokens at full price, every turn. Cache dies in the ~60-90 min gap between heartbeats.
- With 24h cache: The stable prefix (system prompt, agent config, tool definitions — typically 80-90% of input) stays cached across heartbeats. Those tokens are billed at half price.
On a system running 15 agents across a full business day, that's hundreds of heartbeat cycles per day where the bulk of input tokens shift from full price to half price. The input cost reduction compounds fast.
📖 Read the full source: r/openclaw
👀 See Also

6 Loop Types Found in Production AI Agents: A Week-Long Log Analysis
Analysis of 670 events from 5 production agents over a week reveals 6 high-severity loop patterns including decision oscillation, retry loops, ping pong loops, recall-write loops, reflection loops, and tool non-determinism.

Verification Harness Fixes Claude's Plan Execution Problem
A developer built a 30-50 line bash or Python verification layer that checks whether Claude actually executes each step of its own plans by verifying artifacts like file existence, API responses, and config changes.

Using Dictation Tools for More Effective AI Agent Instructions
A developer found that switching from typed to spoken instructions for OpenClaw improved output quality by providing more natural, detailed context, using SaySo.ai as a dictation tool.

Yes Flow/No Flow: A Simple Technique to Reduce Context Hallucination in AI Coding Sessions
A Reddit user shares the Yes Flow/No Flow technique for maintaining consistency in AI conversations by rewriting prompts instead of stacking corrections, which helps reduce context breakdown and hallucination during long coding sessions.