How to Cut OpenClaw Agent Costs by 80% with Model Switching

A Reddit user spent two weeks manually logging every OpenClaw agent interaction to figure out where their money was going. The results are a clear blueprint for optimizing spend on AI agents.
The Breakdown
Over 14 days on a Telegram + Discord agent, token usage broke down as follows:
- Heartbeats (30-min polls) — 38% of usage. Running on Opus at ~$6.75/M tokens. Complete waste for a status ping.
- File reads and summaries — 29% of usage. Also on Opus. Flash handles these identically.
- Actual conversations — 22% of usage. Here model quality matters.
- Complex tasks — 11% of usage. Where Opus genuinely outperforms Flash.
In total, 67% of usage (heartbeats plus file reads and summaries) went to tasks where DeepSeek V4 Flash ($0.14/M tokens) would deliver the same quality as Opus ($6.75/M tokens, the effective rate after tokenizer).
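To make the arithmetic concrete, here is a rough sketch of the blended price per million tokens under two routing scenarios. It uses only the usage shares and prices quoted above, and it assumes that all four task types originally ran on Opus and that a single per-token price applies to each model (no input/output price split). Both are simplifying assumptions, not details from the original post.

```python
# Prices per million tokens, taken from the article.
OPUS_PRICE = 6.75
FLASH_PRICE = 0.14

# Share of total token usage per task type, from the two-week log.
USAGE_SHARE = {
    "heartbeats": 0.38,
    "file_reads": 0.29,
    "conversations": 0.22,
    "complex_tasks": 0.11,
}


def blended_price(flash_tasks: set[str]) -> float:
    """Average price per million tokens when `flash_tasks` run on Flash
    and every other task type stays on Opus."""
    return sum(
        share * (FLASH_PRICE if task in flash_tasks else OPUS_PRICE)
        for task, share in USAGE_SHARE.items()
    )


def savings(flash_tasks: set[str]) -> float:
    """Fractional cost reduction relative to running everything on Opus."""
    return 1 - blended_price(flash_tasks) / OPUS_PRICE


# Scenario A: only heartbeats and file reads move to Flash.
conservative = savings({"heartbeats", "file_reads"})
# Scenario B: everything except complex tasks moves to Flash.
aggressive = savings({"heartbeats", "file_reads", "conversations"})

print(f"Conservative: {conservative:.0%} cheaper")
print(f"Aggressive:   {aggressive:.0%} cheaper")
```

Under these assumptions the two scenarios come out at roughly 66% and 87% cheaper. A setup that defaults to Flash and escalates to Opus for only some conversations would land somewhere between the two.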
The Fix: Default to Flash, Escalate Only When Needed
Set your primary model to deepseek/deepseek-v4-flash in openclaw.json:
"agents": {
  "defaults": {
    "model": {
      "primary": "deepseek/deepseek-v4-flash"
    }
  }
}
Then use /model anthropic/claude-opus-4-7 mid-session when you hit something truly hard. The switch is instant: no restart, same session. Type /model deepseek/deepseek-v4-flash when you're done to drop back to the cheaper model.
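If you manage several agents, the config change can also be scripted. The sketch below is a minimal example that assumes openclaw.json is strict JSON and uses the agents.defaults.model.primary key path shown above. The helper names `set_primary_model` and `update_config_file` are hypothetical, and the file location is up to you.

```python
import json
from pathlib import Path


def set_primary_model(config: dict, model: str) -> dict:
    """Return a copy of `config` with agents.defaults.model.primary set to `model`.

    Missing intermediate objects are created; all other keys are preserved.
    """
    updated = json.loads(json.dumps(config))  # simple deep copy for JSON data
    node = updated
    for key in ("agents", "defaults", "model"):
        node = node.setdefault(key, {})
    node["primary"] = model
    return updated


def update_config_file(path: Path, model: str) -> None:
    """Rewrite the config at `path` in place (assumes the file is strict JSON)."""
    config = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(
        json.dumps(set_primary_model(config, model), indent=2) + "\n",
        encoding="utf-8",
    )


# In-memory demonstration; no file is touched here.
example = {
    "agents": {"defaults": {"model": {"primary": "anthropic/claude-opus-4-7"}}}
}
print(json.dumps(set_primary_model(example, "deepseek/deepseek-v4-flash"), indent=2))
```

To apply it to a real file, call `update_config_file(Path("openclaw.json"), "deepseek/deepseek-v4-flash")` with the path adjusted to your setup, and back up the file first, since the rewrite drops any comments or non-standard formatting.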
Results
Costs dropped from ~$170/month to ~$35/month, a reduction of roughly 80%. The user reported no noticeable quality difference on heartbeats, file reads, and simple questions.
The user notes that BetterClaw's free tier (with BYOK) now shows per-task API spend, which would have caught the heartbeat waste immediately. But the core move, switching the primary model to Flash and using /model to escalate to Opus only when needed, is the real takeaway.
📖 Read the full source: r/openclaw