Fix Ollama Cloud Model maxTokens: Cap is 16K, Not Config Value

PSA for anyone seeing unexpected EOF from agents on production turns: if your openclaw.json has cloud model entries like { "id": "deepseek-v4-pro:cloud", "maxTokens": 500000 }, that maxTokens isn't real. Ollama cloud caps output at 16,384 tokens server-side regardless of your config. When an agent tries to emit something past that, the upstream kills the socket mid-stream and you see a transport error from ollama.com:443. OpenClaw treats that as a timeout-shaped failover, so it'll try your fallback if one is configured. If the fallback is also a :cloud model, it hits the same wall.
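A quick way to check whether a config is affected is to scan it for :cloud entries whose maxTokens is above the 16,384 cap. This is a minimal sketch, not an OpenClaw tool: the function name `find_overbudget_entries` is hypothetical, and because the full layout of openclaw.json isn't shown here, it walks the whole JSON tree instead of assuming where model entries live.

```python
import json

OLLAMA_CLOUD_OUTPUT_CAP = 16_384  # server-side output cap reported in this post


def find_overbudget_entries(node, cap=OLLAMA_CLOUD_OUTPUT_CAP):
    """Recursively collect (id, maxTokens) for ':cloud' entries above the cap."""
    found = []
    if isinstance(node, dict):
        model_id = node.get("id")
        max_tokens = node.get("maxTokens")
        if (
            isinstance(model_id, str)
            and model_id.endswith(":cloud")
            and isinstance(max_tokens, int)
            and max_tokens > cap
        ):
            found.append((model_id, max_tokens))
        # Keep walking: model entries may be nested anywhere in the config.
        for value in node.values():
            found.extend(find_overbudget_entries(value, cap))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_overbudget_entries(item, cap))
    return found


if __name__ == "__main__":
    # Hypothetical fragment; a real openclaw.json may be laid out differently.
    sample = json.loads("""
    {"models": [
        {"id": "deepseek-v4-pro:cloud", "maxTokens": 500000},
        {"id": "kimi-k2.6:cloud", "maxTokens": 14000}
    ]}
    """)
    for model_id, max_tokens in find_overbudget_entries(sample):
        print(f"{model_id}: maxTokens={max_tokens} exceeds {OLLAMA_CLOUD_OUTPUT_CAP}")
```

To run it against a real file, load it with `json.load(open(path))` and pass the result in; the script only reports entries and leaves the file untouched.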
What Helped
- Fix maxTokens on cloud entries so OpenClaw doesn't ask for output budgets the service won't honor:
{ "id": "deepseek-v4-pro:cloud", "maxTokens": 14000 }
{ "id": "kimi-k2.6:cloud", "maxTokens": 14000 }
14k, not 16k: that leaves a little headroom because models sometimes get weird right at the absolute cap.
- Restructure large structured outputs (long JSON, multi-section content) to emit one section per turn instead of batching everything. That stays under the cap and retries are cleaner.
- Route heavy agents to a direct provider via per-agent model override in agents.list[] instead of going through :cloud. Leave small-output agents on Ollama cloud. One-time setup:
openclaw onboard --auth-choice deepseek-api-key
Then in agents.list override the ones that need it:
"list": [ { "id": "your-agent", "model": "deepseek/deepseek-v4-pro" } ]
Trade-off: per-token billing instead of flat fee, but scoped to agents that need headroom.
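The one-section-per-turn idea can be sketched roughly as follows. This is an illustration only, not OpenClaw's API: `emit_sections` and the `generate` callable are hypothetical stand-ins for however your agent issues a single model turn, and the sketch assumes a killed socket surfaces as Python's built-in `ConnectionError`.

```python
from typing import Callable, Dict, List


def emit_sections(
    sections: List[str],
    generate: Callable[[str], str],
    max_attempts: int = 2,
) -> Dict[str, str]:
    """Request each section in its own turn, retrying only the one that failed."""
    results: Dict[str, str] = {}
    for name in sections:
        last_error = None
        for _ in range(max_attempts):
            try:
                # One small request per section keeps each reply under the cap.
                results[name] = generate(f"Write only the '{name}' section.")
                break
            except ConnectionError as exc:  # assumed shape of the mid-stream failure
                last_error = exc
        else:
            # Every attempt for this section failed; earlier sections stay in results.
            raise RuntimeError(
                f"section {name!r} failed after {max_attempts} attempts"
            ) from last_error
    return results
```

Because each section is its own turn, a failure costs one short retry instead of regenerating the whole document.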
Takeaway
If your agents fail partway through long outputs and you've checked the obvious stuff, look at your provider's actual output cap before going down the OpenClaw-bug rabbit hole. The error message is useless, and the config field doesn't tell you it's being overridden server-side.
📖 Read the full source: r/openclaw