Debugging OpenClaw + Ollama Local Model Timeouts: Five Fixes for Silent Failures

Problem: OpenClaw Agents Silently Failing with Local Ollama Models
A developer debugging OpenClaw 2026.4.2 with Ollama 0.20.2 and the Gemma 4 26B-A4B Q8_0 model on an M4 Max Mac Studio found that agents stopped responding after a /new command, even though the same model answered instantly via ollama run. The logs contained no errors, and the agent showed no typing indicator.
Root Causes and Fixes
- Root Cause #1: Slug Generator Blocking: OpenClaw's session-memory hook runs a slug generator that sends a request to Ollama with a hardcoded 15-second timeout. If the model cannot process OpenClaw's system prompt in time, OpenClaw abandons the request, but Ollama continues processing it, blocking subsequent agent requests.
  Fix: openclaw hooks disable session-memory
- Root Cause #2: Large System Prompt: OpenClaw injects approximately 38,500 characters of system prompt (identity, tools, bootstrap files) per request. Local models require 40-60 seconds for the prefill phase.
  Fix: Add to config to skip bootstrap injection and limit characters:
  { "agents": { "defaults": { "skipBootstrap": true, "bootstrapTotalMaxChars": 500 } } }
  This reduces the prompt to ~19K characters.
- Root Cause #3: Hidden Idle Timeout: OpenClaw has a DEFAULT_LLM_IDLE_TIMEOUT_MS of 60 seconds. If the model doesn't produce a first token within this time, it kills the connection and silently falls back to a fallback model (e.g., Claude Sonnet).
  Fix: Set an undocumented config key:
  { "agents": { "defaults": { "llm": { "idleTimeoutSeconds": 300 } } } }
- Root Cause #4: Ollama Serial Processing: Ollama processes requests serially, so abandoned slug generator requests can hold processing slots.
  Fix: Add to Ollama plist/service config:
  OLLAMA_NUM_PARALLEL=4
- Root Cause #5: Thinking Mode Delay: Gemma 4 defaults to a thinking/reasoning phase that adds 20-30 seconds before the first token.
  Fix: Disable in config:
  { "agents": { "defaults": { "thinkingDefault": "off" } } }
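The way root causes #2, #3, and #5 interact can be checked with simple arithmetic. The sketch below is a minimal illustration, not OpenClaw code: it assumes the prefill and thinking delays run back to back before the first token (the write-up quotes each figure separately and does not say they add up), and the function names are hypothetical.

```python
# Figures quoted in the write-up, in seconds (upper end of each range).
PREFILL_SECONDS = 60        # "40-60 seconds for the prefill phase"
THINKING_SECONDS = 30       # "adds 20-30 seconds before the first token"
DEFAULT_IDLE_TIMEOUT = 60   # DEFAULT_LLM_IDLE_TIMEOUT_MS, expressed in seconds
RAISED_IDLE_TIMEOUT = 300   # idleTimeoutSeconds from the fix for root cause #3


def first_token_delay(prefill: float, thinking: float) -> float:
    """Assumed model: both delays happen sequentially before the first token."""
    return prefill + thinking


def survives_idle_timeout(delay: float, idle_timeout: float) -> bool:
    """True if the first token would arrive before the idle timeout fires."""
    return delay < idle_timeout


if __name__ == "__main__":
    worst_case = first_token_delay(PREFILL_SECONDS, THINKING_SECONDS)
    for label, timeout in (("default", DEFAULT_IDLE_TIMEOUT),
                           ("raised", RAISED_IDLE_TIMEOUT)):
        ok = survives_idle_timeout(worst_case, timeout)
        print(f"{label} timeout {timeout}s, delay {worst_case}s, survives: {ok}")
```

Under that assumption, the quoted delays alone can exceed the 60-second default, which would be consistent with the silent fallback described in root cause #3; raising the timeout and disabling thinking attack the same budget from both sides.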
Full Working Configuration
The developer provided this complete config for a working setup:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gemma4:26b-a4b-it-q8_0",
        "fallbacks": ["anthropic/claude-sonnet-4-6"]
      },
      "thinkingDefault": "off",
      "timeoutSeconds": 600,
      "skipBootstrap": true,
      "bootstrapTotalMaxChars": 500,
      "llm": { "idleTimeoutSeconds": 300 }
    }
  }
}
Additionally, pin the model in memory to prevent unloading between requests:
curl http://localhost:11434/api/generate -d '{"model":"gemma4:26b-a4b-it-q8_0","keep_alive":-1,"options":{"num_ctx":16384}}'
Results and Trade-offs
After applying the fixes, the first message after /new takes about 60 seconds due to system prompt prefill, which is described as unavoidable for local models. Subsequent messages are fast because Ollama caches the KV state. The setup uses 31GB VRAM, 100% GPU, and a 16K context window, running fully local with zero API cost.
The initial delay is the trade-off for complete local operation, privacy, and no cost. The developer notes this is worth it if those factors are prioritized.
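To see where a given setup lands relative to these numbers, the time to first token can be measured directly against Ollama's streaming /api/generate endpoint. The following is a rough sketch rather than anything from the original post: the helper names are hypothetical, the URL and model tag are the ones quoted above, and it treats the first non-empty "response" chunk in the newline-delimited JSON stream as the first token.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Optional


def seconds_to_first_token(
    lines: Iterable[bytes],
    start: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Return seconds elapsed since `start` when the first text chunk arrives.

    `lines` yields newline-delimited JSON objects, as produced by Ollama's
    streaming /api/generate endpoint. Returns None if the stream ends
    without any non-empty "response" field.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            return clock() - start
    return None


def measure_first_token(
    prompt: str,
    model: str = "gemma4:26b-a4b-it-q8_0",
    url: str = "http://localhost:11434/api/generate",
) -> Optional[float]:
    """Send one streaming request to a local Ollama and time the first token."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()  # start before the request so prefill is included
    with urllib.request.urlopen(request, timeout=600) as response:
        return seconds_to_first_token(response, start)
```

Calling measure_first_token twice with the same long prompt prefix is one way to observe the effect described above: if the KV state is reused, the second call should report a much shorter delay than the first.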
📖 Read the full source: r/LocalLLaMA