oMLX introduces SSD KV caching for Apple Silicon, reducing OpenClaw response times from 30-90 seconds to 5 seconds

What oMLX solves
Running OpenClaw locally typically means sending the same massive system prompt (20-30k tokens covering tools, skills, workspace context) on every request. While Ollama and LM Studio cache KV state, they invalidate the entire cache and recompute from scratch when context shifts mid-session, resulting in 30-90 second response times.
oMLX fixes this by persisting KV cache blocks to SSD in safetensors format. When a previously seen prefix returns, it's restored from disk instead of recomputed - working across requests and server restarts. Since OpenClaw's system prompt is mostly static (only timestamps and runtime metadata shift), SSD caching means only changed parts get recomputed.
Performance benchmarks
Tested with Qwen3.5-122B-A10B-4bit on M3 Ultra 512GB:
- Single request benchmarks:
- 1k context: 768 tok/s prompt processing, 56.6 tok/s generation, 65.5 GB peak memory
- 8k context: 940 tok/s prompt processing, 51.4 tok/s generation, 69.3 GB peak memory
- 32k context: 764 tok/s prompt processing, 42.4 tok/s generation, 73.4 GB peak memory
- Continuous batching (pp1024/tg128):
- 1x batch: 56.6 tok/s, 1.00x speedup
- 2x batch: 92.1 tok/s, 1.63x speedup
- 4x batch: 135.1 tok/s, 2.39x speedup
- 8x batch: 190.2 tok/s, 3.36x speedup
Setup with OpenClaw
- Download the DMG from releases and drag to Applications
- Point it at your model directory (reuses LM Studio models, no re-download needed)
- Add oMLX as a custom provider in openclaw.json
- The web dashboard generates the exact config - no terminal needed
Additional features
- Multi-model serving: LLM + embedding + reranker simultaneously
- Tool calling for all major formats (JSON, Qwen, Gemma, GLM) + MCP
- Tool result trimming - truncates oversized tool outputs
- OpenAI + Anthropic /v1/messages drop-in compatibility
- Native macOS menu bar app (not Electron)
- Apache 2.0 license, 100% open source
📖 Read the full source: r/openclaw
👀 See Also

PowerShell Script Automates OpenClaw Docker Setup on Windows
A PowerShell script handles Windows-specific networking quirks and Docker configuration for OpenClaw, automating checks, image retrieval, setup guidance, and container deployment.

SWE-CI: New Benchmark Tests AI Agents on Long-Term Code Maintenance via CI
SWE-CI is a repository-level benchmark that evaluates LLM-powered agents on maintaining codebases through continuous integration cycles, shifting focus from static bug fixing to long-term maintainability across 100 real-world tasks.

AgentSwarms: Free Hands-On Playground for Learning Agentic AI
AgentSwarms offers 5 tracks, 40+ lessons, and 30+ runnable agents for free — no setup or API keys required to start. Learn by building from prompts to multi-agent swarms.

Claude IDE Bridge: Open-source tool gives Claude AI direct access to your code editor
Claude IDE Bridge is an open-source, MIT-licensed tool that connects Claude AI directly to your code editor, allowing it to view open files, unsaved changes, and errors live rather than through pasted code snippets. The tool currently works with VS Code and Windsurf.