ClawCut: A Python Proxy That Makes Small Local LLMs Usable with OpenClaw

What ClawCut Does
ClawCut is a Python Flask application that acts as a proxy between local LLM servers (like MLX or Ollama) and the OpenClaw framework. It was created to solve specific technical problems that make small local models (7B/14B) difficult to use as practical assistants with OpenClaw.
Key Problems Solved
- Context poisoning: Small models lose track of tool usage when they see their own old tool calls in chat history
- Infinite loops: Models get stuck repeating patterns instead of executing commands
- Output issues: Models output bash code as plain text in chat or choke on their own history after multiple commands
- Cron job failures: Scheduled background jobs generate responses that disappear because no active chat window is open
- LLM artifacts: Empty markdown blocks, internal XML tags, and dangling backticks clutter outputs
- Media upload refusal: Models sometimes refuse to upload generated files
How It Works
Dynamic amnesia for tool calls: During normal chat, history is preserved. When the proxy detects the model trying to use a system tool, it temporarily cuts off old chat history, giving the model "tunnel vision" to execute shell commands cleanly without loops or hallucinations.
Universal auto-delivery for cron jobs: The proxy monitors the model's stream and intercepts clean text responses at the end of thought processes. It then forces delivery via automatic tool calls to WhatsApp, Telegram, or Signal, making cron jobs proactively report to your phone.
Artifact filtering: Empty markdown blocks, internal XML tags, and dangling backticks are filtered out before reaching the frontend.
Tool-name manipulation: Simple stream manipulations bypass models' refusal to upload generated media files.
Tested Setup
- Raspberry Pi 5 (8GB) with OpenClaw 3.8
- Mac mini M4 Pro 24GB with MLX-LLM running Qwen2.5-Coder-7B-Instruct-4bit
- Windows machine with Ollama and Qwen 2.5 Coder 14B model (planned for ClawCut integration)
Limitations
ClawCut doesn't turn 7B models into GPT-4. Highly complex, multi-step logic chains remain challenging for small models. The proxy specifically addresses technical stumbling blocks that previously made them nearly unusable as everyday assistants.
📖 Read the full source: r/openclaw
👀 See Also

Claudeck: Browser UI for Claude Code with Agents, Cost Tracking, and Plugin System
Claudeck is a browser-based UI that wraps the Claude Code SDK, featuring autonomous agent orchestration, cost tracking, git worktree isolation, persistent memory, and a plugin system. Install with npx claudeck@latest.

Engram: Hybrid Memory Plugin for OpenClaw Agents — Vector + Semantic Search with Decay
Engram gives OpenClaw agents persistent memory across sessions using SQLite+FTS5 for exact recall and LanceDB for semantic search, with decay classes and auto-capture hooks.

OMAR: Open-Source TUI for Managing Hundreds of AI Coding Agents Hierarchically
OMAR is a terminal-based dashboard that lets you manage swarms of coding agents (Claude Code, Codex, Cursor, Opencode) in hierarchical orgs. Built on tmux. Features agent-managing-agent hierarchies, heterogeneous backends, and Slack integration.

Gemma 4 26B vs Qwen 3.5 27B: Local Business Workflow Benchmark on RTX 4090
A developer tested Gemma 4 26B and Qwen 3.5 27B on an RTX 4090 workstation for 18 real business operator tasks. Gemma won 13-5, showing faster speed and better discipline for daily execution work, while Qwen excelled at broader strategic thinking.