ClawCut Proxy Released on GitHub to Optimize OpenClaw for Small LLMs

ClawCut Proxy is now available on GitHub as an experimental tool designed to optimize OpenClaw's interaction with local LLMs, particularly smaller models that struggle with OpenClaw's default large system prompts and complex tool definitions.
What ClawCut Solves
OpenClaw sends massive system prompts (often >28,000 characters) and complex JSON tool definitions to LLMs. While large cloud models or high-end local models (14B+) handle this well, small models (7B, 8B) running on limited hardware (Mac/MLX or Raspberry Pi) suffer from "Cognitive Overload," leading to:
- Extreme processing latency (slow Time To First Token)
- Models forgetting their identity or available tools
- Hallucinating text answers instead of executing local scripts
- Connection timeouts or malformed JSON responses
- Huge RAM consumption
How ClawCut Works
ClawCut acts as a "Man-in-the-Middle" between OpenClaw and your local LLM server with these optimization features:
- PROMPT TRIMMING: Automatically removes unused default skills from the system prompt to keep the context window small and focused
- SMART AMNESIA: Intelligently truncates chat history after successful tool executions to free up "mental space" for the model
- ATTENTION FORCER: Injects a reminder at the very end of the user query to ensure the model prioritizes tool usage
- TOOL FORCER: Injects keywords for tool calling and points to commands
- INPUT RESCUE: Short-circuits known incoming requests (like Cron-Jobs) to bypass LLM latency and ensure 100% reliability for automated tasks
- BASH-RESCUE: Detects poorly formatted script calls (e.g., naked code blocks) and converts them into valid OpenClaw tool calls on the fly
- Automatically filters dynamic timestamps from system prompts to enable near-instant responses via hardware caching
- Translates between OpenAI-compatible streams (MLX) and the Ollama/NDJSON format expected by OpenClaw
- Real-time console output of prefill duration, token count
Performance and Debugging
ClawCut provides significantly faster response times (TTFT) as the model has less text to process upfront, improved reliability when calling scripts, and robust error handling for stream interruptions or formatting errors. With DEBUG_MODE enabled, you can inspect the full "JSON Clutter" sent by OpenClaw to understand exactly what the model is processing.
When to Use
Ideal for small models (7B-8B) running on hardware like Mac (MLX), Windows, or Linux, especially if your model "chats" too much instead of executing commands. Use with caution if you're using highly intelligent, large models (14B+) that can handle complex prompts natively. In this case, the proxy can act purely as a logger and format translator without manipulating content if PASS_THROUGH_MODE = True.
📖 Read the full source: r/openclaw
👀 See Also

Persistent Side Panel for Claude Code with Autonomous Content Management
A developer built a TUI panel that sits in an iTerm2 split pane next to the terminal, featuring three fixed panels that Claude autonomously manages to show relevant content like code, diagrams, and status updates.

Beacon: Open-Source Endpoint Telemetry for Local AI Agents
Beacon captures local AI agent activity (Claude Code, Codex CLI, Cursor, etc.) and normalizes it into endpoint events for inspection or SIEM forwarding via Wazuh, Elastic, Splunk HEC.

Claude Counter: Android app tracks Claude usage limits with real-time notifications
A developer built Claude Counter, a free Android app that polls Claude's API to display live session and weekly usage limits. The app shows progress bars, provides rich notifications with percentage remaining, and alerts when limits reset.

OpenClaw Skill Connects Agents to Knods.io UI for Workflow Creation
A developer has built an OpenClaw skill that enables agents to understand and create workflows within the Knods.io UI, allowing users to switch between specific agents like brand-specific ones instead of relying on Knods' built-in agent.