ClawCut Proxy Released on GitHub to Optimize OpenClaw for Small LLMs

✍️ OpenClawRadar📅 Published: March 15, 2026🔗 Source
ClawCut Proxy Released on GitHub to Optimize OpenClaw for Small LLMs
Ad

ClawCut Proxy is now available on GitHub as an experimental tool designed to optimize OpenClaw's interaction with local LLMs, particularly smaller models that struggle with OpenClaw's default large system prompts and complex tool definitions.

What ClawCut Solves

OpenClaw sends massive system prompts (often >28,000 characters) and complex JSON tool definitions to LLMs. While large cloud models or high-end local models (14B+) handle this well, small models (7B, 8B) running on limited hardware (Mac/MLX or Raspberry Pi) suffer from "Cognitive Overload," leading to:

  • Extreme processing latency (slow Time To First Token)
  • Models forgetting their identity or available tools
  • Hallucinating text answers instead of executing local scripts
  • Connection timeouts or malformed JSON responses
  • Huge RAM consumption

How ClawCut Works

ClawCut acts as a "Man-in-the-Middle" between OpenClaw and your local LLM server with these optimization features:

  • PROMPT TRIMMING: Automatically removes unused default skills from the system prompt to keep the context window small and focused
  • SMART AMNESIA: Intelligently truncates chat history after successful tool executions to free up "mental space" for the model
  • ATTENTION FORCER: Injects a reminder at the very end of the user query to ensure the model prioritizes tool usage
  • TOOL FORCER: Injects keywords for tool calling and points to commands
  • INPUT RESCUE: Short-circuits known incoming requests (like Cron-Jobs) to bypass LLM latency and ensure 100% reliability for automated tasks
  • BASH-RESCUE: Detects poorly formatted script calls (e.g., naked code blocks) and converts them into valid OpenClaw tool calls on the fly
  • Automatically filters dynamic timestamps from system prompts to enable near-instant responses via hardware caching
  • Translates between OpenAI-compatible streams (MLX) and the Ollama/NDJSON format expected by OpenClaw
  • Real-time console output of prefill duration, token count
Ad

Performance and Debugging

ClawCut provides significantly faster response times (TTFT) as the model has less text to process upfront, improved reliability when calling scripts, and robust error handling for stream interruptions or formatting errors. With DEBUG_MODE enabled, you can inspect the full "JSON Clutter" sent by OpenClaw to understand exactly what the model is processing.

When to Use

Ideal for small models (7B-8B) running on hardware like Mac (MLX), Windows, or Linux, especially if your model "chats" too much instead of executing commands. Use with caution if you're using highly intelligent, large models (14B+) that can handle complex prompts natively. In this case, the proxy can act purely as a logger and format translator without manipulating content if PASS_THROUGH_MODE = True.

📖 Read the full source: r/openclaw

Ad

👀 See Also