Using a Local LLM as a Claude Code Subagent to Reduce Context Usage

A developer on r/LocalLLaMA demonstrates how to use Claude Code to delegate tasks to a local LLM running via LM Studio, reducing Claude's context usage by keeping file content local.
How It Works
The system uses a small Python script (~120 lines, standard library only) that runs an agent loop:
- You pass Claude a task description without file content
- The script sends it to LM Studio's
/v1/chat/completionsendpoint withread_fileandlist_dirtool definitions - The local model calls those tools itself to read the files it needs
- The loop continues until it produces a final answer
- Claude sees only the result, not the file content
Example Usage
python3 agent_lm.py --dir /path/to/project "summarize solar-system.html"
# [turn 1] → read_file({'path': 'solar-system.html'})
# [turn 2] → This HTML file creates an interactive animated solar system...
The file content goes into the local model's context (tested with Qwen3.5 35B 4-bit via MLX on Apple Silicon), not Claude's.
What It's Good For
- Code summarization and explanation
- Bug finding
- Boilerplate / first-draft generation
- Text transformation and translation (tested with Hebrew)
- Logic tasks and reasoning (use
--thinkflag for harder problems)
What It's Not Good For
- Tasks that require Claude's full context, such as multi-file understanding where relationships matter
- Tasks needing the current conversation history
- Anything where accuracy is critical
The author describes it as "a Haiku-tier assistant, not a replacement."
Setup
- LM Studio running locally with the API server enabled
- One Python script for the agent loop, one for simple prompt-only queries
- Both wired into a global
~/.claude/CLAUDE.mdso Claude Code knows to offer delegation when relevant - No MCP server, no pip dependencies, no plugin infrastructure needed
- Recommendation: Add
{%- set enable_thinking = false %}to the top of the jinja template - for most tasks this saves time and tokens without quality degradation
The author notes they had Claude help write the post but with supervision and corrections, and is happy to share the scripts if there's interest.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Open Source Dashboard Reveals Actual Claude Code Compute Costs
A developer reverse-engineered Claude Code's rate limit formula to build a local dashboard that shows real-time usage percentage, actual dollar costs, burn rate, peak hours, and which skills/hooks are firing. The tool revealed a $100/month plan consumed $13,286 in equivalent API compute in one month.

Ktx: An Executable Context Layer to Fix Data Agent Accuracy
Ktx is an open-source executable context layer that makes agents reliable on your data stack by combining Markdown wiki ingestion with YAML semantic definitions.

agentcache: Python Library for Multi-Agent LLM Prefix Caching
agentcache is a Python library that enables multi-agent LLM frameworks to share cached prompt prefixes, achieving up to 76% cache hit rates and cutting inference time by more than half in tests with GPT-4o-mini.

OpenClaw 2026.3.23 adds DeepSeek provider, Qwen pay-as-you-go, and Chrome MCP improvements
OpenClaw v2026.3.23 introduces a DeepSeek provider plugin, Qwen pay-as-you-go pricing, OpenRouter auto pricing with Anthropic thinking order, Chrome MCP tab waiting, and fixes for Discord/Slack/Matrix and Web UI.