Reduce Claude Context with Local LLM Subagent

Claude Code can orchestrate tasks by delegating to a local LLM running on your machine, similar to how it uses Claude subagents. This approach keeps file content out of Claude's context—only the local model's summary and insights are passed back.

How It Works

A small Python script (~120 lines, standard library only) runs an agent loop:

You pass Claude a task description without file content
The script sends it to LM Studio's /v1/chat/completions endpoint with read_file and list_dir tool definitions
The local model calls those tools itself to read the files it needs
The loop continues until it produces a final answer
Claude sees only the result

Example command:

python3 agent_lm.py --dir /path/to/project "summarize solar-system.html"

This results in:

[turn 1] → read_file({'path': 'solar-system.html'})
[turn 2] → This HTML file creates an interactive animated solar system...

The file content goes into the local model's context (tested with Qwen's context), not Claude's.

Use Cases and Limitations

Based on testing with Qwen3.5 35B 4-bit via MLX on Apple Silicon, this approach is good for:

Code summarization and explanation
Bug finding
Boilerplate / first-draft generation
Text transformation and translation (tested with Hebrew)
Logic tasks and reasoning (use --think flag for harder problems)

It's not good for:

Tasks that require Claude's full context
Multi-file understanding where relationships matter
Tasks needing the current conversation history
Anything where accuracy is critical

Think of it as a Haiku-tier assistant, not a replacement for Claude.

Setup Requirements

LM Studio running locally with the API server enabled
One Python script for the agent loop, one for simple prompt-only queries
Both wired into a global ~/.claude/CLAUDE.md so Claude Code knows to offer delegation when relevant
No MCP server, no pip dependencies, no plugin infrastructure needed

Configuration tip: Add {%- set enable_thinking = false %} to the top of the Jinja template. For most tasks, you don't need the local model to reason, and this saves time and tokens while increasing speed with no real degradation in quality for such tasks.

📖 Read the full source: r/ClaudeAI

Using a Local LLM as a Claude Code Subagent to Reduce Context Usage

How It Works

Use Cases and Limitations

Setup Requirements

👀 See Also

Developer shares solution for Claude AI ignoring rules beyond 50-count threshold

Open Source Agent Skill for TypeScript, React, and Next.js Patterns

Femtobot: Efficient Rust Agent for Low-Resource Environments

Query Your Jira Sprint Via Claude MCP: Instant Status, Unassigned Issues, and Blocked Items