Giving Claude a Local LLM as an Assistant via MCP on Mac

A Reddit user detailed how they gave Claude access to a local LLM running on a Mac Mini M4 (24GB RAM) via an MCP connection to Ollama. Ollama serves Qwen 2.5 Coder (14B) as an assistant named 'Frank', to which Claude can delegate tasks under three rules: delegation must use fewer tokens than Claude doing the work itself, it must not degrade quality, and Claude must review the final result.
Setup Details
- Hardware: Mac Mini M4 with 24GB RAM.
- Local LLM: Qwen 2.5 Coder (14B) running via Ollama (also tested with LM Studio).
- Connection: MCP (Model Context Protocol) to link Claude (CLI or Desktop App) with the local model.
- Instructions: Claude was given a memory Markdown file (memory.md) with guidelines for when and how to use Frank, e.g., delegate text processing and large CSS/HTML file handling, and use Frank only when doing so saves tokens without degrading output quality.
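The source does not show the MCP server itself, so the following is only a minimal sketch of the HTTP call such a server would plausibly wrap. It assumes Ollama is listening on its default local port and that the model is pulled under the tag `qwen2.5-coder:14b`; the function names `build_request` and `ask_frank` are hypothetical and not taken from the original post.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumed model tag; run `ollama list` to confirm the exact name on your machine.
MODEL_TAG = "qwen2.5-coder:14b"


def build_request(task: str, text: str, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming generate request for the local model."""
    return {
        "model": model,
        "prompt": f"{task}\n\n---\n{text}",
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def ask_frank(task: str, text: str, timeout: float = 120.0) -> str:
    """Send the task to the local model and return its reply text (hypothetical helper)."""
    payload = json.dumps(build_request(task, text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

An MCP tool exposed to Claude could call something like `ask_frank("List every color value used", css_text)` and hand the reply back for Claude's final review.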
Practical Use Cases
- Text processing and transformation — offloaded to Frank to reduce Claude's token usage.
- Handling large CSS/HTML files that would be expensive for Claude to process directly.
- Running performance, coding, and logic tests — Claude evaluated local models via Frank rather than manually.
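The "only when it saves tokens" rule can be illustrated with a rough back-of-the-envelope check. This is a sketch under stated assumptions, not the author's method: it assumes roughly four characters per token, and it assumes that when Claude delegates, it pays only for its short instruction plus the reply it has to review, rather than for reading the whole input. The names `estimate_tokens` and `should_delegate` are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def should_delegate(input_text: str, instruction: str, expected_reply_chars: int) -> bool:
    """Return True when delegating is estimated to cost Claude fewer tokens.

    direct_cost:    Claude reads the whole input itself.
    delegated_cost: Claude writes a short instruction and reviews the reply.
    """
    direct_cost = estimate_tokens(input_text)
    delegated_cost = estimate_tokens(instruction) + max(1, expected_reply_chars // 4)
    return delegated_cost < direct_cost


# A large stylesheet with a short expected answer favors delegation.
big_css = "x" * 40_000
print(should_delegate(big_css, "Extract all color values", 800))

# A tiny input is cheaper for Claude to handle directly.
print(should_delegate("short", "Rewrite this politely please", 100))
```

The quality rule is not captured here; in the described setup that part is covered by Claude reviewing Frank's output before using it.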
The user noted they are operating at the limits of their RAM/GPU and cannot test larger models (30B+). They invited others with more powerful hardware to try similar setups and share results.
This approach gives Claude an assistant with no per-token API cost: token-heavy tasks run on local hardware, and Claude's final review keeps quality in check.
📖 Read the full source: r/ClaudeAI