Practical experience replacing automation stack with MCP servers and local LLMs

Setup and hardware
The developer runs a mix of Qwen 2.5 32B (quantized) and Llama 3.3 70B on a dual 3090 rig. Each automation task gets its own MCP server that exposes tools the model can call, functioning like an API that an LLM consumes instead of a human.
What works well
- Code review automation: Pointing the model at a git diff via MCP tools catches real issues including logic bugs, missing error handling, and race conditions. Works about 70% as good as a senior dev review.
- Log analysis and alerting: MCP server connects to ELK stack, with the model monitoring for anomaly patterns. It has caught 3 production issues before Grafana alerts fired. The key is giving enough context about what "normal" looks like for your system.
- Documentation generation: Model reads the codebase through MCP file tools and generates/updates API docs, saving hours per week with genuinely good output quality.
What doesn't work (yet)
- Multi-step reasoning chains: Anything requiring more than 3-4 tool calls in sequence starts to go off the rails as the model loses context of the original goal. Smaller context windows make this worse. Chain-of-thought prompting helps but doesn't solve it.
- Real-time decision making: Latency on 70B models means this can't be used for time-sensitive tasks. Code review pipeline takes 2-3 minutes per PR, making it fine for async workflows but useless for real-time applications.
- Creative problem solving: Local models struggle with tasks requiring approaches not well-represented in training data. API models (Claude, GPT-4) are noticeably better here.
Key architectural lessons
- Keep MCP servers stateless. Let the model manage state through tool calls, not server-side session.
- Build retry logic into your MCP client, not the server. Models will make malformed tool calls approximately 5% of the time.
- Log every tool call and response for debugging when the model does something unexpected.
- Use structured output (JSON mode) for anything downstream systems consume. Free-form text output is a debugging nightmare.
📖 Read the full source: r/LocalLLaMA
👀 See Also

OpenClaw bot connects n8n, WordPress, Airtable, and GHL for CRM automation
A non-developer used an OpenClaw bot to connect n8n, WordPress, Airtable, and GoHighLevel environments via Telegram chats, building a CRM and workflow system within a week. The bot consumed significant tokens but proved cheaper than hiring technical help.

Using Claude as a Creative Director in a Sticker Generation Pipeline
A developer built a sticker app where Claude analyzes user-uploaded photos, generates nine sticker concepts, and writes detailed prompts for image models, resulting in personalized stickers rather than generic ones.

From Copy-Paste to Workspace Integration: A Developer's Experience with AI Coding Evolution
A developer describes the transition from early ChatGPT coding attempts with hallucinated libraries and context management issues to Claude Code's workspace integration that reads files directly, eliminating the need for manual context rebuilding.

Financial Analyst Uses Claude Code to Build DCF Model Without Coding Experience
A financial analyst with no terminal experience used Claude Code to build a discounted cash flow model in 20-25 minutes instead of 1-2 days. The tool read financial files and generated a fully structured Excel model with working formulas after the user typed /dcf [company name].