Local AI Agent Workflow Using OpenCode, FastMCP, and DeepSeek-r1

A developer on r/LocalLLaMA describes moving beyond using LLMs as "glorified autocomplete" by implementing a local agentic workflow with OpenCode, FastMCP, and the DeepSeek-r1 model.
AGENTS.md Standard for Deterministic Prompts
The developer uses an AGENTS.md file as a deterministic manual that injects strict rules into the AI's system prompt. Examples include "Use Python 3.9, format with Ruff, absolutely no global variables." This approach aims to eliminate hallucinations from the start.
Local Subagents with DeepSeek-r1
Instead of using cloud APIs like Claude or GPT-4o for trivial tasks, they set up Ollama with the free deepseek-r1 model. They created specific subagents, such as one for testing defined in a pytest.md file. Key configurations include:
- Temperature set to 0.1
- Tools restricted: "pytest": true and "bash": false
This allows the AI to autonomously run test suites, read tracebacks, and fix syntax errors while being blocked from potentially dangerous commands like rm -rf.
FastMCP for Standardized Local Function Exposure
FastMCP is described as "the 'USB-C' of AI"—similar to FastAPI but for AI agents. With about 5 lines of Python, you can spin up a local server to expose secure local functions (like querying a development database) in a standardized way that any OpenCode agent can consume.
A critical implementation tip: route all Python logs to stderr because the MCP protocol runs over stdio. Leaving a standard print() statement can corrupt the JSON-RPC packet and drop the connection.
The developer notes they recorded a video coding this entire architecture from scratch and setting up the local environment in about 15 minutes.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Memtrace: Persistent, Time-Aware Codebase Memory for Claude Code Agents
Memtrace provides always-fresh snapshots and bi-temporal replay for Claude Code agents, using Tree-sitter AST parsing and hybrid retrieval (BM25 + Jina-code embeddings) with zero LLM inference cost during indexing.

Logic Virtual Machine: A Prompt-Based System to Halt LLM Reasoning Collapses
A researcher has developed a Logic Virtual Machine (LVM) prompt that forces LLMs to halt and report specific collapse modes when they encounter paradoxes or reasoning drift, based on a single stability law: K(σ) ⇒ K(β(σ)). The prompt is substrate-independent and works on models like Grok and Claude.

cc-soul plugin adds persistent memory and adaptive personas to OpenClaw
The cc-soul plugin for OpenClaw provides permanent memory storage across sessions, 10 auto-switching personas, and learning from corrections. Installation requires one command with zero configuration.

Using an MCP Server to Optimize React Native Apps with Claude Code
An MCP server streams live runtime data from a React Native app into Claude Code, identifying performance issues like Zustand store thrashing and unnecessary re-renders.