Claude Code Prompt Architecture Reverse-Engineered for Local Models

A GitHub repository contains a complete, legally clean reimplementation of Claude Code's prompting architecture, designed for developers building coding agents on local models.
Key Details
The repository documents the full prompting architecture that Claude Code uses, originally sourced from a brief public npm release. The author studied every prompt and used Claude itself to help rewrite the entire collection from scratch. The result is 26 prompts total covering:
- System prompt structure that actually controls behavior (not just "you are a helpful assistant")
- Tool prompts that prevent the model from using shell when a dedicated tool exists
- Safety rules that gate destructive actions without being overly restrictive
- Memory compression for long sessions (critical for smaller context windows)
- Verification patterns that catch when the model is rationalizing instead of testing
The prompts are organized into categories: system, tools, agents, memory, coordination, and utilities. The prompt patterns are model-agnostic and can be adapted for any model that supports tool use.
Legal Status
Every prompt is independently authored with different wording. The author verified no verbatim copying via automated checks. The repository includes a full legal disclaimer covering nominative fair use, non-affiliation with Anthropic, and a DMCA response policy. This is described as a clean-room style reimplementation, not a copy.
The project is MIT licensed and available at https://github.com/swati510/claude-code-prompts.
This architecture is particularly useful for building agentic workflows with Ollama, llama.cpp, or vLLM.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Hypura: Storage-tier-aware LLM inference scheduler for Apple Silicon
Hypura is a Rust-based inference scheduler that places model tensors across GPU, RAM, and NVMe tiers to run models exceeding physical memory on Apple Silicon Macs. It enables running a 31GB Mixtral 8x7B on a 32GB Mac Mini at 2.2 tok/s and a 40GB Llama 70B at 0.3 tok/s where vanilla llama.cpp crashes.

AIBrain adds persistent memory and self-improvement to Claude Code
AIBrain is a tool that gives Claude Code persistent memory between sessions with semantic search retrieval and self-improvement cycles. It includes 53 workflows, 44 skills, 9 MCP servers, and supports multi-agent mesh networking via Tailscale.

Building a Programming Language with Claude Code: The Cutlet Experiment
Ankur Sethi built a complete programming language called Cutlet using Claude Code over four weeks, with the AI generating every line of code while he focused on guardrails and testing. The language features dynamic typing, vectorized operations, and a REPL, running on macOS and Linux.

HolyCode: Docker Container for Persistent Claude AI Coding Environments
HolyCode is a Docker container that maintains AI coding environment state across machine switches and rebuilds. It includes 30+ preinstalled tools, browser automation with Chromium + xvfb + Playwright, and preserves context in ./data/opencode.