Moving from CLAUDE.md rules to infrastructure enforcement with Citadel

The problem with rule accumulation
When Claude ignored instructions, the instinct was to add more rules to CLAUDE.md. The file started at 45 lines and grew to 190 over three months, yet compliance got worse: instructions past line 100 were treated as suggestions rather than rules. A forensic audit found 40% redundancy: rules restating each other in different words, rules contradicting each other, and rules that were simply outdated. Trimming the file to 123 lines improved compliance immediately.
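The source does not say how the audit was run. As a rough illustration only, a first pass for near-verbatim duplicates could look like the sketch below. The function name `find_near_duplicates`, the 0.8 threshold, and the sample rules are all assumptions made for this example. Lexical matching of this kind will not catch contradictions or rules that say the same thing in entirely different words; those still need a human read.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_near_duplicates(rules, threshold=0.8):
    """Return (i, j, ratio) for rule pairs whose wording is lexically similar."""
    # Normalize case and whitespace so trivial differences do not hide a match.
    normalized = [" ".join(rule.lower().split()) for rule in rules]
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(normalized), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 2)))
    return pairs


# Hypothetical rules, standing in for lines pulled from a CLAUDE.md file.
rules = [
    "Always run the typecheck after editing a file.",
    "Use named exports only.",
    "Always run typecheck after you edit a file.",
]
duplicates = find_near_duplicates(rules)
```

Here the first and third rules are flagged as a pair, which is the kind of candidate a manual audit would then merge or delete.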
The infrastructure shift
The real fix was recognizing CLAUDE.md as an intake point for orientation (project conventions, tech stack, key priorities), not a permanent home for all rules. Everything else should be loaded only when needed. The key shift: moving enforcement from instructions to the environment.
For example, instead of a rule saying "always run typecheck after editing a file," which Claude followed inconsistently, a lifecycle hook script runs automatically on every file save. Typechecking then happens regardless of what the agent chooses to do, so errors surface immediately rather than 20 edits later. Review time dropped dramatically, and review could focus on intent and design instead of chasing type errors.
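A minimal sketch of such a hook follows. It is not the author's script. It assumes a TypeScript project checked with `npx tsc --noEmit`, and it assumes the hook receives a JSON payload carrying the edited path under `tool_input.file_path` and that a return code of 2 reports the error output back to the agent; both should be verified against the Claude Code hooks documentation. The names `run_hook` and `edited_path` are made up for this example.

```python
import subprocess
import sys

TYPECHECK_CMD = ["npx", "tsc", "--noEmit"]  # assumption: a TypeScript project
CHECKED_SUFFIXES = (".ts", ".tsx")


def edited_path(payload):
    """Pull the edited file path out of the hook payload, if present."""
    # Assumed payload shape: {"tool_input": {"file_path": "..."}}
    return payload.get("tool_input", {}).get("file_path", "")


def run_hook(payload, run=subprocess.run):
    """Return an exit code: 0 to continue, 2 to report type errors."""
    path = edited_path(payload)
    if not path.endswith(CHECKED_SUFFIXES):
        return 0  # not a file this hook typechecks
    result = run(TYPECHECK_CMD, capture_output=True, text=True)
    if result.returncode != 0:
        # Write compiler output to stderr so it is surfaced right away.
        print(result.stdout + result.stderr, file=sys.stderr)
        return 2
    return 0
```

A small wrapper would read the payload with `json.load(sys.stdin)` and pass the result of `run_hook` to `sys.exit`. The `run` parameter exists only so the function can be exercised without a compiler installed.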
The progression system
The author outlines a five-level progression:
- Level 1: Raw prompting (nothing persists, same mistakes repeat)
- Level 2: CLAUDE.md (rules help but hit a ceiling around 100 lines)
- Level 3: Skills (modular expertise that loads on demand, zero tokens when inactive)
- Level 4: Hooks (environment enforces quality, not instructions)
- Level 5: Orchestration (parallel agents, persistent campaigns, coordinated waves)
Most projects are fine at Level 2 or 3. The critical insight: when CLAUDE.md stops working, the answer isn't more rules—it's moving enforcement into infrastructure.
Specific implementations
The author implemented three key systems:
- Skills: Markdown files encoding patterns, constraints, and examples for specific domains. The agent loads relevant skills for the current task, avoiding token waste on irrelevant context.
- Campaign files: Structured documents tracking what was built, decisions made, and what remains. These persist across sessions, eliminating daily re-explanations.
- Automated hooks: Typecheck on every edit, anti-pattern scanning on session end, a circuit breaker that kills the agent after 3 repeated failures on the same issue, and compaction protection that saves state before Claude compresses context.
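The circuit breaker in the last item can be pictured with a small counter keyed by failure. This is a sketch under assumptions, not Citadel's implementation: the `CircuitBreaker` class, the idea of a failure "signature," and the example key are all hypothetical, and how the real hook identifies "the same issue" is not described in the source.

```python
from collections import Counter


class CircuitBreaker:
    """Trip once the same failure signature has been seen `limit` times."""

    def __init__(self, limit=3):
        self.limit = limit
        self.failures = Counter()

    def record_failure(self, signature):
        """Count a failure; return True when the agent should be stopped."""
        self.failures[signature] += 1
        return self.failures[signature] >= self.limit

    def record_success(self, signature):
        """Clear the count once the issue is resolved."""
        self.failures.pop(signature, None)


breaker = CircuitBreaker(limit=3)
signature = "src/api.ts: TS2322"  # hypothetical failure key
# The first two failures are tolerated; the third trips the breaker.
tripped = [breaker.record_failure(signature) for _ in range(3)]
```

Keying on a signature rather than counting all failures keeps unrelated errors from tripping the breaker, which matches the "same issue" wording above.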
Citadel: The open-source system
The full system, called Citadel, has been open-sourced at https://github.com/SethGammon/Citadel. It includes the skill system, hooks, campaign persistence, and a /do command that routes tasks to the right orchestration level automatically. It was built from 27 documented failures across 198 agents on a 668K-line codebase, so every rule traces to something that broke.
📖 Read the full source: r/ClaudeAI