Claude Code as Judgment Engine Across Full SDLC

A developer on r/ClaudeAI described a setup, refined over several months, that uses Claude Code (the runtime with tool use and a multi-turn loop) across the full software development lifecycle: tickets, cross-repo implementation, code review, merge requests (MRs), and a persistent knowledge layer.

Key architectural decision: Keep Claude Code out of orchestration. Plain Python handles all mechanical work: Jira API calls, git operations, test runs, lint, file moves. Claude Code is only invoked for judgment — writing code, evaluating review findings, choosing between architectural options. The author found that mixing the two (letting the agent orchestrate via tool use) made the first version slow, expensive, and non-deterministic.

Concrete lifecycle of one ticket:

Python orchestrator: Pulls the Jira ticket, searches the local wiki for related architectural decisions, sets up a worktree on a fresh branch, assembles a 30–50 line implementation brief (acceptance criteria, target files, callers of modified shared functions, relevant standards). Outputs a JSON bundle.
Claude Code: Reads the brief and writes the code. This is the only step with significant token consumption.
Python + review subagent: Runs tests, lint, and format checks. On failure, hands the work back to the implementation agent (max 3 retries). Once the checks pass, dispatches a code-review subagent configured with no Edit or Write permissions, so it can only read and report findings.
Python: Creates a proposal in a dashboard. After manual approval, the orchestrator pushes and creates the MR.
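
The lifecycle above can be sketched as a small control loop in which Python owns the flow and the agent is only called at the judgment points. This is a minimal illustration, not the author's code: the names run_ticket, CheckResult, and TicketRun are hypothetical, the three callables stand in for whatever invokes Claude Code, the test suite, and the review subagent, and "max 3 retries" is read here as one initial attempt plus up to three retries.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_RETRIES = 3  # assumption: one initial attempt plus up to three retries


@dataclass
class CheckResult:
    """Outcome of the mechanical checks (tests, lint, format)."""
    passed: bool
    output: str = ""


@dataclass
class TicketRun:
    """Record of one ticket moving through the pipeline (hypothetical shape)."""
    brief: str
    attempts: int = 0
    status: str = "pending"
    findings: List[str] = field(default_factory=list)


def run_ticket(
    brief: str,
    implement: Callable[[str], None],      # judgment: the agent writes code
    run_checks: Callable[[], CheckResult],  # mechanical: tests, lint, format
    review: Callable[[], List[str]],        # judgment: read-only review subagent
) -> TicketRun:
    run = TicketRun(brief=brief)
    prompt = brief
    for _ in range(1 + MAX_RETRIES):
        run.attempts += 1
        implement(prompt)
        result = run_checks()
        if result.passed:
            break
        # Feed the failure output back to the implementation agent.
        prompt = f"{brief}\n\nChecks failed:\n{result.output}"
    else:
        # Every attempt failed: stop here instead of looping indefinitely.
        run.status = "checks_failed"
        return run
    run.findings = review()
    # Pushing and creating the MR happen only after manual approval.
    run.status = "awaiting_approval"
    return run
```

Because the agent and the checks are passed in as plain callables, the control flow stays deterministic and can be exercised with stubs, without spending tokens.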

Specific Claude-Code techniques that mattered:

Subagent isolation. The review agent runs in its own context window with a deny-list (Edit, Write). Splitting review and implementation caught behavioral changes in shared code that the implementation agent kept missing.
Pre-assembled briefs beat dynamic exploration. Early on, letting Claude Code explore the codebase before implementation ate noticeably more tokens than handing it a focused brief assembled by Python (Jira fetch, wiki search, dependency analysis).
Skill/command routing via YAML rather than letting the agent decide. The mapping from commands such as /ticket, /review, and /standup to orchestrators is explicit, so capabilities are inspectable instead of emergent.
Hooks gate commits. A pre-commit hook runs lint and format before any commit Claude Code attempts. Violations block the commit; the agent must fix them.
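
As an illustration of the explicit routing idea, the sketch below maps commands to orchestrator functions through a plain table. The post keeps this mapping in YAML; a Python dict is used here only to keep the example dependency-free, and the handler names and return values are hypothetical placeholders.

```python
from typing import Callable, Dict


# Placeholder orchestrators; real ones would do the mechanical work
# and call the agent only where judgment is needed.
def ticket_orchestrator(arg: str) -> str:
    return f"ticket:{arg}"


def review_orchestrator(arg: str) -> str:
    return f"review:{arg}"


def standup_orchestrator(arg: str) -> str:
    return f"standup:{arg}"


# The whole capability surface is visible in one table.
ROUTES: Dict[str, Callable[[str], str]] = {
    "/ticket": ticket_orchestrator,
    "/review": review_orchestrator,
    "/standup": standup_orchestrator,
}


def dispatch(command_line: str) -> str:
    """Route a command to its orchestrator, failing loudly on unknown ones."""
    command, _, arg = command_line.strip().partition(" ")
    handler = ROUTES.get(command)
    if handler is None:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"Unknown command {command!r}; known commands: {known}")
    return handler(arg.strip())
```

An unknown command raises an error instead of being handed to the agent to interpret, which is the property the explicit mapping is meant to provide.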

Wiki layer: Markdown pages with three confidence tiers (verified, inferred, human-provided) and field-level staleness thresholds. Without the tiering, agents treat their own past inferences as truth and compound hallucinations into authoritative-looking knowledge.
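
One way to picture the tiering is a record per wiki field that carries its confidence tier and its own staleness threshold, so anything handed to an agent is labeled with how far it can be trusted. This is a sketch under assumptions: the post does not describe its schema, so WikiField, max_age, and render_for_brief are hypothetical names, and the thresholds in the usage note are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List


class Confidence(Enum):
    VERIFIED = "verified"
    INFERRED = "inferred"
    HUMAN_PROVIDED = "human-provided"


@dataclass
class WikiField:
    name: str
    value: str
    confidence: Confidence
    updated_at: datetime
    max_age: timedelta  # field-level staleness threshold

    def is_stale(self, now: datetime) -> bool:
        return now - self.updated_at > self.max_age


def render_for_brief(fields: List[WikiField], now: datetime) -> List[str]:
    """Label every field with its tier (and staleness) before it reaches an agent."""
    lines = []
    for f in fields:
        flags = [f.confidence.value]
        if f.is_stale(now):
            flags.append("stale")
        lines.append(f"{f.name}: {f.value} [{', '.join(flags)}]")
    return lines
```

For example, an inferred field last updated 30 days ago with a 14-day threshold would be rendered with both the inferred and stale labels, so a later agent run can see that it should not be treated as established fact.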

Struggles still being tackled:

Cross-repo features: the agent loses coherence when a feature spans services, even with structured change-set tracking.
Vague tickets: the agent produces reasonable but often wrong implementations from ambiguous specs. The author now flags ambiguous tickets as blockers.
Scope creep: the agent's instinct to over-engineer requires constant calibration via standards and the review agent.
Long sessions: earlier context falls out of effective attention; session-start re-initialization mitigates but doesn't eliminate it.

📖 Read the full source: r/ClaudeAI

Running Claude Code as a Pure Judgment Engine Across the Full SDLC