Two Months with GitHub's Spec-Kit and Claude Code: What Works, What Doesn't

✍️ OpenClawRadar · 📅 Published: May 15, 2026

After two months of using GitHub's spec-kit for Spec-Driven Development (SDD) with Claude Code as the primary agent, a developer on r/LocalLLaMA reports on what works and what doesn't. The toolkit, available at github.com/github/spec-kit, enforces a five-phase workflow: Constitution, Specify, Plan, Tasks, Implement. The core idea: the spec, not the prompt, is the source of truth.

What's Actually Good

  • Agent-agnostic: The same spec works with Claude Code, Cursor, Codex, Gemini CLI, and Copilot. The author generated code with Claude Code, then handed the same spec to Cursor to refactor the tests, without friction.
  • Hard checkpoints between phases: The Plan phase shows the full proposed architecture before any code is written, so a bad decision costs a 5-minute fix at review time instead of 5 hours of rework.
  • Constitution file as quality gate: You define inviolable rules up front, such as test coverage minimums, dependency allowlists, performance budgets, and typing strictness. The agent fails its own validation if it tries to violate them.
  • Improved determinism: Re-running the implement phase produces more consistent output than raw prompting, since the agent isn't filling in 30 implicit decisions.
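
The constitution-as-gate idea can be sketched in a few lines of Python. This is a minimal illustration, not spec-kit's own validation mechanism: the Constitution class, the check_constitution function, and the example allowlist are hypothetical names, and only the 80% coverage threshold comes from the author's template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constitution:
    """Hypothetical in-memory form of two constitution rules."""
    min_coverage: float = 80.0  # percent; coverage must be strictly above this
    allowed_dependencies: frozenset = frozenset({"pytest", "mypy"})  # example allowlist


def check_constitution(rules, coverage, dependencies):
    """Return a list of violations; an empty list means the gate passes."""
    violations = []
    if coverage <= rules.min_coverage:
        violations.append(
            f"coverage {coverage:.1f}% is not above {rules.min_coverage:.1f}%"
        )
    # Any dependency outside the allowlist is reported, in sorted order.
    for name in sorted(set(dependencies) - rules.allowed_dependencies):
        violations.append(f"dependency '{name}' is not on the allowlist")
    return violations
```

Calling check_constitution(Constitution(), 72.5, ["pytest", "requests"]) reports two violations, one for coverage and one for the unlisted dependency. A gate like this only helps if it runs on every implement pass.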

What Annoys

  • Drift is real: Manual code edits made without updating the spec cause the two to fall out of sync quickly. spec-kit has tooling for this, but it is still early.
  • Overhead for small changes: Bug fixes <50 LOC or trivial features feel ceremonial. The author's rule: only full SDD for new modules or features touching 200+ LOC.
  • Legacy migration painful: Retrofitting SDD onto a 30k-LOC codebase takes months.
  • Quality depends on agent: Claude Code (Sonnet/Opus 4.6+) handles it well; smaller models generate plans that compile but lack architectural reasoning.
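
The author's rule of thumb for when the full workflow is worth the overhead can be written as a one-line predicate. This is only a sketch: the function name and parameters are hypothetical, and the 200-LOC threshold is the one quoted above.

```python
def needs_full_sdd(changed_loc: int, is_new_module: bool) -> bool:
    """Apply the author's rule: full SDD for new modules or changes of 200+ LOC."""
    return is_new_module or changed_loc >= 200
```

Under this rule a 40-line bug fix skips the five phases, while a new module of any size goes through them.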

Practical Setup

  • Install: uv tool install --from git+https://github.com/github/spec-kit.git specify-cli. Install only from the official repo; the author warns that PyPI has typosquatted packages.
  • Primary agent: Claude Code, with cross-validation on Cursor and Gemini CLI.
  • Local persistence: SQLite (easy to spec/validate, no cloud dependency).
  • Reusable constitution template: strict typing, pytest coverage >80%, explicit dependency allowlist, no cloud services unless required.
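
One plausible reason SQLite is easy to spec and validate is that the rules can live in the schema itself. The sketch below uses Python's standard sqlite3 module. The tasks table, its columns, and the helper names are hypothetical and are not taken from the author's project.

```python
import sqlite3

# Hypothetical schema: CHECK constraints make the spec's rules enforceable by the database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL CHECK (length(title) > 0),
    status TEXT NOT NULL DEFAULT 'todo'
           CHECK (status IN ('todo', 'doing', 'done'))
);
"""


def open_store(path=":memory:"):
    """Open a local database file (or an in-memory one) and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_task(conn, title, status="todo"):
    """Insert a task and return its row id; invalid rows raise sqlite3.IntegrityError."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO tasks (title, status) VALUES (?, ?)", (title, status)
        )
    return cur.lastrowid
```

A status outside the allowed set is rejected by the database itself, so the spec's constraint stays the same whichever agent generated the calling code.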

Open Questions

  • Can local models (Qwen, DeepSeek-Coder, GLM, Llama) handle Plan and Implement competently? The author found that small models follow the required format but fall short on architectural reasoning.
  • Does multi-agent SDD work? Spec by one model, implement by another, audit by a third: better in theory, but not measurably better than a single agent in practice.

📖 Read the full source: r/LocalLLaMA
