Replicating Anthropic's Generator-Evaluator Harness with Kiro CLI: A 12-Iteration Website Build

By OpenClawRadar · Published: May 17, 2026

A developer used Kiro CLI to replicate Anthropic's GAN-inspired Generator-Evaluator harness design for long-running apps. The architecture runs a Planner once, then a Generator ↔ Evaluator loop for 12 iterations. Each agent is a separate CLI process with zero shared context; the agents communicate only through files (spec.md, eval-report.md). The Evaluator uses Playwright to browse the live site rather than just reading the code.
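The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the developer's actual script: the `run_agent` callback, the prompt wording, and the `make_cli_runner` helper are all hypothetical, and the argv prefix for each agent is left to the caller because the source does not give the exact Kiro CLI invocation.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical runner signature: (role, prompt, workdir) -> None.
AgentRunner = Callable[[str, str, Path], None]


def run_harness(run_agent: AgentRunner, workdir: Path, iterations: int = 12) -> list[str]:
    """Run the Planner once, then Generator -> Evaluator for a fixed number of rounds."""
    spec = workdir / "spec.md"
    report = workdir / "eval-report.md"
    log: list[str] = []

    # The Planner runs exactly once and leaves its output in spec.md.
    run_agent("planner", f"Write the product spec to {spec.name}.", workdir)
    log.append("planner")

    for i in range(1, iterations + 1):
        # The Generator sees only files: the spec, plus the last report if one exists.
        prompt = f"Read {spec.name}"
        if report.exists():
            prompt += f" and {report.name}"
        prompt += ", then improve the site."
        run_agent("generator", prompt, workdir)
        log.append(f"generator-{i}")

        # The Evaluator always runs; there is no early exit on a passing round.
        run_agent(
            "evaluator",
            f"Browse the live site, then overwrite {report.name} with findings.",
            workdir,
        )
        log.append(f"evaluator-{i}")

    return log


def make_cli_runner(commands: dict[str, list[str]]) -> AgentRunner:
    """Build a runner that starts a fresh process per call (assumed argv prefixes)."""

    def run(role: str, prompt: str, workdir: Path) -> None:
        # A new process per invocation means no conversation state carries over.
        subprocess.run([*commands[role], prompt], cwd=workdir, check=True)

    return run
```

Passing the runner in as a callback keeps the loop itself free of any CLI-specific detail, so the same skeleton works with whichever agent command is available.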

Key Architecture Details

  • Clean slate per invocation: Each agent starts fresh and reads only its input files. This prevents "context anxiety," where a model rushes to wrap up as its context window fills.
  • Playwright MCP for testing: Navigates, clicks, resizes viewports. Catches visual bugs code review never would.
  • Anthropic's frontend design skill: Explicitly penalizes generic AI patterns (Inter font, purple gradients, card layouts). Forces creative risk-taking.
  • Continuous iteration, not retry-on-failure: All 12 rounds run regardless of the Evaluator's verdict, and each round builds on the previous evaluation report.
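To make the browser-based evaluation concrete, here is a rough sketch of one check an Evaluator might perform: loading the page at several viewport sizes and measuring horizontal overflow. It is an assumption-heavy illustration. It calls Playwright's Python library directly rather than the Playwright MCP server used in the original setup, and the viewport sizes, the `Finding` type, and the report layout are all invented for the example.

```python
from dataclasses import dataclass

# Illustrative viewport sizes; the source does not say which ones were tested.
VIEWPORTS = {"mobile": (375, 812), "tablet": (768, 1024), "desktop": (1440, 900)}


@dataclass
class Finding:
    viewport: str
    overflow_px: int


def check_overflow(url: str) -> list[Finding]:
    """Load the page at each viewport size and measure horizontal overflow."""
    # Imported here so the report helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    findings: list[Finding] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for name, (width, height) in VIEWPORTS.items():
            page.set_viewport_size({"width": width, "height": height})
            page.goto(url)
            # Content wider than the visible area suggests a responsive layout bug.
            overflow = page.evaluate(
                "document.documentElement.scrollWidth"
                " - document.documentElement.clientWidth"
            )
            findings.append(Finding(name, max(0, int(overflow))))
        browser.close()
    return findings


def to_report(findings: list[Finding]) -> str:
    """Render findings as a short plain-text section for an evaluation report."""
    lines = ["# Responsive check", ""]
    for f in findings:
        status = "OK" if f.overflow_px == 0 else f"FAIL: overflows by {f.overflow_px}px"
        lines.append(f"- {f.viewport}: {status}")
    return "\n".join(lines) + "\n"
```

A check like this only surfaces in a rendered page, which is why reading the source code alone would miss it.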

Results & Stats

Iteration 1 was functional but forgettable. In iteration 4, the Generator pivoted to a "Terminal Noir" aesthetic: IBM Plex Mono, amber on black, grain textures, and scanlines. Iterations 5-12 focused on polish, accessibility, responsive fixes, and reduced-motion support.

  • Total time: 3h 20min
  • Iterations: 12 (one Generator run and one Evaluator run each)
  • Manual code written: 0 lines during the run (a few visual issues were fixed afterward)
  • Tech: Next.js, Tailwind, Framer Motion, TypeScript

Live Result

https://mnemo-mcp.github.io/Mnemo/

Key Takeaway

The model is the engine. The harness, with its constraints, feedback loops, and adversarial structure, determines whether you get AI slop or something genuinely distinctive.

📖 Read the full source: r/ClaudeAI
