Replicating Anthropic's Generator-Evaluator Harness with Kiro CLI: A 12-Iteration Website Build

A developer used Kiro CLI to replicate Anthropic's Generator-Evaluator harness design for long-running apps, a design inspired by GANs. The architecture is a Planner that runs once, followed by a Generator ↔ Evaluator loop that runs for 12 iterations. Each agent is a separate CLI process with zero shared context, and the agents communicate only through files (spec.md, eval-report.md). The Evaluator uses Playwright to browse the live site rather than only reading the code.
Key Architecture Details
- Clean slate per invocation: Each agent starts fresh and reads only its input files, which the author credits with preventing "context anxiety."
- Playwright MCP for testing: Navigates, clicks, resizes viewports. Catches visual bugs code review never would.
- Anthropic's frontend design skill: Explicitly penalizes generic AI patterns (Inter font, purple gradients, card layouts). Forces creative risk-taking.
- Continuous iteration, not retry-on-failure: All 12 rounds run regardless of the Evaluator's verdict, and each round builds on the previous report.
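
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual harness: the names run_harness, make_cli_runner, and command_for are hypothetical, and the command line that starts each agent is left to the caller because the source does not give it. The only things taken from the write-up are the shape of the loop (Planner once, then 12 unconditional Generator and Evaluator rounds), one fresh process per agent, and the file names spec.md and eval-report.md.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A runner launches one agent for the given role inside the working directory.
Runner = Callable[[str, Path], None]


def run_harness(workdir: Path, run_agent: Runner, iterations: int = 12) -> List[str]:
    """Run the Planner once, then alternate Generator and Evaluator.

    Every round runs regardless of what the Evaluator reported, so this is
    continuous iteration rather than retry-on-failure. Returns the order in
    which roles were invoked.
    """
    order: List[str] = []

    # Planner runs a single time and is expected to leave spec.md behind.
    run_agent("planner", workdir)
    order.append("planner")

    for _ in range(iterations):
        # Generator: assumed to read spec.md plus the latest eval-report.md.
        # Evaluator: assumed to browse the running site and overwrite eval-report.md.
        for role in ("generator", "evaluator"):
            run_agent(role, workdir)
            order.append(role)
    return order


def make_cli_runner(command_for: Callable[[str], Sequence[str]]) -> Runner:
    """Build a runner that starts each agent as a brand-new OS process.

    command_for is a hypothetical callback that returns the argv for a role.
    Because each call spawns a separate process, no conversation state
    carries over; the files in workdir are the only shared memory.
    """
    def run(role: str, workdir: Path) -> None:
        subprocess.run(list(command_for(role)), cwd=workdir, check=True)

    return run
```

In a real run, command_for would return whatever command starts the agent CLI with that role's prompt. Passing the runner in as a parameter also makes it possible to dry-run the loop with a stub before spending hours of agent time.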
Results & Stats
Iteration 1 was functional but forgettable. In iteration 4 the Generator pivoted to a "Terminal Noir" aesthetic: IBM Plex Mono, amber on black, grain textures, and scanlines. Iterations 5-12 went to polish, accessibility, responsive fixes, and reduced-motion support.
- Total time: 3h 20min
- Iterations: 12 (generator + evaluator each)
- Manual code written: 0 lines during the run (a few visual issues were fixed by hand afterward)
- Tech: Next.js, Tailwind, Framer Motion, TypeScript
Live Result
https://mnemo-mcp.github.io/Mnemo/
Key Takeaway
The model is the engine. The harness (its constraints, feedback loops, and adversarial structure) determines whether you get AI slop or something genuinely distinctive.
📖 Read the full source: r/ClaudeAI