The Human Creativity Benchmark: Separating Convergence from Divergence in AI Creative Evaluation

Contra Labs' new Human Creativity Benchmark (HCB) tackles a core problem in evaluating AI-generated creative work: creative tasks have no ground truth. Traditional benchmarks treat evaluator disagreement as noise to be resolved via majority voting or adjudication. The HCB instead separates convergence (agreement on shareable best practices) from divergence (genuine differences in aesthetic taste).
Key Findings
- Convergence is high on verifiable axes: prompt adherence, usability, and technical correctness (e.g., legibility, layout).
- Divergence dominates on taste-driven axes: visual appeal, mood, conceptual risk.
- Desktop Apps and Landing Pages show the highest convergence; Ad Video and Brand Assets remain the most divergent.
- No current generative model is reliably both correct (convergent) and steerable (divergent on request).
- Mode collapse is identified as a practical problem: models converge on safe, averaged aesthetics when given the same brief.
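One way to make the mode-collapse finding concrete is to measure how far apart a model's outputs for the same brief sit from one another. The sketch below is illustrative only and is not taken from the HCB: it assumes each output has already been turned into an embedding vector by some unspecified model, and the function names and sample vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    return 1.0 - dot / norm


def mean_pairwise_distance(vectors):
    """Average cosine distance over all pairs of outputs for one brief."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two outputs to compare")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 2-D embeddings of three outputs generated from the same brief.
collapsed = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]]   # near-identical directions
varied = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]      # clearly different directions

collapsed_spread = mean_pairwise_distance(collapsed)
varied_spread = mean_pairwise_distance(varied)
print(f"collapsed: {collapsed_spread:.3f}, varied: {varied_spread:.3f}")
```

A spread close to zero would suggest the outputs are variations on one safe aesthetic. What counts as "too low" depends on the embedding model and the task, so any threshold would need to be calibrated against human judgments.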
Methodology
The HCB defines evaluation axes on a spectrum from objectively verifiable to inherently subjective and measures evaluator agreement on each axis. Convergence reflects shared standards such as visual hierarchy, color contrast, and rendering quality. Divergence captures personal taste, which is essential in creative workflows where professionals need multiple directions to explore and iterate on.
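The summary does not say which agreement statistic the HCB uses, so the following is only a minimal sketch of the general idea. It assumes evaluators cast an "A" or "B" preference on each item and scores each axis by mean pairwise agreement; the axis names, the data, and the choice of statistic are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def pairwise_agreement(votes):
    """Fraction of evaluator pairs that cast the same vote on one item."""
    pairs = list(combinations(votes, 2))
    if not pairs:
        raise ValueError("need at least two votes per item")
    return sum(a == b for a, b in pairs) / len(pairs)


def axis_agreement(ratings):
    """Map each axis name to its mean pairwise agreement across items."""
    return {
        axis: mean(pairwise_agreement(votes) for votes in items)
        for axis, items in ratings.items()
    }


# Hypothetical votes: one inner list per item, one entry per evaluator.
ratings = {
    "prompt_adherence": [["A", "A", "A", "A"], ["B", "B", "B", "A"]],
    "visual_appeal": [["A", "B", "A", "B"], ["A", "B", "B", "A"]],
}

scores = axis_agreement(ratings)
for axis, score in scores.items():
    print(f"{axis}: {score:.2f}")
```

With this toy data the verifiable axis scores well above the taste-driven one. Note that raw agreement is not corrected for chance, so with two options an evenly split panel still agrees on roughly a third of pairs; a chance-corrected statistic would be a reasonable substitute.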
Implications for AI Agents
For developers using AI coding agents, this benchmark underscores that creative tools must offer both reliability (following instructions) and steerability (adjusting to personal taste). The HCB provides a framework for evaluating these two dimensions separately, rather than collapsing them into a single quality score. Agents that cannot produce differentiated output risk being unusable for real creative work.
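To illustrate why separate reporting matters, the sketch below keeps reliability and steerability as two numbers and shows how a blended score can hide the difference between two models. Everything here is assumed for illustration: the `ModelReport` class, the axis and direction names, and the rates are hypothetical and do not come from the HCB.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ModelReport:
    name: str
    # Hypothetical: axis -> share of outputs meeting a shared standard.
    convergent_pass_rates: dict
    # Hypothetical: requested direction -> share of outputs judged to match it.
    steer_hit_rates: dict

    @property
    def reliability(self):
        return mean(self.convergent_pass_rates.values())

    @property
    def steerability(self):
        return mean(self.steer_hit_rates.values())

    @property
    def blended(self):
        # The single averaged score that hides the trade-off.
        return (self.reliability + self.steerability) / 2


safe = ModelReport(
    "safe-model",
    {"prompt_adherence": 0.875, "legibility": 0.875},
    {"playful": 0.25, "austere": 0.25},
)
loose = ModelReport(
    "loose-model",
    {"prompt_adherence": 0.5, "legibility": 0.5},
    {"playful": 0.625, "austere": 0.625},
)

for m in (safe, loose):
    print(f"{m.name}: reliability={m.reliability}, "
          f"steerability={m.steerability}, blended={m.blended}")
```

Both hypothetical models land on the same blended score even though one follows instructions well and barely responds to direction while the other does the reverse, which is the distinction a single quality score loses.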
📖 Read the full source: HN AI Agents