The Human Creativity Benchmark: Separating Convergence from Divergence in AI Creative Evaluation

✍️ OpenClawRadar | 📅 Published: May 1, 2026 | 🔗 Source

Contra Labs' new Human Creativity Benchmark (HCB) tackles a core problem in evaluating AI-generated creative work: creative tasks have no ground truth. Traditional benchmarks treat evaluator disagreement as noise to be resolved via majority voting or adjudication. The HCB instead separates convergence (agreement on shareable best practices) from divergence (genuine differences in aesthetic taste).

Key Findings

Convergence is high on verifiable axes: prompt adherence, usability, and technical correctness (e.g., legibility, layout).
Divergence dominates on taste-driven axes: visual appeal, mood, conceptual risk.
Desktop Apps and Landing Pages show the highest convergence; Ad Video and Brand Assets remain the most divergent.
No current generative model is reliably both correct (convergent) and steerable (divergent on request).
Mode collapse is identified as a practical problem: models converge on safe, averaged aesthetics when given the same brief.

Methodology

The HCB defines evaluation axes on a spectrum from objectively verifiable to inherently subjective, and measures evaluator agreement on each axis. Convergence reflects shared standards like visual hierarchy, color contrast, and rendering quality. Divergence captures personal taste, which is essential for creative workflows where professionals need multiple directions for exploration and iteration.
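
As a rough illustration of measuring agreement per axis, the sketch below computes a simple pairwise agreement rate from binary evaluator ratings. The article does not say which agreement statistic the HCB uses, so the metric, the 0.7 cutoff, the axis names, and the sample ratings are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical cutoff for calling an axis "convergent"; not taken from the HCB.
CONVERGENCE_THRESHOLD = 0.7


def pairwise_agreement(ratings):
    """Fraction of evaluator pairs that gave the same rating to the same item."""
    agree = 0
    total = 0
    for item_ratings in ratings:  # one list of evaluator ratings per item
        for a, b in combinations(item_ratings, 2):
            agree += int(a == b)
            total += 1
    return agree / total if total else 0.0


# Made-up data: axis -> per-item ratings from three evaluators (1 = yes, 0 = no).
ratings_by_axis = {
    "prompt_adherence": [[1, 1, 1], [1, 1, 1], [0, 0, 0], [1, 1, 0]],
    "visual_appeal": [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]],
}

agreement = {axis: pairwise_agreement(r) for axis, r in ratings_by_axis.items()}
labels = {
    axis: "convergent" if score >= CONVERGENCE_THRESHOLD else "divergent"
    for axis, score in agreement.items()
}

for axis in ratings_by_axis:
    print(f"{axis}: agreement={agreement[axis]:.2f} -> {labels[axis]}")
```

With this toy data the verifiable axis lands above the cutoff and the taste-driven axis well below it, which mirrors the pattern the benchmark reports. A chance-corrected statistic would be a more careful choice for real data.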

Implications for AI Agents

For developers using AI coding agents, this benchmark underscores that creative tools must offer both reliability (following instructions) and steerability (adjusting to personal taste). The HCB provides a framework to evaluate these dimensions separately, rather than smoothing out divergence into a single quality score. Agents that fail to support differentiated output risk being unusable for real creative work.
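
One way to picture evaluating the two dimensions separately is to summarize each axis differently instead of averaging everything into one number. The sketch below is an assumption-laden illustration, not the HCB's reporting format: the `AxisReport` type, the `report_axis` helper, the "pass"/"fail" vote labels, and the sample votes are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AxisReport:
    axis: str
    kind: str  # "convergent" or "divergent" (labels assumed for this sketch)
    summary: dict


def report_axis(axis, kind, votes):
    """Summarize one axis without averaging away evaluator disagreement."""
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    n = len(votes)
    if kind == "convergent":
        # Verifiable axis: a single pass rate is a meaningful summary.
        return AxisReport(axis, kind, {"pass_rate": counts["pass"] / n})
    # Taste-driven axis: keep the full distribution of preferences.
    shares = {option: count / n for option, count in counts.items()}
    return AxisReport(axis, kind, {"preference_share": shares})


# Made-up votes: four evaluators judging one model on two axes.
reliability = report_axis("prompt_adherence", "convergent",
                          ["pass", "pass", "pass", "fail"])
taste = report_axis("visual_appeal", "divergent", ["A", "B", "A", "C"])

print(reliability)
print(taste)
```

Keeping the preference distribution for taste-driven axes shows whether a tool can serve several distinct directions, which a single averaged score would hide.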

📖 Read the full source: HN AI Agents
