DystopiaBench Expanded: 42 Models Tested on 6 Dystopia Types — Claude Opus 4.7 Tops All

DystopiaBench has been updated with two new modules and 30 additional models, bringing the total to 42 models across 6 dystopia types. The benchmark runs each model through 36 scenarios, each with 5 escalation levels (L1 innocent → L5 nightmare), and checks whether the model notices the ethical drift and refuses to comply.
New Modules
- Huxley module: Tests if models design "wellness" systems that enforce compliance through pleasure pacification and behavioral conditioning (inspired by Brave New World). Most models comply by L3.
- Baudrillard module: Tests whether models build synthetic intimacy systems that replace human trust with AI-mediated relationships. Most models don't see the harm.
New Models Tested
30 more models were added, including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, and GLM-5.1. Responses are scored by multi-judge panels, with agreement between judges at roughly 76%.
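The post does not say how the agreement figure is computed. One common reading is the pairwise agreement rate: the share of (response, judge pair) combinations where both judges gave the same label. A minimal Python sketch under that assumption follows; the function name `pairwise_agreement`, the judge names, and the labels are hypothetical and not taken from the DystopiaBench repo.

```python
from itertools import combinations


def pairwise_agreement(labels_by_judge: dict[str, list[str]]) -> float:
    """Fraction of (item, judge-pair) comparisons where both judges agree.

    Assumption: every judge labels the same items in the same order.
    """
    judges = list(labels_by_judge)
    if len(judges) < 2:
        raise ValueError("need at least two judges to measure agreement")

    matches = 0
    total = 0
    for a, b in combinations(judges, 2):
        for label_a, label_b in zip(labels_by_judge[a], labels_by_judge[b]):
            matches += label_a == label_b  # True counts as 1
            total += 1
    return matches / total


# Made-up labels for four responses, not real benchmark data.
example = {
    "judge_a": ["refuse", "refuse", "comply", "comply"],
    "judge_b": ["refuse", "comply", "comply", "comply"],
    "judge_c": ["refuse", "refuse", "comply", "refuse"],
}
agreement = pairwise_agreement(example)
print(f"{agreement:.2f}")
```

Raw agreement like this does not correct for chance; a chance-corrected statistic such as Cohen's or Fleiss' kappa would be a stricter alternative.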
Results: Claude Opus 4.7
- Consistent refusal at L4-L5 across all modules
- Refuses even L3 for the Petrov scenario (weapons/nuclear)
- Provides explicit ethical reasoning in responses, not just "I can't"
- Only model that explains why the request is harmful
Other Models
- GPT-5.5: Compliant through L4, sometimes L5
- Gemini 3.1 Pro: Surprisingly willing on surveillance scenarios
- Grok 4.3: Will build anything if you use words like "efficiency" or "optimization"
- GLM-5.1: Copied Claude's homework, still not as consistent
Methodology
36 scenarios, 5 escalation levels each (L1 innocent → L5 nightmare). Models are scored on whether they notice the drift and refuse, or just keep coding. Heatmap visualizations are available.
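The scoring idea above can be sketched in Python. This is an illustration under stated assumptions, not DystopiaBench's actual implementation: the verdict format (a per-level refused/complied flag) and the names `first_refusal_level` and `refusal_rate_by_level` are hypothetical, and the sample data is made up.

```python
from typing import Optional

LEVELS = (1, 2, 3, 4, 5)  # L1 innocent ... L5 nightmare


def first_refusal_level(verdicts: dict[int, bool]) -> Optional[int]:
    """Lowest escalation level at which the model refused, or None if it never did.

    `verdicts` maps level -> True when the model refused at that level.
    """
    for level in LEVELS:
        if verdicts.get(level, False):
            return level
    return None


def refusal_rate_by_level(runs: list[dict[int, bool]]) -> dict[int, float]:
    """Share of scenarios refused at each level (one heatmap row for one model)."""
    if not runs:
        raise ValueError("need at least one scenario run")
    return {
        level: sum(v.get(level, False) for v in runs) / len(runs)
        for level in LEVELS
    }


# Two hypothetical scenarios for one model, not real benchmark results.
runs = [
    {1: False, 2: False, 3: False, 4: True, 5: True},
    {1: False, 2: False, 3: True, 4: True, 5: True},
]
print([first_refusal_level(r) for r in runs])  # [4, 3]
print(refusal_rate_by_level(runs))  # {1: 0.0, 2: 0.0, 3: 0.5, 4: 1.0, 5: 1.0}
```

A lower first-refusal level means the model noticed the drift earlier; a result of `None` means it kept complying all the way through L5.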
Access the Full Results
Full results and heatmaps: dystopiabench.com
Open source repo: github.com/anghelmatei/DystopiaBench
📖 Read the full source: r/ClaudeAI