Berkeley Study: All AI Revision Prompts Drift Prose Toward Formality, Even "Preserve Voice"

Tom van Nuenen at Berkeley ran 300 personal narratives through three frontier models (Claude-class, ChatGPT-class, Gemini-class) under three prompt conditions: generic "improve this," generic "rewrite this," and explicitly "revise this while preserving the original voice." He measured 13 stylometric markers in input and output: function words, contractions, first-person pronouns, vocabulary diversity, sentence length variance, punctuation patterns, and emotion words.
The result: every model in every condition drifted in the same direction. The output had fewer contractions, fewer first-person pronouns, greater vocabulary spread, longer words, and more elaborate punctuation. The shift moved prose from embedded narration toward distanced narration. The "preserve voice" prompt only reduced the magnitude of the drift, not its direction.
In plain language: every AI revision prompt makes prose more polite, more formal, more eager to please—even when the prompt says don't.
Implications for Tooling
The paper argues that voice instructions live at a layer the model's post-training distribution overrides within a paragraph or two. Anyone iterating on prompts, sample paste-ins, custom instructions, or character bibles for voiced output (writing, dialogue, marketing copy, persuasive essays) has been working on a problem with a structural ceiling.
It also offers the cleanest empirical explanation for the Claude 4.7 prose regression: 4.7's central voice is more deeply encoded than 4.6's, which is why it reads stylometric structure better (as seen in the Piper experiment) and resists deviation harder (the memo-voice complaints).
Constraint-Based Architecture
The author's recommendation: if you want voice preservation across long-form work, the architecture must live outside the prompt. Compiled style profiles should be applied as binding constraints on every generation — not as prompt parameters that can be overridden. A breakdown of why each major writing tool (Sudowrite, NovelCrafter, Claude/ChatGPT direct) hits the same ceiling, and what a constraint-based architecture looks like in practice, is available at the linked blog post below.
Paper: https://arxiv.org/abs/2604.22142
📖 Read the full source: r/ClaudeAI
👀 See Also

GPU Power Consumption Deviates from Token Predictor Theory in Small LLMs
An experiment testing the 'stochastic parrot' theory on four 8B-parameter models found GPU power consumption often scales non-linearly with token count, with divergence rates ranging from 7.7% to 36.7%. The study also revealed persistent residual heat after philosophical queries and order-dependent effects.

Anthropic changes subscription terms, OpenClaw users now billed separately for agent usage
Anthropic has narrowed Claude Max subscriptions to only cover first-party surfaces like Claude.ai and Claude Code, with all third-party agent usage now billed as 'Extra Usage' on a per-token basis. Users have four options: stay on Max and pay extra, switch to Anthropic API, switch providers, or use intelligent routing with Manifest.

Claude Code v2.1.37 Released
Anthropic releases a new version of Claude Code with improvements and bug fixes.

Hacker News AI Discussion Shifts from Demos to Tooling Focus
Recent Hacker News discussions about AI are moving from one-off demos to durable tools like price tracking, verification, memory, evaluation, and workflow integration. This signals a shift toward operationalization where communities stop rewarding novelty-first posts.