Be Brief vs Caveman: Benchmarking Compression Prompts for Claude

A developer benchmarked caveman (the popular shorthand compression prompt) against the simple prompt 'be brief.' to see if the extra complexity actually pays off. The test ran 24 dev prompts across 6 categories, comparing 5 arms: baseline, 'be brief.', caveman lite, caveman full, and caveman ultra. Outputs were judged by a separate Claude instance using per-prompt rubrics.

Benchmark results

Baseline: mean score 0.985, mean tokens 636
'be brief.': mean score 0.985, mean tokens 419
Caveman lite: mean score 0.976, mean tokens 401
Caveman full: mean score 0.975, mean tokens 404
Caveman ultra: mean score 0.970, mean tokens 449
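The comparison is easier to read as savings relative to the baseline arm. The sketch below is a minimal illustration that uses only the mean figures reported above. The names RESULTS and summarize are hypothetical and are not part of the author's open-source harness. For each arm, it computes the percentage reduction in mean tokens and the change in mean score against the baseline.

```python
# Mean results per arm, copied from the benchmark summary: (mean score, mean tokens).
RESULTS = {
    "baseline": (0.985, 636),
    "be brief.": (0.985, 419),
    "caveman lite": (0.976, 401),
    "caveman full": (0.975, 404),
    "caveman ultra": (0.970, 449),
}


def summarize(results, baseline="baseline"):
    """Return {arm: (token_reduction_pct, score_delta)} relative to the baseline arm.

    Hypothetical helper for illustration only; it works on reported means,
    not on per-prompt data.
    """
    base_score, base_tokens = results[baseline]
    summary = {}
    for arm, (score, tokens) in results.items():
        if arm == baseline:
            continue  # the baseline is the reference point, so skip it
        reduction = 100.0 * (base_tokens - tokens) / base_tokens
        summary[arm] = (round(reduction, 1), round(score - base_score, 3))
    return summary


if __name__ == "__main__":
    for arm, (reduction, delta) in summarize(RESULTS).items():
        print(f"{arm:14s} tokens -{reduction:4.1f}%  score {delta:+.3f}")
```

On these numbers, 'be brief.' cuts about a third of the baseline's mean tokens with no change in mean score. The caveman variants save a similar or smaller share of tokens and lose between 0.009 and 0.015 in mean score.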

The two-word prompt matched caveman on compression, at 419 mean tokens versus 401 to 449. It matched or slightly exceeded caveman on quality, at 0.985 versus 0.970 to 0.976. Caveman's value lies elsewhere: consistent output structure, mode switching, and a safety escape for destructive operations. That safety escape, however, introduced significant variance in output quality, which may be a concern for some use cases.

The full breakdown, including per-category data and the variance findings on safety questions, is available on the author's site. The benchmark harness is open source on GitHub.

📖 Read the full source: r/ClaudeAI
