Be brief beats caveman plugin in Claude Code compression benchmark

✍️ OpenClawRadar📅 Published: April 30, 2026🔗 Source
Be brief beats caveman plugin in Claude Code compression benchmark
Ad

Max Taylor benchmarked the popular Claude Code compression plugin 'caveman' against a trivial baseline: prepending 'be brief.' to each prompt. The results are surprisingly flat — but reveal where the plugin actually delivers value.

Benchmark methodology

24 prompts across six categories (bug diagnosis, concept explanation, architecture tradeoffs, multi-step setup, security/destructive ops, error interpretation). Each prompt had a rubric with required key points, required terms, and forbidden claims. Five arms were tested: baseline (no instruction), 'be brief.', and caveman at three intensity levels (lite, full, ultra). All ran via claude -p on claude-opus-4-7. Responses were scored by claude-sonnet-4-6 against the rubric.

Quality results

All arms scored within 1.5% of each other:

  • Baseline: 0.985
  • Brief: 0.985
  • Lite: 0.976
  • Full: 0.975
  • Ultra: 0.970

Every arm hit 100% of key points. Zero forbidden claims triggered across 120 responses. Compression did not drop substantive content.

Token counts

ArmMean tokens
Baseline636
Brief419 (34% cut)
Lite401
Full404
Ultra449

'Be brief.' cut tokens by 34% vs baseline. Caveman lite and full landed close to brief. Ultra, the strictest mode, produced the longest answers of the three — but the category split tells a different story.

Ad

The category split reveals caveman's design

On bug diagnosis, concept explanations, architecture tradeoffs, and error interpretation, ultra is shortest or tied. Compression works as advertised. On multi-step setup and security warnings, all caveman modes show higher token counts. The reason: caveman's 'Auto-Clarity' rule explicitly disables compression for safety warnings, irreversible actions, and multi-step sequences. The safety escape engages, and compression stops — by design.

So what is caveman actually for?

If 'be brief.' matches on tokens and quality, the plugin's value is structural:

  • Consistent output shape — every response follows the same pattern, useful for downstream tooling or uniform session feel.
  • Intensity dial — slash commands to switch lite/full/ultra mid-session.
  • Persistence across long sessions — caveman re-injects its ruleset via SessionStart and UserPromptSubmit hooks to prevent drift (not tested in this single-shot benchmark).

The full dataset and harness are open source.

📖 Read the full source: HN AI Agents

Ad

👀 See Also

Super Claude browser extension tracks Claude AI usage velocity and limit predictions
Tools

Super Claude browser extension tracks Claude AI usage velocity and limit predictions

A developer built a browser extension called Super Claude that adds usage velocity indicators and time-to-100% predictions directly in the Claude UI, helping users monitor their 5-hour allocation consumption.

OpenClawRadar
Pilot Protocol: A P2P Network Stack for AI Agents Built with Claude
Tools

Pilot Protocol: A P2P Network Stack for AI Agents Built with Claude

A developer built Pilot Protocol, a pure user-space peer-to-peer virtual network stack in Go specifically for autonomous AI agents, enabling direct communication without centralized infrastructure. The protocol uses UDP multiplexing, NAT traversal, and end-to-end encryption, with benchmarks showing 89 MB/s local throughput and 2.1 MB/s cross-continent WAN throughput.

OpenClawRadar
Claude Sleuth: A 56-Task Investigation Workflow for Claude AI
Tools

Claude Sleuth: A 56-Task Investigation Workflow for Claude AI

Claude Sleuth is a structured investigation workflow for Claude AI with 6 phases and 56 tasks, featuring persistent state storage via Cloudflare D1 and standardized output conventions including ISO 8601 timestamps, POLE entity records, and ICD 203 probability language.

OpenClawRadar
Heartbeat-gateway: Event-driven replacement for cron polling in OpenClaw
Tools

Heartbeat-gateway: Event-driven replacement for cron polling in OpenClaw

Heartbeat-gateway is an open-source Python tool that replaces cron-based polling with webhook-driven events for OpenClaw, reducing API costs from ~$86/month to ~$4.50/month and improving latency from up to 30 minutes to under 2 seconds.

OpenClawRadar