CLAUDE.md: Drop-in file reduces Claude output tokens by 63%

What CLAUDE.md does
CLAUDE.md is a single file you drop into your project root. When Claude Code reads it, behavior changes immediately without code modifications. It specifically targets output behavior: sycophancy, verbosity, and formatting noise.
The problem it addresses
By default, Claude wastes tokens on behaviors that don't add value:
- Opens responses with "Sure!", "Great question!", "Absolutely!"
- Ends with "I hope this helps! Let me know if you need anything!"
- Uses em dashes (--), smart quotes, and other Unicode characters that break parsers
- Restates your question before answering
- Adds unsolicited suggestions beyond what you asked
- Over-engineers code with unnecessary abstractions
- Agrees with incorrect statements ("You're absolutely right!")
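As a rough illustration of what such a file might contain, the sketch below writes a CLAUDE.md into a project root. The rule wording is hypothetical: it is derived from the behaviors listed above, not copied from the actual file. `RULES` and `write_claude_md` are illustrative names.

```python
from pathlib import Path

# Hypothetical rules, one per default behavior listed above.
# This is NOT the content of the real CLAUDE.md file.
RULES = [
    "Do not open with filler such as 'Sure!' or 'Great question!'.",
    "Do not end with closing pleasantries.",
    "Use plain ASCII punctuation: no em dashes or smart quotes.",
    "Do not restate the question before answering.",
    "Do not add suggestions that were not requested.",
    "Write the simplest code that solves the task.",
    "If the user states something incorrect, say so.",
]


def write_claude_md(project_root: str) -> Path:
    """Write the illustrative rules to <project_root>/CLAUDE.md and return the path."""
    path = Path(project_root) / "CLAUDE.md"
    body = "\n".join(f"- {rule}" for rule in RULES) + "\n"
    path.write_text(body, encoding="utf-8")  # overwrites any existing file
    return path
```

Calling `write_claude_md(".")` from the project root would place the file where the article says Claude Code reads it. Note that it overwrites an existing CLAUDE.md.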
Benchmark results
The same prompts were tested without CLAUDE.md (baseline) and with CLAUDE.md (optimized). The test is described as using 5 prompts; word counts are reported for four:
- Explain async/await: 180 words → 65 words (64% reduction)
- Code review: 120 words → 30 words (75% reduction)
- What is a REST API: 110 words → 55 words (50% reduction)
- Hallucination correction: 55 words → 20 words (64% reduction)
- Total: 465 words → 170 words (63% reduction)
Approximately 384 output tokens saved across the 4 prompts listed above. Note: this is a directional indicator from a 5-prompt test, not a statistically controlled study.
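The percentages above can be rechecked from the reported word counts. The short script below does that arithmetic. The tokens-per-word ratio it prints is inferred here from the 384-token figure; the source does not state it.

```python
# (baseline_words, optimized_words) as reported in the benchmark above
RESULTS = {
    "Explain async/await": (180, 65),
    "Code review": (120, 30),
    "What is a REST API": (110, 55),
    "Hallucination correction": (55, 20),
}


def reduction_pct(baseline: int, optimized: int) -> int:
    """Percentage reduction from baseline to optimized, rounded to a whole number."""
    return round(100 * (baseline - optimized) / baseline)


total_baseline = sum(b for b, _ in RESULTS.values())
total_optimized = sum(o for _, o in RESULTS.values())
words_saved = total_baseline - total_optimized

for name, (b, o) in RESULTS.items():
    print(f"{name}: {reduction_pct(b, o)}% reduction")
print(f"Total: {total_baseline} -> {total_optimized} words "
      f"({reduction_pct(total_baseline, total_optimized)}% reduction)")

# Ratio implied by the article's 384-token figure (an inference, not a sourced value).
tokens_per_word = 384 / words_saved
print(f"Implied tokens per word: {tokens_per_word:.2f}")
```

The four rows sum to the stated totals (465 and 170 words), and the 384-token figure corresponds to roughly 1.3 tokens per word saved.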
When it helps vs. when it doesn't
Works best for:
- Automation pipelines with high output volume (resume bots, agent loops, code generation)
- Repeated structured tasks where Claude's default verbosity compounds across hundreds of calls
- Teams who need consistent, parseable output format across sessions
Not worth it for:
- Single short queries (the file loads into context on every message, causing a net token increase on low-output exchanges)
- Casual one-off use (overhead doesn't pay off at low volume)
- Fixing deep failure modes like hallucinated implementations or architectural drift
- Pipelines using multiple fresh sessions per task
- Parser reliability at scale (use structured outputs like JSON mode instead)
- Exploratory or architectural work where debate and alternatives are the point
Cost considerations
The CLAUDE.md file itself consumes input tokens on every message. Savings come from reduced output tokens. Net benefit is only positive when output volume is high enough to offset the persistent input cost. At low usage, it costs more than it saves.
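The trade-off can be sketched as a simple per-message break-even calculation. Everything numeric below is a placeholder: the file size and the input/output price ratio are assumptions chosen for illustration, not measured or published values, and the model ignores caching or other discounts. Only the 96-token average (384 tokens over 4 prompts) comes from the benchmark above.

```python
def net_saving_per_message(file_tokens, output_tokens_saved, input_price, output_price):
    """Cost saved on output minus cost added by the file on input.

    Positive means CLAUDE.md pays for itself on this message.
    """
    return output_tokens_saved * output_price - file_tokens * input_price


def break_even_output_tokens(file_tokens, input_price, output_price):
    """Output tokens that must be saved per message to offset the file's input cost."""
    return file_tokens * input_price / output_price


# Hypothetical inputs: a 500-token file, output tokens priced at 5x input tokens.
FILE_TOKENS = 500
INPUT_PRICE = 1   # arbitrary cost units per input token
OUTPUT_PRICE = 5  # arbitrary cost units per output token

# Average saving per prompt from the benchmark above: 384 tokens over 4 prompts.
AVG_SAVED = 384 / 4

threshold = break_even_output_tokens(FILE_TOKENS, INPUT_PRICE, OUTPUT_PRICE)
net = net_saving_per_message(FILE_TOKENS, AVG_SAVED, INPUT_PRICE, OUTPUT_PRICE)
print(f"Break-even: {threshold:.0f} output tokens saved per message")
print(f"Net saving at {AVG_SAVED:.0f} tokens saved: {net:.0f} cost units")
```

With these placeholder values, the file only pays off once a message saves more output tokens than the break-even threshold. Substituting a real file size and real prices would give the threshold for a given setup.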
Model support
Benchmarks were run on Claude only. The rules are model-agnostic and should work on any model that reads context, but results on local models (for example, models run through llama.cpp, or Mistral) are untested.
📖 Read the full source: HN AI Agents