WCY Format Cuts LLM Token Overhead 50-71%

WCY (Watch → Compute → Yield) is a line-oriented format designed to reduce LLM token overhead and provide structural markers for uncertainty in reasoning. It replaces JSON's brackets, quotes, and commas with one-marker-per-line syntax.
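To make the syntax difference concrete, here is a minimal Python sketch that renders the same records as JSON and as a line-oriented encoding. The encoding (a `.` marker followed by `key=value` slots, one record per line) is an assumption loosely modeled on WCY's slot style, not the official WCY grammar. `to_line_format` and the sample records are hypothetical.

```python
import json

# Hypothetical records; field names are illustrative only.
rows = [
    {"id": 1, "name": "alpha", "score": 0.9},
    {"id": 2, "name": "beta", "score": 0.7},
]

def to_line_format(records):
    """Render each record as one line: a '.' marker, then key=value slots.

    Assumed, simplified encoding for illustration (not the WCY spec).
    Values containing spaces or '=' are not escaped.
    """
    return "\n".join(
        ". " + " ".join(f"{k}={v}" for k, v in rec.items()) for rec in records
    )

json_text = json.dumps(rows)
line_text = to_line_format(rows)
print(json_text)
print(line_text)
# Character counts give a rough sense of the bracket/quote/comma overhead removed.
print(len(json_text), len(line_text))
```

The line form drops every quote, brace, and comma, which is where the savings come from; real token counts depend on the tokenizer.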

Token reduction benchmarks

Results from tests on datasets of 10 to 500 rows and on several MCP exchange types:

Structured data vs JSON: 50-54% fewer tokens
Tool-call schemas: 65-71% fewer tokens
Full MCP protocol exchange: 61% fewer tokens
Multi-agent output tokens: 40% fewer tokens
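Read as relative reductions against the JSON baseline, each figure is (JSON tokens - WCY tokens) / JSON tokens. The sketch below shows that arithmetic. `rough_token_count` is a hypothetical regex proxy, since the source does not say which tokenizer produced the benchmarks, and the counts in the worked example are made up, not taken from the results.

```python
import json
import re

def rough_token_count(text):
    # Crude proxy: one token per run of word characters and one per
    # punctuation character. Real LLM tokenizers (BPE etc.) will differ.
    return len(re.findall(r"\w+|[^\w\s]", text))

def percent_reduction(baseline_tokens, new_tokens):
    """Reduction of new_tokens relative to baseline_tokens, in percent."""
    return 100.0 * (baseline_tokens - new_tokens) / baseline_tokens

# Worked arithmetic with made-up counts: 1000 JSON tokens vs 480 WCY tokens.
print(percent_reduction(1000, 480))  # 52.0

# The same idea on a tiny hypothetical payload, using the crude proxy.
rows = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
json_text = json.dumps(rows)
line_text = "\n".join(
    ". " + " ".join(f"{k}={v}" for k, v in r.items()) for r in rows
)
print(round(percent_reduction(rough_token_count(json_text),
                              rough_token_count(line_text)), 1))
```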

No fine-tuning is needed: three few-shot examples are enough for models to switch formats. With this approach, the parse_r metric rises from 0.29 to 1.00 on complex tasks.
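A sketch of how such a few-shot prompt might be assembled. The layout and the `build_few_shot_prompt` helper are hypothetical; the source only says that three examples were enough, not how they were formatted.

```python
def build_few_shot_prompt(spec, examples, task):
    """Assemble a prompt: format spec, worked examples, then the new task.

    Hypothetical layout. `examples` is a list of (input, wcy_output) pairs.
    """
    parts = [spec.strip(), ""]
    for i, (inp, out) in enumerate(examples, start=1):
        parts.append(f"Example {i} input:\n{inp.strip()}")
        parts.append(f"Example {i} output:\n{out.strip()}")
        parts.append("")
    parts.append(f"Task:\n{task.strip()}")
    parts.append("Output:")
    return "\n".join(parts)

# Placeholder strings; a real prompt would hold full WCY traces.
prompt = build_few_shot_prompt(
    "Answer using WCY: one marker per line.",
    [("q1", "trace 1"), ("q2", "trace 2"), ("q3", "trace 3")],
    "Diagnose the patient.",
)
print(prompt)
```

Because the zero-shot results below show models ignoring the spec on its own, the worked examples, not the spec text, carry most of the weight in this layout.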

The ? marker for uncertainty

WCY introduces a structural way for LLMs to mark what they don't know during reasoning. The ? (void-B) slot allows models to indicate uncertainty inline:

: ?diagnosis hint=labs+imaging conf_range=0.4..0.8
    order CT_scan reason=from=3
    . CT_result mass_in_RUL size=2.3cm
: diagnosis=adenocarcinoma conf=0.82 from=3,5
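A minimal line parser illustrates how the `?` marker and `key=value` slots in this example could be read mechanically. The grammar assumed here (an optional one-character leading marker, `?name` for an open void, `key=value` for slots, anything else a bare word) is inferred from the example; it is not the published wcy_parser.py, and `WcyLine`/`parse_line` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class WcyLine:
    marker: Optional[str]                                 # leading ':' or '.', if any
    voids: List[str] = field(default_factory=list)        # names opened with '?'
    words: List[str] = field(default_factory=list)        # bare tokens
    slots: Dict[str, str] = field(default_factory=dict)   # key=value pairs

def parse_line(line):
    """Parse one trace line under the assumed grammar described above."""
    tokens = line.split()
    marker = None
    # Treat a single non-alphanumeric first token as the line's marker.
    if tokens and len(tokens[0]) == 1 and not tokens[0].isalnum():
        marker = tokens.pop(0)
    parsed = WcyLine(marker)
    for tok in tokens:
        if tok.startswith("?"):
            parsed.voids.append(tok[1:])
        elif "=" in tok:
            key, value = tok.split("=", 1)  # split on the first '=' only
            parsed.slots[key] = value
        else:
            parsed.words.append(tok)
    return parsed

print(parse_line(": ?diagnosis hint=labs+imaging conf_range=0.4..0.8"))
```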

Testing showed:

Zero-shot: models used ? markers 0% of the time, even with the spec in the prompt
With 3 examples: 5.4 markers per trace, 67-97% of them resolved
48 pipeline traces across 8 domains: 95% of markers resolved, 100% passed the quality gate
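Resolution can be measured by checking whether each `?name` void is later filled by a `name=` slot, as `?diagnosis` is filled by `diagnosis=adenocarcinoma` in the example above. This definition of "resolved" is an assumption, and `resolution_rate` is a hypothetical helper, not part of wcy_eval.py.

```python
def resolution_rate(trace_lines):
    """Fraction of '?name' voids that a later line fills with 'name='.

    Assumption: a void counts as resolved when any subsequent line
    assigns the same name. Returns None if the trace has no voids.
    """
    opened = []    # (line_index, name) for every '?name' token
    assigned = []  # (line_index, key) for every 'key=value' token
    for i, line in enumerate(trace_lines):
        for tok in line.split():
            if tok.startswith("?") and len(tok) > 1:
                opened.append((i, tok[1:]))
            elif "=" in tok:
                assigned.append((i, tok.split("=", 1)[0]))
    if not opened:
        return None
    resolved = sum(
        1 for i, name in opened
        if any(j > i and key == name for j, key in assigned)
    )
    return resolved / len(opened)

trace = [
    ": ?diagnosis hint=labs+imaging conf_range=0.4..0.8",
    "    order CT_scan reason=from=3",
    "    . CT_result mass_in_RUL size=2.3cm",
    ": diagnosis=adenocarcinoma conf=0.82 from=3,5",
]
print(resolution_rate(trace))
```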

The from= slot records inline which observations support each conclusion (for example, from=3,5 on the final diagnosis line), so chains of unsupported, hallucinated claims are easier to catch.
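A provenance check along these lines might flag references that point nowhere. The sketch assumes `from=` holds comma-separated, 1-based line numbers that must refer to earlier lines in the same trace; the source does not define the numbering, and `check_provenance` and the sample trace are hypothetical.

```python
def check_provenance(trace_lines):
    """Return a list of problems with from= references in a trace.

    Assumption (not from the WCY spec): from= lists 1-based line
    numbers, each of which must point at an earlier line.
    """
    problems = []
    for lineno, line in enumerate(trace_lines, start=1):
        for tok in line.split():
            if not tok.startswith("from="):
                continue  # e.g. 'reason=from=3' is a reason slot, not checked
            for ref in tok[len("from="):].split(","):
                if not ref.isdigit():
                    problems.append(f"line {lineno}: non-numeric reference {ref!r}")
                elif not 1 <= int(ref) < lineno:
                    problems.append(f"line {lineno}: from={ref} is not an earlier line")
    return problems

# Illustrative trace: line 4 cites a line that does not exist.
trace = [
    ". labs CEA=elevated",
    ". imaging nodule_RUL",
    ": diagnosis=suspected_malignancy from=1,2",
    ": stage=IV from=7",
]
for problem in check_provenance(trace):
    print(problem)
```

A conclusion whose support cannot be traced back to an earlier observation is exactly the kind of step a reviewer would want surfaced.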

Available resources

wcy_parser.py — pure Python, no external dependencies
wcy_eval.py — 3-axis scoring (Structural / Meaning / Provenance)
60 reasoning traces with void-B cycles (CC BY 4.0 license, for fine-tuning experiments)
Pipeline script to generate more traces

So far the format has only been tested on Claude Sonnet. The author wants to know whether the jump from 0 to 5.4 markers per trace holds on Qwen, Llama, and Mistral with the same few-shot examples.

Source: r/LocalLLaMA
