WCY format reduces LLM token overhead by 50-71% and adds structural 'I don't know' markers

WCY (Watch → Compute → Yield) is a line-oriented format designed to reduce LLM token overhead and provide structural markers for uncertainty in reasoning. It replaces JSON's brackets, quotes, and commas with one-marker-per-line syntax.
Token reduction benchmarks
From testing across 10-500 rows and MCP exchange types:
- Structured data vs JSON: 50-54% fewer tokens
- Tool-call schemas: 65-71% fewer tokens
- Full MCP protocol exchange: 61% fewer tokens
- Multi-agent output tokens: 40% fewer tokens
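For readers who want to reproduce a figure like these on their own data, the arithmetic is just the relative difference between two token counts. The sketch below is illustrative only: the function name and the counts are hypothetical, not taken from the benchmark, and it assumes you already have token counts from whatever tokenizer your model uses.

```python
def percent_reduction(baseline_tokens: int, candidate_tokens: int) -> float:
    """Return how many percent fewer tokens the candidate uses than the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - candidate_tokens) / baseline_tokens


# Hypothetical counts, chosen only to illustrate the arithmetic.
json_tokens = 1000
wcy_tokens = 480
print(f"{percent_reduction(json_tokens, wcy_tokens):.1f}% fewer tokens")
```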
No fine-tuning is needed: three few-shot examples are enough for models to switch formats. With this approach, the parse_r metric rises from 0.29 to 1.00 on complex tasks.
The ? marker for uncertainty
WCY introduces a structural way for LLMs to mark what they don't know during reasoning. The ? (void-B) slot allows models to indicate uncertainty inline:
: ?diagnosis hint=labs+imaging conf_range=0.4..0.8
order CT_scan reason=from=3
. CT_result mass_in_RUL size=2.3cm
: diagnosis=adenocarcinoma conf=0.82 from=3,5
Testing showed:
- Zero-shot: models use ? markers 0% of the time, even with the spec in the prompt
- With 3 examples: 5.4 markers per trace, 67-97% resolved
- 48 pipeline traces across 8 domains: 95% resolution, 100% quality gate pass
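A rough way to measure markers per trace and how many get resolved is sketched below. It is not the API of wcy_parser.py or wcy_eval.py. It assumes, based only on the diagnosis example above, that a void is written as `?name` and counts as resolved once a later `name=value` slot appears; the function name `void_resolution` is hypothetical.

```python
def void_resolution(trace: str) -> tuple[int, int]:
    """Count '?name' markers and how many are later filled by a 'name=' slot.

    Assumption (not from the WCY spec): a void '?x' is resolved when any
    later token in the trace has the form 'x=value'.
    """
    opened: list[str] = []      # void names, in order of appearance
    resolved: set[str] = set()  # void names that a later slot filled
    for line in trace.splitlines():
        for token in line.split():
            if token.startswith("?") and len(token) > 1:
                opened.append(token[1:])
            elif "=" in token:
                key = token.split("=", 1)[0]
                if key in opened:
                    resolved.add(key)
    return len(opened), sum(1 for name in opened if name in resolved)


# The example trace from the text, line by line.
example = "\n".join([
    ": ?diagnosis hint=labs+imaging conf_range=0.4..0.8",
    "order CT_scan reason=from=3",
    ". CT_result mass_in_RUL size=2.3cm",
    ": diagnosis=adenocarcinoma conf=0.82 from=3,5",
])
markers, filled = void_resolution(example)
print(f"{markers} marker(s), {filled} resolved")
```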
The from= slot tracks which observations support which conclusions inline, which helps catch hallucination chains.
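One way such a provenance check could work is sketched below. This is an assumption-laden illustration, not the logic in wcy_eval.py: it assumes `from=` values are 1-based indices of earlier lines in the same trace, which the source does not spell out. The function name `dangling_refs` and the sample trace are hypothetical.

```python
def dangling_refs(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, reference) pairs whose from= target is not an earlier line.

    Assumption (not from the WCY spec): from= holds comma-separated,
    1-based line numbers that must point backwards in the trace.
    """
    problems: list[tuple[int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            if not token.startswith("from="):
                continue
            for ref in token[len("from="):].split(","):
                # Flag non-numeric references and ones that do not point to a prior line.
                if not ref.isdigit() or not (1 <= int(ref) < line_no):
                    problems.append((line_no, ref))
    return problems


# Hypothetical trace: the last line cites an observation that does not exist.
trace = [
    ". symptom cough duration=3w",
    ". labs CRP=elevated",
    ": ?diagnosis hint=labs+imaging",
    ". CT_result mass_in_RUL size=2.3cm",
    ": diagnosis=adenocarcinoma conf=0.82 from=2,4",
    ": plan=biopsy from=9",
]
print(dangling_refs(trace))
```

A conclusion that cites a missing or later line is a cheap signal that the model may be building on something it never observed.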
Available resources
- wcy_parser.py — pure Python, no external dependencies
- wcy_eval.py — 3-axis scoring (Structural / Meaning / Provenance)
- 60 reasoning traces with void-B cycles (CC BY 4.0 license, for fine-tuning experiments)
- Pipeline script to generate more traces
So far only tested on Claude Sonnet. The author is curious whether the 0% → 5.4 markers result holds on Qwen, Llama, and Mistral with the same few-shot examples.
📖 Read the full source: r/LocalLLaMA