FairyFuse Achieves 29.6x Kernel Speedup on CPUs via Multiplication-Free Ternary-Weight Inference
FairyFuse is an inference system for ternary (values in {-1,0,+1}) LLMs on commodity CPUs. By fusing the eight real-valued sub-GEMVs of each widely-linear layer into a single AVX-512 loop using masked additions and subtractions, it eliminates all floating-point multiplications. Roofline analysis shows that 16x weight compression shifts memory-bound GEMV toward the compute regime on bandwidth-limited CPUs, yielding a 29.6x kernel speedup over conventional dequantize-and-multiply kernels. Notably, the approach offers little benefit on GPUs.
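To make the "masked additions and subtractions" idea concrete, here is a minimal scalar sketch in pure Python. It is not the FairyFuse kernel: the two-bitmask encoding and the helper names `ternary_to_masks` and `masked_gemv` are assumptions made for illustration, standing in for the mask registers an AVX-512 loop would use. The point is only that a ternary weight never needs a multiply; it either adds the activation, subtracts it, or skips it.

```python
def ternary_to_masks(row):
    """Encode one ternary weight row as two integer bitmasks.

    Bit j of `pos` is set where the weight is +1, bit j of `neg` where it is -1.
    (Hypothetical encoding, chosen to mimic per-lane mask registers.)
    """
    pos = neg = 0
    for j, w in enumerate(row):
        if w == 1:
            pos |= 1 << j
        elif w == -1:
            neg |= 1 << j
        elif w != 0:
            raise ValueError(f"non-ternary weight {w!r} at column {j}")
    return pos, neg


def masked_gemv(masks, x):
    """Compute y = W @ x using only additions and subtractions."""
    y = []
    for pos, neg in masks:
        acc = 0.0
        for j, xj in enumerate(x):
            bit = 1 << j
            if pos & bit:      # weight is +1: masked add
                acc += xj
            elif neg & bit:    # weight is -1: masked subtract
                acc -= xj
            # weight is 0: lane is masked off, nothing to do
        y.append(acc)
    return y


# Small example with a 3x4 ternary matrix.
W = [
    [1, 0, -1, 1],
    [0, -1, -1, 0],
    [1, 1, 1, -1],
]
x = [0.5, -2.0, 3.0, 1.5]

masks = [ternary_to_masks(row) for row in W]
y = masked_gemv(masks, x)

# Reference result using ordinary multiply-accumulate, for comparison.
y_ref = [sum(w * xj for w, xj in zip(row, x)) for row in W]
print(y, y_ref)
```

In a vectorized kernel the inner `if`/`elif` would become two masked vector operations per block of lanes, but the arithmetic is the same as in this sketch.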
Key Results
- End-to-end throughput: 32.4 tokens per second on a single Intel Xeon 8558P.
- Comparison to llama.cpp Q4_K_M: 1.24x faster with near-lossless quality (WikiText-2 perplexity 5.52 vs. 5.47 for FP16; downstream accuracy 66.0% vs. 66.0% for FP16).
- Weight compression: 16x (2 bits per weight) from the ternary representation, with no dequantization to floating point needed.
- Technique: Fuses eight sub-GEMVs into a single AVX-512 loop using masked adds/subtracts, with no floating-point multiplications.
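The count of eight sub-GEMVs can be illustrated with a short sketch. It assumes, and the summary above does not confirm, that a widely-linear layer has the standard form y = U x + W conj(x), with complex x and with ternary real and imaginary parts U = A + iB and W = C + iD. Under that assumption the output splits into eight real products (A p, B q, C p, D q for the real part and A q, B p, D p, C q for the imaginary part, where x = p + iq), and all eight can be accumulated in one pass over the columns. The function names are hypothetical.

```python
def add_signed(acc, w, v):
    """Accumulate w * v for ternary w without multiplying."""
    if w == 1:
        return acc + v
    if w == -1:
        return acc - v
    return acc


def fused_widely_linear(A, B, C, D, x):
    """One fused loop for y = (A + iB) x + (C + iD) conj(x).

    Each input element is read once and feeds all eight sub-GEMVs.
    (Assumed layer form; see the lead-in.)
    """
    out = []
    for a_row, b_row, c_row, d_row in zip(A, B, C, D):
        re = im = 0.0
        for a, b, c, d, xj in zip(a_row, b_row, c_row, d_row, x):
            p, q = xj.real, xj.imag
            # Real part: A p - B q + C p + D q
            re = add_signed(re, a, p)
            re = add_signed(re, -b, q)   # integer sign flip, not a float multiply
            re = add_signed(re, c, p)
            re = add_signed(re, d, q)
            # Imaginary part: A q + B p + D p - C q
            im = add_signed(im, a, q)
            im = add_signed(im, b, p)
            im = add_signed(im, d, p)
            im = add_signed(im, -c, q)
        out.append(complex(re, im))
    return out


def reference_widely_linear(A, B, C, D, x):
    """Same layer computed with ordinary complex multiplications."""
    out = []
    for a_row, b_row, c_row, d_row in zip(A, B, C, D):
        acc = 0j
        for a, b, c, d, xj in zip(a_row, b_row, c_row, d_row, x):
            acc += complex(a, b) * xj + complex(c, d) * xj.conjugate()
        out.append(acc)
    return out


# Tiny 2x2 example with ternary real and imaginary weight parts.
A = [[1, -1], [0, 1]]
B = [[0, 1], [-1, -1]]
C = [[1, 0], [1, -1]]
D = [[-1, 1], [0, 1]]
x = [1 + 2j, -0.5 + 0.25j]

fused = fused_widely_linear(A, B, C, D, x)
reference = reference_widely_linear(A, B, C, D, x)
print(fused, reference)
```

Reading each input element once and updating both accumulators is what fusion buys here: eight separate passes over the activations collapse into one.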
Context
Prior work (Fairy2i) showed that ternary LLMs can match FP16 quality, but existing runtimes did not exploit the ternary structure. FairyFuse closes that gap by rearchitecting inference to be multiplication-free on x86 CPUs with AVX-512.
📖 Read the full source: HN LLM Tools