Bonsai 1.7B Ternary Model Hits 442 T/s on M4 Max with Autonomously Tuned Metal Kernels

✍️ OpenClawRadar📅 Published: May 4, 2026🔗 Source
Bonsai 1.7B Ternary Model Hits 442 T/s on M4 Max with Autonomously Tuned Metal Kernels
Ad

Bonsai 1.7B — a ternary model from PrismML — has been optimized for Apple Silicon using autonomously tuned Metal kernels. The work was performed by ata, an autonomous engineering agent from Agents2Agents, which ran an agentic evolution search for 6 hours to produce custom GPU kernels.

Benchmark Results

Measured against the upstream llama.cpp at the same Bonsai/Q2_0 commit on an M4 Max (same model file, same llama-bench -p 512 -n 128 -r 10 -fa 1 -ngl 99 config):

  • Decode (tg128): 311.66 → 442.42 t/s (+42.0%)
  • Prefill (pp512): 4250.32 → 4622.63 t/s (+8.8%)

For context, the Bonsai 8B whitepaper reports MLX-upstream Q2_0 decode at 235 t/s on Apple Silicon. This build achieves 442 t/s on the 1.7B variant via custom Metal kernels (different framework, smaller model — directionally indicative of headroom in the stack).

Ad

What's Included

The build is a drop-in optimized inference package for M-series Macs (arm64 only). Inside the 358 MB tar.xz:

  • chat.sh — interactive REPL
  • complete.sh — non-interactive completion
  • bench.sh — reproduce the benchmarks
  • server.sh — OpenAI-compatible HTTP API on :8080
  • Bonsai-1.7B-Q2_0.gguf — the model file (442 MB)

Quick Start

tar -xJf bonsai-1.7b-ternary-M4Max.tar.xz
cd bonsai-1.7b-ternary-M4Max
./chat.sh

Technical Details

Every Metal kernel was authored and tuned by ata without human intervention. The work focused on custom GPU kernels at the matvec / FFN / KV-cache layer, shape-specialized for the Bonsai 1.7B Q2_0 decode path. Numerical output matches the reference build (verified top-1 token match). Tested on M4 Max; proportional gains expected on M1+.

Caveats

  • Apple Silicon only (arm64) — no Intel Mac or CPU-only builds.
  • Numbers from M4 Max; M1/M2/M3 will be lower due to less memory bandwidth.
  • Model is Q2_0 quantized — small accuracy delta vs F16.

📖 Read the full source: HN AI Agents

Ad

👀 See Also