1-Bit Bonsai Image 4B: On-Device Image Generation via Binary/Ternary FLUX.2

PrismML has released Bonsai Image 4B, a family of compact image-generation models derived from FLUX.2 Klein 4B using binary and ternary quantization. The diffusion transformer weights are represented as {−1, +1} (1-bit) or {−1, 0, +1} (ternary) with FP16 group-wise scaling factors, yielding 1.125 and 1.71 effective bits per weight respectively.
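The bit counts above can be reproduced with a small sketch of group-wise binary and ternary quantization. This is an illustration under stated assumptions, not PrismML's published method: the group size of 128 is inferred from 1 + 16/128 = 1.125, and the scale and threshold rules (mean absolute value, a 0.7 threshold ratio) are common heuristics chosen here for demonstration. All function names are hypothetical.

```python
import math
import numpy as np

# Assumption: group size inferred from 1 + 16/128 = 1.125; the source does not state it.
GROUP_SIZE = 128

def quantize_binary(w, group_size=GROUP_SIZE):
    """Map each weight to {-1, +1} with one FP16 scale per group."""
    groups = w.reshape(-1, group_size)
    # Assumed scale rule: mean absolute value of the group.
    scales = np.abs(groups).mean(axis=1, keepdims=True).astype(np.float16)
    codes = np.where(groups >= 0, 1, -1).astype(np.int8)
    return codes, scales

def quantize_ternary(w, group_size=GROUP_SIZE, threshold_ratio=0.7):
    """Map each weight to {-1, 0, +1} with one FP16 scale per group."""
    groups = w.reshape(-1, group_size)
    # Assumed threshold rule: weights below a fraction of the mean magnitude become 0.
    delta = threshold_ratio * np.abs(groups).mean(axis=1, keepdims=True)
    codes = (np.sign(groups) * (np.abs(groups) > delta)).astype(np.int8)
    kept = np.abs(codes).sum(axis=1, keepdims=True)
    # Scale is the mean magnitude of the weights that were not zeroed.
    kept_sum = np.abs(groups * codes).sum(axis=1, keepdims=True)
    scales = (kept_sum / np.maximum(kept, 1)).astype(np.float16)
    return codes, scales

def dequantize(codes, scales):
    """Reconstruct approximate weights from codes and per-group scales."""
    return codes.astype(np.float32) * scales.astype(np.float32)

def effective_bits(levels, group_size=GROUP_SIZE, scale_bits=16):
    """Bits per weight: information per code plus the amortized scale."""
    return math.log2(levels) + scale_bits / group_size

rng = np.random.default_rng(0)
w = rng.standard_normal(4 * GROUP_SIZE).astype(np.float32)
b_codes, b_scales = quantize_binary(w)
t_codes, t_scales = quantize_ternary(w)

print(round(effective_bits(2), 3))  # 1.125
print(round(effective_bits(3), 2))  # 1.71
```

Under these assumptions, the ternary figure treats each weight as carrying log2(3) ≈ 1.585 bits, which presumes a packing scheme that approaches that bound.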
Key Specifications
- 1-bit Bonsai Image 4B: transformer footprint 0.93 GB (8.3× smaller than the 7.75 GB FP16 FLUX.2 Klein 4B transformer). Apple Silicon payload (including compressed text encoder and FP16 VAE) is 3.42 GB.
- Ternary Bonsai Image 4B: transformer footprint 1.21 GB (6.4× smaller than the same FP16 baseline). Apple Silicon payload is 3.88 GB.
- Mean active memory for 512×512 generation: 1.5 GB (1-bit) / 1.96 GB (ternary) vs 11.74 GB for the original FLUX.2 Klein 4B.
- Mean active memory for 1024×1024 generation: 1.95 GB (1-bit) / 2.38 GB (ternary) vs 14.39 GB for the original.
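As a quick check on the memory figures listed above, the reduction factors can be computed directly. The numbers are taken from the list; the dictionary layout and function name are only illustrative.

```python
# Mean active memory in GB, as reported in the specifications above.
ACTIVE_MEMORY_GB = {
    "512x512": {"1-bit": 1.5, "ternary": 1.96, "baseline": 11.74},
    "1024x1024": {"1-bit": 1.95, "ternary": 2.38, "baseline": 14.39},
}

def memory_reduction(resolution, variant):
    """Return how many times less active memory a variant uses than the baseline."""
    row = ACTIVE_MEMORY_GB[resolution]
    return row["baseline"] / row[variant]

for resolution in ACTIVE_MEMORY_GB:
    for variant in ("1-bit", "ternary"):
        factor = memory_reduction(resolution, variant)
        print(f"{resolution} {variant}: {factor:.1f}x less active memory")
```

This works out to roughly 7.8× (1-bit) and 6.0× (ternary) at 512×512, and roughly 7.4× and 6.0× at 1024×1024.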
Performance Benchmarks
The models run on Apple Silicon (iPhones, iPads, Macs) via MLX low-bit paths, and on CUDA GPUs via Gemlite low-bit GEMM kernels. Generation times:
- iPhone 17 Pro Max: 9.4 seconds for 512×512 image
- Mac M4 Pro: ~6 seconds for 512×512 image (up to 5.6× faster than the stock full-precision MFLUX pipeline)
The transformer size reduction comes from the binary/ternary layers (~14× / ~10× compression relative to FP16), while a small set of precision-sensitive projection layers (~5%) remains in FP16. The models are evaluated on GenEval, HPSv3, and DPG-Bench for quality and prompt fidelity.
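The gap between the per-layer compression and the overall transformer reduction can be illustrated with a blended bits-per-weight estimate. This is a rough sketch that assumes the "~5%" refers to the fraction of weights kept in FP16; the source does not say whether it counts layers or parameters, so the result is only indicative. Function names are hypothetical.

```python
FP16_BITS = 16.0

def blended_bits(low_bits, fp16_fraction=0.05):
    """Average bits per weight when a fraction of weights stays in FP16.

    Assumption: fp16_fraction is a share of weights, not of layer count.
    """
    return (1.0 - fp16_fraction) * low_bits + fp16_fraction * FP16_BITS

def compression_ratio(bits):
    """Compression relative to storing every weight in FP16."""
    return FP16_BITS / bits

# Effective bits per weight for the quantized layers, as reported above.
for name, bits in [("1-bit", 1.125), ("ternary", 1.71)]:
    layer_only = compression_ratio(bits)
    whole_model = compression_ratio(blended_bits(bits))
    print(f"{name}: layers {layer_only:.1f}x, blended {whole_model:.1f}x")
```

Under this assumption the quantized layers alone give about 14.2× and 9.4×, while the blended estimate lands near 8.6× and 6.6×, in the same range as the reported 8.3× and 6.4× transformer reductions.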
Who It's For
Developers deploying image generation on-device (laptops, phones, edge devices) who need open weights and practical local inference without cloud dependency.
📖 Read the full source: HN LLM Tools