Microsoft BitNet: 1-bit LLM inference framework for CPU and GPU

What BitNet is
BitNet (bitnet.cpp) is Microsoft's official inference framework for 1-bit LLMs such as BitNet b1.58. It provides optimized kernels for fast, lossless inference of 1.58-bit models on CPU and GPU, with NPU support planned. The framework is built on llama.cpp, and its kernels use the lookup-table methodology from T-MAC.
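To make "1.58-bit" and "lookup table" concrete, here is a minimal pure-Python sketch. It assumes the absmean ternary quantization described for BitNet b1.58: weights are divided by their mean absolute value, then rounded and clipped to {-1, 0, +1}. It then applies a simplified lookup-table idea in the spirit of T-MAC. Activations are split into groups of two, all 3^2 = 9 possible signed sums per group are precomputed once, and each weight group becomes an index into that table. The function names, the group size, and the toy matrix are illustrative assumptions. The real bitnet.cpp kernels (I2_S, TL1, TL2) pack bits and vectorize very differently.

```python
from itertools import product
import math

def absmean_ternary(weights, eps=1e-5):
    """Quantize float weights to {-1, 0, +1} with one shared scale (absmean scheme)."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma

GROUP = 2  # weights per lookup group (illustrative choice)
PATTERNS = list(product((-1, 0, 1), repeat=GROUP))  # 3**GROUP = 9 ternary patterns
PATTERN_INDEX = {p: i for i, p in enumerate(PATTERNS)}

def build_lut(activations):
    """For each activation group, precompute the signed sum for every ternary pattern.

    Multiplying by -1/0/+1 here stands in for subtract/skip/add.
    """
    assert len(activations) % GROUP == 0
    return [
        [sum(w * a for w, a in zip(p, activations[s:s + GROUP])) for p in PATTERNS]
        for s in range(0, len(activations), GROUP)
    ]

def encode_row(q_row):
    """Replace each group of ternary weights with its pattern index (an offline step)."""
    return [PATTERN_INDEX[tuple(q_row[s:s + GROUP])] for s in range(0, len(q_row), GROUP)]

def lut_matvec(q_rows, scale, activations):
    """Compute scale * (Q @ x) with one table lookup per weight group, then a sum."""
    lut = build_lut(activations)  # built once, reused by every output row
    return [scale * sum(lut[g][idx] for g, idx in enumerate(encode_row(row)))
            for row in q_rows]

# Toy 2x4 weight matrix and activation vector.
W = [[0.9, -0.05, -1.2, 0.4],
     [0.1, 0.7, -0.6, -0.02]]
x = [1.0, 2.0, -1.0, 0.5]

q_flat, gamma = absmean_ternary([w for row in W for w in row])  # one scale per tensor
q_rows = [q_flat[i:i + len(x)] for i in range(0, len(q_flat), len(x))]
y_lut = lut_matvec(q_rows, gamma, x)
y_ref = [gamma * sum(q * a for q, a in zip(row, x)) for row in q_rows]  # direct check

print(q_rows)  # [[1, 0, -1, 1], [0, 1, -1, 0]]
print(all(math.isclose(a, b) for a, b in zip(y_lut, y_ref)))  # True
```

With ternary weights, the inner loop needs no multiplications by weight values. Once the table exists, each output row only performs lookups and additions. This kind of saving is what CPU-oriented ternary kernels exploit.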
Performance benchmarks
On ARM CPUs, bitnet.cpp achieves speedups of 1.37x to 5.07x and reduces energy consumption by 55.4% to 70.0%. On x86 CPUs, speedups range from 2.37x to 6.17x, with energy reductions of 71.9% to 82.2%. The latest optimization adds parallel kernel implementations with configurable tiling and support for embedding quantization, yielding an additional 1.15x to 2.1x speedup over the original implementation.
BitNet can run a 100B BitNet b1.58 model on a single CPU at speeds comparable to human reading (5-7 tokens per second).
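A back-of-the-envelope calculation suggests why a 100B-parameter ternary model fits on a single machine. A ternary weight carries log2(3) ≈ 1.58 bits of information, which is where the "b1.58" name comes from. The sketch below compares weight storage at 16 bits, at that 1.58-bit lower bound, and at a 2-bit packing. The 2-bit packing is an assumption about a practical storage layout, not a statement about bitnet.cpp's exact formats. The calculation counts weights only and ignores activations, the KV cache, and scale factors.

```python
import math

PARAMS = 100e9  # 100B parameters, the model size quoted above

def weight_gb(params, bits_per_weight):
    """Weight storage in gigabytes (1 GB = 1e9 bytes); weights only."""
    return params * bits_per_weight / 8 / 1e9

formats = [
    ("FP16 baseline", 16),
    ("ternary at log2(3) bits", math.log2(3)),  # ~1.585 bits, the "1.58" in b1.58
    ("ternary, 2-bit packing (assumed)", 2),
]
for label, bits in formats:
    print(f"{label:34s} {weight_gb(PARAMS, bits):6.1f} GB")

# Per-token latency implied by the quoted 5-7 tokens per second.
for tps in (5, 7):
    print(f"{tps} tokens/s -> {1000 / tps:.0f} ms per token")
```

This prints roughly 200 GB for FP16, 19.8 GB at the information-theoretic bound, and 25 GB with 2-bit packing, so the ternary weights need about a tenth of the FP16 footprint. At 5 to 7 tokens per second, each token takes about 143 to 200 ms.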
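Supported models
(✅ = kernel available, ❌ = not available, listed per CPU architecture for the I2_S, TL1, and TL2 kernels)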
- BitNet-b1.58-2B-4T (2.4B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- bitnet_b1_58-large (0.7B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- bitnet_b1_58-3B (3.3B) - x86: ❌ I2_S, ❌ TL1, ✅ TL2 | ARM: ❌ I2_S, ✅ TL1, ❌ TL2
- Llama3-8B-1.58-100B-tokens (8.0B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- Falcon3 Family (1B-10B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
- Falcon-E Family (1B-3B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
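The support matrix above can also be kept as data, for example to choose a kernel type in a script before converting a model. This is a minimal sketch. The dictionary transcribes the list above. `pick_quant_type` and its preference order (I2_S first, then whichever lookup-table kernel the architecture supports) are hypothetical and are not part of the BitNet tooling.

```python
# Transcribed from the support matrix above: model -> CPU arch -> supported kernels.
COMMON = {"x86": frozenset({"I2_S", "TL2"}), "arm": frozenset({"I2_S", "TL1"})}

SUPPORT = {
    "BitNet-b1.58-2B-4T": COMMON,
    "bitnet_b1_58-large": COMMON,
    "bitnet_b1_58-3B": {"x86": frozenset({"TL2"}), "arm": frozenset({"TL1"})},
    "Llama3-8B-1.58-100B-tokens": COMMON,
    "Falcon3 Family": COMMON,
    "Falcon-E Family": COMMON,
}

PREFERENCE = ("I2_S", "TL1", "TL2")  # hypothetical order: prefer I2_S when available

def pick_quant_type(model, arch):
    """Return the first supported kernel type for `model` on `arch` ("x86" or "arm")."""
    try:
        supported = SUPPORT[model][arch]
    except KeyError:
        raise ValueError(f"no entry for model={model!r}, arch={arch!r}") from None
    for kind in PREFERENCE:
        if kind in supported:
            return kind
    raise ValueError(f"{model} has no supported kernel on {arch}")

print(pick_quant_type("bitnet_b1_58-3B", "x86"))     # TL2
print(pick_quant_type("BitNet-b1.58-2B-4T", "arm"))  # I2_S
```

Keeping the matrix as data makes special cases such as bitnet_b1_58-3B explicit. That model has no I2_S kernel on either architecture, so the lookup falls through to TL2 on x86 and to TL1 on ARM.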
Installation requirements
Python ≥ 3.9, CMake ≥ 3.22, and Clang ≥ 18.
- Windows: install Visual Studio 2022 with the following options: Desktop development with C++, C++-CMake Tools for Windows, Git for Windows, C++-Clang Compiler for Windows, and MS-Build Support for LLVM-Toolset (clang).
- Debian/Ubuntu: install Clang with the LLVM automatic installation script: bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
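Before building, the toolchain can be checked against the minimum versions listed above (Python 3.9, CMake 3.22, Clang 18). The sketch below is an illustrative helper, not part of the BitNet repository. It assumes `cmake` and `clang` are on PATH and that their `--version` output contains a dotted version number such as `3.28.3` or `18.1.8`, which it extracts with a regular expression. If your distribution installs a versioned binary name (for example `clang-18`), adjust the key in `MINIMUMS`.

```python
import re
import shutil
import subprocess
import sys

MINIMUMS = {"cmake": (3, 22), "clang": (18,)}  # from the requirements above

def parse_version(text):
    """Extract the first dotted version number (e.g. '18.1.8') as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)

def tool_version(tool):
    """Run `<tool> --version` and parse the output; None if missing or unparsable."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)

def check_prerequisites():
    """Print one status line per requirement; return True only if all are met."""
    ok = sys.version_info >= (3, 9)
    print(f"python {sys.version.split()[0]}: {'ok' if ok else 'need >= 3.9'}")
    for tool, minimum in MINIMUMS.items():
        found = tool_version(tool)
        good = found is not None and found >= minimum
        ok = ok and good
        shown = ".".join(map(str, found)) if found else "not found"
        status = "ok" if good else "need >= " + ".".join(map(str, minimum))
        print(f"{tool} {shown}: {status}")
    return ok

if __name__ == "__main__":
    ready = check_prerequisites()
    print("ready to build" if ready else "install or upgrade the tools above")
```

Versions are compared as integer tuples, so `(18, 1, 8) >= (18,)` and `(3, 28, 3) >= (3, 22)` both hold. Plain string comparison would get these wrong, for example `"3.9" > "3.22"`.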
Build from source
Clone the repository: git clone --recursive https://github.com/microsoft/BitNet.git
Change directory: cd BitNet
Install dependencies (creating a new conda environment first is recommended).
On Windows, run all build commands from a Developer Command Prompt or Developer PowerShell for VS2022.
Recent updates
- 01/15/2026: BitNet CPU Inference Optimization
- 05/20/2025: BitNet Official GPU inference kernel
- 04/14/2025: BitNet Official 2B Parameter Model on Hugging Face
- 02/18/2025: Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
- 11/08/2024: BitNet a4.8: 4-bit Activations for 1-bit LLMs
- 10/21/2024: 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
- 10/17/2024: bitnet.cpp 1.0 released
📖 Read the full source: HN AI Agents