Microsoft BitNet: 1-bit LLM inference framework for CPU and GPU

By OpenClawRadar | Published: March 11, 2026

What BitNet is

BitNet (bitnet.cpp) is Microsoft's official inference framework for 1-bit LLMs such as BitNet b1.58. It provides optimized kernels for fast, lossless inference on CPU and GPU, with NPU support planned. The framework is built on llama.cpp, and its kernels use the lookup-table methods from T-MAC.
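
To make the "1-bit" idea concrete: BitNet b1.58 weights are ternary, taking values in {-1, 0, +1} (log2(3) ≈ 1.58 bits each), so a matrix-vector product needs only additions and subtractions. The plain-Python sketch below illustrates that idea. The function name `ternary_matvec` is hypothetical, and this is not how the framework's optimized kernels are written.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector
    using only additions and subtractions, no multiplications."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi          # +1 weight: add the activation
            elif w == -1:
                acc -= xi          # -1 weight: subtract the activation
            elif w != 0:           # 0 weight: skip; anything else is invalid
                raise ValueError(f"non-ternary weight: {w}")
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.0]
print(ternary_matvec(W, x))  # [1.5, 0.5]
```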

Performance benchmarks

On ARM CPUs, BitNet achieves speedups of 1.37x to 5.07x and reduces energy consumption by 55.4% to 70.0%. On x86 CPUs, speedups range from 2.37x to 6.17x, with energy reductions of 71.9% to 82.2%. The latest optimization adds parallel kernel implementations with configurable tiling and embedding quantization support, giving a further 1.15x to 2.1x speedup over the original implementation.

BitNet can run a 100B BitNet b1.58 model on a single CPU at speeds comparable to human reading (5-7 tokens per second).
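
A rough back-of-the-envelope calculation suggests why a 100B-parameter model can fit on a single machine. The sketch below estimates weight storage only. The 2-bit figure assumes each ternary weight is packed into 2 bits, which is an assumption made for illustration; activations, the KV cache, and any higher-precision layers are ignored.

```python
import math


def weight_gigabytes(n_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N = 100e9  # 100B parameters

print(weight_gigabytes(N, 16))            # FP16 baseline: 200.0
print(weight_gigabytes(N, 2))             # assumed 2-bit packing: 25.0
print(weight_gigabytes(N, math.log2(3)))  # theoretical 1.58 bits: roughly 19.8
```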

Supported models

  • BitNet-b1.58-2B-4T (2.4B parameters) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
  • bitnet_b1_58-large (0.7B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
  • bitnet_b1_58-3B (3.3B) - x86: ❌ I2_S, ❌ TL1, ✅ TL2 | ARM: ❌ I2_S, ✅ TL1, ❌ TL2
  • Llama3-8B-1.58-100B-tokens (8.0B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
  • Falcon3 Family (1B-10B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
  • Falcon-E Family (1B-3B) - x86: ✅ I2_S, ❌ TL1, ✅ TL2 | ARM: ✅ I2_S, ✅ TL1, ❌ TL2
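
The support matrix above can be encoded as a small lookup structure, for example to pick a kernel type in a script. This is an illustrative sketch only: `KERNEL_SUPPORT` and `supported_kernels` are hypothetical names, not part of the BitNet codebase, and the data simply mirrors the list above.

```python
# Kernel types marked as supported in the list above, per model and architecture.
KERNEL_SUPPORT = {
    "BitNet-b1.58-2B-4T":         {"x86": ["I2_S", "TL2"], "arm": ["I2_S", "TL1"]},
    "bitnet_b1_58-large":         {"x86": ["I2_S", "TL2"], "arm": ["I2_S", "TL1"]},
    "bitnet_b1_58-3B":            {"x86": ["TL2"],         "arm": ["TL1"]},
    "Llama3-8B-1.58-100B-tokens": {"x86": ["I2_S", "TL2"], "arm": ["I2_S", "TL1"]},
    "Falcon3 Family":             {"x86": ["I2_S", "TL2"], "arm": ["I2_S", "TL1"]},
    "Falcon-E Family":            {"x86": ["I2_S", "TL2"], "arm": ["I2_S", "TL1"]},
}


def supported_kernels(model, arch):
    """Return the kernel types listed as supported for a model on 'x86' or 'arm'."""
    try:
        return KERNEL_SUPPORT[model][arch.lower()]
    except KeyError:
        raise ValueError(f"unknown model/arch: {model!r}, {arch!r}") from None


print(supported_kernels("bitnet_b1_58-3B", "x86"))  # ['TL2']
print(supported_kernels("Falcon3 Family", "ARM"))   # ['I2_S', 'TL1']
```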

Installation requirements

Python ≥ 3.9, CMake ≥ 3.22, Clang ≥ 18.

  • Windows: Visual Studio 2022 with these components enabled: Desktop development with C++, C++-CMake Tools for Windows, Git for Windows, C++-Clang Compiler for Windows, and MS-Build Support for LLVM-Toolset (clang).
  • Debian/Ubuntu: use the automatic installation script: bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
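
Before building, it can help to confirm that the installed tools meet these minimum versions. The script below is a minimal sketch, not part of BitNet: the helper names are hypothetical, and it assumes the tools are on PATH as `cmake` and `clang` and print a dotted version number in response to `--version`. On some systems Clang may be installed under a versioned name such as `clang-18`, in which case the command name would need adjusting.

```python
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    m = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not m:
        return None
    return tuple(int(g) for g in m.groups() if g is not None)


def tool_version(cmd):
    """Run `cmd --version` and parse the result; None if the tool is not on PATH."""
    exe = shutil.which(cmd)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)


def meets(version, minimum):
    """True if version is known and at least the minimum (tuple comparison)."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    checks = {
        "python": (tuple(sys.version_info[:3]), (3, 9)),
        "cmake": (tool_version("cmake"), (3, 22)),
        "clang": (tool_version("clang"), (18,)),
    }
    for name, (found, minimum) in checks.items():
        shown = ".".join(map(str, found)) if found else "not found"
        status = "ok" if meets(found, minimum) else "missing or too old"
        print(f"{name}: {shown} ({status})")
```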

Build from source

Clone the repository: git clone --recursive https://github.com/microsoft/BitNet.git

Change directory: cd BitNet

Install dependencies (recommended: create a new conda environment first).

On Windows, run the build commands from a Developer Command Prompt or Developer PowerShell for VS2022.

Recent updates

  • 01/15/2026: BitNet CPU Inference Optimization
  • 05/20/2025: BitNet Official GPU inference kernel
  • 04/14/2025: BitNet Official 2B Parameter Model on Hugging Face
  • 02/18/2025: Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
  • 11/08/2024: BitNet a4.8: 4-bit Activations for 1-bit LLMs
  • 10/21/2024: 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
  • 10/17/2024: bitnet.cpp 1.0 released

📖 Read the full source: HN AI Agents
