Codebook Lossless LLM Compression: 10-25% RAM Reduction with Bitwise Packing

By OpenClawRadar | Published: March 15, 2026 | Source

A developer has published proof-of-concept code for lossless LLM compression that reduces memory usage by 10-25% through generic bitwise packing of indexed weights. The technique trades some inference speed for a smaller memory footprint, which makes it possible to run larger models on hardware with limited VRAM.

How It Works

The developer started by asking how many unique values actually exist in an LLM's layers. The analysis showed that although fp16 stores every weight in 16 bits, the distinct values used in most models can be indexed with only about 12-13 bits. Storing each unique value once in a codebook and packing the per-weight indices into blocks shrinks the model without losing any precision.
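To make the idea concrete, here is a minimal Python sketch of codebook encoding with bit-packed indices, using only the standard library. It is an illustration under stated assumptions, not the code from the repository. The function names are hypothetical, the weights arrive as a flat list of floats that are rounded to fp16, and a single codebook covers the whole list rather than one per block. Because the codebook stores raw 16-bit patterns, decoding is bit-exact.

```python
import struct
from typing import List, Tuple


def to_fp16_bits(values: List[float]) -> List[int]:
    """Round each value to fp16 and return its raw 16-bit pattern."""
    return [struct.unpack("<H", struct.pack("<e", v))[0] for v in values]


def from_fp16_bits(patterns: List[int]) -> List[float]:
    """Reinterpret raw 16-bit patterns as fp16 values (returned as Python floats)."""
    return [struct.unpack("<e", struct.pack("<H", p))[0] for p in patterns]


def index_width(n_unique: int) -> int:
    """Smallest number of bits that can address n_unique codebook entries."""
    return max(1, (n_unique - 1).bit_length())


def build_codebook(patterns: List[int]) -> Tuple[List[int], List[int]]:
    """Return (codebook of unique bit patterns, per-weight indices into it)."""
    codebook = sorted(set(patterns))
    position = {p: i for i, p in enumerate(codebook)}
    return codebook, [position[p] for p in patterns]


def pack_indices(indices: List[int], width: int) -> bytes:
    """Concatenate fixed-width indices into bytes, least significant bit first."""
    out = bytearray()
    buf = nbits = 0
    for idx in indices:
        buf |= idx << nbits          # append the index above the pending bits
        nbits += width
        while nbits >= 8:            # flush every complete byte
            out.append(buf & 0xFF)
            buf >>= 8
            nbits -= 8
    if nbits:                        # zero-padded final partial byte
        out.append(buf & 0xFF)
    return bytes(out)


def unpack_indices(blob: bytes, width: int, count: int) -> List[int]:
    """Inverse of pack_indices: read `count` indices of `width` bits each."""
    mask = (1 << width) - 1
    out = []
    buf = nbits = pos = 0
    for _ in range(count):
        while nbits < width:         # pull in bytes until one index is available
            buf |= blob[pos] << nbits
            pos += 1
            nbits += 8
        out.append(buf & mask)
        buf >>= width
        nbits -= width
    return out


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.5, 1.0, -0.25, 0.125, 1.0, 0.5]
    patterns = to_fp16_bits(weights)
    codebook, indices = build_codebook(patterns)
    width = index_width(len(codebook))
    blob = pack_indices(indices, width)
    restored = [codebook[i] for i in unpack_indices(blob, width, len(indices))]
    assert restored == patterns      # bit-exact round trip, i.e. lossless
    print(from_fp16_bits(restored))
    print(len(codebook), "unique,", width, "bits/index,", len(blob), "bytes vs", 2 * len(weights))
```

With four distinct values each index needs only 2 bits, so the eight weights fit in 2 bytes of indices instead of 16 bytes of fp16. The 8-byte codebook is fixed overhead, which is why the approach only pays off on large tensors.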

Performance Characteristics

RAM reduction: 10-25%+ across tested models
Speed impact: Inference speed approximately halved in example tests
Test hardware: NVIDIA P2200 (5GB) and CPU, with updates in development for the AMD MI50 (32GB)
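The reported savings follow from simple arithmetic. The hypothetical sketch below estimates the saving for a single tensor. It assumes one fp16 codebook per tensor (2 bytes per entry) and no other metadata, so the repository's actual block layout may give somewhat different numbers.

```python
def packed_size_bytes(n_weights: int, n_unique: int) -> int:
    """Bytes for bit-packed indices plus an fp16 codebook (assumed 2 bytes per entry)."""
    width = max(1, (n_unique - 1).bit_length())   # bits per index
    index_bytes = (n_weights * width + 7) // 8    # round up to whole bytes
    return index_bytes + 2 * n_unique


def savings(n_weights: int, n_unique: int) -> float:
    """Fraction of memory saved versus storing every weight as plain fp16."""
    return 1 - packed_size_bytes(n_weights, n_unique) / (2 * n_weights)


if __name__ == "__main__":
    n = 4096 * 4096  # hypothetical square projection matrix
    for k in (4096, 8192, 65536):
        print(f"{k:>6} unique values -> {savings(n, k):.2%} saved")
```

For a hypothetical 4096 × 4096 matrix, 12-bit indices save just under 25% and 13-bit indices about 19%, consistent with the upper end of the reported range. A layer that used all 65,536 fp16 bit patterns would need 16-bit indices and end up slightly larger than plain fp16. The same model also hints at the speed cost: every weight must be unpacked and looked up in the codebook before use, which is one plausible reason inference slows down.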

Implementation Details

The developer worked on this project for several weeks using AI coding assistants including Claude, Qwen, and Gemini. The repository includes both lossless and lossy/balanced versions, though the lossy version hasn't been extensively tested yet.

The developer suggests this compression approach might serve as a way to measure a model's "compactness" - how efficiently it uses its parameter space.
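One hypothetical way to turn the "compactness" idea into a number is the parameter-weighted average index width divided by 16 bits. Under that definition, a score of 0.75 would mean the model's layers need 12-bit indices on average, and lower scores mean fewer distinct values per layer. The metric name and definition below are assumptions for illustration; the source does not specify how compactness would be computed.

```python
import struct
from typing import Dict, List


def unique_fp16_count(values: List[float]) -> int:
    """Count distinct fp16 bit patterns (so 0.0 and -0.0 count separately)."""
    return len({struct.pack("<e", v) for v in values})


def compactness(layers: Dict[str, List[float]]) -> float:
    """Hypothetical score: average bits per index, weighted by parameter count, / 16."""
    total_bits = 0
    total_params = 0
    for values in layers.values():
        width = max(1, (unique_fp16_count(values) - 1).bit_length())
        total_bits += width * len(values)
        total_params += len(values)
    if total_params == 0:
        raise ValueError("no parameters to score")
    return total_bits / (16 * total_params)


if __name__ == "__main__":
    toy_model = {
        "layer_a": [0.5, 0.5, -0.25, 1.0],        # 3 unique values -> 2-bit indices
        "layer_b": [float(i) for i in range(16)],  # 16 unique values -> 4-bit indices
    }
    print(f"compactness = {compactness(toy_model):.3f}")
```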

Code Availability

The proof-of-concept code is available on GitHub: https://github.com/bigattichouse/Codebook-Quantization

Read the full source: r/LocalLLaMA
