NexQuant: Rust-native 3-bit KV-cache engine for edge deployment

✍️ OpenClawRadar · 📅 Published: April 2, 2026

NexQuant is a Rust-native engine for running long-context models on consumer hardware whose memory would otherwise be a bottleneck. It is positioned as a production-hardened successor to Tom Turney's TurboQuant+ research.

Key technical details

3-5x Memory Reduction: According to the project, 14B models now fit in 4GB of VRAM or unified memory
MSE-Only Stability: Replaces the noisy QJL (Quantized Johnson-Lindenstrauss) path with a stable MSE-only quantization path (27/27 logic tests passed)
Integrated Sparse-V: Sparsity runs inside the real-time decode loop instead of existing only as a benchmark feature
Zero-Alloc Prefill: Prefill runs without allocations, and the engine is written in 100% safe Rust, keeping the speed while avoiding the segfault issues of the earlier C++ prototype
Hardware Support: Native runtime dispatch for Metal, CUDA, and Vulkan, plus a CPU backend (AVX2/NEON) for older laptops and the Raspberry Pi
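
To see where a figure in the 3-5x range can come from, here is a back-of-the-envelope sketch of KV-cache size. The model shape (48 layers, 8 KV heads, head dimension 128, 32K-token context), the block size of 32, and the single f16 scale per block are all hypothetical assumptions, not NexQuant's documented layout; `KvShape` and `kv_cache_bytes` are illustrative names. The arithmetic covers only the KV cache, not the model weights.

```rust
/// Hypothetical model shape, used only for illustration.
struct KvShape {
    layers: u64,
    kv_heads: u64,
    head_dim: u64,
}

/// Bytes to store K and V for `seq_len` tokens at `bits` per element, plus one
/// `scale_bits`-wide scale per `block` elements (pass block = 0 for no scales).
fn kv_cache_bytes(s: &KvShape, seq_len: u64, bits: u64, block: u64, scale_bits: u64) -> u64 {
    // K and V per layer, each holding kv_heads * head_dim values per token.
    let elems = 2 * s.layers * s.kv_heads * s.head_dim * seq_len;
    let payload_bits = elems * bits;
    // Assumed per-block scale overhead (rounded up to whole blocks).
    let scale_total_bits = if block == 0 {
        0
    } else {
        ((elems + block - 1) / block) * scale_bits
    };
    // Round the total up to whole bytes.
    (payload_bits + scale_total_bits + 7) / 8
}

fn main() {
    let shape = KvShape { layers: 48, kv_heads: 8, head_dim: 128 };
    let seq_len = 32_768;
    let fp16 = kv_cache_bytes(&shape, seq_len, 16, 0, 0);
    let q3 = kv_cache_bytes(&shape, seq_len, 3, 32, 16);
    let mib = |b: u64| b as f64 / (1024.0 * 1024.0);
    println!("fp16 KV cache: {:.0} MiB", mib(fp16));
    println!("3-bit KV cache: {:.0} MiB", mib(q3));
    println!("reduction: {:.2}x", fp16 as f64 / q3 as f64);
}
```

Under these assumptions it prints 6144 MiB for f16, 1344 MiB for 3-bit, and a 4.57x reduction: each element costs 3 payload bits plus 0.5 bits of scale overhead. Dropping the scales gives the 16/3 (about 5.3x) ceiling, while heavier per-block metadata pushes the ratio toward 3x.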

Implementation specifics

The project uses Walsh-Hadamard transforms and GGUF parsing written in Rust. It builds on Tom Turney's PolarQuant/TurboQuant+ work, which showed that 3-bit KV caches are mathematically viable. Claude (Anthropic) served as a high-speed pair programmer during development.
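
The post does not describe how NexQuant applies the Walsh-Hadamard transform. A common pattern in rotation-based KV quantization is to rotate each vector with an orthonormal Hadamard transform so that outlier channels are spread across all coordinates, quantize the rotated values, and rotate back after dequantization. The sketch below assumes that pattern; `fwht`, `quantize_3bit`, and `dequantize_3bit` are hypothetical names, and the uniform absmax quantizer is a deliberate simplification of the more elaborate quantizers used in the published methods. Production code commonly adds random sign flips before the transform, which is omitted here.

```rust
/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so it is
/// orthonormal and its own inverse. `x.len()` must be a power of two.
fn fwht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Butterfly over pairs (j, j + h) inside each block of size 2h.
        for i in (0..n).step_by(h * 2) {
            for j in i..i + h {
                let a = x[j];
                let b = x[j + h];
                x[j] = a + b;
                x[j + h] = a - b;
            }
        }
        h *= 2;
    }
    let norm = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= norm;
    }
}

/// Simplified symmetric 3-bit quantizer: one absmax scale per vector, codes in
/// -3..=3 (7 of the 8 levels). Codes are kept in i8 for clarity; a real cache
/// would pack them into 3 bits each.
fn quantize_3bit(x: &[f32]) -> (Vec<i8>, f32) {
    let absmax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 3.0 };
    let codes: Vec<i8> = x
        .iter()
        .map(|v| (v / scale).round().clamp(-3.0, 3.0) as i8)
        .collect();
    (codes, scale)
}

fn dequantize_3bit(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| c as f32 * scale).collect()
}

fn main() {
    // Toy key vector with one large outlier channel.
    let original: Vec<f32> = vec![0.1, -0.2, 0.05, 8.0, 0.3, -0.1, 0.2, -0.25];

    // Rotate, quantize, dequantize, rotate back (the normalized FWHT is self-inverse).
    let mut rotated = original.clone();
    fwht(&mut rotated);
    let (codes, scale) = quantize_3bit(&rotated);
    let mut restored = dequantize_3bit(&codes, scale);
    fwht(&mut restored);

    let mse: f32 = original
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f32>()
        / original.len() as f32;
    println!("codes = {codes:?}, scale = {scale:.4}, mse = {mse:.5}");
}
```

Because the normalized transform is orthonormal and self-inverse, the same `fwht` call performs both the rotation and its inverse, and the reconstruction error measured after rotating back equals the quantization error in the rotated domain.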

The goal is to keep models runnable locally and in a decentralized way as they scale. The team is specifically seeking feedback on its Vulkan SPIR-V kernels.

📖 Read the full source: r/LocalLLaMA
