Qwen 3.6 27B with MTP on V100 32GB: 54 t/s via llama.cpp Branch

A user on r/LocalLLaMA reports impressive results running Qwen 3.6 27B with Multi-Token Prediction (MTP) on a V100 32GB SXM module using a PCIe adapter. The setup uses am17an's MTP branch of llama.cpp and the corresponding MTP GGUF quant. Key specs: Q8_0 KV cache with 200k cache limit, running as a VS Code Copilot backend via llama-server.
Performance Numbers
- Without MTP: 29-30 tokens/second
- With MTP: 54-55 tokens/second (at 150W power limit)
- After 50k tokens context: drops to 40-45 t/s
Branch: am17an's MTP fork. Build and run were straightforward — 'pulled and built in one shot' with llama-server running without issues. The setup handles tool calls and sub-agents well, and delivered 'very insightful code reviews and refactors' despite the VRAM limitation (32GB).
This is particularly relevant for developers running LLMs on older datacenter hardware like V100s. MTP effectively doubles throughput for this model, demonstrating practical gains for coding assistant workloads.
📖 Read the full source: r/LocalLLaMA
👀 See Also

OmniCoder-9B fine-tune shows strong performance for agentic coding on 8GB VRAM systems
A Reddit user tested OmniCoder-9B, a fine-tune of Qwen3.5-9B on Opus traces, with OpenCode and reported 40+ tokens per second speeds using Q4_K_M GGUF quantization at 100k context length on an 8GB VRAM system.

Introducing operate.txt: A YAML spec for AI agents navigating SaaS products
A developer created operate.txt, a YAML file hosted at yourdomain.com/operate.txt that documents screen details, loading states, irreversible actions, and step-by-step paths for AI agents using computer use features. The spec addresses issues like Claude asking 'is this broken?' during legitimate loading screens.

Rift: A Better Alternative to Git Worktrees with Instant Copy-on-Write Snapshots
Rift uses btrfs or APFS snapshots to create instant, space-efficient copies of Git repositories. Initialization, creation, and listing via CLI or JavaScript FFI.

Custom WhatsApp Channel Plugin for Claude Code Using Baileys
A developer built a custom channel plugin that adds WhatsApp support to Claude Code 2.1.80+ using Baileys v7, implementing the WhatsApp Web Multi-Device protocol as an MCP server with the experimental claude/channel capability.