V100 SXM2 Homelab: 64GB NVLink VRAM for $1,100

What This Is

A detailed reference document for building a local LLM inference homelab using NVIDIA V100 SXM2 GPUs. The guide focuses on achieving cost-effective, high-bandwidth GPU pooling through reverse-engineered NVLink hardware.

Key Hardware: The 1CATai TECH Board

The core component is a custom quad-GPU adapter board from the Chinese company 1CATai TECH (一猫之下科技). The board, model TAQ-SXM2-4P5A5, implements NVIDIA's NVLink 2.0 signaling to create a real NVLink mesh across four V100 SXM2 modules. Each V100 SXM2 has six NVLink 2.0 links totaling roughly 300 GB/s bidirectional, shared among its peers on the board, which is enough to make tensor parallelism effective.
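The ~300 GB/s figure is the sum over all six NVLink 2.0 links on a V100 SXM2 (25 GB/s per link in each direction), not the bandwidth between any single pair. A minimal sketch of how it could split across peers, assuming (hypothetically) that the board wires the four GPUs as a full mesh with each GPU's links divided evenly among its three peers; the board's actual link allocation is not documented here:

```python
# Back-of-envelope NVLink 2.0 bandwidth for one V100 SXM2 module.
# Assumption: the four GPUs form a full mesh and each GPU's six links are
# split evenly among its peers. The TAQ-SXM2-4P5A5's real wiring may differ.

LINKS_PER_V100 = 6           # NVLink 2.0 links on a V100 SXM2
GBPS_PER_LINK_PER_DIR = 25   # GB/s per link in each direction


def bidirectional_gbps(links: int) -> int:
    """Bidirectional bandwidth carried by `links` NVLink 2.0 links."""
    return links * GBPS_PER_LINK_PER_DIR * 2


def per_pair_gbps(num_gpus: int) -> int:
    """Bidirectional bandwidth between one GPU pair in an evenly split full mesh."""
    links_per_peer = LINKS_PER_V100 // (num_gpus - 1)
    return bidirectional_gbps(links_per_peer)


if __name__ == "__main__":
    print(f"Per-GPU aggregate: {bidirectional_gbps(LINKS_PER_V100)} GB/s")
    print(f"Per pair, 2 GPUs: {per_pair_gbps(2)} GB/s")
    print(f"Per pair, 4-GPU full mesh: {per_pair_gbps(4)} GB/s")
```

Under that assumption each pair on a quad board gets about 100 GB/s bidirectional; only a two-GPU layout that dedicates all six links to one peer would see the full 300 GB/s between a single pair.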

A complete quad board setup with 4x V100 SXM2 16GB modules, a PLX8749 IO card, cables, and cooling costs about $1,000-1,200 total, yielding 64GB of NVLink-unified VRAM. Individual V100 16GB modules currently cost $56-99 each.
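Working the quoted ranges through shows where the money goes: the four modules are well under half the build, and the rest covers the board, IO card, cables, and cooling. This sketch derives everything from the two ranges in the paragraph above; it is not an itemized price list.

```python
# Budget arithmetic using only the figures quoted above (USD).
MODULES = 4
GB_PER_MODULE = 16
MODULE_PRICE = (56, 99)        # low/high price per V100 SXM2 16GB module
BUILD_TOTAL = (1_000, 1_200)   # low/high price for the complete quad-board build


def budget():
    vram_gb = MODULES * GB_PER_MODULE
    gpus = (MODULES * MODULE_PRICE[0], MODULES * MODULE_PRICE[1])
    # Widest possible range left for the board, PLX8749 IO card, cables, and cooling.
    rest = (BUILD_TOTAL[0] - gpus[1], BUILD_TOTAL[1] - gpus[0])
    usd_per_gb = (BUILD_TOTAL[0] / vram_gb, BUILD_TOTAL[1] / vram_gb)
    return vram_gb, gpus, rest, usd_per_gb


if __name__ == "__main__":
    vram_gb, gpus, rest, usd_per_gb = budget()
    print(f"VRAM: {vram_gb} GB")
    print(f"Modules: ${gpus[0]}-{gpus[1]}")
    print(f"Everything else: ${rest[0]}-{rest[1]}")
    print(f"Cost per GB of VRAM: ${usd_per_gb[0]:.2f}-{usd_per_gb[1]:.2f}")
```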

What It's Not: Common Misconceptions

It's not "one big GPU." nvidia-smi shows four separate GPUs.
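NVLink makes tensor parallelism (TP) fast enough to feel seamless, but only with software that splits the model across GPUs. vLLM (--tensor-parallel-size) and llama.cpp (--split-mode row) can split individual tensors; Ollama also runs across all four GPUs but splits by layer rather than by tensor.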
It's not automatic unified memory. Two quad boards are two separate NVLink islands connected by PCIe, creating a 20x bandwidth cliff between boards.
The Supermicro AOM-SXM2 has NO NVLink—it's just a carrier board.
The ~900 GB/s number is HBM2 memory bandwidth per card, not NVLink bandwidth. NVLink 2.0 is ~300 GB/s bidirectional per GPU, summed across all six links; any single pair gets a share of that.

Why V100 SXM2 Specifically

900 GB/s HBM2 bandwidth per card with NVLink 2.0 on the SXM2 form factor.
Modules are physically identical across platforms (Supermicro 4029GP-TVRT, Inspur NF5288M5, Dell C4140, DGX-1). The DGX-2 is not one of them: it uses V100 SXM3 modules.
Supercomputer decommissionings (Summit, Sierra) have flooded the secondary market, driving prices down.

MoE Model Advantage

While a dense 70B model at Q4 might run at 20-30 tok/s on a single quad board, Mixture of Experts (MoE) models like DeepSeek V3.2 (~685B total parameters, ~37B active per token) decouple storage requirements from inference bandwidth: every parameter must be stored, but each generated token only reads the ~37B active ones. That makes V100s, with their high HBM2 bandwidth and NVLink-pooled memory, well suited to this architecture.
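During decode, generation speed is roughly bounded by how fast the active weights can be streamed out of HBM2. A minimal sketch of that ceiling, assuming ~4.5 bits per weight for a Q4-class quant, a perfect 4-way split across one quad board, and no KV-cache, activation, or inter-GPU communication cost. These are assumptions for illustration, not measurements:

```python
# Memory-bandwidth ceiling for single-stream decode: each generated token reads
# every *active* weight once, so tok/s <= aggregate bandwidth / bytes per token.
# Assumptions: ~4.5 bits/weight (Q4-class quants vary), perfect tensor
# parallelism, and no KV-cache, activation, or NVLink communication overhead.

HBM2_GBPS_PER_V100 = 900
BITS_PER_WEIGHT_Q4 = 4.5


def weight_bytes(params_billion: float, bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Bytes needed to hold `params_billion` parameters at the given bit width."""
    return params_billion * 1e9 * bits_per_weight / 8


def decode_ceiling_tok_s(active_params_billion: float, num_gpus: int = 4) -> float:
    """Upper bound on tokens/s if decode is purely HBM2-bandwidth bound."""
    aggregate_bytes_per_s = num_gpus * HBM2_GBPS_PER_V100 * 1e9
    return aggregate_bytes_per_s / weight_bytes(active_params_billion)


if __name__ == "__main__":
    print(f"Dense 70B ceiling: {decode_ceiling_tok_s(70):.0f} tok/s")
    print(f"MoE, 37B active ceiling: {decode_ceiling_tok_s(37):.0f} tok/s")
    print(f"MoE, 685B resident weights: {weight_bytes(685) / 1e9:.0f} GB")
```

Under these assumptions the dense 70B ceiling is about 91 tok/s, and the quoted 20-30 tok/s sits below it because of the overheads the sketch ignores. The MoE lines show the decoupling: per-token traffic follows the ~37B active parameters, while resident weight storage (~385 GB here) follows the full ~685B, well beyond a single 64GB board.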

120V Server Discovery

The Supermicro 4029GP-TVRT is an 8-way V100 SXM2 server with a hybrid cube-mesh NVLink topology (the same as DGX-1). Its wide-input PSUs accept 100-240V, and it ships with standard US wall plugs. At 120V the PSUs derate to ~1,100W each. With the V100s power-limited to 150W each via nvidia-smi (nvidia-smi -pl 150), total system draw is ~1,700W against ~4,400W of available PSU capacity, which is manageable when split across two standard 15A circuits. That gives 128GB of 8-way NVLink VRAM with 16GB modules (256GB with 32GB modules) on residential power. Used units (8x V100 32GB, dual Xeon Gold, 128GB RAM) have been found on eBay for under $1,000.
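Why two circuits rather than one comes down to simple arithmetic. A minimal sketch, with the assumptions that are not from the text labeled: 15A breakers at 120V, the common 80% continuous-load rule of thumb, and draw shared evenly across circuits (in practice the split depends on which PSUs are plugged into which circuit):

```python
# Residential power check for the 4029GP-TVRT figures quoted above.
# Assumptions not in the source: 120V/15A branch circuits, an 80% continuous-load
# guideline, and system draw shared evenly across the circuits used.

VOLTS = 120
BREAKER_AMPS = 15
CONTINUOUS_PERCENT = 80

GPUS = 8
GPU_LIMIT_W = 150        # per-GPU cap set with nvidia-smi
SYSTEM_DRAW_W = 1_700    # quoted total system draw
PSU_CAPACITY_W = 4_400   # quoted capacity at 120V (~1,100W per PSU)


def continuous_limit_w() -> int:
    """Continuous-load budget of one branch circuit, in watts."""
    return VOLTS * BREAKER_AMPS * CONTINUOUS_PERCENT // 100


def per_circuit_draw_w(circuits: int) -> float:
    """Draw on each circuit if the load splits evenly."""
    return SYSTEM_DRAW_W / circuits


def fits(circuits: int) -> bool:
    """True if both the circuits and the PSUs can carry the quoted draw."""
    return (per_circuit_draw_w(circuits) <= continuous_limit_w()
            and SYSTEM_DRAW_W <= PSU_CAPACITY_W)


if __name__ == "__main__":
    gpu_w = GPUS * GPU_LIMIT_W
    print(f"GPUs at cap: {gpu_w} W; rest of system: {SYSTEM_DRAW_W - gpu_w} W")
    print(f"Continuous limit per circuit: {continuous_limit_w()} W")
    for n in (1, 2):
        print(f"{n} circuit(s): {per_circuit_draw_w(n):.0f} W each, fits={fits(n)}")
```

On one circuit the ~1,700W draw exceeds the ~1,440W continuous budget; split across two, each carries ~850W. The ~500W left after the capped GPUs is the implied budget for everything else in the chassis.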