Four aarch64-specific failure modes when running vLLM on Blackwell GB10 with CUDA 13.0

Setup and environment
The setup uses GB10 hardware with aarch64 (sbsa-linux), Python 3.12, CUDA 13.0, and vLLM v0.7.1. The issues emerged in a daily-reset test environment and are specific to aarch64 with CUDA 13.0.
Failure mode 1: cu121 wheel doesn't exist for aarch64
Installing with --index-url .../cu121 fails with: ERROR: Could not find a version that satisfies the requirement torch (from versions: none). The cu121 index has no aarch64 binary. The correct index for Blackwell aarch64 is cu130.
sudo pip3 install --pre torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/nightly/cu130 \
  --break-system-packages
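One way to confirm which build actually landed is to look at the local version suffix of the installed wheel, since wheels from the PyTorch index carry a tag such as +cu130. The helper below is a minimal sketch: the function name cuda_tag_of and the sample version string are illustrative and do not come from the original post.

```shell
# Hypothetical helper: print the local-version tag (the part after "+") of a
# torch version string, or "none" if the string has no such suffix.
cuda_tag_of() {
    case "$1" in
        *+*) printf '%s\n' "${1##*+}" ;;
        *)   printf 'none\n' ;;
    esac
}

# Example with a made-up nightly version string:
cuda_tag_of "2.10.0.dev20250101+cu130"
```

On the target machine the version can be read without importing torch, for example cuda_tag_of "$(pip3 show torch | awk '/^Version:/ {print $2}')", which matters because the import itself may still fail at this stage.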
Failure mode 2: ncclWaitSignal undefined symbol
After installing cu130 torch, importing it fails with: ImportError: libtorch_cuda.so: undefined symbol: ncclWaitSignal. The apt-installed NCCL does not export this symbol, but the pip-installed nvidia-nccl-cu13 does, and the dynamic loader does not pick up the pip copy automatically.
Fix: Force it via LD_PRELOAD before every Python call:
export LD_PRELOAD=/usr/local/lib/python3.12/dist-packages/nvidia/nccl/lib/libnccl.so.2
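To see which NCCL copy actually exports the symbol, the output of nm -D can be filtered for a defined entry. This is a rough sketch, not part of the original post: defines_symbol is a hypothetical helper name, and the address in the example line is made up.

```shell
# Hypothetical helper: succeed if `nm -D` output on stdin lists SYMBOL as
# defined (type T or W) rather than undefined (type U).
defines_symbol() {
    awk -v sym="$1" '
        NF >= 2 {
            name = $NF
            sub(/@.*/, "", name)      # drop any @VERSION suffix
            if (name == sym && $(NF-1) ~ /^[TW]$/) found = 1
        }
        END { exit !found }
    '
}

# Example on a canned line (the address is illustrative):
printf '%s\n' '0000000000012340 T ncclWaitSignal' \
    | defines_symbol ncclWaitSignal && echo "exported"
```

Against the real library this would be run as nm -D /usr/local/lib/python3.12/dist-packages/nvidia/nccl/lib/libnccl.so.2 | defines_symbol ncclWaitSignal, and the same check against the apt-installed copy should fail.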
Failure mode 3: numa.h not found during vLLM CPU extension build
The error: fatal error: numa.h: No such file or directory. vLLM's CPU extension requires libnuma-dev, which wasn't installed on the reset system.
sudo apt-get install -y libnuma-dev
Failure mode 4: ABI mismatch — MessageLogger undefined symbol
After completing the full build, launching vLLM fails with: ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib.
Diagnosis with nm shows:
- What the vLLM binary expected (old signature):
U _ZN3c1013MessageLoggerC1EPKciib ← (const char*, int, int, bool)
- What the cu130 torch library actually provides (new signature):
T _ZN3c1013MessageLoggerC1ENS_14SourceLocationEib ← (SourceLocation, int, bool)
Root cause: pip's build isolation. When running pip install -e ., pip creates an isolated build environment and downloads a separate older torch based on pyproject.toml version constraints. vLLM compiles against those old headers, but at runtime the newer cu130 torch is found, causing signature mismatch.
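A quick way to see what the isolated build would pull in is to list the torch constraints in pyproject.toml. The sketch below assumes the pin is written as a quoted requirement string; torch_build_pins is a hypothetical helper name.

```shell
# Hypothetical helper: print (with line numbers) the lines of a pyproject.toml
# that carry a version constraint on torch. Defaults to ./pyproject.toml.
torch_build_pins() {
    grep -nE "[\"']torch[[:space:]]*[=<>~!]" "${1:-pyproject.toml}"
}
```

Running torch_build_pins from the vLLM source tree shows the version that an isolated build would download, which can then be compared with the cu130 torch installed system-wide.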
Fix: Use --no-build-isolation and pass the environment variables explicitly through env:
sudo -E env \
  LD_PRELOAD="/usr/local/lib/python3.12/dist-packages/nvidia/nccl/lib/libnccl.so.2" \
  LD_LIBRARY_PATH="/usr/local/lib/python3.12/dist-packages/torch/lib:..." \
  MAX_JOBS=8 \
  pip3 install -e . --no-deps --no-build-isolation --break-system-packages
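Important detail: sudo -E alone doesn't work because LD_PRELOAD never reaches pip's subprocess chain. On most systems the dynamic loader strips LD_* variables from setuid programs such as sudo before sudo can preserve them. You need sudo -E env VAR=value pip3 so that env sets the variables again inside the privileged process.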
Verify the ABI after installation:
nm -D vllm/_C.abi3.so | grep MessageLogger
# Must contain "SourceLocation"; if it still says "EPKciib", reinstall
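For an automated setup, the same check can be turned into a pass/fail test. The following is a minimal sketch that classifies nm -D output by the two mangled names shown earlier; message_logger_abi is a hypothetical helper name, not something vLLM or torch ships.

```shell
# Hypothetical helper: read `nm -D` output on stdin and print which
# c10::MessageLogger constructor signature it mentions:
#   old  -> (const char*, int, int, bool)
#   new  -> (SourceLocation, int, bool)
#   none -> neither symbol appears
message_logger_abi() {
    awk '
        /MessageLoggerC1ENS_14SourceLocationE/ { new_sig = 1 }
        /MessageLoggerC1EPKciib/               { old_sig = 1 }
        END {
            if (old_sig)      print "old"
            else if (new_sig) print "new"
            else              print "none"
        }
    '
}

# Example on a canned line taken from the failing build:
printf '%s\n' '                 U _ZN3c1013MessageLoggerC1EPKciib' | message_logger_abi
```

In a script this could be used as [ "$(nm -D vllm/_C.abi3.so | message_logger_abi)" = "new" ] || echo "rebuild needed". The old signature takes precedence on purpose, so any leftover reference to it is flagged.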
Additional note for multi-agent systems
If using vLLM as a backend for a multi-agent system, add --served-model-name your-model-name. Without it, vLLM serves the model under its full file path and agents get 404 when they query by name.
The full v2 protocol, including automation script and systemd service, is available at github.com/trgysvc/AutonomousNativeForge → docs/BLACKWELL_SETUP_V2.md. The repo is for ANF — a 4-agent autonomous coding pipeline running on top of this setup, but the setup docs stand alone if you just need the Blackwell/vLLM fixes.
📖 Read the full source: r/LocalLLaMA