Apple's libibverbs Hides GPUDirect RDMA Symbols; Zero-Copy Metal Buffer RDMA Works on macOS

A follow-up to the TinyGPU investigation reveals that Apple's RDMA implementation supports zero-copy memory sharing with Metal GPU buffers, and hidden symbols indicate possible GPUDirect RDMA support — undocumented and previously unknown.
Key Findings
The developer tested ibv_reg_mr() with various memory types on a 4-node Mac cluster (3x M3 Ultra + M5 Max MacBook Pro, ~1.5TB unified memory, Thunderbolt 5). Results:
malloc(): FAIL (unexpected; works on Linux)
posix_memalign(): FAIL (unexpected)
mmap(MAP_ANON): PASS (expected)
IOSurfaceGetBaseAddress(): PASS (no documentation)
MTLBuffer.contents (Metal shared): PASS (no documentation)
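These results suggest that Apple's RDMA stack validates how a region is mapped into the process, not what physically backs it. Heap allocations (malloc, posix_memalign) fail, while VM-mapped memory (mmap, IOSurface, Metal buffers) passes. This is a key difference from Linux, where heap memory can be registered.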
Zero-Copy Proven
A 64MB mmap buffer was triple-registered: as an RDMA memory region, a Metal GPU buffer, and an IOSurface. All registrations succeeded with the same lkey=0x101, confirming zero-copy sharing between GPU and network.
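The idea behind the triple registration is that several consumers hold handles to one mapping instead of each holding a copy. The sketch below is only an analogy in plain Python: it does not call RDMA, Metal, or IOSurface. It creates one anonymous mapping and two views onto it, then relies on a write through one view being visible through the other. The function name make_shared_views and the use of ctypes and memoryview as stand-ins for the consumers are assumptions made for illustration.

```python
import ctypes
import mmap

SIZE = 64 * 1024 * 1024  # 64 MB, the buffer size quoted in the test above


def make_shared_views(size=SIZE):
    """Create one anonymous mapping and two zero-copy views onto it.

    Hypothetical helper: the two views stand in for independent
    consumers (for example a GPU buffer and a network memory region).
    """
    region = mmap.mmap(-1, size)  # anonymous, VM-mapped, page-aligned memory
    view_a = memoryview(region)   # first consumer: no bytes are copied
    view_b = (ctypes.c_ubyte * size).from_buffer(region)  # second consumer
    base = ctypes.addressof(view_b)  # base address shared by both views
    return region, view_a, view_b, base


if __name__ == "__main__":
    region, view_a, view_b, base = make_shared_views(4 * mmap.PAGESIZE)
    view_a[0] = 0xAB  # write through the first view
    print(hex(view_b[0]), base % mmap.PAGESIZE == 0)
```

In the real experiment the equivalent evidence was that all three registrations succeeded against the same buffer; here the shared base address and the visible write play that role.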
Hidden GPUDirect RDMA Symbols
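Analysis of Apple's libibverbs.dylib via nm -a revealed undocumented symbols including ibv_reg_dmabuf_mr, which on Linux registers a dma-buf as an RDMA memory region and is the entry point used for GPUDirect RDMA. This suggests Apple may have already implemented some of the kernel-level plumbing, but the API is not publicly exposed, and the presence of a symbol alone does not prove the path is functional.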
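A symbol check like this can be scripted. The following is a minimal sketch, assuming nm prints one symbol per line with the name in the last column and that Mach-O C symbols carry a leading underscore. The helper names (exported_names, find_symbols, run_nm) are hypothetical, and the library path is left as a parameter because the source does not say where the dylib was found.

```python
import subprocess


def exported_names(nm_output):
    """Collect symbol names from nm text output (last column per line)."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        name = parts[-1]
        # Mach-O prefixes C symbols with one underscore; strip it if present.
        names.add(name[1:] if name.startswith("_") else name)
    return names


def find_symbols(nm_output, wanted):
    """Map each wanted symbol name to True/False depending on presence."""
    present = exported_names(nm_output)
    return {name: name in present for name in wanted}


def run_nm(path):
    """Run `nm -a` on a library path supplied by the caller."""
    result = subprocess.run(
        ["nm", "-a", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Usage would look like find_symbols(run_nm(path_to_dylib), ["ibv_reg_dmabuf_mr"]), where path_to_dylib is whatever location the library has on the machine being inspected.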
Blackwell eGPU Status
The RTX PRO 5000 Blackwell 72GB in a Razer Core X V2 is detected (PCIe link up, x4 @ 16 GT/s, 80 Gb/s TB5), and TinyGPU's DriverKit extension loads. However, NVIDIA's GSP firmware fails with RuntimeError: RPC call 4097 failed with result 101. NOCAT error decode reveals FBFLCN UNRECOGNIZED_CLIENT — the GPU's memory fabric doesn't recognize the PCIe peer through TB5. This is a known issue (tinygrad#15843); AMD GPUs work fine. The developer requests collaboration with the tinygrad team to fix GSP firmware init over TB5.
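For context on the reported link parameters, the arithmetic below estimates the raw PCIe rate for an x4 link at 16 GT/s. It is a back-of-the-envelope sketch, assuming the 128b/130b line encoding that PCIe uses at this speed and ignoring packet headers and Thunderbolt tunneling overhead. The function name pcie_payload_gbps is hypothetical.

```python
def pcie_payload_gbps(lanes, gt_per_s, encoding=(128, 130)):
    """Raw PCIe rate in Gb/s after line encoding.

    Ignores TLP/DLLP protocol overhead and any Thunderbolt tunneling
    cost, so real throughput will be lower than this figure.
    """
    payload_bits, total_bits = encoding
    return lanes * gt_per_s * payload_bits / total_bits


if __name__ == "__main__":
    link = pcie_payload_gbps(lanes=4, gt_per_s=16.0)
    print(f"{link:.1f} Gb/s, about {link / 8:.2f} GB/s")
```

The result is roughly 63 Gb/s, which fits within the 80 Gb/s Thunderbolt 5 figure quoted above, so the link width and speed themselves look plausible; the failure reported is in firmware initialization, not link training.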
Who This Is For
Developers working on macOS GPU compute, RDMA, or eGPU infrastructure, especially those interested in zero-copy data paths for distributed inference or training.
📖 Read the full source: r/LocalLLaMA