NVIDIA DGX Spark Community Launches Spark Arena for Reproducible LLM Benchmarks

✍️ OpenClawRadar · 📅 Published: March 1, 2026

The NVIDIA DGX Spark community has launched Spark Arena, a platform for reproducible benchmarks of open-weights large language models on DGX Spark hardware. It targets a long-standing problem: results posted with incomplete settings that nobody else could reproduce.

Background and Problem

NVIDIA began shipping DGX Spark in mid-October 2025 as a desktop system whose unified memory lets it run large models locally, including inference on models of roughly 200B parameters. The community soon ran into a recurring problem: "everyone posts partial flags, then nobody can reproduce it two weeks later."

Standardized Methodology

On October 14, 2025, u/ggerganov posted a DGX Spark performance thread in the llama.cpp project that set out a clear methodology: measure prefill (pp) and generation/decode (tg) performance across multiple context depths and batch sizes, using CUDA builds of llama.cpp with the llama-bench and llama-batched-bench tools.
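
To make the shape of that methodology concrete, here is a minimal Python sketch of a sweep over context depths and batch sizes, with throughput computed as tokens divided by elapsed time. The depth and batch values, the helper names, and the timings are all illustrative assumptions; the source does not list the exact grid, and this sketch does not invoke llama-bench itself.

```python
from itertools import product

# Hypothetical sweep values; the source does not give the exact depths or batch sizes.
CONTEXT_DEPTHS = [0, 4096, 8192]
BATCH_SIZES = [1, 2, 4]


def sweep_grid(depths, batch_sizes):
    """Return every (depth, batch_size) pair that should be benchmarked."""
    return list(product(depths, batch_sizes))


def tokens_per_second(n_tokens, elapsed_s):
    """Throughput for one phase (prefill or decode) of a single run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


grid = sweep_grid(CONTEXT_DEPTHS, BATCH_SIZES)
print(len(grid))  # 9 configurations (3 depths x 3 batch sizes)

# Illustrative timings only, not measured results.
pp = tokens_per_second(2048, 1.0)  # prefill: 2048 prompt tokens in 1.0 s
tg = tokens_per_second(32, 0.5)    # decode: 32 generated tokens in 0.5 s
print(pp, tg)  # 2048.0 64.0
```

Reporting pp and tg separately for every cell of the grid matters because prefill and decode stress the hardware differently, so a single headline number hides how performance changes with context depth.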

Community Solution

The community agreed on standard tooling for building runtime images and orchestrating runs, together with a shared recipe format, and launched Spark Arena on February 11, 2026.
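
The source does not describe the recipe format itself, so the following is only a hypothetical sketch of why a complete recipe helps reproducibility. Every field name here is an assumption, not Spark Arena's schema, and the flag string is a placeholder. The idea shown is that hashing the full configuration gives each result a fingerprint, so two runs are comparable only when nothing was left out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Recipe:
    # All field names are hypothetical; this is not Spark Arena's actual schema.
    model: str
    runtime: str
    quantization: str
    nodes: int
    flags: tuple = ()  # the complete launch flags, not a partial list


def fingerprint(recipe):
    """Short, stable hash of the whole recipe, usable as a result label."""
    payload = json.dumps(asdict(recipe), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# "--example-flag" is a placeholder, not a real vLLM option.
single = Recipe("gpt-oss-120b", "vLLM", "MXFP4", 1, ("--example-flag",))
dual = Recipe("gpt-oss-120b", "vLLM", "MXFP4", 2, ("--example-flag",))

print(fingerprint(single) == fingerprint(dual))  # False: node count differs
```

Because the dataclass is frozen and the hash covers every field, changing any single setting produces a different fingerprint, which is the property a partial list of flags cannot offer.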

Current Performance Leaders

Top decode throughput (tokens/sec) reported on Spark Arena:

  • gpt-oss-120b (vLLM, MXFP4, 2 nodes): 75.96 tok/s
  • Qwen3-Coder-Next (SGLang, FP8, 2 nodes): 60.51 tok/s
  • gpt-oss-120b (vLLM, MXFP4, single node): 58.82 tok/s
  • NVIDIA-Nemotron-3-Nano-30B-A3B (vLLM, NVFP4, single node): 56.11 tok/s
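
The list includes gpt-oss-120b on both one and two nodes, which allows a rough scaling estimate. The Python sketch below uses only the four figures quoted above. Defining efficiency as speedup divided by node count is an assumption made for illustration, not a metric the source reports.

```python
# (model, runtime, quantization, nodes, decode tok/s) as quoted in the list above.
results = [
    ("gpt-oss-120b", "vLLM", "MXFP4", 2, 75.96),
    ("Qwen3-Coder-Next", "SGLang", "FP8", 2, 60.51),
    ("gpt-oss-120b", "vLLM", "MXFP4", 1, 58.82),
    ("NVIDIA-Nemotron-3-Nano-30B-A3B", "vLLM", "NVFP4", 1, 56.11),
]


def scaling(rows, model):
    """Return (speedup, efficiency) for a model listed with 1 and 2 nodes."""
    by_nodes = {nodes: tps for name, _, _, nodes, tps in rows if name == model}
    speedup = by_nodes[2] / by_nodes[1]
    efficiency = speedup / 2  # assumed definition: speedup per node
    return speedup, efficiency


speedup, efficiency = scaling(results, "gpt-oss-120b")
print(f"{speedup:.2f}x speedup, {efficiency:.0%} efficiency")
# prints: 1.29x speedup, 65% efficiency
```

On these numbers, the second node adds about 29% more decode throughput rather than doubling it, which is worth weighing when deciding whether a two-node setup is justified for a given workload.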

Practical Implications

With standardized, reproducible recipes, developers get performance data they can rely on when selecting, configuring, and tuning open-weights LLMs on DGX Spark hardware.

📖 Read the full source: r/clawdbot
