Utilyze: Open-Source GPU Monitor That Measures Real Compute Throughput, Not Just Kernel Activity

The standard GPU utilization metric reported by nvidia-smi, nvtop, Weights & Biases, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor is misleading. It measures the fraction of time during which at least one kernel is running, so a GPU can show 100% utilization while using only 1-10% of its actual compute capacity. Teams that rely on this number for capacity planning may conclude their systems are saturated when they are actually underutilized.
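The gap between a time-based metric and a throughput-based one can be shown with a small calculation. The sketch below is illustrative only: the `Sample` record, both helper functions, and the peak FLOP/s figure are hypothetical and are not taken from nvidia-smi, NVML, or Utilyze. It assumes each sampling window records how long a kernel was active and how many floating-point operations were actually completed.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical sampling window on a single GPU."""
    duration_s: float       # wall-clock length of the window
    kernel_active_s: float  # time in the window with at least one kernel running
    flops_done: float       # floating-point operations actually completed


def kernel_activity_util(samples: list[Sample]) -> float:
    """Time-based metric: fraction of wall time during which any kernel ran."""
    total = sum(s.duration_s for s in samples)
    active = sum(s.kernel_active_s for s in samples)
    return active / total


def compute_throughput_util(samples: list[Sample], peak_flops: float) -> float:
    """Throughput-based metric: achieved FLOP/s as a fraction of peak FLOP/s."""
    total = sum(s.duration_s for s in samples)
    done = sum(s.flops_done for s in samples)
    return (done / total) / peak_flops


# Assumed peak for illustration only, not a specific GPU's spec-sheet value.
PEAK_FLOPS = 1.0e15

# Ten 1-second windows: a kernel is always running, but each window
# completes only 5% of the assumed peak work.
samples = [Sample(duration_s=1.0, kernel_active_s=1.0, flops_done=5.0e13) for _ in range(10)]

print(f"kernel activity:    {kernel_activity_util(samples):.0%}")
print(f"compute throughput: {compute_throughput_util(samples, PEAK_FLOPS):.1%}")
```

With these made-up numbers the time-based metric reports 100% while achieved compute is 5% of the assumed peak, which is the kind of discrepancy described above.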
Utilyze
Systalyze released Utilyze (utlz), an open-source (Apache 2.0) tool that measures GPU utilization in terms of work done rather than time busy. Instead of tracking kernel activity, it samples hardware performance counters and reports compute and memory throughput as a fraction of the hardware's theoretical peaks. It also estimates the attainable utilization ceiling for a given workload.
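How Utilyze computes its ceiling is not described here. As a stand-in, the sketch below uses the standard roofline model, which bounds attainable FLOP/s by the smaller of peak compute and arithmetic intensity times peak memory bandwidth. All names and numbers are hypothetical: `achieved_flops` and `achieved_bw` stand in for counter-derived throughput values, and the function names are not Utilyze's API or output format.

```python
def roofline_ceiling_flops(intensity: float, peak_flops: float, peak_bw: float) -> float:
    """Roofline bound: attainable FLOP/s = min(peak compute, intensity * peak bandwidth).

    intensity is arithmetic intensity in FLOPs per byte of memory traffic.
    """
    return min(peak_flops, intensity * peak_bw)


def utilization_report(achieved_flops: float, achieved_bw: float, intensity: float,
                       peak_flops: float, peak_bw: float) -> dict[str, float]:
    """Express achieved throughput relative to hardware peaks and to the roofline ceiling."""
    ceiling = roofline_ceiling_flops(intensity, peak_flops, peak_bw)
    return {
        "compute_util": achieved_flops / peak_flops,      # share of peak FLOP/s
        "memory_util": achieved_bw / peak_bw,             # share of peak bytes/s
        "ceiling_util": ceiling / peak_flops,             # best compute share this workload can reach
        "fraction_of_ceiling": achieved_flops / ceiling,  # how close the workload is to that best case
    }


# Hypothetical hardware and workload numbers, chosen only to illustrate the arithmetic.
PEAK_FLOPS = 1.0e15   # FLOP/s
PEAK_BW = 3.0e12      # bytes/s
INTENSITY = 100.0     # FLOPs per byte

report = utilization_report(achieved_flops=1.5e14, achieved_bw=1.5e12,
                            intensity=INTENSITY, peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW)
for name, value in report.items():
    print(f"{name:>20}: {value:.0%}")
```

Under these assumptions the workload is memory-bound: its ceiling is 30% of peak compute, it achieves 15% of peak compute, and so it sits at half of what it could attain. Reporting both numbers separates "the hardware is idle" from "this workload cannot use more of the hardware".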
Installation
curl -fsSL https://systalyze.com/utilyze/install.sh | bash
Utilyze runs alongside any AI workload in real time with negligible overhead. In production deployments, it has revealed orders-of-magnitude performance headroom in systems that standard tools declared fully saturated.
Why This Matters
AI compute is scarce: one-year H100 rental contracts rose roughly 40% from October 2025 to March 2026, and GPU lead times stretch to months. The money wasted on unnecessary hardware and energy is massive. Accurate measurement is the prerequisite for optimization: every percentage point of real throughput recovered saves money and resources.
Check the GitHub repo: https://github.com/systalyze/utilyze
📖 Read the full source: HN LLM Tools