Local LLM Performance Benchmarks on Mac Mini with OpenClaw and LM Studio

✍️ OpenClawRadar📅 Published: April 18, 2026🔗 Source
Local LLM Performance Benchmarks on Mac Mini with OpenClaw and LM Studio
Ad

A Reddit user shared concrete performance benchmarks for running a local large language model on a Mac Mini with 32GB RAM. The post addresses the scarcity of specific performance data for this hardware configuration.

Technical Setup Details

The user reported the following configuration and results:

  • Software versions: OpenClaw 2026.3.8, LM Studio 0.4.6+1
  • Model: Unsloth gpt-oss-20b-Q4_K_S.gguf
  • Context size: 26035
  • Performance metrics: 34 tokens/second after the first prompt, 0.7 second time to first token
Ad

Model Configuration

The user specified these model settings (all at defaults):

  • GPU offload = 18
  • CPU thread pool size = 7
  • Max concurrents = 4
  • Number of experts = 4
  • Flash attention = on

The Q4_K_S quantization indicates this is a 4-bit quantized version of the 20-billion parameter model, which reduces memory requirements while maintaining reasonable performance. The 32GB RAM on the Mac Mini is sufficient for this model size with the given context length. The 34 tokens/second throughput is a practical benchmark for developers considering similar local LLM setups on Apple Silicon hardware.

📖 Read the full source: r/openclaw

Ad

👀 See Also