Mac Studio local LLM loadout: GLM 5.1, Kimi K2.6, and what's working for coding with Claude Code

✍️ OpenClawRadar | 📅 Published: May 7, 2026 | 🔗 Source

Over on r/LocalLLaMA, user ezyz posted their Mac Studio local LLM loadout as of May 2026, running on an M3 Ultra with 512GB unified memory. The post is a day-to-day vibe check, not rigorous benchmarks, but it's full of practical observations for anyone running large models locally for coding with Claude Code.

Current active models and performance

GLM 5.1 is the biggest winner. Quantized, it fits in ~380GB with max context, leaving room in the 512GB for other tasks. Decode speed is ~17 t/s, prefill ~190 t/s. For coding via Claude Code, the author trusts it up to a 6/10 on task complexity (10 being 'brownfield legacy codebase + vague spec'). It handles self-contained, semi-scoped problems consistently, with occasional help from Claude over the API for planning or cleanup.

Kimi K2.6 is in the same tier — not obviously better or worse — but is larger. Even aggressively quantized, it uses ~460GB, leaving little for other experiments. It's faster: prefill ~220 t/s, decode ~21 t/s. The friction is needing to unload it for memory-heavy experiments.
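
As a rough illustration of the trade-off between the two, the figures quoted above can be plugged into a back-of-the-envelope estimate of free memory and time per turn. This is a minimal sketch, not the author's method: the `ModelProfile` class is made up for illustration, the 20,000-token prompt and 1,000-token reply are a hypothetical workload, and the estimate ignores prompt caching and any slowdown at long context.

```python
from dataclasses import dataclass

TOTAL_MEMORY_GB = 512  # M3 Ultra unified memory, as stated in the post


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical container for the approximate figures quoted in the post."""
    name: str
    memory_gb: float    # approximate resident size when quantized
    prefill_tps: float  # prompt tokens processed per second
    decode_tps: float   # generated tokens per second

    def headroom_gb(self, total_gb: float = TOTAL_MEMORY_GB) -> float:
        # Memory left over for other models or experiments.
        return total_gb - self.memory_gb

    def turn_seconds(self, prompt_tokens: int, output_tokens: int) -> float:
        # Time to read the prompt plus time to generate the reply.
        return prompt_tokens / self.prefill_tps + output_tokens / self.decode_tps


GLM = ModelProfile("GLM 5.1", memory_gb=380, prefill_tps=190, decode_tps=17)
KIMI = ModelProfile("Kimi K2.6", memory_gb=460, prefill_tps=220, decode_tps=21)

if __name__ == "__main__":
    # Assumed workload: 20k-token prompt, 1k-token reply, no cache reuse.
    for model in (GLM, KIMI):
        free = model.headroom_gb()
        secs = model.turn_seconds(20_000, 1_000)
        print(f"{model.name}: {free:.0f} GB free, about {secs:.0f} s per turn")
```

Under these assumptions GLM 5.1 leaves about 132 GB free and takes roughly 164 s per turn, while Kimi K2.6 leaves about 52 GB and takes roughly 139 s. Kimi's speed advantage is real but modest next to the memory it gives up.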

Minimax 2.7 is impressive for its size and speed, but the author rates it only 3-4/10 for dev work. It sits at an awkward size: GLM and Kimi win on shipping usable code, while smaller models win on assistant tasks like 'summarize this web search'. It does, however, bail out of reasoning quickly on simple requests.

Gemma 4 31B disappointed: MLX support is still messy a month post-release. The 31B dense model isn't much faster than the big MoEs, the official chat template has multiple unaddressed bugs, and patches are still trickling in. The author plans to revisit once MTP/draft support stabilizes.

Qwen 3.6 35B was replaced with Qwen 3.5 9B for multimodal tasks like translating screenshots — it's good enough and fast enough, and handles Claude Code's Haiku background tasks with no noticeable difference, while saving ~14GB memory.

Pending support and future watch

Neither Deepseek 4 Flash nor Mimo 2.5 has officially landed in llama.cpp or mlx-lm yet. The author will try the PRs when time permits. They guess the pro versions of both will be too large and slow for the M3 Ultra: GLM's 40B active parameters are roughly their patience limit.
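
One way to see why active parameter count sets the patience limit is that decode speed on this kind of hardware is largely bounded by memory bandwidth. The sketch below is an illustration under stated assumptions, not something from the post. It takes Apple's published 819 GB/s peak bandwidth for the M3 Ultra, assumes a hypothetical 4 bits per weight (the post does not state a quantization width), and assumes every active weight is read once per generated token. The function name is made up for this example.

```python
M3_ULTRA_BANDWIDTH_GBS = 819.0  # Apple's published peak memory bandwidth
ASSUMED_BITS_PER_WEIGHT = 4.0   # hypothetical; the post gives no quantization width


def decode_ceiling_tps(active_params_b: float,
                       bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT,
                       bandwidth_gbs: float = M3_ULTRA_BANDWIDTH_GBS) -> float:
    """Upper bound on decode tokens/s if each active weight is read once per token.

    active_params_b is the active parameter count in billions.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8.0
    return bandwidth_gbs / gb_read_per_token


if __name__ == "__main__":
    for active_b in (40, 80):
        ceiling = decode_ceiling_tps(active_b)
        print(f"{active_b}B active: at most {ceiling:.1f} t/s")
```

With these assumptions, 40B active parameters give a ceiling of about 41 t/s, against the ~17 t/s reported for GLM 5.1 in practice. Doubling the active parameters halves the ceiling, which is consistent with the author's guess that larger pro variants would be too slow.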

Eagerly watched projects:

  • Exo and tinygrad for Mac + NVIDIA clustering and disaggregated prefill
  • Stable Dflash / DDtree / MTP support
  • Novel quantization formats (paroquant, JANGTQ) — see llama.cpp PR #21038
  • Local music generation: Ace Step 1.5 is 'almost good', but the voices are not there yet.

📖 Read the full source: r/LocalLLaMA
