Qwen3.5-122B-A10B-MINT-MLX runs smoothly on M5 Pro with 64GB RAM

✍️ OpenClawRadar | 📅 Published: April 20, 2026

Local LLM Performance on Apple Silicon

A Reddit user has shared their experience running the Qwen3.5-122B-A10B-MINT-MLX model locally on an M5 Pro with 64GB RAM. Their report suggests that a 122B-parameter model, quantized to 3 bits, can run at interactive speeds on consumer hardware once the GPU memory limit is raised.

Configuration Details

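The user reported smooth performance after adjusting how much unified memory the GPU may use (Apple Silicon has no separate VRAM). The first command reads a setting; the second sets the GPU wired-memory limit to 61440 MB, which is 60GB: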

sysctl iogpu.unified_memory_limit_percentage
sudo sysctl iogpu.wired_limit_mb=61440
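
The value 61440 is 60GB expressed in MB. The arithmetic can be sketched in Python as follows, assuming "64GB" means 64 x 1024 MB and that the remaining 4GB is headroom left for macOS and other apps. The function name and the headroom parameter are illustrative and do not come from the source post.

```python
def wired_limit_mb(total_ram_gb: int, headroom_gb: int) -> int:
    """Return a GPU wired-memory limit in MB, leaving headroom_gb for the OS.

    Illustrative helper: the source only gives the final value 61440.
    """
    if headroom_gb < 0 or headroom_gb >= total_ram_gb:
        raise ValueError("headroom must be non-negative and smaller than total RAM")
    return (total_ram_gb - headroom_gb) * 1024


# 64GB machine with 4GB kept free reproduces the value used in the post.
limit = wired_limit_mb(total_ram_gb=64, headroom_gb=4)
print(f"sudo sysctl iogpu.wired_limit_mb={limit}")
```

A smaller headroom gives the model more room but leaves less for the rest of the system, which matters given the lockup the user reported near the limit.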

In LM Studio, they set the context window to 16384 tokens. With this configuration, the system maintained stable performance while running Safari with multiple tabs, Messages, and Activity Monitor simultaneously.

Performance Benchmarks

The Qwen3.5-122B-A10B-MINT-MLX model delivered:

Time to First Token: 0.86 seconds
Token Generation Speed: 39.58 tokens/second

The user noted the model "solved a bunch of riddles correctly and did a bit of vibe coding" and had no complaints about the 3-bit MINT quantization. The only problem reported was a system lockup when the context window filled up and memory usage approached 59GB, just under the 60GB wired limit set above.
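
A rough back-of-the-envelope estimate helps explain why a 122B model fits at all. The sketch below assumes exactly 3 bits per weight and ignores quantization metadata, any layers kept at higher precision, the KV cache, and runtime overhead, so the real footprint will differ. The function name is illustrative.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


weights_gb = weight_footprint_gb(params_billion=122, bits_per_weight=3)
limit_gb = 60  # wired limit from the post (61440 MB)

print(f"Estimated weights: {weights_gb:.2f} GB")
print(f"Rough remainder under the limit: {limit_gb - weights_gb:.2f} GB")
```

Under these assumptions the weights take roughly three quarters of the wired limit, and whatever remains has to cover the context cache and everything else the GPU holds, which is consistent with the lockup appearing only once the context window filled up.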

Comparison with Other Models

The user also tested "Qwen3.5 40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking Mxfp8," which they found to be more accurate than the 122B model but significantly slower:

Token Generation Speed: 6.93 tokens/second
Prompt processing remained fast despite slower generation

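At 6.93 versus 39.58 tokens/second, the 40B model generated text roughly 5.7 times slower than the 122B model. This illustrates the trade-off between model size, quantization, and inference speed that developers face when choosing local LLM configurations.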
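
To make the speed gap concrete, the sketch below converts the reported rates into wall-clock time for a single response. The 500-token response length is a hypothetical figure chosen for illustration, and the time to first token for the 40B model was not reported, so it is left at zero here.

```python
def response_time_s(tokens: int, tokens_per_s: float, ttft_s: float = 0.0) -> float:
    """Estimate seconds to produce a response: time to first token plus generation."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + tokens / tokens_per_s


RESPONSE_TOKENS = 500  # hypothetical response length, not from the source

# Figures reported in the post.
fast = response_time_s(RESPONSE_TOKENS, tokens_per_s=39.58, ttft_s=0.86)
slow = response_time_s(RESPONSE_TOKENS, tokens_per_s=6.93)  # TTFT not reported

print(f"122B (3-bit): {fast:.1f} s")
print(f"40B (Mxfp8):  {slow:.1f} s")
print(f"Generation speed ratio: {39.58 / 6.93:.1f}x")
```

For short answers the difference is tolerable, but it grows linearly with response length, so long reasoning traces from a "Thinking" model are where the slower rate is felt most.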

📖 Read the full source: r/LocalLLaMA
