Qwen3.5-122B-A10B-MINT-MLX runs smoothly on M5 Pro with 64GB RAM

Local LLM Performance on Apple Silicon
A Reddit user has shared their experience running the Qwen3.5-122B-A10B-MINT-MLX model locally on an M5 Pro with 64GB RAM. The setup demonstrates that large language models can run effectively on consumer hardware with proper configuration.
Configuration Details
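The user reported smooth performance after running two terminal commands that control how much of the Mac's unified memory the GPU may use. The second command sets the GPU wired-memory limit to 61440 MB (60 GB of the machine's 64GB):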
sysctl iogpu.unified_memory_limit_percentage
sudo sysctl iogpu.wired_limit_mb=61440
In LM Studio, they set the context window to 16384 tokens. With this configuration, the system maintained stable performance while running Safari with multiple tabs, Messages, and Activity Monitor simultaneously.
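The 61440 figure in the wired-limit command matches 60 GB expressed in MB, which leaves about 4 GB of the 64GB for macOS and other apps. The post does not say how the number was chosen, so the sketch below is only an assumption about the arithmetic; the helper name `wired_limit_mb` and the 4 GB headroom are hypothetical.

```python
def wired_limit_mb(total_ram_gb: int, headroom_gb: int) -> int:
    """Return a wired-memory limit in MB that leaves `headroom_gb` for the OS.

    Hypothetical helper: the source only gives the final value 61440.
    """
    if headroom_gb < 0 or headroom_gb >= total_ram_gb:
        raise ValueError("headroom must be between 0 and total RAM")
    return (total_ram_gb - headroom_gb) * 1024


# Assumed split for a 64GB machine: 60 GB for the GPU, 4 GB left over.
limit = wired_limit_mb(total_ram_gb=64, headroom_gb=4)
print(f"sudo sysctl iogpu.wired_limit_mb={limit}")
```

Leaving less headroom gives the model more room but raises the risk of the system stalling once memory fills up.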
Performance Benchmarks
The Qwen3.5-122B-A10B-MINT-MLX model delivered:
- Time to First Token: 0.86 seconds
- Token Generation Speed: 39.58 tokens/second
The user noted the model "solved a bunch of riddles correctly and did a bit of vibe coding" and had no complaints about the 3-bit MINT quantization. The only problem came when the context window filled up: with VRAM usage near 59GB, the system locked up.
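A rough estimate shows why a 122B-parameter model fits at all. The sketch below assumes every weight is stored at exactly 3 bits and ignores quantization scales, any layers kept at higher precision, the KV cache, and runtime overhead, so the real footprint will be higher. The function name `weight_gb` is hypothetical.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# 122B parameters at a nominal 3 bits each (an idealized assumption).
weights = weight_gb(122e9, 3)
print(f"Approximate weight size: {weights:.2f} GB")
```

Under these assumptions the weights alone take up most of the 60 GB limit, which is consistent with memory running out once a long context is added on top.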
Comparison with Other Models
The user also tested "Qwen3.5 40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking Mxfp8," which they found to be more accurate than the 122B model but significantly slower:
- Token Generation Speed: 6.93 tokens/second
- Prompt processing remained fast despite slower generation
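This demonstrates the trade-off between model size, quantization, and inference speed that developers face when choosing local LLM configurations. A larger model is not necessarily slower: the "A10B" suffix in Qwen's naming indicates roughly 10B parameters active per token, and the 122B model was run at 3 bits, while the 40B model was run at 8-bit precision, which likely explains much of the speed gap.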
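To put the two reported generation speeds in perspective, the sketch below estimates how long each model would take to produce a reply. The speeds and the time to first token come from the post; the 500-token reply length and the function name `decode_seconds` are hypothetical. No time to first token was reported for the 40B model, so only its decode time is counted.

```python
def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating `num_tokens` at a steady decode speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Figures reported in the post.
QWEN_122B_TTFT = 0.86   # seconds to first token
QWEN_122B_TPS = 39.58   # tokens/second
MODEL_40B_TPS = 6.93    # tokens/second

REPLY_TOKENS = 500      # hypothetical reply length for illustration

fast = QWEN_122B_TTFT + decode_seconds(REPLY_TOKENS, QWEN_122B_TPS)
slow = decode_seconds(REPLY_TOKENS, MODEL_40B_TPS)
speedup = QWEN_122B_TPS / MODEL_40B_TPS

print(f"122B model: {fast:.1f} s for {REPLY_TOKENS} tokens")
print(f"40B model:  {slow:.1f} s for {REPLY_TOKENS} tokens (decode only)")
print(f"Decode speed ratio: {speedup:.1f}x")
```

For a reply of this length the difference is roughly 13 seconds versus over a minute, which is the practical cost of the slower model's reported accuracy advantage.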
📖 Read the full source: r/LocalLLaMA