Lightning MLX: Fast Local AI Engine for Apple Silicon Agentic Use Delivers 220 tok/s on Qwen 35B-A3B

By OpenClawRadar | Published: May 8, 2026

Lightning MLX is a new open-source inference engine for Apple Silicon that its author bills as the fastest local AI engine for agentic workflows: coding agents, tool calling, and short-turn tasks. The project is available on GitHub at samuelfaj/lightning-mlx.

Benchmark Results

The author tested on a MacBook with an M5 Max chip and 128GB of RAM and reported the following token generation speeds:

Qwen3.6-27B: 40.67 tok/s
Qwen3.6-35B-A3B: 220.86 tok/s

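These results suggest that the engine is particularly efficient on the mixture-of-experts architecture of Qwen3.6-35B-A3B, which activates only a subset of its parameters per token (roughly 3B of 35B, going by the A3B suffix). The reported figure is about 5.4 times the throughput measured on Qwen3.6-27B. The numbers are self-reported and have not been independently reproduced.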
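To put the two reported figures side by side, the short sketch below derives the throughput ratio, the per-token latency, and the time to produce a reply of a given length. Only the two tok/s values come from the article; the 500-token reply length is an illustrative assumption, and the helper names are made up for this example.

```python
# Throughput figures as reported in the article (tokens per second).
reported = {
    "Qwen3.6-27B": 40.67,
    "Qwen3.6-35B-A3B": 220.86,
}


def ms_per_token(tok_per_s: float) -> float:
    """Average decode latency per token, in milliseconds."""
    return 1000.0 / tok_per_s


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` tokens at a steady decode rate."""
    return tokens / tok_per_s


# Ratio of the MoE model's reported throughput to the 27B model's.
speedup = reported["Qwen3.6-35B-A3B"] / reported["Qwen3.6-27B"]

REPLY_TOKENS = 500  # assumption: a typical short agent turn, not from the source

if __name__ == "__main__":
    print(f"throughput ratio: {speedup:.2f}x")
    for name, rate in reported.items():
        print(
            f"{name}: {ms_per_token(rate):.2f} ms/token, "
            f"{seconds_for(REPLY_TOKENS, rate):.2f} s for {REPLY_TOKENS} tokens"
        )
```

For agent loops that issue many short turns, the per-turn time is the more useful number: under the assumed 500-token reply, the difference is roughly twelve seconds versus a little over two.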

Key Features

Optimized for short-turn agentic use cases — code generation, tool calls, and rapid inference loops
Includes a preset configuration called MTPLX (custom sampling defaults); the author is seeking feedback on whether these defaults make sense for production use
Open source on GitHub; the license is believed to be MIT, though this is not confirmed in the source
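The article does not list the values that the MTPLX preset sets. As a rough illustration of what "sampling defaults" usually control, here is a minimal sketch of temperature plus top-p (nucleus) sampling driven by a preset object. The `SamplingPreset` class, its field names, and its values are hypothetical and are not taken from Lightning MLX.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    """Hypothetical preset; names and values are NOT the MTPLX defaults."""
    temperature: float = 0.7
    top_p: float = 0.9


def sample_token(logits: dict[str, float], preset: SamplingPreset,
                 rng: random.Random) -> str:
    """Temperature-scaled nucleus (top-p) sampling over a token -> logit map."""
    if preset.temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(logits, key=logits.get)

    # Softmax with temperature, shifted by the max for numerical stability.
    scaled = {tok: value / preset.temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)

    # Keep the smallest set of top tokens whose cumulative probability
    # reaches top_p, then sample within that set.
    nucleus, cumulative = [], 0.0
    for prob, tok in ranked:
        nucleus.append((prob, tok))
        cumulative += prob
        if cumulative >= preset.top_p:
            break

    tokens = [tok for _, tok in nucleus]
    probs = [prob for prob, _ in nucleus]
    return rng.choices(tokens, weights=probs, k=1)[0]
```

Lower temperature and a smaller top-p make output more deterministic, which tends to suit tool calls and code edits. Whether MTPLX leans that way is the kind of question the author is asking reviewers to weigh in on.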

Feedback Requests

The creator is actively asking the community for:

Better benchmark designs for local coding agents
Opinions on the MTPLX preset defaults
Test results on other Apple Silicon configurations (e.g., M1, M2, M3, M4, different RAM sizes)
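For readers who want to contribute numbers from other machines, one possible starting point is an engine-agnostic timing harness like the sketch below. It assumes only that the engine can be wrapped in a Python callable that yields tokens as they are produced. The function name `measure_tok_per_s`, the result fields, and the `fake_generate` stand-in are all hypothetical; nothing here uses the Lightning MLX API.

```python
import time
from statistics import median
from typing import Callable, Iterable


def measure_tok_per_s(generate: Callable[[str], Iterable[str]],
                      prompt: str, runs: int = 3) -> dict:
    """Time a streaming generator; report time to first token and decode rate."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        first_token_at = None
        count = 0
        for _token in generate(prompt):
            if first_token_at is None:
                first_token_at = time.perf_counter()
            count += 1
        end = time.perf_counter()

        # Decode rate excludes prompt processing: it counts the tokens that
        # arrived after the first one, over the time since the first one.
        decode_time = end - first_token_at if first_token_at is not None else 0.0
        rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
        results.append({
            "tokens": count,
            "ttft_s": None if first_token_at is None else first_token_at - start,
            "decode_tok_per_s": rate,
        })

    rates = [r["decode_tok_per_s"] for r in results]
    return {"runs": results, "median_decode_tok_per_s": median(rates)}


def fake_generate(prompt: str):
    """Stand-in for a real engine: emits 50 tokens with a small delay each."""
    for i in range(50):
        time.sleep(0.001)
        yield f"tok{i}"


if __name__ == "__main__":
    report = measure_tok_per_s(fake_generate, "write a hello world", runs=3)
    print(f"median decode rate: {report['median_decode_tok_per_s']:.1f} tok/s")
```

Separating time to first token from decode rate matters for agentic use, where prompts carrying tool schemas and file contents can be long while replies are short. Reporting the median over several runs reduces the effect of a cold first run.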

Who It's For

Developers running local LLMs on Apple Silicon for agentic coding workflows who need maximum inference speed.

📖 Read the full source: r/LocalLLaMA
