Gemma-4 26B-A4B with Opencode Runs Efficiently on M5 MacBook Air

A developer tested Gemma-4-26B-A4B with Opencode on a 32GB M5 MacBook Air and found it delivers practical performance for local AI coding tasks.
Performance Benchmarks
The specific configuration tested was gemma-4-26B-A4B-it-UD-IQ4_XS running on a 32GB M5 MacBook Air. In low power mode, it achieved:
- 300 tokens/second prompt processing
- 12 tokens/second generation
- 8W power consumption
- No heat or fan noise during operation
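To put those throughput figures in terms of waiting time, here is a rough back-of-the-envelope sketch. It assumes the two reported rates stay constant regardless of context length, which is a simplification, since local inference usually slows as context grows. The function name and the example token counts are hypothetical and not from the original post.

```python
def estimate_turn_seconds(prompt_tokens, output_tokens,
                          prefill_tps=300.0, decode_tps=12.0):
    """Rough wall-clock estimate for one model turn.

    prefill_tps and decode_tps default to the rates reported above
    (300 tok/s prompt processing, 12 tok/s generation). Treating them
    as constant is an assumption made for illustration.
    """
    prefill_seconds = prompt_tokens / prefill_tps   # time to read the prompt
    decode_seconds = output_tokens / decode_tps     # time to write the reply
    return prefill_seconds + decode_seconds


# Hypothetical agentic step: 6,000 tokens of context, 240 tokens of output.
print(f"{estimate_turn_seconds(6000, 240):.0f} s")  # 20 s prefill + 20 s decode = 40 s
```

Under these assumptions a single step with a few thousand tokens of fresh context takes well under a minute. An agent that re-reads large files on every step would spend most of its time in prompt processing.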
The M5 MacBook Air showed significant improvements over previous hardware:
- ~25% faster prompt processing than an M1 Max 64GB (even when the Max wasn't in power saving mode)
- ~6 hours of battery life versus ~2 hours on the M1 Max when running Opencode
- The longer battery life comes despite a smaller battery (53.8Wh vs. 70Wh on the M1 Max)
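The battery figures can be sanity-checked with simple arithmetic. The sketch below assumes the reported 8W is the average whole-system draw and that the full rated battery capacity is usable. The post confirms neither, so treat the results as rough estimates. The helper names are illustrative only.

```python
def runtime_hours(battery_wh, avg_watts):
    """Hours of runtime if the machine draws avg_watts continuously."""
    return battery_wh / avg_watts


def implied_watts(battery_wh, hours):
    """Average draw implied by draining battery_wh over the given hours."""
    return battery_wh / hours


# M5 MacBook Air: 53.8 Wh battery at the reported 8 W.
print(f"{runtime_hours(53.8, 8.0):.1f} h")   # 6.7 h, in line with the ~6 hours reported

# M1 Max: 70 Wh battery drained in ~2 hours.
print(f"{implied_watts(70.0, 2.0):.1f} W")   # 35.0 W average draw
```

Under these assumptions the M1 Max setup drew roughly four times the power of the M5 Air for the same kind of workload, which fits the reported gap in battery life.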
Practical Use Cases
The developer found this setup "actually usable" for agentic coding on a laptop. Previously, running LLMs on the M1 Max 64GB was limited to "tinkering and toy use cases" and couldn't handle longer-context tasks effectively. While the older setup could create a simple Snake game in Python, agentic coding or contributing to larger codebases was "a bit janky."
The M5's performance makes it practical for mobile use cases where internet connectivity might be unreliable, such as coffee shops or train commutes.
Comparison to Other Models
The developer compared Gemma-4-26B with Opencode to closed-source alternatives:
- In their testing, it doesn't replace Claude Code or Antigravity
- Gemma-4 requires "far more hand-holding than current closed-source frontier models"
- The responses are described as "kinda dry" compared to Claude Code or Gemini-3.1-Pro with Antigravity
- However, they'd prefer Gemma-4-26B over running out of Gemini-2.5-Pro allowance and being forced to use Gemini-2.5-Flash
The developer notes this represents significant progress, as "this sort of agentic coding was cutting-edge / not even really possible with frontier models back at the end of 2024."
📖 Read the full source: r/LocalLLaMA