Gemma4 26B-A4B: 145 Tokens/s on RTX 4090 with Web Search

Gemma4 26B-A4B Performance and Features

The gemma-4-26B-A4B model performs well for local use: the source reports approximately 145 tokens per second on an RTX 4090 GPU. At that speed it is practical for responsive local applications.
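To put the reported figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a steady 145 tokens per second during generation and ignores prompt processing and time to first token, neither of which the source quantifies. The function name generation_seconds is made up for illustration and is not part of any tool mentioned in the source.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 145.0) -> float:
    """Estimate how long generating num_tokens takes at a steady throughput.

    The 145.0 default is the figure reported in the source for an RTX 4090;
    actual throughput varies with context length, quantization, and runtime.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Response lengths chosen only as illustrative examples.
    for n in (100, 500, 2000):
        print(f"{n} tokens: {generation_seconds(n):.1f} s")
```

Under these assumptions a 500-token answer finishes in roughly three and a half seconds, which is the kind of latency that makes interactive use comfortable.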

Key Features from Source

Model: gemma-4-26B-A4B
Performance: ~145 t/s (tokens per second) on RTX 4090
Integration: Web search MCP (Model Context Protocol) support
Multimodal: Image support included
Platforms: Setup documented for Mac and iPhone usage

The source notes that a few simple tricks and a short system prompt improve the experience, but the excerpt does not say what they are. The author documents the complete setup in a blog post covering configuration and usage across multiple devices.

Developers who want to reproduce this setup can find the full configuration details, system prompts, and optimization techniques in the referenced blog post.

📖 Read the full source: r/LocalLLaMA
