Single-page chatbot interface for locally running Gemma 4 26B A4B

A developer has built a complete chatbot interface for Gemma 4 26B A4B that ships as a single HTML file. It connects to the locally running model through LM Studio's API.
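LM Studio's local server exposes an OpenAI-compatible API, so a page like this typically POSTs to `/v1/chat/completions` with `stream: true` and reads the reply as server-sent events. The TypeScript sketch below is not taken from the project. It assumes LM Studio's default address (`http://localhost:1234`), and the names `LMSTUDIO_URL`, `buildChatRequest`, `parseSseChunk`, and `SamplingParams` are hypothetical. It covers only the two pure pieces: building the request body and turning raw stream text into content deltas.

```typescript
// Assumption: LM Studio's local server on its default port. The page would
// POST the request body to this URL; fetch itself is omitted from this sketch.
const LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions";

type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical subset of the values a slider panel might control.
interface SamplingParams {
  temperature: number;
  top_p: number;
  max_tokens: number;
}

interface ChatRequest extends SamplingParams {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
}

// Build an OpenAI-style chat completion body, prepending the system prompt if set.
function buildChatRequest(
  model: string,
  systemPrompt: string,
  history: ChatMessage[],
  params: SamplingParams,
): ChatRequest {
  const messages: ChatMessage[] = systemPrompt.trim()
    ? [{ role: "system", content: systemPrompt }, ...history]
    : [...history];
  return { model, messages, stream: true, ...params };
}

// Parse streamed server-sent-event text into content deltas. Network chunks can
// end mid-line, so the unfinished tail is returned as `rest` for the next call.
function parseSseChunk(
  rest: string,
  chunk: string,
): { deltas: string[]; rest: string; done: boolean } {
  const lines = (rest + chunk).split("\n");
  const tail = lines.pop() ?? ""; // possibly incomplete final line
  const deltas: string[] = [];
  for (const raw of lines) {
    const line = raw.trim(); // also strips a trailing "\r"
    if (!line.startsWith("data:")) continue; // blank separators, comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") return { deltas, rest: "", done: true };
    const json = JSON.parse(payload) as {
      choices?: { delta?: { content?: string } }[];
    };
    const piece = json.choices?.[0]?.delta?.content;
    if (piece) deltas.push(piece);
  }
  return { deltas, rest: tail, done: false };
}
```

In the browser, the body would be sent with `fetch(LMSTUDIO_URL, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(req), signal })`, decoding `response.body` with a `TextDecoder` and feeding each decoded chunk to `parseSseChunk`. Passing an `AbortController`'s `signal` is one common way to implement an abort button: calling `controller.abort()` cancels the request mid-stream.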
Technical Implementation
The developer runs Gemma 4 26B A4B locally with a 32K-token context window and reports 50-65 tokens per second. The model is split across two GPUs from different vendors: an AMD Radeon RX 7900 XT and an NVIDIA GeForce RTX 3060 Ti.
Interface Features
- Full streaming support for real-time responses
- Markdown rendering for formatted output
- Model selector for switching between available models
- Six sliders for adjusting generation parameters
- Message editing with branching history
- Regenerate button to produce a new response
- Abort button to stop generation mid-stream
- System prompt support for custom instructions
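Branching history is often modeled as a tree rather than a flat array: editing a message or regenerating a reply adds a sibling node, and the visible conversation is the path through the currently selected branches. The source does not describe the project's actual data structure; the `ChatTree` class below is a hypothetical TypeScript sketch of that approach.

```typescript
// Hypothetical sketch: conversation history as a tree, so edits and
// regenerations add siblings instead of overwriting earlier messages.
type Role = "system" | "user" | "assistant";

interface MsgNode {
  id: number;
  role: Role;
  content: string;
  parent: number | null; // null = first message of the conversation
  children: number[];
  active: number; // index into `children` of the branch currently shown
}

class ChatTree {
  private nodes: MsgNode[] = []; // ids are array indices
  readonly roots: number[] = [];
  private activeRoot = 0;

  // Append a message under `parent` and make it the visible branch.
  add(parent: number | null, role: Role, content: string): number {
    const id = this.nodes.length;
    this.nodes.push({ id, role, content, parent, children: [], active: 0 });
    const siblings = this.siblingsOf(parent);
    siblings.push(id);
    this.setActive(parent, siblings.length - 1);
    return id;
  }

  // Edit (or regenerate): add a sibling with new content; the old branch stays intact.
  edit(id: number, newContent: string): number {
    const node = this.get(id);
    return this.add(node.parent, node.role, newContent);
  }

  // Switch back to an existing branch, e.g. from "previous/next version" arrows.
  select(id: number): void {
    const parent = this.get(id).parent;
    this.setActive(parent, this.siblingsOf(parent).indexOf(id));
  }

  // Messages currently on screen: follow the active child from the root down.
  activePath(): MsgNode[] {
    const path: MsgNode[] = [];
    let ids = this.roots;
    let idx = this.activeRoot;
    while (ids.length > 0) {
      const node = this.get(ids[idx]);
      path.push(node);
      ids = node.children;
      idx = node.active;
    }
    return path;
  }

  private siblingsOf(parent: number | null): number[] {
    return parent === null ? this.roots : this.get(parent).children;
  }

  private setActive(parent: number | null, idx: number): void {
    if (parent === null) this.activeRoot = idx;
    else this.get(parent).active = idx;
  }

  private get(id: number): MsgNode {
    const node = this.nodes[id];
    if (!node) throw new Error(`unknown message id ${id}`);
    return node;
  }
}
```

`activePath()` yields the messages to send on the next request (with the system prompt prepended). A regenerate button would call `edit` on the last assistant node with the newly generated text, and keeping the old reply as a sibling is what lets the user switch back to it with `select`.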
Development Details
According to the developer, Gemma 4 did all of the development work except fixing two DOM bugs it could not resolve; Claude was used for those. The project is available on GitHub.
A single-file interface like this suits developers running local LLMs who want a lightweight chat front end they can customize directly, without the overhead of a larger web application. Because it talks to LM Studio's API rather than to Gemma specifically, it can also be used with other models loaded in LM Studio.
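One common way to stay model-agnostic is to fill the model selector from LM Studio's OpenAI-compatible `GET /v1/models` endpoint instead of hard-coding a model name. The source does not say how this project does it; `parseModelList` and `chooseModel` below are hypothetical TypeScript helpers that assume the standard OpenAI list shape, `{ data: [{ id: ... }] }`.

```typescript
// Assumed shape of an OpenAI-style GET /v1/models response (only fields used here).
interface ModelList {
  data?: { id?: unknown }[];
}

// Extract model ids for a <select> dropdown, ignoring malformed entries.
function parseModelList(json: unknown): string[] {
  const list = (json ?? {}) as ModelList;
  if (!Array.isArray(list.data)) return [];
  return list.data
    .map((m) => m?.id)
    .filter((id): id is string => typeof id === "string" && id.length > 0);
}

// Keep the previous choice if that model is still available, else take the first.
function chooseModel(available: string[], previous: string | null): string | null {
  if (previous !== null && available.includes(previous)) return previous;
  return available[0] ?? null;
}
```

The selected id is what goes into the `model` field of each chat completion request, so switching models needs no other code changes.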
📖 Read the full source: r/LocalLLaMA