Gemma Gem: On-Device AI Agent for Browser Automation via WebGPU

Gemma Gem is a Chrome extension that runs Google's Gemma 4 model (E2B or E4B) locally through WebGPU in an offscreen document. It gives the model tools to read and interact with webpages directly in the browser, with no external API calls.
Key Details
The extension provides several tools that run in different contexts:
- read_page_content: Read the text/HTML of the page or of a CSS selector (Content script)
- take_screenshot: Capture the visible page as PNG (Service worker)
- click_element: Click an element by CSS selector (Content script)
- type_text: Type into an input by CSS selector (Content script)
- scroll_page: Scroll up/down by a pixel amount (Content script)
- run_javascript: Execute JS in the page context with full DOM access (Service worker)
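The tool list above can be modeled as typed declarations. Each declaration records the context the tool runs in and can be rendered into the model's prompt. The following is a minimal sketch: the `ToolDeclaration` shape, the parameter names, and the prompt format are hypothetical and were chosen for illustration. They are not taken from the extension's source.

```typescript
// Where a tool executes: inside the page (content script) or in the extension's service worker.
type ToolContext = "content_script" | "service_worker";

// Hypothetical declaration shape; the extension's real schema may differ.
interface ToolDeclaration {
  name: string;
  description: string;
  context: ToolContext;
  // Hypothetical parameter names mapped to short descriptions shown to the model.
  parameters: Record<string, string>;
}

const TOOLS: ToolDeclaration[] = [
  { name: "read_page_content", description: "Read the text/HTML of the page or of a CSS selector", context: "content_script", parameters: { selector: "optional CSS selector" } },
  { name: "take_screenshot", description: "Capture the visible page as PNG", context: "service_worker", parameters: {} },
  { name: "click_element", description: "Click an element by CSS selector", context: "content_script", parameters: { selector: "CSS selector" } },
  { name: "type_text", description: "Type into an input by CSS selector", context: "content_script", parameters: { selector: "CSS selector", text: "text to type" } },
  { name: "scroll_page", description: "Scroll up/down by a pixel amount", context: "content_script", parameters: { direction: "up | down", amount: "pixels" } },
  { name: "run_javascript", description: "Execute JS in the page context with full DOM access", context: "service_worker", parameters: { code: "JavaScript source" } },
];

// Render the declarations as one line per tool, e.g. for inclusion in a system prompt.
function renderToolPrompt(tools: ToolDeclaration[]): string {
  return tools
    .map((t) => {
      const params = Object.entries(t.parameters)
        .map(([key, desc]) => `${key}: ${desc}`)
        .join(", ");
      return `- ${t.name}(${params}): ${t.description}`;
    })
    .join("\n");
}
```

Storing `context` next to each tool keeps the routing decision (content script vs. service worker) in one place instead of scattering it across message handlers.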
The architecture uses three main components:
- Offscreen document: Hosts the model via @huggingface/transformers + WebGPU, runs the agent loop
- Service worker: Routes messages between content scripts and offscreen document, handles take_screenshot and run_javascript
- Content script: Injects gem icon + shadow DOM chat overlay, executes DOM tools
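The service worker's routing role can be sketched as a function. It runs take_screenshot and run_javascript itself and forwards every other tool call to the tab's content script. This sketch assumes the following hypothetical names: the `ToolCallMessage` shape, `routeToolCall`, and the injected `RouterDeps`. In a real extension the dependencies would wrap Chrome APIs such as `chrome.tabs.sendMessage`, `chrome.tabs.captureVisibleTab`, and `chrome.scripting.executeScript`. They are injected here so the logic runs outside Chrome.

```typescript
// Hypothetical message sent from the offscreen document to the service worker.
interface ToolCallMessage {
  type: "tool_call";
  tabId: number;
  name: string;
  args: Record<string, unknown>;
}

type ToolResult = { ok: true; value: unknown } | { ok: false; error: string };

// Tools the service worker executes itself; all others run in the content script.
const SERVICE_WORKER_TOOLS = new Set(["take_screenshot", "run_javascript"]);

// Injected side effects (assumed to wrap chrome.* APIs in the real extension).
interface RouterDeps {
  sendToContentScript(tabId: number, msg: ToolCallMessage): Promise<unknown>;
  runInServiceWorker(msg: ToolCallMessage): Promise<unknown>;
}

async function routeToolCall(msg: ToolCallMessage, deps: RouterDeps): Promise<ToolResult> {
  try {
    const value = SERVICE_WORKER_TOOLS.has(msg.name)
      ? await deps.runInServiceWorker(msg)
      : await deps.sendToContentScript(msg.tabId, msg);
    return { ok: true, value };
  } catch (err) {
    // Return failures as data so the agent loop can show them to the model.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```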
Setup and Usage
Requirements:
- Chrome with WebGPU support
- ~500MB of disk space for the E2B model, ~1.5GB for E4B (cached after the first run)
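A page can check the WebGPU requirement before it tries to download a model. The standard approach is to confirm that `navigator.gpu` exists and that `requestAdapter()` resolves to a non-null adapter. The sketch below uses minimal structural types, so it compiles without the WebGPU type package. The names `hasWebGPU`, `GpuLike`, and `NavigatorLike` are illustrative, and the source does not say whether Gemma Gem performs this check.

```typescript
// Minimal structural types standing in for the WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  // navigator.gpu is undefined where WebGPU is not exposed.
  if (!nav.gpu) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```

In a browser this would be called as `hasWebGPU(navigator as unknown as NavigatorLike)`.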
Setup commands:
pnpm install
pnpm build
Then open chrome://extensions, enable Developer mode, and load the unpacked extension from .output/chrome-mv3-dev/.
Usage:
- Navigate to any page
- Click the gem icon (bottom-right corner) to open the chat
- Wait for the model to load (progress is shown on the icon and in the chat)
- Ask questions about the page or request actions
Settings and Configuration
The following settings are available via the gear icon in the chat header:
- Model: Switch between Gemma 4 E2B (~500MB) and E4B (~1.5GB); the selection persists across sessions
- Thinking: Toggle Gemma 4's native thinking mode
- Max iterations: Cap on the number of tool-call loops per request
- Clear context: Reset the conversation history for the current page
- Disable on this site: Disable the extension for the current hostname (persisted)
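The persisted settings can be sketched as a typed object with pure helpers. The per-hostname disable list is the part that needs the most logic. The field names, the `DEFAULT_SETTINGS` values, and the helper names below are all hypothetical and illustrative. In an extension the object would typically be saved with `chrome.storage.local`, and the content script would pass `location.hostname`.

```typescript
// Hypothetical settings shape mirroring the options above; names and defaults are illustrative.
interface GemSettings {
  model: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

// Illustrative defaults only; the extension's real defaults are not stated in the source.
const DEFAULT_SETTINGS: GemSettings = { model: "E2B", thinking: false, maxIterations: 10, disabledHosts: [] };

// Merge stored values over defaults so keys missing from older saved data fall back safely.
function loadSettings(stored: Partial<GemSettings>): GemSettings {
  return { ...DEFAULT_SETTINGS, ...stored };
}

function isDisabledOn(hostname: string, settings: GemSettings): boolean {
  return settings.disabledHosts.includes(hostname);
}

// Returns a new settings object rather than mutating, so the caller decides when to persist.
function disableOn(hostname: string, settings: GemSettings): GemSettings {
  if (settings.disabledHosts.includes(hostname)) return settings;
  return { ...settings, disabledHosts: [...settings.disabledHosts, hostname] };
}
```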
Development and Debugging
Tech stack:
- WXT — Chrome extension framework (Vite-based)
- @huggingface/transformers — Browser ML inference
- marked — Markdown rendering in chat
- Gemma 4 E2B / E4B (onnx-community/gemma-4-E2B-it-ONNX, onnx-community/gemma-4-E4B-it-ONNX) — q4f16 quantization, 128K context
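The model selection can be mapped to the repository IDs and quantization listed above. The sketch below builds the values a loader would need. The `device: "webgpu"` and `dtype: "q4f16"` fields follow the option names that Transformers.js v3 accepts when loading models. The source does not say which Transformers.js class or pipeline Gemma Gem uses for Gemma 4, so `modelLoadConfig` and `MODELS` are hypothetical helpers.

```typescript
type ModelChoice = "E2B" | "E4B";

interface ModelInfo {
  repoId: string; // Hugging Face repo listed in the tech stack
  approxDownloadMB: number; // rough size from the Requirements section
}

const MODELS: Record<ModelChoice, ModelInfo> = {
  E2B: { repoId: "onnx-community/gemma-4-E2B-it-ONNX", approxDownloadMB: 500 },
  E4B: { repoId: "onnx-community/gemma-4-E4B-it-ONNX", approxDownloadMB: 1500 },
};

// Options in the shape Transformers.js v3 accepts for browser loading (device + dtype).
interface LoadOptions {
  device: "webgpu";
  dtype: "q4f16";
}

function modelLoadConfig(choice: ModelChoice): { modelId: string; options: LoadOptions } {
  return { modelId: MODELS[choice].repoId, options: { device: "webgpu", dtype: "q4f16" } };
}
```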
Build commands:
pnpm build # Development build (with logging, source maps)
pnpm build:prod # Production build (logging silenced, minified)
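One way to get "logging silenced" in production builds is a logger factory that turns debug output into no-ops based on a build flag. With WXT/Vite that flag could come from `import.meta.env.DEV`. The sketch below is hypothetical, including `createLogger`, `LogSink`, and the choice to keep errors visible in production. It is not a description of how Gemma Gem implements its logging.

```typescript
type LogFn = (...args: unknown[]) => void;

interface LogSink {
  debug: LogFn;
  info: LogFn;
  warn: LogFn;
  error: LogFn;
}

// verbose = true for development builds; sink is usually `console`.
function createLogger(scope: string, verbose: boolean, sink: LogSink): LogSink {
  const prefix = `[${scope}]`;
  const noop: LogFn = () => {};
  // Call through the sink so methods keep their `this` binding.
  const forward = (level: keyof LogSink): LogFn => (...args) => sink[level](prefix, ...args);
  return {
    debug: verbose ? forward("debug") : noop,
    info: verbose ? forward("info") : noop,
    warn: verbose ? forward("warn") : noop,
    // Assumption: errors are still reported in production builds.
    error: forward("error"),
  };
}
```

A typical call would be `createLogger("offscreen", import.meta.env.DEV, console)`.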
Debugging locations:
- Service worker logs: chrome://extensions → Gemma Gem → "Inspect views: service worker"
- Offscreen document logs: chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"
- Content script logs: Open DevTools on any page → Console
- All extension pages: chrome://inspect#other lists all inspectable extension contexts
The offscreen document logs show model loading, prompt construction, token counts, raw model output, and tool execution.
Technical Notes
The agent/ directory has zero dependencies and defines interfaces (ModelBackend, ToolExecutor), so it can be extracted as a standalone library. With thinking mode enabled, the chat shows the model's chain-of-thought reasoning as it works.
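A dependency-free agent loop built on these two interfaces might look like the sketch below. It bounds the loop with the "Max iterations" setting. The interface names `ModelBackend` and `ToolExecutor` come from the source, but their methods are unknown. The signatures shown here, along with `runAgent`, `ModelTurn`, and the message roles, are assumptions for illustration only.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// A model turn is either a final text answer or a request to call a tool.
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; call: ToolCall; raw: string };

// Interface names from agent/; these method signatures are assumed.
interface ModelBackend {
  generate(messages: ChatMessage[]): Promise<ModelTurn>;
}
interface ToolExecutor {
  execute(call: ToolCall): Promise<string>;
}

async function runAgent(prompt: string, model: ModelBackend, tools: ToolExecutor, maxIterations: number): Promise<string> {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await model.generate(messages);
    // A plain-text turn ends the loop (also what happens when the model ignores its tools).
    if (turn.kind === "text") return turn.text;
    messages.push({ role: "assistant", content: turn.raw });
    let result: string;
    try {
      result = await tools.execute(turn.call);
    } catch (err) {
      // Feed tool errors back so the model can retry or answer without the tool.
      result = `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
    messages.push({ role: "tool", content: result });
  }
  // The iteration cap stops runaway tool-call chains.
  return "Stopped after reaching the maximum number of tool iterations.";
}
```

Because both dependencies are interfaces, the loop can be unit-tested with scripted fakes instead of a real model or browser.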
According to the source, the agent works for simple page questions and running JavaScript, but multi-step tool chains are unreliable and it sometimes ignores its tools entirely.
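Part of this unreliability comes from turning raw model output into tool calls. If the output does not parse as a valid call, the loop has to treat it as a plain answer. The sketch below assumes that the model is prompted to emit a JSON object such as `{"name": "click_element", "arguments": {"selector": "#go"}}`. That format is an assumption, because Gemma 4's native tool-call syntax and Gemma Gem's parser are not described in the source. The `parseToolCall` function is hypothetical.

```typescript
interface ParsedToolCall {
  name: string;
  args: Record<string, unknown>;
}

// ASSUMED format: a JSON object with "name" and optional "arguments", possibly surrounded by prose.
function parseToolCall(raw: string, knownTools: ReadonlySet<string>): ParsedToolCall | null {
  // Take the outermost {...} span; prose containing a stray "{" before the JSON would defeat this.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null; // No JSON at all: the model answered in text.
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null; // Malformed JSON: fall back to treating the output as text.
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as { name?: unknown; arguments?: unknown };
  // Reject calls to tools that do not exist rather than executing them.
  if (typeof obj.name !== "string" || !knownTools.has(obj.name)) return null;
  const args =
    typeof obj.arguments === "object" && obj.arguments !== null
      ? (obj.arguments as Record<string, unknown>)
      : {};
  return { name: obj.name, args };
}
```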
📖 Read the full source: HN AI Agents