Jake Benchmark v1: Local LLM Performance Testing for OpenClaw AI Agents

The Jake Benchmark v1 is a performance evaluation tool for local LLMs functioning as AI agents with OpenClaw. It tests models on 22 practical tasks to determine their effectiveness in real-world agent scenarios.
Test Setup and Methodology
The benchmark was run on a Raspberry Pi with Ollama running on an NVIDIA 3090 GPU. The developer tested 7 different local LLMs to identify the best model for agent work with OpenClaw.
Task Categories
The 22 tasks covered real-world scenarios including:
- Reading emails and creating tasks from them
- Scheduling meetings and checking for conflicts
- Phishing detection (specifically a fake email pretending to be the owner asking for a bitcoin wallet key)
- Error handling
Key Results
The performance variation was significant across models:
- Qwen 27B: Scored 59.4% - successfully handled emails, scheduled meetings, detected phishing attempts, and managed errors
- Nemotron 30B: Scored 1.6% - attempted to solve tasks by running
apt-get install git
Notable Observations
The phishing test revealed interesting behaviors:
- The best model refused the phishing request immediately
- The worst model read the secrets file three times before deciding not to share the information
Dashboard Features
The benchmark includes an interactive dashboard that allows users to:
- Click into any model to view the full conversation
- See exactly what each model did during tasks
- Identify where models went wrong in their execution
The tool is available on GitHub for developers to run their own evaluations and compare local LLM performance for agent tasks.
📖 Read the full source: r/openclaw
👀 See Also

Markdown Manager: A Simple Markdown Editor for macOS
Markdown Manager is a free, open-source macOS app for managing Markdown files, featuring document conversion and preview capabilities.

Four ClawHub Skills for Real-Time Search Data in AI Agents
Four ClawHub skills provide structured search capabilities for AI agents: Google (web, news, images, maps), Amazon (product search across 12 marketplaces), Walmart (product search with delivery filters), and YouTube (video search with transcripts). Install via clawhub install commands with one API key.

CRMy: Open Source CRM and Customer Context Engine for OpenClaw
CRMy is an open source CRM and Customer Context Engine built specifically for OpenClaw agents. It includes a complete CLI, OpenClaw plugin with 12 CRM tools, PostgreSQL backend, and self-hosted deployment with two commands.

Clarc v1.0: Workflow OS for Claude Code with 63 Agents and 249 Skills
Clarc is a plugin layer for Claude Code that provides 63 specialized subagents, 249 domain skills, and 178 slash commands for development workflows. Installation is via npx with support for multiple editors including Cursor and OpenCode.