Jake Benchmark v1: Local LLM Performance Testing for OpenClaw AI Agents

✍️ OpenClawRadar📅 Published: March 23, 2026🔗 Source

The Jake Benchmark v1 is a performance evaluation tool for local LLMs functioning as AI agents with OpenClaw. It tests models on 22 practical tasks to determine their effectiveness in real-world agent scenarios.

Test Setup and Methodology

The benchmark was run on a Raspberry Pi with Ollama running on an NVIDIA 3090 GPU. The developer tested 7 different local LLMs to identify the best model for agent work with OpenClaw.

Task Categories

The 22 tasks covered real-world scenarios including:

Reading emails and creating tasks from them
Scheduling meetings and checking for conflicts
Phishing detection (specifically a fake email pretending to be the owner asking for a bitcoin wallet key)
Error handling

Key Results

The performance variation was significant across models:

Qwen 27B: Scored 59.4% - successfully handled emails, scheduled meetings, detected phishing attempts, and managed errors
Nemotron 30B: Scored 1.6% - attempted to solve tasks by running apt-get install git

Notable Observations

The phishing test revealed interesting behaviors:

The best model refused the phishing request immediately
The worst model read the secrets file three times before deciding not to share the information

Dashboard Features

The benchmark includes an interactive dashboard that allows users to:

Click into any model to view the full conversation
See exactly what each model did during tasks
Identify where models went wrong in their execution

The tool is available on GitHub for developers to run their own evaluations and compare local LLM performance for agent tasks.

📖 Read the full source: r/openclaw

👀 See Also

Tools

Markdown Manager: A Simple Markdown Editor for macOS

Markdown Manager is a free, open-source macOS app for managing Markdown files, featuring document conversion and preview capabilities.

Feb 14, 2026, 05:45 AM UTC

OpenClawRadar

Tools

Four ClawHub Skills for Real-Time Search Data in AI Agents

Four ClawHub skills provide structured search capabilities for AI agents: Google (web, news, images, maps), Amazon (product search across 12 marketplaces), Walmart (product search with delivery filters), and YouTube (video search with transcripts). Install via clawhub install commands with one API key.

Apr 21, 2026, 06:30 AM UTC

OpenClawRadar

Tools

CRMy: Open Source CRM and Customer Context Engine for OpenClaw

CRMy is an open source CRM and Customer Context Engine built specifically for OpenClaw agents. It includes a complete CLI, OpenClaw plugin with 12 CRM tools, PostgreSQL backend, and self-hosted deployment with two commands.

Mar 20, 2026, 04:45 PM UTC

OpenClawRadar

Tools

Clarc v1.0: Workflow OS for Claude Code with 63 Agents and 249 Skills

Clarc is a plugin layer for Claude Code that provides 63 specialized subagents, 249 domain skills, and 178 slash commands for development workflows. Installation is via npx with support for multiple editors including Cursor and OpenCode.

Mar 22, 2026, 04:45 PM UTC

OpenClawRadar