OpenClaw Benchmark Shows Qwen3.5:27B Outperforms Other Local LLMs for Agent Tasks

Benchmark Setup and Results
A user tested 7 local models on 22 real agent tasks using OpenClaw on a Raspberry Pi 5, paired with an RTX 3090 running Ollama. The tasks covered reading emails, scheduling meetings, creating tasks, detecting phishing, handling errors, and browser automation.
qwen3.5:27b-q4_K_M won by a wide margin at 59.4%. The runner-up, qwen3.5:35b, scored 23.2%, and every other model scored below 5%.

Key Findings
- The quantized 27B model beat the larger 35B version by roughly 2.6x (59.4% vs. 23.2%)
- A 30B model scored dead last at 1.6%
- Medium thinking worked best - too much thinking actually hurt performance
- No model completed any of the browser automation tasks
- The main differentiator between winners and losers was whether the model could find and use command line tools
- Most models couldn't even find basic tools like the email function
This benchmark provides concrete data on how different local LLMs perform as AI agents in practical scenarios. The significant performance gap between the top model and others suggests tool-finding capability is a critical bottleneck for local LLM agents.
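The size of that gap can be checked directly from the two scores quoted in the post. The following is a minimal sketch in Python; the dictionary and function names are illustrative assumptions and are not part of the benchmark or of OpenClaw.

```python
# Benchmark scores (percent) as reported in the post for the top two models.
# Only these two figures are given exactly; the names below are illustrative.
scores = {
    "qwen3.5:27b-q4_K_M": 59.4,
    "qwen3.5:35b": 23.2,
}


def score_ratio(winner: float, runner_up: float) -> float:
    """Return how many times larger the winner's score is."""
    return winner / runner_up


ratio = score_ratio(scores["qwen3.5:27b-q4_K_M"], scores["qwen3.5:35b"])
gap = scores["qwen3.5:27b-q4_K_M"] - scores["qwen3.5:35b"]

# prints: 2.56x ratio, 36.2 point gap
print(f"{ratio:.2f}x ratio, {gap:.1f} point gap")
```

The ratio works out to about 2.56, so the lead is slightly larger than a rounded "2.5x" suggests, and the absolute gap is 36.2 percentage points.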
📖 Read the full source: r/LocalLLaMA