AI Browser Automation Token Costs Vary 2.6x

Benchmark results: Same accuracy, different costs

A benchmark tested 4 CLI browser automation tools using the same model (Claude Sonnet 4.6) on 6 real-world tasks against live websites. All tools scored 100% accuracy across 18 task executions, but token usage varied dramatically:

openbrowser-ai: 36,010 tokens / 84.8s / 15.3 tool calls
browser-use: 77,123 tokens / 106.0s / 20.7 tool calls
playwright-cli (Microsoft): 94,130 tokens / 118.3s / 25.7 tool calls
agent-browser (Vercel): 90,107 tokens / 99.0s / 25.0 tool calls

openbrowser-ai used roughly 2.1x to 2.6x fewer tokens than the other three tools (77,123 / 36,010 ≈ 2.1 at the low end, 94,130 / 36,010 ≈ 2.6 at the high end). The benchmark found that tool call count is the strongest predictor of token cost, because every call forces the LLM to re-process the entire conversation history.
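The ratios can be checked directly from the figures listed above. The short Python sketch below only repeats that arithmetic; the dictionary layout and function name are illustrative choices, not part of the benchmark.

```python
# Token counts copied from the benchmark figures listed above.
tokens = {
    "openbrowser-ai": 36_010,
    "browser-use": 77_123,
    "playwright-cli": 94_130,
    "agent-browser": 90_107,
}


def token_ratios(counts, baseline_name):
    """Return each tool's token count relative to the baseline tool."""
    baseline = counts[baseline_name]
    return {name: round(count / baseline, 2) for name, count in counts.items()}


ratios = token_ratios(tokens, "openbrowser-ai")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
# openbrowser-ai: 1.00x
# browser-use: 2.14x
# playwright-cli: 2.61x
# agent-browser: 2.50x
```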

How the tools differ in implementation

All four tools maintain persistent browser sessions via background daemons, can execute JavaScript server-side and return only the result, try to keep page state compact, and support some form of code execution.

browser-use exposes individual CLI commands: open, click, input, scroll, state, eval. The LLM issues one command per tool call. eval runs JavaScript in the page context. Page state is an enhanced DOM tree with [N] indices, at roughly 880 characters per page. The tool talks to Chrome directly over CDP (Chrome DevTools Protocol) through the project's cdp-use library.

agent-browser follows a similar pattern: open, click, fill, snapshot, eval. It is a native Rust binary that talks CDP directly to Chrome. Page state is an accessibility tree with @eN refs. The -i flag produces compact, interactive-only output at around 590 characters. Commands can be chained with &&, but each one is still a separate daemon request.

playwright-cli offers individual commands plus run-code, which accepts arbitrary Playwright JavaScript with full API access. The LLM can write code like run-code "async page => { await page.goto('url'); await page.click('.btn'); return await page.title(); }" and execute multiple operations in one call. Page state is an accessibility tree saved to .yml files at roughly 1,420 characters, with incremental snapshots that send only diffs after the first read.

openbrowser-ai has no individual commands at all. The only interface is Python code via -c:

openbrowser-ai -c '
await navigate("https://en.wikipedia.org/wiki/Python")
info = await evaluate("document.querySelector(\".infobox\")?.innerText")
print(info)
'

navigate, click, input_text, evaluate, and scroll are async Python functions in a persistent namespace. Page state is a DOM tree with [i_N] indices, at roughly 450 characters. Variables persist across calls, as in a Jupyter notebook.
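To illustrate what a persistent namespace means in practice, here is a minimal sketch in plain Python. It is not openbrowser-ai's implementation: the PersistentNamespace class is hypothetical, and it runs synchronous code only, whereas the real tool accepts top-level await.

```python
import contextlib
import io


class PersistentNamespace:
    """Hypothetical notebook-style session: one dict shared by every snippet."""

    def __init__(self):
        self.namespace = {}

    def run(self, code):
        """Execute a snippet in the shared namespace and return what it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


session = PersistentNamespace()
session.run("title = 'Python'")               # first call defines a variable
output = session.run("print(title.upper())")  # a later call can still read it
print(output, end="")
# PYTHON
```

Because the variable survives between calls, a later snippet can reuse earlier results instead of having them pasted back into the prompt.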

The benchmark observed that the LLM made fewer tool calls with openbrowser-ai (15.3, versus roughly 21 to 26 for the other tools), which the authors attribute to the code-only interface naturally encouraging the model to batch operations.
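Why fewer calls can cut cost disproportionately is easiest to see with a toy model. The sketch below assumes each tool call re-sends the full history and appends a fixed number of tokens to it. The prompt size (500) and per-call size (200) are made-up illustrative values, not benchmark measurements, and the model ignores prompt caching and the fact that batched calls tend to be larger.

```python
def cumulative_input_tokens(num_calls, base_prompt, tokens_per_call):
    """Total input tokens when every call re-reads the whole history so far."""
    total = 0
    history = base_prompt
    for _ in range(num_calls):
        total += history             # the model reads the full history on this call
        history += tokens_per_call   # the call and its result are appended
    return total


# Hypothetical sizes: 500-token base prompt, 200 tokens added per call.
few_calls = cumulative_input_tokens(15, base_prompt=500, tokens_per_call=200)
many_calls = cumulative_input_tokens(25, base_prompt=500, tokens_per_call=200)
print(few_calls)   # 28500
print(many_calls)  # 72500
```

In this toy model the total grows roughly with the square of the call count, so going from 25 calls to 15 cuts input tokens by more than half.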

📖 Read the full source: r/ClaudeAI

