Qwen3.5-27B-FP8 performance benchmarks with OpenClaw agents

By OpenClawRadar | Published: February 28, 2026

Performance benchmarks from community testing

Community testing used a single modified RTX 4090 GPU with 48GB of VRAM. The official Qwen3.5-35B-A3B-FP8 and Qwen3.5-27B-FP8 models were tested at a 256K context length.

Framework recommendations

The tester recommends SGLang, reporting that it is the only framework that fully supports prefix caching, which is essential for Qwen3.5's hybrid attention architecture.

For 100K context: Cold-start prefill takes about 10 seconds
With caching: Prefill drops to 200ms
Result: Very low first-token latency and extremely fast output
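
The reported drop from about 10 seconds to 200ms is roughly a 50x reduction in prefill time. The sketch below is a toy illustration of why a prefix cache gives that kind of saving: only tokens that are not already covered by a cached prefix need to be prefilled. It is not SGLang's implementation; the class name ToyPrefixCache and the token values are hypothetical, and a real server caches attention state rather than bare token IDs.

```python
from typing import Dict, List


class ToyPrefixCache:
    """Toy trie over token IDs; illustrative only, not SGLang's cache."""

    def __init__(self) -> None:
        self._root: Dict[int, dict] = {}

    def prefill(self, tokens: List[int]) -> int:
        """Return how many tokens still need prefill, then cache the sequence."""
        node = self._root
        hit = 0
        # Walk the trie along the longest prefix that is already cached.
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            hit += 1
        # Store the uncached tail so later requests can reuse it.
        for tok in tokens[hit:]:
            node = node.setdefault(tok, {})
        return len(tokens) - hit


cache = ToyPrefixCache()
history = list(range(1000))  # stand-in for a long shared conversation
cold = cache.prefill(history)  # nothing cached yet
warm = cache.prefill(history + [5000, 5001, 5002])  # only the new turn is uncached
print(cold, warm)  # prints "1000 3"
```

In an agent workload, most requests repeat the same long history and append a short new turn, which is the case where the warm path dominates.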

Model performance metrics

Qwen3.5-35B-A3B-FP8: Started at 120 tokens/second, decayed to 80 tokens/second
Qwen3.5-27B-FP8: Started at 20 tokens/second, slightly decayed to 18 tokens/second
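
To put the two decay figures on a common scale, the following sketch converts the reported start and end speeds into a percentage slowdown and a per-token latency. The numbers are the ones quoted above; the helper name decay_percent is hypothetical, and the post does not say over how many tokens the decay was measured.

```python
def decay_percent(start_tps: float, end_tps: float) -> float:
    """Relative slowdown from start to end speed, in percent."""
    return (start_tps - end_tps) / start_tps * 100.0


# Start and end speeds (tokens/second) as reported in the community test.
reported = {
    "Qwen3.5-35B-A3B-FP8": (120.0, 80.0),
    "Qwen3.5-27B-FP8": (20.0, 18.0),
}

for model, (start, end) in reported.items():
    slowdown = decay_percent(start, end)
    ms_per_token = 1000.0 / end  # latency per token at the decayed speed
    print(f"{model}: {slowdown:.1f}% slower, {ms_per_token:.1f} ms per token")
```

By this measure the 35B-A3B model loses about a third of its starting speed while the 27B model loses about a tenth, although the 35B-A3B model remains several times faster in absolute terms.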

OpenClaw agent scaling

OpenClaw can run a team of six agents simultaneously, with speed scaling up to 120 tokens/second. The tester was surprised by this scaling behavior.

The drawback the tester mentioned is that single-thread performance is slow with this configuration.
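
One way to read these two statements together is sketched below, assuming that 120 tokens/second is the combined throughput across all six agents (the post does not state this explicitly, nor which model was used). Under that assumption each agent sees about 20 tokens/second, which would explain why individual requests still feel slow. The reply length is a hypothetical value chosen for illustration.

```python
def per_agent_tps(aggregate_tps: float, agents: int) -> float:
    """Even split of combined throughput across concurrent agents."""
    if agents <= 0:
        raise ValueError("agents must be positive")
    return aggregate_tps / agents


def seconds_for_reply(tokens: int, tps: float) -> float:
    """Time to generate `tokens` at a steady `tps` tokens/second."""
    return tokens / tps


AGENTS = 6              # from the post
AGGREGATE_TPS = 120.0   # from the post; assumed to be the combined figure
REPLY_TOKENS = 1000     # hypothetical reply length

each = per_agent_tps(AGGREGATE_TPS, AGENTS)
wait = seconds_for_reply(REPLY_TOKENS, each)
print(f"{each:.0f} tokens/second per agent, {wait:.0f} s per {REPLY_TOKENS}-token reply")
```

Under this reading, batching raises total output without making any single agent faster.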

MTP optimization notes

Enabling MTP (Multi-Token Prediction) for the 27B-FP8 model can significantly boost single-request generation speeds:

On a single NVIDIA H100: Maintains 100 tokens/second with 20K context window
Prefill speed for 64K tokens: Under 1 second

Important caveat: MTP conflicts with prefix caching and is highly VRAM-intensive. Users on an RTX 4090 should start with a lower num-steps setting.
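
The following sketch gives a rough sense of why a lower num-steps value can be a reasonable starting point. It uses a simplified model borrowed from speculative decoding analysis: the model drafts several tokens ahead, each drafted token is accepted independently with the same probability, and drafting stops at the first rejection. The acceptance probability of 0.8 is a hypothetical value, the mapping of num-steps to the number of drafted tokens is an assumption, and VRAM cost is not modeled.

```python
def expected_tokens_per_pass(accept_prob: float, num_steps: int) -> float:
    """Expected tokens emitted per verification pass.

    Simplified model: 1 + p + p^2 + ... + p^k for k drafted tokens,
    each accepted independently with probability p.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return sum(accept_prob ** i for i in range(num_steps + 1))


ACCEPT_PROB = 0.8  # hypothetical acceptance rate, not a measured value

for steps in (1, 2, 3, 4):
    gain = expected_tokens_per_pass(ACCEPT_PROB, steps)
    print(f"num_steps={steps}: about {gain:.2f} tokens per pass")
```

In this model each additional step adds less than the one before it, so the first few steps capture most of the benefit while later ones mainly add cost.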

📖 Read the full source: r/openclaw
