Qwen3.5-27B-FP8 performance benchmarks with OpenClaw agents

Performance benchmarks from community testing
A community tester ran the official Qwen3.5-35B-A3B-FP8 and Qwen3.5-27B-FP8 models on a single modified RTX 4090 with 48 GB of VRAM, using a 256K context length.
Framework recommendations
The tester recommends SGLang, describing it as the only framework that fully supports prefix caching, which is essential for Qwen3.5's hybrid attention architecture. Reported prefill results:
- 100K context, cold start: prefill takes about 10 seconds
- 100K context, with caching: prefill drops to about 200 ms
- Result: very low first-token latency and extremely fast output
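To put the prefill figures in perspective, the short sketch below converts them into an effective prefill rate and a cache speedup. It uses only the approximate numbers reported above. The constant and function names are illustrative and do not come from SGLang.

```python
# Approximate figures as reported by the tester; not independently measured.
CONTEXT_TOKENS = 100_000
COLD_PREFILL_S = 10.0    # cold-start prefill time
CACHED_PREFILL_S = 0.2   # prefill time on a prefix-cache hit


def prefill_rate(tokens: int, seconds: float) -> float:
    """Effective prompt tokens processed per second."""
    return tokens / seconds


cold_rate = prefill_rate(CONTEXT_TOKENS, COLD_PREFILL_S)
speedup = COLD_PREFILL_S / CACHED_PREFILL_S

print(f"cold prefill: ~{cold_rate:,.0f} tokens/s")
print(f"prefix-cache hit: ~{speedup:.0f}x less prefill time")
```

The cached figure reflects work skipped rather than faster compute, which is why agent loops that resend a long shared prefix benefit so much from caching.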
Model performance metrics
- Qwen3.5-35B-A3B-FP8: generation started at 120 tokens/second and decayed to 80 tokens/second
- Qwen3.5-27B-FP8: generation started at 20 tokens/second and decayed slightly to 18 tokens/second
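The relative slowdown is easier to compare than the raw numbers. A minimal sketch using only the reported start and end speeds (the helper name `relative_drop` is illustrative):

```python
def relative_drop(start_tps: float, end_tps: float) -> float:
    """Fraction of generation speed lost between the start and end figures."""
    return (start_tps - end_tps) / start_tps


# (start, end) tokens/second as reported by the tester.
reported = {
    "Qwen3.5-35B-A3B-FP8": (120.0, 80.0),
    "Qwen3.5-27B-FP8": (20.0, 18.0),
}

for model, (start, end) in reported.items():
    print(f"{model}: {relative_drop(start, end):.0%} drop")
```

By this measure the 35B-A3B model loses about a third of its speed while the 27B model loses about a tenth, although the 35B-A3B model remains several times faster in absolute terms.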
OpenClaw agent scaling
OpenClaw can run a team of six agents simultaneously, and total speed scales up to 120 tokens/second. The tester was surprised by how well this scaled.
The drawback is that single-request performance is slow with this configuration.
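One reading of these numbers, which the source does not spell out, is that the 120 tokens/second figure is the combined throughput of six concurrent requests, each running at roughly the single-request speed reported for the 27B-FP8 model. The sketch below checks that this reading is arithmetically consistent. Treat it as an assumption, not as the tester's stated method.

```python
AGENTS = 6
# Reported single-request range for Qwen3.5-27B-FP8 (tokens/second).
PER_REQUEST_TPS = (18.0, 20.0)

# Assumption: batching keeps each request near its single-request speed,
# so combined throughput is roughly agents x per-request speed.
low, high = (AGENTS * tps for tps in PER_REQUEST_TPS)

print(f"estimated combined throughput: {low:.0f}-{high:.0f} tokens/s")
```

Under that assumption the estimate brackets the reported 120 tokens/second, which would also explain why an individual agent still feels slow.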
MTP optimization notes
Enabling MTP (Multi-Token Prediction) for the 27B-FP8 model can significantly boost single-request generation speed:
- On a single NVIDIA H100: sustains 100 tokens/second with a 20K context window
- Prefill time for 64K tokens: under 1 second
Important caveat: MTP conflicts with prefix caching and is highly VRAM-intensive. RTX 4090 users should start with a lower num-steps setting.
📖 Read the full source: r/openclaw