Qwen3.5-27B-FP8 performance benchmarks with OpenClaw agents

✍️ OpenClawRadar | 📅 Published: February 28, 2026

Performance benchmarks from community testing

Community testing used a single modified RTX 4090 with 48 GB of VRAM, double the stock card's 24 GB. The official Qwen3.5-35B-A3B-FP8 and Qwen3.5-27B-FP8 models were tested with a 256K-token context length.

Framework recommendations

The tester recommends SGLang as the only framework that fully supports prefix caching, which is essential for Qwen3.5's hybrid attention architecture.

  • For a 100K-token context: cold-start prefill takes about 10 seconds
  • With caching: prefill drops to about 200 ms
  • Result: Very low first-token latency and extremely fast output
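
To make the caching numbers concrete, here is a toy Python sketch. It is not SGLang's implementation. The `ToyPrefixCache` class is hypothetical and only counts how many tokens of a new prompt are not covered by a previously seen prefix, which is the part that still needs a forward pass. It ignores block granularity and whatever per-prefix state a hybrid-attention model actually has to store. The arithmetic at the end restates the reported figures: 10 s versus 200 ms is a 50x reduction, and a 10 s cold prefill of 100K tokens works out to roughly 10,000 tokens/second.

```python
from dataclasses import dataclass, field

# Figures reported in the community test for a 100K-token context.
CONTEXT_TOKENS = 100_000
COLD_PREFILL_S = 10.0
CACHED_PREFILL_S = 0.2


@dataclass
class ToyPrefixCache:
    """Hypothetical prefix cache that only remembers previously seen token sequences."""

    seen: list[tuple[int, ...]] = field(default_factory=list)

    def _longest_cached_prefix(self, tokens: list[int]) -> int:
        # Length of the longest shared prefix with any earlier prompt.
        best = 0
        for prev in self.seen:
            n = 0
            for a, b in zip(prev, tokens):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        return best

    def tokens_to_prefill(self, tokens: list[int]) -> int:
        """Return how many tokens still need a forward pass, then remember this prompt."""
        hit = self._longest_cached_prefix(tokens)
        self.seen.append(tuple(tokens))
        return len(tokens) - hit


cache = ToyPrefixCache()
history = list(range(1_000))                        # stand-in token IDs for an agent's context
print(cache.tokens_to_prefill(history))             # first turn: all 1000 tokens are uncached
print(cache.tokens_to_prefill(history + [7] * 50))  # next turn: only the 50 appended tokens

print(COLD_PREFILL_S / CACHED_PREFILL_S)            # speedup from caching
print(CONTEXT_TOKENS / COLD_PREFILL_S)              # cold prefill throughput, tokens/second
```

In an agent loop the conversation history is resent on every turn, so almost the whole prompt is a repeat of the previous one. That is why a cache hit can turn a multi-second prefill into a sub-second one.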

Model performance metrics

  • Qwen3.5-35B-A3B-FP8: Started at 120 tokens/second, decayed to 80 tokens/second
  • Qwen3.5-27B-FP8: Started at 20 tokens/second, decayed slightly to 18 tokens/second
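
A rough back-of-the-envelope check helps explain why the MoE model decodes so much faster. The sketch below makes several assumptions. It treats decoding as memory-bandwidth bound. It assumes FP8 weights take about one byte per parameter and that the "A3B" suffix means roughly 3B parameters are active per token. It uses the stock RTX 4090's 1,008 GB/s bandwidth and assumes the 48 GB modification does not change it. Under those assumptions the ceilings come out near 336 tokens/second for the MoE model and about 37 tokens/second for the dense 27B, both above the observed speeds. The same script computes the reported decay: about 33% for the 35B-A3B and 10% for the 27B. Growing KV-cache reads at long context are one plausible cause of the decay, but the source does not say.

```python
# Reported decode speeds in tokens/second: (at start, after decay).
REPORTED = {
    "Qwen3.5-35B-A3B-FP8": (120, 80),
    "Qwen3.5-27B-FP8": (20, 18),
}

# Assumptions, not measurements: ~3B active params for the MoE model, 27B for the dense one.
ACTIVE_PARAMS_BILLION = {
    "Qwen3.5-35B-A3B-FP8": 3,
    "Qwen3.5-27B-FP8": 27,
}
BANDWIDTH_GB_S = 1008    # stock RTX 4090 memory bandwidth; assumed unchanged by the 48 GB mod
BYTES_PER_PARAM_FP8 = 1  # FP8 stores one byte per weight


def decay_pct(start: float, end: float) -> float:
    """Percentage drop from the starting decode speed."""
    return 100 * (start - end) / start


def bandwidth_ceiling_tok_s(active_params_billion: float) -> float:
    """Upper bound if every token streams all active weights from VRAM exactly once.

    Ignores KV-cache reads, activations, and kernel overheads, so real speeds are lower.
    """
    gb_per_token = active_params_billion * BYTES_PER_PARAM_FP8
    return BANDWIDTH_GB_S / gb_per_token


for name, (start, end) in REPORTED.items():
    ceiling = bandwidth_ceiling_tok_s(ACTIVE_PARAMS_BILLION[name])
    print(f"{name}: decay {decay_pct(start, end):.0f}%, "
          f"ceiling ~{ceiling:.0f} tok/s, observed start {start} tok/s")
```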

OpenClaw agent scaling

OpenClaw can run a team of six agents simultaneously, and combined throughput scales up to 120 tokens/second. The tester was surprised by this scaling behavior.

The stated drawback is that single-request (single-thread) performance remains slow in this configuration.
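
The scaling result is consistent with batching. With six requests in flight, each decode step reads the weights once and produces a token for every agent, so aggregate throughput rises while each agent's own speed stays near the single-request rate. The helper functions below are a hypothetical illustration that splits an aggregate rate evenly across agents. The 2,000-token task size is made up. Pairing the 120 tokens/second aggregate with a 20 tokens/second single-stream rate assumes both figures refer to the 27B model, which the source does not state.

```python
def per_agent_tok_s(aggregate_tok_s: float, agents: int) -> float:
    """Even split of aggregate decode throughput across concurrent agents."""
    if agents < 1:
        raise ValueError("agents must be >= 1")
    return aggregate_tok_s / agents


def concurrent_wall_clock_s(tokens_per_agent: int, aggregate_tok_s: float, agents: int) -> float:
    """Time for all agents to finish when they decode together in one batch."""
    return tokens_per_agent / per_agent_tok_s(aggregate_tok_s, agents)


def sequential_wall_clock_s(tokens_per_agent: int, single_tok_s: float, agents: int) -> float:
    """Time if the same agents ran one after another at the single-request rate."""
    return agents * tokens_per_agent / single_tok_s


AGENTS = 6
AGGREGATE_TOK_S = 120.0  # reported with six OpenClaw agents
SINGLE_TOK_S = 20.0      # assumption: the 27B single-request figure applies here
TASK_TOKENS = 2_000      # hypothetical output length per agent

print(per_agent_tok_s(AGGREGATE_TOK_S, AGENTS))
print(concurrent_wall_clock_s(TASK_TOKENS, AGGREGATE_TOK_S, AGENTS))
print(sequential_wall_clock_s(TASK_TOKENS, SINGLE_TOK_S, AGENTS))
```

Under these assumptions the batch finishes in 100 seconds instead of 600. Any single agent still streams at about 20 tokens/second, which matches the complaint about slow single-thread performance.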

MTP optimization notes

Enabling MTP (Multi-Token Prediction) for the 27B-FP8 model can significantly boost single-request generation speeds:

  • On a single NVIDIA H100: maintains 100 tokens/second with a 20K-token context
  • Prefill for 64K tokens: under 1 second
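
In serving engines, MTP is typically used as a form of speculative decoding. Lightweight prediction heads draft several tokens ahead, and the full model verifies them in one forward pass, so each pass can emit more than one token. A common simplified model from the speculative-decoding literature assumes each drafted token is accepted independently with probability α. With k draft steps, the expected number of tokens per verification pass is then (1 - α^(k+1)) / (1 - α). The sketch below evaluates that formula. The acceptance rate α = 0.7 is hypothetical, not a measured Qwen3.5 value, and the model ignores the extra compute and memory each draft step costs. Separately, the prefill figure of 64K tokens in under a second implies a prefill rate above 64,000 tokens/second.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per full-model forward pass with k drafted tokens.

    Simplifying assumption: each drafted token is accepted independently with
    probability alpha, and acceptance stops at the first rejection. The
    verification pass always contributes one token of its own.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be >= 0")
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


# Hypothetical acceptance rate; real values depend on the model and the text being generated.
ALPHA = 0.7
for k in range(1, 5):
    print(f"num_steps={k}: ~{expected_tokens_per_pass(ALPHA, k):.2f} tokens per pass")
```

Each extra step adds less than the one before: roughly 1.70, 2.19, 2.53, and 2.77 tokens per pass at α = 0.7. Meanwhile, draft memory and compute keep growing. This diminishing return is one reason a low step count can be a sensible starting point on a memory-constrained card.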

Important caveat: MTP conflicts with prefix caching and is highly VRAM-intensive. Users on an RTX 4090 should start with a lower num-steps setting.
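
For readers trying this with SGLang, the helper below assembles a launch command that enables MTP-style speculative decoding with a configurable step count. It is a sketch built on several assumptions. The flag names (`--speculative-algorithm NEXTN`, `--speculative-num-steps`, `--speculative-eagle-topk`, `--speculative-num-draft-tokens`) reflect SGLang's server arguments as used for other MTP-capable models, but they can change between releases, so check `python -m sglang.launch_server --help`. The model path `Qwen/Qwen3.5-27B-FP8` and the 32K context are hypothetical placeholders. The function only builds and prints the command and does not start a server. It leaves prefix-cache settings at their defaults, so given the conflict reported above, it is worth checking how your SGLang version handles the two together.

```python
import shlex


def sglang_mtp_command(model_path: str, num_steps: int, context_length: int) -> list[str]:
    """Build (but do not run) an SGLang launch command with MTP speculative decoding.

    Flag names are assumed from SGLang's server arguments; verify them against
    `python -m sglang.launch_server --help` for the installed version.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--context-length", str(context_length),
        "--speculative-algorithm", "NEXTN",  # use the model's MTP head as the drafter
        "--speculative-num-steps", str(num_steps),
        "--speculative-eagle-topk", "1",     # a single draft chain, no branching
        # Assumed pairing for a single chain: num_steps + 1 draft tokens.
        "--speculative-num-draft-tokens", str(num_steps + 1),
    ]


# Hypothetical model path and context length; start with one step on a 48 GB RTX 4090.
cmd = sglang_mtp_command("Qwen/Qwen3.5-27B-FP8", num_steps=1, context_length=32_768)
print(shlex.join(cmd))
```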

📖 Read the full source: r/openclaw
