SubQ: A Sub-Quadratic LLM with 12M-Token Context Window

✍️ OpenClawRadar📅 Published: May 6, 2026🔗 Source
SubQ: A Sub-Quadratic LLM with 12M-Token Context Window
Ad

SubQ from Subquadratic is a production-ready LLM built on a fully sub-quadratic sparse-attention architecture. It handles up to 12M tokens in a single prompt, runs at 150 tokens per second, and costs roughly 1/5 of leading models like GPT-5 or Opus.

Architecture & Benchmarks

Unlike standard transformers with O(n²) attention, SubQ uses a sub-quadratic sparse-attention mechanism that only processes relevant token relationships. At 12M tokens, this reduces attention compute by nearly 1000×. Benchmarks (third-party validated):

  • SWE-Bench Verified (real-world coding): 81.8%
  • RULER @ 128K (long-context accuracy): 95.0%
  • MRCR v2 (8-needle, 1M): 65.9%

For comparison, SubQ's SWE-Bench score sits between Gemini 3.1 Pro (80.6%) and Opus 4.6 (80.8%). The model also outperforms Opus 4.7 (87.6%? – not reported at time) and GPT-5.5 (n/r) on MRCR v2.

Ad

Products & Integration

Two access options:

  • Full-Context API: 12M-token context, streaming, tool use, OpenAI-compatible endpoints. Process entire repositories in one call at linear cost.
  • SubQ Code (long-context layer for coding agents): Plug into Claude Code, Codex, or Cursor. ~25% lower bill, 10× faster exploration, auto-redirects expensive model turns. One-line install.

Who It's For

Developers and teams running AI agents that need to reason across full codebases, long PR histories, or persistent state without quality loss.

📖 Read the full source: HN AI Agents

Ad

👀 See Also