SubQ: First Fully Subquadratic LLM with 12M-Token Context and 95% RULER Accuracy

Subquadratic has released SubQ 1M-Preview, described as the first fully subquadratic large language model: compute scales linearly with context length rather than quadratically, as it does in standard transformers. Per the source, this removes the need for RAG systems and chunking workarounds in long-context tasks. The research model supports up to 12 million tokens, and a 1M-token production model is available in early access.
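To make the scaling difference concrete, here is a minimal Python sketch of a toy cost model. It counts token-to-token interactions only and is not SubQ's actual cost formula. The function names and the per-token budget `BUDGET` are hypothetical; the budget is picked purely so that the toy ratio at 12M tokens lands near the ~1,000x figure quoted in this article.

```python
def quadratic_attention_cost(n_tokens: int) -> int:
    """Pairwise token interactions in full self-attention: n * n."""
    return n_tokens * n_tokens


def linear_attention_cost(n_tokens: int, per_token_budget: int) -> int:
    """Interactions when each token touches a fixed budget of others: n * k."""
    return n_tokens * per_token_budget


# Hypothetical per-token budget, chosen only for illustration.
# It is not a disclosed SubQ parameter.
BUDGET = 12_000

for n in (128_000, 1_000_000, 12_000_000):
    quad = quadratic_attention_cost(n)
    lin = linear_attention_cost(n, BUDGET)
    print(f"{n:>12,} tokens: quadratic/linear = {quad / lin:,.1f}x")
```

In this toy model the gap between the two approaches grows in proportion to context length, which is why the advantage of a linear-cost design is small at short contexts and large at multi-million-token contexts.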
Key Features
- Subquadratic attention: Reduces attention compute by ~1,000x compared to frontier transformer models at 12M-token context, per the source.
- SubQ Code: CLI-based coding agent that loads an entire codebase into a single context window. It plans, executes, and reviews across a full repository in one pass, with no multi-agent orchestration needed.
- SubQ Search: Long-context search tool offering Deep Research capabilities at chatbot speed.
- API: Full-context API for developers and enterprise teams.
Benchmarks
All results were verified by a third party (source does not specify the firm):
- RULER 128K: 95% accuracy — compared to Claude Opus 4.6 at 94.8%.
- MRCR v2 (Multi-Round Coreference Resolution, a long-context retrieval and reasoning benchmark): Production model scores 65.9; research model scores 83. Reference: Claude Opus 4.7 = 32.2, GPT 5.5 = 74, Gemini 3.1 Pro = 26.3.
- SWE-Bench Verified: 81.8%, compared to Opus 4.6 (80.8%) and DeepSeek 4.0 Pro (80.0%).
- Attention speed: SubQ Sparse Attention is 52× faster than FlashAttention in an architecture-level comparison, using 63% less compute.
Architecture Details
The model uses an attention mechanism redesigned from first principles to be subquadratic. It draws on ideas from linear attention, state space models, and sparse attention but, unlike prior attempts, is reported to maintain frontier-level accuracy. The team includes PhDs from Meta, Google, Oxford, BYU, ByteDance, Adobe, and Cambridge.
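The source does not disclose SubQ's mechanism, so the following is only a generic illustration of one of the ideas named above: kernelized causal linear attention, written in plain Python. The feature map (elu(x) + 1) and all function names are assumptions taken from the general linear-attention literature, not from SubQ. The sketch keeps a fixed-size running state, so the work per token does not grow with sequence length; a pairwise reference version is included to show that both compute the same result.

```python
import math


def feature_map(x):
    """elu(x) + 1, a positive feature map often used in linear attention."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Causal linear attention with a running state: O(n) in sequence length."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d_k):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(phi_q[i] * norm[i] for i in range(d_k))
        outputs.append([
            sum(phi_q[i] * state[i][j] for i in range(d_k)) / denom
            for j in range(d_v)
        ])
    return outputs


def causal_quadratic_reference(queries, keys, values):
    """The same output computed pairwise, O(n^2), for comparison only."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(phi_q, feature_map(keys[s])))
            for s in range(t + 1)
        ]
        total = sum(weights)
        outputs.append([
            sum(w * values[s][j] for s, w in enumerate(weights)) / total
            for j in range(d_v)
        ])
    return outputs
```

The design point is that the running `state` and `norm` have a size that depends only on the head dimensions, not on how many tokens have been seen, which is what makes the per-token cost constant. Whether SubQ uses anything resembling this is not stated in the source.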
Availability
Private beta starts today (May 5, 2026) and includes access to the API, the SubQ Code CLI, and SubQ Search. The SWE-Bench score suggests strong coding performance, which is relevant to OpenClawRadar readers who build or use AI coding agents.
📖 Read the full source: HN AI Agents