SubQ: A Sub-Quadratic LLM with 12M-Token Context Window

SubQ from Subquadratic is a production-ready LLM built on a fully sub-quadratic sparse-attention architecture. It handles up to 12M tokens in a single prompt, runs at 150 tokens per second, and costs roughly 1/5 of leading models like GPT-5 or Opus.
Architecture & Benchmarks
Unlike standard transformers with O(n²) attention, SubQ uses a sub-quadratic sparse-attention mechanism that only processes relevant token relationships. At 12M tokens, this reduces attention compute by nearly 1000×. Benchmarks (third-party validated):
- SWE-Bench Verified (real-world coding): 81.8%
- RULER @ 128K (long-context accuracy): 95.0%
- MRCR v2 (8-needle, 1M): 65.9%
For comparison, SubQ's SWE-Bench score sits between Gemini 3.1 Pro (80.6%) and Opus 4.6 (80.8%). The model also outperforms Opus 4.7 (87.6%? – not reported at time) and GPT-5.5 (n/r) on MRCR v2.
Products & Integration
Two access options:
- Full-Context API: 12M-token context, streaming, tool use, OpenAI-compatible endpoints. Process entire repositories in one call at linear cost.
- SubQ Code (long-context layer for coding agents): Plug into Claude Code, Codex, or Cursor. ~25% lower bill, 10× faster exploration, auto-redirects expensive model turns. One-line install.
Who It's For
Developers and teams running AI agents that need to reason across full codebases, long PR histories, or persistent state without quality loss.
📖 Read the full source: HN AI Agents
👀 See Also

Developer shares hybrid AI coding workflow: Claude for planning, local models for execution
A developer built a pipeline using Claude 3.5 Sonnet for task planning and local Qwen2.5-Coder models via Ollama for code generation, achieving 85% token reduction compared to using Claude alone.

TradingView MCP Server Enables Claude to Backtest Trading Strategies
A developer has released an MCP server that allows Claude to backtest six trading strategies using Yahoo Finance data without API keys. Setup involves adding one line to the claude_desktop_config.json file.

Feynman: Open Source Research Agent with Paper-Codebase Audit Tool
Feynman is an open source research agent CLI that dispatches four subagents in parallel to answer research questions and includes a unique audit tool that compares paper claims against actual codebases. It features one-command installation, MIT license, and runs on pi for agent runtime with alphaxiv for paper search.

Reflect MCP Server Implements Reflexion Paper for Persistent Coding Agent Memory
A developer implemented the Reflexion paper (Shinn et al., NeurIPS 2023) as an MCP server to give local coding agents persistent memory of their mistakes. The system uses regex-based pattern matching on error messages and stores lessons in SQLite with FTS5.