The Research-vs-Production Gap That Defines This Launch
The single most important detail buried in SubQ's launch materials is that the 12-million-token context window is described as a research result, while the model actually exposed to early-access users is labeled SubQ 1M-Preview. The headline number, "12M tokens," anchors every press story, every investor pitch, and every comparison to Claude Opus, but the benchmarks that have been published largely cap at one million tokens or below. RULER 128K is a 128K-token benchmark. SWE-Bench Verified is a coding benchmark, not a context-length benchmark. MRCR v2 is reported at 1M tokens with 8 needles. There is no published 12M-token benchmark, despite 12M being the entire architectural pitch.
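To make concrete how low the bar for such a benchmark is, here is a minimal sketch of a 12M-token needle-in-a-haystack probe. Everything model-specific is an assumption: the ~4-characters-per-token heuristic is a rough rule of thumb, and `query_model` is a hypothetical entry point, since SubQ has described no public API to call.

```python
import random

# Minimal needle-in-a-haystack probe at 12M-token scale. The ~4-chars-per-
# token heuristic is a rough assumption, and query_model is a hypothetical
# entry point: SubQ exposes no documented public API to call here.

def build_haystack(n_tokens: int, needle: str, seed: int = 0) -> str:
    """Return filler text with one needle sentence buried at a random offset."""
    random.seed(seed)
    filler = "The sky was clear and nothing of note happened. "
    n_chunks = (n_tokens * 4) // len(filler)  # ~4 characters per token
    chunks = [filler] * n_chunks
    chunks.insert(random.randrange(n_chunks), needle)
    return "".join(chunks)

needle = "The access code for vault 7 is 4819-BLUE. "
question = "\nWhat is the access code for vault 7?"
prompt = build_haystack(12_000_000, needle) + question
# answer = query_model(prompt)  # hypothetical call; score pass/fail on "4819-BLUE"
```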
This matters because the value proposition of sub-quadratic attention is that it should get better at long context, not just survive it. If SSA is genuinely O(n), then a 12M-token retrieval benchmark should be the easiest, most decisive test the company could run. Its absence is what drove the dominant skeptic breakdown on developer YouTube, what fueled the LocalLLaMA dissection of MRCR v2 (where SubQ's production score of 65.9 sits below Opus 4.6's 78.3 and GPT-5.5's 74), and what frames the entire community reaction. The launch isn't being judged on whether SSA could work; it's being judged on whether the demonstrated artifact matches the marketed artifact, and on that question the gap is conspicuous.
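As a back-of-the-envelope check on that "easiest test" claim, the arithmetic below compares standard quadratic attention cost against a generic O(n) scheme. The hidden size d = 8192 and the linear-attention cost model are assumptions for illustration, not SubQ's disclosed architecture, which the launch materials do not specify.

```python
# Back-of-the-envelope attention FLOPs at various context lengths. The
# hidden size d = 8192 is a hypothetical stand-in; SubQ's launch materials
# do not disclose the model's dimensions or the exact SSA cost model.

def quadratic_attention_flops(n: int, d: int) -> int:
    """Standard attention: QK^T scores plus attention-weighted V, ~2 * n^2 * d."""
    return 2 * n**2 * d

def linear_attention_flops(n: int, d: int) -> int:
    """A generic O(n) scheme (kernelized / state-space style), ~2 * n * d^2."""
    return 2 * n * d**2

for n in (128_000, 1_000_000, 12_000_000):
    q = quadratic_attention_flops(n, 8192)
    lin = linear_attention_flops(n, 8192)
    print(f"{n:>12,} tokens: quadratic {q:.2e}, linear {lin:.2e}, ratio {q / lin:,.0f}x")
```

Under these assumptions, moving from 1M to 12M tokens multiplies the quadratic cost by 144 but the linear cost by only 12, which is exactly why a missing 12M-token benchmark is so conspicuous if the O(n) claim holds.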


