Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
- OpenAI and Anthropic are racing toward IPOs at near-trillion-dollar valuations the same week a benchmark built with 250+ experts showed agents passing just 2.6% of its hardest real-world economic tasks.
- Compute is fanning out on several fronts at once: SpaceX's orbital AI1 satellite, NVIDIA's South Korea buildout, and Google's 3M-TPU order to Intel all loosen the single-datacenter chokehold.
- Builders are now funding verification, not raw capability, as the real agent bottleneck: developer threads about unverifiable agent code and SF's run of agent-security and formal-methods meetups point the same direction.
Bold Shots
Today's biggest AI stories, no chaser
OpenAI submitted a confidential draft S-1 to the SEC on Monday, June 8, its first formal step toward a public offering. The filing lands one week after Anthropic's own confidential filing at a roughly $965B valuation, higher than OpenAI's last private mark of about $852B. Once the S-1 is made public, OpenAI will have to disclose financials for a business that has so far raised money on narrative alone, and under FASB ASC Topic 730 frontier-training costs must be expensed as incurred, reframing the spend from "building infrastructure" to "burning cash now." The numbers: ~$9B net loss on $13.1B revenue (2025), a projected ~$14B loss in 2026, no profitability forecast before ~2030, and a reported ~$207B funding gap by 2030. Filing second, and at a lower valuation than Anthropic, inverts the pecking order and turns a research rivalry into a Wall Street footrace.
Why it matters: Public paperwork forces the first real look at OpenAI's economics, and it drops OpenAI into a direct, lower-valuation footrace with Anthropic for the same investor dollars.
Breaking: OpenAI filed for an IPO, setting it up to potentially go public as soon as this fall. Linked story: "Exclusive | OpenAI Kicks Off IPO Process in Test of Investor Appetite for Top AI Labs"
BREAKING: WSJ reports OpenAI just made its first formal move toward IPO. It has confidentially filed draft paperwork for an IPO. A confidential S-1 lets OpenAI start SEC review without immediately exposing financials.
During Jensen Huang's Seoul visit, NVIDIA announced AI-infrastructure and physical-AI partnerships spanning NAVER, SK Group, LG, Doosan, and Hyundai, plus a national GPU-operator selection. NAVER is building a full-stack AI factory on DSX, expanding GAK Sejong to 55MW on the way to gigawatt scale; SK signed a multi-year next-gen-memory deal across four NVIDIA platforms; and LG is building humanoid robots on Isaac GR00T. Together the deals form a coordinated national division of labor, running from memory (SK) to compute and models (NAVER) to robots and power (LG, Doosan, Hyundai). Because SK hynix and Samsung dominate HBM, the arrangement is mutual lock-in: Korea needs the GPUs, and NVIDIA needs the HBM. Energy and thermal capacity, not GPU counts, emerge as the real bottleneck.
Why it matters: It turns a whole country into a vertically integrated AI supply chain and makes the NVIDIA-Korea tie mutual lock-in, with power and cooling, not GPU counts, as the binding constraint.
OpenAI is planning its largest ChatGPT overhaul since launch — a unified superapp merging Codex, AI agents, image generation, the Atlas browser, and third-party partner apps, rolling out first on web and mobile in the coming weeks. A senior employee told the Financial Times that "chat is dead," signaling a pivot from Q&A toward autonomous multi-step agents. The redesign reads as reverse-engineered from IPO economics: ~2M business customers already contribute ~40% of revenue (projected ~50% by year-end), and Codex weekly active users grew ~6x to 5M+. OpenAI is becoming an enterprise software company that happens to own the world's largest consumer AI app.
Why it matters: If chat really is dead, OpenAI is repositioning as enterprise agent software right before an IPO, where recurring business revenue is worth far more than consumer chat traffic.
Google upgraded NotebookLM to run on Gemini 3.5 and Antigravity, giving each notebook a secure cloud computer that can write and run code for agentic, multi-step research. Users can start from loose ideas and let NotebookLM use Google Search to discover and add high-quality sources instead of uploading documents first, then generate outputs in PDF, DOCX, Markdown, XLSX, PPTX, charts, images, and CSV/JSON. It rolls out globally on web starting June 8 to AI Ultra subscribers and Workspace business customers. The upgrade repositions NotebookLM from a passive document reader into a unified research workspace — though critics caution that web-discovery results remain hit or miss.
Why it matters: Google is pushing agentic research, with code execution and live source discovery, straight into everyday Workspace workflows, raising the bar for what a research tool is expected to do.
SpaceX unveiled AI1, its first-generation AI satellite: a 150kW peak (120kW average) compute payload with a 70-meter deployed wingspan, roughly the compute of one GB300-class server rack in orbit. A May 20 S-1 recast SpaceX as a vertically integrated AI infrastructure company trading under ticker SPCX, targeting orbital compute as early as 2028; it already rents out terrestrial GPU capacity at scale (~$1.25B/month from Anthropic, ~$920M/month from Google). Musk announced TeraFab, a ~$20B fab for space-hardened D3 chips, and filed with the FCC for up to one million orbital data-center satellites. Skeptics question why a FLOP should be worth more 250 miles up; Musk dismisses the heat-rejection worry as "a bizarre debate about radiators in space."
Why it matters: SpaceX is staking a record-setting IPO on owning the entire AI stack end to end, from chip fab to orbit, a thesis skeptics argue the physics and economics do not yet support.
Slow Drip
Blog reads worth savoring
Shows that input (not output) tokens dominate agent costs and that ontology-based knowledge graphs cut token usage ~90% across 31 repos, a concrete lever for slashing agent bills.
Walks through the exact Skill file structure, naming rules, and reliability patterns so you can ship a working, distributable Claude Skill end-to-end.
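To make the naming rules concrete, here is a minimal Python sketch of a SKILL.md frontmatter check. It assumes the limits Anthropic documented at the time of writing: a `name` of at most 64 lowercase letters, digits, or hyphens that avoids the reserved words "anthropic" and "claude", plus a non-empty `description` of at most 1024 characters. `parse_frontmatter` and `validate_skill` are hypothetical helpers, and the toy parser reads only single-line `key: value` pairs rather than full YAML.

```python
import re
from typing import Dict, List

# Assumed constraints, mirroring Anthropic's published Skill rules at the
# time of writing; verify against the current docs before relying on them.
NAME_PATTERN = re.compile(r"[a-z0-9-]{1,64}")
RESERVED_WORDS = ("anthropic", "claude")
MAX_DESCRIPTION = 1024


def parse_frontmatter(skill_md: str) -> Dict[str, str]:
    """Read simple `key: value` pairs from a leading --- ... --- block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must open with a --- frontmatter block")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter block is never closed")


def validate_skill(skill_md: str) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    fields = parse_frontmatter(skill_md)
    name = fields.get("name", "")
    description = fields.get("description", "")
    problems: List[str] = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be 1-64 lowercase letters, digits, or hyphens")
    if any(word in name for word in RESERVED_WORDS):
        problems.append("name uses a reserved word")
    if not description:
        problems.append("description must not be empty")
    elif len(description) > MAX_DESCRIPTION:
        problems.append(f"description exceeds {MAX_DESCRIPTION} characters")
    return problems
```

Running a check like this in CI before packaging catches naming mistakes early, when they are cheapest to fix.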
Demonstrates that emergent multi-agent behavior is fragile across model populations and that reliable outcomes require authoring deterministic events at post-decision seams, a hard-won lesson for anyone designing agent simulations.
Distills the Claude text-editor tool pattern (view / str_replace / insert) into a reusable design you can copy for any agentic text-editing feature.
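The view / str_replace / insert pattern above can be sketched as a small in-memory class. This is a hypothetical illustration, not Anthropic's implementation: the parameter names (`old_str`, `new_str`, `insert_line`, `view_range`) mirror the pattern's vocabulary, while the class name, return strings, and in-memory storage are assumptions. The key design choice is that `str_replace` refuses zero or multiple matches instead of guessing.

```python
from typing import Dict, Optional, Tuple


class TextEditor:
    """Hypothetical in-memory editor exposing view / str_replace / insert.

    Every command returns a string, so an agent loop can feed the result
    (or the error) straight back to the model as a tool result.
    """

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = dict(files)

    def view(self, path: str, view_range: Optional[Tuple[int, int]] = None) -> str:
        lines = self.files[path].split("\n")
        start, end = view_range if view_range else (1, len(lines))
        end = min(end, len(lines))
        # 1-indexed, inclusive line numbers so the model can cite them back.
        return "\n".join(f"{n}\t{lines[n - 1]}" for n in range(start, end + 1))

    def str_replace(self, path: str, old_str: str, new_str: str) -> str:
        text = self.files[path]
        count = text.count(old_str)
        # Refuse missing or ambiguous matches rather than editing the wrong spot.
        if count == 0:
            return f"Error: no match for old_str in {path}"
        if count > 1:
            return f"Error: {count} matches for old_str in {path}; add more context"
        self.files[path] = text.replace(old_str, new_str, 1)
        return f"Edited {path}"

    def insert(self, path: str, insert_line: int, new_str: str) -> str:
        lines = self.files[path].split("\n")
        if not 0 <= insert_line <= len(lines):
            return f"Error: insert_line {insert_line} out of range"
        # insert_line=0 inserts at the top; n inserts after line n.
        lines[insert_line:insert_line] = new_str.split("\n")
        self.files[path] = "\n".join(lines)
        return f"Inserted into {path}"
```

For example, after `ed = TextEditor({"app.py": "x = 1\ny = 2"})`, calling `ed.str_replace("app.py", "= ", "== ")` returns an error because the snippet matches twice, which nudges the model to resend a longer, unique `old_str`.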
The Grind
Research papers, decoded
Memory Caching (MC) lets recurrent models grow their effective memory with sequence length by caching checkpoints of hidden states at regular intervals, interpolating between the O(L) cost of RNNs and the O(L^2) cost of Transformers. Bolting MC onto an RNN like Titans yields perfect needle-in-a-haystack retrieval at 4K/8K context and closes much of the recall gap to Transformers while staying far cheaper. If your recurrent backbone loses on recall-heavy long context, MC is a drop-in module that buys Transformer-like recall at near-linear cost without retraining.
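The cost interpolation is easiest to see in a toy model. This minimal sketch is not the paper's method: it assumes a plain linear recurrence (h_t = decay * h_{t-1} + x_t) and a dot-product softmax read over the cached checkpoints, and `run_with_memory_cache` is a hypothetical name. With segment length S over L tokens, the number of checkpoint reads is L(L/S - 1)/2, roughly L^2/(2S): S = 1 recovers the quadratic, attention-like cost, and S = L collapses to the linear RNN cost.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_with_memory_cache(
    xs: List[Vector], segment: int, decay: float = 0.9
) -> Tuple[List[Vector], int]:
    """Toy linear RNN that checkpoints its hidden state every `segment` steps.

    Returns (outputs, reads), where `reads` counts the cached checkpoints
    consulted over the whole sequence, a rough proxy for extra compute.
    """
    dim = len(xs[0])
    h = [0.0] * dim
    cache: List[Vector] = []
    outputs: List[Vector] = []
    reads = 0
    for t, x in enumerate(xs):
        # Recurrent update: h_t = decay * h_{t-1} + x_t
        h = [decay * hi + xi for hi, xi in zip(h, x)]
        # Read every cached checkpoint plus the live state, weighting each
        # by its dot-product similarity with the current input.
        memories = cache + [h]
        weights = softmax([sum(a * b for a, b in zip(x, m)) for m in memories])
        outputs.append(
            [sum(w * m[i] for w, m in zip(weights, memories)) for i in range(dim)]
        )
        reads += len(cache)
        # Checkpoint the hidden state at the end of each segment.
        if (t + 1) % segment == 0:
            cache.append(list(h))
    return outputs, reads


xs = [[math.sin(t), math.cos(t)] for t in range(16)]
for seg in (1, 4, 16):
    _, reads = run_with_memory_cache(xs, seg)
    print(seg, reads)  # 1 -> 120, 4 -> 24, 16 -> 0
```

Shrinking the segment buys more recallable history at more reads, which is the dial the summary describes.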
A single Mixture-of-Transformers model jointly handling language, image, video, audio, and action, subsuming VLMs, video generators, world simulators, and robot policies into one backbone via a dual-tower Reasoner/Generator design with 3D Multimodal RoPE. Post-trained variants ranked #1 open-weight Text-to-Image and Image-to-Video on Artificial Analysis, topped Physics-IQ, and set RoboArena manipulation records. Weights, checkpoints, datasets, and the eval ship under an OpenMDW license, so robotics teams get a genuinely open SOTA foundation to fine-tune.
A living benchmark built with 250+ industry experts to test agents on long-horizon, economically valuable real-world tasks — 1,490 task instances across 55 professional subfields grounded in the U.S. O*NET/SOC taxonomy. The hardest tier sits at just 2.6% average pass rate, the strongest config (GPT-5.5) hits only 26.2% overall, and 47% of failures are wrong strategy, 31% understanding errors. Backbone model choice swings results ~18 points versus only 5-6 for the harness, so invest in reasoning capacity over tooling.
The Mill
Builder tools ground for action
The Frontend Stack for Agents & Generative UI. React, Angular, Mobile, Slack, and more. Makers of the AG-UI Protocol
An open-source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM.
Agentic AI Infrastructure for magnifying HUMAN capabilities.
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
Generate any application by vibe coding it. DeepSite is a vibe-coding platform designed to make coding smarter and more efficient. Tailored for developers, data scientists, and AI engineers, it integrates generative AI into your coding projects to enhance creativity and productivity. DeepSite v4 runs as a Docker-based Hugging Face Space with 16,617 likes.
The Counter
Voices from the AI bar today
Technical deep dive on prompt injection as a structural, unfixable vulnerability: the "lethal trifecta" (private-data access + untrusted input + outbound channel), exploited in products from OpenAI, Anthropic, Microsoft, and Google.
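If the flaw cannot be fixed at the model level, the practical lever is configuration: never hand one agent session all three legs at once. This minimal sketch assumes a harness that records per-tool capability flags; `ToolCapabilities`, `trifecta_legs`, `is_lethal`, and the example tool names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToolCapabilities:
    """Hypothetical per-tool flags an agent harness might record."""
    name: str
    reads_private_data: bool = False
    ingests_untrusted_input: bool = False
    can_send_outbound: bool = False


def trifecta_legs(tools: Iterable[ToolCapabilities]) -> List[str]:
    """List which legs of the lethal trifecta a tool set covers."""
    tools = list(tools)
    legs = []
    if any(t.reads_private_data for t in tools):
        legs.append("private data")
    if any(t.ingests_untrusted_input for t in tools):
        legs.append("untrusted input")
    if any(t.can_send_outbound for t in tools):
        legs.append("outbound channel")
    return legs


def is_lethal(tools: Iterable[ToolCapabilities]) -> bool:
    # All three legs in one session means injected instructions could
    # read secrets and ship them out.
    return len(trifecta_legs(tools)) == 3


inbox = ToolCapabilities("read_inbox", reads_private_data=True, ingests_untrusted_input=True)
fetch = ToolCapabilities("fetch_url", ingests_untrusted_input=True, can_send_outbound=True)
print(is_lethal([inbox]))         # False
print(is_lethal([inbox, fetch]))  # True
```

Note that a URL fetcher is flagged as outbound here, since data can leak through query parameters; a check like this belongs at session setup, before any untrusted text reaches the model.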
Frames harness engineering — the tools, memory, permissions, and feedback loops around a model — as the discipline that determines agent reliability beyond raw model quality.
Hands-on review of a platform giving AI agents persistent Linux environments, databases, and public deployment so they can build and host full apps autonomously.
Google just shipped a free dictation app for Mac and iPhone called AI Edge Eloquent... the model running it is Gemma 4 12B, entirely on your device.
Intel's pre-market gains have expanded to 10%... Google placed an order with Intel for over 3 million TPU chips.
A maker shows off a full League-of-Legends-style game built in under a day with Opus 4.8 — a viral showcase of coding-agent throughput.
The Gemma 4 12B release lands on Hugging Face; the local-LLM crowd dissects weights, licensing, and on-device performance.
Last Sip
Parting thoughts
The labs are filing for the public markets the same week an independent benchmark shows even the best agent setup passing just 26.2% of real, economically valuable tasks, and only 2.6% on the hardest tier. The gap between the pitch deck and the pass rate is the whole story right now, and the builders quietly funding verification, not capability, may be reading the room more clearly than the bankers.