Jun 9, 2026

Agentic Brew Daily

Your daily shot of what's brewing in AI

Fresh Batch

Distilled trend
  • OpenAI and Anthropic are racing to file IPOs at near-trillion-dollar valuations the same week a benchmark built with 250+ experts showed agents pass just 2.6% of its hardest real-world economic tasks.
  • Compute is fanning out at once: SpaceX's orbital AI1 satellite, NVIDIA's South Korea buildout, and Google's 3M-TPU order to Intel loosen the single-datacenter chokehold.
  • Verification, not raw capability, is the agent bottleneck builders are funding now: developer threads about unverifiable agent code and SF's run of agent-security and formal-methods meetups point the same direction.

Bold Shots

Today's biggest AI stories, no chaser

OpenAI submitted a confidential draft S-1 to the SEC on Monday, June 8, its first formal step toward a public offering. The filing lands one week after Anthropic's own confidential filing at a roughly $965B valuation, higher than OpenAI's last private mark of about $852B. A draft S-1 forces OpenAI to disclose the financials behind capital it previously raised on narrative alone, and under FASB Topic 730 frontier-training costs must be expensed as incurred, reframing the spend from "building infrastructure" to "burning cash now." The numbers: ~$9B net loss on $13.1B revenue (2025), a projected ~$14B loss in 2026, no profitability forecast before ~2030, and a reported ~$207B funding gap by 2030. Filing second and at a lower valuation than Anthropic inverts the pecking order and turns a research rivalry into a Wall Street footrace.

Why it matters: Public paperwork forces the first real look at OpenAI's economics, and it drops OpenAI into a direct, lower-valuation footrace with Anthropic for the same investor dollars.

During Jensen Huang's Seoul visit, NVIDIA announced AI-infrastructure and physical-AI partnerships spanning NAVER, SK Group, LG, Doosan, and Hyundai, plus a national GPU-operator selection. NAVER is building a full-stack AI factory on DSX, expanding GAK Sejong to 55MW toward gigawatt scale; SK signed a multi-year next-gen-memory deal across four NVIDIA platforms; LG builds humanoid robots on Isaac GR00T. The deals form a coordinated national division of labor — memory (SK) to compute/models (NAVER) to robots/power (LG, Doosan, Hyundai) — and because SK hynix and Samsung dominate HBM, it is mutual lock-in: Korea needs the GPUs, NVIDIA needs the HBM. Energy and thermal capacity, not GPU counts, emerge as the real bottleneck.

Why it matters: It turns a whole country into a vertically integrated AI supply chain and makes the NVIDIA-Korea tie mutual lock-in, with power and cooling, not GPU counts, as the binding constraint.

OpenAI is planning its largest ChatGPT overhaul since launch — a unified superapp merging Codex, AI agents, image generation, the Atlas browser, and third-party partner apps, rolling out first on web and mobile in the coming weeks. A senior employee told the Financial Times that "chat is dead," signaling a pivot from Q&A toward autonomous multi-step agents. The redesign reads as reverse-engineered from IPO economics: ~2M business customers already contribute ~40% of revenue (projected ~50% by year-end), and Codex weekly active users grew ~6x to 5M+. OpenAI is becoming an enterprise software company that happens to own the world's largest consumer AI app.

Why it matters: If chat really is dead, OpenAI is repositioning as enterprise agent software right before an IPO, where recurring business revenue is worth far more than consumer chat traffic.

Google upgraded NotebookLM to run on Gemini 3.5 and Antigravity, giving each notebook a secure cloud computer that can write and run code for agentic, multi-step research. Users can start from loose ideas and let NotebookLM use Google Search to discover and add high-quality sources instead of uploading documents first, then generate outputs in PDF, DOCX, Markdown, XLSX, PPTX, charts, images, and CSV/JSON. It rolls out globally on web starting June 8 to AI Ultra subscribers and Workspace business customers. The upgrade repositions NotebookLM from a passive document reader into a unified research workspace — though critics caution that web-discovery results remain hit or miss.

Why it matters: Google is pushing agentic research (code execution plus live source discovery) straight into everyday Workspace workflows, raising the bar for what a research tool is expected to do.

SpaceX unveiled AI1, its first-generation AI satellite — a 150kW peak (120kW average) compute payload with a 70-meter deployed wingspan, roughly the compute of one GB300-class server rack in orbit. A May 20 S-1 recast SpaceX as a vertically integrated AI infrastructure company trading under ticker SPCX, targeting orbital compute as early as 2028; it already rents terrestrial GPU capacity at scale (~$1.25B/month from Anthropic, ~$920M/month from Google). Musk announced TeraFab, a ~$20B fab for space-hardened D3 chips, and filed with the FCC for up to one million orbital data-center satellites. Skeptics question why a flop should be worth more 250 miles up; Musk dismisses the heat-rejection worry as "a bizarre debate about radiators in space."

Why it matters: SpaceX is staking a record-setting IPO on owning the entire AI stack end to end, from chip fab to orbit, a thesis skeptics argue the physics and economics do not yet support.

Slow Drip

Blog reads worth savoring

Analysis · Alibaba Cloud
Tokenmaxxing Dilemma: Are There Immediate Solutions for Improvement?

Shows that input (not output) tokens dominate agent costs and that ontology-based knowledge graphs cut token usage ~90% across 31 repos, a concrete lever for slashing agent bills.
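A rough sketch of why input tokens can dominate: a multi-turn agent loop that re-sends its growing context on every turn. The helper name `agent_run_cost`, the per-million-token prices, and the token counts are illustrative assumptions, not figures from the Alibaba Cloud post, and the ~90% cut is applied only to the base context as a simplification.

```python
def agent_run_cost(turns, context_tokens, output_tokens_per_turn,
                   price_in_per_m=1.0, price_out_per_m=4.0):
    """Return (input_cost, output_cost) in dollars for one agent run.

    Assumption: every turn re-sends the base context plus all earlier outputs.
    Prices are hypothetical dollars per million tokens.
    """
    input_tokens = 0
    for turn in range(turns):
        # Base context plus the outputs accumulated in previous turns.
        input_tokens += context_tokens + turn * output_tokens_per_turn
    output_tokens = turns * output_tokens_per_turn
    return (input_tokens * price_in_per_m / 1e6,
            output_tokens * price_out_per_m / 1e6)


# Hypothetical 20-turn run with a 50K-token context, then the same run
# with the context trimmed by roughly 90%.
baseline_in, baseline_out = agent_run_cost(20, 50_000, 500)
trimmed_in, trimmed_out = agent_run_cost(20, 5_000, 500)
```

Even with output priced at four times the input rate in this toy setup, the re-sent context makes input the larger share of the bill, which is why trimming context is the lever with the most leverage.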

Tutorial · KDnuggets
Anthropic's Complete Guide to Claude Skills Building

Walks through the exact Skill file structure, naming rules, and reliability patterns so you can ship a working, distributable Claude Skill end-to-end.

Research · Hugging Face
The crash that vanished: control and emergence in a five-model economy

Demonstrates that emergent multi-agent behavior is fragile across model populations and that reliable outcomes require authoring deterministic events at post-decision seams, a hard-won lesson for anyone designing agent simulations.

News · simonwillison.net
datasette-agent-edit 0.1a0

Distills the Claude text-editor tool pattern (view / str_replace / insert) into a reusable design you can copy for any agentic text-editing feature.
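For a feel of the view / str_replace / insert pattern, here is a minimal sketch of an in-memory editor an agent could call. `TextBuffer` and its method signatures are hypothetical, not the API of datasette-agent-edit or of Claude's tool. The exactly-one-match rule on `str_replace` is a design choice of this sketch that keeps edits unambiguous.

```python
class TextBuffer:
    """Hypothetical in-memory document exposing view / str_replace / insert."""

    def __init__(self, text):
        self.lines = text.split("\n")

    def view(self, start=1, end=None):
        """Return 1-indexed, numbered lines so the agent can refer to them."""
        end = len(self.lines) if end is None else end
        return "\n".join(
            f"{n}: {line}"
            for n, line in enumerate(self.lines[start - 1:end], start=start)
        )

    def str_replace(self, old, new):
        """Replace `old` with `new`; refuse unless `old` occurs exactly once."""
        text = "\n".join(self.lines)
        count = text.count(old)
        if count != 1:
            raise ValueError(f"expected exactly 1 match, found {count}")
        self.lines = text.replace(old, new).split("\n")

    def insert(self, after_line, new_text):
        """Insert `new_text` after a 1-indexed line (0 means top of file)."""
        if not 0 <= after_line <= len(self.lines):
            raise ValueError("after_line out of range")
        self.lines[after_line:after_line] = new_text.split("\n")


doc = TextBuffer("alpha\nbeta\ngamma")
doc.str_replace("beta", "BETA")   # unique match, so the edit is applied
doc.insert(1, "inserted")         # lands between "alpha" and "BETA"
```

Rejecting ambiguous replacements forces the agent to call `view` and quote enough surrounding text to pin down one location, which tends to be safer than editing by line number alone.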

The Grind

Research papers, decoded

Sequence Modeling / Architecture · 6,703 upvotes · arxiv · X
Memory Caching: RNNs with Growing Memory

Memory Caching lets recurrent models grow their effective memory with sequence length by caching checkpoints of hidden states at regular intervals, interpolating between the O(L) cost of RNNs and the O(L^2) cost of Transformers. Bolting MC onto an RNN like Titans yields perfect needle-in-haystack retrieval at 4K/8K context and closes much of the recall gap to Transformers while staying far cheaper. If your recurrent backbone loses on recall-heavy long context, MC is a drop-in module that buys Transformer-like recall at near-linear cost without retraining.
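The checkpointing idea can be illustrated with a toy scalar recurrence. This is a sketch under loud assumptions, not the paper's architecture: the function name `run_with_memory_cache`, the decay update, and the softmax read over cached states are all invented here for illustration.

```python
import math


def run_with_memory_cache(inputs, interval, decay=0.5):
    """Toy recurrence that checkpoints its hidden state every `interval` steps.

    Each step reads from the cached checkpoints plus the current state, so
    the per-step read cost grows with len(cache), roughly t / interval.
    """
    h = 0.0
    cache = []      # hidden-state checkpoints; grows with sequence length
    outputs = []
    for t, x in enumerate(inputs, start=1):
        h = decay * h + x                      # plain recurrent update
        candidates = cache + [h]
        scores = [x * c for c in candidates]   # toy query = current input
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Softmax-weighted mix of cached states and the current state.
        outputs.append(sum(w * c for w, c in zip(weights, candidates)) / total)
        if t % interval == 0:
            cache.append(h)                    # checkpoint at regular intervals
    return outputs, cache


outputs, cache = run_with_memory_cache([1.0] * 16, interval=4)
plain_outputs, plain_cache = run_with_memory_cache([1.0, 1.0], interval=10)
```

In this toy, `interval=1` caches every state and the total read cost is quadratic in sequence length, while an interval longer than the sequence leaves the cache empty and reduces to the plain recurrence. Values in between trade recall for cost.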

World Models / Physical AI · 163 upvotes · alphaxiv
Cosmos 3: Omnimodal World Models for Physical AI

Cosmos 3 is a single Mixture-of-Transformers model that jointly handles language, image, video, audio, and action, subsuming VLMs, video generators, world simulators, and robot policies into one backbone via a dual-tower Reasoner/Generator design with 3D Multimodal RoPE. Post-trained variants ranked #1 open-weight Text-to-Image and Image-to-Video on Artificial Analysis, topped Physics-IQ, and set RoboArena manipulation records. Weights, checkpoints, datasets, and the eval ship under an OpenMDW license, so robotics teams get a genuinely open SOTA foundation to fine-tune.

Agent Evaluation · 104 upvotes · alphaxiv
Agents' Last Exam (ALE)

A living benchmark built with 250+ industry experts to test agents on long-horizon, economically valuable real-world tasks — 1,490 task instances across 55 professional subfields grounded in the U.S. O*NET/SOC taxonomy. The hardest tier sits at just 2.6% average pass rate, the strongest config (GPT-5.5) hits only 26.2% overall, and 47% of failures are wrong strategy, 31% understanding errors. Backbone model choice swings results ~18 points versus only 5-6 for the harness, so invest in reasoning capacity over tooling.

The Mill

Builder tools ground for action

34K stars

The Frontend Stack for Agents & Generative UI. React, Angular, Mobile, Slack, and more. Makers of the AG-UI Protocol

GitHub

47.9K stars

An open-source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM

GitHub

42.1K stars

We write your reusable computer vision tools. 💜

GitHub

15.3K stars

Agentic AI Infrastructure for magnifying HUMAN capabilities.

GitHub

12.2K stars

Agent Skills for Google products and technologies

GitHub

3.2K stars

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

GitHub

16.6K likes · HF

Generate any application by vibe coding it. DeepSite is a vibe-coding platform designed to make coding smarter and more efficient. Tailored for developers, data scientists, and AI engineers, it integrates generative AI into your coding projects to enhance creativity and productivity. DeepSite v4 is a Hugging Face Space.

HF Spaces

The Counter

Voices from the AI bar today

19K views

Technical deep dive on prompt injection as a structural, unfixable vulnerability — the "lethal trifecta" (private-data access + untrusted input + outbound channel) exploited across OpenAI, Anthropic, Microsoft, Google.
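The trifecta lends itself to a simple configuration check. The sketch below is a hypothetical audit helper, not a tool from the video: `AgentCapabilities`, its field names, and `has_lethal_trifecta` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability flags for one agent deployment."""
    reads_private_data: bool
    ingests_untrusted_input: bool
    has_outbound_channel: bool


def has_lethal_trifecta(caps):
    """True only when all three risky capabilities are present together."""
    return (caps.reads_private_data
            and caps.ingests_untrusted_input
            and caps.has_outbound_channel)


# An email assistant that can read the inbox, parse arbitrary incoming mail,
# and send messages has all three legs.
email_agent = AgentCapabilities(True, True, True)
# Removing the outbound channel breaks the exfiltration path.
sandboxed_agent = AgentCapabilities(True, True, False)
```

Since the injection itself is framed as unfixable, the practical lever in a check like this is removing one leg for any given agent, not trying to filter the untrusted input.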

Addie LaMarr

22K views

Frames harness engineering — the tools, memory, permissions, and feedback loops around a model — as the discipline that determines agent reliability beyond raw model quality.

AI Revolution

17K views

Hands-on review of a platform giving AI agents persistent Linux environments, databases, and public deployment so they can build and host full apps autonomously.

Shark Numbers

22K engagements

Google just shipped a free dictation app for Mac and iPhone called AI Edge Eloquent... the model running it is Gemma 4 12B, entirely on your device.

@adityarao310

10K engagements

Intel's pre-market gains have expanded to 10%... Google placed an order with Intel for over 3 million TPU chips.

@fxtrader

3.2K upvotes · 320 comments

A maker shows off a full League-of-Legends-style game built in under a day with Opus 4.8 — a viral showcase of coding-agent throughput.

r/ClaudeAI

1K upvotes · 323 comments

The Gemma 4 12B release lands on Hugging Face; the local-LLM crowd dissects weights, licensing, and on-device performance.

r/LocalLLaMA

Roast Calendar

Your AI week, day by day

Last Sip

Parting thoughts

The labs are filing for the public markets the same week an independent benchmark says agents clear just 2.6% of its hardest economically valuable tasks. The gap between the pitch deck and the pass rate is the whole story right now, and the builders quietly funding verification, not capability, may be reading the room more clearly than the bankers.