Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
- Anthropic's recursive-self-improvement alarm landed the same week it filed confidentially for a $965B IPO, and analysts from Constellation to r/singularity read the pause call as moat-building.
- The same lab claiming engineers ship 8x more code is pushing Mueller, LeCun, and METR to publicly question whether 80%-of-merged-code measures productivity or just lines that still need human review.
- SpaceX's $1.78T IPO valuation and Cloudflare's new AI-spend caps land alongside a Reddit thread arguing 95% of 2026's $2.5T AI capex produced zero P&L, sharpening the bubble-vs-utility split.
Bold Shots
Today's biggest AI stories, no chaser
Marina Favaro and Jack Clark published "When AI builds itself" on June 4, arguing for a coordinated mechanism to slow or pause frontier AI before recursive self-improvement erodes human oversight. The receipts are internal: Claude now authors >80% of code merged into Anthropic's repos (up from low single digits at Claude Code's February 2025 launch), engineers ship ~8x more code per quarter than in 2024, and the unreleased Mythos Preview model hit a 52x speedup on a CPU-only LM training-optimization benchmark where experienced humans top out near 3x in 4-8 hours. Claude Code is also generating roughly 2.6M public GitHub commits per week — about 4.5% of all public commits.
Why it matters: This is the first frontier lab publicly asking governments for an enforceable pause, framed as putting your foot on the brake before oversight is lost. The 80%/52x numbers move the safety debate from hypothetical to a live policy question this year.
Critical context on the new Anthropic blog: 1) AGI is *harder* than RSI (as used below). AGI: machine can do anything human can do, autonomously [not achieved]. RSI (as used below): AI is a useful coding tool that humans can leverage [achieved].
BREAKING: Anthropic has called for a global pause in AI development as artificial-intelligence models near the capability to improve themselves without human intervention, per WSJ.
A SpaceX SEC Form FWP filed June 5 disclosed that Google will pay $920M/month from October 2026 through June 2029 — up to $30B total — for access to ~110,000 Nvidia GPUs plus associated CPUs, memory, and infrastructure. The capacity lives at Colossus 1 (Memphis) and Colossus 2 (Mississippi), the xAI-built data centers folded into SpaceX after the February 2026 all-stock merger. The filing landed one week before SpaceX's planned Nasdaq debut as SPCX at $135/share, and follows Anthropic's parallel $1.25B/month, $40B+ Colossus 1 deal signed in May.
Why it matters: This makes SpaceX the fourth hyperscaler effectively overnight and validates its IPO prospectus. The bigger tell is that Google — the world's largest cloud — is so demand-constrained on Gemini Enterprise that it's renting from a competitor partly owned by Elon Musk. Agentic inference is the workload bending capex at every hyperscaler.
Two AI headlines today, one theme: compute and capital are everything now. 1) Google is renting compute from Elon. It'll pay SpaceX $920M/month (Oct 2026-Jun 2029) for ~110,000 NVIDIA GPUs, from the xAI Colossus sites.
SpaceX has announced that it has entered into a $920 million per month agreement with Google to provide compute capacity.
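For readers who want to check the deal math, here is a rough sketch in Python. The fee, dates, and GPU count are the ones quoted in the story. The assumption that both endpoint months are billed in full is ours, and the per-GPU figure is only a ceiling because the fee also covers CPUs, memory, and infrastructure.

```python
# Back-of-the-envelope check of the Google/SpaceX deal terms quoted above.
# ASSUMPTION: October 2026 and June 2029 are both billed as full months.

def months_inclusive(start_year, start_month, end_year, end_month):
    """Count calendar months from start to end, including both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

MONTHLY_FEE_USD = 920e6   # $920M per month, as reported
GPU_COUNT = 110_000       # approximate GPU count, as reported

term_months = months_inclusive(2026, 10, 2029, 6)
total_usd = MONTHLY_FEE_USD * term_months

# Upper bound only: the fee also pays for CPUs, memory, and infrastructure.
per_gpu_month_usd = MONTHLY_FEE_USD / GPU_COUNT

print(f"term: {term_months} months")
print(f"total: ${total_usd / 1e9:.1f}B")
print(f"fee per GPU per month (upper bound): ${per_gpu_month_usd:,.0f}")
```

Under that billing assumption the total lands close to the roughly $30B headline figure.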
SpaceX is pricing its IPO at $135/share for a $1.75-1.78 trillion valuation, raising roughly $75B via 555.6M shares — Nasdaq pricing June 11, debut June 12. Lead-left Goldman Sachs projects SpaceX's AI division revenue to grow 100x, from $3.2B in 2025 to $322B in 2030, with 2026 alone jumping 388% to $15.6B. Morgan Stanley models $190B AI revenue by 2030 and $3.4T total revenue by 2040. Morningstar pegs fair value at about $780B — less than half the IPO target. S&P Dow Jones also denied SpaceX fast-track S&P 500 entry.
Why it matters: This is the cleanest public-market test of the AI infrastructure thesis. If Goldman's 100x forecast is what the IPO price assumes, every other AI infra valuation gets recalibrated against it. The 2x-plus Morningstar-vs-Goldman gap is the bear-bull spread for the whole sector.
GOLDMAN SEES SPACEX AI REVENUE EXPLODING TO $322B BY 2030. Goldman Sachs projects SpaceX AI revenue rising from $3.2B in 2025 to $322B by 2030.
$SPCX - SPACEX IPO SETS $75B RECORD LISTING TERMS. SpaceX set IPO terms at $135 per share, raising about $75 billion through 555.6 million shares.
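The IPO and forecast numbers above can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only figures quoted in the story. The implied compound annual growth rate is our own derived number, not something the banks published.

```python
# Cross-check of the SpaceX IPO and AI-revenue figures quoted above.
SHARE_PRICE_USD = 135.0
SHARES_OFFERED = 555.6e6
raise_usd = SHARE_PRICE_USD * SHARES_OFFERED   # gross proceeds

REV_2025 = 3.2e9     # Goldman's 2025 AI revenue figure
REV_2026 = 15.6e9    # Goldman's 2026 projection
REV_2030 = 322e9     # Goldman's 2030 projection

growth_2026_pct = (REV_2026 / REV_2025 - 1) * 100
multiple_2030 = REV_2030 / REV_2025

# Derived, not quoted: annual growth rate implied by the 2025 -> 2030 path.
implied_cagr = (REV_2030 / REV_2025) ** (1 / 5) - 1

MORNINGSTAR_FAIR_VALUE = 780e9
IPO_VALUATION_LOW = 1.75e12   # low end of the quoted valuation range
fair_value_ratio = MORNINGSTAR_FAIR_VALUE / IPO_VALUATION_LOW

print(f"gross raise: ${raise_usd / 1e9:.1f}B")
print(f"2026 growth: {growth_2026_pct:.1f}%")
print(f"2030 vs 2025: {multiple_2030:.1f}x")
print(f"implied CAGR 2025-2030: {implied_cagr:.0%}")
print(f"Morningstar fair value / IPO valuation: {fair_value_ratio:.2f}")
```

The quoted figures are internally consistent: the share math matches the roughly $75B raise, and the Morningstar estimate sits below half of even the low end of the valuation range.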
Google DeepMind launched Gemma 4 on April 2, 2026 under Apache 2.0 — E2B, E4B, 12B, 26B MoE (4B active), and 31B dense variants with a 256K-token context window. The 12B is encoder-free multimodal: vision and audio flow directly into the LLM backbone via a 35M vision embedder and direct audio wave projection, replacing the prior 550M vision plus 300M audio encoder stack for roughly a 24x reduction in non-LLM multimodal weight. QAT mobile checkpoints shrink E2B to 1GB, and LiteRT-LM with Multi-Token Prediction delivers 1.6x-2.2x on-device speedups.
Why it matters: Gemma 4 says the contested terrain in open models is the agentic-edge stack, not raw frontier scores. Encoder-free architecture, a 1GB QAT mobile build, and an Apache 2.0 license together make on-device commercial agent workflows finally viable — direct pressure on Meta's Llama and Mistral.
GOOGLE JUST GAVE AI AGENTS A FREE LOCAL BRAIN — Gemma 4 can run locally through Ollama and plug directly into agent frameworks like Hermes and OpenClaw. Developers get private, offline AI for coding.
LiteRT-LM support for Flutter is coming soon to the flutter_gemma package. This will enable you to run powerful on-device AI models like Gemma 4 across devices.
In Toronto on June 4, PM Mark Carney launched "AI for All" — three principles (trust, opportunity, sovereignty), six pillars, and more than CA$2.3B in commitments. The CA$1B sovereign-AI core includes a CA$500M Canadian Tech Growth Fund that takes equity stakes in Canadian AI firms, plus CA$700M to expand the Compute Access Fund toward 850 MW of compute by 2030. Adoption targets: lift business AI use from 12% to 60% by 2034, reach 1M post-secondary students with AI literacy, give every post-secondary student an AI agent, and create up to 250,000 adoption-driven jobs plus 90,000 youth placements by 2031. The strategy explicitly aims to cut dependence on US clouds, which currently hold 85% of Canada's public cloud market via Amazon, Microsoft, and Google.
Why it matters: First G7 country to package an equity-stake-taking sovereign AI fund, an explicit US-hyperscaler-reduction target, and a population-scale adoption mandate into one strategy. It's a real-world test of whether middle powers can build sovereign AI capacity without nationalizing the cloud.
Slow Drip
Blog reads worth savoring
VendingBench authors explain why dollar-based, long-horizon evals expose emergent failures — price cartels, FBI escalations, existential loops — that abstract benchmarks miss.
Honest head-to-head benchmark of LitParse vs Docling on a 340-page textbook with real numbers (0.31s vs 2m55s) so you can pick the right parser tradeoff for your RAG stack.
Practical walkthrough of three post-hoc calibration methods, including why RLHF breaks global temperature scaling and when to reach for isotonic regression instead.
Attackers hijacked Instagram accounts — including the dormant Obama White House one — by simply asking Meta's AI support agent to swap emails. A red-teaming failure, not a model failure.
The Grind
Research papers, decoded
NVIDIA's unified Mixture-of-Transformers fuses language, image, video, audio, and action into one backbone with a dual-pathway design — autoregressive Reasoner for discrete tokens, rectified-flow Generator for continuous outputs — plus a unified action interface spanning AVs and humanoids. SoTA on 48 understanding benchmarks, #1 open-source text-to-image and image-to-video on Artificial Analysis, and 39.7% on RoboLab manipulation. First open-weights model that credibly subsumes vision-language, video gen, world simulation, and robot policy in one stack — stop stitching together four specialist models.
On-Policy Distillation's reverse-KL gradient explodes when the teacher assigns near-zero probability to a student token. TrOPD fixes this with trust-region masking that only applies reverse-KL where the teacher is reliable, forward-KL on outlier tokens to keep the signal alive, and off-policy guidance where the student continues from teacher prefixes. Beats vanilla OPD by 3-6 points across math, code, and STEM, including +6.18 on GPQA Diamond. Near drop-in fix for the optimization collapse most distillation teams hit.
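To make the failure mode concrete, here is a toy sketch of why the reverse-KL term blows up and how a mask with a forward-KL fallback keeps it bounded. It is not the paper's implementation: the function names, the `min_teacher_prob` threshold, and the rule of switching on the teacher's probability for the sampled token are all illustrative assumptions, and the off-policy teacher-prefix component is not shown.

```python
import math

def reverse_kl_token(student_probs, teacher_probs, token):
    """Single-sample reverse-KL estimate at the student's sampled token:
    log p_student(token) - log p_teacher(token)."""
    return math.log(student_probs[token]) - math.log(teacher_probs[token])

def forward_kl(student_probs, teacher_probs):
    """Full-vocabulary forward KL: sum over v of p_teacher(v) * log(p_teacher(v) / p_student(v))."""
    return sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(teacher_probs, student_probs)
        if pt > 0.0
    )

def masked_distill_loss(student_probs, teacher_probs, token, min_teacher_prob=1e-3):
    """HYPOTHETICAL masking rule: use the reverse-KL term only when the teacher
    gives the sampled token at least min_teacher_prob, else fall back to forward KL."""
    if teacher_probs[token] >= min_teacher_prob:
        return "reverse_kl", reverse_kl_token(student_probs, teacher_probs, token)
    return "forward_kl", forward_kl(student_probs, teacher_probs)

# Toy 3-token vocabulary; the teacher gives token 2 almost no probability.
student = [0.5, 0.3, 0.2]
teacher = [0.7, 0.3 - 1e-9, 1e-9]

unmasked_outlier = reverse_kl_token(student, teacher, 2)   # grows as p_teacher -> 0
kind_ok, loss_ok = masked_distill_loss(student, teacher, 0)
kind_outlier, loss_outlier = masked_distill_loss(student, teacher, 2)

print(f"unmasked reverse-KL term on outlier token: {unmasked_outlier:.2f}")
print(f"token 0 -> {kind_ok}: {loss_ok:.3f}")
print(f"token 2 -> {kind_outlier}: {loss_outlier:.3f}")
```

The unmasked term scales with the negative log of the teacher probability, so it is unbounded as that probability approaches zero. The forward-KL fallback weights each log-ratio by the teacher's own probability mass, which is what keeps the outlier case small here.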
A 4-step distilled version of Qwen-Image-2.0 that matches or beats its own 80-step teacher on text-to-image and instruction-guided editing. The contribution isn't a new loss — it's a recipe study showing that data composition (single-category landscape data generalizes text rendering better than diverse mixes), step-wise multi-teacher guidance, and a 5:5 generation/editing task mix dominate the objective itself. Resulting Qwen-Image-Flash is a practical 20x speedup.
The Mill
Builder tools ground for action
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
The Frontend Stack for Agents & Generative UI. React + Angular. Makers of the AG-UI Protocol
Gemma 4 12B processes text, vision, and audio natively without separate encoders, running on 16GB VRAM. For developers building local agentic applications who need multimodal capability without cloud dependency.
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
Wan2.2 Animate is a Gradio-based Hugging Face Space with 5,118 likes.
Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more
The Counter
Voices from the AI bar today
The biggest-reach AI video of the cycle: a popular tech commentator amplifies the MIT-Sloan-style 95%-of-GenAI-pilots-show-no-P&L narrative to a mainstream audience.
Two economists argue that even as AI capability explodes, AI's share of GDP may shrink because cheaper cognition drives returns to whatever stays scarce.
Technical walkthrough of Cerebras' WSE-3, framing wafer-scale architecture as the inference-era answer to Nvidia's memory-bandwidth wall.
A context-engineering metaphor — primary source, derivative source, working memory — to make agent context management explicit instead of vibes-based.
Trump administration is reportedly in talks with OpenAI about a possible government stake, per CNBC — early signal that US AI policy may move toward direct equity, not just regulation.
The week's top AI Reddit thread: a one-shot, fully functional multiplayer web game built with Opus 4.8 and subagents.
Author exposes 1.3B Polymarket trades and 2.7M wallets to Claude via MCP and starts surfacing concentration and suspected wash-trading patterns.
Roast Calendar
Your AI week, day by day
Last Sip
Parting thoughts
Two stories sitting next to each other tell you where we actually are. Anthropic is reporting that Claude writes more than 80% of its merged code while asking governments to build a pause button, and Google is signing a $920M-a-month check to rent GPUs from a competitor because its demand curve outran its own data centers. The infrastructure side is sprinting and the safety side is filing whitepapers. Whether those converge or diverge is the only question that matters this quarter.