Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
- Three frontier-AI firms filed to go public in two weeks even as a new 250-expert benchmark shows top agents clearing only 2.6% of real economic tasks.
- Anthropic's Fable 5 quietly reroutes cyber and bio prompts to a weaker model the same week Cloudflare names its Mythos sibling as the frontier attacker it now defends against.
- China's $295B buildout bars Nvidia while Nvidia floods Korea with GPUs and Apple runs Siri on Nvidia B200s inside Google Cloud, splitting the chip giant's map in three.
Bold Shots
Today's biggest AI stories, no chaser
Anthropic publicly released Claude Fable 5 on June 9, its first widely available Mythos-class model and SOTA across SWE, knowledge, science, and vision benchmarks. Fable 5 and partner-only Mythos 5 share the same weights; Mythos 5 has safeguards lifted in some areas and ships only through Project Glasswing with the US government. When Fable 5's classifiers detect cybersecurity, bio/chem, or model-distillation requests, the turn is silently handled by the weaker Claude Opus 4.8; the reroute fires in under 5% of sessions. Pricing runs $10/M input tokens and $50/M output tokens (~2x Opus 4.8), and free Pro/Max/Team/Enterprise access ends June 22.
Why it matters: Anthropic shipped the same model twice, gated two ways. The real-time capability-throttling mechanism - silently downgrading you mid-conversation - is novel and contested. Nathan Lambert called it categorically misaligned AI, while Andrej Karpathy called the model a step change that deserves a major version bump.
Introducing Claude Fable 5: a Mythos-class model that we've made safe for general use. Its capabilities exceed those of any model we've ever made generally available.
This is the first review showing what Claude Fable 5 actually does, not what the press release says. One prompt built a playable 3D game while the reviewer walked away for 4 hours.
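To put the Fable 5 pricing quoted above in concrete terms, here is a minimal sketch of per-request cost at $10/M input and $50/M output tokens. The function name and the example workload (20k tokens in, 4k tokens out) are illustrative assumptions, not figures from Anthropic.

```python
# Prices quoted in the story above, in USD per million tokens.
FABLE5_INPUT_PER_M = 10.0
FABLE5_OUTPUT_PER_M = 50.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = FABLE5_INPUT_PER_M,
                 output_per_m: float = FABLE5_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical workload: 20k input tokens, 4k output tokens.
cost = request_cost(20_000, 4_000)
print(f"${cost:.2f}")  # $0.40
```

At these rates the 4k output tokens cost as much as the 20k input tokens, so output-heavy agent loops dominate the bill.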
At WWDC on June 8, Apple unveiled Siri AI, a rebuilt Siri with on-screen awareness, personal-context search, web access, and a standalone app syncing over iCloud. It runs on a custom ~1.2-trillion-parameter model built on Google Gemini, hosted on Google Cloud with Nvidia Blackwell B200 GPUs, with heavy reasoning routed through Apple's Private Cloud Compute. New features land across Photos, Messages, natural-language Shortcuts, and a camera Siri mode. It was Tim Cook's final keynote as CEO. Siri AI is delayed in the EU under the DMA and unavailable in China at launch.
Why it matters: Apple broke its two-decade own-the-whole-stack doctrine, outsourcing Siri's brain to a rival's Gemini model on Google Cloud and Nvidia GPUs. That moves the privacy trust boundary to a third party's confidential-compute enclave.
OpenAI confidentially submitted a draft Form S-1 to the SEC on June 8, pegged to the $852B valuation set in its $122B round backed by Amazon, Nvidia, and SoftBank. It's working with Goldman Sachs and Morgan Stanley, with a listing possibly as soon as fall. That makes it the third frontier-AI S-1 in under two weeks, after Anthropic ($965B) and SpaceX (~$1.8T), putting the combined IPO pipeline near $3.6T.
Why it matters: OpenAI is asking public markets to price a company running a roughly -122% operating margin, with the OpenAI Foundation keeping governance and the power to fire the board if a model is dangerous. Bridgewater's Greg Jensen says the ~35x forward-revenue multiple is priced for a monopoly outcome that does not yet exist.
OpenAI has confidentially filed for an initial public offering, setting it up for what may be the most highly anticipated market debut in recent history and a massive payday for early investors.
BREAKING: The odds of OpenAI's IPO closing above $1.5 trillion in market cap on day one of trading surge to 48% after the company files for an IPO.
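The "near $3.6T" pipeline figure is just the sum of the three valuations quoted above. A quick check, treating SpaceX's approximate ~$1.8T as exactly $1,800B (an assumption made for the arithmetic):

```python
# Valuations from the story above, in billions of USD.
# SpaceX's figure is approximate (~$1.8T) and is rounded to 1800 here.
valuations_busd = {"OpenAI": 852, "Anthropic": 965, "SpaceX": 1800}

total_busd = sum(valuations_busd.values())
print(f"${total_busd / 1000:.2f}T")  # $3.62T
```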
SpaceX unveiled AI1, its first-gen orbital AI data center satellite, on June 8, days ahead of its IPO. The reference design packs a 150 kW solar array, a ~110 m² deployable liquid radiator, a ~70 m wingspan, and a 150 kW peak / 120 kW average compute payload running Nvidia GB300 and upcoming Rubin chips. The roadmap scales from 1 GW by end of 2027 toward 100 GW and ultimately a terawatt, with an FCC application for up to a million satellites.
Why it matters: It reframes the AI-compute race as a physics problem in orbit, where heat - not power - is the binding constraint. One researcher notes an orbital data center could still produce an order of magnitude more emissions than one on Earth once launch and reentry are counted.
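To see why heat is the binding constraint, a back-of-envelope Stefan-Boltzmann estimate helps. The sketch below rests on loud assumptions that are not in SpaceX's materials: the radiator emits from both faces, has an emissivity of 0.9, absorbs no sunlight or Earth infrared, and must reject the full 150 kW peak load. It also assumes the roadmap's "1 GW" refers to average compute power.

```python
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiator_temp_k(heat_w: float, area_m2: float,
                    emissivity: float = 0.9, sides: int = 2) -> float:
    """Equilibrium temperature (K) at which a flat radiator sheds heat_w.

    Solves P = emissivity * SIGMA * (area * sides) * T^4 for T,
    ignoring any absorbed solar or planetary radiation (assumption).
    """
    return (heat_w / (emissivity * SIGMA * area_m2 * sides)) ** 0.25


# Figures from the story: 150 kW peak payload, ~110 m^2 radiator.
t_two_sided = radiator_temp_k(150_000, 110)
t_one_sided = radiator_temp_k(150_000, 110, sides=1)
print(f"two-sided: {t_two_sided:.0f} K, one-sided: {t_one_sided:.0f} K")

# Satellites needed for 1 GW at 120 kW average compute each.
sats_for_1gw = math.ceil(1e9 / 120e3)
print(sats_for_1gw)  # 8334
```

Under these assumptions the radiator has to run at roughly 340 K (about 67 °C) when both faces emit, and above 400 K if only one does. Because radiated power scales with T^4, lowering the coolant temperature quickly demands much more radiator area.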
China is preparing roughly 2 trillion yuan ($295B) over five years for a nationwide AI data center network, with the NDRC drafting a blueprint to link computing hubs into one interconnected national network by 2028. At least 80% of the technology, including AI chips, must come from domestic suppliers like Huawei, effectively shutting out Nvidia and AMD. State telecoms China Mobile and China Telecom would operate most of the data centers as part of China's 2026 Six Networks program.
Why it matters: The consequential clause is the 80% domestic-content rule, engineered to be unmeetable with imported silicon - a forced de-Americanization of China's compute base, backed by certification of nine domestic AI chips for state procurement.
Slow Drip
Blog reads worth savoring
Traces how DeepSeek v4 inference throughput jumped by up to 100x in 26 days across NVIDIA/AMD/Huawei stacks, with concrete cost-per-token figures and a real TensorRT-LLM hidden-size bug.
Lays out Unitree's scaling math: a price cut from $50K to $27.3K at 67% margin, a BoM as low as $8,976, viable at $30/hr.
Introduces a new benchmark that scores code agents on quality rather than pass-rate.
Walks through an open-source LLM-as-judge harness that runs multi-turn voice conversations automatically and detects audio-vs-text hallucinations.
Details the concrete layered defense Cloudflare runs on itself to contain AI-accelerated exploit chains.
The Grind
Research papers, decoded
Memory Caching lets an RNN periodically checkpoint its hidden state into a growing cache, then aggregate over cached states at output time - effective memory grows with sequence length while cost stays sub-quadratic. On needle-in-a-haystack retrieval, Titans+GRM hit perfect scores at 4K/8K. A drop-in enhancement for any recurrent backbone; start with the gated-residual variant for retrieval-heavy tasks.
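The checkpoint-then-aggregate idea behind Memory Caching can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the paper's method: the plain tanh RNN cell, the `checkpoint_every` parameter, and the softmax-attention read over the cache are all stand-ins chosen for brevity (the paper's gated-residual variant is not reproduced here).

```python
import numpy as np


def run_with_memory_cache(xs, W, U, checkpoint_every=4):
    """Run a toy tanh RNN that checkpoints its hidden state into a cache.

    Every `checkpoint_every` steps the hidden state is appended to a
    growing cache. Each output adds a softmax-weighted read over the
    cached states (a stand-in aggregator) to the current hidden state.
    """
    d = W.shape[0]
    h = np.zeros(d)
    cache, outputs = [], []
    for t, x in enumerate(xs, start=1):
        h = np.tanh(W @ h + U @ x)           # ordinary recurrent update
        if t % checkpoint_every == 0:
            cache.append(h.copy())            # periodic checkpoint
        if cache:
            C = np.stack(cache)               # (n_cached, d)
            scores = C @ h / np.sqrt(d)       # current state acts as the query
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            read = weights @ C                # aggregate over cached states
        else:
            read = np.zeros(d)
        outputs.append(h + read)
    return np.stack(outputs), cache


rng = np.random.default_rng(0)
d, n_in, T = 8, 3, 16
W = rng.normal(scale=0.3, size=(d, d))
U = rng.normal(scale=0.3, size=(d, n_in))
xs = rng.normal(size=(T, n_in))

outputs, cache = run_with_memory_cache(xs, W, U)
print(outputs.shape, len(cache))  # (16, 8) 4
```

The cache holds one entry per `checkpoint_every` steps, so the memory available at output time grows with sequence length while the recurrent state itself stays fixed-size.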
ALE introduces a verifiable benchmark of 1,490 long-horizon tasks built in authentic professional software (SolidWorks, DaVinci Resolve, etc.), spanning 55 subfields across 13 industries, with deterministic grading. The hardest Last-Exam tier averages just a 2.6% pass rate, and the choice of foundation model matters ~3.4x more than the agent harness. Failures cluster in wrong strategy (47%) and missing domain knowledge (31%).
OPRD aligns student and teacher hidden states across selected layers on the same rollouts via a deterministic MSE loss, bypassing the LM head for zero-variance gradients and richer per-layer signal. On AIME 2024/2025 and AIMO it closes the student-teacher gap where output-only baselines plateau, training 1.44x faster with 32-54% less memory.
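The core of the hidden-state alignment described for OPRD is an MSE between student and teacher activations at chosen layers. The sketch below shows only that loss, under assumptions not taken from the paper: both models share the same hidden width (no projection layer), layers are weighted equally, and the function name `hidden_state_mse` is hypothetical.

```python
import numpy as np


def hidden_state_mse(student_states, teacher_states, layers):
    """Mean squared error between hidden states, averaged over `layers`.

    Each entry of student_states / teacher_states is a (seq_len, d) array
    for one layer, computed on the same rollout. No sampling is involved,
    so the loss is a deterministic function of the activations.
    """
    per_layer = [np.mean((student_states[l] - teacher_states[l]) ** 2)
                 for l in layers]
    return float(np.mean(per_layer))


rng = np.random.default_rng(0)
n_layers, seq_len, d = 6, 5, 4
teacher = [rng.normal(size=(seq_len, d)) for _ in range(n_layers)]
# Toy student: every activation is offset from the teacher by 0.1.
student = [t + 0.1 for t in teacher]

loss = hidden_state_mse(student, teacher, layers=[1, 3, 5])
print(round(loss, 6))  # 0.01
```

Because the loss compares activations directly rather than sampled tokens, its gradient with respect to the student's hidden states carries no sampling noise, which matches the "zero-variance" framing in the summary above.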
A 35B-active / 1T-total MoE reasoning model trained exclusively on clean human-generated data - no distillation, no synthetic data. Pre-trained on 30T tokens across 8,192 GB200 GPUs. Reported: 97.0% AIME 2025, 73.5% SWE-bench Verified, 87.7% LiveCodeBench, 49% win rate vs Sonnet 4.6. Evidence that frontier reasoning is reachable without synthetic-data distillation.
The Mill
Builder tools ground for action
An open-source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM.
Generate any application by vibe coding it. DeepSite is a vibe-coding platform designed to make coding smarter and more efficient. Tailored for developers, data scientists, and AI engineers, it integrates generative AI into your coding projects to enhance creativity and productivity. DeepSite v4 is a Docker-based Hugging Face Space with 16,617 likes.
browse.sh - an open catalog of browser automation skills for any website. Find reusable SKILL.md recipes that teach AI agents to complete tasks online, and install them with the browse CLI.
The Counter
Voices from the AI bar today
MLX + Apple silicon run private, distributed agentic AI workflows locally on Mac, with Xcode integration and tool calling.
A full technical walkthrough of orchestrating multi-agent software-engineering systems on Antigravity 2.0.
Anthropic's Claude Code team distills a year of running an agentic coding tool in production.
A builder connected Claude Code to a full Polymarket wallet/trade database over MCP.
A new Apache-2.0 KV-cache quantization that compresses 3-5x while preserving reasoning quality.
Roast Calendar
Your AI week, day by day
Last Sip
Parting thoughts
Three S-1s in two weeks, a model that decides on its own when to make itself dumber, Siri's brain shipped to a competitor's cloud, and a benchmark that says the agents we're pricing at trillions clear 2.6% of real work. The money and the capability charts are drawing very different pictures right now, and it's worth holding both in your head at once. The interesting question isn't whether the buildout is too big - it's which of today's stories looks obvious in hindsight and which looks like the top.