Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
Bold Shots
Today's biggest AI stories, no chaser
Amazon is dropping another $25B into Anthropic ($5B immediate, up to $20B milestone-linked), stacking on the earlier $8B for a potential $33B cumulative bet at a $380B valuation. In return, Anthropic pledged more than $100B over the next decade to AWS and locked in up to 5 gigawatts of Trainium compute — roughly five large nuclear plants' worth of capacity. They already run 1M+ Trainium2 chips, with deployment spanning Trainium2 through Trainium4 and options on future generations.
Why it matters: In about eight weeks, AWS has positioned itself as the primary compute home for both OpenAI and Anthropic — the first real break in the Microsoft-anchor-tenant playbook. And the 5 GW commitment means grid capacity and permitting are now on the critical path of AI progress, not just chips.
Google unveiled Ironwood at Cloud Next — the first TPU purpose-built for inference, scaling to 9,216 liquid-cooled chips per pod, 42.5 exaflops, and 192 GB of HBM per chip, with 2x perf/watt over Trillium. Then: Google is in talks with Marvell to co-design a new inference-specific TPU and a memory processing unit. Marvell popped ~6% to record highs; Barclays raised its price target from $105 to $150. Separately, Anthropic is planning a 1-million-TPU deployment.
Why it matters: This isn't a Broadcom divorce — it's Google moving to an automotive-style tiered supplier model. The structural risk to Nvidia is that inference — the fastest-growing part of the compute pie — is migrating to hyperscaler-ASIC co-design. Custom ASIC is projected to grow 45% YoY vs. 16% for GPU shipments in 2026.
NSA is using Claude Mythos Preview to scan for exploitable vulnerabilities. Mythos Preview is real: Anthropic reports working exploits on the first attempt in >83% of tested cases, and it's the first model to solve 'The Last Ones,' a 32-step capture-the-flag challenge that takes human teams roughly 20 hours. Meanwhile, DoD formally labeled Anthropic a supply-chain risk on February 28, and Anthropic is suing the administration. Access is limited to ~40 orgs via Project Glasswing ($100M in credits), including AWS, Apple, Google, Microsoft, Nvidia, JPMorgan, and the Linux Foundation.
Why it matters: One arm of DoD bans contractors from buying Anthropic while another is operationally deploying its most offense-capable model. The Bank of England is privately briefing UK banks. Goldman, Citi, BofA, Morgan Stanley, and JPMorgan are running internal Mythos trials. This is the messy geopolitical reality of a real capability delta — not a vibes-based panic.
On April 17, Anthropic launched Claude Design under Anthropic Labs. Prompts, screenshots, and codebases go in; prototypes, slide decks, and marketing one-pagers come out — powered by Claude Opus 4.7 at 2576px resolution (~3x prior). It exports to Canva, PPTX, PDF, HTML, and hands designs to Claude Code as a single-instruction bundle. Market reaction was brutal: Figma -7.28%, Adobe -2.7%, Wix -4.7%, GoDaddy -3%. Anthropic's CPO Mike Krieger resigned from Figma's board three days before launch.
Why it matters: The killer feature isn't Figma feature parity — it's the closed loop into Claude Code that collapses the designer-to-engineer handoff. Anthropic runs its own inference (marginal cost ~ electricity). Figma Make has to pay retail API prices to its own competitor. That's a tough asymmetry.
On April 19, Honor's 'Lightning' humanoid won the Beijing E-Town half-marathon in 50:26 over 21 km — beating Jacob Kiplimo's 57:20 human world record by nearly seven minutes. A remote-controlled Honor robot actually crossed in 48:19, but a 1.2x time-penalty coefficient handed the win to autonomous Lightning. Last year's winning time was 2:40:42 — a ~3.2x YoY compression. Honor, a smartphone maker 12 months into robotics, transferred its phone liquid-cooling IP into the robot's thermal system.
Why it matters: The penalty coefficient is industrial policy — the Chinese Institute of Electronics is explicitly nudging the industry toward autonomous navigation, because that's what matters for industrial deployment. The global humanoid market goes from $2.92B in 2025 to a projected $15.26B in 2030, and the supply chain building Lightning is the same one that makes your next phone.
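The penalty arithmetic is worth spelling out. Using the times reported above, the remote-controlled robot's raw lead evaporates once the 1.2x coefficient is applied:

```python
# Time-penalty check for the E-Town half-marathon result.
# The 1.2x coefficient applies to remote-controlled entrants,
# so the faster raw time still loses to the autonomous robot.

def to_seconds(mmss: str) -> int:
    """Convert 'MM:SS' to total seconds."""
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

autonomous = to_seconds("50:26")   # Lightning, no penalty
remote_raw = to_seconds("48:19")   # remote-controlled entrant
remote_adj = remote_raw * 1.2      # penalty coefficient applied

# Adjusted remote time is ~3478.8 s (~57:59) vs 3026 s autonomous.
assert remote_adj > autonomous     # autonomous Lightning wins
```

A 1.2x multiplier on a sub-50-minute raw time adds almost ten minutes, so any remote entrant would need roughly a 42-minute run to win — which is the point of the rule.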
The Blend
Connecting the dots across sources
Mythos raises the ceiling of AI offense — and the Vercel breach shows us the floor
- Anthropic reports Mythos Preview hits >83% first-attempt exploit success on tested CVEs, and Mythos was the first model to solve 'The Last Ones,' a 32-step, ~20-hour human CTF.
- The April 19 Vercel breach was not a Vercel platform flaw; attackers came in through Context.ai, a third-party AI tool with Google Workspace OAuth access.
- ByteByteGo's 'Security Architecture of GitHub Agentic Workflow' explicitly recommends designing as if the agent is already compromised — the same threat model the breach exposed.
- Bloomberg's top trending tweet has Singapore's regulator urging banks to patch security gaps specifically because of Mythos; WIRED framed it as a 'cybersecurity reckoning.'
- The research side reinforces the picture: 'Externalization in LLM Agents' on alphaxiv and DeepMind's 'AI Agent Traps' are both directly about the attack surface the Vercel breach exploited.
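The "design as if the agent is already compromised" posture above reduces, at minimum, to default-deny tool gating. A minimal sketch, with hypothetical tool names not taken from any of these sources:

```python
# Sketch: treat every agent tool call as untrusted input.
# An allowlist covers read-only tools; anything with side effects
# needs explicit human approval; unknown tools are denied outright.
# All names here are illustrative assumptions.

READ_ONLY = {"search_docs", "read_file"}
NEEDS_APPROVAL = {"send_email", "deploy", "delete_file"}

def gate(tool: str, approved: bool = False) -> bool:
    """Return True if the call may proceed."""
    if tool in READ_ONLY:
        return True
    if tool in NEEDS_APPROVAL:
        return approved              # human sign-off required
    return False                     # default-deny unknown tools

assert gate("read_file")
assert not gate("send_email")            # blocked without approval
assert gate("send_email", approved=True)
assert not gate("exfiltrate_tokens")     # unknown tool, denied
```

Under this model, an OAuth'd third-party tool like the one in the Vercel breach can be hijacked and still only read what its allowlist permits.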
Inference is the new chip war, and every surface agrees on the numbers
- Clusters report Anthropic planning 1M TPUs and ~3.5 GW via Broadcom starting 2027; SemiAnalysis's blog independently cites the same numbers and says Ironwood 'nearly completely closes the gap' to Nvidia's flagship.
- X trending 'Google Challenges Nvidia in AI Chip Race' (Bloomberg, 18K engagement) and Reddit WSB Marvell post (1,011 upvotes) both match the $105 → $150 Barclays price target.
- alphaxiv's 'Neural Computers' (187 votes — #1 research item of the day) speaks directly to the learned-runtime paradigm these chips are optimized for.
2026 is the year the agent becomes the unit of software
- Product Hunt's top 5 is essentially all agents: Gemini app for Mac (297), Vantage (280), Verdent 2.0 (235), Perplexity Personal Computer (210), Avina (202).
- Skills.sh's top install is Vercel's `find-skills` at 1.1M installs; Clawhub's #1 is `self-improving-agent` at 401K downloads.
- Research is in lockstep: 'PaperOrchestra' (128 votes), 'In-Place Test-Time Training' (178 votes), 'Neural Computers' (187 votes) — all about agents, persistence, or adaptive runtimes.
- Towards AI's picks this week are 'Human-in-the-Loop for AI Agents' and 'Tool-Augmented RAG Agent with Session Memory' — the exact same narrative from the tutorial side.
Slow Drip
Blog reads worth savoring
A rare deep-dive into designing agent security assuming the agent is already compromised — the exact mental model every team shipping agentic workflows needs right now, especially post-Vercel.
The definitive numbers behind every AI infra budget conversation — cluster TCO, downtime economics, and goodput theory with hard data instead of vibes.
A practical blueprint for approval packets and guardrails so your campaign bot doesn't accidentally email everyone on Earth.
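The approval-packet pattern can be sketched in a few lines. Field names and the recipient cap here are illustrative assumptions, not from the post:

```python
# Sketch of an "approval packet": before a campaign bot sends
# anything, it emits a reviewable summary and enforces a hard
# recipient cap. Hypothetical structure, for illustration only.

MAX_RECIPIENTS = 500   # assumed hard cap; tune per campaign

def build_packet(campaign: str, recipients: list[str]) -> dict:
    """Bundle what a human reviewer needs to approve one send."""
    return {
        "campaign": campaign,
        "recipient_count": len(recipients),
        "sample": recipients[:5],      # reviewer sees a sample
        "within_cap": len(recipients) <= MAX_RECIPIENTS,
    }

packet = build_packet(
    "spring-launch",
    [f"user{i}@example.com" for i in range(3)],
)
assert packet["within_cap"] and packet["recipient_count"] == 3
```

The design choice is that the bot never gets a send primitive at all; it gets a build-packet primitive, and the send lives behind the human approval.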
The capstone of a 5-part production RAG series that turns a static pipeline into a stateful multi-turn agent with Llama 3.2 and Ollama.
A crisp post-mortem of the ShinyHunters OAuth attack that sidestepped Vercel entirely via a third-party AI tool. If you OAuth anything into anything, read this today.
Real-time interactive world models are moving from research demo to shipping product faster than most people noticed.
Noetik just won a $50M GSK licensing deal — a rare biotech-as-software win that reframes trial failures as a matching problem for autoregressive transformers.
194 CMU papers at ICLR in one curated tour — the single fastest way to survey the research frontier this month.
The Grind
Research papers, decoded
Instead of separating computation, memory, and I/O like a von Neumann machine, a single neural net learns a unified runtime state end-to-end. Two prototypes: NC_CLIGen renders terminals at 40.77 dB PSNR; NC_GUIWorld hits 98.7% cursor accuracy for GUI control. Banger finding: 110 hours of goal-directed data beats 1,400 hours of random exploration. For practitioners, this is the theoretical backbone of where computer-use agents are headed.
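For context on the 40.77 dB figure, here is the standard PSNR definition and the pixel error it implies (my own back-of-envelope, not the paper's code):

```python
import math

def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 20 * math.log10(max_val) - 10 * math.log10(mse)

# Invert the formula: 40.77 dB on 8-bit pixels corresponds to
# an MSE of ~5.4, i.e. the learned terminal renderer is off by
# roughly 2.3 gray levels RMS per pixel.
mse = 255.0 ** 2 / 10 ** (40.77 / 10)
```

That is sharp enough that rendered terminal text stays legible, which is why the number is worth quoting at all.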
Repurposes existing MLP projection matrices as 'fast weights' that update during inference, giving any Llama/Qwen model long-context adaptation with negligible overhead. +2.7% on RULER at 64k context; lower perplexity from 2k to 32k tokens when trained from scratch. Drop-in long-context upgrade without a from-scratch rewrite.
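A toy illustration of the fast-weights idea, assuming nothing from the paper beyond "a projection matrix updated in place during inference":

```python
# Toy "fast weights": a projection matrix W is nudged by a
# Hebbian-style rank-1 update from the activations it just
# processed, so the layer adapts to the current context while
# decoding. Illustrative sketch only, not the paper's method.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fast_update(W, x, y, lr=0.01):
    """In-place update: W += lr * outer(y, x)."""
    for i, yi in enumerate(y):
        for j, xj in enumerate(x):
            W[i][j] += lr * yi * xj

W = [[1.0, 0.0], [0.0, 1.0]]             # slow (pretrained) weights
for token in ([1.0, 0.0], [1.0, 0.0]):   # repeated context feature
    y = matvec(W, token)
    fast_update(W, token, y)             # adapt during inference

# W has drifted toward the context's statistics:
assert W[0][0] > 1.0 and W[1][1] == 1.0
```

The appeal in the paper's setting is exactly what the sketch shows: no new parameters, just a different use of matrices the model already has.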
Multi-agent system that turns raw pre-writing material into submission-ready AI papers, decoupled from the experimental loop. 45-48 citations per paper (vs. 9-14 for baselines), autonomous diagram generation via a 'PaperBanana' module, simulated acceptance rates of 84% at CVPR and 81% at ICLR on a new 200-paper benchmark. The autonomous-research wave gets its most concrete benchmark yet.
Across 13 Olmo 3 checkpoints: (1) data composition, not algorithm, drives collapse — narrow distillation loses 62% semantic diversity at SFT; (2) chain-of-thought is NOT the culprit — suppressing it cuts accuracy by up to 48%; (3) RL-Zero preserves 94% of base-model diversity. If you rely on self-consistency, pass@k, or test-time compute scaling, diversity collapse directly caps your gains.
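The link between diversity and test-time compute is concrete: under the standard unbiased pass@k estimator (Chen et al., 2021), fewer distinct correct samples flattens the curve no matter how many draws you pay for:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k draws
    from n generated samples (of which c are correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Same 100-sample budget, same k=10 selection; the only change is
# how many of the samples came out correct (a proxy for diversity
# collapse in this illustration):
collapsed = pass_at_k(100, 2, 10)    # ~0.19
diverse = pass_at_k(100, 10, 10)     # ~0.67
assert diverse > collapsed
```

This is the mechanism behind the paper's warning: once narrow distillation shrinks the pool of distinct correct generations, extra samples buy correlated repeats, not coverage.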
On Tap
What's trending in the builder community
Native macOS Gemini app with a global shortcut, active-window context sharing, and local file analysis — basically Cmd+Space for Gemini.
Google Research experiment using AI avatars to simulate real team collaboration and produce a personalized Skill Map.
'Your AI Technical Cofounder.' End-to-end agent that plans, codes, and drives product progress with project memory — works even while you're offline.
Turns your machine into an AI orchestrator across local files, native apps, connectors, and the web.
GTM agents that find, enrich, score, and auto-run personalized email/ABM campaigns against your ICP.
Zhang Xiaojun Podcast deep-dive with Axiom founder (fresh $200M Series A) on AI-for-math, Lean formalization, and mathematical intuition.
IndyDevDan benchmarks M5 Max running local LLMs via MLX vs. GGUF — 118 vs. 60 tok/s — and argues local Apple Silicon now undercuts cloud APIs for many agentic workloads.
Nate B Jones dissects three 'world model' architectures for replacing middle management and why they silently fail without a human interpretive layer.
Evolving AI breaks down Q.ANT's photonic NPU at the Leibniz Supercomputing Centre — light-based matrix math, big energy wins, and the OE-conversion hurdles still ahead.
@business — Mythos worry hits Asian regulators within days of the Pentagon/NSA news in the US.
@dair_ai's weekly roundup — automated research is now the headline theme.
@business on the Google-Marvell inference TPU narrative hitting mainstream finance coverage.
@CoinMarketCap — literal persona agents in production at a public crypto exchange.
Vercel's meta-skill for discovering and installing skills from the open agent ecosystem — the fact that this is #1 at 1.1M installs is the whole story.
Performance optimization skill with 70 rules across 8 categories for automated React/Next.js refactoring.
Anthropic's answer to generic AI aesthetics — production-grade frontend interfaces that look like design, not slop.
Clawhub's #1 skill — captures learnings, errors, and corrections across runs for continuous improvement.
Roast Calendar
Upcoming events & gatherings
Last Sip
Parting thoughts & a teaser for tomorrow
If there's one thought to take with you into the rest of your week, it's this: the dominant story today isn't Mythos, or Ironwood, or Claude Design — it's that they all shipped in the same 72 hours, and they're all facets of the same larger shift. The model is no longer the product. The agent is. The chip is optimized for the agent's workload. The design tool is a wrapper that hands a blueprint to a coding agent. The cybersecurity threat is an OAuth'd agent. The half-marathon winner is an autonomous agent.
Tomorrow we'll be watching the Mythos fallout — specifically whether any more financial regulators break cover, and what the next Glasswing waitlist tells us about who's really cleared for frontier capability. Also keeping an eye on Figma's response; silence is a strategy, but not a long one.
Drink water. Pet something. See you tomorrow.