Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
Bold Shots
Today's biggest AI stories, no chaser
Anthropic unveiled Claude Mythos Preview, a general-purpose LLM whose emergent cyber chops turned up thousands of zero-days across every major OS and browser, with 99%+ still unpatched. Instead of a public release, Anthropic launched Project Glasswing: restricted access for ~40 partners (Apple, Microsoft, CrowdStrike, Palo Alto, etc.) plus $100M in credits. Treasury Secretary Bessent and Fed Chair Powell promptly briefed Wall Street, and the UK AISI measured the model at 73% on expert-level CTFs and recorded the first full 32-step corporate attack simulation completed by an AI.
Why it matters: This is the first time a frontier lab said 'nah, we're not shipping this' on safety grounds — and the government immediately treated the model as a national-security asset. The 'AI finds zero-days faster than humans can patch' problem just became policy, not theory.
OpenAI shipped a major Agents SDK update yesterday: native sandbox execution across Blaxel, Cloudflare, Daytona, Docker, E2B, Modal, Runloop, Unix-local, and Vercel, plus a model-native harness built for agents that run for hours, days, or weeks with snapshotting and session rehydration. The architecture cleanly splits harness (control) from compute (execution). The update is Python-only at launch, but the SDK already sits at ~19K GitHub stars and 10.3M monthly downloads.
Why it matters: This isn't another framework — it's OpenAI staking out a Kubernetes-style control plane for agents. If you're building, you basically get to pick your sandbox vendor without rewriting your orchestration. If you're a sandbox vendor, you just got platformed.
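For builders, the harness/compute split can be pictured with a minimal Python sketch. This is not the Agents SDK's API: `Sandbox`, `EchoSandbox`, and `Harness` are hypothetical names invented for illustration, and the "sandbox" here only formats a string instead of running anything. The point is just that orchestration code depends on a narrow interface, so the execution backend can be swapped and a session can be resumed from a snapshot.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Sandbox(Protocol):
    """Hypothetical compute interface: anything that can run a command."""

    def run(self, command: str) -> str: ...


class EchoSandbox:
    """Stand-in backend that pretends to execute; a real one would call a vendor."""

    def __init__(self, label: str) -> None:
        self.label = label

    def run(self, command: str) -> str:
        return f"[{self.label}] ran: {command}"


@dataclass
class Harness:
    """Hypothetical control layer: owns the plan and history, not the compute."""

    sandbox: Sandbox
    history: list[str] = field(default_factory=list)

    def execute_plan(self, steps: list[str]) -> list[str]:
        # Each step is delegated to whichever sandbox was injected.
        for step in steps:
            self.history.append(self.sandbox.run(step))
        return self.history

    def snapshot(self) -> list[str]:
        # Copy the state so a later session can be rehydrated from it.
        return list(self.history)
```

A session started on one backend can be snapshotted and resumed on another by constructing `Harness(EchoSandbox("vendor-b"), history=saved_snapshot)`; the orchestration code itself does not change.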
Yes, really. The wool-sneaker company sold its footwear/brand assets to American Exchange Group for $39M, secured a $50M convertible facility, and rebranded as NewBird AI — a GPU-as-a-Service outfit. Shares spiked 700%+ from under $3 to over $17. For context: its post-IPO market cap went from $4B+ in 2021 to ~$21M before this pivot.
Why it matters: The real asset being sold here is the Nasdaq listing itself. Sheel Mohnot already called it: 'Long Island Iced Tea company.' When commentators are openly calling something 'the dumbest AI investment mistake you could possibly make' and the stock still 7x's, you're looking at the purest market signal that the AI narrative has fully decoupled from capability.
Google shipped a 100% native Swift Gemini app for macOS — its first real desktop AI assistant. System-wide Option+Space hotkey, a 'Desktop Intelligence' mode that reads your on-screen docs/code/data with opt-in per-app permissions, plus Nano Banana image gen and Veo video gen baked in. Free for users 13+ on macOS 15+. The team (internally called 'Antigravity') shipped 100+ features in under 100 days.
Why it matters: All three frontier assistants — ChatGPT, Claude, Gemini — now run natively on your Mac. Combined with Apple confirming its next-gen Foundation Models will be Gemini-based, the desktop is officially the new battleground. The browser tab era of AI is ending.
Stanford's 2026 AI Index dropped: $581.7B in global corporate AI investment (up 130% YoY), with the US outspending China 23-to-1. The IMF projects that AI adds, in the best case, 0.8 percentage points to annual GDP growth. PwC says 75% of measurable gains flow to the top 20% of companies. Meanwhile, the US is shedding ~16,000 net jobs/month to AI displacement, and young developer employment is down ~20% since 2024.
Why it matters: The gap between what's being spent and what's measurable is now the central macro debate. When a Reddit thread titled 'AI Added Basically Zero to US Economic Growth Last Year' gets 19,692 upvotes on the same day Anthropic's valuation rockets past $800B, something's going to give.
The Blend
Connecting the dots across sources
The agent execution layer just became a platform war — on a single day
- OpenAI Agents SDK v2 adds 9 sandbox providers (Blaxel, Cloudflare, Daytona, Docker, E2B, Modal, Runloop, Unix-local, Vercel) plus long-running harness; SDK has 10.3M monthly downloads.
- Cloudflare shipped Project Think + Browser Run on the exact same day (April 15).
- Product Hunt's top launches are uniformly agent-execution plays — Figma for Agents (518), CatDoes v4 (397), Ovren (323); Clawhub's top skill is self-improving-agent (6,176 installs).
- alphaXiv 'Externalization in LLM Agents: A Unified Review' provides the academic formalization of the exact pattern all five vendors shipped.
Mythos made cybersecurity-AI a mainstream policy problem in under a week
- Anthropic's red-team post (thousands of zero-days, 99%+ unpatched) + Project Glasswing ($100M credits, ~40 partners).
- Fireship YouTube 'Claude Mythos is too dangerous for public consumption' hit 1M+ views; Reddit r/ClaudeAI became saturated.
- Simon Willison published 'Cybersecurity Looks Like Proof of Work Now' framing the dynamic for developers.
- UK AISI formal evaluation (73% CTF, first 32-step attack sim) + Gambit Security offense-side report (2,518 X votes) provide the institutional/research counterparts.
The skeptic track is getting quantitative — and it rhymes
- 'The AI Layoff Trap' paper (Falk/Tsoukalas, 14,624 X votes): competitive over-automation Nash equilibrium — firms mathematically overshoot socially optimal layoffs.
- Stanford AI Index: $581.7B corporate AI spend vs. the IMF's best-case 0.8 percentage-point boost to annual GDP growth; PwC: 75% of measurable gains flow to the top 20% of companies.
- Luma event literally titled 'Revenue Over Hype' packing out in SF the same week.
- Reddit r/Futurology 'AI Added Basically Zero to US Economic Growth Last Year' at 19,692 upvotes.
Slow Drip
Blog reads worth savoring
a16z's strategic map of physical AI — robot learning, autonomous science, and new interfaces. The day's most-resonant strategic read (161 engagements) for a reason.
Notion's cofounder and head of AI walk through what it actually took to ship knowledge-work agents after five full rebuilds. The maturity retrospective the field needed.
A painfully relatable tour of how naive JSON parsing collapses once LLM output hits real production schemas. If you've ever cursed at a malformed tool call, this is for you.
Hands-on walkthrough combining agentic orchestration with hybrid search — a combo quickly becoming the production RAG default.
Cloudflare unveils its 'batteries-included' Agents SDK. Read it next to OpenAI's announcement and the competitive picture snaps into focus.
The definitive round-up of silicon advances that will underpin next-gen AI systems — co-packaged optics, HBM4, and more.
IBM dissects how real agents actually fail at reasoning and tool use. Required reading if you're debugging pipelines, not hyping them.
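For the JSON-parsing item above, here is a rough Python sketch of the kind of defensive parsing that tends to replace a bare `json.loads` call. It is not taken from the linked post: `parse_tool_call` and the `required` schema format are hypothetical, and the two failure modes handled (prose or code fences wrapped around the JSON, and missing or mistyped fields) are assumed examples, not an exhaustive list.

```python
import json
import re

# Build the fence marker programmatically so this snippet can sit inside a fenced block.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_tool_call(raw: str, required: dict) -> dict:
    """Extract one JSON object from model output and check required fields.

    `required` maps field names to expected Python types (hypothetical schema format).
    Raises ValueError on any problem (json.JSONDecodeError is a ValueError subclass).
    """
    # Prefer the contents of a fenced block if the model wrapped its answer in one.
    match = FENCED_BLOCK.search(raw)
    candidate = match.group(1) if match else raw

    # Trim any chatter before the first "{" and after the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(candidate[start : end + 1])

    # Minimal schema check: presence and type of each required field.
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"missing required field: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"field {key!r} should be {expected_type.__name__}")
    return data
```

In production a schema library would usually take over the validation step; the sketch only shows why "parse the whole string as JSON" stops being enough.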
The Grind
Research papers, decoded
A Penn/BU paper showing that when firms compete, each one captures the full cost savings from automating jobs but only bears 1/N of the resulting demand destruction — producing a Nash equilibrium where firms collectively over-automate even when it hurts their own profits. Only a Pigouvian automation tax fully corrects the externality; UBI and capital taxes don't. Reframes AI displacement from 'retrain workers' to 'prevent competitive overshoot.'
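The 1/N logic can be illustrated with a toy calculation in Python. This is a deliberately simplified linear model, not the paper's: it assumes each automated job saves the automating firm a fixed amount (`savings`) and destroys a fixed amount of demand (`demand_loss`) that is split evenly across `n_firms` identical firms. All names and numbers are illustrative assumptions.

```python
def private_gain(savings: float, demand_loss: float, n_firms: int) -> float:
    """Gain to one firm from automating a job, holding rivals fixed.

    The firm keeps all of the savings but bears only its 1/N share of the lost demand.
    """
    return savings - demand_loss / n_firms


def gain_if_everyone_automates(savings: float, demand_loss: float) -> float:
    """Per-firm gain when all N firms automate one job each.

    Each firm absorbs a 1/N share of N firms' demand loss, i.e. the full amount.
    """
    return savings - demand_loss


def over_automates(savings: float, demand_loss: float, n_firms: int) -> bool:
    """True when automating is individually rational but collectively harmful."""
    return (
        private_gain(savings, demand_loss, n_firms) > 0
        and gain_if_everyone_automates(savings, demand_loss) < 0
    )
```

With `savings=2.0`, `demand_loss=4.0`, and `n_firms=4`, each firm sees a private gain of 1.0 from automating, yet every firm ends up 2.0 worse off once all of them do it. With a single firm (`n_firms=1`) the externality disappears and the firm does not automate. In this toy setup, a per-job charge equal to the demand loss a firm pushes onto its rivals plays the role of the corrective tax.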
A drop-in technique that lets an already-trained LLM keep updating a slice of its own weights during inference, so the model effectively 'learns' from the current context without any architecture changes or retraining. Boosts Qwen3-4B from 74.8% to 77.0% on RULER at 128k context with negligible throughput overhead — and the gap widens at 256k. A practical path to longer effective context on existing billion-parameter models.
An integrated humanoid-robot stack combining an RL-trained lower-body controller, VR teleoperation, and a 'Humanoid Transformer with Touch Dreaming' (HTD) that learns to predict future tactile latents — treating touch as something to anticipate, not just observe. Across five real-world contact-rich tasks, HTD delivered a 30 percentage-point absolute gain in success rate (~91% relative improvement). Argues self-supervised tactile prediction is the lever for dexterous contact-rich skills.
On Tap
What's trending in the builder community
Figma's new use_figma MCP tool lets AI agents consume your design system so agent-generated UIs stay on-brand.
A no-code builder whose agent 'Compose' runs in a cloud VM so you can close the tab and it keeps coding.
Generates real internal tools (with DB + business logic) from plain English — built to actually work in production.
Frontend and backend AI engineers that close scoped backlog tickets inside your real codebase.
Plugin for Claude Code/Cursor/Windsurf/Copilot that claims to cut ~75% of Claude's output tokens. The name alone earns the click.
Latent Space deep-dive into Notion's Custom Agents covering eval design, agent composition, pricing, and software engineering in an AI-first world.
Lume's investigative report based on leaked police files showing how U.S. VC and tech (IJOP algorithm) enabled AI-powered surveillance.
Dwarkesh Patel interviews Jensen Huang on supply-chain bottlenecks, TPU competition, chip architecture, and geopolitics.
Nate B Jones argues the real bottleneck for agents isn't tooling — it's writing high-quality specs and compressing tacit knowledge.
Barely Human Labs presents evidence that AI-generated code is actually raising review burden and cognitive load on dev teams.
Elon Musk announces Tesla's AI5 tape-out — 5x the useful compute of a dual-SoC AI4. 90K likes, 13M views.
Josh Woodward launches the native Gemini macOS app — the team internally called 'Antigravity.'
Dwarkesh Patel teases his Jensen Huang interview covering TPU competition, supply-chain bottlenecks, and the Nvidia moat.
Vercel Labs meta-skill for discovering and installing skills from the open agent-skills ecosystem — the gateway skill.
Clawhub skill that captures learnings, errors, and corrections so agents compound capability across runs.
Roast Calendar
Upcoming events & gatherings
Last Sip
Parting thoughts & a teaser for tomorrow
Here's what I keep coming back to: Mythos is too powerful to ship publicly, the agent execution layer just commoditized across five vendors in a day, and a sneaker company is now a GPU reseller. Meanwhile academics are quietly proving AI layoffs will overshoot what's even good for the layoff-ers. The contradictions aren't bugs — they are the story right now.
Tomorrow we're watching whether Glasswing partners start leaking details about what Mythos actually found, whether Cloudflare's Project Think gets real traction against OpenAI's sandbox lock-in, and whether NewBird AI is still a stock by end of week. Catch you then.