Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
Bold Shots
Today's biggest AI stories, no chaser
Anthropic's Claude Mythos can autonomously discover and exploit zero-day vulnerabilities. It hit a 72% exploit rate on Firefox (compared to 1% for previous models) and found a 27-year-old OpenBSD bug for under $50 in compute. Nicholas Carlini said it 'found more bugs in weeks than my entire career.' Project Glasswing restricts access to 12 partners with $100M in credits. The Fed and Treasury held an emergency meeting with bank CEOs. Cybersecurity stocks dropped 5-11%.
Why it matters: We crossed a threshold. When an AI can find and weaponize vulnerabilities faster and cheaper than any human team, every assumption about digital security needs re-examination.
Anthropic tripled revenue to $30 billion, dropped $5.5M into open-source tooling, and is designing custom chips. But they quietly softened safety pledges and are locked in a standoff with the Pentagon over access. 60 Minutes ran a 913K-view segment.
Why it matters: Anthropic is trying to be both the safety-first lab and the $30B revenue machine. Those two identities are grinding against each other in public now.
Models went from 8.8% to over 50% on PhD-level tasks in a single year. The US-China capability gap shrank to 2.7%. The US ranks 24th in AI adoption. The expert-public divide is 50 points wide. Transparency scores cratered from 58 to 40.
Why it matters: Models are getting smarter while the institutions around them get more opaque. That 50-point expert-public divide is a governance time bomb.
An internal memo from OpenAI CRO Dresser leaked, and the quotes are direct: Microsoft 'limited our ability' to compete. Amazon swooped in with a $50B deal. Dresser also attacked Anthropic's reported revenue as 'overstated' by $8B. Enterprise share: Anthropic 40%, OpenAI 27%.
Why it matters: The enterprise AI race is tighter than the headlines suggest, and OpenAI's own leadership is admitting their biggest partner is also their biggest constraint.
Google released Gemma 4 under Apache 2.0. The 31B model hit #3 on the Arena leaderboard, and math scores jumped from 20.8% to 89.2%. The MoE variant activates only 4B of its 26B parameters per token, light enough to run on a Raspberry Pi.
Why it matters: When a top-3 model runs on a $75 computer under a permissive license, the 'AI requires massive infrastructure' narrative takes a real hit.
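Napkin math on that Raspberry Pi claim. This is a sketch, not a benchmark: the 4-bit quantization and ~17 GB/s memory bandwidth figures below are our assumptions, not Google's numbers.

```python
# Napkin math: why activating 4B of 26B parameters matters on tiny hardware.
# Every number below is an illustrative assumption, not a measured figure.
active_params = 4e9        # parameters activated per token (from the story)
bytes_per_param = 0.5      # assumes 4-bit weight quantization
bandwidth = 17e9           # assumed memory bandwidth, bytes/s (Pi 5 class)

bytes_per_token = active_params * bytes_per_param   # weight bytes streamed per token
tokens_per_sec = bandwidth / bytes_per_token        # bandwidth-bound decode rate
print(f"~{tokens_per_sec:.1f} tokens/sec")          # ~8.5 under these assumptions

# Caveat: all 26B weights still have to sit in memory (~13 GB at 4-bit),
# so "runs on a Raspberry Pi" implies a 16 GB board or aggressive offloading.
```

Single-digit tokens per second, but on a $75 computer. That's the whole point.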
Sam Altman was targeted with a Molotov cocktail and a shooting at his SF home. The suspect carried a 23-page anti-AI manifesto listing AI CEOs and was active on Pause AI's Discord. Authorities are evaluating domestic terrorism charges.
Why it matters: The backlash against AI is no longer just Twitter arguments. When ideology turns to violence, the entire industry faces a different kind of risk.
Microsoft assembled an 'Ocean's 11' team to bring OpenClaw-style agents to Copilot. OpenClaw has 354K GitHub stars, and a Build 2026 preview is expected June 2. Microsoft's own security team warned against self-hosted OpenClaw — then the product team started building it anyway.
Why it matters: Microsoft is betting that the agent runtime layer — not the model layer — is where the real platform lock-in happens.
Zuckerberg built a photorealistic AI avatar of himself trained on his voice and mannerisms. Meta claims its AI tools boost engineer output by 30% and is committing $115-135B in capex for 2026.
Why it matters: $135B in capex means Meta believes AI infrastructure is an existential investment, not a line item.
Intel surged 56% over nine trading days on a $25B Terafab deal with Elon Musk and a Google Xeon 6 AI partnership — the best stock streak since its 1971 IPO.
Why it matters: Intel's fab capacity could break TSMC's near-monopoly on cutting-edge manufacturing if Terafab delivers.
The Blend
Connecting the dots across sources
Capability is accelerating while trust infrastructure crumbles
- Mythos 72% exploit rate vs 1% prior, triggering Fed/Treasury emergency meeting with bank CEOs (clusters, YouTube: Fireship 906K views)
- Stanford AI Index transparency score cratered from 58 to 40 while models hit PhD-level (clusters, X: Horvitz 320K views)
- Anthropic softened safety pledges while tripling revenue to $30B (clusters, 60 Minutes 913K views)
- Expert-public opinion divide hit 50 points on AI job impact (Stanford AI Index)
The agent runtime layer is the new battleground
- hermes-agent pulling 76K stars and 11K/day growth on GitHub
- Anthropic open-sourced its agent framework and invested $5.5M in OSS tooling (news clusters)
- Cloudflare Sandboxes GA gives agents persistent shell and filesystem (blog)
- OpenAI $50B Amazon deal for enterprise distribution lock-in (leaked CRO memo)
Massive capital concentration meets genuine open-source democratization
- Intel Terafab $25B + Musk backing, Meta $115-135B capex commitment (financial news)
- NVIDIA $20B Groq acquisition reshaping inference economics (Towards AI blog)
- Counter-signal: Gemma 4 Apache 2.0 runs on Raspberry Pi, hits #3 Arena (Google, 800K+ YouTube views)
Slow Drip
Blog reads worth savoring
The chip war just got redrawn: NVIDIA bought its most interesting rival instead of crushing it.
85% accuracy per step sounds great until you realize it compounds to roughly 20% success across 10 steps.
A rare look at what it actually takes to ship LLM-powered ranking to over a billion people.
Cloudflare just gave AI agents a persistent shell, filesystem, and background processes.
400 human-agent web sessions studied to answer the question no one else is asking.
No GPU, no excuses: a real test of whether Google's latest open model holds its own on commodity hardware.
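That compounding claim a few blurbs up is worth checking yourself. One caveat: treating the 10 steps as independent is our simplifying assumption, not the post's.

```python
# Compounding error: 85% per-step accuracy over 10 steps, assumed independent.
per_step = 0.85
steps = 10
success = per_step ** steps     # probability that all 10 steps succeed
print(f"{success:.1%}")         # 19.7% -> "roughly 20%"
```

Five more steps and you're under 9%. This is why agent reliability is a per-step game.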
The Grind
Research papers, decoded
Models adapt on the fly by using existing MLP weights as fast memory. 2-3% gains on 256K context windows. Seems incremental until every model ships it next year.
What if the AI model IS the computer? 98.7% cursor accuracy for GUI control. One step closer to agents that can actually use your desktop.
Multi-agent framework that writes research papers. 14-38% quality improvement, 45-48 citations vs 9-14 from baselines.
First study of LLMs serving multiple users at once. Models leak information between users and flip priorities under conflict. We found out before this shipped everywhere — barely.
LLM supply chain attacks measured and documented. If you're building agents, read this yesterday.
On Tap
What's trending in the builder community
Open-source agent framework pulling 11K stars per day. 76K total and climbing fast.
Claude Code plugin for capturing and compressing session context. 52K stars.
CLAUDE.md file to improve Claude Code behavior. 23K stars, 5.8K/day.
Python tool for converting files to Markdown. 106K stars.
An AI Hedge Fund Team. 52K stars.
Gemini transforms complex topics into custom 3D visualizations you can manipulate.
Your personal CFO in the terminal. Open source, runs locally.
36-minute deep technical breakdown of why Anthropic jailed its most capable model.
Discover and install skills from the open agent skills ecosystem. 1M installs.
Captures learnings and corrections for continuous agent improvement. 383K downloads on ClawHub.
Roast Calendar
Upcoming events & gatherings
Last Sip
Parting thoughts & a teaser for tomorrow
Here's the thing that keeps rattling around my head today: a model found a 27-year-old bug in OpenBSD for less than fifty bucks. Twenty-seven years. Thousands of security researchers looked at that code. And an AI found it for the cost of a nice lunch.
We're not in the "AI is coming" era anymore. We're in the "AI is here and the institutions haven't caught up" era. The Stanford numbers prove it — transparency is dropping while capability is spiking. That gap is where all the interesting (and terrifying) things happen next.
Tomorrow: we're tracking the fallout from the Fed's emergency meeting, early benchmarks from Gemma 4 in production, and whether those prediction markets on Anthropic's June model are moving. Stay caffeinated.