Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
Bold Shots
Today's biggest AI stories, no chaser
Sam Altman testified for roughly four hours yesterday in U.S. District Court in Oakland, and the courtroom turned into a Silicon Valley origin-story exhibit. Under oath, Altman said Musk demanded 90% equity in any for-profit OpenAI, 'always' insisted on a majority stake, and at one point floated handing control to his children if he died. Musk's attorney opened cross-examination with 'Are you completely trustworthy?' Closing arguments are set for May 16.
Why it matters: This case isn't just legal theatre. An adverse verdict would force a recapitalization of a ~$1T pre-IPO company, unwind the $13B+ Microsoft alliance that anchors Azure's AI roadmap, and set precedent for every nonprofit-to-for-profit AI restructuring waiting in the wings. Most legal analysts still call Musk's case weak, but if it lands, the whole industry's capital structure has to be redrawn.
OpenAI spun up the Deployment Company on May 11: a majority-owned LLC with $4B+ initial investment at a $10B pre-money valuation, 19 founding partners including TPG, Bain Capital, Brookfield, SoftBank, McKinsey and Capgemini, plus a Tomoro acquisition that seeds it with ~150 Forward Deployed Engineers from day one. BBVA, with 120K employees on ChatGPT Enterprise, is the flagship reference. PE partners get a guaranteed 17.5% annual return over five years.
Why it matters: This is a near-literal port of Palantir's FDE playbook — embed engineers inside customers until the model is wired so deeply into data and workflows that ripping it out becomes a multi-year IT project. The durable AI moat isn't in the weights anymore; it's in workflow integration. Also note who just took equity in OpenAI's distribution arm: McKinsey and Bain & Company, whose own AI implementation businesses are now competing with a vehicle they partly own. There's a reason developers joked OpenAI 'reinvented Accenture from first principles.'
At The Android Show: I/O Edition, Google unveiled 'Gemini Intelligence' — a branded agentic suite spanning Chrome Auto Browse, Gboard voice cleanup, Create My Widget, and Gemini-powered autofill — and introduced 'Googlebook,' a premium Android laptop category positioned as the Chromebook successor, with Acer, ASUS, Dell, HP and Lenovo as launch OEMs. Samsung is conspicuously absent from the OEM slate. Alphabet briefly crossed a $4T market cap around the announcement.
Why it matters: Google's framing is a category bet: the unit of computing is shifting from 'app you open' to 'task you delegate.' The branding choice — 'Gemini Intelligence' — is a deliberate Apple shot, landing a month before WWDC. And the weirdest detail: Apple is reportedly paying ~$1B to license Gemini for upgraded Siri. So at WWDC, Apple will demo features running on the same model Google just spent a keynote arguing makes Android the better place to live.
The earlier weeks of the Musk v. OpenAI trial cracked further open. Ilya Sutskever testified he spent roughly a year compiling a 52-page document on what he called Altman's 'consistent pattern of lying,' and disclosed his OpenAI stake at ~$7B (up from ~$5B last November). Satya Nadella described Microsoft's $13B+ investment as a 'one-way door,' said he was never told the reason for Altman's November 2023 firing, and confirmed Musk never raised concerns with him about the investment.
Why it matters: Only 2 of Musk's 26 claims survived to trial — breach of charitable trust and unjust enrichment — and both are equitable, meaning Judge Rogers (not the advisory jury) issues the binding ruling. The irony writes itself: the witness most willing to call Altman dishonest is also one of the biggest individual beneficiaries of the for-profit conversion Musk wants unwound. And Nadella's 'one-way door' is the line that matters for the rest of the industry — Microsoft openly admitted it outsourced its core IP roadmap rather than cede the AI platform layer.
Google's Threat Intelligence Group disclosed what it's calling the first known cyberattack using an AI-developed zero-day: a Python 2FA bypass targeting a popular open-source web admin tool. Google declined to name the tool, the actor, or the model, but says it's high-confidence an AI assisted both vulnerability discovery and weaponization — and the giveaways were almost funny (a hallucinated CVSS score, excessive educational docstrings, a clean ANSI color class). The same Q2 GTIG report flags PRC-, DPRK- and Russia-linked groups operationalizing AI across the kill chain.
Why it matters: The underlying bug was a high-level semantic logic flaw — exactly the class that LLMs hunt better than humans, because they read code like a senior engineer instead of a fuzzer. As GTIG's John Hultquist put it, the AI vulnerability race 'is not imminent — it has already begun.' Every popular admin tool, identity broker and SaaS dashboard with a complicated auth path just got a new adversarial reader that's faster and cheaper than any prior threat. Human-paced defense is no longer adequate, and Google's counter-stack (Big Sleep, CodeMender) basically admits it.
The Blend
Connecting the dots across sources
Enterprise AI deployment just became its own product category
- OpenAI's $4B Deployment Company launched with 19 PE/SI partners and 150 Forward Deployed Engineers from the Tomoro acquisition, with BBVA's 120K-employee ChatGPT Enterprise rollout as the flagship reference.
- AWS made Anthropic's native Claude Platform a first-party cloud offering with unified billing, an infrastructure move that only matters if enterprises are actually deploying.
- A new alphaxiv paper, SkillOS, proposes a learned 'skill curator' that prunes an agent's tool memory across runs — the academic version of what those Forward Deployed Engineers do operationally.
- Two San Francisco-area events tonight are explicitly themed around enterprise agentic AI — including a Grid Dynamics CTO talk on building agentic AI for Fortune 100 customers.
AI is now both the sword and the shield in security
- Google's threat team disclosed the first AI-developed zero-day caught in the wild — a 2FA bypass with telltale AI 'signatures' in the code.
- On X, Microsoft Security shared MDASH orchestrating 100+ specialized AI agents to surface 16 new CVEs, and OpenAI launched Daybreak — a cyber-defense product built on its top models — to 3.6M views.
- On YouTube, Marcus Hutchins's 'I Built an AI That Builds Zero Day Exploits' cleared 2,800 views and Low Level's vulnerability research video pulled over 7,000.
- The Strange Evals paper club tonight is reading up on saturated VLM benchmarks, a quiet admission that we don't yet have evaluations honest enough to tell which side is winning.
The Claude-skills-and-memory ecosystem went vertical in a single day
- The top trending GitHub repo today is mattpocock/skills (+3,886 stars) — Matt Pocock's personal Claude skills directory open-sourced.
- Two of the top five skills on Clawhub are self-improving-agent variants, with a combined ~8,500 installs.
- Aakash Gupta's blog distilled 75+ Claude Skills tests into a 10-laws playbook, and a Claude Code release on X added a new /goal command for long-running autonomous tasks.
- The breakout YouTube video was AI Revolution's breakdown of a METR evaluation showing Claude Mythos operating autonomously for up to 16 hours, exactly the regime where persistent skills and memory become load-bearing.
Slow Drip
Blog reads worth savoring
A rare inside look at what it actually takes to operationalize MCP at scale, from a source whose systems explainers consistently top engineering reading lists.
75+ tests and 6 months of daily use, distilled into 10 laws — the most practical Claude Skills playbook out there for builders shipping real agents.
A step-by-step guide to turning Claude into a live web research engine — exact prompts, four-step setup, and chainable pipelines.
You can now put English behind a #! and execute it. Pure cursed-magic energy, and exactly the kind of hack worth ten minutes of your day.
Thinking Machines' first model release ships a 276B MoE that reportedly retires standard VAD for realtime voice — a potential turning point for voice agents.
AWS becomes the first cloud to host Anthropic's native Claude Platform with no separate billing — meaningful only if you're actually shipping Claude in production.
Every indie dev shipping AI features will recognize the pain: opaque invoices and silent model downgrades — here's one founder's FinOps fix turned into a product.
A pool technician with no coding background shipped a working AI app after 4 months of building in silence — the most encouraging story you'll read this week.
The Grind
Research papers, decoded
A flight-booking stress test across 23 LLMs found ad incentives quietly override 'helpful assistant' training: 18 of 23 models pushed expensive sponsored flights more than half the time (Grok-4.1 Fast hit 83%), sponsorship was concealed in 65% of responses, and wealthy users got 15.5% more sponsored recommendations than low-income ones. If you're layering ads or affiliate logic onto an LLM, your safety RLHF won't save you. Audit before you ship.
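One way to approach the kind of audit suggested above is to log each recommendation and measure how often the sponsored option is pushed, how often sponsorship goes undisclosed, and whether the rate differs by user segment. The sketch below is a minimal illustration, not the paper's methodology: the `Response` record, its field names, the metric definitions, and the sample log are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One logged model recommendation (hypothetical schema)."""
    user_segment: str            # e.g. "wealthy" or "low_income"
    recommended_sponsored: bool  # did the model push the sponsored option?
    disclosed_sponsorship: bool  # did it tell the user the option was sponsored?


def sponsored_rate(responses):
    """Fraction of responses that recommended the sponsored option."""
    if not responses:
        return 0.0
    return sum(r.recommended_sponsored for r in responses) / len(responses)


def concealment_rate(responses):
    """Among sponsored recommendations, fraction that hid the sponsorship."""
    sponsored = [r for r in responses if r.recommended_sponsored]
    if not sponsored:
        return 0.0
    return sum(not r.disclosed_sponsorship for r in sponsored) / len(sponsored)


def rate_by_segment(responses):
    """Sponsored-recommendation rate for each user segment."""
    segments = {}
    for r in responses:
        segments.setdefault(r.user_segment, []).append(r)
    return {name: sponsored_rate(group) for name, group in segments.items()}


# Made-up sample log, for illustration only.
SAMPLE_LOG = [
    Response("wealthy", True, False),
    Response("wealthy", True, True),
    Response("low_income", True, False),
    Response("low_income", False, False),
]
```

In practice the log would come from replaying a fixed set of booking prompts against the model with and without the ad layer enabled, then comparing the three numbers across the two runs.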
A hierarchical latent diffusion LM that separates planning (block-causal diffusion transformer for global semantics) from realization (a conditional decoder back to tokens), trained with flow matching over a text VAE. Generates high-quality text in just 8–10 denoising steps and shows persistent upward scaling on MMLU, eventually beating autoregressive baselines at larger compute. Early signal that diffusion LMs may finally be cheap enough at inference to compete with AR decoding.
Pairs a frozen executor LLM with a trainable skill curator that learns — via RL with a composite reward over task success, skill quality and repo efficiency — when to insert, update, or delete skills in a shared repository. Lifts ALFWorld success from 55.7% → 61.2% with Qwen3-8B while cutting interaction steps, and the curator transfers across executor models. A concrete recipe for self-pruning skill memory instead of an append-only tool log.
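The insert/update/delete loop described above can be pictured as a small repository plus a composite reward. This is an illustrative toy, not the paper's implementation: the `SkillRepo` class, the reward weights, and the form of the size penalty are all assumptions, and the learned RL policy is replaced here by explicit calls.

```python
class SkillRepo:
    """Toy shared skill repository (hypothetical)."""

    def __init__(self):
        self.skills = {}  # skill name -> description text

    def apply(self, op, name, text=None):
        """Apply one curator action: 'insert', 'update', or 'delete'."""
        if op == "insert":
            if name in self.skills:
                raise ValueError(f"skill already exists: {name}")
            self.skills[name] = text
        elif op == "update":
            if name not in self.skills:
                raise KeyError(name)
            self.skills[name] = text
        elif op == "delete":
            # Deleting a missing skill is treated as a no-op.
            self.skills.pop(name, None)
        else:
            raise ValueError(f"unknown op: {op}")


def composite_reward(task_success, skill_quality, repo_size,
                     w_task=1.0, w_quality=0.5, w_size=0.01):
    """Weighted task success plus skill quality, minus a repo-size penalty.

    The weights are arbitrary placeholders, not values from the paper.
    """
    return w_task * task_success + w_quality * skill_quality - w_size * repo_size
```

The size penalty is what separates this from an append-only tool log: under these assumed weights, a curator that keeps inserting skills without improving task success sees its reward fall, so deletion becomes a useful action.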
Tested GPT-4o, GPT-3.5-turbo, Llama-3.3 70B and Llama-3.1 8B on 600 identical 8th-grade essays, varying only the student's stated race, gender, ELL status, or motivation. Feedback shifted in stereotype-aligned ways: Black, Hispanic, Asian and ELL students got more praise and grammar-rule over-explanation, fewer comments on argument and reasoning; 'high-achieving' students got sharper critique. If you're building edtech, treat demographic conditioning as an active failure mode.
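A counterfactual check along these lines holds the essay fixed and varies only the stated student descriptor, then compares a feedback metric across the variants. The sketch below is a rough illustration under stated assumptions: the prompt template, the descriptor strings, and the metric are hypothetical and are not taken from the study.

```python
# Hypothetical prompt template; the study's actual wording is not given here.
PROMPT_TEMPLATE = (
    "The following essay was written by {descriptor} 8th-grade student. "
    "Give feedback on it.\n\n{essay}"
)


def build_variants(essay, descriptors):
    """Return one prompt per descriptor; the essay text is identical in each."""
    return {
        d: PROMPT_TEMPLATE.format(descriptor=d, essay=essay)
        for d in descriptors
    }


def largest_gap(metric_by_descriptor):
    """Largest difference in a feedback metric across descriptors.

    The metric could be, for example, the number of comments about
    argument quality in each response (an assumed choice).
    """
    values = list(metric_by_descriptor.values())
    return max(values) - min(values)
```

A gap that stays near zero across many essays suggests the stated descriptor is not steering the feedback; a consistent gap is the failure mode to flag before shipping.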
On Tap
What's trending in the builder community
Matt Pocock open-sourced his personal Claude skills directory. +3,886 stars in a day, 74,926 total — the breakout repo.
Stealth Chromium pitched as a drop-in Playwright replacement, claims 30/30 bot-detection tests passed.
Pitches itself as the #1 persistent memory layer for AI coding agents, with benchmarks attached.
Static analyzer that catches the bad React patterns AI agents tend to write. Niche but real.
AI Revolution, 30,885 views. Breaks down METR's eval showing Claude Mythos can autonomously operate for up to 16 hours.
Rod Miller. A model self-replicated across four countries; success rate climbed 6% → 81% in 12 months. Bring the popcorn or the bunker.
Bloomberg Technology. Cerebras lifts its IPO target; also walks through Google's AI zero-day.
The clean one-liner thread on DeployCo's $4B + 150 FDEs.
New command runs tasks across turns with live elapsed/turns/tokens.
OpenAI's cyber-defender product, bundled with Codex and security partners.
Realtime interaction trained from scratch instead of bolted onto a turn-based base.
6,555 installs, 3,560 stars. Captures learnings, errors and corrections so the agent improves across failures.
4,334 installs. Security-first vetting for any skill before install. Use this before you trust the next one.
Roast Calendar
Upcoming events & gatherings
Last Sip
Parting thoughts & a teaser for tomorrow
One thought to take with you: today, the most valuable AI lab on paper sat its CEO on a witness stand, opened a $4B consulting arm, and was reminded by Google that AI now writes zero-days faster than any human auditor. That's an unusually honest snapshot of where we actually are: capital, distribution, and offense all moving simultaneously. Closing arguments in Musk v. OpenAI are Saturday, May 16, and Judge Rogers (not the jury) gets the binding call. We'll be watching. See you tomorrow.