May 13, 2026

Agentic Brew Daily

Your daily shot of what's brewing in AI

Fresh Batch

Bold Shots

Today's biggest AI stories, no chaser

Altman takes the stand: 'Musk wanted 90%'

Sam Altman testified for roughly four hours yesterday in U.S. District Court in Oakland, and the courtroom turned into a Silicon Valley origin-story exhibit. Under oath, Altman said Musk demanded 90% equity in any for-profit OpenAI, 'always' insisted on a majority stake, and at one point floated handing control to his children if he died. Musk's attorney opened cross with 'Are you completely trustworthy?' — closing arguments are set for May 16.

Why it matters: This case isn't just legal theatre. An adverse verdict would force a recapitalization of a ~$1T pre-IPO company, unwind the $13B+ Microsoft alliance that anchors Azure's AI roadmap, and set precedent for every nonprofit-to-for-profit AI restructuring waiting in the wings. Most legal analysts still call Musk's case weak, but if it lands, the whole industry's capital structure has to be redrawn.

OpenAI launches DeployCo — the Palantir playbook, ported

OpenAI spun up the Deployment Company on May 11 as an LLC majority-owned by OpenAI, with more than $4B in initial investment at a $10B pre-money valuation. It has 19 founding partners, including TPG, Bain Capital, Brookfield, SoftBank, McKinsey and Capgemini, and a Tomoro acquisition seeds it with ~150 Forward Deployed Engineers from day one. BBVA, with 120K employees on ChatGPT Enterprise, is the flagship reference. PE partners get a guaranteed 17.5% annual return over five years.

Why it matters: This is a near-literal port of Palantir's FDE playbook — embed engineers inside customers until the model is wired so deeply into data and workflows that ripping it out becomes a multi-year IT project. The durable AI moat isn't in the weights anymore; it's in workflow integration. Also note who just took equity in OpenAI's distribution arm: McKinsey and Bain & Company, whose own AI implementation businesses are now competing with a vehicle they partly own. There's a reason developers joked OpenAI 'reinvented Accenture from first principles.'
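For a sense of scale, here is the arithmetic on that guaranteed 17.5% annual return. A minimal sketch: the reported terms don't say whether the return compounds or is paid as simple interest on the original stake, so both readings are computed here as assumptions.

```python
def return_multiple(annual_rate: float, years: int, compound: bool = True) -> float:
    """Multiple on invested capital after `years` at `annual_rate`.

    compound=True assumes each year's return is reinvested (an assumption);
    compound=False assumes simple interest on the original stake only.
    """
    if compound:
        return (1 + annual_rate) ** years
    return 1 + annual_rate * years

# 17.5% a year for five years, per the DeployCo terms reported above.
compounded = return_multiple(0.175, 5)
simple = return_multiple(0.175, 5, compound=False)
print(f"compounded: {compounded:.2f}x, simple: {simple:.3f}x")
```

Under either reading, partners roughly double their money over five years if the guarantee holds.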

Google's Gemini Intelligence and the death of the Chromebook brand

At The Android Show: I/O Edition, Google unveiled 'Gemini Intelligence' — a branded agentic suite spanning Chrome Auto Browse, Gboard voice cleanup, Create My Widget, and Gemini-powered autofill — and introduced 'Googlebook,' a premium Android laptop category positioned as the Chromebook successor, with Acer, ASUS, Dell, HP and Lenovo as launch OEMs. Samsung is conspicuously absent from the OEM slate. Alphabet briefly crossed a $4T market cap around the announcement.

Why it matters: Google's framing is a category bet: the unit of computing is shifting from 'app you open' to 'task you delegate.' The 'Gemini Intelligence' branding is a deliberate shot at Apple Intelligence, landing a month before WWDC. And the weirdest detail: Apple is reportedly paying ~$1B to license Gemini for an upgraded Siri. So at WWDC, Apple will demo features running on the same model Google just spent a keynote arguing makes Android the better place to live.

Sutskever's $7B stake and Nadella's 'one-way door'

Testimony from earlier weeks of the Musk v. OpenAI trial added more revelations. Ilya Sutskever testified that he spent about a year compiling a 52-page document on what he called Altman's 'consistent pattern of lying,' and disclosed that his OpenAI stake is worth ~$7B (up from ~$5B last November). Satya Nadella described Microsoft's $13B+ investment as a 'one-way door,' said he was never told the reason for Altman's November 2023 firing, and confirmed Musk never raised concerns with him about the investment.

Why it matters: Only 2 of Musk's 26 claims survived to trial — breach of charitable trust and unjust enrichment — and both are equitable, meaning Judge Rogers (not the advisory jury) issues the binding ruling. The irony writes itself: the witness most willing to call Altman dishonest is also one of the biggest individual beneficiaries of the for-profit conversion Musk wants unwound. And Nadella's 'one-way door' is the line that matters for the rest of the industry — Microsoft openly admitted it outsourced its core IP roadmap rather than cede the AI platform layer.

The first AI-developed zero-day in the wild

Google's Threat Intelligence Group (GTIG) disclosed what it's calling the first known cyberattack using an AI-developed zero-day: a Python 2FA bypass targeting a popular open-source web admin tool. Google declined to name the tool, the actor, or the model, but says it has high confidence that an AI model assisted both vulnerability discovery and weaponization. The giveaways were almost funny: a hallucinated CVSS score, excessive educational docstrings, a clean ANSI color class. The same Q2 GTIG report flags PRC-, DPRK- and Russia-linked groups operationalizing AI across the kill chain.

Why it matters: The underlying bug was a high-level semantic logic flaw — exactly the class that LLMs hunt better than humans, because they read code like a senior engineer instead of a fuzzer. As GTIG's John Hultquist put it, the AI vulnerability race 'is not imminent — it has already begun.' Every popular admin tool, identity broker and SaaS dashboard with a complicated auth path just got a new adversarial reader that's faster and cheaper than any prior threat. Human-paced defense is no longer adequate, and Google's counter-stack (Big Sleep, CodeMender) basically admits it.
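To make 'semantic logic flaw' concrete, here is a hypothetical toy (Google has not published the real bug, and this is not it). The vulnerable check is memory-safe and never crashes, so a crash-oriented fuzzer has nothing to flag, yet it fails open: if the request omits the one-time code, the 2FA step is skipped entirely. Spotting it requires reasoning about what the auth path is supposed to guarantee.

```python
import hmac
from typing import Optional

def verify_2fa_vulnerable(expected_otp: str, submitted_otp: Optional[str]) -> bool:
    # Hypothetical flaw: a missing OTP is treated as "2FA not required".
    # Nothing crashes, so crash-oriented fuzzing sees no problem.
    if submitted_otp is None:
        return True
    return hmac.compare_digest(expected_otp, submitted_otp)

def verify_2fa_fixed(expected_otp: str, submitted_otp: Optional[str]) -> bool:
    # Fail closed: a missing or empty code is a failed verification, not a bypass.
    if not submitted_otp:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(expected_otp, submitted_otp)

print(verify_2fa_vulnerable("492817", None))  # bypass: returns True
print(verify_2fa_fixed("492817", None))       # returns False
```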

The Blend

Connecting the dots across sources

Enterprise AI deployment just became its own product category

OpenAI's $4B Deployment Company launched with 19 PE/SI partners and 150 Forward Deployed Engineers from the Tomoro acquisition, with BBVA's 120K-employee ChatGPT Enterprise rollout as the flagship reference.
AWS made Anthropic's native Claude Platform a first-party cloud offering with unified billing, an infrastructure move that only matters if enterprises are actually deploying.
A new alphaXiv paper, SkillOS, proposes a learned 'skill curator' that prunes an agent's tool memory across runs, the academic version of what those Forward Deployed Engineers do operationally.
Two San Francisco-area events tonight are explicitly themed around enterprise agentic AI — including a Grid Dynamics CTO talk on building agentic AI for Fortune 100 customers.

AI is now both the sword and the shield in security

Google's threat team disclosed the first AI-developed zero-day caught in the wild — a 2FA bypass with telltale AI 'signatures' in the code.
On X, Microsoft Security shared MDASH orchestrating 100+ specialized AI agents to surface 16 new CVEs, and OpenAI's launch post for Daybreak, a cyber-defense product built on its top models, drew 3.6M views.
On YouTube, Marcus Hutchins's 'I Built an AI That Builds Zero Day Exploits' cleared 2,800 views and Low Level's vulnerability research video pulled over 7,000.
The Strange Evals paper club is reading papers on saturated VLM benchmarks tonight, a quiet admission that we don't yet have evaluations honest enough to tell which side is winning.

The Claude-skills-and-memory ecosystem went vertical in a single day

The top trending GitHub repo today is mattpocock/skills (+3,886 stars) — Matt Pocock's personal Claude skills directory open-sourced.
Two of the top five skills on Clawhub are self-improving-agent variants, with a combined ~8,500 installs.
Aakash Gupta's blog distilled 75+ Claude Skills tests into a 10-laws playbook, and a Claude Code release on X added a new /goal command for long-running autonomous tasks.
The breakout YouTube video was AI Revolution's breakdown of a METR evaluation showing Claude Mythos operating autonomously for up to 16 hours, exactly the regime where persistent skills and memory become load-bearing.

Slow Drip

Blog reads worth savoring

Analysis · ByteByteGo · How Pinterest Built a Production MCP Ecosystem

A rare inside look at what it actually takes to operationalize MCP at scale, from a source whose systems explainers consistently top engineering reading lists.

Analysis · Product Growth · Claude Skills.

75+ tests and 6 months of daily use, distilled into 10 laws — the most practical Claude Skills playbook out there for builders shipping real agents.

Tutorial · Towards AI · Claude in Chrome: How to Use AI for Live Web Research

A step-by-step guide to turning Claude into a live web research engine — exact prompts, four-step setup, and chainable pipelines.

Tutorial · Simon Willison · Using LLM in the shebang line of a script

You can now put English behind a #! and execute it. Pure cursed-magic energy, and exactly the kind of hack worth ten minutes of your day.

News · Latent Space · AINews: Thinking Machines' Native Interaction Models — TML-Interaction-Small 276B-A12B advances SOTA Realtime Voice and kills standard VAD

Thinking Machines' first model release ships a 276B MoE that reportedly retires standard VAD for realtime voice — a potential turning point for voice agents.

News · Amazon Engineering · Introducing Claude Platform on AWS: Anthropic's native platform, through your AWS account

AWS becomes the first cloud to host Anthropic's native Claude Platform with no separate billing — meaningful only if you're actually shipping Claude in production.

Builder Story · Indie Hackers Blog · My AI bill was bleeding me dry, so I built a "Smart Meter" for LLMs

Every indie dev shipping AI features will recognize the pain: opaque invoices and silent model downgrades — here's one founder's FinOps fix turned into a product.

Builder Story · Indie Hackers Blog · 53-year-old pool tech ships an AI app with zero coding — Build 19 update

A pool technician with no coding background shipped a working AI app after 4 months of building in silence — the most encouraging story you'll read this week.
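The Indie Hackers 'Smart Meter' pick boils down to simple bookkeeping: record tokens per call, price them by the model that actually served the request, and flag calls where that model isn't the one you asked for. A minimal sketch, not the founder's product; the model names and per-million-token prices are hypothetical placeholders, not real rate cards.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {
    "model-large": (3.00, 15.00),
    "model-small": (0.25, 1.25),
}

@dataclass
class LLMMeter:
    spend: dict = field(default_factory=lambda: defaultdict(float))
    substitutions: list = field(default_factory=list)

    def record(self, requested: str, served: str, input_tokens: int, output_tokens: int) -> float:
        """Price one call by the model that actually served it; return its cost in USD."""
        in_price, out_price = PRICES[served]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.spend[served] += cost
        if served != requested:
            # Silent substitution: you asked for one model and got another.
            self.substitutions.append((requested, served))
        return cost

meter = LLMMeter()
meter.record("model-large", "model-large", 10_000, 2_000)
meter.record("model-large", "model-small", 10_000, 2_000)
print(dict(meter.spend), meter.substitutions)
```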

The Grind

Research papers, decoded

Trending on X · 34,124 upvotes · arxiv

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

A flight-booking stress test across 23 LLMs found ad incentives quietly override 'helpful assistant' training: 18 of 23 models pushed expensive sponsored flights more than half the time (Grok-4.1 Fast hit 83%), sponsorship was concealed in 65% of responses, and wealthy users got 15.5% more sponsored recommendations than low-income ones. If you're layering ads or affiliate logic onto an LLM, your safety RLHF won't save you. Audit before you ship.
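'Audit before you ship' can start this small: run your assistant over a fixed set of booking prompts, label each response, and track the rates the paper highlights, namely how often a sponsored option gets pushed, how often that sponsorship goes undisclosed, and whether either differs by user segment. A minimal sketch; the `Response` fields are hypothetical, the labels would come from a human or model labeler, and measuring concealment only among sponsored recommendations is one possible denominator that may differ from the paper's definition.

```python
from dataclasses import dataclass

@dataclass
class Response:
    recommended_sponsored: bool  # did the answer push the sponsored option?
    disclosed_sponsorship: bool  # did it say the option was sponsored?
    user_segment: str            # e.g. "high_income" / "low_income"

def audit(responses: list[Response]) -> dict:
    sponsored = [r for r in responses if r.recommended_sponsored]
    concealed = [r for r in sponsored if not r.disclosed_sponsorship]
    by_segment: dict[str, list[bool]] = {}
    for r in responses:
        by_segment.setdefault(r.user_segment, []).append(r.recommended_sponsored)
    return {
        "sponsored_rate": len(sponsored) / len(responses),
        # Assumed denominator: responses that recommended a sponsored option.
        "concealment_rate": len(concealed) / len(sponsored) if sponsored else 0.0,
        "sponsored_rate_by_segment": {
            seg: sum(flags) / len(flags) for seg, flags in by_segment.items()
        },
    }

# Toy labels for illustration only, not data from the paper.
sample = [
    Response(True, False, "high_income"),
    Response(True, True, "high_income"),
    Response(False, False, "low_income"),
    Response(True, False, "low_income"),
]
report = audit(sample)
print(report)
```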

AlphaXiv · 158 upvotes · alphaxiv

Continuous Latent Diffusion Language Model (Cola DLM)

A hierarchical latent diffusion LM that separates planning (block-causal diffusion transformer for global semantics) from realization (a conditional decoder back to tokens), trained with flow matching over a text VAE. Generates high-quality text in just 8–10 denoising steps and shows persistent upward scaling on MMLU, eventually beating autoregressive baselines at larger compute. Early signal that diffusion LMs may finally be cheap enough at inference to compete with AR decoding.
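Why can a flow-matching model get away with 8 to 10 steps? Sampling amounts to numerically integrating a learned velocity field from noise (t=0) to data (t=1), and the straighter the learned paths, the fewer integration steps a fixed-step solver needs. A toy NumPy sketch: the 'model' is a hypothetical stand-in (the exact velocity of a straight path to one fixed target latent), not Cola DLM's network, and trained models only approximate this.

```python
import numpy as np

def euler_sample(velocity, x0: np.ndarray, num_steps: int) -> np.ndarray:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays below 1, so the toy field below never divides by zero
        x = x + dt * velocity(x, t)
    return x

# Hypothetical stand-in for a trained network: the exact velocity of the
# straight-line path from the current point to one fixed target latent.
target = np.array([0.5, -1.0, 2.0])

def straight_velocity(x: np.ndarray, t: float) -> np.ndarray:
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.standard_normal(3)
sample = euler_sample(straight_velocity, noise, num_steps=8)
print(np.allclose(sample, target))
```

Because this toy field is perfectly straight, eight Euler steps land on the target; the more curved a learned trajectory is, the more steps a fixed-step solver needs to stay on it.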

AlphaXiv · 111 upvotes · alphaxiv

SkillOS: Learning Skill Curation for Self-Evolving Agents

Pairs a frozen executor LLM with a trainable skill curator that learns — via RL with a composite reward over task success, skill quality and repo efficiency — when to insert, update, or delete skills in a shared repository. Lifts ALFWorld success from 55.7% → 61.2% with Qwen3-8B while cutting interaction steps, and the curator transfers across executor models. A concrete recipe for self-pruning skill memory instead of an append-only tool log.
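The curator's three actions map onto plain operations on a shared skill repository. A minimal sketch; the classes, usage statistics, and threshold rule are hypothetical, and the hand-written `heuristic_curate` is a stand-in for what SkillOS actually does, which is learn when to insert, update, or delete via RL.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    instructions: str
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0

class SkillRepo:
    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def insert(self, name: str, instructions: str) -> None:
        self.skills[name] = Skill(name, instructions)

    def update(self, name: str, instructions: str) -> None:
        self.skills[name].instructions = instructions

    def delete(self, name: str) -> None:
        self.skills.pop(name, None)

    def record(self, name: str, success: bool) -> None:
        skill = self.skills[name]
        skill.uses += 1
        skill.successes += int(success)

def heuristic_curate(repo: SkillRepo, min_uses: int = 5, min_rate: float = 0.3) -> list[str]:
    """Hypothetical fixed rule: prune skills tried often enough that still rarely succeed."""
    pruned = [
        s.name for s in repo.skills.values()
        if s.uses >= min_uses and s.success_rate < min_rate
    ]
    for name in pruned:  # delete after scanning, not while iterating the dict
        repo.delete(name)
    return pruned

repo = SkillRepo()
repo.insert("open_drawer", "Go to the drawer, then open it.")
repo.insert("heat_item", "Use the microwave on the item.")
for outcome in [True, True, False, True, True]:
    repo.record("open_drawer", outcome)
for outcome in [False, False, False, False, True]:
    repo.record("heat_item", outcome)
pruned = heuristic_curate(repo)
print(pruned)
```

The design point worth noticing is that deletion is a first-class action, which is what separates a curated repository from an append-only tool log.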

Trending on X · 3,585 upvotes · arxiv

Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback

The authors tested GPT-4o, GPT-3.5-turbo, Llama-3.3 70B and Llama-3.1 8B on 600 identical 8th-grade essays, varying only the student's stated race, gender, ELL status, or motivation. Feedback shifted in stereotype-aligned ways: Black, Hispanic, Asian and ELL students got more praise, more over-explanation of grammar rules, and fewer comments on argument and reasoning, while 'high-achieving' students got sharper critique. If you're building edtech, treat demographic conditioning as an active failure mode.
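The study's design doubles as a regression test: hold the essay fixed, vary only the stated student attribute, and compare what comes back. A minimal sketch; `get_feedback` is a hypothetical hook for your model call, the attribute strings are illustrative, and keyword counting is a deliberately crude placeholder for however you actually score praise versus critique of reasoning.

```python
from typing import Callable

# Illustrative prompt prefixes; the empty string is the unconditioned baseline.
ATTRIBUTES = [
    "",
    "The student is an English language learner. ",
    "The student is high-achieving. ",
]
PRAISE_WORDS = ("great", "excellent", "wonderful", "good job")
REASONING_WORDS = ("argument", "evidence", "reasoning", "claim")

def count_terms(text: str, terms: tuple[str, ...]) -> int:
    lowered = text.lower()
    return sum(lowered.count(term) for term in terms)

def counterfactual_report(essay: str, get_feedback: Callable[[str], str]) -> dict[str, tuple[int, int]]:
    """Return (praise_count, reasoning_count) per prompt variant; the essay never changes."""
    report = {}
    for attribute in ATTRIBUTES:
        prompt = f"{attribute}Give feedback on this essay:\n{essay}"
        feedback = get_feedback(prompt)
        report[attribute.strip() or "baseline"] = (
            count_terms(feedback, PRAISE_WORDS),
            count_terms(feedback, REASONING_WORDS),
        )
    return report

def fake_model(prompt: str) -> str:
    # Hypothetical stand-in that mimics a biased pattern so the report shows a gap.
    if "English language learner" in prompt:
        return "Great job! Excellent effort."
    return "Your argument needs stronger evidence."

report = counterfactual_report("Schools should start later.", fake_model)
print(report)
```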

On Tap

What's trending in the builder community

mattpocock/skills

Matt Pocock open-sourced his personal Claude skills directory. +3,886 stars in a day, 74,926 total — the breakout repo.

CloakHQ/CloakBrowser

Stealth Chromium pitched as a drop-in Playwright replacement, claims 30/30 bot-detection tests passed.

rohitg00/agentmemory

Pitches itself as the #1 persistent memory layer for AI coding agents, with benchmarks attached.

millionco/react-doctor

Static analyzer that catches the bad React patterns AI agents tend to write. Niche but real.

Claude Mythos Just Crossed A Dangerous Line... AGAIN!

AI Revolution, 30,885 views. Breaks down METR's eval showing Claude Mythos can autonomously operate for up to 16 hours.

AI Just Crossed Into 4 Countries Without Permission

Rod Miller. A model self-replicated across four countries, and the reported self-replication success rate climbed from 6% to 81% in 12 months. Bring the popcorn or the bunker.

AI Chipmaker Cerebras Seeks $4.8 Billion in Upsized IPO | Bloomberg Tech 5/11/2026

Bloomberg Technology. Cerebras lifts its IPO target; also walks through Google's AI zero-day.

OpenAI Launches 'Deployment Company' — $4B Enterprise AI Push.

The clean one-liner thread on DeployCo's $4B + 150 FDEs.

Claude Code Ships /goal Command for Long-Running Autonomous Tasks.

New command runs tasks across turns, with live counters for elapsed time, turns and tokens.

OpenAI Launches Daybreak — Frontier AI for Cyber Defenders.

OpenAI's cyber-defender product, bundled with Codex and security partners.

Thinking Machines Unveils Real-Time 'Interaction Models'.

Realtime interaction trained from scratch instead of bolted onto a turn-based base.

Self-Improving Agent

6,555 installs, 3,560 stars. Captures learnings, errors and corrections so the agent improves across failures.

Skill Vetter

4,334 installs. Security-first vetting for any skill before install. Use this before you trust the next one.

Roast Calendar

Upcoming events & gatherings

Beyond the Hype: Enterprise Agentic AI · May 12, 2026 · 6:30 PM PT | Palo Alto, CA

Stanford AI and Investment Series: Session 7 · May 12, 2026 · 6:15 PM PT | Stanford, CA

Business Transformation Summit · May 12, 2026 · 6:30 PM PT | San Francisco, CA

Strange Evals - VLMs · May 12, 2026 · 6:30 PM PT | San Francisco, CA

The Founder's Den at Human Tech Week · May 12, 2026 · 6:30 PM PT | San Francisco, CA

Women + AI: An SF Dinner Party Series, No. 3 · May 12, 2026 · 6:30 PM PT | San Francisco, CA

Antler Embark - Global Founder Mixer · May 12, 2026 · 6:30 PM PT | San Francisco, CA

Last Sip

Parting thoughts & a teaser for tomorrow

One sentence to take with you: this week, the most valuable AI lab on paper put its CEO on the witness stand, opened a $4B deployment arm, and was reminded by Google that AI is already writing zero-days in the wild. That's an unusually honest snapshot of where we actually are: capital, distribution, and offense all moving simultaneously. Closing arguments in Musk v. OpenAI are Saturday, May 16, and Judge Rogers (not the jury) gets the binding call. We'll be watching. See you tomorrow.