May 15, 2026

Agentic Brew Daily

Your daily shot of what's brewing in AI

Fresh Batch

Bold Shots

Today's biggest AI stories, no chaser

Cerebras priced its Nasdaq debut at $185 a share, above the $150-$160 marketed range, and sold 30M Class A shares for $5.55B. The stock began trading May 14 under the ticker CBRS at a fully diluted valuation of about $56.4B, more than double the $26.6B implied just five weeks earlier. Orders exceeded the shares on offer by more than 20x, and the deal is the largest US tech IPO since Snowflake's $3.8B debut in 2020. The whole pitch is anchored on a $20B+ master relationship agreement with OpenAI for 750 MW of inference capacity, scaling toward 2 GW by 2030.

Why it matters: This is the opening salvo of an AI IPO wave (OpenAI and SpaceX are rumored next). If CBRS trades cleanly, the cost of capital drops across AI infra; if it breaks issue, the pipeline thins out fast. The wafer-scale chip story is a real shot at "end of GPU homogeneity" in inference — but P/S north of 110x means the market is paying tomorrow's price today.
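
Back-of-envelope: the pricing math in this story can be re-derived in a few lines. This is a rough sketch that uses only the figures quoted above. The variable names are ours, and we assume the 110x price-to-sales multiple is measured against the $56.4B fully diluted valuation, which the coverage does not spell out.

```python
# Figures quoted in the story; variable names are illustrative only.
shares_sold = 30_000_000
price_per_share = 185.0
gross_proceeds = shares_sold * price_per_share  # dollars raised

fully_diluted_valuation = 56.4e9
prior_implied_valuation = 26.6e9
step_up = fully_diluted_valuation / prior_implied_valuation

# ASSUMPTION: P/S is taken against the fully diluted valuation.
price_to_sales_floor = 110
implied_sales_ceiling = fully_diluted_valuation / price_to_sales_floor

print(f"Gross proceeds: ${gross_proceeds / 1e9:.2f}B")
print(f"Valuation step-up in five weeks: {step_up:.2f}x")
print(f"Sales implied by P/S above 110x: under ${implied_sales_ceiling / 1e6:.0f}M")
```

Under that assumption, a multiple north of 110x implies sales somewhere below roughly $513M, which is what "paying tomorrow's price today" looks like in numbers.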

Ramp's April 2026 AI Index put Anthropic at 34.4% of paying businesses versus OpenAI's 32.3% — the first crossover ever. Anthropic gained 3.8 points month-over-month while OpenAI lost 2.9, and Anthropic has quadrupled in a year while OpenAI added 0.3 points. OpenAI's chief application officer Fidji Simo told staff the company is in "code red" mode. Claude Code is the wedge — Anthropic's fastest-growing product ever — and a freshly expanded PwC alliance will certify 30,000 professionals on Claude.

Why it matters: This is a year-long slope, not a one-month blip. Claude Code is the Trojan horse into finance (Citadel, BNY, FIS, Mizuho), legal (Freshfields, Quinn Emanuel, Holland & Knight), and consulting (PwC). The enterprise AI competitive map has been quietly re-rated.

The Musk-vs-Altman trial finished its liability phase in Oakland after 11 days. Musk is seeking $134B in damages, the removal of Sam Altman and Greg Brockman, and an unwind of OpenAI's for-profit conversion. The witness list reads like an AI history podcast — Altman, Brockman, Sutskever, Murati, Helen Toner, Shivon Zilis, Joshua Achiam, and Satya Nadella. Sutskever, Murati, and Toner all testified about a "consistent pattern of lying" memo that preceded Altman's brief 2023 ouster. Judge Yvonne Gonzalez Rogers retains discretion to overturn the jury verdict.

Why it matters: If the charitable-trust framing sticks, every mission-driven nonprofit-turned-for-profit suddenly has legal exposure. Nadella's testimony also confirmed Microsoft is rewiring its AI dependence toward multi-vendor independence — Azure has been hosting xAI since 2024. Sworn allegations of dishonesty are an obvious overhang on any near-term OpenAI IPO.

Cisco reported Q3 FY2026 revenue of $15.84B (+12% YoY) and non-GAAP EPS of $1.06. The headline: it doubled its full-year AI infrastructure order forecast from $5B to $9B, with $5.3B already booked YTD and networking product orders up over 50% YoY. The company is cutting fewer than 4,000 jobs (<5% of workforce) with up to $1B in pre-tax restructuring charges, and the stock surged ~14% — its best single-day move in more than 20 years.

Why it matters: Investors rewarded the layoffs because the math told a coherent silicon-thesis story: every freed-up dollar is being redeployed into the bucket whose order book just doubled. Cisco's Silicon One inside Nvidia Spectrum-X is the wedge into hyperscale AI Ethernet long held by Broadcom and Arista. As CEO Chuck Robbins put it: "If you don't have silicon you're going to struggle to be relevant to the hyperscalers."

Notion launched its Developer Platform on May 13: Workers (a hosted runtime on Vercel Sandbox for deploying custom code with zero server provisioning), Database Sync in beta, an External Agents API in alpha that lets Claude Code, Cursor, Codex, and Decagon operate inside Notion as native workspace participants, and a new CLI called ntn. Workers are free during public beta, with credit-based billing kicking in on August 11. Customers have already built more than a million agents in the three months since Custom Agents shipped in February.

Why it matters: This reframes the workspace as the AI agent orchestration layer and collapses the Zapier/Pipedream middleware tax into one sandbox deploy. It also lowers switching costs across coding agents while raising the stakes for glue vendors. If Notion executes, it goes head-to-head with Microsoft Power Platform.

Meta launched Incognito Chat for WhatsApp and the standalone Meta AI app on May 13, calling it "the first major AI product where there is no log of conversations stored on servers." Inference runs inside a Trusted Execution Environment built on AMD SEV-SNP confidential VMs and NVIDIA H100s in confidential computing mode, fronted by Oblivious HTTP relays with remote attestation. Conversations vanish when the app closes or the phone locks. Text-only at launch, with voice, image, and a branching "Side Chat" feature signaled for later.

Why it matters: This shifts AI privacy from retention policy to hardware guarantee — a structural threat to OpenAI, Gemini, and Claude's policy-based "temporary chat." It also lands during OpenAI lawsuits where preserved logs are being used as evidence. The cheapest log to defend is the one that never existed.

The Blend

Connecting the dots across sources

The coding agent is now the wedge for every other enterprise AI decision

  • Anthropic's first-ever crossover past OpenAI in paid business share lines up exactly with Claude Code's rise as the company's fastest-growing product, and walks straight into finance, legal, and consulting wins like the PwC deal to certify 30,000 professionals.
  • OpenAI's response on the same day — Codex shipping to the ChatGPT mobile app, Hooks for programmatic customization, and two months free for switchers — reads like a defensive playbook against exactly this dynamic.
  • On GitHub, the day's trending lists are dominated by Claude Code skill repos, and on Product Hunt the second-place launch is literally an observability tool for Claude Code token spend.
  • Princeton's τ-bench research, going viral on X with 34K+ votes, shows even GPT-4o passes only about 61% of retail tasks on a single try and drops below 25% when it must succeed on all eight independent attempts. That is a sobering counter-narrative to all this coding-agent enthusiasm.

Inference economics, not model quality, are the binding constraint now

  • Cerebras priced 2026's largest tech IPO almost entirely on a single $20B+ OpenAI contract for 750 MW of inference capacity, with investors paying around 110x sales for the wafer-scale bet.
  • Cisco doubled its AI infrastructure order forecast to $9B and saw its biggest single-day stock move in more than 20 years, with the CEO openly saying companies without silicon are about to be irrelevant to hyperscalers.
  • Anthropic's quiet move to meter programmatic Claude usage and 3x its image prompt pricing — the move developers are loudly complaining about — is the same token-economics squeeze expressed from the model-vendor side.
  • A trending Semianalysis deep dive on Cerebras's WSE-3 architecture and a Google Cloud paper on proxy models that cut LLM-powered SQL cost 100x are both reading the same room: speed and cost-per-token are now the products.

AI trust is moving from policy promises to hardware attestation

  • Meta's Incognito Chat on WhatsApp claims zero server-side logs by running inference inside AMD confidential VMs and H100s in confidential computing mode — a hardware guarantee, not a privacy policy.
  • The launch lands directly into the Musk vs. Altman backdrop, where preserved ChatGPT logs are being used as courtroom evidence and Microsoft's CEO testified about needing real agency at every layer of the stack.
  • An Indie Hackers post about red-teaming tool PromptBrake going on-prem is the SMB version of the same story: keep prompts inside customer infrastructure.
  • Princeton's τ-bench shows 25% of agent failures are policy-violating decisions, which is empirical evidence that policy-based trust is no longer enough.

Slow Drip

Blog reads worth savoring

Analysis · Semianalysis Substack
Cerebras — Faster Tokens Please

A meticulously sourced deep dive into Cerebras's $24.6B OpenAI deal, WSE-3 architecture tradeoffs, and the economics of speed-optimized inference. Essential if you want to actually understand what investors paid for today.

Analysis · a16z News
From 'System of Record' to 'System of Intelligence'

A sharp a16z thesis on how reasoning layers are eating CRMs and reshaping where enterprise software value accrues. Required framing if you're building or buying agentic tools.

Tutorial · KDnuggets
5 Small Language Models for Agentic Tool Calling

A practical shortlist of open-weight SLMs (SmolLM3, Qwen3-4B, Phi-3-mini, Gemma-4, Mistral-7B) that actually do structured tool calls — perfect for engineers shipping agents on edge or budget hardware.

Tutorial · Towards AI (Medium)
Building the AI Memory Stack: Layered Storage, Async Extraction and Atomic Persistence

A real production blueprint — three-tier memory, debounce batching, confidence scoring, crash-safe writes — for builders tired of agents that forget everything between sessions.

News · Latent Space
[AINews] Codex Rises, Claude Meters Programmatic Usage

The clearest single read on Anthropic's metered-credit 'rug pull,' OpenAI's two-months-free Codex push, and why developer loyalty is suddenly up for grabs.

Research · Google Cloud Blog — AI & ML
The power of LLMs on your data, more than two orders of magnitude faster and cheaper

Google's SIGMOD paper on proxy models cuts LLM-powered SQL cost and latency by 100x and it's already live in BigQuery and AlloyDB — a glimpse at how semantic analytics actually becomes affordable.

Research · CMU Machine Learning Blog
Teaching Vision-Language Models to Speak Cinema

A CVPR 2026 Highlight showing that an 8B model with expert-curated cinematic captions beats GPT-5 at video generation control — a vivid case study in data quality beating scale.

The Grind

Research papers, decoded

Agents · 34,177 upvotes · arxiv
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

τ-bench stress-tests agents in multi-turn conversations with simulated users, real APIs, and strict policy documents across retail and airline domains. The kicker: even GPT-4o passes only about 61% of retail tasks on a single try, and drops below 25% when you require success across 8 independent attempts (a new 'pass^k' reliability metric). Failures are dominated by wrong API arguments (33%) and policy-violating decisions (25%) — exactly the things single-run benchmarks hide.
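
To make the metric concrete, here is a minimal sketch of how a pass^k score can be computed from repeated trials. It assumes the estimator as we read it from the τ-bench paper: for each task with c successes in n trials, take C(c, k) / C(n, k), then average over tasks. The function name and the success counts below are made up for illustration and are not τ-bench data.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Estimate pass^k: the chance that k trials drawn from the n run all succeed.

    successes_per_task: successful-trial counts (c), one entry per task.
    n_trials: trials run per task (n).
    k: number of trials that must all succeed.
    """
    if not 1 <= k <= n_trials:
        raise ValueError("k must be between 1 and n_trials")
    # comb(c, k) is 0 whenever c < k, so tasks with too few successes add nothing.
    per_task = [comb(c, k) / comb(n_trials, k) for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Hypothetical results: 4 tasks, 8 trials each (NOT real benchmark data).
toy_successes = [8, 6, 4, 2]
for k in (1, 2, 4, 8):
    print(f"pass^{k} = {pass_hat_k(toy_successes, n_trials=8, k=k):.3f}")
```

With these toy counts the single-try score is 62.5% but the all-eight score is 25%, because only one of the four tasks succeeds on every trial. An average that looks healthy on one attempt can hide exactly that kind of unreliability.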

Generative Models · 151 upvotes · alphaxiv
ELF: Embedded Language Flows

ELF brings continuous-space diffusion to language modeling by running Flow Matching almost entirely in embedding space and only snapping to discrete tokens at the very last step. Competitive quality with just 45B training tokens vs 500B+ for comparable models, classifier-free guidance out of the box, and beats leading discrete diffusion models (MDLM, Duo) with fewer sampling steps. A credible path to faster, more controllable non-autoregressive text generation.

Model Merging · 4 upvotes · huggingface
FeatCal: Feature Calibration for Post-Merging Models

FeatCal tackles model merging's known pain — the merged checkpoint trailing the individual task experts — by tracing the gap to 'feature drift' and fixing it with a tiny calibration set applied layer-by-layer via a closed-form (no gradient descent) update. Beats Surgery and ProbSurgery on CLIP-ViT-B/32 (85.5% vs 77-78%) and FLAN-T5-base GLUE, runs about 4x faster, and reaches strong accuracy with as few as 8 examples per task.

On Tap

What's trending in the builder community

tinyhumansai/openhuman

A privacy-first local AI super-assistant pitched as 'your personal AI super intelligence'. Written in Rust, it gained 3,476 stars today.

mattpocock/skills

Matt Pocock open-sourced his entire .claude directory of skills and engineers are eating it up — 2,971 new stars today.

rohitg00/agentmemory

'#1 Persistent memory for AI coding agents based on real-world benchmarks' — the agent memory problem keeps generating top-tier tools.

obra/superpowers

An agentic skills framework and software development methodology that's quietly racked up 190K+ stars total.

ruvnet/RuView

Turns commodity WiFi signals into real-time spatial intelligence and vital-sign monitoring — no camera required.

Memoket Gem

An AI wearable that remembers your conversations all day — memory keeps showing up as the universal pain point.

Latitude for Claude Code

See where Claude Code burns tokens and hit your limits less — Anthropic's new metering already has a thriving observability layer.

CraftBot with Living UI

Grow your own software that's 'alive' — UI that mutates and reshapes itself as you use it.

Your Agent Can Now Train Models — Merve Noyan, Hugging Face

AI agents autonomously fine-tuning vision-language models with full-weight access, inference routing, benchmark filtering, MCPs, and a live training demo.

The biggest AI breakthrough in medicine & drug discovery

MAMMAL, a foundation biology model that outperforms AlphaFold 3 on toxicity, antibody design, and cancer drug development.

AI Is Running Out Of Bandwidth.. These Companies Win

The bandwidth, networking, optics, and power thesis behind today's Cisco pop, explained in 20 minutes.

OpenAI: Codex in the ChatGPT mobile app

OpenAI's announcement that the Codex coding agent is now in preview inside the ChatGPT mobile app, with desktop/devbox continuity.

xAI: Grok Build early beta

An agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers at x.ai/cli.

Self-Improving Agent

Captures learnings, errors, and corrections to enable continuous improvement — the most-installed skill on Clawhub today.

Skill Vetter

Security-first skill vetting for AI agents — the supply-chain layer for skills is starting to form.

Roast Calendar

Upcoming events & gatherings

DevTools Meetup @ Transpose: Manage AI-Generated Code
Thu May 14, 6:30 PM PT, Local, San Francisco, CA

Freestyle.sh-hosted dev night on wrangling AI-generated code in production with Ben Werner and James Tan — useful if you ship Claude or Codex output to real users.

[Paper Reading] R2Code: A Self-Reflective LLM Framework for Requirements-to-Code Traceability
Thu May 14, 7:00 PM PT, Local, Fremont, CA

SupportVectors' applied-AI reading group on closing the requirements-to-code loop — strong companion to today's τ-bench discussion.

Ultimate Bots: Fight x Dance (Powered by Nebius)
Thu May 14, 7:00 PM PT, Local, San Francisco, CA

Eight piloted humanoid fight matches plus robot dance battles at Temple Nightclub — pure spectacle, also a recruiting moment for Nebius.

General Catalyst x Proximal: Research Meetup
Thu May 14, 6:30 PM PT, Local, San Francisco, CA

Intimate AI research meetup with Sophia Han and Calvin Chen, hosted by GC-backed Proximal — good for serious researchers.

Strategy as an Emotional Act (Human+Tech Week @ Frontier Tower)
Thu May 14, 6:30 PM PT, Local, San Francisco, CA

Frontier Tower's Human+Tech Week session with Chelsea Borruano on the human side of strategy work — useful counterweight if your week was all silicon and IPOs.

Silicon Valley Economic Forum (SVEF)
Fri May 15, 8:00 AM PT (runs through Sat), Local, San Jose, CA

Multi-day forum on the Valley's economic and tech future hosted by AW3 Technology — 195+ interested attendees and the right audience for today's macro AI threads.

Chief of Staff Connect — San Francisco
Fri May 15, 8:00 AM PT, Local, San Francisco, CA

Chief of Staff Network's flagship SF gathering for operators and CoSes at fast-moving AI companies — solid networking if you operate behind a high-velocity founder.

Last Sip

Parting thoughts & a teaser for tomorrow

If today felt like the day enterprise AI's pecking order got reshuffled, that's because it kind of was. Anthropic is in the lead on paid business share, OpenAI is in defensive mode with mobile Codex and discount switching, Cerebras just printed the biggest tech IPO since Snowflake, Cisco is making its silicon thesis stick, Notion put the agent orchestration layer inside its workspace, and Meta moved AI privacy from a policy promise to a hardware guarantee. Underneath all of it, τ-bench keeps quietly insisting that the agents driving all this enthusiasm still fail roughly four in ten real tasks on the first try, and more than three in four when they have to succeed eight times running.

Tomorrow we'll watch CBRS's first full trading day for the real verdict on the IPO appetite, see whether Anthropic's metered programmatic pricing turns the developer grumbles into a measurable migration, and check whether Notion's External Agents API starts showing up in third-party benchmarks. See you in the next brew.