May 5, 2026

Agentic Brew Daily

Your daily shot of what's brewing in AI

Fresh Batch

Bold Shots

Today's biggest AI stories, no chaser

Pentagon excludes Anthropic from DoD AI vendor deal

The DoD signed agreements on May 1 with eight AI vendors — SpaceX, OpenAI, Google, NVIDIA, Reflection, Microsoft, AWS, Oracle — to deploy frontier AI on classified IL6/IL7 networks while pointedly leaving Anthropic out. Anthropic was frozen out after refusing to permit Pentagon use of Claude "for all lawful purposes," insisting on contractual red lines against domestic mass surveillance and autonomous lethal weapons. The standoff has already produced a Trump executive order banning Anthropic federally, a Hegseth Supply-Chain Risk designation 90 minutes later, and a split federal court record (preliminary injunction in N.D. Cal., loss at the D.C. Circuit).

Why it matters: This isn't really about safety branding — it's about the "all lawful purposes" clause that would license domestic surveillance use of Claude. Pentagon CTO Emil Michael's "irresponsible to be reliant on any one partner" rationale is rewriting DoD AI procurement into an explicit multi-vendor doctrine, and Anthropic's CFO is warning of "multiple billions" in 2026 revenue impact and >$500M public-sector ARR at risk.

Anthropic and OpenAI launch enterprise AI joint ventures

On May 4, Anthropic announced a $1.5B enterprise services firm with Blackstone, Hellman & Friedman, and Goldman Sachs, embedding Claude into mid-sized businesses across healthcare, manufacturing, financial services, retail, real estate, and infrastructure. The same day, OpenAI finalized The Deployment Company (DeployCo): a $10B JV anchored by ~$4B in PE capital from TPG, Brookfield, Bain, Advent and 15 other firms, with up to $1.5B from OpenAI and a 17.5% guaranteed annual return to PE backers over five years. Both ventures lift Palantir's forward-deployed engineer model wholesale, embedding small high-caliber teams inside customer ops instead of selling licenses.

Why it matters: The bottleneck for enterprise AI has shifted from model quality to deployment labor. By owning the FDE layer through PE-funded JVs, Anthropic and OpenAI capture implementation revenue that would otherwise leak to McKinsey, BCG, Accenture, and Deloitte. Anthropic's enterprise share already jumped from 24% to 40% in under a year, and DeployCo gets access to 2,000+ PE portfolio companies as a built-in deployment surface.

OpenAI o1 outperforms attending physicians on ER diagnosis cases

A peer-reviewed Science paper from Harvard Medical School and Beth Israel Deaconess (with Stanford collaborators), published April 30, ran o1-preview on 76 real ER triage cases. The model produced the "exact or very close" diagnosis 67.1% of the time vs. 55.3% and 50.0% for two attending internal medicine physicians on the same charts. In one case, o1 flagged a rare flesh-eating infection in a transplant patient 12-24 hours before the treating physician. Two blinded reviewers couldn't consistently distinguish AI vs. human assessments — though the comparator was internal medicine, not ER specialists, and the model is 19 months old.

Why it matters: The methodological leap is feeding raw EHR charts instead of curated vignettes — closer to how a real consult works. And it lands in a liability vacuum: there's no FDA pathway and no malpractice standard for general-purpose LLM-assisted diagnosis. The researchers explicitly call for prospective randomized trials before this hits the bedside.

Cerebras IPO opens with order books at ~$10B

Cerebras Systems is offering 28M Class A shares at $115-$125 to raise up to $3.5B at a ~$26.6B implied market cap, listing on Nasdaq as CBRS with pricing expected the week of May 11. The roadshow launched May 4 and order books are reportedly running near $10B against $3.5B of stock — heavy oversubscription. The OpenAI tangle is wild: OpenAI is simultaneously the biggest customer (750 MW / >$20B Master Relationship Agreement through 2028), one of the biggest creditors (~$1B loan secured by 33M+ share warrants), and an early insider via execs' angel checks.

Why it matters: At ~70x trailing sales (Nvidia trades around 23x), this is the cleanest litmus test for whether non-Nvidia AI silicon can attract real institutional capital. The OpenAI relationship is both the bull thesis and the governance risk — a clean inverse of why Cerebras's 2024 attempt died over G42 customer concentration.
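
As a quick check on the offering math above, here is a minimal Python sketch that recomputes the gross proceeds range and the order-book coverage from the quoted figures. The variable names are ours, and the ~$10B order book is a reported approximation, so treat the outputs as rough.

```python
# Figures quoted in the story above; everything below them is derived arithmetic.
SHARES_OFFERED = 28_000_000          # Class A shares on offer
PRICE_LOW, PRICE_HIGH = 115.0, 125.0  # marketed price range, USD per share
ORDER_BOOK_USD = 10e9                 # reported order book, approximate


def gross_proceeds(shares: int, price: float) -> float:
    """Gross proceeds in USD before fees, ignoring any overallotment option."""
    return shares * price


low = gross_proceeds(SHARES_OFFERED, PRICE_LOW)
high = gross_proceeds(SHARES_OFFERED, PRICE_HIGH)

# Coverage = demand in the book divided by stock on offer at the top of the range.
coverage = ORDER_BOOK_USD / high

print(f"Proceeds range: ${low / 1e9:.2f}B to ${high / 1e9:.2f}B")
print(f"Order-book coverage at the top of the range: {coverage:.1f}x")
```

At the top of the range the raise matches the $3.5B headline, and the book covers the deal a little under three times over.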

Sierra raises $950M Series E at $15.8B

Bret Taylor's Sierra closed a $950M Series E co-led by Tiger Global and Google's GV at a post-money valuation of up to $15.8B, roughly a 1.6x markup from the $10B round just eight months earlier. Sierra cleared $150M ARR in eight quarters, has Fortune 50 penetration above 40%, and counts Prudential, Cigna, BCBS, and Rocket Mortgage among named customers. Total raised is ~$1.585B across four rounds in 27 months since emerging from stealth in February 2024.

Why it matters: Sierra is the pure-play counterpoint to the Anthropic/OpenAI JV news — chasing the same enterprise wallet share without owning a frontier model. The headwind is direct: Salesforce Agentforce Contact Center went GA in February, and every customer-experience deal is now a head-to-head bake-off.

The Blend

Connecting the dots across sources

Owning deployment now beats owning the model

Across the news today, Anthropic and OpenAI both announced massive PE-backed enterprise services arms on the same day, May 4: Anthropic's $1.5B firm with Blackstone, Hellman & Friedman, and Goldman Sachs, and OpenAI's $10B Deployment Company. The timing suggests the labs see implementation labor, not model IP, as the next moat.
Sierra's $950M raise at a $15.8B valuation, with $150M ARR and 40%+ Fortune 50 penetration, lands in the same week and shows that pure-play deployment shops can also command frontier valuations even without owning a model.
On X today, Palantir's Q1 print of 104% YoY US revenue growth confirms that the forward-deployed engineer playbook the labs are now copying is already a public-market winner.
On GitHub, multi-agent orchestration repos like ruvnet/ruflo (+2,594 stars today) and the agency-agents project show the actual technical primitive these FDE teams plug into customers — agent swarms, not just chat APIs.

Anthropic's safety stance is repelling the Pentagon and seducing Wall Street simultaneously

Across the news today, the Pentagon's May 1 IL6/IL7 contracts went to eight vendors, with Anthropic pointedly excluded after it refused the "all lawful purposes" clause, putting more than $500M in public-sector ARR at risk.
Just three days later, Blackstone, Hellman & Friedman, and Goldman Sachs put $1.5B behind a Claude-led enterprise services firm aimed at regulated industries like healthcare and financial services — exactly the buyers who want contractual red lines on surveillance.
On Reddit, a thread about Anthropic refusing the Pentagon collected 47,071 upvotes, signaling that the safety brand is a popular-culture asset even as it loses federal procurement.
In the blogs today, Nathan Lambert's "distillation panic" piece reframes the policy environment Anthropic is navigating — the same posture costing it Pentagon contracts is precisely what European regulators and risk-averse banks are buying.

The AI capex bubble question is now everyone's question

Across the news today, Cerebras opened its roadshow at a ~$26.6B implied cap with order books at ~$10B against $3.5B of stock — investors are testing whether non-Nvidia silicon can clear ~70x sales while Nvidia trades at ~23x.
On X, hyperscaler capex forecasts of $805B in 2026 and $1.1T in 2027, paired with Goldman explicitly calling AI inflationary, frame the macro stakes behind the Cerebras pricing.
In the research community, "The AI Layoff Trap" paper drew 17,426 upvotes by arguing that firms over-automate in a Prisoner's Dilemma, directly tying compute spend to the labor-cost question driving the capex thesis.
At this week's events, the SFSBW session "The Reward Of AI Readiness: Cost Risk and Value" has practitioners working through the same cost/risk math investors are pricing into Cerebras's IPO book.

Slow Drip

Blog reads worth savoring

Analysis · ByteByteGo

Connecting LLMs to the Real World: Tool Use, Function Calling, and MCP

Traces the practical evolution from raw tool use to MCP so you can architect agent stacks with the right abstraction layer.

Analysis · Interconnects

The distillation panic

A respected ML researcher reframes the alarmist "distillation attacks" narrative shaping current frontier-model policy debates.

Tutorial · KDnuggets

7 Practical Ways to Reduce Claude Code Token Usage

Concrete, immediately applicable tactics to cut your Claude Code bill by attacking context bloat.

Tutorial · Towards AI

LangGraph Multi-Agent Architecture: Building a Self-Critiquing AI Debate System

Deep technical walkthrough of stateful graphs, Pydantic routing, and critique agents.

News · Anthropic Research

Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs

Anthropic's own announcement of yesterday's $1.5B capital-backed push into enterprise AI services.

News · Amazon Engineering

Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

Solves a real production headache by letting SageMaker auto-failover across instance types when GPU capacity is constrained.

Research · Towards AI

Measuring Behavioral Drift in LLMs: 22 Signals, 5 Dimensions, and the Calcification Effect

Borrows LIWC, OCEAN, and VAD from behavioral science to give engineers a reproducible numeric framework for detecting LLM personality drift.

Research · Towards AI

QuCo-RAG: Count What You Know, Retrieve What You Don't

A novel RAG strategy targeting the "confidently wrong" failure mode by deciding when to retrieve based on what the model already knows.

The Grind

Research papers, decoded

Economics of AI · 17,426 upvotes · arxiv

The AI Layoff Trap

When firms automate jobs individually, they capture 100% of the cost savings but feel only a small fraction of the resulting demand destruction — the rest spills onto competitors. The result is a Prisoner's Dilemma where rational private decisions produce collectively disastrous over-automation, and the authors argue a Pigouvian automation tax (not UBI or capital taxes) is the only fully corrective policy. Highly relevant to today's discussion of AI agents replacing knowledge work.
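
To make the incentive structure concrete, here is a toy Python sketch of the dilemma. It is not the paper's model: the linear payoffs, the equal sharing of demand loss across firms, and every parameter value (`N_FIRMS`, `SAVING`, `DEMAND_LOSS`) are illustrative assumptions, and the tax is simply set equal to the demand loss a firm pushes onto its competitors.

```python
# Toy parameters (assumptions for illustration, not taken from the paper).
N_FIRMS = 10        # firms sharing one pool of consumer demand
SAVING = 1.0        # cost saving a firm keeps in full when it automates
DEMAND_LOSS = 4.0   # total demand destroyed by one firm automating


def firm_payoff(automates: bool, n_automating: int, tax: float = 0.0) -> float:
    """Payoff change for one firm relative to a world with no automation.

    n_automating counts every automating firm, including this one if it automates.
    """
    own_gain = (SAVING - tax) if automates else 0.0
    # Each automating firm destroys DEMAND_LOSS, spread equally over all firms.
    shared_loss = n_automating * DEMAND_LOSS / N_FIRMS
    return own_gain - shared_loss


def prefers_to_automate(n_others_automating: int, tax: float = 0.0) -> bool:
    """True if automating beats abstaining, given how many rivals automate."""
    stay = firm_payoff(False, n_others_automating, tax)
    go = firm_payoff(True, n_others_automating + 1, tax)
    return go > stay


# Without a tax, automating is the better move whatever rivals do...
dominant = all(prefers_to_automate(k) for k in range(N_FIRMS))
# ...yet if every firm follows that logic, each ends up worse off than at baseline.
everyone_automates = firm_payoff(True, N_FIRMS)

# A tax equal to the loss imposed on the other firms removes the private incentive.
pigouvian_tax = DEMAND_LOSS * (N_FIRMS - 1) / N_FIRMS
still_tempted = any(prefers_to_automate(k, pigouvian_tax) for k in range(N_FIRMS))

print(dominant, everyone_automates, pigouvian_tax, still_tempted)
```

The sketch ignores what happens to the tax revenue and treats all firms as identical, so it only illustrates the direction of the argument, not its magnitude.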

Multimodal Reasoning · 123 upvotes · alphaxiv

Thinking with Visual Primitives

DeepSeek-AI bakes spatial coordinates — points [x,y] and bounding boxes — directly into the chain-of-thought of multimodal models, treating them as first-class reasoning tokens. Built on DeepSeek-V4-Flash with GRPO-based RL, the model averages 77.2% across seven benchmarks (beating Gemini-3-Flash and GPT-5.4) and is especially strong on topological tasks like maze navigation (66.9% vs GPT-5.4's 50.6%).

Multi-Agent Systems · 25 upvotes · huggingface

Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

A hierarchical multi-agent system where an upper-level Orchestrator decomposes complex queries into subtasks and lower-level Workers execute them in parallel, coordinating through a shared "Workboard" memory. It improves without fine-tuning the underlying LLM — the agents evolve human-readable skill files over time, achieving a 7.5x improvement on the WideSearch benchmark. A blueprint for scaling deep research agents via memory and coordination rather than bigger models.
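
The orchestrator, worker, and shared-memory layout described above can be sketched in a few lines of Python. This is a rough illustration under our own assumptions, not the paper's implementation: `Workboard`, `decompose`, `worker`, and `run` are hypothetical names, the workers return stub strings instead of calling an LLM or the web, and the evolving skill files are left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class Workboard:
    """Thread-safe shared memory that workers post results to (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._lock = Lock()
        self._rows: dict[str, str] = {}

    def post(self, subtask: str, result: str) -> None:
        with self._lock:
            self._rows[subtask] = result

    def snapshot(self) -> dict[str, str]:
        with self._lock:
            return dict(self._rows)


def decompose(query: str) -> list[str]:
    """Hypothetical orchestrator step: one subtask per comma-separated entity."""
    topic, _, entities = query.partition(":")
    return [f"{topic.strip()} for {e.strip()}" for e in entities.split(",") if e.strip()]


def worker(subtask: str, board: Workboard) -> None:
    """Hypothetical worker: a real one would search and extract; this one echoes."""
    board.post(subtask, f"stub result for '{subtask}'")


def run(query: str, max_workers: int = 4) -> dict[str, str]:
    """Upper level decomposes the query, lower level runs the subtasks in parallel."""
    board = Workboard()
    subtasks = decompose(query)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every worker and re-raises any exception one of them hit.
        list(pool.map(lambda task: worker(task, board), subtasks))
    return board.snapshot()


table = run("founding year: OpenAI, Anthropic, Cerebras")
for subtask, result in table.items():
    print(subtask, "->", result)
```

Keeping coordination in a shared board rather than in worker-to-worker messages is what lets the worker count scale without changing the orchestrator.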

On Tap

What's trending in the builder community

ruvnet/ruflo

TypeScript, +2,594 stars today (40,697 total). Multi-agent orchestration platform for Claude with self-learning swarm intelligence, RAG integration, and native Claude Code/Codex integration.

TauricResearch/TradingAgents

Python, +2,181 stars today (66,802 total). Multi-agent LLM framework for financial trading.

Hmbown/DeepSeek-TUI

Rust, +1,277 stars today (3,658 total). A terminal coding agent built specifically for DeepSeek models.

soxoj/maigret

Python, +1,116 stars today (24,567 total). OSINT tool that builds a dossier on a person from 3,000+ sites by username.

msitarzewski/agency-agents

Shell, +828 stars today (92,306 total). A complete "AI agency" — curated cast of specialized agents each with personality and processes.

GPT-5.5 VERIFIED Opus 4.7: A Pi Coding Agent That REVIEWS Like YOU

IndyDevDan introduces a "Verifier Agent" pattern: one AI builds, another autonomously reviews via atomic claim validation. 3,782 views.

Why AI's '12-Hour' Task Number Is a Mirage — Beth Barnes & David Rein

Machine Learning Street Talk hosts METR researchers Beth Barnes and David Rein deconstructing the popular task time horizon metric. 1,686 views.

AI Deleted Everything in 9 Seconds… Again

Olive Badger's postmortem of the PocketOS incident where an AI coding agent wiped a production database and backups. 9,511 views.

Howard Marks: AI, Debt vs Equity & The Next 40 Years Of Investing | Nikhil Kamath | People by WTF

Nikhil Kamath sits with Howard Marks on AI, debt vs equity, and the next 40 years of investing. 11,426 views.

Claude Just Changed the Real Estate Market Forever! (Tutorial)

Zubair Trabzada's AI Workshop builds a Claude Code tool that runs five parallel agents. 1,271 views.

AI Goes Macro

ServiceNow projected it would generate $30 billion of subscription revenue in 2030; Goldman calls AI inflationary; OpenAI costs quadruple. Engagement 9,761.

Tesla FSD Crosses 10 Billion Miles

@SawyerMerritt and others highlight Tesla FSD passing 10 billion miles driven. Engagement 6,877.

Musk vs. OpenAI Trial

Live audio of the Musk vs. OpenAI trial begins Mon May 4. Engagement 6,697.

Hyperscaler AI Capex $805B in 2026 / $1.1T in 2027

@tengyanAI on Morgan Stanley's hyperscaler capex forecasts. Engagement 3,642.

Palantir Q1 2026 Crushes

@PalantirTech: U.S. revenue +104% YoY; full-year growth guidance raised to 71%. Engagement 2,508.

find-skills

1,300,000 installs. Discovery skill that helps agents find other skills.

vercel-react-best-practices

369,700 installs. Curated React best practices skill from Vercel Labs.

frontend-design

366,500 installs. Anthropic's frontend design skill for shipping polished UIs.

web-design-guidelines

294,700 installs. Vercel Labs web design guideline skill.

microsoft-foundry

292,600 installs. Microsoft Azure Foundry skill for the Azure AI stack.

Roast Calendar

Upcoming events & gatherings

90/30 Club (ML reading) #51: Recursive Language Models (RLMs): Scaling Beyond Context Windows | Mon, May 4 (7pm PT) | San Francisco, CA

Devtools founders coffee and waffles: Agent Experience (AX) | Tue, May 5 (9am PT) | San Francisco, CA

Make Your Own Financial Agent for Investors | Mon, May 4 (6:30pm PT) | San Francisco (Frontier Tower)

Distilling Lab for Local Power - Open Registration | Mon, May 4 (7pm PT) | San Francisco (Frontier Tower)

The Reward Of AI Readiness: Cost Risk and Value | Tue, May 5 (8:30am PT) | San Francisco, CA

Construction AI Networking Night | ENR FutureTech | Mon, May 4 (7pm PT) | San Francisco, CA

Last Sip

Parting thoughts & a teaser for tomorrow

The theme of the day is who owns the customer. Anthropic and OpenAI both decided yesterday that they do — even if it means picking a fight with the consultancies they were partnering with last quarter. The Pentagon decided Anthropic doesn't get to own the customer — at least not in the federal classified channel. And Cerebras is about to find out whether the public market thinks anyone but Nvidia gets to own the silicon.

Tomorrow we'll be watching: Cerebras IPO pricing chatter as the roadshow heats up, any DeployCo customer announcements (the PE backers will want flagship logos fast), and whether Anthropic's preliminary injunction in N.D. Cal. survives the inevitable appeal. Drink up.