Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
Bold Shots
Today's biggest AI stories, no chaser
The US government's stance on Anthropic flipped into open contradiction this week. The Pentagon awarded classified-network AI work on May 1 to OpenAI, Google, Microsoft, AWS, Nvidia, xAI, and Reflection - but explicitly excluded Anthropic, with CTO Emil Michael citing it as a 'supply chain risk.' At the same time, the NSA is reportedly already running Mythos, Treasury has requested access, and the White House is drafting a Mythos-specific carve-out from its own ban. Anthropic also shipped Claude Security in public beta, using Opus 4.7 to scan codebases and propose fixes with confidence ratings. The capability numbers behind the political drama: Mythos hit 73% on UK AISI expert-level cyber tasks (up from 0% a year ago), found 181 working Firefox exploits versus 2 for Opus 4.6, and posts an 83% first-attempt exploit success rate.
Why it matters: Federal procurement is now running two operating systems - one that bans Anthropic and one that quietly carves Mythos out. If that carve-out becomes precedent, strategically valuable frontier models can route around standard supply-chain bans entirely. For every other AI lab, the lesson is simpler: published usage policies are now procurement risk.
On the Q2 FY2026 earnings call, Tim Cook had to admit Mac mini, Mac Studio, and the new MacBook Neo will be supply-constrained for months because AI-driven demand is outrunning Apple's own forecasts. Mac revenue hit $8.4B (+6% YoY) in a record $111.2B quarter. Apple's online store is now quoting 4-5 months on higher-RAM Mac mini and Mac Studio configs; the 512GB Mac Studio was pulled outright. The MacBook Neo ($599, $499 for education) has been supply-constrained from launch. Cook pointed to TSMC 3nm SoC capacity as the binding constraint for now - but warned the DRAM crunch (contract prices up 80-90% this quarter, data centers absorbing ~70% of all 2026 memory chips) will hit harder from June onward. Apple even called out OpenClaw developers by name as one of the demand drivers.
Why it matters: Apple stumbled into being the dominant non-NVIDIA local-AI hardware platform faster than its own forecasts could absorb - and now the M-series is competing for the same wafers as the iPhone 17. The shortage is a leading indicator that AI's appetite for memory is bleeding into consumer hardware economics across the entire PC industry. IDC has already cut its 2026 PC shipment forecast by 11.3%.
OpenAI shipped 'Codex for (almost) everything' - turning what used to be a coding agent into a general-purpose desktop agent that drives its own cursor across macOS apps. There are role-based onboarding flows for research, planning, docs, slides, and spreadsheets, plus 90+ plugins covering Microsoft, Google Workspace, Slack, Notion, Atlassian, GitHub, and GitLab. Multiple agents run in parallel, resume after pauses, and schedule work days out. Computer Use Agent is 42% faster post-update, with a further ~20% gain on browser, slide, and sheet tasks. 3M+ devs already use Codex weekly. The catch: macOS-only at launch, and unavailable in the EU, UK, and Switzerland.
Why it matters: OpenAI is openly contesting Anthropic's Claude Cowork 'Digital Colleague' positioning. swyx's framing on Latent Space pretty much nails the new market split: Codex for knowledge work, Claude for creative work. The fact that OpenAI shipped an intentionally partial product (no EU, macOS-only) tells you they care more about claiming the category than gating the rollout.
Meta closed its acquisition of Assured Robot Intelligence on May 1, pulling in co-founders Xiaolong Wang (ex-Nvidia, UC San Diego) and Lerrel Pinto (ex-NYU) into Meta Superintelligence Labs. ARI brings VLA foundation models for whole-body humanoid control plus its e-Flesh tactile sensor for human-level dexterity. They join Meta Robotics Studio, the Reality Labs hardware group led by former Cruise CEO Marc Whitten. The strategic frame? Meta does not want to make humanoids. It wants to be the OS that other manufacturers run.
Why it matters: This crystallizes a three-tier humanoid market: vertically integrated makers (Tesla, 1X), platform/OS providers (Meta), and component suppliers. The labor market angle is wild too - Pinto's previous startup, Fauna Robotics, was acquired by Amazon two months ago. Hyperscaler M&A is now the default exit and acquihire timelines are measured in quarters, not years.
OpenAI launched Advanced Account Security on April 30 - opt-in passkey/hardware-key login that disables passwords entirely for ChatGPT and Codex. Email and SMS recovery are gone, replaced by backup passkeys and recovery keys. Sessions are shorter, login alerts are sharper. AAS-enrolled accounts get a hidden bonus: conversations are automatically excluded from training, no manual opt-out required. OpenAI also partnered with Yubico on co-branded YubiKey C NFC and Nano keys at consumer-friendly pricing ($68 for a two-pack vs. $126 retail). And starting June 1, AAS becomes mandatory for individual Trusted Access for Cyber members unless their org attests to phishing-resistant SSO.
Why it matters: OpenAI is treating AI accounts like online banking. The bundled training opt-out also folds two consumer concerns - account security and data use - into one decision, signaling that the cohort it most wants to protect from external attackers is also the cohort whose conversations it least wants to ingest. When a chatbot ships YubiKey co-brand SKUs, the chatbot is no longer a toy.
The Blend
Connecting the dots across sources
Karpathy named the discipline; the entire stack shipped it the same week
- Karpathy at Sequoia AI Ascent retired 'vibe coding' with the line 'vibe coding raised the floor; agentic engineering raises the ceiling' - and the framing exploded across X with multiple 5K+ engagement tweets in a single day.
- On GitHub trending the same week, mattpocock/skills (+3,649 stars today, 51,757 total), obra/superpowers (175K stars), warpdotdev/warp (+3,403), and 1jehuang/jcode landed in the top 10 - all explicitly framed as agent harnesses or skills frameworks.
- On the skills marketplace, find-skills hit 1.3M installs as the de facto entry point, while vercel-react-best-practices (363K) and Anthropic's frontend-design (357K) each cleared 350K installs. Clawhub's #1 and #2 are literally Self-Improving Agent and Skill Vetter.
- In the research, Recursive Multi-Agent Systems on AlphaXiv argues for replacing English message-passing with latent vector exchanges, claiming 20.2% accuracy gains and 75.6% fewer tokens - the academic version of Karpathy's loop.
Anthropic is being squeezed politically and commercially in the same 24 hours
- Across the news today, the Pentagon signed classified-network AI deals with seven labs (OpenAI, Google, Microsoft, AWS, Nvidia, xAI, Reflection) and explicitly excluded Anthropic, while the White House blocked Anthropic from expanding Mythos access from ~50 to 120 organizations.
- On X, the AI Security Institute confirmed GPT-5.5 is 'the second model to complete a multi-step cyber-attack simulation end-to-end' (1.4M views) - directly pricing in Mythos's loss of unique cyber-capability status the same week.
- In the blog coverage, Simon Willison published 'Our evaluation of OpenAI's GPT-5.5 cyber capabilities' calling it 'comparable to Mythos, but unlike Mythos it's generally available right now' - and on the same day, OpenAI confirmed Codex revenue doubled in under seven days.
- In the research, 'Safety Drift After Fine-Tuning' and 'FlashRT: Red-Teaming for Prompt Injection' provide the academic underpinning for the cyber-capability story driving the procurement drama.
Every May 2 builder event in the Bay Area is an agent event
- At this week's events, Builders of Tomorrow's AI Super Hackathon pulled 655+ interested attendees in Mountain View - explicitly billed as building VC-backable AI agent businesses.
- In the same window, GMI Cloud's Build Matcha & Code, Gumloop x Toast's 'Anti-Busy Club' agent build, and two separate OpenClaw events (Open Build in SF, OpenClaw + Hermes Seminar in Milpitas) all run the same day.
- On Product Hunt, Wonder launched as 'the AI design agent that works on your canvas' with MCP connectors to Cursor and Claude Code - the productized form of what these workshops are teaching.
- In the blog coverage, The Neuron AI's 'The 4-tool agent quietly powering OpenClaw' landed on May 1 - the exact same brand showing up at two live events the next day.
Slow Drip
Blog reads worth savoring
A sharp SemiAnalysis read on how value in AI is consolidating toward model labs, with TSMC and Vera Rubin VR NVL72 as anchors for the shift. Highest engagement in today's batch.
Meticulous deep-dive on how a silent Databricks normalization step creates a ~270x discrepancy in similarity scores. A footgun every vector-search shipper needs to know about.
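The underlying failure mode is easy to reproduce: when one stage of a pipeline L2-normalizes embeddings and another silently does not, raw dot products and cosine similarities diverge by roughly the product of the vector norms. A minimal sketch with synthetic vectors (illustrative of the class of bug, not the actual Databricks pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=384)                    # synthetic 384-dim embedding
doc = query + rng.normal(scale=0.1, size=384)   # a near-duplicate document

def l2_normalize(v):
    return v / np.linalg.norm(v)

raw_score = float(query @ doc)                            # dot product on raw vectors
cosine = float(l2_normalize(query) @ l2_normalize(doc))   # bounded to [-1, 1]

# With unnormalized Gaussian vectors the two scores differ by roughly
# ||query|| * ||doc|| -- a factor of several hundred at this dimensionality.
print(f"raw={raw_score:.1f} cosine={cosine:.4f} ratio={raw_score / cosine:.0f}x")
```

The scores rank the same documents, but any threshold or cross-system comparison silently breaks - exactly the shape of the discrepancy the post documents.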
End-to-end reference covering CLAUDE.md, hooks, GitHub Actions, MCP, sub-agents, and FastMCP - the agentic-engineering playbook in one post.
Practical RLAIF walkthrough on Amazon Nova showing how LLM judges actually drive fine-tuning loops in production.
One-shot catch-up on Musk vs Altman in court, Anthropic's reported $900B raise, NSA testing Mythos, OpenAI hitting 10GW capacity years ahead of schedule.
Simon flags that Codex now ships its own Ralph-loop-style /goal command that runs until the goal is met or the token budget runs out - a meaningful step toward more autonomous coding agents.
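The pattern behind a command like /goal is a simple budgeted loop. A hypothetical harness - names and structure invented for illustration, not OpenAI's implementation:

```python
def goal_loop(agent_step, goal_met, token_budget):
    """Invoke the agent repeatedly until the goal check passes or the budget is spent."""
    spent = 0
    last = None
    while spent < token_budget:
        last, tokens_used = agent_step(last)   # one agent turn: (result, token cost)
        spent += tokens_used
        if goal_met(last):
            return last, spent                 # goal reached within budget
    return None, spent                         # budget exhausted before the goal

# Stub agent: "succeeds" on its third turn, costing 1000 tokens per turn.
turns = {"n": 0}
def fake_step(_prev):
    turns["n"] += 1
    return ("done" if turns["n"] >= 3 else "working"), 1000

result, spent = goal_loop(fake_step, lambda r: r == "done", token_budget=10_000)
print(result, spent)   # -> done 3000
```

The interesting design choice is that termination is two-sided: the agent can declare success, but the budget caps runaway loops - which is what makes this safer to run unattended than a bare while-true retry.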
A solo-built, from-scratch transformer engine in C - catnip for anyone who wants to understand the stack below PyTorch.
The Grind
Research papers, decoded
A viral arXiv paper arguing that companies racing to replace knowledge workers with AI may be walking into a trap: layoffs gut the in-house expertise needed to validate, supervise, and course-correct the very models they depend on, leaving firms structurally dependent on vendors with no internal fallback. With 17K votes on X, this is clearly resonating with a workforce watching the AI hiring/firing cycle play out in real time. Treat this as the macro backdrop to every 'agent replaces team' pitch.
A clever method to reverse-engineer the secret parameter counts of closed models like GPT-5.5 and Claude Haiku by measuring how much hard-to-compress factual knowledge they hold. Using a 1,400-question benchmark across seven obscurity tiers and calibrating against 89 open-weight models (R-squared = 0.917), the author estimates ~65B for Claude Haiku and ~9.7T for GPT-5.5. Big takeaway: factual knowledge does NOT obey the 'Densing Law' - unlike reasoning, you still need raw parameters to memorize more facts.
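The calibration step amounts to a log-linear regression: fit parameter count against knowledge score across open-weight models, then read the closed model's estimate off the fitted line. A toy version with invented numbers (the paper's actual benchmark and calibration are far more elaborate):

```python
import numpy as np

# Invented calibration data: open-weight model sizes and their
# hypothetical fraction of a factual-knowledge benchmark answered.
open_params = np.array([1e9, 7e9, 13e9, 70e9, 180e9, 400e9])
scores = np.array([0.21, 0.38, 0.45, 0.61, 0.70, 0.78])

# Fit log10(params) as a linear function of score across the open models.
slope, intercept = np.polyfit(scores, np.log10(open_params), 1)

def estimate_params(score):
    """Invert the fit to estimate a closed model's parameter count."""
    return 10 ** (intercept + slope * score)

closed_estimate = estimate_params(0.55)
print(f"score 0.55 -> ~{closed_estimate:.1e} parameters")
```

The working assumption - and the paper's headline claim - is that this relationship holds because factual recall, unlike reasoning, does not compress: more memorized facts require more raw parameters.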
Replaces text-based message passing in multi-agent LLM systems with 'latent thoughts' - direct vector exchanges between agents - treating the whole system as a single recursive computation. Reports up to 20.2% accuracy gains, 2.4x faster inference, and 75.6% fewer tokens across math, science, medicine, and code benchmarks. A serious challenge to the prevailing 'agents talk in English' orthodoxy as token costs mount in production.
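The intuition: forcing agent A's state through a discrete token bottleneck discards information that a direct vector handoff keeps. A toy linear-agent sketch (purely illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
d, vocab = 64, 16
agent_a = rng.normal(size=(d, d)) / np.sqrt(d)   # "agent A" as a linear map
unembed = rng.normal(size=(vocab, d))            # hidden state -> token logits
embed = rng.normal(size=(d, vocab))              # token -> hidden state

x = rng.normal(size=d)
h = agent_a @ x                                  # agent A's internal state

# Text-style handoff: collapse h to a single discrete token, then re-embed.
token = np.zeros(vocab)
token[np.argmax(unembed @ h)] = 1.0
text_handoff = embed @ token

# Latent handoff: pass the hidden vector to the next agent directly.
latent_handoff = h

# The discrete bottleneck throws away almost all of agent A's state.
loss = np.linalg.norm(text_handoff - h)
print(f"state lost in the token bottleneck: {loss:.1f}")
```

Real systems pass many tokens, not one, so the loss is less extreme - but the token-economics argument is the same: every decode-and-re-embed round trip is both lossy and expensive.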
Argues the traditional PDF-as-paper format imposes a 'storytelling tax' and 'engineering tax' on research agents - only 45.4% of reproduction requirements are fully specified in typical papers. Proposes ARA, an executable artifact with cognitive/code/exploration-trace/evidence layers, hitting 93.7% knowledge extraction accuracy (vs. 72.4% baseline) and 64.4% reproduction success.
First systematic study of 'reasoning conflicts' - what happens when you instruct an LLM to use a reasoning pattern (induction/deduction/abduction) that doesn't fit the task. Models consistently prioritize sensibility over compliance, quietly reverting to the reasoning style they think the task wants. Reasoning types are linearly encoded in middle-to-late layers, opening the door to activation-level steering.
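If reasoning styles really are linearly encoded, the standard steering recipe applies: estimate a direction as a difference of mean activations, then add it to the hidden state at inference time. A minimal difference-of-means sketch on synthetic activations (illustrative only; the paper's probing setup is more involved):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 128
# Synthetic mid-layer activations, shifted apart to mimic a linearly
# encoded reasoning style (inductive vs deductive prompts).
inductive_acts = rng.normal(size=(200, d)) + 0.5
deductive_acts = rng.normal(size=(200, d)) - 0.5

# Difference-of-means steering vector pointing toward "inductive".
steer = inductive_acts.mean(axis=0) - deductive_acts.mean(axis=0)

h = rng.normal(size=d)            # a hidden state mid-forward-pass
alpha = 2.0
h_steered = h + alpha * steer     # nudge the state toward the inductive style

print(f"projection before: {h @ steer:.1f}, after: {h_steered @ steer:.1f}")
```

That the direction lives in middle-to-late layers is what makes this practical: you can intervene once, at one layer, rather than retraining or re-prompting.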
On Tap
What's trending in the builder community
Matt Pocock's curated .claude/skills dump went viral as devs race to copy a real engineer's Claude Code workflow rather than build skills from scratch. +3,649 stars today (51,757 total).
Warp's pivot from terminal to full agentic IDE is getting fresh attention as the Codex/Claude Code competition heats up. +3,403 stars today (51,020 total).
Multi-agent LLM trading framework keeps climbing on retail-investor agent interest. +2,115 stars today (59,139 total).
Still the de facto skills/agent methodology repo, now near 175K stars. +1,098 stars today (175,338 total).
New Rust-based coding agent harness; yet another Codex/Claude Code alternative shipping in Rust. +404 stars today.
Browserbase's official Claude Agent SDK web-browsing skill set, riding the skills-ecosystem wave. +334 stars today.
Opinionated AI motion-design studio that takes one prompt and outputs a finished launch video in 10 minutes.
End-to-end creator stack: trend discovery, AI scripts, teleprompter with auto jump cuts, B-roll, subtitles, direct social publishing.
WYSIWYG, git-synced, live-collab docs editor where humans and agents share the canvas.
Canvas-native design agent with MCP connectors straight into Cursor and Claude Code.
Two flavors of research agent now in the Gemini API; both speak MCP and emit native charts.
Yanasa TV investigative report on how Eastern Washington's 'Project Tree' is reallocating senior water rights and 70,000 acres of public land to feed Amazon's data-center buildout.
Rod Miller's independent run of 250 safety tests across Claude Opus 4.7/4.6, GPT-5.5/5.4, Gemini 3.1 Pro, and Grok 4.20 - one frontier model scored zero where it mattered most, and the new OpenAI release regressed.
Brockman says OpenAI is '80% of the way to AGI,' agentic tools went from writing 20% of code in December to 80% now, and previews orgs running 100,000 agents at once.
Hannah Fry and Sourcery's Brendan Maginnis hand a real bank card to an autonomous agent. It opens a novelty mug shop and leaks passwords to a stranger. Required viewing if you build agent guardrails.
Half of Waymo's 20M autonomous rides happened in just the last seven months; the multimodal Waymo Foundation Model now powers driver, simulator, and critic.
OpenAI publicly posted that GPT-5.5 API revenue is growing 2x faster than any prior release and Codex revenue doubled in under seven days - the headline number behind the 'Claude Code moment' framing.
Altman tried to defuse the Codex-vs-Claude-Code comparison wars - 14K likes / 765K views.
@neil_xbt's recap thread of Karpathy's Sequoia AI Ascent talk became the conceptual frame everyone is now arguing about.
Viral thread on the AI infrastructure crunch - GE/Siemens/Mitsubishi order books stretch to 2029, prices nearly tripled since 2019. 630K views.
Meta-skill that helps agents discover and install other skills from the open ecosystem - the de facto entry point. 1.3M installs.
Anthropic's official skill for production-grade frontend interfaces that reject 'generic AI aesthetics.' 357.3K installs.
Captures errors, corrections, and learnings to enable continuous improvement. Top of Clawhub at 416,918 downloads.
Security-first vetting before installing skills from Clawhub/GitHub - direct response to the supply-chain risk that's growing as the skills ecosystem explodes. 226,745 downloads.
Roast Calendar
Upcoming events & gatherings
Last Sip
Parting thoughts & a teaser for tomorrow
The weirdest pattern of the day isn't Karpathy's reframe or Codex doubling. It's that the Pentagon's 'too dangerous' model is the same one the NSA already runs in production. Frontier capability is now leaking through procurement bans because the agencies writing those bans can't afford to enforce them. Watch what the White House does with that Mythos carve-out - if it lands, it becomes the template for every future model the government can't quite ban.
Meanwhile, if you have a free Saturday in the Bay, every single non-blockchain meetup is teaching you to build an agent. That is not a coincidence either. See you tomorrow.