Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
- Anthropic is running a two-front strategy: endorsing Vatican-led AI oversight on the public stage while shipping a model that surfaced 6,202 high- or critical-severity vulnerabilities faster than maintainers can patch them.
- Micron's $1T cap, Goldman's $650B AI-spend tally, and Huawei's EUV-free Tau Scaling roadmap show the AI hardware story has split into a US memory-oligopoly trade and a Chinese architectural workaround, not a single race.
- The agent economy is becoming infrastructure-grade: x402 settled $73M across 176M agent transactions, AWS shipped Bedrock AgentCore Payments, and solo operators are already running $18.8K/month businesses on seven Claude Code agents.
Bold Shots
Today's biggest AI stories, no chaser
Pope Leo XIV released his first encyclical on Monday — Magnifica Humanitas — a 42,300-word, 245-paragraph document that retires classical just-war theory for the AI era and forbids delegating lethal or irreversible decisions to autonomous systems. He presented it alongside Anthropic co-founder Chris Olah, the first time a frontier-AI executive has shared that stage at an encyclical launch. The Washington Post framed the optics bluntly: Anthropic publicly aligning with the Holy See over the White House. A LessWrong stylometric pass estimated 10–15% of the final text was AI-written.
Why it matters: The encyclical converts 'disarm AI' into a concrete checklist labs and militaries can be measured against — no autonomous lethal decisions, mandatory traceability, mass unemployment treated as a moral failure. Olah's presence makes it harder to dismiss as outside criticism.
AWS launched Amazon Bedrock AgentCore Payments in preview on May 7, built with Coinbase (x402 + USDC) and Stripe (Privy wallets) — the first hyperscaler-managed payment service for autonomous agents. A Keyrock / Coinbase / Tempo / Virtuals report documented $73M settled across 176M AI-agent transactions between May 2025 and April 2026, with 98.6% of volume in USDC. The x402 Foundation now sits under Linux Foundation governance with Stripe, Shopify, Solana, Visa, and Mastercard at the table.
Why it matters: 76% of agent transactions fall below Visa's $0.30 fixed-fee floor, while Layer-2 stablecoin settlement on Base costs ~$0.0001 — a 3,000x gap. Cards structurally cannot serve sub-dollar machine commerce, and the rails are landing well ahead of any regulatory framework for machine-to-machine liability.
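The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted in this item; the variable names are illustrative, and the average ticket is a simple mean, so it says nothing about how individual transaction sizes are distributed.

```python
# Figures quoted in the item above; variable names are illustrative.
settled_usd = 73_000_000      # total settled volume (USD)
transactions = 176_000_000    # AI-agent transactions
card_fixed_fee = 0.30         # Visa fixed-fee floor (USD)
l2_settlement_cost = 0.0001   # approximate settlement cost on Base (USD)

avg_ticket = settled_usd / transactions            # mean value per transaction
fee_gap = card_fixed_fee / l2_settlement_cost      # card floor vs. L2 cost
card_fee_share = card_fixed_fee / avg_ticket       # fixed fee as share of mean ticket

print(f"average ticket: ${avg_ticket:.3f}")
print(f"fee gap: {fee_gap:,.0f}x")
print(f"fixed card fee vs. average ticket: {card_fee_share:.0%}")
```

On these numbers the mean transaction is about 41 cents, so a 30-cent fixed fee alone would consume roughly 72% of it before any percentage fee is applied.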
At IEEE ISCAS 2026 in Shanghai on May 25, Huawei's He Tingbo unveiled the Tau scaling law, a time-domain optimization framework pitched as the successor to geometric transistor shrinkage. The flagship technique, LogicFolding, vertically stacks logic into a dual-layer architecture paired with a UnifiedBus interconnect. Huawei says it has quietly mass-produced 381 chips on this methodology. LogicFolding is slated to debut in fall 2026 Kirin silicon and reach Ascend AI processors by 2030, with 1.4nm-equivalent density targeted for 2031, all without EUV. Test silicon shows a 55% density gain and a 41% power-efficiency improvement.
Why it matters: Tau reframes 'progress' away from nanometer shrinks toward end-to-end cycle time, giving China a credible non-EUV path to leading-edge AI compute. If the roadmap lands inside the disclosed bounds, the scarcity premium underpinning Nvidia's valuation becomes a debate rather than a default.
Micron surpassed $1 trillion in market cap on May 26, closing up 19.29% at $895.88 — its best single session since November 2011. The rally has added roughly $650B in market value since the March 30 low (~180% appreciation). UBS more than tripled its target from $535 to $1,625, the Street high among 46 covering analysts. CEO Sanjay Mehrotra says calendar-2026 HBM capacity is sold out and Micron can meet only 50–66% of customer demand.
Why it matters: UBS's thesis is that AI has structurally re-rated the entire memory complex — 3-to-5-year contracts with AWS, Azure, Google Cloud, Meta, Oracle and Nvidia at partially fixed pricing turn DRAM from a spot-priced commodity into something closer to an infrastructure franchise.
Anthropic unveiled Claude Mythos Preview on April 7, a frontier model capable of autonomously discovering and exploiting zero-days across major OSes and browsers. Project Glasswing gave ~50 partners (AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, JPMorgan, NVIDIA, Palo Alto Networks) restricted access backed by $100M in usage credits. Mythos scanned 1,000+ OSS projects and surfaced 23,019 potential vulnerabilities — 6,202 high or critical. Independent validation on 1,752 findings confirmed >90% as true positives. Only 75 have been patched. BNP Paribas extended its Mistral partnership three years to build a sovereign European Mythos hedge.
Why it matters: Anthropic itself wrote that 'no company — including Anthropic — has developed safeguards strong enough to prevent such models from being misused.' The 75-of-1,100 patch ratio is the real story: detection is solved, maintainer bandwidth is not.
Slow Drip
Blog reads worth savoring
Maps the four-phase transition from 48V to 800VDC datacenter power, why solid-state transformers hit 98.5% efficiency, and which suppliers win the projected $13B TAM by 2030.
Why NVL72-class racks are now split across cabinets and how 2W-per-end active copper (no DSP) wins the 3-meter inter-rack gap optics can't justify and passive copper can't reach.
Walks through C-SPANN's hierarchical K-means tree stored as ordinary table rows, plus RaBitQ single-bit quantization that cuts vector size 94% while keeping accuracy via reranking.
Makes the case that the $40B/yr compliance market is AI's most overlooked enterprise wedge, with VLM document parsing turning KYC and SAR filing from cost centers into revenue accelerants.
Hands-on walkthrough of Google's single-API-call managed agents: Linux sandboxes, multi-turn state via environment_id, mounting Git/GCS data, and locking outbound traffic with an egress proxy.
The Grind
Research papers, decoded
Apple built controllable puzzle environments to stress-test large reasoning models, or LRMs (o3-mini, DeepSeek-R1, Claude 3.7 Sonnet Thinking), and found three regimes: on easy problems plain LLMs win, on medium ones thinking helps, and on hard ones every LRM falls off a cliff, even when handed the exact algorithm. Models actually reduce their reasoning effort as problems get harder, despite having token budget left. Don't pay the reasoning-token tax on simple tasks, and measure the cliff threshold for your domain before shipping.
Standard RL post-training like GRPO optimizes a scalar reward and quietly causes diversity collapse: sampling a mode-collapsed model 100 times barely beats sampling it once. VPO is a drop-in GRPO replacement that treats rewards as vectors and trains the policy to cover the Pareto frontier via Dirichlet-sampled scalarizations. Across four tasks it matches or beats scalar baselines on pass@k and best@k, and the gap widens as the sample count grows. Swap GRPO for VPO if you do best-of-N at inference.
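To make "Dirichlet-sampled scalarizations" concrete, here is a minimal sketch of the general idea, not the paper's algorithm: draw a random weight vector from a Dirichlet distribution, collapse each vector reward to a scalar with it, then compute a GRPO-style group-relative advantage. The function names, the two-objective rewards, and the uniform Dirichlet(1, 1) prior are all assumptions made for illustration.

```python
import random
import statistics

def sample_dirichlet(alpha, rng):
    """Draw one weight vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def scalarize(reward_vectors, weights):
    """Collapse each vector reward to a scalar with a weighted sum."""
    return [sum(w * r for w, r in zip(weights, vec)) for vec in reward_vectors]

def group_advantages(scalars):
    """Group-relative advantage: standardize scalars within the sampled group."""
    mean = statistics.fmean(scalars)
    std = statistics.pstdev(scalars)
    if std == 0.0:
        return [0.0 for _ in scalars]
    return [(s - mean) / std for s in scalars]

rng = random.Random(0)
# Hypothetical rewards for four sampled completions on two objectives.
rewards = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6], [0.1, 0.1]]

for step in range(3):
    weights = sample_dirichlet([1.0, 1.0], rng)
    advantages = group_advantages(scalarize(rewards, weights))
    print([round(w, 2) for w in weights], [round(a, 2) for a in advantages])
```

Because the weights change every step, the two specialist completions take turns being favored, while the dominated completion is penalized under every weighting. That rotating pressure is the intuition for why vector rewards can keep a policy from collapsing onto a single mode.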
Delta-rule linear-attention models tie erasing and writing memory to a single scalar gate. Gated DeltaNet-2 splits these into independent channel-wise gates with a chunkwise WY training algorithm that keeps throughput nearly flat from 2K to 16K context. At 1.3B params on 100B FineWeb-Edu tokens it beats Mamba-2, Gated DeltaNet, KDA, and Mamba-3 across language modeling and reasoning. The new linear-attention baseline to beat for long-context workloads.
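For readers new to the delta rule, the toy sketch below shows what separating "erase" from "write" can look like. It is an illustration under stated assumptions, not the paper's formulation: the gates are applied per value channel, there is no decay term and no chunkwise training, and the names `read` and `delta_update` are made up for this example.

```python
def read(state, key):
    """Return the value currently stored under `key` (matrix-vector product)."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, key)) for row in state]

def delta_update(state, key, value, erase, write):
    """One delta-rule step with separate per-channel erase and write gates.

    new[i][j] = state[i][j] + (write[i] * value[i] - erase[i] * old[i]) * key[j]
    where old = read(state, key). Setting erase == write == [beta] * d_v
    recovers the classic single-gate delta rule.
    """
    old = read(state, key)
    return [
        [s_ij + (w * v - e * o) * k_j for s_ij, k_j in zip(row, key)]
        for row, v, o, e, w in zip(state, value, old, erase, write)
    ]

d_v, d_k = 2, 2
state = [[0.0] * d_k for _ in range(d_v)]
key = [1.0, 0.0]  # unit-norm key

# Store a first value with both gates fully open.
state = delta_update(state, key, [3.0, 5.0], erase=[1.0, 1.0], write=[1.0, 1.0])
# Overwrite channel 0 only; channel 1 keeps its old content.
state = delta_update(state, key, [7.0, 9.0], erase=[1.0, 0.0], write=[1.0, 0.0])
print(read(state, key))  # [7.0, 5.0]
```

With a single shared scalar gate, the second update would have had to touch both channels by the same amount. Independent gates let the memory replace one channel, leave another alone, or erase without writing anything new.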
First large-scale empirical study of LLMs paired with the Lean compiler attacking open math problems. AlphaProof Nexus wraps Gemini 3.1 Pro in a generate/verify loop with an evolutionary variant. The strongest agent autonomously resolved 9 of 353 open Erdős problems (some open 56 years) and 44 of 492 OEIS conjectures for a few hundred dollars per problem. For any domain with a verifier, try a tight generate-verify loop before reaching for elaborate scaffolds.
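The "tight generate-verify loop" is simple enough to sketch in a few lines. The version below is a generic skeleton, not AlphaProof Nexus: `propose` stands in for the model, `verify` stands in for a checker such as the Lean compiler, and the toy task (finding an integer square root from "too small" / "too large" feedback) is purely hypothetical.

```python
def generate_verify(propose, verify, budget):
    """Propose candidates until the verifier accepts one or the budget runs out."""
    history = []  # (candidate, verifier message) pairs fed back to the proposer
    for _ in range(budget):
        candidate = propose(history)
        ok, message = verify(candidate)
        if ok:
            return candidate
        history.append((candidate, message))
    return None

TARGET = 1369  # toy problem: find x with x * x == TARGET

def verify(x):
    """Stand-in verifier: exact check plus a short diagnostic on failure."""
    if x * x == TARGET:
        return (True, "ok")
    if x * x < TARGET:
        return (False, "too small")
    return (False, "too large")

def propose(history):
    """Stand-in generator: bisect using the verifier's past feedback."""
    lo, hi = 0, TARGET
    for candidate, message in history:
        if message == "too small":
            lo = max(lo, candidate + 1)
        else:
            hi = min(hi, candidate - 1)
    return (lo + hi) // 2

print(generate_verify(propose, verify, budget=20))  # 37
```

The loop itself knows nothing about the domain. Everything domain-specific lives in the proposer and the verifier, which is why the pattern transfers to any setting where a cheap, trustworthy checker exists.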
RLVR worked for math and code because of cheap verifiable rewards; CUA-Gym ports that recipe to computer-use agents by co-generating tasks, executable environment states, and reward functions via an adversarial Generator/Discriminator/Orchestrator loop. Trained with GSPO on 32,112 verified tuples across 110 environments, CUA-Gym-A17B hits 72.6% on OSWorld-Verified (+10.4pp over base) and transfers to held-out WebArena. Environment diversity scales independently from trajectory volume.
The Mill
Builder tools ground for action
Turns any codebase into an interactive knowledge graph you can explore, search, and query; works with Claude Code, Codex, Cursor, Copilot and Gemini CLI.
Hands-on 'learn it, build it, ship it' curriculum for AI engineering.
Anthropic's open-source plugin repo for Claude Cowork, aimed at knowledge workers.
MCP-native self-updating context layer for your AI.
AI that learns how you work and turns it into software.
The Counter
Voices from the AI bar today
OpenAI's data-infra lead Emma details how uneven acceleration between app and platform teams is the real bottleneck as agents scale across orgs.
Hassabis walks through Gemini-powered drug discovery, clinical-trial acceleration and AI as a co-scientist, plus his read on recursive self-improvement.
Bull case framing semiconductors as the upstream commodity for the singularity.
OpenBMB ships MiniCPM5-1B as an open small model plus a local 'MiniCPM Desk Pet,' signaling China's push into on-device AI.
Open-weights jailbreak/abliteration project gets a cease-and-desist from Meta, igniting an OSS-vs-corporate-IP debate.
Salesforce's spend as the canonical proof-point that enterprises are substituting tokens for headcount.
Roast Calendar
Your AI week, day by day
Last Sip
Parting thoughts
The Vatican-and-Mythos pairing is the part worth sitting with. Same company, same week: one stage arguing for restraint, one report logging 6,202 high- or critical-severity findings with a 7% patch rate. You can read it as cynical or as honest, but it does answer the question of where the real AI safety conversation is happening: in the gap between what the model can find and what the world can fix.