May 26, 2026

Agentic Brew Daily

Your daily shot of what's brewing in AI

Fresh Batch

Distilled trend

Google's AI Mode default and DeepSeek's permanent 75% V4-Pro cut land the same week, pushing publishers toward an Ahrefs-measured 58% CTR loss while making frontier inference 28-34x cheaper than Anthropic or OpenAI.
Salesforce's $300M Anthropic spend with zero new engineers collides with 74% of enterprises rolling back agents and Claude Dispatch's 50% task-success rate, exposing the gap between agent budgets and agent reliability.
Google's WebMCP and Universal Cart, Anthropic's Stainless acquisition, and OpenAI Codex cancelling SaaS subscriptions from a phone all point to the same wedge: agents are now buying, building, and unsubscribing on the user's behalf, with per-seat SaaS as the explicit target.

Bold Shots

Today's biggest AI stories, no chaser

Anthropic's Mythos finds 6,200+ high/critical OSS bugs — Cloudflare says don't ship the patches blind

Anthropic's Claude Mythos Preview, gated behind Project Glasswing, scanned 1,000+ open-source projects and surfaced 23,019 vulnerabilities — 6,202 rated high/critical, 90.6% confirmed valid in a 1,752-finding sample. It also uncovered a CVSS 9.1 certificate-forgery flaw in wolfSSL (CVE-2026-5194). Twelve founding partners got preview access — AWS, Apple, Google, Microsoft, NVIDIA, JPMorgan among them. Cloudflare CSO Grant Bourzikas ran Mythos against 50+ internal repos and publicly argued the AI-written patches "are not safe to ship blind" — some silently broke their own code. Startup Depthfirst claims a task-specialized model matches Mythos at one-tenth the cost.

Why it matters: Vuln discovery used to queue behind a small population of skilled researchers; Mythos breaks the bottleneck and shifts the rate-limit to patching. But Cloudflare reframes the debate — the right posture might be assumed-compromise architecture, not faster patch SLAs. Depthfirst's cheaper task-specialized rival directly undercuts the "bigger frontier model always wins" thesis.

Introducing Project Glasswing: an urgent initiative to help secure the world's most critical software. It's powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.

@AnthropicAI·46.1K engagements

Claude Mythos Preview in 6 Minutes

Developers Digest·178.9K views

Cloudflare just published what they found after running Anthropic's Mythos Preview against 50+ of their own repos

r/artificial·86 upvotes

Project Glasswing: Anthropic says Claude found 10,000+ vulnerabilities in 1 month

r/Futurology·955 upvotes

OpenAI Codex now cancels your Amazon subscription from a locked phone

A viral clip of OpenAI Codex autonomously opening a billing page and cancelling an Amazon subscription has been seized on as the cleanest "agents eat SaaS" moment yet. The capability shipped April 16 with Codex's desktop computer-use. A May 22 update extended it to drive Mac apps while the screen is off and locked, with task triggering from a phone. OpenAI's own docs explicitly warn against unattended use for "account, security, privacy, network, payment, or credential-related settings" — the exact workflow being demoed.

Why it matters: Codex weekly actives went from 1.6M in March to 4M+ by mid-2026, with token throughput up ~5x in the same window. Simon Taylor calls it "SaaSpocalypse": if an agent can drive a vendor's billing UI to cancel, it can drive the product UI to replace the seat. iShares' tech-software ETF is starting to read agent capability as a SaaS demand risk. The pushback (HN's benzible, MindStudio) is that Codex's computer-use is narrower than Anthropic's by design and domain expertise still moats a lot of SaaS — but the optics matter regardless.

Using computer use, you can ask codex to cancel subscriptions you don't need anymore. Very pleasant to watch. No particular one in mind, works on all of them. chatgpt.com/codex/

@thsottiaux·3.1K engagements

I truly believe codex team is about to hit the inflection point. Nearly everything I've been complaining about internally has been addressed. Clay on the wheel is centered. We're about to throw off the…

@jxnlco·455 engagements

Your Apps Don't Need an API Anymore. Codex Just Proved It.

AI News & Strategy Daily | Nate B Jones·194.9K views

Codex computer use is INSANE

r/codex·377 upvotes

BofA: SpaceX + OpenAI IPOs would push US concentration past every modern bubble peak

SpaceX filed its S-1 on May 20 for a Nasdaq listing under SPCX, targeting $1.75T-$2T and a raise of up to $75B. OpenAI confidentially filed a draft prospectus the same week for a Q4 2026 listing led by Goldman and Morgan Stanley at $852B-$2T+. SpaceX's S-1 identifies $26.5T of AI exposure inside a $28.5T TAM, and the company merged with xAI two months ago at a combined ~$1.25T valuation. BofA's Michael Hartnett estimates the two IPOs would push US single-sector market concentration from ~40% to ~48% — past every modern bubble peak, dot-com included.

Why it matters: Index-inclusion rules force passive ETF and 401(k) money into the new mega-caps within weeks of listing — retail can't opt out. The fundamentals strain the narrative: OpenAI generated ~$13.1B revenue in 2025 against a ~$9B net loss and ~$22B cash burn, and projects another ~$14B operating loss in 2026 against a $207B capital gap through 2030. April CPI ran 3.8%, near the 4% line BofA flags as a high-valuation IPO warning level. Anthony Scaramucci is calling SpaceX/OpenAI/Anthropic a "holy trinity" that may mark a market top.

2026 IPO Launches Will Be Historic: 1. SpaceX: Expected at $1.5 trillion valuation 2. OpenAI: Expected at $1+ trillion valuation 3. Anthropic: Expected at $500 billion valuation.

@KobeissiLetter·3.8K engagements

SpaceX revealed eye-popping numbers in its IPO prospectus, including a $26.5 trillion potential market for an empire spanning artificial intelligence and telecommunications.

@business·159 engagements

SpaceX and OpenAI: The Mega IPO Grift

Ben Felix·297.1K views

SpaceX' $75B+ Historic IPO, GPT5.5 Outperforms Polymarket, AI Solves 80yr old math problem | EP #257

Moonshot Mates·95.9K views

Pope Leo XIV's first encyclical calls for disarming AI — co-presented with Anthropic's Chris Olah

Pope Leo XIV released Magnifica Humanitas on May 25 — a 42,000-word, five-chapter encyclical on safeguarding the human person in the age of AI. He signed it on May 15, the 135th anniversary of Leo XIII's Rerum Novarum. The text calls for AI to be "disarmed" from logics of military and economic domination, says classic just-war theory is outdated in an age of algorithmic warfare, and names hidden labor exploitation behind AI systems as "new forms of slavery." Pontiffs usually delegate encyclical unveilings to cardinals; Leo personally co-presented this one alongside Anthropic co-founder Christopher Olah — the first AI executive ever to help unveil a papal encyclical.

Why it matters: The stagecraft is the story. The text describes AI as "more cultivated than built" — language closer to a research note than curial Latin — and names hyperscalers as concentrating epistemic and political power. The co-presentation lands against Anthropic's ongoing legal fight with the Trump administration over military uses of its models. The Vatican is positioning itself as a moral authority on AI architecture and corporate incentives, not just AI use.

Pope, urging AI regulation, warns some weapons now beyond human control reut.rs/3PGQTHi

@Reuters·12K engagements

BREAKING: AI company cofounder Chris Olah said in a press conference on Pope Leo's first encyclical that those outside of the artificial intelligence industry need to hold developers to account.

@CatholicNewsSvc·2.9K engagements

LIVE | Presentation of Pope Leo XIV's Encyclical Magnifica Humanitas from the Vatican | May 25, 2026

EWTN News·13.9K views

Pope Leo issues AI encyclical warning that opaque algorithms concentrate power in a few companies

r/technology

DeepSeek freezes V4-Pro at 75% off — 28-34x cheaper than Claude Opus 4.7 and GPT-5.5 PRO

DeepSeek made its 75% V4-Pro discount permanent on May 22, freezing what was supposed to be a promo expiring May 31. List pricing is now $0.435/M input (cache-miss), $0.003625/M (cache-hit), and $0.87/M output. Against Claude Opus 4.7 and GPT-5.5 PRO at ~$30/M output, V4-Pro lands ~28-34x cheaper. V4-Pro is a 1.6-trillion-parameter model optimized for Huawei Ascend 950 chips rather than Nvidia (Huawei targeting ~750K 950PR units in 2026).
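The multiple is easy to sanity-check. A minimal sketch, using only the list prices quoted above; the job sizes and cache-hit rate in the usage lines are made-up illustrations, and `output_ratio` and `job_cost` are hypothetical helper names, not part of any DeepSeek SDK.

```python
# V4-Pro list prices in USD per million tokens, as quoted above.
PRICES_PER_M = {"input_miss": 0.435, "input_hit": 0.003625, "output": 0.87}


def output_ratio(rival_output_price: float) -> float:
    """How many times cheaper V4-Pro output is than a rival's output price."""
    return rival_output_price / PRICES_PER_M["output"]


def job_cost(input_tokens: int, output_tokens: int, cache_hit_rate: float = 0.0) -> float:
    """USD cost of one job; cache_hit_rate is the share of input tokens served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (
        miss_tokens * PRICES_PER_M["input_miss"]
        + hit_tokens * PRICES_PER_M["input_hit"]
        + output_tokens * PRICES_PER_M["output"]
    )
    return total / 1_000_000


# ~$30/M output divided by $0.87/M gives the top of the quoted range.
print(round(output_ratio(30.0), 1))  # 34.5

# Hypothetical overnight agent loop: 50M input tokens, 5M output tokens.
print(round(job_cost(50_000_000, 5_000_000), 2))
print(round(job_cost(50_000_000, 5_000_000, cache_hit_rate=0.8), 2))
```

Working backwards, the 28x end of the range corresponds to a rival output price of about 28 × $0.87 ≈ $24.36/M, so the spread reflects which comparison price you plug in.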

Why it matters: Counterpoint's Neil Shah argues V4-Pro has effectively closed the performance gap on math and reasoning while leading on openness and inference cost. Marcus Schuler's framing: Western labs "structurally cannot match the price without breaking the revenue models their valuations depend on." The second layer no spreadsheet resolves: buyers can't simply route production traffic through DeepSeek given the model runs on Huawei silicon while the White House escalates IP-theft accusations. Developers have already moved — the dominant pattern is plugging V4-Pro into Claude Code via OpenRouter and running overnight agentic loops that were previously prohibitive.

DeepSeek's pricing is insane. > $0.87 per 1M output tokens > 5.75M output tokens with the price of a Starbucks coffee (~$5) > that's almost 14,000 pages of books

@Hesamation·3.1K engagements

Slow Drip

Blog reads worth savoring

Analysis · Data Science Collective · The Memory Wall Is Strangling Your LLM: Why GPUs Are Faster Than You Think and Slower Than You Need

Quantifies the 200x gap between H100 theoretical throughput (62K tok/s) and real-world inference (100-300 tok/s) and walks through KV caching, speculative decoding, and diffusion LLMs as fixes — clean mental model for memory-bound vs compute-bound regimes.

Analysis · Towards AI · Sliding Windows Forget: Why Long-Running LLM Apps Need Memory Policy

Open-source benchmark across 7 context policies — importance-based memory retains 90.7% of critical facts vs 10.8% for sliding windows at the same token budget. Actionable if you're building persistent agents.
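To see why the two policies diverge, here is a toy sketch of both under the same token budget. It is not the benchmark's code: the `Message` type, the `importance` scores, and the sample history are all hypothetical, and a real system would need some way to score importance in the first place.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    text: str
    tokens: int
    importance: float  # hypothetical score in [0, 1]; how it is produced is out of scope


def sliding_window(history: List[Message], budget: int) -> List[Message]:
    """Keep the most recent messages that fit in the budget, oldest dropped first."""
    kept: List[Message] = []
    used = 0
    for msg in reversed(history):
        if used + msg.tokens > budget:
            break
        kept.append(msg)
        used += msg.tokens
    return list(reversed(kept))


def importance_based(history: List[Message], budget: int) -> List[Message]:
    """Greedily keep the highest-importance messages that fit, then restore original order."""
    ranked = sorted(enumerate(history), key=lambda pair: (-pair[1].importance, -pair[0]))
    chosen = []
    used = 0
    for index, msg in ranked:
        if used + msg.tokens <= budget:
            chosen.append(index)
            used += msg.tokens
    return [history[i] for i in sorted(chosen)]


history = [
    Message("API key rotates monthly", 10, 0.9),
    Message("small talk", 10, 0.1),
    Message("deadline is Friday", 10, 0.8),
    Message("more small talk", 10, 0.1),
    Message("thanks", 10, 0.05),
]

print([m.text for m in sliding_window(history, budget=20)])
print([m.text for m in importance_based(history, budget=20)])
```

With a 20-token budget the sliding window keeps only the last two pleasantries, while the importance-based policy keeps the two facts an agent would actually need later.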

Analysis · The AI Corner · Claude Dispatch: The AI That Keeps Working When You Don't

Hands-on review of Anthropic's phone-to-desktop delegation with an honest 50/50 success-rate breakdown by task type — file searches reliable, terminal and multi-step tasks fail silently. Read it before you trust it with anything important.

Tutorial · Data Science Collective · A Qwen 3.5 122B LLM on a 16 GB Mac mini: MoE Expert Streaming with TurboQuant-MLX

Reproducible recipe for running a 122B MoE on a $599 Mac mini by streaming only the 8 active experts per token from SSD — 9 GB peak RAM, 54 GB on disk via 3-bit quant. Local inference reframed as a disk-bandwidth problem, not a RAM one.

News · Towards AI · Two HTML Attributes Now Turn Your Website Into an AI Agent Tool — Inside Chrome's WebMCP

Chrome 149 origin trial lets sites expose forms to AI agents via data-mcp-name / data-mcp-args, replacing brittle vision-based UI actuation with direct function calls. Web devs should track this now.

The Grind

Research papers, decoded

Reasoning & Evaluation · 8,729 upvotes · X / arxiv / alphaxiv

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Apple stress-tests Claude 3.7 Sonnet Thinking, DeepSeek-R1, and o3-mini on four controllable puzzles at the same 64K-token budget. Finds three sharp regimes: at low complexity standard LLMs beat the thinking variants, at medium complexity LRMs win, beyond a model-specific threshold both collapse to ~0. As problems get harder, the models reduce thinking tokens even with budget left, and handing them the explicit Tower of Hanoi algorithm barely helps.
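For reference, the textbook recursive Tower of Hanoi solution is only a few lines. This is the standard algorithm, not the exact prompt or pseudocode the paper handed to the models (the source doesn't reproduce it); peg names are arbitrary.

```python
from typing import List, Tuple

Move = Tuple[str, str]


def hanoi(n: int, source: str = "A", target: str = "C", spare: str = "B") -> List[Move]:
    """Return the move list that transfers n disks from source to target."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then move the n-1 back on top.
    return (
        hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi(n - 1, spare, target, source)
    )


for disks in (1, 3, 10):
    moves = hanoi(disks)
    print(disks, len(moves))  # move count is 2**disks - 1
```

The move count doubles with every added disk (2^n - 1), so "problem complexity" in this puzzle is mostly a test of how long a model can execute a known procedure without slipping.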

Latent Reasoning · 198 upvotes · alphaxiv

Generative Recursive Reasoning Models (GRAM)

Turns deterministic single-trajectory recursive reasoning into probabilistic multi-trajectory computation via amortized variational inference, with a hierarchical high-level / low-level state structure and learned perturbation distributions. Hits 97.0% on Sudoku-Extreme (vs 87.4% deterministic), 52.0% on ARC-AGI-1, and works as an unconditional generative model (99.05% valid Sudoku boards). For tasks with multiple valid answers, deterministic recurrent reasoning is leaving accuracy on the table.

Test-Time Search · 67 upvotes · alphaxiv

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Drop-in replacement for GRPO's scalar advantage — trains the policy to anticipate vector-valued rewards (per-test-case, multi-reward, multi-persona) and emit a set of solutions specialized to different trade-offs. Combines multi-answer chains with stochastic Dirichlet scalarization. The gap with GRPO widens with the search budget — on LiveCodeBench evolutionary search, VPO models solve problems GRPO models can't solve at all.
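A rough sketch of what "stochastic Dirichlet scalarization" could look like in isolation: draw a random weight vector on the simplex, collapse each sample's reward vector to a scalar with it, then apply a GRPO-style group normalization. This is an illustration under those assumptions, not the paper's training objective; the function names and the example rewards are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a vector-valued reward into one number with the sampled weights."""
    return sum(w * r for w, r in zip(weights, reward_vector))


def group_advantages(scalar_rewards: Sequence[float]) -> List[float]:
    """Group-normalized advantages: (reward - group mean) / group std."""
    mu = mean(scalar_rewards)
    sigma = pstdev(scalar_rewards)
    if sigma == 0:
        return [0.0 for _ in scalar_rewards]
    return [(r - mu) / sigma for r in scalar_rewards]


rng = random.Random(0)
# Hypothetical group of 3 sampled solutions, each scored on 3 test cases (1 = pass).
reward_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
weights = sample_dirichlet([1.0, 1.0, 1.0], rng)
scalars = [scalarize(rv, weights) for rv in reward_vectors]
print(weights)
print(group_advantages(scalars))
```

Because the weights change on every draw, which solution looks "best" changes too, and that is the intuition for why this kind of training would reward a diverse set of solutions rather than one average-case answer.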

Test-Time Scaling · 48 upvotes · alphaxiv

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

Reframes iterative latent reasoning as a dynamical system converging to task-conditioned attractors. Three training tricks plus Adaptive Computation Time halting let the model unroll the equivalent of 40,000 layers — Sudoku-Extreme goes from 2.6% to 99.8%, Maze-Unique hits 93.0% accuracy with 17.4x less average compute via adaptive halting.
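The attractor idea can be pictured with a scalar toy: keep applying the same update until the state stops moving, and stop early when it does. The sketch below uses a plain convergence threshold as a stand-in for the learned Adaptive Computation Time halting described above, and a hand-picked contraction instead of a trained network; the 40,000-step cap is borrowed loosely from the blurb.

```python
from typing import Callable, Tuple


def iterate_to_attractor(
    step: Callable[[float], float],
    state: float,
    tol: float = 1e-6,
    max_steps: int = 40_000,
) -> Tuple[float, int]:
    """Apply `step` repeatedly; halt once successive states differ by less than `tol`."""
    for n in range(1, max_steps + 1):
        new_state = step(state)
        if abs(new_state - state) < tol:
            return new_state, n
        state = new_state
    return state, max_steps


# Toy update with a single attractor at x = 2 (the solution of x = 0.5 * x + 1).
final_state, steps_used = iterate_to_attractor(lambda x: 0.5 * x + 1.0, state=0.0)
print(final_state, steps_used)
```

Easy inputs settle in a handful of steps and hard ones use more, which is the sense in which halting adaptively saves average compute even when the step cap is very large.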

The Mill

Builder tools ground for action

30K stars, +5.6K today

Lum1104/Understand-Anything

Turns any codebase into an interactive knowledge graph you can explore, search, and Q&A against. Works with Claude Code, Codex, Cursor, Copilot, and Gemini CLI. Devs are tired of paying token tax to re-explain their repo to every agent.

24K stars, +3.2K today

colbymchenry/codegraph

Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent: fewer tokens, fewer tool calls, 100% local. Same thesis as Understand-Anything — agent-native code indexing is the new ctags.

18K stars, +3.2K today

rohitg00/ai-engineering-from-scratch

Practical hands-on AI engineering curriculum. 3K stars in a day says the 'year of agent experience but no fundamentals' crowd is finally looking for a structured ramp.

154K stars, +2.8K today

multica-ai/andrej-karpathy-skills

Single CLAUDE.md file distilling Karpathy's observations on LLM coding pitfalls — drop-in for Claude Code. Surging alongside Karpathy's reported move to Anthropic.

15K stars, +1.4K today

anthropics/knowledge-work-plugins

Open-source plugins for Claude Cowork, aimed at knowledge workers (not just devs). Anthropic officially leaning into the plugin ecosystem.

457 votes · Product Hunt

Stitch 3.0 by Google

Generate and iterate UI screens with AI on a live canvas. Google's design-tool entry — direct shot at Vercel v0 and Figma Make.

Design Tools / User Experience

305 votes · Product Hunt

ModelHub

The missing menu bar app for local LLMs on Mac. Pairs neatly with M5 Max / DGX Spark local-inference chatter — managing local models is the new pain point.

Open Source / Developer Tools

286 votes · Product Hunt

Freu AI

Automate any Mac app with $0 recurring run cost. Local-first Mac automation — same 'stop paying SaaS per agent' thread as ModelHub.

Artificial Intelligence / GitHub

177 votes · Product Hunt

Edgee Fallback Models

Claude Code that never stops. Automatic model failover for Claude Code sessions — market response to GPU rentals up 200%.

Productivity / Software Engineering

155 votes · Product Hunt

Runway Agent

Generate edited, sound-designed videos via chat. Runway moves from tool to agent.

Design Tools / Social Media

The Counter

Voices from the AI bar today

33K views

Why the AI boom is about to hit a wall

Argues AI procurement is becoming a supply-chain problem, not a software problem — HBM, advanced packaging, and grid power are the binding constraints. Aimed at people planning 2026 capacity.

AI News & Strategy Daily | Nate B Jones

1.1K views

Top #1 Opportunity for Senior Engineers: Agentic Engineering

Five-pillar framework: agent harnesses, software factories, extensible software, always-on agents, agentic access. Useful if you're deciding whether to specialize in orchestration vs keep writing code.

IndyDevDan

5.1K views

Why The AI Boom Is Reshuffling The Global Stock Market Hierarchy

How AI infra is repricing Taiwan/Korea equities via TSMC, Samsung, and SK Hynix concentration. Complements the Nate B Jones supply-chain thesis.

CNBC International

11K engagements

Anthropic doesn't have Claude. Claude has Anthropic.

The line that anchored the Codex-vs-Claude rivalry topic this cycle — 9.8K likes, 556 RT, 703K views.

@naval

10K engagements

120 quadrillion tokens per month by 2030. Goldman Sachs just modeled agentic AI token consumption — 24x growth in 4 years.

Goldman's number is what every infra slide will cite for the next quarter.

@BankXRP

2.6K upvotes · 129 comments

Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and Claude Code!)

Top-voted thread of the cycle. Anthropic moves from product to education distribution — direct response to 'where do I learn agentic engineering' demand.

r/ClaudeAI

1.6K upvotes · 513 comments

$300M on Anthropic tokens, zero new engineers hired — Salesforce is the clearest case study of where this is going

Highest in the pool. Discussion centers on whether token spend actually replaces headcount or just shifts the cost line.

r/ArtificialInteligence

Roast Calendar

Your AI week, day by day

Tue 26

4:30 PM PT•Menlo Park

From Agentic AI to Physical AI: Talks + AI Agent Workshops, Ft AWS, Bedrock Robotics, Zoox & Knightscope

6:00 PM PT•San Francisco

Hard Problems Night for Agent Builders

5:00 PM PT•San Francisco

Codex Community Hackathon - San Francisco #5

Wed 27

12:00 PM PT•Stanford

Stanford OpenLab Seminar with Guido Appenzeller, GP a16z AI Infrastructure

5:30 PM PT•San Francisco

Operationalizing Agents with Google DeepMind, Snowflake, and Google Research

7:00 PM PT•San Francisco

SkillsBench 1.1 Launch Party @ ACM CAIS

Thu 28

3:00 PM PT•Stanford

XTrace x Stanford: Build Agents That Remember

4:00 PM PT•San Francisco

Frontier Residency - Demo Day

5:30 PM PT•San Francisco

Building reliable AI Agents for Fighting Crime

Fri 29

12:00 PM PT•San Francisco

Production AI with Metaflow Meetup at DoorDash

5:00 PM PT•Mountain View

Gemini Meetup

5:30 PM PT•San Francisco

Continual Learning Circle Meetup & Dinner

Sat 30

May 30 - May 31•San Francisco

Autoresearch Systems Hackathon with Modal, OpenAI, Raindrop & Antler

May 30 - May 31•San Francisco

Web Data UNLOCKED - Enterprise AI, Two Day Hackathon

2:00 PM PT•Stanford

Building & Investing in Fintech in the AI Wave: Fireside Chat with Jefferson Chen

Sun 31

10:00 AM PT•Stanford

GDG Stanford Hackathon. Win up to $5M in seed funding

9:00 AM PT•San Francisco

Applied Intelligence Hackathon

2:00 PM PT•San Jose

AI Demo Day

Mon 1

12:00 PM PT•Stanford

Stanford OpenLab Seminar with Sebastian Thrun (Sage AI, co-founder Waymo, GoogleX, Udacity)

5:30 PM PT•San Francisco

MCP Connect San Francisco with Sentry, Bitmovin and Alpic

4:00 PM PT•San Francisco

Live Session: Is Context Engineering the New Analytics Engineering? (Snowflake Summit)

Last Sip

Parting thoughts

The pattern of the day is pretty clear once you line up Salesforce's $300M token bill, Claude Dispatch's 50/50 success rate, and the SF event titled "What actually breaks first when they run 24/7." The bets are getting placed faster than the ops layer can hold them. If you're shipping an agent this week, the most useful read here might just be Samarth Vinayaka's memory-policy benchmark — pick a policy before your agent picks one for you.

