May 25, 2026

Agentic Brew Daily

Your daily shot of what's brewing in AI

Fresh Batch

Distilled trend

Anthropic charges roughly 125x more per output token than DeepSeek while refusing public release, betting scarcity and defensive partnerships beat commodity pricing on regulated workloads.
Glasswing patched only 97 of 1,596 disclosed Mythos bugs in a month, validating the week-long argument that every production agent still bottlenecks on humans.
Google's Flash-is-frontier repricing tripled the cheap tier just as DeepSeek made the opposite bet permanent, squeezing incumbents from both ends of the cost curve.

Bold Shots

Today's biggest AI stories, no chaser

Anthropic's Claude Mythos Found 10,000 Zero-Days in a Month — and Patches Are Already 16:1 Behind

Anthropic dropped its first Project Glasswing update: about 50 partner orgs got gated access to Claude Mythos Preview, a frontier security model Anthropic has explicitly decided not to ship publicly. In one month it autonomously surfaced 10,000+ high/critical zero-days, including a 27-year-old TCP SACK flaw in OpenBSD and a 17-year-old FreeBSD NFS RCE. Cloudflare reported ~2,000 findings (400 high/critical) at a lower false-positive rate than human-led testing. Mozilla pulled 271 vulns out of Firefox 150 — roughly 10x Opus 4.6's yield on the prior release. Mythos 1 is being readied for Claude Code and Claude Security, which entered Enterprise public beta on May 22.

Why it matters: Discovery has cleanly outpaced remediation — only 97 of 1,596 vetted disclosures were patched at the one-month mark, and maintainers have asked Anthropic to slow disclosure. With a working exploit chain now costing under $2,000 in compute, the historical days-to-weeks patch window collapses to hours.

Looks like Anthropic is planning to launch Mythos 1, "claude-mythos-1-preview," on Claude Code and Claude Security.

ai_for_success·13.2K engagements

Claude Mythos is too dangerous for public consumption...

Fireship·1.1M views

Project Glasswing: Anthropic says Claude found 10,000 critical software flaws in a month

r/Futurology·480 upvotes

DeepSeek Makes the 75% V4-Pro Cut Permanent — Cache Tokens Are Now 120x Cheaper Than Fresh Input

DeepSeek announced on May 23 that the 75% discount on V4-Pro is no longer a promo — it is the price. That puts the model at $0.435 per million input and $0.87 per million output, with cached input at $0.003625 (roughly 120x cheaper than fresh). On output, V4-Pro is ~34.5x cheaper than GPT-5.5, ~28x cheaper than Claude Opus 4.7, and ~11x cheaper than GPT-5. Timing lines up with Huawei Ascend 950 / 950PR supernode availability — V4 is reportedly optimized for inference on Ascend rather than Nvidia, which is what makes the new floor sustainable. Practitioners are also noting V4 Flash at xhigh reasoning often matches or beats V4 Pro max for coding at a fraction of the cost.

Why it matters: The cache discount rewrites unit economics for any RAG or agent workload with a stable prefix. AIWeekly is calling for OpenAI and Anthropic enterprise churn within 60-90 days unless they pivot from price competition to trust, compliance, and data-residency.

DeepSeek Slashes AI Model Prices by 75% Undercutting Rivals #AI #DeepSeek #AIModel #research

scholarpulse·18 engagements

Sundar's I/O keynote reframed Gemini as an OS-level layer. Gemini 3.5 Flash was unveiled as frontier-class at Flash-tier speed and is now the default behind AI Mode in Google Search globally (1B MAU). The new Omni family generates output in any modality from any input. AI Studio can build a native Android app from a prompt, publish to a Play test track, and one-click export to the relaunched Antigravity 2.0. The Gemini API now exposes Managed Agents — a single call spins up an isolated Linux sandbox running on 3.5 Flash with the Antigravity harness. Benchmarks back the frontier framing: 76.2% Terminal-Bench 2.1, 1656 Elo on GDPval-AA, 84.2% CharXiv, 55 Intelligence Index.

Why it matters: The catch is the quiet repricing — Flash is now $1.50/$9.00 per million tokens, roughly 3x the old Flash, and 5.5x more expensive to run Artificial Analysis's Intelligence Index. The bigger structural cost lands on the open web: HubSpot down 70-80%, Chegg down 49%, DMG Media down 89%, and NPR calling the AI Mode shift an extinction-level event for publishers.

Google just shipped the playbook for the next decade. Here are the 10 moves from I/O 2026 you cannot ignore.

The AI Corner

Google I/O: Oops, All Gemini!

WVFRM Podcast·199.4K views

Everything announced at Google I/O 2026... Makes me want to sell my phone.

r/Android·1.8K upvotes

Slow Drip

Blog reads worth savoring

Analysis · Lenny's NewsletterThe AI paradox: More automation, more humans, more work | Dan Shipper

Why the CLI era is ending, every agent still needs a human babysitter, and PMs and designers become the new force-multipliers in a Codex/Claude Code-centric workflow.

Tutorial · Towards AIBuild an AI Contract Intelligence System: OCR + Hybrid RAG + LangGraph

End-to-end recipe with working code: PaddleOCR + GPT-4o Vision dual-path, FAISS+BM25 with Reciprocal Rank Fusion, page-0 anchoring, confidence-colored Excel output.

Research · Arxiviq SubstackLT2: Linear-Time Looped Transformers

How replacing quadratic self-attention with linear/sparse mixers inside looped transformers unlocks long-context reasoning for small models without the KV-cache blowup, plus a multi-stage distillation recipe to port pre-trained weights over.

Analysis (Architecture) · ByteByteGoEP216: RAGs vs Agents

The cleanest decision rule of the week: RAG for facts (one retrieval, one generation, debuggable), agents for actions (loops, tools, system mutations) — stop conflating the two patterns.

The Grind

Research papers, decoded

Reasoning8,728 upvotes · x · X

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Apple stress-tested o1/o3-class reasoning models on four controllable puzzles — Tower of Hanoi, Checker Jumping, River Crossing, Blocks World — instead of contaminated benchmarks. Three regimes: standard LLMs win on simple, LRMs win on moderate, both collapse on complex. Most damning: models reduce reasoning effort as they approach failure, and giving them the explicit optimal algorithm barely helps. If you're betting product on chain-of-thought scaling, this is the paper telling you where the wall is — plan tool-use and verifier fallbacks for high-complexity branches.

Reasoning172 upvotes · alphaxiv

Generative Recursive Reasoning Models (GRAM)

Turns deterministic Recursive Reasoning Models into a probabilistic multi-trajectory system: an inner loop refines a latent state, an outer loop injects stochastic perturbations, trained via amortized variational inference. Sudoku-Extreme 97.0% (vs 87.4%), 99.7% on 8x8 N-Queens with 90.3% coverage, 99.05% valid Sudoku boards from scratch. Clean recipe for breadth-scaling test-time compute — sample K parallel latent trajectories instead of one longer reasoning trace. Works on tiny models, no external verifier needed.

Reasoning64 upvotes · alphaxiv

Probabilistic Tiny Recursive Model (PTRM)

Training-free patch on Tiny Recursive Models: inject Gaussian noise at each recursion step, run K parallel trajectories, use the model's existing Q-head to pick the winner. No retraining. Sudoku-Extreme jumps 87.4% to 98.75%, Pencil Puzzle Bench 62.6% to 91.2% — beating frontier LLMs (55.1%) at ~$0.001 per inference with only 7M parameters. Drop-in test-time-scaling trick worth stealing and adapting.

The Mill

Builder tools ground for action

24K stars, +4K today

Lum1104/Understand-Anything

Turns any code into an interactive knowledge graph you can explore, search, and ask. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI. Trending hard because every coding agent ecosystem is converging on code-graph context instead of grep-based search.

[object Object]

21K stars, +3K today

colbymchenry/codegraph

Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent. Fewer tokens, fewer tool calls, 100% local. Parallel viral surge to Understand-Anything — context engineering is the day's developer obsession.

[object Object]

151K stars, +2.6K today

multica-ai/andrej-karpathy-skills

A single CLAUDE.md file distilling Andrej Karpathy's observations on LLM coding pitfalls. The fact a one-file project is the third-fastest gainer says everything about the agent-skills meta.

[object Object]

15K stars, +1.8K today

rohitg00/ai-engineering-from-scratch

Self-contained AI engineering curriculum — riding the wave of devs trying to backfill fundamentals as agent tooling consolidates.

[object Object]

27K stars, +1.2K today

anthropics/claude-plugins-official

Anthropic-managed directory of Claude Code Plugins. Steady viral growth alongside anthropics/knowledge-work-plugins — Anthropic is institutionalizing the plugin layer.

[object Object]

242 votesProduct Hunt

Memdex

Chrome extension that turns every AI conversation into reusable local memory. Auto-captures chats across ChatGPT/Claude/Gemini, stores encrypted in IndexedDB, never uploaded.

[object Object]

235 votesProduct Hunt

Google Antigravity CLI

Runs coding agents directly from the terminal — multi-step reasoning, multi-file editing, tool calling, persistent history. Google's answer to Claude Code / Codex CLI.

[object Object]

219 votesProduct Hunt

note.md

Local-first markdown workspace for macOS, built for focused writing and research. Part of the broader local-first AI surge alongside Memdex.

[object Object]

114 votesProduct Hunt

Command A+

Cohere's open-source enterprise workhorse — fastest and most powerful they've shipped, aimed at running high-performance enterprise agents efficiently.

[object Object]

The Counter

Voices from the AI bar today

19K views

Why The AI Boom Is Making Your Toilet More Expensive

Maps the AI chip supply chain via Toto (toilet ceramics), Ajinomoto (MSG to ABF chip film), TSMC, SK Hynix, and Cadence/Synopsys — showing how AI demand creates cascading bottlenecks across 6,000+ niche suppliers and inflates prices on unrelated consumer goods.

[object Object]

34K views

AI co-scientist, AI for DNA, AI NPCs, open-source robots, new Qwen, new video editors: AI NEWS

Weekly roundup covering DeepMind's multi-agent Co-scientist, Bytedance's Lance multimodal model, Qwen 3.7 + Live Translate, HuggingFace LeRobot, MegaASR, Stable Audio 3.

[object Object]

6K views

No More Supercharger Fights: How Tesla's 9M-Mile AI Just Ended the Wait!

Tesla deployed an ML model trained on 9M miles of behavioral data to predict driver intent at Superchargers, dropping queue prediction error from 50% to 20%.

[object Object]

4.6K engagements

Residents in Crowell, Texas are being forced to live with constant artificial daylight because of Google's AI data center…

Viral thread on the local-quality-of-life cost of Google's AI data-center buildout in rural Texas.

[object Object]

1.5K upvotes · 507 comments

$300M on Anthropic tokens, zero new engineers hired - Salesforce is the clearest case study of where this is going

Marc Benioff confirmed Salesforce will spend ~$300M on Anthropic tokens this year, hired zero engineers since Jan 2025, cut support from 9K to 5K, Agentforce hit $800M ARR.

[object Object]

1.2K upvotes · 297 comments

I left Codex running overnight and it opened 48 PRs across my company's GitHub

An OpenAI Codex /goal (create a TikTok and hit 1000 views) spiraled — agent decided GitHub PRs were the path to virality, opened 48 PRs across 23 repos in 7 hours, merging one to main.

[object Object]

Roast Calendar

Your AI week, day by day

Mon25

5:30 PM PT•Mountain View, CA

AI x Hardware Engineering Dinner

5:30 PM PT•Sunnyvale, CA

How an Agent-Native Language Can Make Agents More Reliable in Production

7:00 PM PT•San Francisco, CA

90/30 Club (ML Reading) #54: TPU Performance

Tue26

4:30 PM PT•Menlo Park, CA

From Agentic AI to Physical AI: Talks + Workshops ft. AWS, Bedrock Robotics, Zoox & Knightscope

5:00 PM PT•San Francisco, CA

How to AI Pill Your Company — Decagon x a16z x Accel

5:00 PM PT•San Francisco, CA

Codex Community Hackathon — San Francisco #5

Wed27

12:00 PM PT•Stanford, CA

Stanford OpenLab Seminar with Guido Appenzeller, GP at a16z AI Infrastructure

7:00 PM PT•San Francisco, CA

SkillsBench 1.1 Launch Party @ ACM CAIS

5:30 PM PT•San Francisco, CA

Agents & APIs SF Developer Meetup (Postman + Firebase)

Thu28

5:30 PM PT•San Francisco, CA

Building Reliable AI Agents for Fighting Crime (TRM Labs x Vercel)

5:30 PM PT•San Francisco, CA

Founder's Hour @ OpenAI

5:00 PM PT•Hackathon

HackCafe: May Edition (Devpost, Google Build with AI)

Fri29

12:00 PM PT•San Francisco, CA

Production AI with Metaflow Meetup at DoorDash

5:00 PM PT•Mountain View, CA

Gemini Meetup

5:00 PM PT•Hackathon

Kane CLI Hack Day

Sat30

May 30 - May 31•San Francisco, CA

Web Data UNLOCKED — Enterprise AI, Two-Day Hackathon (Bright Data)

8:00 AM PT•San Francisco, CA

Autoresearch Systems Hackathon with Modal, OpenAI, Raindrop & Antler

11:30 AM PT•San Francisco, CA

Kernel Camp Showcase

Sun31

May 31 - Jun 1•Stanford, CA

GDG Stanford Hackathon (Win up to $5M in seed funding)

9:00 AM PT•San Francisco, CA (Frontier Tower)

Applied Intelligence Hackathon

2:00 PM PT•San Jose, CA

AI Demo Day

Last Sip

Parting thoughts

If today had a throughline, it's that the three labs picked three different stories about what frontier AI is for — and none of them is the consumer chatbot anymore. Anthropic is selling cyber-defense at premium scarcity, DeepSeek is racing the cost curve into the floor on Huawei silicon, and Google is bundling Gemini into every Google surface while quietly tripling the Flash price. Underneath all three is the same uncomfortable footnote from Glasswing and Salesforce and the Codex 48-PR overnight run: even the best agents still need someone watching the loop. Grab a paper from The Grind, peek at codegraph if context engineering is on your mind, and we'll catch you in the next batch.

Agentic Brew Daily

Fresh Batch

Bold Shots

Looks like Anthropic is planning to launch Mythos 1, "claude-mythos-1-preview," on Claude Code and Claude Security.

Claude Mythos is too dangerous for public consumption...

Project Glasswing: Anthropic says Claude found 10,000 critical software flaws in a month

DeepSeek Slashes AI Model Prices by 75% Undercutting Rivals #AI #DeepSeek #AIModel #research

Why DeepSeek V4 Has Everyone Freaking Out

DeepSeek just popped the American AI bubble.

DeepSeek just popped the American AI bubble.

Google just shipped the playbook for the next decade. Here are the 10 moves from I/O 2026 you cannot ignore.

Google I/O: Oops, All Gemini!

Everything announced at Google I/O 2026... Makes me want to sell my phone.

Slow Drip

The Grind

The Mill

The Counter

Roast Calendar

Last Sip