Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
Bold Shots
Today's biggest AI stories, no chaser
Google made Gemini 3.5 Flash generally available the same day it was announced, and rolled it across Search's AI Mode, the Gemini app, the API, and Antigravity 2.0. It scores 76.2% on TerminalBench 2.1, 83.6% on MCP Atlas, and 84.2% on CharXiv — beating their own 3.1 Pro on nearly all benchmarks at less than half the price of GPT-5.5 or Opus 4.7. Antigravity 2.0 ships as a standalone desktop app with a CLI, SDK, and Managed Agents service, and a new $100/mo AI Ultra tier bundles YouTube Premium Lite and the always-on Gemini Spark agent.
Why it matters: A "Flash" model outscoring last cycle's flagship inverts the small-vs-frontier hierarchy and reprices the agent tier — 3.5 Pro plays orchestrator while thousands of Flash sub-agents fan out underneath. Google's live demo built an OS in 12 hours with 93 sub-agents for under $1,000 in tokens.
Meta started its ~8,000-person layoff at 4am local on May 20, alongside ~7,000 forced reassignments into four new AI units and ~6,000 cancelled open roles — a 14,000-position effective reduction during a quarter that just printed $56.3B in revenue and $26.8B in net income. CTO Andrew Bosworth also confirmed the Model Capability Initiative, mandatory laptop monitoring software that captures keystrokes, mouse activity, and screenshots across Google, GitHub, Slack, and internal tools to train AI agents on how employees actually work. UK staff began organizing under UTAW with "Employee Data Extraction Factory" flyers.
Why it matters: This is the template for AI-era restructuring — cuts during record profits, employees reassigned rather than rehired, and the workers themselves generating the training data for the agents that will replace the next cohort. The 2026 AI capex guide of $125-145B is roughly double 2025.
Andrej Karpathy announced on May 19 that he is joining Anthropic's Claude pre-training team under Nick Joseph, with a mandate to use Claude itself to accelerate pre-training research — essentially Claude helping make the next Claude. The move pauses but doesn't end his Eureka Labs education work. Karpathy is the third senior ex-OpenAI figure to land at Anthropic after Jan Leike and John Schulman, and joins six CTOs from billion-dollar companies who've taken IC research roles there in the last 18 months.
Why it matters: Karpathy reportedly turned down $100M Meta signing bonuses to take this seat — strong signal that team density and mandate now beat headline comp at the frontier. The r/ClaudeAI read was blunt: "Karpathy was literally hired for RSI" (recursive self-improvement).
An internal OpenAI general-purpose reasoning model produced a one-shot counterexample to Paul Erdos's 1946 unit distance conjecture, constructing point sets in the plane with more unit-distance pairs than the previous best-known squarish-grid constructions. The proof routes through algebraic number theory — infinite class field towers and CM number fields — applied to a combinatorial-geometry problem. Nine mathematicians including Fields medalist Tim Gowers, Noga Alon, and Melanie Wood co-signed a Remarks paper verifying the result. Reported compute: 5-32 hours, ~$120-$1,000 in tokens, with the key insight landing on page 39 of a 125-page chain of thought.
Why it matters: First AI-generated proof of a celebrated open problem from a general-purpose reasoning model — not a math-tuned scaffold. Gowers said he'd recommend acceptance to the Annals of Mathematics "without any hesitation," and Wood noted no human team was likely to have even tried this attack route because it crossed disciplinary lines mathematicians implicitly enforce.
Nvidia reported Q1 FY2027 revenue of $81.62B against an estimate of $79.19B, adjusted EPS of $1.87 (est $1.75), and 75% adjusted gross margin. Data-center revenue hit an all-time high, and Q2 guidance came in at $91B — comfortably above consensus. The buildout has not peaked.
Why it matters: Nvidia's quarter is the demand-side mirror to Meta's $125-145B capex and Google's $100 Ultra pricing — the same dollars Big Tech is committing to AI infrastructure flow straight to Nvidia's data-center line. A 75% gross margin while the largest customers publicly debate their own AI ROI is the cleanest signal that compute is still the bottleneck.
The Blend
Connecting the dots across sources
Gemini 3.5 Flash is not just a launch — it's a price-performance reset everyone is racing to verify
- Across the news today, Google made Flash the default in Search, the Gemini app, and the API, and benchmarked it as faster and cheaper than rival flagships.
- On YouTube, the BridgeMind "Vibe Coding With Gemini 3.5 Flash" video ran a live head-to-head against GPT-5.5 and Claude Opus 4.7 within hours of the keynote.
- On X, a TPU 8i throughput demo of 600-1480 tokens per second went viral alongside Pichai's announcement.
- On the blog side, Simon Willison's deep read flagged a 3-6x price hike on Flash and the new Interactions API as the developer-facing story most coverage missed.
The agent platform is now the product — and the toolchain around it is exploding in parallel
- In the news, Google shipped Antigravity 2.0 as a standalone CLI plus SDK plus Managed Agents service, while Anthropic poached Karpathy to bootstrap Claude pre-training with Claude.
- On GitHub, three of the day's top trending repos are agent harnesses and code graphs: a Karpathy-derived CLAUDE.md skills repo gained 2,620 stars, a pre-indexed code knowledge graph for Claude Code added 1,910, and the obra/superpowers agent framework picked up another 1,776.
- On Product Hunt, Cursor's Composer 2.5 and the Linux ops agent CtrlOps both placed in the top five.
- In the research, a paper on Self-Distilled Agentic Reinforcement Learning shows multi-turn agents internalizing teacher signal so they no longer need the teacher at inference — exactly the production reliability gap builders are blogging about.
Compute is consolidating while the workforce is being restructured around it
- In the news, Nvidia booked $81.6B with a $91B forward guide the same week Meta cut 8,000 employees and forcibly moved 7,000 more into new AI units.
- On X, the trending "AI Builders' Land Grab" thread captured the mood — one widely-shared tweet mocked most AI startups as wrappers that take user input, hit an API, and charge $19 a month.
- On the blog side, Gergely Orosz's survey of 900-plus engineers found AI is amplifying existing engineering culture and quietly stranding a shrinking senior bench with mountains of AI-generated code to maintain.
- In the research, Anthropic's "2028: Two scenarios for global AI leadership" was the day's top-voted paper, framing the next 24 months as a capital and policy fight rather than a model-quality one.
Slow Drip
Blog reads worth savoring
Snap's Bento platform turned every user request into thousands of model evaluations — this teardown shows the two-stage retrieval, GPU/CPU split, and serialization tricks that cut data-plane costs 10x.
900-plus engineers surveyed: AI is amplifying existing culture good and bad, code quality is sliding into "AI slop," and a shrinking senior bench is maintaining what juniors generated.
An open-source folder-of-markdown system where Claude reads before answering and sweeps every Friday — validated at 99.5% across 17 PM scenarios with no vector DB.
Three implementation paths (self-hosted ECS, managed AgentCore, Anthropic SDK proxy) that cut tokens 87-92% and dropped one real audit workload from $15.6K to $1.56K per month.
Willison digs into the developer doc nobody else read and surfaces the 3-6x price hike, the new Interactions API, and the GA-across-Search-and-Gemini implications.
Qwen3.7-Max runs autonomously for 35 hours executing 1,000-plus tool calls, while the new Zhenwu M890 chip triples performance with 144GB on-chip memory.
Two founders, no team, no funding, $230M ARR — and the acquirer was so impressed they renamed the entire 14-year-old parent company after the startup they bought.
Four months and $1,600 on six AI distribution stacks produced zero signups, until a spreadsheet revealed founders are automating exactly the wrong 30% of their work.
Part 1 of a deep dive into TurboQuant, the quantization technique tackling the KV cache bottleneck quietly capping every long-context inference workload.
The Grind
Research papers, decoded
Anthropic lays out two divergent 2028 trajectories for global AI leadership: democratic labs holding the frontier vs. geopolitical rivals catching up. It's a policy-positioning piece tied to safety, compute access, and export controls — and the lens Anthropic is using as it absorbs talent like Karpathy. The day's top-engagement paper.
ELF is a diffusion language model that stays in continuous embedding space until the final step, then snaps to discrete tokens via a shared-weight network. Because it stays continuous, classifier-free guidance and few-step sampling tricks from the image world port over cleanly. Reportedly matches or beats leading discrete and continuous DLMs while training on roughly 10x fewer tokens — a much cleaner template for non-autoregressive text generation.
SDAR combines reinforcement learning with token-level self-distillation for multi-turn agents, using a sigmoid gate driven by the log-prob gap between a privileged teacher and the student to decide when to trust distillation vs. raw RL reward. On ALFWorld, WebShop, and Search-QA with Qwen2.5/Qwen3, it beats GRPO by +7-10% and holds that gain after the teacher is removed at inference — real internalization, not test-time crutching.
VGGT-Omega scales feed-forward 3D reconstruction by replacing global cross-frame attention with register attention — cross-frame info passes only through a small compact register set, cutting GPU memory by ~70% and enabling training on 15x more supervised data plus large unlabeled video corpora. Camera pose accuracy on dynamic Sintel scenes improves by 77%, and the learned geometric registers measurably help downstream vision-language-action robot models.
SANA-WM is a 2.6B-parameter open-source world model that generates 720p, one-minute videos with precise 6-DoF camera control, using a hybrid attention stack (frame-wise Gated DeltaNet for long-context memory + softmax attention for recall) plus a two-stage long-video refiner. Trained on just ~213K public clips in 15 days on 64 H100s, the distilled variant denoises a 60-second clip in 34 seconds on a single RTX 5090 with NVFP4 — roughly 36x higher throughput than open baselines at comparable quality.
On Tap
What's trending in the builder community
Your Personal AI super intelligence. Private, simple, and extremely powerful. Rust.
A single CLAUDE.md file derived from Karpathy's observations on LLM coding pitfalls.
Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode. 100% local.
An agentic skills framework and software development methodology that works.
Give your agent a real number and voice to make calls.
Mobile tests that write, run, and fix themselves.
Cursor's most powerful model yet.
Collaboration platform where teams work with AI together.
Jordan reframes AI as a collective economic and statistical system.
Four-level agent maturity model with five rules for state-machine agents.
Live benchmark of Gemini 3.5 Flash vs GPT-5.5 and Claude Opus 4.7.
Features METR on the Frontier Risk Report and independent alignment evaluations.
Tom Blomfield on making organizational processes legible to AI.
@MKBHD's tongue-in-cheek roll-call of Google's I/O product blitz drove the day's biggest topic on X.
OpenAI announced its general-purpose reasoning model produced a one-shot counterexample to a 1946 Erdős conjecture, verified by Tim Gowers and eight other mathematicians.
Builders take swings at thin OpenAI-wrapper startups while job-board posts pile up — the day's hot debate on what AI is actually displacing.
A draft executive order would require AI labs to share frontier models with the U.S. government 90 days before public release — a voluntary framework with sharp open-source implications.
Discover and install skills from the open agent skills ecosystem.
Distinctive, production-grade frontend interfaces that reject generic AI aesthetics.
70 rules across 8 categories for React/Next.js refactoring and code generation.
Work with Microsoft Foundry: model discovery, agent lifecycle, eval workflows.
Captures learnings, errors, and corrections for continuous improvement.
Security-first skill vetting before installing any third-party skill.
Roast Calendar
Upcoming events and gatherings
Hands-on builder night turning natural-language prompts into working mobile apps, sites, and tools alongside 60-plus women shipping with AI.
High-signal Startup Valley mixer aimed at AI founders, operators, and investors looking for collaborators and capital.
Low-key SF coding hang centered on vibecoding workflows — bring a laptop, ship something small, swap tricks with other builders.
AI founders bond over go-karting, part of an outdoor community for Bay Area builders who want real-world connection.
Intimate steakhouse dinner with HR, TA, and people-ops leaders digging into how AI is reshaping recruiting workflows in the Bay.
Last Sip
Parting thoughts and a teaser for tomorrow
If you zoom out on the past 48 hours, the through-line isn't any single model. It's that the unit of capability is shifting from "one big model answering one question" to "one orchestrator fanning out a thousand cheap, fast workers." Google's demo of building an OS with 93 sub-agents for under a grand, Karpathy joining Anthropic to bootstrap Claude with Claude, the open-source skills and code-graph repos all gaining stars together — same idea, three angles. The fun question to sit with tonight: if a Flash-tier model can already orchestrate itself into a 12-hour OS, what's the smallest, most boring task in your own workflow that no longer needs a human in the middle? See you tomorrow.