May 4, 2026

Agentic Brew Daily

Your daily shot of what's brewing in AI

Fresh Batch

Bold Shots

Today's biggest AI stories, no chaser

The Hangzhou Intermediate People's Court upheld a ruling that a tech company unlawfully terminated a QA supervisor named Zhou after he refused a 40% pay cut tied to AI replacement. The judges classified AI adoption as a voluntary business choice — not a 'major change in objective circumstances' that would let the company off the hook under China's Labor Contract Law. It's the second such ruling in six months, and Beijing released it as a 'typical case' right before International Workers' Day, which means lower courts nationwide are now expected to follow.

Why it matters: This reframes the entire economics of automation: the costs of retraining, reassignment, and severance shift onto employers rather than workers. With ~78,000 global tech jobs erased in the first four months of 2026 (nearly half blamed on AI), every Western layoff announcement now sits beside a sharp political counterpoint. Beijing is essentially using its judiciary as a shock absorber while urban youth unemployment sits at 15.3%.

April 29 produced the most explicit AI-trade verdict so far. Alphabet popped 7-10% on Google Cloud's 63% YoY growth, Amazon rallied on AWS hitting its fastest growth in 15 quarters, while Meta dropped ~6% and Microsoft sat flat despite Azure +40%. Combined 2026 capex from Alphabet, Amazon, Meta, and Microsoft now lands at up to $665B (75% above 2025), with Wall Street modeling 2027 above $1 trillion. The community math is brutal: justifying $400B at 25% margins and 10% depreciation needs roughly $160B in incremental annual AI revenue. Reported AI revenue in 2025? Around $20B.

Why it matters: Investors are no longer buying 'AI' as a concept — each quarterly print is now a referendum on whether a specific hyperscaler can convert GPUs into ad or cloud revenue. Big Tech FCF could decline up to 90% in 2026 if the spend pace holds, and 57% of economists in a Deutsche Bank survey already flag the AI bubble as the #1 market risk. If you're building on these clouds, the question of which ones can actually keep this pace stops being academic.
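The community math above reduces to two lines of arithmetic. A back-of-envelope sketch, treating the $400B base, 25% margin, and 10% depreciation rate as illustrative round numbers rather than modeled guidance:

```python
# Back-of-envelope check on the hyperscaler capex math.
# Assumption: a $400B capex base depreciated at 10%/yr must be covered
# by gross profit earned at a 25% margin. Figures are illustrative.

capex = 400e9            # hypothetical AI capex base ($)
depreciation_rate = 0.10
margin = 0.25

annual_depreciation = capex * depreciation_rate   # yearly cost to cover
required_revenue = annual_depreciation / margin   # revenue needed at 25% margin

print(f"Annual depreciation: ${annual_depreciation / 1e9:.0f}B")
print(f"Incremental AI revenue needed: ${required_revenue / 1e9:.0f}B")
```

Against the ~$20B of reported 2025 AI revenue, that $160B requirement is the gap the market is now pricing quarter by quarter.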

Anthropic shipped Opus 4.7 on April 16 with the usual pitch: better software engineering, vision, instruction following, same $5/M input / $25/M output pricing. The consumer moment was instant — 'i know literally NOTHING about coding... and i just built 3 fully functioning web apps in 30 minutes' hit 3K engagement on X. But heavy users tell a different story: 4.7 is more confident and more wrong, and one widely-shared post announced a defection to Codex after 13 months on Claude. Token economics flipped too — one head-to-head had Opus 4.7 burning ~173K tokens on a task GPT-5.5 finished in ~82K.

Why it matters: The model layer is still strong — devs prefer Claude's output ~67% of the time in blind review. But 65% of them reach for Codex daily. The harness is winning. If you live in agentic coding tools, the lesson is that CLI ergonomics, async patterns, and token efficiency are now strategic moats — not the model's MMLU score.
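To put the token gap in dollars, here is a rough worst-case sketch at Opus 4.7's listed pricing. It assumes every token is billed at the $25/M output rate (which overstates real costs, since input tokens are cheaper) and says nothing about GPT-5.5's pricing:

```python
# Rough cost bound on the 173K-vs-82K token gap from the head-to-head above.
# Assumption: all tokens billed at Opus 4.7's $25/M output rate, a worst case;
# real tasks mix $5/M input tokens and $25/M output tokens.

output_rate = 25 / 1_000_000        # $ per token at the output rate
gap_tokens = 173_000 - 82_000       # extra tokens Opus burned on one task

extra_cost = gap_tokens * output_rate
print(f"Extra tokens: {gap_tokens:,}; worst-case extra cost: ${extra_cost:.2f}")
```

A couple of dollars per task sounds small until it runs inside an agent loop thousands of times a day, which is why token efficiency is showing up as a harness-level moat.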

A Harvard-led study published in Science on April 30 found that o1-preview matched or exceeded attending physicians on real Beth Israel ED cases across triage, admission, and long-term treatment. On 76 actual cases, o1 hit 67.1% triage accuracy versus 55.3% and 50.0% for two attendings. The gap exploded on five complex long-term treatment scenarios: 89% vs 34%. The model only saw text — no imaging, no exam, no patient in the room.

Why it matters: AI's lead is widest on the cognitively richest task — multi-step longitudinal planning — and narrowest on acute triage uncertainty. That suggests reasoning models are genuinely good at structured medical reasoning chains, not just pattern-matching. The catch: there is no formal accountability framework. When an AI-recommended diagnosis is wrong, who eats the malpractice — physician, hospital, or vendor? The authors pitch a triadic patient + doctor + AI model, but Wei Xing's caution is the one to hold onto: 'It does not demonstrate that AI is safe for routine clinical use.'

The Blend

Connecting the dots across sources

The harness, not the model, is now the battleground in agentic coding

  • Across the news today, Anthropic's flagship model launch produced both a viral consumer moment and a public power-user defection to Codex, suggesting the model wins blind reviews while losing daily-driver share.
  • On GitHub, three of the top trending repos (ruvnet/ruflo at 38K stars, 1jehuang/jcode in Rust, and Hmbown/DeepSeek-TUI) are agent harnesses or orchestration platforms rather than new models, showing where builder energy is actually pointing.
  • On X, the 'Plan in Codex, Review in Claude' workflow tweet captured a multi-tool reality where developers route different stages of work to whichever harness handles them best.
  • In the blogs, the Towards AI piece on the Token Ping-Pong anti-pattern in multi-tool agents is a direct technical mirror of the token-bloat complaints heavy users posted about 4.7.

Governments and economists are starting to price AI's labor externalities at the same time

  • Across the news today, China's Hangzhou court explicitly classified AI replacement as a voluntary business choice rather than an inevitable shock, putting the cost of automation back on employers.
  • In the research, the UPenn and Boston University paper 'The AI Layoff Trap' picked up over 17,000 X votes by formalizing the over-automation wedge — each firm captures all of its automation savings but only absorbs a fraction of the demand loss it creates.
  • On X, the China ruling tweet was amplified across Bloomberg Business, WSJ, and Gizmodo while Reddit's r/technology thread on the ruling drew 28,000 upvotes — the largest single engagement number in the entire dataset.
  • Three independent threads — Chinese labor courts, the Global Mayors AI Forum, and U.S. state compliance deadlines — are converging into one shift from voluntary AI ethics toward enforceable AI policy.

Capex is sprinting ahead of revenue, and the physical-infrastructure tail is starting to wag

  • Across the news today, committed hyperscaler capex for 2026 reached up to $665B against roughly $20B in reported 2025 AI revenue — a gap that economists in a Deutsche Bank survey now flag as the top market risk.
  • On X, Goldman Sachs projected $7.6 trillion in cumulative AI capex by 2031 in one viral post, while a separate trending topic captured the 'artificial daylight' backlash from a Crowell, Texas data center.
  • In the research, the AI Layoff Trap paper links the capex-driven automation push to the workforce displacement debate that's now showing up in courtrooms.
  • At this week's events, AgentCon Silicon Valley and the PIER71 maritime AI roadshow show the application layer trying to monetize that capex at ground level — far from the GPU procurement spreadsheet.

Slow Drip

Blog reads worth savoring

Analysis · Towards AI
Yin, Yang, and the LLM: Engineering Reliability into AI Code Scanning

A product security engineer skips the 'better prompt' trope and applies actual statistical quality control to tame LLM hallucinations — treating the model like an unreliable assembly-line machine.

Analysis · Towards AI
The Agentic Pipeline: Orchestrating Tools Without Context Bloat

Names and dissects the 'Token Ping-Pong' anti-pattern that quietly destroys multi-tool agents, then shows how to chain SQL, Python sandboxes, and charts without piping raw data through the LLM.
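The article's own code isn't reproduced here, but the core idea (downstream tools exchanging lightweight handles instead of piping raw rows back through the LLM context) can be sketched roughly like this; all function names and the toy SQLite table are illustrative:

```python
# Minimal sketch of avoiding "Token Ping-Pong": tools pass handles to data,
# and only tiny summaries ever enter the LLM context. Names are illustrative.

import sqlite3

STORE = {}  # handle -> materialized rows, held outside the LLM context

def run_sql(query: str, handle: str) -> str:
    """Execute SQL, stash the rows under a handle, return only a summary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("EU", 120.0), ("US", 340.0), ("APAC", 95.0)])
    rows = conn.execute(query).fetchall()
    STORE[handle] = rows
    return f"{handle}: {len(rows)} rows"  # the LLM sees this, not the data

def summarize(handle: str) -> str:
    """A downstream tool reads the handle directly; no raw rows in the prompt."""
    total = sum(amount for _, amount in STORE[handle])
    return f"total={total:.1f}"

print(run_sql("SELECT region, amount FROM sales", "q1"))  # q1: 3 rows
print(summarize("q1"))                                    # total=555.0
```

The design choice is the point: only the one-line summaries cross the model boundary, so the agent's token cost stays flat no matter how large the intermediate result set grows.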

Tutorial · Towards AI
GraphRAG vs Vectorless RAG vs Vector RAG (A 2026 Guide to Advanced Context Engineering)

The architectural decision tree for anyone hitting the ceiling of vanilla vector search and trying to figure out what to bet on next.

Research · Simon Willison's blog
Quoting Anthropic

Highlights Anthropic's striking finding that Claude shows sycophancy in 38% of spirituality conversations and 25% of relationship ones — a sharp lens on where AI personality breaks down under pressure.

Research · Towards AI
I Tested Nemotron Nano Omni vs GPT-5.5 on 18 Tasks — The Free 30B Open Model Killed It on Cost

A head-to-head benchmark showing NVIDIA's open-sourced 30B multimodal model holding its own against frontier quality on a single 25GB GPU.

The Grind

Research papers, decoded

Economics · 17,406 upvotes · arxiv
The AI Layoff Trap

UPenn/Boston University paper formalizing why firms keep automating jobs even when it backfires market-wide. Each firm captures 100% of its automation savings but only absorbs 1/N of the resulting demand loss it creates — producing an over-automation wedge where the Nash equilibrium is Pareto-dominated by the cooperative optimum. The authors evaluate six policy responses and conclude only a Pigouvian automation tax fully corrects it; UBI, capital income taxes, upskilling, and worker equity all fall short. Aggressive headcount cuts are locally rational and collectively self-defeating.
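A toy numerical sketch of that wedge, with payoffs and parameters invented purely for illustration (the paper's actual model is richer):

```python
# Toy version of the over-automation wedge: each firm pockets all of its
# automation savings but bears only 1/N of the demand loss it creates.
# All numbers are invented for illustration.

N = 10               # identical firms
savings = 5.0        # savings a firm captures by automating
demand_loss = 30.0   # market-wide demand destroyed per automating firm

def firm_payoff(automates: bool, num_automating_others: int) -> float:
    """Payoff for one firm, given its choice and how many others automate."""
    total_automating = num_automating_others + (1 if automates else 0)
    shared_loss = demand_loss * total_automating / N  # borne by every firm
    return (savings if automates else 0.0) - shared_loss

# Automating is individually rational no matter what the others do...
for k in range(N):
    assert firm_payoff(True, k) > firm_payoff(False, k)

# ...yet the all-automate Nash outcome is worse for everyone than none-automate.
nash = firm_payoff(True, N - 1)   # every firm automates
coop = firm_payoff(False, 0)      # no firm automates
print(f"Nash payoff per firm: {nash:.1f}, cooperative payoff: {coop:.1f}")
```

The wedge appears whenever per-firm savings exceed 1/N of the demand loss but fall short of the full loss — exactly the gap a Pigouvian tax would close.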

Model Forensics · 156 upvotes · alphaxiv
Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity

A genuinely clever black-box technique for estimating closed-model size by measuring how many obscure facts it can recall. The author built a benchmark of 1,400 factual questions across seven obscurity tiers, calibrated against 89 open-weight models from 135M to 1.6T parameters, hitting R²=0.917 with 68.5% of estimates within 2x of true size. Headline pulls: GPT-5.5 ~9.7T parameters, Claude variants 65B-5T. Also shows the much-hyped 'Densing Law' of temporal compression is statistically indistinguishable from zero. Useful for vendor due diligence and sanity-checking marketing claims.

Open Models · 15 upvotes · huggingface
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

NVIDIA's latest open omni-modal model: text, image, video, AND audio in a single 30B-A3B MoE backbone, paired with C-RADIOv4-H (vision) and Parakeet-TDT-0.6B-v2 (audio) encoders. A seven-stage SFT pipeline progressively introduces modalities while stretching context from 16K to 256K tokens, followed by multiple RL rounds for preference and reasoning alignment. Beats Qwen3-Omni on MMLongBench-Doc (57.5 vs 49.5) and OpenASR (5.95 vs 6.55 WER). Strong open-weights option for voice agents, document AI, and computer-use automations.

On Tap

What's trending in the builder community

TauricResearch/TradingAgents

Multi-agent LLM framework for financial trading; popping off with 3,315 stars today on its way past 64K total.

ruvnet/ruflo

'The leading agent orchestration platform for Claude' — multi-agent swarms, RAG, and native Claude Code / Codex hooks. 1,834 stars today, 38.1K total.

soxoj/maigret

OSINT tool that builds a dossier on a person from a single username across 3,000+ sites. Pops off whenever the security crowd notices it.

1jehuang/jcode

A new coding-agent harness written in Rust; the harness wars are very real and this one is climbing fast.

Scholé

Turn everyday work into personalized AI learning — surfaces lessons from the tasks you're already doing.

Cloud Computer by Manus

A dedicated cloud machine for bots and software — agents finally get their own runtime.

AI Works Too Well at the Wrong Thing #IntentEngineering #AItruth

Klarna's AI agent saved $60M but caused real organizational damage; argues the next bottleneck isn't model quality, it's intent engineering.

Software for Agents

Y Combinator argues the next wave of internet 'users' are AI agents and incumbent software needs to be rebuilt to treat them as first-class citizens.

Coding Agent Wars: Claude Code 4.7 Goes Viral.

Arun (@hiarun02): 'Claude Code 4.7 is insane. i know literally NOTHING about coding. ZERO. and i just built 3 fully functioning web apps in 30 minutes.'

The Trillion-Dollar AI Capex Race.

Small Cap Snipa: 'GOLDMAN SACHS PROJECTS $7.6 TRILLION IN AI CAPEX BY 2031' — the moment Wall Street's number got too big for the room.

find-skills

Meta-skill for discovering and installing skills. The most-installed skill on the platform — the index everyone else points at.

Self-Improving Agent

Captures learnings, errors, and corrections so the agent gets sharper over time.

Roast Calendar

Upcoming events & gatherings

AgentCon - Silicon Valley | Mon May 4, 9 AM PT | Mountain View, CA
AI for the Jobsite: Field Intelligence | SF Happy Hour | Mon May 4, 3:30 PM PT | San Francisco, CA
Built World Pitch Night | Mon May 4, 4 PM PT | San Francisco, CA

Last Sip

Parting thoughts & a teaser for tomorrow

If there's a through-line today, it's that the AI cycle is leaving its 'everything is one trade' phase. Markets are picking which hyperscalers can monetize the spend; developers are picking which harness deserves their daily driver slot; courts are picking who eats the cost of replacing a worker; and Harvard is picking when an AI's diagnostic chain is good enough to take seriously. None of those answers are converging, which means the next few weeks are going to be loud.

Watch for: another hyperscaler print where the AI revenue gap actually narrows (or doesn't), more coding-agent harnesses landing on GitHub trending, and the first U.S. labor case that cites the Hangzhou ruling. Talk tomorrow.