Agentic Brew Daily
Your daily shot of what's brewing in AI
Fresh Batch
- Meta's rerating from cloud customer to competitor knocked ~14-17% off CoreWeave and Nebius the same week Nvidia began backstopping those neoclouds' unsold GPUs.
- Palantir's Karp calling token pricing "irresponsibly over-sold," DeepSeek adding afternoon surge pricing, and builders running LLMs locally all signal a revolt against per-token economics.
Bold Shots
Today's biggest AI stories, no chaser
The US Commerce Department lifted export controls on Anthropic's Fable 5 and Mythos 5 on June 30, ending an 18-day government-ordered global shutdown that started June 12 after Amazon researchers reported a jailbreak that got Fable 5 to identify vulnerabilities and write exploit code. Anthropic won the reversal by training a new safety classifier that blocks the technique in over 99% of attempts (Commerce says 99.9%), rerouting flagged requests to the weaker Claude Opus 4.8. Fable 5 went back up globally July 1 across Claude.ai, Claude Code, and AWS Bedrock, though Bedrock access now requires a 30-day data-retention opt-in.
Why it matters: This is the first time Washington used export-control authority to effectively hit a kill switch on a live frontier model, setting precedent by enforcement rather than rulemaking. The deal that restored access becomes the template every lab gets measured against next time, and the classifier fix quietly degrades the model for legitimate defensive-security work, so the win is contested.
Bloomberg reported July 1 that Meta is building "Meta Compute" to sell its excess AI compute, competing directly with AWS, Azure, and Google Cloud. There are two models: API access to hosted models (including the closed-weight Muse Spark) and raw capacity in the neocloud/CoreWeave style. The market split instantly, with Meta up 8.81% to $612.91 (roughly $149B added) while CoreWeave dropped 13.92%, Nebius 17.01%, and IREN about 6.5%.
Why it matters: Meta is turning on the very suppliers it just paid tens of billions to lease, including a ~$21B CoreWeave deal through 2032. That reframes Meta from anchor customer to competitor and forces a repricing of the whole neocloud sector, while raising the question of whether "excess compute" is really a signal of overbuilding.
OpenAI proposed handing the US government a 5% equity stake, per the FT on July 2, though talks are very early and would need Congressional approval. The proposal is actually broader than OpenAI: 5% of each leading US AI developer via a government vehicle, structured like an Alaska Permanent Fund-style public wealth fund that distributes proceeds to citizens. At OpenAI's $852B post-money valuation, 5% works out to roughly $42.6B.
Why it matters: This would make the government simultaneously a regulator and a shareholder of the company it polices, a conflict critics warn could blunt safety enforcement. The 5% figure also anchors a fast-widening political spectrum, with Trump's ~10% Intel stake already double that and Sanders and Bannon pushing for 50%.
The Information reported July 2 that Anthropic began early development of its own custom AI server chip and held preliminary talks with Samsung Electronics. The discussions center on Samsung Foundry's 2nm process and advanced packaging, and the chip targets AI inference rather than training. It's still conceptual with no settled design, and Anthropic stressed that Google, Amazon, and Nvidia chips stay central, but the report alone triggered a semiconductor selloff, dragging down US memory names plus Samsung and SK Hynix.
Why it matters: A headline about preliminary talks, not a product, gutted chip stocks, which shows how tightly the market ties frontier-lab supply chains to incumbent silicon. The real story is inference economics: custom chips tuned to Claude could cut serving costs by 50% or more, letting Anthropic control unit economics and API price floors. Anthropic is the last major lab to go custom.
Nvidia is rolling out a model where AI clouds buy Nvidia infrastructure and Nvidia earns product revenue plus a share of cloud revenue on supported capacity. Under a $6.3B initial-value agreement with CoreWeave, Nvidia is obligated to buy residual unsold GPU capacity through April 13, 2032. It has also put roughly $2B of equity each into CoreWeave and Nebius, anchoring relationships that then buy tens of billions in GPUs.
Why it matters: Nvidia is shifting from chip seller to landlord, putting its balance sheet behind customer GPUs and taking a recurring cut of downstream compute. That fuels "circular financing" fears, with VC Tomasz Tunguz mapping it onto Lucent's telecom-bubble vendor financing (Nvidia's exposure ~67% of annual revenue vs Lucent's 24%). Nvidia's rebuttal: unlike Lucent, its customers are paying, profitable hyperscalers who settle within 53 days.
Slow Drip
Blog reads worth savoring
Learn the exact architecture OpenAI used to run real-time WebRTC on Kubernetes: splitting stateless edge relays from stateful transceivers, embedding routing metadata in the ICE ufrag for zero-lookup first-packet forwarding, and a Go/SO_REUSEPORT relay backed by Redis.
Understand why stacking DRAM directly on the logic die (10K-100K vertical connections vs HBM's ~2,048 edge links) sidesteps constrained CoWoS packaging but relocates the yield-and-thermal problem into 3D.
See why the ex-Meta Llama lead left for drug discovery, and how PEARL's sub-1Å RMSD co-folding (winning OpenBind zero-shot on 802 unseen complexes) models induced fit without costly MD simulations.
Grasp the emerging pay-per-request model for the agentic web: charging AI agents for APIs, datasets, and MCP tools in stablecoins via HTTP's 402 code, with edge-enforced access and no seller-side billing stack.
A reproducible build for multi-hop RAG that uses a knowledge graph in Neptune plus Personalized PageRank to rank passages in a single retrieval step, connecting distant facts that vanilla vector search misses.
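For readers who want a feel for the Personalized PageRank step in that last read, here is a minimal sketch. It is not the post's code and does not touch Neptune: the graph is a tiny in-memory stand-in, and every name in it (the entity and passage nodes, `personalized_pagerank`, `alpha`) is an assumption made up for illustration. The idea it shows is the general one: restart a random walk at the entities found in the query, then rank passages by how much of the walk's probability mass lands on them.

```python
from collections import defaultdict

def personalized_pagerank(edges, seeds, alpha=0.15, iters=50):
    """Power-iteration PPR on an undirected graph.

    edges: iterable of (u, v) pairs.
    seeds: nodes matched to the query (assumed to exist in the graph).
    alpha: probability of jumping back to a seed at each step.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            # Spread the remaining mass evenly across this node's neighbors.
            share = (1 - alpha) * scores[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        scores = nxt
    return scores

# Hypothetical toy graph: entities link to the passages that mention them.
edges = [
    ("Curie", "p1"), ("p1", "Sorbonne"),
    ("Sorbonne", "p2"), ("p2", "Paris"),
    ("Einstein", "p3"), ("p3", "Zurich"),
]
scores = personalized_pagerank(edges, seeds={"Curie"})
passages = ["p1", "p2", "p3"]
ranked = sorted(passages, key=scores.get, reverse=True)
print(ranked)
```

In this toy graph the query only mentions "Curie", yet p2 still gets a nonzero score because it is reachable through the shared "Sorbonne" entity, while the unconnected p3 gets none. That is the multi-hop behavior a plain nearest-neighbor vector lookup has no way to express in one step.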
The Grind
Research papers, decoded
A feed-forward 3D foundation model that reconstructs scene geometry in real time from ordinary RGB video, using only one moving camera, with no depth sensors or multi-camera rigs. Its Geometric Context Transformer combines anchor context, a pose-reference window, and trajectory memory; it runs at ~20 FPS on 10,000+ frames and reports SOTA results on Oxford Spires, Tanks and Temples, ETH3D, and 7-Scenes. The upshot: a phone, drone, or robot with a single camera can build a live spatial map without specialized hardware.
DeepSeek's speculative-decoding framework, shipped as the full-stack DeepSpec codebase. Pairs semi-autoregressive generation (emitting several tokens in parallel) with a confidence schedule — aggressive drafting when the model is sure, pulling back when it isn't — for higher per-user speed and aggregate throughput in production without changing output quality. One of the highest-leverage cost/latency optimizations for serving at volume, open and adaptable.
Trains an AI agent to act as a data scientist generating and refining its own training data, via a Challenger/Solver/Judge loop (Agentic Self-Instruct) targeting the right difficulty for a target model. Validation QA pass rate climbed 62.1% -> 79.6% over 126 iterations with no manual prompt engineering; gains over CoT Self-Instruct on CS-research, legal (PRBench-Legal 0.441 vs 0.377), and math. An agentic pipeline that builds difficulty-calibrated data converts extra inference compute into better datasets.
A "general world foundation model" built around next-state prediction instead of isolated next-token/frame/action objectives; unified latent from 125K hours of video plus 160M event annotations and 11.5M VQA pairs. Backbone frozen, only lightweight decoders trained: 51.8 vs Qwen3.5-4B's 46.7 (video understanding), 59.8 vs FLUX.2's 56.1 (image prediction), 32.4 vs π0.5's 29.4 (robot action) despite no action labels in pre-training. A shared state-transition model can outperform task-specific models and reach embodied action without action-labeled data.
The Mill
Builder tools ground for action
An agentic skills framework & software development methodology that works.
Open-source AI penetration testing tool to find and fix your app’s vulnerabilities.
Give /automate a task in plain English and it drives a real browser to do it: navigate a site, click through a multi-step flow, fill a form, reach a page that only renders after interaction. The result streams back in one API call. It's an API you call, not a framework you install. Browser and LLM included, nothing to host, no concurrency ceiling. Accessibility-tree automation spends 60 to 80% fewer tokens than screenshot-based agents. Built by Mozilla. Ephemeral, no training on your data.
The Counter
Voices from the AI bar today
Deep technical talk with Tufa Labs on induction vs transduction and world models for agents learning in unfamiliar environments.
Practical playbook for building "agent-first" businesses that sell work-as-a-service instead of software.
Argues the real AI scaling bottleneck is now energy and physical infrastructure, not GPUs.
Eight used RTX 3090s -> 192GB VRAM in one private box running 70B models without a datacenter.
New Skills workflow to create Live Photos, part of the video-gen wave around Seedance 2.0 4K and Vidu Q3.
Showcase of a game NPC engine driven entirely by local models; the community debates latency and model-size tradeoffs for real-time play.
Discussion of custom-silicon moves by OpenAI and Anthropic driven by compute scarcity, vendor lock-in, and supply-chain bottlenecks.
Roast Calendar
Your AI week, day by day
Last Sip
Parting thoughts
If there's one thread to pull from today, it's that the money and the machines are now the same conversation. Meta, Nvidia, and Anthropic all made moves about who owns the compute and who pays for it, and the markets reacted before any of it shipped. Meanwhile the counter-current is people building their own answer at home, stacking used 3090s to run 70B models off-grid. Both are reactions to the same discomfort with what compute costs right now. Grab a refill and enjoy the reads.