Agentic Brew Daily — The Memory Tax Comes Home
Your daily shot of what's brewing in AI
Fresh Batch
- The same AI-memory demand that drove Micron to ~85% margins is what Apple cited for hiking Mac and iPad prices 15-33% while sparing the iPhone.
- With OpenAI's Jalapeño, Qualcomm's Dragonfly C1000, and a Modular acquisition all landing today, the silicon stack is racing to design around Nvidia, not just buy from it.
- Anthropic is simultaneously poaching DeepMind's Gemini contributors and accusing Alibaba of 25,000 fake accounts to distill Claude — competing for talent and guarding model IP at once.
Bold Shots
Today's biggest AI stories, no chaser
OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom AI accelerator — an "Intelligence Processor" built from scratch for LLM inference. It went from design to tape-out in roughly nine months, with engineering samples already running workloads including GPT-5.3-Codex-Spark, and OpenAI used its own models to speed parts of the design. Broadcom CEO Hock Tan cited ~50% cost savings versus typical AI GPUs; TSMC is the reported foundry and Microsoft is expected to take ~40% of first-phase chips. Gigawatt-scale deployment is planned for late 2026.
Why it matters: Inference, not training, is where the recurring AI bill lives, so cheaper tokens-per-watt directly lowers the cost of running ChatGPT, Codex, and the API at scale. The deeper signal is the nine-month, AI-assisted design cycle — and that Broadcom captures the enablement layer no matter which lab goes custom.
Micron reported record fiscal Q3 FY2026 revenue of $41.46B, up roughly 4.5x from $9.30B a year earlier, with non-GAAP EPS of $25.11 against a ~$20.49 forecast and a company-record 84.9% gross margin, the highest in its records going back to 1990. It also signed 16 multi-year take-or-pay agreements representing ~$100B in minimum contracted revenue and ~$22B in upfront customer cash, covering ~20% of DRAM and a third of NAND shipments through 2030. Shares surged 15-19% to a fresh record.
Why it matters: HBM has become the binding constraint on how fast the industry can build AI infrastructure. The take-or-pay contracts push cyclical downside onto hyperscalers — the clearest sign yet that memory's 40-year boom-bust pattern may be breaking.
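For readers who want to sanity-check the headline numbers, here is a quick back-of-the-envelope sketch. It uses only the figures quoted in the item above; the helper names are ours and purely illustrative.

```python
# Figures as quoted in the Micron item (revenue in USD billions, EPS in USD).
revenue_now = 41.46
revenue_year_ago = 9.30
eps_actual = 25.11
eps_forecast = 20.49


def growth_multiple(now: float, before: float) -> float:
    """How many times larger `now` is than `before`."""
    return now / before


def percent_above(actual: float, expected: float) -> float:
    """Percentage by which `actual` exceeds `expected`."""
    return (actual - expected) / expected * 100.0


multiple = growth_multiple(revenue_now, revenue_year_ago)
eps_beat = percent_above(eps_actual, eps_forecast)

print(f"Revenue multiple year over year: {multiple:.2f}x")  # 4.46x
print(f"EPS above forecast: {eps_beat:.1f}%")               # 22.5%
```

On these inputs the year-over-year multiple works out to about 4.46x and the EPS beat to about 22.5%.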
Qualcomm unveiled its first data-center CPU, the Dragonfly C1000, at Investor Day 2026: custom Oryon cores, 250+ cores above 5 GHz, and a claimed ~2x better performance-per-watt versus competitive server CPUs. It named Meta as its first data-center customer (production H2 2028) and agreed to acquire AI software startup Modular in an all-stock deal valued at ~$3.9B, betting on a silicon-agnostic stack built around the Mojo language and MAX inference engine.
Why it matters: The headline is silicon, but the real move is the Modular buy — it targets Nvidia's true moat, CUDA and the rewrite cost that pins workloads to its hardware. Modular's pedigree (co-founder Chris Lattner, creator of LLVM and Swift) sharpens the threat, though Dragonfly production with Meta doesn't start until H2 2028.
Apple raised prices on Macs, iPads, and several home products worldwide, blaming a surge in memory and storage costs driven by AI data-center demand — while leaving iPhone prices unchanged for now. The entry MacBook Neo rose to $699 from $599, the Mac Studio to $2,499 from $1,999, and the iPad Air to $749 from $599. Apple shares fell ~6% to ~$275.15, their worst single day in more than a year. Tim Cook said Apple had shielded customers for months but the situation had become unsustainable.
Why it matters: This is the moment the AI buildout cost became personal — data centers now consume ~70% of all memory chips produced worldwide, and the same DRAM dies in a MacBook compete directly with HBM feeding AI accelerators. Sparing the iPhone is a deliberate bet on ecosystem demand elasticity.
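To put the three quoted price changes on a common footing, a small sketch of the percentage math follows. The prices are the ones listed in the item above; the dictionary layout and function name are our own illustration.

```python
# Old and new US prices as quoted in the Apple item (USD).
price_changes = {
    "MacBook Neo": (599, 699),
    "Mac Studio": (1999, 2499),
    "iPad Air": (599, 749),
}


def percent_increase(old: float, new: float) -> float:
    """Percentage increase from `old` to `new`."""
    return (new - old) / old * 100.0


increases = {
    product: percent_increase(old, new)
    for product, (old, new) in price_changes.items()
}

for product, pct in increases.items():
    print(f"{product}: +{pct:.1f}%")
# MacBook Neo: +16.7%
# Mac Studio: +25.0%
# iPad Air: +25.0%
```

These three examples land between roughly 17% and 25%. The wider 15-33% range cited at the top of the issue presumably covers models that are not itemized here.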
Anthropic sent a June 10 letter to the U.S. Senate Banking Committee accusing Alibaba of "the largest known distillation attack on Anthropic to date," alleging operators affiliated with Alibaba and its Qwen lab ran 28.8 million exchanges with Claude using ~25,000 fraudulent accounts between April 22 and June 5. Alibaba shares fell ~4.43% to HK$95.00 on June 25, wiping ~HK$88B in value. China's Foreign Ministry called the allegations groundless.
Why it matters: This shifts the U.S.-China AI contest from chips to cognition, and the 28.8M-exchange figure is a step-change over earlier DeepSeek and MiniMax cases. It hasn't landed as a clean morality tale, though — critics note Anthropic trained on scraped web data itself, and this is a TOS-violation letter to the Senate, not a filed lawsuit.
Slow Drip
Blog reads worth savoring
Explains why KV-cache storage is a distinct SSD workload and which cell-level, block-size, and FDP design choices let flash act as a memory tier instead of DRAM.
Argues your agent's nondeterminism is an architecture gap, not a model flaw, and borrows the distributed-systems controller layer to fix retries, state, and step-tracking.
Reproduces Meta's SIRA result showing a compiled weighted-BM25 query beats E5, SPLADE, and agentic search on BEIR (0.691 vs 0.648 Recall@10) with no fine-tuning, plus runnable code.
A one-import swap that gives 3.4-3.7x MoE training throughput and ~32% less GPU memory via Expert Parallelism + DeepEP, scaling to 550B params.
Computer use is now native to Gemini 3.5 Flash, with adversarially trained prompt-injection defenses and auto-halt safeguards for browser, mobile, and desktop agents, callable from the Gemini API today.
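A note on the metric in the BM25 item above: Recall@10 is the share of a query's relevant documents that show up in the top 10 results, averaged over queries. The sketch below follows that common definition; it is not taken from the linked write-up, and the function names and document IDs are made up for illustration.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined for a query with no relevant documents")
    retrieved = set(ranked_ids[:k])  # a set, so duplicate IDs are not double counted
    return len(retrieved & relevant) / len(relevant)


def mean_recall_at_k(runs: Sequence[tuple], k: int = 10) -> float:
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical example: two queries with invented document IDs.
runs = [
    (["d3", "d7", "d1"], {"d1", "d9"}),  # finds 1 of 2 relevant docs
    (["d4", "d2"], {"d2"}),              # finds 1 of 1 relevant docs
]
print(mean_recall_at_k(runs, k=10))  # 0.75
```

Read this way, a gap of 0.691 versus 0.648 means the better system surfaces about four more relevant documents per hundred within its top 10.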
The Grind
Research papers, decoded
The strongest open recipe to date for training terminal-using agents. A 9B model hits 27.2% on Terminal-Bench 2.0 by generating 14,600 synthetic terminal environments across nine structured dimensions and training with outcome-only RL (DPPO). Generalizes: SWE-Bench Verified 44.0%->53.5%, AIME 73.3%->91.1%. Dataset, models, and code fully open-sourced.
The first language world models that simulate agentic environments across 7 domains via long chain-of-thought. The 397B model scores 58.71 on AgentWorldBench, edging out GPT-5.4 and Claude Opus 4.8. Training agents in its simulated environments beats real-environment-only training (+7.1 on QwenClawBench). Code released.
Builds on DeepSeek OCR. Reference Sliding Window Attention (R-SWA) keeps all visual reference tokens globally accessible but only a 128-token sliding window of generated text, so the KV cache stays constant. Result: 93.23% on OmniDocBench v1.5 (+6.22 pts), 12.7% faster, 40+ pages in one forward pass. Weights and code public.
Allocate more parameters to early layers and fewer to later ones. Tapering MLP width via a cosine schedule improved perplexity by 1.84 points on a 440M Transformer, consistent across 4 architectures and 3 scales, at zero extra parameter or compute cost.
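To make the tapering idea in the last item concrete, here is a minimal sketch of one way a cosine schedule could shift MLP width toward early layers while holding the total constant. The paper's exact schedule is not given here, so the formula, the `amplitude` value, and the rounding to multiples of 64 are all our assumptions.

```python
import math


def tapered_mlp_widths(num_layers: int, mean_width: int,
                       amplitude: float = 0.25, multiple_of: int = 64) -> list:
    """Per-layer MLP hidden widths, wider early and narrower late.

    Assumed schedule (not taken from the paper):
        width_i = mean_width * (1 + amplitude * cos(pi * i / (num_layers - 1)))
    The cosine term is antisymmetric around the middle of the stack, so the
    widths average out to mean_width before rounding.
    """
    if num_layers < 2:
        raise ValueError("need at least two layers to taper")
    widths = []
    for i in range(num_layers):
        scale = 1.0 + amplitude * math.cos(math.pi * i / (num_layers - 1))
        # Round to a hardware-friendly multiple (an assumption for illustration).
        widths.append(round(mean_width * scale / multiple_of) * multiple_of)
    return widths


widths = tapered_mlp_widths(num_layers=12, mean_width=4096)
print(widths[0], widths[-1])      # 5120 3072
print(sum(widths) == 12 * 4096)   # True
```

Because MLP parameter count scales linearly with hidden width at a fixed model dimension, keeping the sum of widths unchanged keeps the MLP parameter budget unchanged, which is the "zero extra parameter" property the blurb describes.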
The Mill
Builder tools ground for action
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your agentic workflows.
JavaScript in-page GUI agent. Control web interfaces with natural language.
A format specification for describing a visual identity to coding agents. DESIGN.md gives agents a persistent, structured understanding of a design system.
Tencent EdgeOne Makers is an edge platform for modern web apps and AI agents. Build with your preferred frameworks and deploy through familiar CLI, Git, and CI/CD workflows. Get built-in agent runtime, sandboxed tools, memory, observability, model gateway support, serverless functions, and storage—without stitching together complex infrastructure. Add AI agents to existing products or launch new AI applications in minutes. Deploy AI agents like web apps.
FUTO Swipe is a family of small, open models for accurate swipe typing. It includes a layout-agnostic encoder, a layout-specific decoder, and a lightweight context language model. The full system runs efficiently on-device with a very small footprint, and FUTO has also released the 1-million-swipe dataset used to train it.
The Counter
Voices from the AI bar today
Jensen Huang's full vision pitch — agentic AI, physical AI, and the AI-factory roadmap for next-gen compute.
Matei Zaharia and Reynold Xin on Omnigent, a meta-harness over Claude Code/Codex, and an agent-native database stack.
The rush to design AI silicon off Nvidia, captured in one viral thread.
Felicis and Khosla back Runlayer as a 'golden path' for enterprise agent identity.
A crowd-sourced map of China's domestic AI-silicon surge and what it means for the GPU supply chain.
Roast Calendar
Your AI week, day by day
Last Sip
Parting thoughts
Most of today's biggest stories trace back to a single thing: memory. The same HBM scarcity lifting Micron to record margins is the line item on your next MacBook receipt and part of the reason OpenAI, Qualcomm, and a half-dozen Chinese startups are all designing silicon at once. When a constraint shows up in earnings, in consumer prices, and in geopolitics on the same day, it's worth sitting with. Enjoy the brew.