Jun 26, 2026

Agentic Brew Daily — The Memory Tax Comes Home

Your daily shot of what's brewing in AI

Fresh Batch

Distilled trend

The same AI-memory demand that drove Micron to ~85% margins is what Apple cited for hiking Mac and iPad prices 15-33% while sparing the iPhone.
With OpenAI's Jalapeño, Qualcomm's Dragonfly C1000, and a Modular acquisition all landing today, the silicon stack is racing to design around Nvidia, not just buy from it.
Anthropic is simultaneously poaching DeepMind's Gemini contributors and accusing Alibaba of using ~25,000 fraudulent accounts to distill Claude, competing for talent and guarding model IP at once.

Bold Shots

Today's biggest AI stories, no chaser

OpenAI and Broadcom unveil Jalapeño, a custom inference chip designed in nine months

OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom AI accelerator — an "Intelligence Processor" built from scratch for LLM inference. It went from design to tape-out in roughly nine months, with engineering samples already running workloads including GPT-5.3-Codex-Spark, and OpenAI used its own models to speed parts of the design. Broadcom CEO Hock Tan cited ~50% cost savings versus typical AI GPUs; TSMC is the reported foundry and Microsoft is expected to take ~40% of first-phase chips. Gigawatt-scale deployment is planned for late 2026.

Why it matters: Inference, not training, is where the recurring AI bill lives, so cheaper tokens-per-watt directly lowers the cost of running ChatGPT, Codex, and the API at scale. The deeper signal is the nine-month, AI-assisted design cycle — and that Broadcom captures the enablement layer no matter which lab goes custom.

OpenAI and $AVGO just unveiled Jalapeno, OpenAI's first custom AI chip. It was designed end-to-end in just 9 months with help from OpenAI's own AI.

@danielisdizzy · 5 engagements

OpenAI and Broadcom have developed a custom artificial intelligence chip called Jalapeno. OpenAI is now testing the samples.

@BloombergTV · 69 engagements

OpenAI Unveils First Custom AI Chip With Broadcom | Bloomberg Tech 6/24/2026

Bloomberg Technology · 5.3K views

Micron posts record $41.46B quarter on AI memory demand

Micron reported record fiscal Q3 FY2026 revenue of $41.46B, up roughly 4.5x from $9.30B a year earlier, with non-GAAP EPS of $25.11 against a ~$20.49 forecast and a company-record 84.9% gross margin, the highest in its data going back to 1990. It also signed 16 multi-year take-or-pay agreements representing ~$100B in minimum contracted revenue and ~$22B in upfront customer cash, covering ~20% of DRAM and a third of NAND shipments through 2030. Shares surged 15-19% to a fresh record.
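For readers who want to check the headline ratios, here is a quick sketch that uses only the figures quoted above. The variable names are illustrative, and the forecast figure is approximate as quoted.

```python
# Figures quoted in the item above (revenue in USD billions, EPS in USD).
revenue_now = 41.46
revenue_year_ago = 9.30
eps_actual = 25.11
eps_forecast = 20.49  # approximate forecast, as quoted

# Year-over-year revenue multiple and the size of the EPS beat.
revenue_multiple = revenue_now / revenue_year_ago
eps_beat_pct = (eps_actual / eps_forecast - 1) * 100

print(f"Revenue multiple: {revenue_multiple:.2f}x")
print(f"EPS beat: {eps_beat_pct:.1f}%")
```

On these inputs, revenue grew about 4.5x year over year and EPS came in a little over 22% above the forecast.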

Why it matters: HBM has become the binding constraint on how fast the industry can build AI infrastructure. The take-or-pay contracts push cyclical downside onto hyperscalers — the clearest sign yet that memory's 40-year boom-bust pattern may be breaking.

Qualcomm launches Dragonfly C1000 and acquires Modular for ~$3.9B

Qualcomm unveiled its first data-center CPU, the Dragonfly C1000, at Investor Day 2026, with custom Oryon cores, 250+ cores above 5 GHz, and a claimed ~2x better performance-per-watt versus competitive server CPUs. It named Meta as its first data-center customer (production H2 2028). It also agreed to acquire AI software startup Modular in an all-stock deal valued at ~$3.9B, betting on a silicon-agnostic stack built around the Mojo language and MAX inference engine.

Why it matters: The headline is silicon, but the real move is the Modular buy — it targets Nvidia's true moat, CUDA and the rewrite cost that pins workloads to its hardware. Modular's pedigree (co-founder Chris Lattner, creator of LLVM and Swift) sharpens the threat, though Dragonfly production with Meta doesn't start until H2 2028.

Apple raises Mac and iPad prices, blaming the AI memory shortage

Apple raised prices on Macs, iPads, and several home products worldwide, blaming a surge in memory and storage costs driven by AI data-center demand, while leaving iPhone prices unchanged for now. The entry MacBook Neo rose to $699 from $599, the Mac Studio to $2,499 from $1,999, and the iPad Air to $749 from $599. Apple shares fell ~6% to $275.15, their worst single day in more than a year. Tim Cook said Apple had shielded customers for months but the situation had become unsustainable.
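To put the three quoted price changes on a common scale, here is a small sketch that converts them to percentage increases. The prices come from the item above; the helper name `pct_increase` is just for illustration.

```python
# (old price, new price) in USD, as quoted in the item above.
price_changes = {
    "MacBook Neo": (599, 699),
    "Mac Studio": (1999, 2499),
    "iPad Air": (599, 749),
}

def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100

increases = {
    name: pct_increase(old, new) for name, (old, new) in price_changes.items()
}
for name, pct in increases.items():
    print(f"{name}: +{pct:.1f}%")
```

The three examples come out at roughly 17%, 25%, and 25%, inside the 15-33% range cited at the top of the issue, though none of them reaches the 33% upper bound.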

Why it matters: This is the moment the AI buildout cost became personal — data centers now consume ~70% of all memory chips produced worldwide, and the same DRAM dies in a MacBook compete directly with HBM feeding AI accelerators. Sparing the iPhone is a deliberate bet on ecosystem demand elasticity.

Anthropic accuses Alibaba of large-scale Claude distillation

Anthropic sent a June 10 letter to the U.S. Senate Banking Committee accusing Alibaba of "the largest known distillation attack on Anthropic to date," alleging operators affiliated with Alibaba and its Qwen lab ran 28.8 million exchanges with Claude using ~25,000 fraudulent accounts between April 22 and June 5. Alibaba shares fell 4.43% to HK$95.00 on June 25, wiping ~HK$88B in value. China's Foreign Ministry called the allegations groundless.

Why it matters: This shifts the U.S.-China AI contest from chips to cognition, and the 28.8M-exchange figure is a step-change over earlier DeepSeek and MiniMax cases. It hasn't landed as a clean morality tale, though — critics note Anthropic trained on scraped web data itself, and this is a TOS-violation letter to the Senate, not a filed lawsuit.

Anthropic Accuses Alibaba's Qwen of Largest Claude Distillation

r/ArtificialInteligence · 255 upvotes

Slow Drip

Blog reads worth savoring

Analysis · Vik's Newsletter

What AI Inference Actually Demands From a NAND SSD

Explains why KV-cache storage is a distinct SSD workload and which cell-level, block-size, and FDP design choices let flash act as a memory tier instead of DRAM.

Analysis · Data Science Collective

The Missing Abstraction in AI Systems: Controllers

Argues your agent's nondeterminism is an architecture gap, not a model flaw, and borrows the distributed-systems controller layer to fix retries, state, and step-tracking.

Research · Data Science Collective

Before Another RAG Hop, Try Compiling the Query for BM25

Reproduces Meta's SIRA result showing a compiled weighted-BM25 query beats E5, SPLADE, and agentic search on BEIR (0.691 vs 0.648 Recall@10) with no fine-tuning, plus runnable code.
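For a feel of what a weighted BM25 query looks like, here is a minimal sketch of standard BM25 scoring with a per-term weight. It is not the post's code or SIRA's method: how the query gets compiled into weights is not described here, so the weights, the toy documents, and the function name `bm25_scores` are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(docs, weighted_query, k1=1.2, b=0.75):
    """Score tokenized docs against a {term: weight} query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many docs contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term, weight in weighted_query.items():
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length-normalized term-frequency saturation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += weight * idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus and hand-picked weights (hypothetical, not from the post).
docs = [
    "hbm memory supply is tight".split(),
    "apple raises mac prices".split(),
    "memory prices hit apple mac lineup".split(),
]
query = {"memory": 2.0, "prices": 1.0, "mac": 0.5}
scores = bm25_scores(docs, query)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The only change from plain BM25 is the `weight` multiplier on each term's contribution, which lets a query emphasize some terms over others without touching the index.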

Tutorial · Hugging Face Blog / NVIDIA

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

A one-import swap that gives 3.4-3.7x MoE training throughput and ~32% less GPU memory via Expert Parallelism + DeepEP, scaling to 550B params.

News · Google DeepMind

Introducing computer use in Gemini 3.5 Flash

Computer use is now native to Gemini 3.5 Flash, with adversarially trained prompt-injection defenses and auto-halt safeguards for browser, mobile, and desktop agents, callable from the Gemini API today.

The Grind

Research papers, decoded

AlphaXiv · 166 upvotes

Tmax: A simple recipe for terminal agents

The strongest open recipe to date for training terminal-using agents. A 9B model hits 27.2% on Terminal-Bench 2.0 by generating 14,600 synthetic terminal environments across nine structured dimensions and training with outcome-only RL (DPPO). Generalizes: SWE-Bench Verified 44.0%->53.5%, AIME 73.3%->91.1%. Dataset, models, and code fully open-sourced.

AlphaXiv · 83 upvotes

Qwen-AgentWorld: Language World Models for General Agents

The first language world models that simulate agentic environments across 7 domains via long chain-of-thought. The 397B model scores 58.71 on AgentWorldBench, edging out GPT-5.4 and Claude Opus 4.8. Training agents in its simulated environments beats real-environment-only training (+7.1 on QwenClawBench). Code released.

AlphaXiv · 77 upvotes

Unlimited OCR Works

Builds on DeepSeek OCR. Reference Sliding Window Attention (R-SWA) keeps all visual reference tokens globally accessible but retains only a 128-token sliding window of generated text, so the KV cache stays constant in size. Result: 93.23% on OmniDocBench v1.5 (+6.22 pts), 12.7% faster, 40+ pages in one forward pass. Weights and code public.
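The cache behavior described above can be pictured with a toy model. This is a sketch of the general idea only, not the paper's implementation: plain Python objects stand in for key/value tensors, and the class name and the 1,000-token reference size are made up for illustration.

```python
from collections import deque

class ReferenceSlidingCache:
    """Toy cache: reference tokens stay forever, generated tokens slide."""

    def __init__(self, reference_tokens, window=128):
        self.reference = list(reference_tokens)  # always attendable
        self.generated = deque(maxlen=window)    # oldest entries drop off

    def append(self, token):
        self.generated.append(token)

    def visible(self):
        """Everything the next decoding step could attend to."""
        return self.reference + list(self.generated)

# Hypothetical sizes: 1,000 reference tokens, 5,000 decoding steps.
cache = ReferenceSlidingCache(reference_tokens=range(1000), window=128)
sizes = []
for step in range(5000):
    cache.append(f"tok{step}")
    sizes.append(len(cache.visible()))

print(max(sizes))
```

Once the window fills, the visible set stops growing at the reference length plus the window length, however long generation runs.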

AlphaXiv · 49 upvotes

Tapered Language Models

Allocate more parameters to early layers and fewer to later ones. Tapering MLP width via a cosine schedule improved perplexity by 1.84 points on a 440M Transformer, consistent across 4 architectures and 3 scales, at zero extra parameter or compute cost.
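How the paper parameterizes its cosine schedule is not spelled out here, so the following is only one plausible reading. It assumes widths follow half a cosine period around a base width, which keeps the summed width (and so, roughly, the MLP parameter budget) unchanged. The function name, the `amplitude` knob, and the 12-layer, 4096-wide example are all hypothetical.

```python
import math

def tapered_widths(n_layers: int, base_width: int, amplitude: float = 0.5):
    """Per-layer MLP widths that shrink with depth along a half cosine.

    Assumes n_layers >= 2. The cosine terms cancel in symmetric pairs,
    so the total width stays close to n_layers * base_width.
    """
    widths = []
    for layer in range(n_layers):
        # cos runs from +1 at the first layer to -1 at the last.
        phase = math.pi * layer / (n_layers - 1)
        widths.append(round(base_width * (1 + amplitude * math.cos(phase))))
    return widths

widths = tapered_widths(n_layers=12, base_width=4096)
uniform_total = 12 * 4096
print(widths)
print(sum(widths), uniform_total)
```

Early layers end up wider than the base and late layers narrower, while the total stays within rounding error of the uniform model's.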

The Mill

Builder tools ground for action

69.2K stars

opendatalab/MinerU

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your agentic workflows.

GitHub

19.7K stars

alibaba/page-agent

JavaScript in-page GUI agent. Control web interfaces with natural language.

GitHub

18.4K stars

google-labs-code/design.md

A format specification for describing a visual identity to coding agents. DESIGN.md gives agents a persistent, structured understanding of a design system.

GitHub

552 votes · Product Hunt

Tencent EdgeOne Makers

Tencent EdgeOne Makers is an edge platform for modern web apps and AI agents. Build with your preferred frameworks and deploy through familiar CLI, Git, and CI/CD workflows. Get built-in agent runtime, sandboxed tools, memory, observability, model gateway support, serverless functions, and storage—without stitching together complex infrastructure. Add AI agents to existing products or launch new AI applications in minutes. Deploy AI agents like web apps.

Product Hunt

128 votes · Product Hunt

FUTO Swipe

FUTO Swipe is a family of small, open models for accurate swipe typing. It includes a layout-agnostic encoder, a layout-specific decoder, and a lightweight context language model. The full system runs efficiently on-device with a very small footprint, and FUTO has also released the 1 million swipe dataset used to train it.

Product Hunt

The Counter

Voices from the AI bar today

11K views

2026 Nvidia Annual Stockholder Meeting

Jensen Huang's full vision pitch — agentic AI, physical AI, and the AI-factory roadmap for next-gen compute.

Yahoo Finance

3.9K views

The Agent Cloud: Databricks' Bet on the Future of AI

Matei Zaharia and Reynold Xin on Omnigent, a meta-harness over Claude Code/Codex, and an agent-native database stack.

Latent Space

4.5K engagements

NVIDIA CEO watching everyone build their own chips... Amazon. Anthropic. Google. xAI and now OPENAI

The rush to move AI silicon off Nvidia, captured in one viral thread.

@shiri_shh

3K engagements

Runlayer raises $30M to give enterprise AI agents their own scoped identity

Felicis and Khosla back Runlayer as a 'golden path' for enterprise agent identity.

@berman66

929 upvotes · 268 comments

7 Chinese companies are already shipping H100/H200-class AI chips, most IPO'd in the last 6 months. I mapped all of them.

A crowd-sourced map of China's domestic AI-silicon surge and what it means for the GPU supply chain.

r/LocalLLaMA

Roast Calendar

Your AI week, day by day

Fri 26

9:30 AM PT • San Francisco, CA

Beta Fund AI Agents Hackathon

3:00 PM PT • San Francisco, CA

The New AI Scaling Axis: Neuro-Inspired Test-Time Cognition

5:30 PM PT • San Francisco, CA

AI Memory for Agents & Cognee Launch Party

Sat 27

9:00 AM PT • San Francisco, CA

AI Engineer World's Fair Hackathon

9:30 AM PT • San Francisco, CA

Scalekit x Actian x Render Hackathon: Agents in Production Build Day

1:00 PM PT • Berkeley, CA

Berkeley AI Founders & Builders Meetup

Sun 28

June 28 - June 29 • San Francisco, CA

Wizard Hackathon

5:00 PM PT • San Francisco, CA

AI Engineer World's Fair — New Engineer Orientation (IRL)

Mon 29

5:30 PM PT • San Francisco, CA

Harness Engineering: State of the Art in Agent Harnesses

6:00 PM PT • San Francisco, CA

Artificial Analysis Intelligence Index

6:00 PM PT • San Francisco, CA

Model Independence Day

Tue 30

3:30 PM PT • Menlo Park, CA

Building AI Ecosystem: Talks + AI Agent Workshops (ServiceNow, Glean, SAP, Snowflake)

5:30 PM PT • San Francisco, CA

Agents in Production: correctness, context, and control

6:00 PM PT • San Francisco, CA

AI Engineer World's Fair Demos & Happy Hour

Wed 1

5:30 PM PT • San Francisco, CA

AI Demo Night

5:30 PM PT • San Francisco, CA

AAuth Night: Moving Beyond OAuth

6:30 PM PT • San Francisco, CA

AI Engineer After Dark

Thu 2

2:00 PM PT • San Francisco, CA

The Future of Agentic Engineering and AI Workforces with Qoder

6:00 PM PT • San Francisco, CA

{AI} in Production

Last Sip

Parting thoughts

Five of today's biggest stories trace back to a single thing: memory. The same HBM scarcity lifting Micron to record margins is the line item on your next MacBook receipt and the reason OpenAI, Qualcomm, and a half-dozen Chinese startups are all designing silicon at once. When a constraint shows up in earnings, in consumer prices, and in geopolitics on the same day, it's worth sitting with. Enjoy the brew.