One Die, One Petaflop, 120 Billion Parameters On Your Lap
RTX Spark's headline number -- 1 petaflop of local FP4 AI compute -- only makes sense once you see how the die is wired. A 20-core Arm Grace CPU sits next to a Blackwell RTX GPU with 6,144 CUDA cores and 5th-generation FP4 Tensor Cores; the two are linked by NVLink-C2C and share 128GB of unified memory [1]. That last detail is the trick. On a conventional laptop with a discrete GPU, model weights have to be shuttled between CPU RAM and GPU VRAM over a comparatively narrow bus, which is what makes running anything larger than ~13B parameters miserable on the road. With unified memory and a chip-to-chip interconnect, the GPU can address the full 128GB pool directly, so a model's weights never have to leave it.
The practical envelope NVIDIA quotes follows from that: render ultra-large 90GB+ 3D scenes, edit 12K 4:2:2 video, generate 4K AI video, run 120-billion-parameter LLMs with up to a 1-million-token context window using local agents, and still play AAA games at 1440p at over 100 FPS [2]. Translation for a builder: the same 14mm-thick, 3-pound laptop that runs an indie game at high refresh rates can also host a frontier-class assistant that ingests an entire codebase, an entire deposition, or an entire video edit without round-tripping to a cloud API. The chip is manufactured on TSMC's 3nm EUV node [3], and Huang committed to a three-generation Spark roadmap -- Grace Blackwell now, Vera Rubin with LPDDR6 memory next, then Rosa Feynman [4] -- signalling that this is a platform, not a one-off.
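A back-of-envelope check helps show why a 128GB pool and 4-bit weights go together for a 120-billion-parameter model. The sketch below is illustrative only and rests on stated assumptions: weights dominate the footprint, each format costs exactly its nominal bits per weight (real FP4 schemes add per-block scale factors), and a gigabyte is 10^9 bytes. The function name weight_footprint_gb is hypothetical, not part of any NVIDIA tooling.

```python
# Rough weight-storage estimate; ignores KV cache, activations and
# quantization metadata, all of which add to the real footprint.

UNIFIED_MEMORY_GB = 128   # pool size quoted in the article
N_PARAMS = 120e9          # 120-billion-parameter model from the article


def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal bits per weight for three common precisions (assumed values).
    for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
        gb = weight_footprint_gb(N_PARAMS, bits)
        verdict = "fits" if gb <= UNIFIED_MEMORY_GB else "does not fit"
        print(f"{name}: {gb:.0f} GB of weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

Under these assumptions the FP4 weights come to roughly 60GB, about half the pool, while the same model at FP16 would need roughly 240GB. The remaining headroom is what a long context has to share with the operating system and other applications, so how close a given model gets to the 1-million-token figure depends on details this estimate does not capture.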




