Local AI inference hardware: Nvidia Spark vs AMD

Strategic Overview

01.
A new category of compact local-AI computers built around 128GB of unified memory has arrived, letting users run 70B-plus models, agents, and fine-tuning jobs entirely offline. Nvidia's DGX Spark pairs a 20-core Arm Grace CPU with a Blackwell GB10 GPU, is rated by Nvidia at up to a petaflop of FP4 AI performance, and can run inference on models up to roughly 200B parameters. It is priced at $3,999.
02.
AMD is challenging Nvidia head-on with its Ryzen AI Max+ 395 (Strix Halo) platform. Its $3,999 Ryzen AI Halo developer box matches the 128GB capacity with LPDDR5X-8000 and a 2TB SSD while adding x86 and Windows 11/Linux flexibility, and third-party Strix Halo mini PCs such as the GMKtec EVO X2 hit the same 128GB spec from roughly $1,499 to $2,500.
03.
For single-batch LLM token generation the two platforms perform similarly, because that workload is memory-bandwidth-bound rather than compute-bound and the two boxes have comparable bandwidth (~273 vs ~256 GB/s). Nvidia's clear advantages emerge in prompt processing, image and video generation, and fine-tuning, where its tensor cores deliver a 2-3x lead.
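The capacity claims in the overview come down to simple arithmetic: weight footprint is roughly parameter count times bits per weight. The sketch below is illustrative only. The function names and the 16GB reserve for the OS, KV cache, and activations are assumptions, not figures from either vendor, and the 4-bit case assumes the ~200B figure refers to quantized weights.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float = 128.0, reserve_gb: float = 16.0) -> bool:
    """True if the weights fit after holding back reserve_gb.

    reserve_gb is a hypothetical allowance for the OS, KV cache, and
    activations; real headroom depends on context length and runtime.
    """
    return weights_gb(params_billion, bits_per_weight) <= memory_gb - reserve_gb


for label, params, bits in [
    ("70B at 16-bit", 70, 16),
    ("70B at 8-bit", 70, 8),
    ("200B at 4-bit", 200, 4),
]:
    size = weights_gb(params, bits)
    verdict = "fits" if fits(params, bits) else "does not fit"
    print(f"{label}: {size:.0f} GB of weights, {verdict} in 128 GB")
```

Under these assumptions a 70B model at full 16-bit precision overflows 128GB, while the same model at 8-bit and a ~200B model at 4-bit both fit.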

Under the Hood: Why a Half-Price Box Ties a $4,000 One

The headline shock of this matchup is that a Strix Halo mini PC starting around $1,499 generates LLM tokens at roughly the same speed as the $3,999 DGX Spark ^[6]. That is not a fluke. Single-batch LLM token generation (the decode phase, where the model emits one token at a time) is memory-bandwidth-bound, not compute-bound: every generated token requires streaming the model's weights through memory, so throughput is gated by how fast you can move bytes, not how many FLOPS you can do. The two boxes have nearly identical bandwidth, about 273 GB/s on the Spark versus 256 GB/s on the Strix Halo HP Z2 Mini G1a ^[2], so they 'churn out tokens at a similar pace' ^[1]. The benchmarks bear this out: on gpt-oss 120B the Spark hit ~38.6 tok/s versus ~34.1 on Strix Halo, and on a Llama 3.3 70B run the Strix Halo actually edged ahead at 4.97 versus 4.67 tok/s ^[4]. LMSYS measured the mechanism directly, calling the limited bandwidth 'the key bottleneck in AI inference' ^[2]. Nvidia's far larger compute budget (6,144 CUDA cores against the Radeon 8060S's 40 CUs / 2,560 stream processors) simply has nothing to do during decode ^[1].
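The bandwidth argument can be turned into a back-of-the-envelope ceiling: if every token requires reading the active weights once, tokens per second cannot exceed bandwidth divided by weight bytes. This is a simplified sketch, not a benchmark. The function name is hypothetical, the 70GB figure assumes a dense 70B model at 8 bits per weight, and it ignores KV-cache traffic and real-world efficiency losses.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on single-batch decode speed.

    Assumes each generated token streams the active weights through
    memory exactly once, so throughput <= bandwidth / bytes per token.
    """
    return bandwidth_gb_s / active_weights_gb


SPARK_BW_GB_S = 273.0   # DGX Spark, as cited above
STRIX_BW_GB_S = 256.0   # Strix Halo (HP Z2 Mini G1a), as cited above
DENSE_70B_FP8_GB = 70.0  # assumption: 70B parameters at 1 byte each

spark = decode_ceiling_tok_s(SPARK_BW_GB_S, DENSE_70B_FP8_GB)
strix = decode_ceiling_tok_s(STRIX_BW_GB_S, DENSE_70B_FP8_GB)

print(f"Spark ceiling: {spark:.2f} tok/s")
print(f"Strix Halo ceiling: {strix:.2f} tok/s")
print(f"Spark advantage: {(spark / strix - 1) * 100:.1f}%")
```

The ceilings differ by under 7 percent no matter which model size is plugged in, because the weight size cancels out of the ratio. Mixture-of-experts models read only the active experts per token, which is consistent with gpt-oss 120B decoding far faster than a dense 70B model on the same hardware.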

Follow the Money: Who Actually Wins, by Workload

Decode is only one of three workloads, and Nvidia dominates the other two. Prompt processing (prefill, where the model ingests your context before answering) is compute-bound, and here the Spark is in a different class: roughly 1,723 tok/s versus the Strix Halo's ~340 tok/s in one test (about a 5x gap), with a 2-3x lead reported on shorter sequences ^[1]^[4]. The gap widens further for creation and training. On FLUX.1 image generation the Spark runs about 2.5x faster, and it finished a QLoRA fine-tune of Llama 3.1 70B in roughly 20 minutes against ~50 minutes on the AMD box ^[1]. That maps cleanly onto its BF16 tensor performance of 56 teraFLOPS versus the HP Z2 Mini G1a's ~46 ^[1]. So the buying decision is genuinely workload-shaped: if you mostly chat with a local 70B model, AMD's value is unbeatable; if you process long documents, generate images and video, or fine-tune, the Spark earns its premium. AMD does win the CPU side outright, with Zen 5 delivering 'between 10 and 15 percent higher performance' than the Spark's Arm cores in Sysbench, 7zip, and HandBrake ^[1].
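To see how the prefill gap changes the buying decision, a rough response-time model helps: total time is prompt tokens divided by prefill speed plus output tokens divided by decode speed. The sketch below plugs in the rates quoted above, with the caveat that they come from different tests and models, so combining them is an assumption made purely for illustration. The `Box` class and the prompt and output lengths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    prefill_tok_s: float  # prompt-processing rate (compute-bound)
    decode_tok_s: float   # token-generation rate (bandwidth-bound)

    def response_seconds(self, prompt_tokens: int, output_tokens: int) -> float:
        """Prefill time plus decode time, ignoring load and scheduling overhead."""
        return prompt_tokens / self.prefill_tok_s + output_tokens / self.decode_tok_s


# Rates quoted in this article; mixing them across tests is an assumption.
SPARK = Box("DGX Spark", prefill_tok_s=1723.0, decode_tok_s=38.6)
STRIX = Box("Strix Halo", prefill_tok_s=340.0, decode_tok_s=34.1)

OUTPUT_TOKENS = 500  # hypothetical answer length

for prompt_tokens in (200, 32_000):  # short chat turn vs long document
    s = SPARK.response_seconds(prompt_tokens, OUTPUT_TOKENS)
    a = STRIX.response_seconds(prompt_tokens, OUTPUT_TOKENS)
    print(f"{prompt_tokens:>6} prompt tokens: Spark {s:6.1f}s, "
          f"Strix Halo {a:6.1f}s, ratio {a / s:.2f}x")
```

With a short prompt the two boxes finish within a couple of seconds of each other, because decode dominates. With a long document the prefill term takes over and the Spark pulls well ahead, which is the workload split described above.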

What the Viral 'AMD Killed Nvidia' Framing Misses

The social conversation has crystallized around a price-disruptor narrative (one widely shared post framed it as AMD's CEO killing Nvidia's $4,000 box with a $1,499 lunchbox running a 235B model live), and on raw token throughput that framing has real merit. But benchmark obsession glosses over three things. First, software friction: getting vLLM, bitsandbytes, and Flash Attention 2 running on AMD often meant source compilation or AMD-specific builds, because 'a lot of software can be made to work on AMD's consumer hardware, but it's not always as simple as running something like pip install xyz-package' ^[1], whereas Nvidia's CUDA stack is largely drop-in. Second, scalability: the most thoughtful defenders of the Spark on Reddit argue its real magic is ConnectX networking and near-lossless scaling, where two Sparks combine to serve a much larger model with high prefill, rather than raw single-box bandwidth. Third, the lived experience cuts both ways. The same threads that defend the Spark on scalability also include an owner of both machines admitting they still reach for the Strix Halo more often. The honest read is that 'AMD killed Nvidia' is half-true: AMD won the value crown for interactive chat, but Nvidia kept the crown for generation, fine-tuning, and multi-box production.

Why This Category Exists Now

These boxes exist because 128GB of unified memory crosses a threshold that consumer GPU VRAM never could: a 70B-plus model loads directly into memory, turning local exploration of frontier-class models from impossible into merely slow ^[2]^[5]. The demand driver is the wish to run large models, agents, and fine-tuning locally without cloud cost or data exposure, where nothing leaves your network. Community math frames the economics starkly: AMD's break-even against renting cloud GPUs lands around six months, with a roughly $4,500 three-year cost versus $25K-plus in the cloud. But the same physics that enables the category also caps it. Large-model interactivity is slow on both platforms (Llama 3.1 70B FP8 delivered only ~2.7 tok/s decode on the Spark, and real-world 70B INT4 runs land around 3-5 tok/s) ^[2]^[4], and Apple's M4 Max already offers ~546 GB/s, roughly double the Spark, hinting at how much headroom the next generation has on the one axis that matters most for chat ^[6].
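The community break-even figure can be reproduced with one line of arithmetic. The sketch below is a simplification: it treats the roughly $4,500 three-year local cost as if it were all paid up front and spreads the $25K cloud bill evenly over 36 months. Both are assumptions, as is the function name, and real usage patterns would shift the result.

```python
def break_even_months(local_total_usd: float, cloud_total_usd: float,
                      horizon_months: int = 36) -> float:
    """Months of cloud rental that cost as much as the local box.

    Assumes the local cost is paid up front and the cloud cost accrues
    evenly over horizon_months.
    """
    cloud_monthly_usd = cloud_total_usd / horizon_months
    return local_total_usd / cloud_monthly_usd


months = break_even_months(local_total_usd=4_500, cloud_total_usd=25_000)
print(f"Break-even after about {months:.1f} months")
```

Under these assumptions the result lands between six and seven months, in line with the roughly six-month figure quoted above.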

Historical Context

2025-03-19

Specs for the DGX Spark (originally teased as Project Digits) surfaced and drew mixed reactions, especially over memory bandwidth.

2025-10-13

LMSYS published an in-depth DGX Spark inference review that established the 273 GB/s bandwidth bottleneck empirically.

2025-10

DGX Spark became available at $3,999, positioned for AI at the edge.

2025-12-25

The Register published head-to-head testing of AMD Strix Halo (HP Z2 Mini G1a) against the Nvidia DGX Spark.

2026-06

AMD opened pre-orders for its Linux/Windows Ryzen AI Halo developer box at $3,999.

Power Map

Key Players

Nvidia

Maker of the DGX Spark / GB10 Grace Blackwell platform, positioning a premium $3,999 personal AI supercomputer whose leverage is a mature CUDA ecosystem and dedicated tensor hardware.

AMD

The challenger with Ryzen AI Max+ 395 (Strix Halo), competing on price, x86 Windows/Linux flexibility, and a broad OEM ecosystem, and explicitly naming DGX Spark as its rival.

OEM partners (Acer, ASUS, Dell, GIGABYTE, HP, Lenovo, MSI; GMKtec, Minisforum, Corsair, Xiaomi)

Build and price both platforms. Strix Halo OEMs aggressively undercut Nvidia on price, while Nvidia's partners broaden DGX Spark distribution.

Apple

An indirect competitor whose M4 Max offers roughly 546 GB/s of memory bandwidth, about double the DGX Spark, serving as the benchmark that exposes the Spark's bandwidth limitation.

THE SIGNAL.

Analysts

“AMD's Zen 5 CPU outpaces the Spark's Arm CPU and single-user LLM inference is competitive, but Nvidia wins decisively on image and video generation and on fine-tuning. AMD's Zen 5 delivered 'between 10 and 15 percent higher performance' in Sysbench, 7zip, and HandBrake tests.”

The Register

Enterprise tech publication

“The DGX Spark is bottlenecked by its 273 GB/s memory bandwidth, which is 'expected (and empirically shown) to be the key bottleneck in AI inference,' making it best for prototyping, small models, and batched serving rather than large-model production.”

LMSYS Org

AI research org and model-serving authors

“Despite the bandwidth caveats, the GB10 box is genuinely appealing for buyers who specifically want an Nvidia-based, high-memory mini AI workstation: 'If you want a high-memory, NVIDIA-based mini AI workstation, this is it.'”

ServeTheHome

Server and hardware review site

The Crowd

“AMD CEO Lisa Su just killed Nvidia's $4,000 AI box with a $1,499 lunchbox. She walked on stage, held it in one hand, and ran a 235 billion parameter model live. No data center. No cloud. No rented GPU. The chip inside is something nobody saw coming. AMD's Ryzen AI Max+ 395 is”

@adiix_official9949

“NVIDIA will refund your cloud GPU bill if you let them bolt a $250,000 AI supercomputer to your desk for $2,999. NVIDIA's $2,999 DGX Spark puts a 128GB AI supercomputer on your desk runs the same 70B models you've been renting in the cloud, but nothing crosses your network and”

@dunik_7971

“THIS NUMBER MADE THE REVIEWER GO "WHOA" ON CAMERA AND IT SHOULD MAKE YOU GO WHOA TOO Mac Mini PP score was 563, Strix Halo came in at 342, and DGX Spark hit 2,170 that's not a small gap, that's a completely different category of hardware people were judging Spark on token”

@leopardracer543

“AMD Tackles NVIDIA's $4679 DGX Spark AI PC With Its $3999 Ryzen AI Halo: Now Available With 128 GB Memory For Blazing Fast LLMs”

@u/GanacheNegative1988136

Broadcast

NVIDIA DGX Spark - A Non-Sponsored Review (Strix Halo Comparison, Pros & Cons)

Test Nvidia DGX Spark vs AMD and Mac Mini

DGX Spark vs AMD EPYC CPU Local AI Benchmarks