Why Eigen: a HAN Lab pedigree that already sits inside the modern serving stack
Eigen AI is small, about 20 people, but its founders carry an outsized footprint in the academic plumbing of LLM inference. CEO Ryan Hanrui Wang is the lead author of SpAtten, a sparse-attention paper that Nebius's own announcement describes as the most-cited HPCA paper since 2020. Co-founder Wei-Chen Wang received the MLSys 2024 Best Paper Award for Activation-aware Weight Quantization (AWQ), now a standard 4-bit serving technique inside many production LLM stacks. The team's lineage is the MIT HAN Lab, and the company was incorporated only in 2025.
What Eigen actually ships is a layered optimization stack: system-, model-, and kernel-level techniques — custom CUDA/Triton kernels that talk directly to the GPU, weight compression, KV-cache (the running model's working memory) optimization, and LoRA-based post-training — that together raise throughput and lower cost per inference on a fixed silicon footprint. Nebius reports that the two companies have already co-shipped optimized implementations of leading open-source models that ranked among the fastest on Artificial Analysis benchmarks. Buying Eigen, in other words, is not a bet on unproven research; it is folding an in-flight collaboration into the parent company before its results compound elsewhere.
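To make the weight-compression piece concrete, here is a minimal sketch of the idea behind activation-aware 4-bit quantization: input channels that see large activations are scaled up before round-to-nearest so they lose less precision, and the scale is undone on dequantization. This is an illustrative toy, not Eigen's or AWQ's actual implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def quantize_awq_style(W, act_mean, alpha=0.5, n_bits=4):
    """Activation-aware symmetric round-to-nearest quantization (toy sketch).

    W:        (out, in) weight matrix
    act_mean: (in,) mean absolute activation per input channel
    alpha:    how strongly activation magnitude drives the channel scale
    """
    # Salient input channels (large activations) get scale > 1, so their
    # weights occupy more of the 4-bit grid and round more precisely.
    s = np.clip(act_mean, 1e-8, None) ** alpha
    s = s / s.mean()                      # keep overall magnitude stable
    Ws = W * s                            # per-input-channel scaling

    qmax = 2 ** (n_bits - 1) - 1          # symmetric int4 range: [-8, 7]
    step = np.abs(Ws).max(axis=1, keepdims=True) / qmax
    Wq = np.clip(np.round(Ws / step), -qmax - 1, qmax)

    # Dequantize and fold the channel scale back out; in a real serving
    # stack the inverse scale is instead absorbed into the previous layer.
    return (Wq * step) / s

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
act_mean = np.abs(rng.normal(size=16)) + 0.1
W_hat = quantize_awq_style(W, act_mean)
```

The point of the sketch is the asymmetry: rounding error is spent where activations are small, which is why a 4-bit model can stay close to full-precision quality without retraining.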