The Research Desk.

The most upvoted and starred AI research crossing the community today.

Last Brew Time: May 6, 2026, 9:22 AM PT

AlphaXiv Trending

Computer Vision
Thinking with Visual Primitives
AlphaXiv
194

Ruijie Lu, Yiyang Ma, Xiaokang Chen

Computer Vision
Representation Fréchet Loss for Visual Generation
AlphaXiv
112

CUHK, Jiawei Yang, Zhengyang Geng, Xuan Ju

Machine Learning
Let ViT Speak: Generative Language-Image Pre-training
AlphaXiv
75

ByteDance, Yan Fang, Mengcheng Lan, Zilong Huang

Robotics
World Model for Robot Learning: A Comprehensive Survey
AlphaXiv
36

ETH Zurich, Bohan Hou, Gen Li, Jindou Jia

Reasoning
MolmoAct2: Action Reasoning Models for Real-world Deployment
AlphaXiv
29

Haoquan Fang, Jiafei Duan, Donovan Clay

Reinforcement Learning
On-Policy Distillation
AlphaXiv
28

Thinking Machines, Kevin Lu

HuggingFace Daily Papers

Reinforcement Learning
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
HuggingFace
65

Ruofeng Yang, Yongcan Li, Shuai Li

Computer Vision
X2SAM: Any Segmentation in Images and Videos
HuggingFace
16

Hao Wang, Limeng Qiao, Chi Zhang, Lin Ma, Guanglu Wan

Computer Vision
Video Generation with Predictive Latents
HuggingFace
6

Yian Zhao, Feng Wang, Qiushan Guo, Chang Liu, Xiangyang Ji

Speech Audio
The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail
HuggingFace
1

Venkata Pushpak Teja Menta

Reinforcement Learning
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue
HuggingFace
1

Daoxuan Zhang, Ping Chen, Jianyi Zhou, Shuo Yang

Machine Learning
Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO
HuggingFace
0

Yu Tian, Jiawei Chen, Lifan Zheng, Mingxiang Tao, Xinyi Zeng