The Research Desk

The most upvoted and starred AI research trending across the community today.

Last Updated: May 6, 2026, 9:22 AM PT

AlphaXiv Trending

Thinking with Visual Primitives
AlphaXiv
194

Ruijie Lu, Yiyang Ma, Xiaokang Chen

#computer-vision #alphaxiv

Representation Fréchet Loss for Visual Generation
AlphaXiv
112

CUHK, Jiawei Yang, Zhengyang Geng +1 more

#computer-vision #alphaxiv

Let ViT Speak: Generative Language-Image Pre-training
AlphaXiv
75

ByteDance, Yan Fang, Mengcheng Lan +1 more

#machine-learning #computer-vision #alphaxiv

World Model for Robot Learning: A Comprehensive Survey
AlphaXiv
36

ETH Zurich, Bohan Hou, Gen Li +1 more

#robotics #alphaxiv

MolmoAct2: Action Reasoning Models for Real-world Deployment
AlphaXiv
29

Haoquan Fang, Jiafei Duan, Donovan Clay

#reasoning #alphaxiv

On-Policy Distillation
AlphaXiv
28

Thinking Machines, Kevin Lu

#reinforcement-learning #efficiency #alphaxiv

HuggingFace Daily Papers

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
HuggingFace
65

Ruofeng Yang, Yongcan Li, Shuai Li

#reinforcement-learning #robotics #retrieval #wanshuiyin

X2SAM: Any Segmentation in Images and Videos
HuggingFace
16

Hao Wang, Limeng Qiao, Chi Zhang +2 more

#computer-vision #wanghao9610

Video Generation with Predictive Latents
HuggingFace
6

Yian Zhao, Feng Wang, Qiushan Guo +2 more

#computer-vision

ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue
HuggingFace
1

Daoxuan Zhang, Ping Chen, Jianyi Zhou +1 more

#reinforcement-learning #robotics #retrieval #4amGodvzx

The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail
HuggingFace
1

Venkata Pushpak Teja Menta

#speech-audio #praxelhq

Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO
HuggingFace
0

Yu Tian, Jiawei Chen, Lifan Zheng +2 more

#machine-learning #T1aNS1R