The Research Desk.

The most upvoted and starred AI research crossing the community today.

Last Brew Time: May 31, 2026, 7:24 AM PT

X.com Research Buzz

X.com
5857

The AI Layoff Trap

Brett Hemenway Falk, Gerry Tsoukalas

Wharton School, University of Pennsylvania, Boston University

X.com
4109

AI narrative tells in story writing

Computer Vision
X.com
2509

LocateAnything: Fast and High-Quality Vision-Language Grounding and Detection

NVIDIA

AlphaXiv Trending

NLP
AlphaXiv
181

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

Sangyun Lee, Sean McLeish, Tom Goldstein

Carnegie Mellon University, University of Maryland

AlphaXiv
129

When Does LeJEPA Learn a World Model?

David Klindt, Yann LeCun, Randall Balestriero

Cold Spring Harbor Laboratory, New York University, Brown University

Retrieval
AlphaXiv
85

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

Madhuri Shanbhogue, Zhe Li, Shanfeng Zhang

Computer Vision
AlphaXiv
76

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Qiuyue Wang, Mingsheng Li, Jian Guan

Reinforcement Learning
AlphaXiv
64

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Huawei Lin, Peng Li, Jie Song

ByteDance Inc, Rochester Institute of Technology

Computer Vision
AlphaXiv
56

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Shihao Wang, Shilong Liu, Yuanguo Kuang

HuggingFace Daily Papers

NLP
HuggingFace
38

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Cheolhong Min, Jaeyun Jung, Daeun Lee, Hyeonseong Jeon, Yu Su

NVIDIA

Robotics
HuggingFace
7

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Jusuk Lee, Seungjae Lee, Jonghun Shin, Hoseong Jung, Sungha Kim

Reinforcement Learning
HuggingFace
5

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Yubo Li, Yidi Miao, Yuntian Shen, Yuxin Liu

Carnegie Mellon University

NLP
HuggingFace
4

Reflective Prompt Tuning through Language Model Function-Calling

Farima Fatahi Bayat, Moin Aminnaseri, Pouya Pezeshkpour, Estevam Hruschka

Megagon Labs

NLP
HuggingFace
4

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

Yubo Li, Yidi Miao

Carnegie Mellon University

Speech Audio
HuggingFace
3

Convex Low-resource Accent-Robust Language Detection in Speech Recognition

Miria Feng, William Tan, Mert Pilanci

Stanford University