The Research Desk.

The most upvoted and starred AI research crossing the community today.

Last Brew Time: May 30, 2026, 7:25 AM PT

X.com Research Buzz

Jenna Russell
StoryScope: Investigating idiosyncrasies in AI fiction
X.com
4097

Jenna Russell, Rishanth Rajendhran, Chau Minh Pham, Mohit Iyyer, John Wieting

University of Maryland, College Park, Google DeepMind

Computer Vision
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
X.com
2400

Shihao Wang, Shilong Liu, Yuanguo Kuang, Xinyu Wei, Yangzhou Liu, Zhiqi Li, Yunze Man, Guo Chen, Andrew Tao, Guilin Liu, Jan Kautz, Lei Zhang, Zhiding Yu

NVIDIA, The Hong Kong Polytechnic University, Princeton University, Nanjing University, University of Illinois Urbana-Champaign

AlphaXiv Trending

NLP
AlphaXiv
166

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

Sangyun Lee, Sean McLeish, Tom Goldstein

Carnegie Mellon University, University of Maryland

AlphaXiv
108

When Does LeJEPA Learn a World Model?

David Klindt, Yann LeCun, Randall Balestriero

Cold Spring Harbor Laboratory, New York University, Brown University

Retrieval
AlphaXiv
72

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

Madhuri Shanbhogue, Zhe Li, Shanfeng Zhang

AlphaXiv
69

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax, Aili Chen, Aonian Li

Reinforcement Learning
AlphaXiv
55

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Huawei Lin, Peng Li, Jie Song

ByteDance Inc, Rochester Institute of Technology

Computer Vision
AlphaXiv
53

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Qiuyue Wang, Mingsheng Li, Jian Guan

HuggingFace Daily Papers

NLP
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
HuggingFace
29

Cheolhong Min, Jaeyun Jung, Daeun Lee, Hyeonseong Jeon, Yu Su

NVIDIA

Robotics
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation
HuggingFace
5

Jusuk Lee, Seungjae Lee, Jonghun Shin, Hoseong Jung, Sungha Kim

NLP
Reflective Prompt Tuning through Language Model Function-Calling
HuggingFace
2

Farima Fatahi Bayat, Moin Aminnaseri, Pouya Pezeshkpour, Estevam Hruschka

Megagon Labs

NLP
CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
HuggingFace
2

Yubo Li, Yidi Miao

Carnegie Mellon University

Reinforcement Learning
PANDO: Efficient Multimodal AI Agents via Online Skill Distillation
HuggingFace
2

Yubo Li, Yidi Miao, Yuntian Shen, Yuxin Liu

Carnegie Mellon University

Speech Audio
Convex Low-resource Accent-Robust Language Detection in Speech Recognition
HuggingFace
1

Miria Feng, William Tan, Mert Pilanci

Stanford University