The Research Desk.

The most upvoted and starred AI research crossing the community today.

Last Updated: Mar 14, 2026, 11:55 AM PT

X.com Research Buzz

Natural Emergent Misalignment from Reward Hacking in Production RL
X.com
19523

Monte MacDiarmid, Benjamin Wright, Jonathan Uesato +19 more

#reinforcement-learning #safety-alignment

AlphaXiv Trending

OpenClaw-RL: Train Any Agent Simply by Talking
AlphaXiv
194

Yinjie Wang, Xuyang Chen, Xiaolong Jin

#reinforcement-learning #alphaxiv

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
AlphaXiv
74

Changyao Tian, Danni Yang, Guanzhou Chen

#reasoning #multimodal #alphaxiv

How Far Can Unsupervised RLVR Scale LLM Training?
AlphaXiv
68

Bingxiang He, Yuxin Zuo, Zeyuan Liu

#machine-learning #nlp #alphaxiv

Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
AlphaXiv
64

Zorik Gekhman, Roee Aharoni, Eran Ofek

#nlp #reasoning #alphaxiv

Lost in Backpropagation: The LM Head is a Gradient Bottleneck
AlphaXiv
45

Nathan Godey, Yoav Artzi

#alphaxiv

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
AlphaXiv
34

Yulu Gan, Phillip Isola

#alphaxiv

HuggingFace Daily Papers

CREATE: Testing LLMs for Associative Creativity
HuggingFace
11

Manya Wadhwa, Tiasa Singha Roy, Harvey Lederman +2 more

#nlp #ManyaWadhwa

RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
HuggingFace
9

Tzu-Heng Huang, Sirajul Salekin, Javier Movellan +2 more

#computer-vision #reinforcement-learning

Meta-Reinforcement Learning with Self-Reflection for Agentic Search
HuggingFace
5

Teng Xiao, Yige Yuan, Hamish Ivison +2 more

#reinforcement-learning #retrieval

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
HuggingFace
3

Yulu Gan, Phillip Isola

#sunrainyg

SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
HuggingFace
2

Shahriar Noroozizadeh, Xiaobin Shen, Jeremy C. Weiss +1 more

#reasoning #Shahriarnz14

WaDi: Weight Direction-aware Distillation for One-step Image Synthesis
HuggingFace
0

Lei Wang, Yang Cheng, Senmao Li +2 more

#computer-vision #efficiency #gudaochangsheng