The Research Desk.

The most upvoted and starred AI research crossing the community today.

Last Updated: Mar 15, 2026, 3:03 PM PT

X.com Research Buzz

Natural Emergent Misalignment from Reward Hacking in Production RL
X.com
20509

Monte MacDiarmid, Evan Hubinger

#reinforcement-learning #safety-alignment

AlphaXiv Trending

Training Language Models via Neural Cellular Automata
AlphaXiv
41

Dan Lee, Seungwook Han, Akarsh Kumar

#machine-learning #nlp #alphaxiv

Ranking Reasoning LLMs under Test-Time Scaling
AlphaXiv
38

Mohsen Hariri, Michael Hinczewski, Jing Ma

#nlp #reasoning #alphaxiv

Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation
AlphaXiv
36

Songlin Wei, Hongyi Jing, Boqian Li

#robotics #alphaxiv