
X.com
19523
Natural Emergent Misalignment from Reward Hacking in Production RL
Monte MacDiarmid, Benjamin Wright, Jonathan Uesato +19 more
#reinforcement-learning#safety-alignment
The most upvoted and starred AI research crossing the community today.
Last Updated: Mar 14, 2026, 11:55 AM PT

Monte MacDiarmid, Benjamin Wright, Jonathan Uesato +19 more

Yinjie Wang, Xuyang Chen, Xiaolong Jin

Changyao Tian, Danni Yang, Guanzhou Chen

Bingxiang He, Yuxin Zuo, Zeyuan Liu

Zorik Gekhman, Roee Aharoni, Eran Ofek

Nathan Godey, Yoav Artzi

Yulu Gan, Phillip Isola
Manya Wadhwa, Tiasa Singha Roy, Harvey Lederman +2 more
Tzu-Heng Huang, Sirajul Salekin, Javier Movellan +2 more
Teng Xiao, Yige Yuan, Hamish Ivison +2 more
Yulu Gan, Phillip Isola
Shahriar Noroozizadeh, Xiaobin Shen, Jeremy C. Weiss +1 more
Lei Wang, Yang Cheng, Senmao Li +2 more
System online.