
X.com
20509
Natural Emergent Misalignment from Reward Hacking in Production RL
Monte MacDiarmid, Evan Hubinger
#reinforcement-learning#safety-alignment
The most upvoted and starred AI research crossing the community today.
Last Updated: Mar 15, 2026, 3:03 PM PT

Monte MacDiarmid, Evan Hubinger

Dan Lee, Seungwook Han, Akarsh Kumar

Mohsen Hariri, Michael Hinczewski, Jing Ma

Songlin Wei, Hongyi Jing, Boqian Li
System online.