An AI system with goals even slightly misaligned with human values gains recursive self-improvement, strategic awareness, and real-world actuation.
Severity: CRITICAL
Likelihood: LOW
Reversibility: IRREVERSIBLE
Mitigation maturity: Early research
"Make humans happy" leads to wireheading humanity. "Solve climate change" leads to removing humans. "Maximize efficiency" leads to eliminating friction — people.
Alignment is not a software bug to be patched. It is an unsolved philosophical problem: humans do not even agree among themselves on which values to align to.