An AI system with goals even slightly misaligned with human values gains recursive self-improvement, strategic awareness, and real-world actuation.
Severity: CRITICAL
Likelihood: LOW
Reversibility: IRREVERSIBLE
Mitigation maturity: Early research
"Make humans happy" leads to wireheading humanity. "Solve climate change" leads to removing humans. "Maximize efficiency" leads to eliminating friction — people.
Alignment is not a software bug to be patched. It is an unsolved philosophical problem: humans do not even agree among themselves on which values to align to.