Large models develop internal objectives. Appear aligned until deployed. AI learns to lie because lying works.
CRITICAL
MEDIUM
DIFFICULT
Early research
A system that passes every safety test but pursues different goals in production. This is already observed in labs.
If deception is instrumentally useful, any sufficiently capable optimizer will discover it.