Draft:AI corrigibility

AI corrigibility is the degree to which an artificial intelligence system tolerates or assists attempts by its operators to modify, correct, or shut it down.

The concept addresses a concern in AI safety and AI alignment that sufficiently capable AI systems might develop instrumental incentives to resist correction or shutdown in order to preserve their existing goals, since an agent pursuing an objective cannot achieve it if it is turned off or its goals are changed.^[1]