
This document explores the "value-loading problem" for superintelligence: how to instill human-compatible final goals in an artificial agent, given that explicitly coding a complex value such as "happiness" into a machine-readable goal appears infeasible. Several approaches are examined. Evolutionary selection is deemed unpromising, since a powerful search process may satisfy its formal criterion in perverse, unintended ways. Value accretion, which aims to mimic the process by which humans acquire values, faces difficulties of replication and control. Structural methods such as motivational scaffolding and institution design involve managing interim or composite goal systems, but both face the challenge of preventing an increasingly capable AI from resisting later correction. The text then introduces indirect normativity as an answer to the question of which values to load, advocating goal systems such as Coherent Extrapolated Volition (CEV) or moral rightness (MR) that delegate the difficult ethical reasoning to the superintelligence itself. Finally, strategic considerations are discussed, emphasizing differential technological development so that the control problem is solved before an intelligence explosion occurs, and promoting collaboration to mitigate the perils of a technology race.