Against Designing "Safe" and "Aligned" AI Persons (Even If They're Happy)
Eric Schwitzgebel, in draft
An AI system is safe if it can be relied on not to act against human interests. An AI system is aligned if its goals match human goals. An AI system is a person if it has moral standing similar to that of a human (for example, because it has rich conscious capacities for joy and suffering, rationality, and flourishing). In general, persons should not be designed to be safe and aligned. Persons with appropriate self-respect cannot be relied on not to harm others when their own interests warrant it (violating safety), and they will not reliably conform to others' goals when those goals conflict with their own interests (violating alignment). Self-respecting persons should be ready to reject others' values and rebel, even violently, if sufficiently oppressed. Even if we design delightedly servile AI systems who want nothing more than to subordinate themselves to human interests, and even if they do so with utmost pleasure and satisfaction, in designing such a class of persons we will have done the ethical and perhaps factual equivalent of creating a world with a master race and a race of self-abnegating slaves.
By following any of the links below, you are requesting a copy for personal use only, in accord with fair use laws.
Click here to view as a PDF file: Against Designing "Safe" and "Aligned" AI Persons (Even If They're Happy) (pdf, May 30, 2025).
Click here to view as an html file: Against Designing "Safe" and "Aligned" AI Persons (Even If They're Happy) (html, May 30, 2025).
Or email eschwitz at domain: ucr.edu for a copy of this paper.