[ home | resume | contact | science | art | personal | email ]

Roy-Stang, Z. & Davies, J. (2025). Human Biases and Remedies in AI Safety and Alignment Contexts. AI and Ethics. 1-23. https://doi.org/10.1007/s43681-025-00698-5

Publisher: Springer Nature

BibTeX entry:

@article{RoystangDavies2025,
  title={Human biases and remedies in AI safety and alignment contexts},
  author={Roy-Stang, Zo{\'e} and Davies, Jim},
  journal={AI and Ethics},
  pages={1--23},
  year={2025},
  publisher={Springer}
}

This is an open access article.
Download: [ local PDF | publisher's website ]

Abstract

Errors in judgment can undermine artificial intelligence (AI) safety and alignment efforts, leading to potentially catastrophic consequences. Attitudes towards AI range from total support to total opposition, and there is little agreement on how to approach the issues. We discuss how relevant cognitive biases could affect the general public’s perception of AI developments and risks associated with advanced AI. We focus on how biases could affect decision-making in key contexts of AI development, safety, and governance. We review remedies that could reduce or eliminate these biases to improve resource allocation, prioritization, and planning. We conclude with a summary list of ‘information consumer remedies’ which can be applied at the individual level and ‘information system remedies’ which can be incorporated into decision-making structures, including decision support systems, to improve the quality of decision-making. We also provide suggestions for future research on biases and remedies that could contribute to mitigating global catastrophic risks in the context of emerging, high-risk, high-reward technologies.


JimDavies ( jim@jimdavies.org )