What I Know about Getting Started with AI-Alignment

19 Aug 2021

While I’m not personally working on AI alignment, I think it’s likely pretty important, and I want to help make information about it easier to find. This post is a short resource list based on my own reading and on conversations I’ve had with friends interested in AI alignment.

At a high level, what is AI alignment?

One day we will likely transfer control of important applications to AI systems. If you imagine that these AIs could be as intelligent as human beings but lack the cultural, ethical, moral, or biological constraints that humans have, it’s fairly easy to see that their goals may be drastically different from our own. AI alignment is the challenge of aligning the goals of AI agents with the goals of humans. In particular, the challenge focuses on doing so while we still have a moderate understanding of how these systems work and good control over what they work on.

A good place to learn more about AI alignment is Stuart Russell’s book Human Compatible. Rohin Shah summarized it in the Alignment Newsletter #69.

A quick example of an AI-alignment challenge

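The most common introduction to problems in AI alignment is the paperclip maximizer, a thought experiment in which an AI told to make as many paperclips as possible ends up consuming every resource it can reach to do so. It illustrates instrumental convergence, where an intelligent actor is given a straightforward goal but may pursue it in surprising or harmful ways.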

“Instrumental convergence posits that an intelligent agent with unbounded but apparently harmless goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained goal of solving an incredibly difficult mathematics problem like the Riemann hypothesis could attempt to turn the entire Earth into one giant computer in an effort to increase its computational power so that it can succeed in its calculations.” - Wikipedia
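If it helps to see the idea in miniature, here is a toy Python sketch of my own (not taken from Wikipedia or any alignment paper). The function `best_plan`, the success formula, and the `side_effect_penalty` parameter are all made-up assumptions for illustration. The agent picks how many units of the world's resources to convert into compute, and the only thing it is scored on is its chance of solving the problem.

```python
def best_plan(total_resources, side_effect_penalty=0.0):
    """Return how many resource units a toy agent chooses to turn into compute.

    Everything here is hypothetical: the success formula and the penalty
    term are illustrative assumptions, not a model of any real system.
    """
    def score(units_taken):
        # Toy assumption: more compute always raises the chance of success,
        # with diminishing returns.
        p_solve = units_taken / (units_taken + 10)
        # Optional term that makes the agent "care" about what it consumes.
        return p_solve - side_effect_penalty * units_taken

    # Try every possible amount and keep the highest-scoring one.
    return max(range(total_resources + 1), key=score)


print(best_plan(100))         # unconstrained goal: prints 100, it takes everything
print(best_plan(100, 0.005))  # goal that also counts the cost: prints 35
```

With no penalty, taking everything is always the best plan, even though the stated goal sounds harmless. The agent only stops early once the cost of consuming resources is written into its objective, and choosing that cost correctly is a small taste of what makes alignment hard.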

Hopefully this simple example of a challenge in AI alignment sparks your interest a bit. From here on, all I can do is point you to other resources on AI alignment that do a much better job of discussing it.

