This book is what got me thinking about AI safety, around 2020. At the time, ChatGPT wasn't a thing and we only had narrow AI: a superhuman chess and Go player. Like others, I couldn't make the leap from that to a "general thinker", so I didn't understand what the fuss was all about. Even then, it was eye-opening, with great [[Intuition pumps]] for AGI, cognition, and society. After ChatGPT, it became even more relevant. I should mention that I heard about this book from SSC, whose [summary](https://slatestarcodex.com/2020/01/30/book-review-human-compatible/) is more thorough.

### AGI is close

This is not a big claim in 2025, but it was in 2019. Russell warns with a lesson from history:

> On September 11, 1933, the British Association for the Advancement of Science held its annual meeting in Leicester. Lord Rutherford addressed the evening session. As he had done several times before, he poured cold water on the prospects for atomic energy: “Anyone who looks for a source of power in the transformation of the atoms is talking moonshine.” Rutherford’s speech was reported in the Times of London the next morning.
>
> [Leo Szilard] read the Times’ report at breakfast. Mulling over what he had read, he went for a walk and invented the neutron-induced nuclear chain reaction. The problem of liberating nuclear energy went from impossible to essentially solved in less than twenty-four hours.

He then offers a very eloquent bus-driver analogy:

> Within the AI community, a kind of denialism is emerging, even going as far as denying the possibility of success in achieving the long-term goals of AI. It’s as if a bus driver, with all of humanity as passengers, said, **“Yes, I am driving as hard as I can towards a cliff, but trust me, we’ll run out of gas before we get there!”** I am not saying that success in AI will necessarily happen, and I think it’s quite unlikely that it will happen in the next few years.

Less than three years after he wrote this, ChatGPT was out!

### The way it could break

OK, so we'll have AGI. Why would that be bad? Let's put aside the simpler concerns, like lost jobs and malicious use - those are fairly intuitive. Instead, let's focus on "how/why would AI take over the world", or as it's called in safety circles, "rogue AI" problems. Russell outlines a few such problems, each of which gets its own entry:

* [[Wireheading]] could cause an AI system to go rogue by hacking the reward function given to it by humans.
* [[King Midas problem]] means it's really difficult to specify our goals perfectly.
* [[Instrumental Convergence]] means an AI might want to accumulate power and stay alive, because "you can't fetch the coffee if you're dead".
* [[Repugnant Conclusion]] is a failure mode of utilitarian thinking, which an AI aligned to a utility function could end up reaching.

### What do we want to want?

OK, we're convinced that aligned AI is important, but what does that mean? *What* do we align the AIs to? As Harari put it, [[What do we want to want?]] Russell goes on a philosophical/cognitive tour de force here; again, each concept he brings up is so delicious that it deserves its own entry (and is heavily linked from the rest of my notes):

* [[View from the ninth avenue and life's subroutines]]
* [[Abstractions allow you to function in life]]
* [[Abstracted actions advance humanity]]
* [[Remembering self vs experiencing self]]

### Can't we just?

My favorite chapter in this book. He goes through every simplistic "can't we just..." argument (e.g. "can't we just turn it off?"), and shows why we can't just.
One of the more annoying traits of acceleration proponents is their lack of regard for the [[Chesterton's Fence]] erected by safety-ists.

### Solutions & Complications

Russell proceeds to offer his "Beneficial AI" doctrine, with suggestions like mathematical guarantees, learning revealed preferences, and others. Those felt really theoretical at the time, so I don't have much to say about them. At any rate, he readily admits that beneficial AI would be far from easy to achieve - not necessarily due to technical constraints, but mainly due to coordination problems.

### Not Doomers

Yes, there are [some people](https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/) who are concerned about AI to the point of:

> Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

But most people concerned with safety are not there. Russell makes the case that the gap between the "acceleration" and "safety" tribes is not as big as it seems:

> The “skeptic” position seems to be that, although we should probably get a couple of bright people to start working on preliminary aspects of the problem, we shouldn’t panic or start trying to ban AI research. The “believers,” meanwhile, insist that although we shouldn’t panic or start trying to ban AI research, we should probably get a couple of bright people to start working on preliminary aspects of the problem.

### Summary

Russell threads the needle very nicely. His writing is accessible to a layperson, yet interesting for a practitioner. The book deals with a grave issue, yet it is an enjoyable read. If you're interested in AI and want to learn about AI safety without the doom cult vibes of LessWrong, this book is a great choice.

![](https://images-na.ssl-images-amazon.com/images/I/41r9M-CBgrL._SL400_.jpg)

#published 2025-03-02