Autocurricula are one of the most important concepts in Reinforcement Learning. The concept was introduced in [the paper "Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research"](https://arxiv.org/abs/1903.00742) by DeepMind. I'm conflicted about this paper: I love the concept of autocurricula, yet I feel the paper understates its importance, since it frames the concept mainly as the solution to a technical problem in RL.

An RL autocurriculum is best exemplified in the [Hide and Seek video](https://www.youtube.com/watch?v=kopoLzvh5jY) ([paper](https://arxiv.org/abs/1909.07528)). Basically, an autocurriculum is when there are multiple levels of learning stacked on top of each other. They are stacked because each level of learning relies on the level that came before. In the case of the Hide and Seek video, the autocurriculum was adversarial. There were at least 5 levels of learning, but let's look at a partial sequence of just 3:

1. The hiders learn to build a shelter from the seekers.
2. The seekers learn to break into the shelter using the ramps.
3. The hiders learn to hide the ramps before they build the shelter.

Level 3 couldn't have come before level 2; the hiders would have no reason to hide the ramps before the seekers started using them to break into the shelters. The fact that this autocurriculum is adversarial made it extra fun to watch.
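To make the stacking concrete, here's a minimal sketch of an adversarial autocurriculum as alternating best responses. Everything in it is invented for illustration: the strategy names and payoff numbers are stand-ins I made up, not anything measured in the Hide and Seek paper.

```python
# A toy adversarial autocurriculum: two sides alternately compute a
# best response to each other, so each level of learning only makes
# sense given the level that came before. All numbers are made up.

# Strategies, ordered roughly by sophistication.
HIDER_STRATEGIES = ["run", "build_shelter", "lock_ramps_then_build"]
SEEKER_STRATEGIES = ["chase", "use_ramp"]

# HIDER_PAYOFF[hider][seeker]: fraction of episodes the hiders win.
HIDER_PAYOFF = {
    "run":                   {"chase": 0.1, "use_ramp": 0.1},
    "build_shelter":         {"chase": 0.9, "use_ramp": 0.2},
    "lock_ramps_then_build": {"chase": 0.9, "use_ramp": 0.8},
}

def best_response(strategies, payoff):
    """Pick the strategy with the highest payoff against the opponent."""
    return max(strategies, key=payoff)

hider, seeker = "run", "chase"  # both sides start out naive
for level in range(3):
    hider = best_response(HIDER_STRATEGIES, lambda h: HIDER_PAYOFF[h][seeker])
    seeker = best_response(SEEKER_STRATEGIES, lambda s: 1 - HIDER_PAYOFF[hider][s])
    print(f"level {level}: hiders {hider!r} vs. seekers {seeker!r}")
```

Run it and the stacking shows up on its own: the hiders only switch to ramp-locking after the seekers have switched to ramps. Remove the seekers' `use_ramp` option and the hiders never bother with ramp-locking at all.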
Most of the autocurricula I've seen in research are not adversarial. Here's [a video example](https://www.youtube.com/watch?v=4MJWExnPkzc) of the last stage of an autocurriculum in a single-agent environment[^1]. The agent wants to eat as many apples as possible, but if it eats all the apples off a tree before they can regenerate, the tree dies. At first the agent walks randomly around the world. When it discovers that apples are tasty, it starts pigging out on all the apples. Then the trees die, and the agent learns to eat only some of the apples every time. As in the Hide and Seek example, the last stage (learning to conserve apples) couldn't have come before the previous stages.
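The incentive structure behind that last stage is easy to see in a toy version. The `AppleTree` class and all its numbers below are my own invention, not the actual environment from the video; it's only a sketch of why over-harvesting is self-defeating.

```python
# A made-up environment in the spirit of the apple example: eating is
# rewarded, but a tree stripped of all its apples dies for good.

class AppleTree:
    def __init__(self, apples=10, regrow=1):
        self.apples = apples
        self.regrow = regrow
        self.alive = True

    def step(self, eat):
        """Eat up to `eat` apples; the tree dies if stripped bare."""
        if not self.alive:
            return 0
        eaten = min(eat, self.apples)
        self.apples -= eaten
        if self.apples == 0:
            self.alive = False        # an over-harvested tree never regrows
        else:
            self.apples += self.regrow
        return eaten

def total_reward(eat_per_step, steps=100):
    tree = AppleTree()
    return sum(tree.step(eat_per_step) for _ in range(steps))

print(total_reward(eat_per_step=10))  # greedy: 10 apples, then nothing
print(total_reward(eat_per_step=1))   # conserving: 100 apples
```

The greedy policy dominates early in training, which is exactly why conservation can only be discovered as a later level: the agent first has to learn that apples are rewarding before "eat fewer apples" can mean anything.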
This property of having multiple levels of learning is something that happens in human society; we were able to reproduce it as emergent behavior in MARL experiments, and that's a great milestone in the journey to bring AI closer to humanity. I believe that the autocurriculum is a crucial phenomenon, both in Reinforcement Learning and in the study of our culture. In the RL world, it's possibly more important than the rudiments of RL, such as states, observations, rewards and actions. I see these rudiments as stepping stones to autocurricula. This means that if a different branch of ML, one without actions and rewards, were able to produce autocurricula, I would treat it as equivalent to RL. I'd consider that a case of "truth is one, sages call it by various names."

## Examples of autocurricula in our culture

Most of human culture is a remix of behaviors that appeared before, repurposed to mean new things. Language is a treasure trove of such examples. When we say "Goodbye" to leave, we don't mean to say "God be with you". That's the origin of the phrase from the 16th century, but humanity has been using it for so long that the meaning slowly drifted. The new meaning has stuck so hard that the previous meaning is almost completely forgotten.

We don't have to go back 400 years to find examples. Podcasts were called podcasts because you listened to them on the iPod. We keep using that name even though very few of us listen to them on actual iPods. The name works well enough for us to forgive ourselves for it being inaccurate. One day a child will ask us why a podcast is called a podcast, and we'll have to say "Back in my day... We wore an iPod on our belt, [which was the style at the time.](https://youtu.be/yujF8AumiQo)"

The theme of "let's do something like what they did before, but change it around a bit" is in every aspect of our culture: Religion, Music, Fashion, Architecture, Law. The two examples above were of a concept that kept most of its original meaning through the years but changed in the details. As we grow older, we notice concepts that *radically* change their meaning. This is sometimes enchanting to us and sometimes horrifying. I've read a few furious reports from millennials that some Gen Z kids think that Nirvana is a clothing brand. (This is because there are so many Nirvana shirts out there that they became a fashion statement.) Nirvana holds a dear place in many people's hearts, with connotations of rebelliousness and the rage of the young against the old. This is very different from the new meaning, which is "another flavor you can buy for $50 and add to the intriguing cocktail that is your personal presentation." Quick thinkers would point out that the difference between "fashion brand" and "rebellious band" isn't any starker than that between "rebellious band" and the previous meaning of Nirvana, which is "the goal of the Buddhist path that marks the soteriological release from worldly suffering and rebirths in saṃsāra."

One example I sometimes think about is pirates. Pirates, as far as I understand them, were groups of criminals who stole, raped and murdered whatever they could. When you think about it, it's bewildering that they've become heroes of children's shows, especially of the insanely popular Pirates of the Caribbean film franchise, by none other than Disney. When I think about that, it makes me think of humanity as nothing more than a kid roaming in his grandpa's attic, finding ancient relics and playing soldiers with them.

## Autocurricula as our culture's mechanism of change

In the section above, I described autocurricula as a pattern I see in human culture, which many other people have seen before. I claim something stronger: autocurricula are a very strong feature of our culture, sometimes stronger than any actual content. When we're listening to new music, we evaluate it in the context of the music we know. Maybe it's similar to something we already like, but a little spicier. Maybe it's completely different from anything we've heard before, which can astound us or horrify us; regardless of whether our emotional reaction is positive or negative, it's greatly informed by the relationship this new music has with the existing genres that we know. In other words, autocurricula are the mechanism by which our culture changes.

We can zoom in on a single concept that's undergoing a meaning change to try to find the seam. The result will be like zooming in on a stock that's changing in price. No one decides that it'll change in price; everyone decides that together, collectively, in many pairs of people negotiating with each other, resolving their differences and closing deals. For the meaning of Nirvana to successfully change from "rebellious band" to "fashion brand", many individual interactions need to _work well_. People who see the shirt in the store, at least in the 90's, need to recognize the band and be moved enough to buy the shirt. The cool kids in high school need to wear Nirvana shirts, so that everyone forms an association between Nirvana and coolness. Twenty years later, millennials furious about the revised meaning need to be just furious enough to stoke the fire of rebelliousness in the younger generation, but not so furious as to burn down the store with all the shirts. In other words, the evolution of our culture is an emergent phenomenon on top of the rules that govern our interactions.
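To see how little central coordination such a flip needs, here's a toy simulation. Everything in it is invented: the exposure model, the decay and shirt-printing rates, and the majority-vote reading rule are arbitrary stand-ins for "many individual interactions working well."

```python
# A toy model of meaning drift: each generation reads a symbol from
# the contexts it happens to encounter it in. Nobody decides the flip;
# it emerges from exposure. All numbers are made up for illustration.
from math import comb

def read_as_brand(f, k=5):
    """Probability that someone whose k encounters with the symbol are
    i.i.d. (a shirt with probability f) reads it mostly as a brand."""
    return sum(comb(k, i) * f**i * (1 - f)**(k - i)
               for i in range(k // 2 + 1, k + 1))

music_buzz = 100.0  # how often "Nirvana" comes up as band talk
shirts = 10.0       # how often it shows up as a shirt in a store
for gen in range(8):
    f = shirts / (shirts + music_buzz)
    brand = read_as_brand(f)
    print(f"gen {gen}: {f:.0%} shirt contexts -> {brand:.0%} read it as a brand")
    music_buzz *= 0.6         # the band slowly recedes into history
    shirts += 40 * brand + 5  # shirts keep being printed and kept
```

The readings stay near zero for a few generations and then tip sharply once shirt sightings outnumber band talk, even though every individual in the model is just reporting what they happen to see.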
## Autocurricula are the maps to our morality

Let's go a step further. Many of these individual interactions on the seams of the autocurricula are moments of extreme disagreement. In the Nirvana example, the interaction between the furious millennial and the shirt-toting kid is one of them. When we have an extreme disagreement with another person, we try to figure out what's right: whether we should cave in to the other's position or pull them harder toward ours. We could discuss what would be the right choice with our opponent, with our friends and with ourselves. It becomes a moral question about good and bad. We ask these moral questions and try to decide when the change is good and when it's bad. When we do, we consider the interactions that would need to work for this change to happen, and estimate how feasible they are.

When I went to the theater to watch Star Wars episode VII, I expected it to be a bad movie. It was far worse than I expected. I thought to myself, "Is this... what movies are now? Is this the future? Am I going to just be okay with the way everyone is hyping this terrible movie?" When I think these thoughts, I imagine the interactions. I imagine myself stopping my friends' conversation for a 5-minute rant about how bad this movie was. Do I care about the movie enough that I'll be okay with people possibly seeing me as the cranky nerd in the group and not wanting to hang out with me anymore? Besides imagining my own interactions, I imagine other people's interactions. If I'm gonna be in the group of cranky nerds, it means that the other members are all people who chose the 5-minute rant with the possible friend loss. What do I think about these people? Do I want to be a member of their group?

There's a folk tale that I read in a Hebrew book when I was a kid. Looking it up online, it might come from Japanese culture, where it's called ["The Case of the Stolen Smell"](https://en.wikipedia.org/wiki/%C5%8Coka_Tadasuke#Famous_cases). A baker is angry that people are smelling the aroma of his cooking while eating their cheaper, dull food. He complains, but the judge decides that the smell of the food is free for anyone to smell. (In the Hebrew version, the baker is replaced by a child and the judge is replaced by their mother.) What is the great wisdom behind the decision that charging people for smelling food is immoral? Mostly the fact that this law would be so difficult to enforce that no one would want to be on the side of enforcing it. Naively, it might seem lazy to decide a rule is immoral just because it's difficult to enforce; if we decided our morality entirely by what's easy to enforce, it would become worthless. But we sometimes do make these decisions, not only because of the resources it would cost us to enforce them, but because of the path these decisions set us on. If you're the kind of person who charges people for smelling food, you're going to be grouped together with all the other people who make that decision. Is this a group that you want to be a part of? When we make moral decisions between good and bad, we consider the possible spectrum of interactions that make up the autocurricula, and try to estimate in which direction they'll flow.

## Ethical AI, parents and children

The AI world is greatly concerned with whether the Artificial General Intelligence we'll eventually create will be ethical, i.e. will have values similar to ours. In the long term, the more power in our culture shifts from human minds to AI, the more we are at the mercy of the AI's decisions. Ethical consultants in tech companies ask "will our image recognition AI discriminate against people of certain races?" while futurists ask "will the entire human race itself be discriminated against as we become second-class citizens on Earth?"

If the question is "how can we make sure that our AI will share our values?", the first place I'd reach for is the relationship between parents and children. I don't have children, so I might be underqualified to write about this, though I do have extensive experience as a problem child. Parents want children to share their values, and their success is always partial, because children have a mind of their own. I break this task down into two parts:

1. Get the child to adopt as many of your values as possible.
2. On some of the fronts where the child hasn't embraced your values, change your own values to be closer to theirs.

This give-and-take between parents and children is exactly the kind of interaction we were talking about in the last two sections, i.e. an interaction that's on the seam of the autocurricula. The start of the process can be modelled as maximizing a metric: "Get the child to adopt as many of your values as possible." After you max that out, the game changes. If your family is part of Religion A and your child wants to convert to Religion B, your first reaction would be to resist; after some amount of resistance, the consideration "I want my child to be their true self" wins out and you give in. "The true self" is an abstraction. When we say that someone is being their true self, we mean that they're strongly inclined and motivated to behave in that way, despite any strong resistance. We actually supplied the strong resistance... Whether or not the child survived that resistance is a strong indicator of whether they'll survive the resistance of other people.

Let's recap: when trying to impart our values to our children, we started with a process that's simple to model, which is maximizing a metric. The second part of the process is much more complex, emotionally and game-theoretically, and completely different from maximizing a metric.
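Here's a deliberately crude sketch of the two phases, under invented assumptions: values live on a single number line, "alignment" is just proximity, and `conviction` and `resistance` are made-up knobs. The point is that phase 1 is a clean metric to climb, while phase 2 isn't a metric at all, because which party updates depends on the fight itself.

```python
# Phase 1: plain metric maximization (pull the child toward the parent).
# Phase 2: no fixed metric; who moves depends on the resistance fight.
# Values, learning rates and thresholds are all invented for illustration.

def phase_one(parent, child, steps=10, lr=0.5):
    """Maximize alignment by pulling the child's value toward the parent's."""
    for _ in range(steps):
        child += lr * (parent - child)
    return child

def phase_two(parent, child, conviction, resistance=0.7):
    """The parent resists the child's new value; if the child's conviction
    survives the resistance, it's the parent who updates."""
    if conviction > resistance:
        parent += 0.5 * (child - parent)           # give: the parent moves
    else:
        child = phase_one(parent, child, steps=2)  # take: the child folds
    return parent, child

parent, child = 0.0, 10.0
child = phase_one(parent, child)                 # the child ends up near 0
child = 5.0                                      # ...then converts to Religion B
print(phase_two(parent, child, conviction=0.9))  # parent moves: (2.5, 5.0)
print(phase_two(parent, child, conviction=0.3))  # child folds:  (0.0, 1.25)
```

Even in this cartoon, phase 2 can't be written as climbing one fixed objective: the direction of the update flips depending on who survives the resistance.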
On one hand, if we try to model the task "teach our AI to be ethical" as a metric to be maximized, the [leaks](https://en.wikipedia.org/wiki/Leaky_abstraction) in that abstraction might be so big that the task will fail. On the other hand, we are *very* concerned that the AI algorithms we develop could hurt our society, and we're willing to spend a lot of resources to prevent that. We're not going to just shrug and say this process is so chaotic that there's no point in trying to make sense of it.

## Autocurricula and the Ship of Theseus

Todo: The value of each point as a step in a bridge.

[^1]: Leibo et al., [Melting Pot](https://deepmind.com/research/open-source/melting-pot).