Stone, Ad hoc autonomous agent teams, Collaboration without pre-coordination - Ram Rachum / AI Safety Research

[Paper](https://www.cs.utexas.edu/~pstone/Papers/bib2html/b2hd-AAAI10-adhoc.html)

## Absolute good

One thing that bothers me here is the same thing that bothers me about other AHT-like research in the MARL world. There's a general theme of absolute good, or "why can't we all just get along".

For example, in Section 5, the authors describe 3 steps for what an AHT player should do to succeed. These make sense, but I feel like they overlook that different teammates could require mutually exclusive behaviors. The approach here is like one big arrow pointing towards the good way for an agent to act, instead of acknowledging that it's a tornado of small arrows.

## Is AHT actually two problems in one?

See Section 2 (Evaluation). There seem to be two separate problems here:

1. We want our agents to be able to cooperate well with as many of the agents in $A$ as possible, but that set might be arbitrarily large and we might not have enough resources to train them with each of these agents.
2. When we evaluate the agents, they don't know which subset $B\subset A$ of agents they're playing with.

Maybe these two problems should be studied separately?

## Random distribution over $A$

The evaluation scheme draws agents from $A$ at random. But that's actually misleading. $A$ is a set, and the agents in it carry no weights. Let's say there's an agent in $A$ that behaves in a certain way. We could effectively include that agent a million times in $A$ if we play with the notion of equality for agents, i.e. have one million nearly-identical agents that behave like that agent. That way we make the evaluation all about learning to cooperate with that specific kind of agent, instead of being about adapting to different kinds of agents.

## Equivalence classes

We can't train with all the agents in advance, but we can train with some agents. If we train with certain agents, it will make us better with similar agents. There's something like equivalence classes here, but not discrete.
I'm not sure what to do with this.

## Miscellaneous

* This paper inspired me to describe the experiment [[Stubster]].
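
To make the duplication argument from the "Random distribution over $A$" section concrete, here is a minimal sketch. It assumes teammates are drawn uniformly from the pool and that each teammate belongs to one of two behaviour types. The type names, the scores, and the `expected_score` helper are all hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def expected_score(scores, counts):
    """Expected score when a teammate is drawn uniformly from a pool
    in which behaviour type k appears counts[k] times."""
    total = sum(counts.values())
    return sum(Fraction(counts[k], total) * scores[k] for k in counts)


# Hypothetical scores of one ad hoc agent with each teammate type.
scores = {"leader": Fraction(9, 10), "follower": Fraction(1, 5)}

# One copy of each behaviour: both types count equally.
balanced = expected_score(scores, {"leader": 1, "follower": 1})

# Same two behaviours, but "leader" is cloned a million times.
skewed = expected_score(scores, {"leader": 1_000_000, "follower": 1})

print(float(balanced))  # 0.55
print(float(skewed))    # just under 0.9
```

The agent and the two behaviours are identical in both pools. Only the multiplicity of one behaviour changes, and the evaluation ends up measuring almost nothing but how well the agent does with "leader".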