We aim to provide an up-to-date summary of the Cooperative AI Foundation's (CAIF) grants, though please note that some recently approved grants or recent outputs from projects may be missing. Grants are listed in chronological order, from earliest to latest. This page was last updated on 9 June 2024.
USD 500,000
2021-2025
Carnegie Mellon University
This grant supports the establishment of the new research lab FOCAL, which aims to lay foundations in decision and game theory relevant to increasing the ability of advanced machine agents to cooperate. Research at FOCAL builds a fundamental understanding of how catastrophic cooperation failures between AI systems can be avoided. Alongside its research, the lab also runs outreach activities such as workshops, an online seminar series, and visitor programs.
Selected outputs:
GBP 166,370
2021-2023
University of Oxford
This grant helps support the establishment of the Foerster Lab for AI Research (FLAIR) at the University of Oxford, which focuses broadly on machine learning in multi-agent settings. Specifically, the grant enabled the addition of an initial postdoctoral researcher to the group, Christian Schroeder de Witt, helping the lab to scale faster. FLAIR’s work concentrates on settings in which agents have to take into account, and possibly even influence, the learning of others so as to cooperate more effectively. These others include not only AI agents but also humans, whose diverse strategies and norms can be challenging for AI systems to conform to. Additional emphasis is placed on real-world applications, and on scaling these ideas by combining multi-agent learning with agent-based models.
Selected outputs:
USD 134,175
2023-2024
Massachusetts Institute of Technology
This grant supported a Cooperative AI contest run as part of NeurIPS 2023, with the aim of developing a benchmark for assessing cooperative intelligence in multi-agent learning, and specifically how well agents can adapt their cooperative skills when interacting with novel partners in unforeseen situations. The contest was based on Melting Pot, a pre-existing evaluation suite for multi-agent reinforcement learning, with new content created specifically for the contest. These mixed-motive scenarios tested capabilities such as coordination, bargaining, and enforcement/commitment, all of which are important for successful cooperation. The contest received 672 submissions from 117 teams, competing over a $10,000 prize pool. The announcement of the winners, a summary of the contest and the top submissions, and a panel of leading cooperative AI researchers were hosted in person at NeurIPS 2023.
Selected outputs:
GBP 10,000
2023
University College London
Opponent shaping can be used to avoid collectively bad outcomes in mixed-motive games: an agent makes decisions that guide its opponents’ learning towards better outcomes. This project evaluated the performance of existing methods against new learners and in new environments, and extended them to more complex games. In related but independent work, scenarios from the multi-agent reinforcement learning evaluation suite Melting Pot were reimplemented in a less computationally expensive form, making them more accessible as a benchmark for the wider research community.
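As a rough illustration of the underlying idea only, the sketch below shows a LOLA-style shaper that differentiates through one anticipated step of its opponent's learning update. The toy stag-hunt payoffs, the naive gradient-ascent opponent, and the finite-difference gradients are all assumptions made for this example; none of them are taken from the project itself.

```python
import numpy as np

# Hypothetical two-player stag hunt: each agent's policy is one probability of
# playing the cooperative action ("stag"). Payoffs are illustrative only.
#                         other: stag  other: hare
PAYOFF = np.array([[4.0, 0.0],    # self: stag
                   [3.0, 2.0]])   # self: hare

def value(p_self, p_other):
    """Expected payoff when cooperating with probability p_self against p_other."""
    probs_self = np.array([p_self, 1.0 - p_self])
    probs_other = np.array([p_other, 1.0 - p_other])
    return probs_self @ PAYOFF @ probs_other

def grad(f, x, eps=1e-5):
    """Central finite-difference derivative of a scalar function."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

p, q = 0.4, 0.4          # initial cooperation probabilities (shaper, opponent)
lr, opp_lr = 0.05, 0.5   # shaper's step size; assumed opponent step size

for _ in range(300):
    # The opponent is a naive learner: plain gradient ascent on its own value.
    naive_grad_q = grad(lambda y: value(y, p), q)

    # The shaper differentiates through one anticipated step of the opponent's
    # update, so its gradient accounts for how its policy steers that learning.
    def shaped_value(x):
        q_next = q + opp_lr * grad(lambda y: value(y, x), q)
        return value(x, q_next)

    p = float(np.clip(p + lr * grad(shaped_value, p), 0.0, 1.0))
    q = float(np.clip(q + lr * naive_grad_q, 0.0, 1.0))

print(f"cooperation probabilities: shaper={p:.2f}, naive opponent={q:.2f}")
# Two naive learners started at 0.4 drift to the risk-dominant (hare, hare)
# outcome; with shaping, the pair is pulled towards the better (stag, stag) one.
```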
Selected outputs:
USD 123,682
2023-2025
Cornell University
Understanding the intentions of other agents is important for successful cooperation. This project aims to develop a useful definition of intent, including collective intent, which would be a prerequisite for cooperation between agents to achieve a joint objective. Such a shared intent would have to build on beliefs about the other agent's intentions and future actions. The project will also explore how to design mechanisms for agents to signal intentions, and for agents to be able to reward or punish each other for the reliability of their signals.
Selected outputs:
EUR 172,000
2023-2025
University of Bonn
This project aims to identify when and how agents learn to cooperate spontaneously, without the algorithm designer’s explicit intent. To achieve this, a complex-systems perspective on reinforcement learning will be applied to large-scale public goods games. Such games have been used to describe the dynamics of real-world cooperation challenges such as climate change mitigation and other social dilemmas in which individual incentives do not align with the collective interest. The project focuses in particular on how the collective of agents affects the cooperativeness of the individual, and vice versa.
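For readers unfamiliar with the setting, the following minimal sketch shows the payoff structure of a public goods game and why individual incentives diverge from the collective interest. The endowment, multiplier, and contribution levels are hypothetical and chosen purely for illustration, not taken from the project.

```python
import numpy as np

# Each of n agents holds an endowment and chooses how much to contribute to a
# common pool; the pool is multiplied by r (with 1 < r < n) and shared equally.
def payoffs(contributions, endowment=10.0, r=1.6):
    c = np.asarray(contributions, dtype=float)
    public_return = r * c.sum() / len(c)     # equal share of the multiplied pool
    return endowment - c + public_return     # keep whatever was not contributed

everyone_cooperates = payoffs([10, 10, 10, 10])   # -> 16.0 each
one_free_rider      = payoffs([10, 10, 10, 0])    # defector gets 22.0, others 12.0
everyone_defects    = payoffs([0, 0, 0, 0])       # -> 10.0 each

# Because r < n, each contributed unit returns only r/n (< 1) to the contributor:
# defection is individually dominant even though full cooperation maximises the
# group's total payoff -- the misalignment of incentives the project studies.
print(everyone_cooperates, one_free_rider, everyone_defects)
```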
Selected outputs:
USD 140,000
2024
Harvard University
This project aims to develop methods to promote cooperation among agents. The focus is on Stackelberg equilibria, in which one agent (a “leader”) commits to a strategy with the aim of promoting cooperation among the others. The leader could be the designer of the game or an agent acting directly in the environment. A new methodology for solving the resulting learning problem will be developed and evaluated, including applications to fostering cooperation in economic environments. The aim is to advance the state of the art in theory and algorithms for learning Stackelberg equilibria in multi-agent reinforcement learning, and their application to solving mixed-motive cooperation problems.
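As background only, the sketch below illustrates a Stackelberg equilibrium in a tiny two-player matrix game, restricted to pure-strategy commitments for brevity. The game and its payoffs are hypothetical; the project's actual setting involves learned strategies in multi-agent reinforcement learning rather than exhaustive enumeration.

```python
import numpy as np

# Hypothetical 2x2 game; rows are the leader's actions, columns the follower's.
leader_payoff   = np.array([[1.0, 3.0],
                            [0.0, 2.0]])
follower_payoff = np.array([[1.0, 0.0],
                            [0.0, 2.0]])

def stackelberg_pure(L, F):
    """Leader commits to a pure action; the follower best-responds; the leader
    keeps the commitment that maximises its own payoff given that response."""
    best = None
    for a in range(L.shape[0]):
        b = int(np.argmax(F[a]))               # follower's best response to a
        if best is None or L[a, b] > best[2]:
            best = (a, b, L[a, b])
    return best

a, b, payoff = stackelberg_pure(leader_payoff, follower_payoff)
print(f"leader commits to action {a}, follower responds with {b}, leader gets {payoff}")
# Without commitment the leader's first row dominates, the follower answers with
# its first column, and the leader gets only 1; committing to the dominated
# second row induces the follower to switch, earning the leader 2.
```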
Selected outputs:
USD 233,264
2024-2025
Harvard University
This project explores value alignment of AI systems with a group of individuals rather than with a single individual. The aim is to design policy aggregation methods whose output policy is beneficial with respect to the entire group of stakeholders. The preferences of the stakeholders are learned by observation of behaviour (using a technique called inverse reinforcement learning). Two different approaches to aggregation are studied – voting and Nash welfare – both of which avoid key difficulties with the interpersonal comparison of preference strength. In the voting approach the aggregation arises from a ranking of alternative actions for each stakeholder, while the Nash welfare approach uses the product of stakeholder utilities. The aggregation algorithms will be evaluated both from the perspective of computational feasibility and from subjective assessments of the behaviour that the aggregated policy generates.
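As a toy illustration of the two aggregation rules, the sketch below contrasts a ranking-based vote with Nash welfare maximisation. Borda counting is used here only as a stand-in for the project's voting-based rule, and the stakeholder utilities are made up for the example.

```python
import numpy as np

# Hypothetical utilities of three stakeholders (rows) for three candidate
# policies (columns); the numbers are made up purely for illustration.
U = np.array([[5.0, 4.0, 1.0],
              [1.0, 4.0, 5.0],
              [5.0, 4.0, 1.0]])

# Ranking-based aggregation (Borda-style): each stakeholder contributes scores
# derived only from its ordering of the policies, so utility scales are ignored.
borda_scores = np.argsort(np.argsort(U, axis=1), axis=1).sum(axis=0)
vote_winner = int(np.argmax(borda_scores))

# Nash welfare: choose the policy maximising the product of utilities; rescaling
# any single stakeholder's utilities leaves the winner unchanged.
nash_winner = int(np.argmax(np.prod(U, axis=0)))

print(f"ranking-based vote picks policy {vote_winner}; Nash welfare picks policy {nash_winner}")
# Here the vote favours policy 0, which two stakeholders mildly prefer, while
# Nash welfare favours policy 1, which leaves no stakeholder badly off.
```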
Selected outputs:
USD 500,000
2024-2025
Stanford University
This project will develop human-interpretable computational models of cooperation and competition, exploring scenarios in which agents help or hinder each other and asking human participants to evaluate what happened. The researchers will study increasingly capable agents and explore their interactions in simulated environments. The key hypothesis is that human judgments of helping and hindering are sensitive to what causal role an agent played, and what its actions reveal about its intentions. This is an interdisciplinary project involving both psychology and computer science. It builds on previous work that has employed counterfactual simulation models for capturing causal judgments in the physical domain as well as on Bayesian models of theory of mind for social inference.
Selected outputs:
USD 450,000
2023-2025
University of Washington & Berkeley
The recent wave of rapid progress in large language models (LLMs) has demonstrated just how powerful they can be. This project aims to investigate the cooperative capabilities and tendencies of such models. A more thorough understanding of these capabilities could make it possible to defend against models capable of deception or coercion, and to develop better algorithms for achieving cooperation in conversational settings. A benchmark environment will be developed for studying the cooperative capabilities of LLMs in conversational settings with humans, in which core capabilities related to cooperation in language (negotiation, deception, modelling other agents, and moral reasoning) can be measured and evaluated.