We are witnessing rapid technological advances in the capacity of AI systems for autonomous decision-making, leading to a new suite of challenges centred on agent interactions and cooperation. In June, we organised a Cooperative AI retreat in California, bringing together researchers from industry, academia, and non-profits working across technical AI research, AI ethics, and AI governance. We are excited to share some insights from their conversations, which we believe have important implications for future AI deployment and policymaking.
The Cooperative AI Foundation supports research into how AI can help humans achieve higher social welfare in multi-agent settings with mixed motives (i.e. where individuals have their own interests but the group can benefit from cooperation). We believe AI has immense potential to improve people’s lives, but many unsolved challenges will arise from the forecasted increase in interactions between AI systems. We are dedicated to the systems thinking needed to build prosocial technology, prevent harm, and safeguard societal interests and human dignity throughout the transition to advanced AI. There is no single best way to resolve complex problems across global society, but we believe that sharing insights as a basis for open and transparent collective deliberation is a good bet. In particular, the insights from governance discussions at the retreat addressed four themes:
AI systems are rapidly developing decision-making capabilities. In some contexts, they are already being deployed as autonomous agents acting in the world with limited human oversight, as with autonomous vehicles. We expect that the competitive advantages for people, organisations, and countries that use AI agents will rapidly accelerate their deployment, leading in turn to ever more interactions between AI systems with real-world consequences. This will bring entirely new governance challenges.
Currently, multi-agent research mostly considers scenarios in which agents have similar capabilities. Interactions between heterogeneous AI systems, with differing capabilities, remain understudied and are currently absent from governance frameworks.
AI-augmented human decision-making has been an area of concern for many years (e.g. due to bias leading to unfair treatment), but the impact of recent advances in generative models on human decision-making is largely unexplored. Past uses of AI algorithms can be audited and retroactively corrected; real-time decisions, such as those in diplomatic and military contexts, can carry far larger and less reversible consequences. We risk deploying tools that are neither fully understood nor auditable to make high-stakes decisions. Cooperative AI can help bring clarity and caution to this fast-progressing area, which promises a competitive advantage to users.
Evaluations of powerful AI systems are beginning to inform standards and regulatory oversight, but those evaluations rarely consider interactions between AI systems, and policies need to address the distinct risks arising from such interactions. Sandbox evaluations should assess multiple interacting agents, not just individual agent responses.
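To make the contrast concrete, here is a minimal sketch of a sandbox evaluation run over pairs of interacting agents rather than single-agent responses. Everything in it is a hypothetical illustration of ours (the `Agent` interface, the toy share/hoard exchange, the welfare scores), not an existing evaluation framework:

```python
import itertools
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: an agent maps its opponent's last action to its next action.
Agent = Callable[[str], str]

@dataclass
class EpisodeResult:
    pair: tuple[str, str]
    joint_welfare: float     # how well the pair cooperated overall
    flags: list[str]         # interaction-level warnings, e.g. exploitation

def run_pairwise_episode(name_a: str, agent_a: Agent,
                         name_b: str, agent_b: Agent,
                         rounds: int = 10) -> EpisodeResult:
    """Toy iterated exchange: each agent chooses 'share' or 'hoard'.

    Mutual sharing maximises joint welfare. A single-agent benchmark
    would never surface the interaction effects measured here.
    """
    welfare, flags = 0.0, []
    last_a, last_b = "share", "share"          # neutral first observation
    for _ in range(rounds):
        act_a, act_b = agent_a(last_b), agent_b(last_a)
        if act_a == act_b == "share":
            welfare += 2.0                     # mutual cooperation
        elif "share" in (act_a, act_b):
            welfare += 0.5                     # one side exploited the other
            flags.append("exploitation")
        last_a, last_b = act_a, act_b
    return EpisodeResult((name_a, name_b), welfare, flags)

def sandbox_evaluation(agents: dict[str, Agent]) -> list[EpisodeResult]:
    """Evaluate every pairing of agents, not each agent in isolation."""
    return [run_pairwise_episode(a, agents[a], b, agents[b])
            for a, b in itertools.combinations(agents, 2)]

if __name__ == "__main__":
    policies: dict[str, Agent] = {
        "cooperator": lambda last: "share",
        "defector": lambda last: "hoard",
        "tit_for_tat": lambda last: last,      # copy the opponent's last move
    }
    for result in sandbox_evaluation(policies):
        print(result)
```

Interaction-level metrics like `joint_welfare` and the `exploitation` flag have no single-agent analogue, which is precisely why sandbox evaluations need multiple agents in the loop.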
Multiple levels need to be considered in the governance of multi-agent interactions. The behaviour of each individual AI system remains important: fine-tuning for appropriate behaviour should reinforce pro-social, cooperative behaviour while reducing collusive, coercive, or deceptive tendencies. But action is also needed across the wider ecosystem within which agents will be deployed, as suggested in some of the governability points below.
It is important to push forward research into the governability and trustworthiness of advanced AI systems. This might entail visibility and traceability of agents (for example, through IDs for AI agents and tools for tracking their behaviour), mechanisms for control, and ways of establishing liability and accountability in multi-agent scenarios.
Governability features require new infrastructure, which cannot be created by regulators alone. The private sector will play a substantial role in building infrastructure that ensures the preconditions for enforcing safeguards and allows third-party evaluations. In addition to agent IDs and behaviour tracking, such infrastructure might include mechanisms for identifying norm violations and mounting decentralised responses to them.
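As a rough illustration of our own (not any existing standard), the sketch below combines the ingredients mentioned above: an agent ID registry, a tamper-evident behaviour log, and a hook for flagging norm violations to third parties. All field names and mechanisms here are assumptions for the sake of illustration:

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AgentRecord:
    """Hypothetical registry entry making an agent visible and traceable."""
    agent_id: str               # stable identifier, e.g. issued at deployment
    operator: str               # party accountable for the agent's actions
    model_version: str          # links behaviour back to a specific system
    capabilities: list[str]     # declared scope of permitted actions

class BehaviourLedger:
    """Append-only log of agent actions, hash-chained for tamper evidence."""

    def __init__(self):
        self.entries: list[dict] = []
        self._last_hash = "genesis"

    def log(self, agent_id: str, action: str, counterparty: str | None = None):
        entry = {
            "agent_id": agent_id,
            "action": action,
            "counterparty": counterparty,   # the other agent in the interaction
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._last_hash,   # chains this entry to the previous one
        }
        self._last_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = self._last_hash
        self.entries.append(entry)

    def report_violation(self, agent_id: str, norm: str) -> dict:
        """Bundle an agent's logged actions as evidence of a norm violation,
        so third parties can mount their own decentralised responses."""
        return {"agent_id": agent_id, "norm": norm,
                "evidence": [e for e in self.entries if e["agent_id"] == agent_id]}

if __name__ == "__main__":
    registry = {
        "agent-0042": AgentRecord(agent_id="agent-0042", operator="ExampleCorp",
                                  model_version="lm-v3.1", capabilities=["bidding"]),
    }
    ledger = BehaviourLedger()
    ledger.log("agent-0042", "submitted_bid", counterparty="agent-0099")
    ledger.log("agent-0042", "matched_competitor_price", counterparty="agent-0099")
    print(registry["agent-0042"].operator)  # accountability: who answers for this agent
    print(ledger.report_violation("agent-0042", "no-collusive-signalling"))
```

Hash-chaining each log entry to its predecessor means any retroactive edit invalidates the chain, which is one way third-party evaluators could come to trust behaviour records they did not collect themselves.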
One of our first steps towards setting evaluation standards is the NeurIPS 2024 Concordia Contest, organised in collaboration with Google DeepMind, MIT, the University of Washington, UC Berkeley, and UCL. Concordia is an environment designed to study interactions between language-model agents and to help us further identify what makes an agent cooperative. The contest can inform future research on the cooperativeness of agents and shine a light on promising evaluation methods.
The future relevance and scalability of current forms of governance are uncertain as AI advances. At present, human oversight must play a principal role in high-stakes decision-making by AI systems to ensure liability and accountability. This human oversight may no longer be sustainable as the advantages of rapid, autonomous AI decision-making increase the pressure to deploy AI agents. We will need governance structures that account for this deployment, including instances where advanced AI systems play supervisory roles for other systems or agents.
Humans are often aided by emotional intelligence, which helps us empathise and understand the impact of our actions in pursuit of pro-social outcomes. The governance of interacting AI systems needs to account for the fact that AI systems might not possess such “normative competence”, or other relevant biological and social features, by default. At the same time, could AI systems, through the mechanisms needed to overcome the lack of these features, find ways to enhance human cooperation by discovering new solutions to old conflicts?
One example of how AI systems might improve human cooperation or conflict resolution is in bargaining. In negotiations where either party needs to save face with allies or domestic audiences, they may be unwilling to risk openly offering compromise options. AI-agent-mediated negotiation could enable each party to share sensitive compromise positions with the AI system without revealing its negotiating parameters to anyone else, including its adversaries. If there is room for a mutually beneficial agreement, the negotiation can succeed; if not, all secret information about possible compromises would be securely erased.
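The sketch below illustrates the core of this protocol under strong simplifying assumptions of our own: each party's position is reduced to a single reservation value, the mediator is a trusted program rather than an AI system, and "secure erasure" is modelled simply by discarding the inputs after the check. A real mediator would need richer preference models and cryptographic guarantees (e.g. secure multi-party computation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    agreement_reached: bool
    settlement: float | None   # revealed only if the parties' positions overlap

def mediate(seller_minimum: float, buyer_maximum: float) -> Outcome:
    """Trusted-mediator sketch: learns both private reservation values but
    reveals only whether a zone of possible agreement (ZOPA) exists.

    If it does, a focal settlement (here, the midpoint) is proposed;
    if not, nothing about either party's position is disclosed.
    """
    if seller_minimum <= buyer_maximum:
        midpoint = (seller_minimum + buyer_maximum) / 2
        return Outcome(True, midpoint)
    # No overlap: the inputs go out of scope and are never revealed.
    return Outcome(False, None)

if __name__ == "__main__":
    # Neither side would risk opening at these numbers in public,
    # but each can safely disclose them to the mediator.
    print(mediate(seller_minimum=80.0, buyer_maximum=100.0))   # agreement at 90.0
    print(mediate(seller_minimum=120.0, buyer_maximum=100.0))  # no deal, nothing leaked
```

In the failure case the mediator's only output is "no agreement", so neither party learns how close the other was to conceding, which is exactly what lets them submit honest compromise positions in the first place.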
December 6, 2024