Collusion
What It Is and Why It Matters

In this area, we would like to fund work that studies what can go wrong when AI agents cooperate in ways we do not want or expect. Collusion (undesired cooperation) between agents could, for example, lead them to bypass safeguards or laws. We believe work in this area will become increasingly important for monitoring and governance as advanced, interacting AI systems are deployed more widely throughout society.

Specific Work We Would Like to Fund
  • Development of methods for detecting collusion between AI systems, including steganographic collusion. Such work could, for example, build on information-theoretic measures or interpretability tools.
  • Development of mitigation strategies for preventing collusion, such as oversight regimes, methods for steering agents, restricted communication protocols, and/or control of communication channels.
  • Theoretical and empirical work that aims to provide general results about the conditions that make collusion between AI agents easier or harder, such as the similarity of the agents, the forms of communication available, the number of agents, the structure of the environment, and the agents’ objectives.
  • Development of benchmarks and evaluations for measuring AI agents’ ability and propensity to collude. We would be particularly interested in work on creating a major, complex benchmark environment for collusion. Such a benchmark could assess capabilities and/or propensities that influence the likelihood of collusion, and should be ambitious in its aims to significantly advance research on collusion.
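To make the first bullet above more concrete, here is a minimal sketch of one information-theoretic detection idea. All names (`stego_score`, `kl_divergence`) and the whitespace tokenisation are illustrative assumptions, not an established method: it flags when the token distribution of inter-agent messages drifts from a baseline distribution, which is one weak signal that a channel may be carrying hidden information. Real steganographic collusion can preserve surface statistics, so this is a toy baseline, not a robust detector.

```python
from collections import Counter
import math

def kl_divergence(p: Counter, q: Counter, vocab) -> float:
    """KL(p || q) over a shared vocabulary, with add-one smoothing
    so that unseen tokens do not produce infinities."""
    p_total = sum(p.values()) + len(vocab)
    q_total = sum(q.values()) + len(vocab)
    return sum(
        ((p[w] + 1) / p_total)
        * math.log(((p[w] + 1) / p_total) / ((q[w] + 1) / q_total))
        for w in vocab
    )

def stego_score(messages, baseline_messages):
    """Hypothetical detector: score how far the token distribution of
    observed inter-agent messages drifts from a baseline corpus of
    innocuous messages. Returns 0 for identical distributions; larger
    values indicate more drift."""
    observed = Counter(tok for m in messages for tok in m.split())
    baseline = Counter(tok for m in baseline_messages for tok in m.split())
    vocab = set(observed) | set(baseline)
    return kl_divergence(observed, baseline, vocab)
```

For example, messages drawn from the same distribution as the baseline score near zero, while messages whose token frequencies are skewed (as naive frequency-based steganography would skew them) score higher. More serious proposals would replace token counts with model-based likelihoods or interpretability-derived features.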
Key Considerations
  • We expect it to be challenging to do work in this area that is informative about advanced AI systems. Proposals should explicitly aim for results that can be expected to generalise to large, complex systems of advanced AI agents. It is also important that the phenomena studied can reasonably be expected to matter for the long-term societal impact of AI systems.
  • For work on conditions that make collusion more or less likely, the expected generality of the results will be crucial in our assessment of the proposal. Merely demonstrating a correlation between certain conditions and the likelihood of collusion in one specific setting is not something we are likely to fund.

Priority Research Areas
  • Understanding and Evaluating Cooperation-Relevant Propensities (High Priority)
  • Understanding and Evaluating Cooperation-Relevant Capabilities (High Priority)
  • Incentivizing Cooperation Among AI Agents
  • AI for Facilitating Human Cooperation
  • Monitoring and Controlling Dynamic Networks of Agents and Emergent Properties
  • Information Asymmetries and Transparency