Abstract
Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of individual humans (e.g., AI assistants) or groups of humans (e.g., AI-accelerated corporations). At present, relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. In this paper, we examine whether a "society" of LLM agents can learn mutually beneficial social norms in the face of incentives to defect, a distinctive feature of human sociality that is arguably crucial to the success of civilization. In particular, we study the evolution of indirect reciprocity across generations of LLM agents playing a classic iterated Donor Game in which agents can observe the recent behavior of their peers. We find that the evolution of cooperation differs markedly across base models, with societies of Claude 3.5 Sonnet agents achieving significantly higher average scores than Gemini 1.5 Flash, which, in turn, outperforms GPT-4o. Further, Claude 3.5 Sonnet can make use of an additional mechanism for costly punishment to achieve yet higher scores, while Gemini 1.5 Flash and GPT-4o fail to do so. For each model class, we also observe variation in emergent behavior across random seeds, suggesting an understudied sensitive dependence on initial conditions. We suggest that our evaluation regime could inspire an inexpensive and informative new class of LLM benchmarks, focussed on the implications of LLM agent deployment for the cooperative infrastructure of society.