United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections

Abstract

"Synthetic samples" based on large language models (LLMs) have been argued to serve as efficient alternatives to surveys of humans, assuming that their training data includes information on human attitudes and behavior. However, LLM-synthetic samples might exhibit bias, for example due to training data and fine-tuning processes being unrepresentative of diverse contexts. Such biases risk reinforcing existing biases in research, policymaking, and society. Therefore, researchers need to investigate if and under which conditions LLM-generated synthetic samples can be used for public opinion prediction. In this study, we examine to what extent LLM-based predictions of individual public opinion exhibit context-dependent biases by predicting the results of the 2024 European Parliament elections. Prompting three LLMs with individual-level background information of 26,000 eligible European voters, we ask the LLMs to predict each person's voting behavior. By comparing them to the actual results, we show that LLM-based predictions of future voting behavior largely fail, their accuracy is unequally distributed across national and linguistic contexts, and they require detailed attitudinal information in the prompt. The findings emphasize the limited applicability of LLM-synthetic samples to public opinion prediction. In investigating their contextual biases, this study contributes to the understanding and mitigation of inequalities in the development of LLMs and their applications in computational social science.