AGGA: A Dataset of Academic Guidelines for Generative AI and Large Language Models

Abstract

This study introduces AGGA, a dataset comprising 80 academic guidelines for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in academic settings, meticulously collected from official university websites. The dataset contains 188,674 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. Additionally, AGGA can be further annotated to function as a benchmark for various tasks, including ambiguity detection, requirements categorization, and the identification of equivalent requirements. Our methodologically rigorous approach ensured a thorough examination, with a selection of universities that represent a diverse range of global institutions, including top-ranked universities across six continents. The dataset captures perspectives from a variety of academic fields, including the humanities and technology, and from both public and private institutions, offering a broad spectrum of insights into the integration of GAIs and LLMs in academia.