Abstract
Audio Large Language Models (AudioLLMs) have received widespread attention and have significantly improved performance on audio tasks such as conversation, audio understanding, and automatic speech recognition (ASR). Despite these advancements, no benchmark exists for assessing AudioLLMs in financial scenarios, where audio data, such as earnings conference calls and CEO speeches, are crucial resources for financial analysis and investment decisions. In this paper, we introduce \textsc{FinAudio}, the first benchmark designed to evaluate the capacity of AudioLLMs in the financial domain. We first define three tasks based on the unique characteristics of the financial domain: 1) ASR for short financial audio, 2) ASR for long financial audio, and 3) summarization of long financial audio. We then curate two short-audio and two long-audio datasets and develop a novel dataset for financial audio summarization; together, these datasets constitute the \textsc{FinAudio} benchmark. Finally, we evaluate seven prevalent AudioLLMs on \textsc{FinAudio}. Our evaluation reveals the limitations of existing AudioLLMs in the financial domain and offers insights for improving AudioLLMs. All datasets and code will be released.