Abstract
As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 114 input perturbations that simulate a heterogeneous range of corruptions that ASR models may encounter when deployed in the wild. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices, such as the use of discrete representations or self-training, appear to be conducive to robustness. We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and males and females. Our results reveal noticeable disparities in the models' robustness across subgroups. We believe that SRB will significantly facilitate future research towards robust ASR models by making it easier to conduct comprehensive and comparable robustness evaluations.