Whispering in Amharic: Fine-tuning Whisper for Low-resource Language

Abstract

This work explores fine-tuning OpenAI's Whisper automatic speech recognition (ASR) model for Amharic, a low-resource language, to improve transcription accuracy. While the foundational Whisper model struggles with Amharic due to limited representation in its training data, we fine-tune it using datasets like Mozilla Common Voice, FLEURS, and the BDU-speech dataset. The best-performing model, Whispersmall-am, improves significantly when fine-tuned on a mix of existing FLEURS data and new, unseen Amharic datasets. Training solely on new data leads to poor performance, but combining it with FLEURS data reinforces the model, enabling better specialization in Amharic. We also demonstrate that normalizing Amharic homophones significantly improves Word Error Rate (WER) and Bilingual Evaluation Understudy (BLEU) scores. This study underscores the importance of fine-tuning strategies and dataset composition for improving ASR in low-resource languages, providing insights for future Amharic speech recognition research.
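
To make the homophone-normalization idea concrete, the following is a minimal sketch and not the procedure used in this work. It assumes a commonly used convention in which the Ethiopic series ሐ and ኀ are folded into ሀ, ሠ into ሰ, ዐ into አ, and ፀ into ጸ. The names `HOMOPHONE_SERIES`, `normalize_homophones`, and `word_error_rate` are hypothetical, and the exact character mapping is an assumption. WER is computed here as word-level edit distance divided by the reference length.

```python
# Assumed homophone table: (variant series base, canonical series base) as
# Unicode code points. The mapping is an illustrative convention, not taken
# from the paper.
HOMOPHONE_SERIES = [
    (0x1210, 0x1200),  # ሐ-series -> ሀ-series
    (0x1280, 0x1200),  # ኀ-series -> ሀ-series
    (0x1220, 0x1230),  # ሠ-series -> ሰ-series
    (0x12D0, 0x12A0),  # ዐ-series -> አ-series
    (0x1340, 0x1338),  # ፀ-series -> ጸ-series
]


def _build_table():
    """Map the seven basic vowel orders of each variant series to its canonical series."""
    table = {}
    for variant_base, canonical_base in HOMOPHONE_SERIES:
        for order in range(7):
            table[variant_base + order] = canonical_base + order
    return table


_TABLE = _build_table()


def normalize_homophones(text):
    """Replace homophone characters with their assumed canonical forms."""
    return text.translate(_TABLE)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    reference = "ሠላም"   # spelled with the ሠ variant
    hypothesis = "ሰላም"  # same pronunciation, spelled with ሰ
    print(word_error_rate(reference, hypothesis))
    print(word_error_rate(normalize_homophones(reference),
                          normalize_homophones(hypothesis)))
```

Applying the same normalization to both the reference and the hypothesis before scoring means spelling variants that sound identical are no longer counted as substitution errors, which is one plausible reason the reported scores improve.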