VoiceBench: Benchmarking LLM-Based Voice Assistants

  • 2024-12-11 15:45:21
  • Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li

Abstract

Building on the success of large language models (LLMs), recent advancements such as GPT-4o have enabled real-time speech interactions through LLM-based voice assistants, offering a significantly improved user experience compared to traditional text-based interactions. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered progress in the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general knowledge evaluation with clean speech, neglecting the more intricate, real-world scenarios that involve diverse speaker characteristics as well as environmental and content factors. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench also includes both real and synthetic spoken instructions that incorporate these three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.