Unanswerability Evaluation for Retrieval Augmented Generation

Abstract

Existing evaluation frameworks for retrieval-augmented generation (RAG) systems focus on answerable queries, but they overlook the importance of appropriately rejecting unanswerable requests. In this paper, we introduce UAEval4RAG, a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries for any given knowledge base with unanswered ratio and acceptable ratio metrics. We conduct experiments with various RAG components, including retrieval models, rewriting methods, rerankers, language models, and prompting strategies, and reveal hidden trade-offs in performance of RAG systems. Our findings highlight the critical role of component selection and prompt design in optimizing RAG systems to balance the accuracy of answerable queries with high rejection rates of unanswerable ones. UAEval4RAG provides valuable insights and tools for developing more robust and reliable RAG systems.