Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Abstract

LLMs are an integral component of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the overall quality of end-to-end RAG systems, there is a gap in understanding the appropriateness of LLMs for the RAG task. To address this, we introduce Trust-Score, a holistic metric that evaluates the trustworthiness of LLMs within the RAG framework. Our results show that various prompting methods, such as in-context learning, fail to effectively adapt LLMs to the RAG task as measured by Trust-Score. Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. 26 out of 27 models aligned using Trust-Align substantially outperform competitive baselines on ASQA, QAMPARI, and ELI5. Specifically, in LLaMA-3-8b, Trust-Align outperforms FRONT on ASQA (up 12.56), QAMPARI (up 36.04), and ELI5 (up 17.69). Trust-Align also significantly enhances models' ability to correctly refuse and provide quality citations. We also demonstrate the effectiveness of Trust-Align across different open-weight models, including the LLaMA series (1b to 8b), Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b). We release our code at https://github.com/declare-lab/trust-align.