Rethinking Uncertainty Estimation in Natural Language Generation

Abstract

Large Language Models (LLMs) are increasingly employed in real-world applications, driving the need to evaluate the trustworthiness of their generated text. To this end, reliable uncertainty estimation is essential. Since current LLMs generate text autoregressively through a stochastic process, the same prompt can lead to varying outputs. Consequently, leading uncertainty estimation methods generate and analyze multiple output sequences to determine the LLM's uncertainty. However, generating output sequences is computationally expensive, making these methods impractical at scale. In this work, we inspect the theoretical foundations of the leading methods and explore new directions to enhance their computational efficiency. Building on the framework of proper scoring rules, we find that the negative log-likelihood of the most likely output sequence constitutes a theoretically grounded uncertainty measure. To approximate this alternative measure, we propose G-NLL, which has the advantage of being obtained using only a single output sequence generated by greedy decoding. This makes uncertainty estimation more efficient and straightforward, while preserving theoretical rigor. Empirical results demonstrate that G-NLL achieves state-of-the-art performance across various LLMs and tasks. Our work lays the foundation for efficient and reliable uncertainty estimation in natural language generation, challenging the necessity of more computationally involved methods currently leading the field.
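
To make the idea concrete, the following is a minimal sketch of how a greedy-decoding negative log-likelihood could be computed, not the authors' implementation. It assumes a hypothetical `next_token_dist` callable that maps a token prefix to a next-token probability distribution; `toy_model`, `greedy_nll`, and the `<eos>` token are illustrative names that do not appear in the abstract. Note that picking the most likely token at each step only approximates the most likely full sequence.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface (an assumption for this sketch): a model is any
# callable that maps a token prefix to a next-token probability distribution.
NextTokenDist = Callable[[List[str]], Dict[str, float]]


def greedy_nll(
    next_token_dist: NextTokenDist,
    prompt: List[str],
    eos: str = "<eos>",
    max_len: int = 50,
) -> Tuple[List[str], float]:
    """Greedily decode one sequence and return it with its summed NLL."""
    prefix = list(prompt)
    output: List[str] = []
    nll = 0.0
    for _ in range(max_len):
        dist = next_token_dist(prefix)
        # Greedy decoding: take the single most probable next token.
        token, prob = max(dist.items(), key=lambda kv: kv[1])
        # Accumulate the negative log-probability of the chosen token.
        nll -= math.log(prob)
        output.append(token)
        prefix.append(token)
        if token == eos:
            break
    return output, nll


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Toy stand-in for an LLM with hand-picked probabilities."""
    if prefix[-1] == "Q":
        return {"yes": 0.8, "no": 0.2}
    return {"<eos>": 0.9, "yes": 0.1}


tokens, score = greedy_nll(toy_model, ["Q"])
print(tokens, round(score, 4))
```

Under these assumptions, a higher score indicates that the model assigns less probability to its own greedy output, which is read as higher uncertainty. Only one decoding pass is needed, in contrast to methods that sample and compare multiple output sequences.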