Abstract
Retrieval-Augmented Generation (RAG) has emerged as a powerful application of Large Language Models (LLMs), revolutionizing information search and consumption. RAG systems combine traditional search capabilities with LLMs to generate comprehensive answers to user queries, ideally with accurate citations. However, in our experience of developing a RAG product, LLMs often struggle with source attribution, a finding consistent with other industry studies reporting citation accuracy rates of only about 74% for popular generative search engines. To address this, we present efficient post-processing algorithms that improve citation accuracy in LLM-generated responses with minimal impact on latency and cost. Our approaches cross-check generated citations against retrieved articles using methods including keyword + semantic matching, a fine-tuned model with BERTScore, and a lightweight LLM-based technique. Our experimental results demonstrate a relative improvement of 15.46% in the overall accuracy metrics of our RAG system. This significant enhancement potentially enables a shift from our current larger language model to a relatively smaller model that is approximately 12x more cost-effective and 3x faster in inference time, while maintaining comparable performance. This research contributes to enhancing the reliability and trustworthiness of AI-generated content in information retrieval and summarization tasks, which is critical to gaining customer trust, especially in commercial products.
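To make the idea of cross-checking a generated citation against the retrieved articles concrete, the following is a minimal sketch of a keyword-overlap check only. It is not the implementation evaluated in this work: the function names, the stopword list, and the `min_gain` threshold are all hypothetical, and the semantic matching, BERTScore, and LLM-based stages are omitted.

```python
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and",
             "is", "are", "for", "with"}


def keywords(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap(sentence, article):
    """Fraction of the sentence's keywords that also appear in the article."""
    sent = keywords(sentence)
    if not sent:
        return 0.0
    return len(sent & keywords(article)) / len(sent)


def recheck_citation(sentence, cited_id, articles, min_gain=0.2):
    """Return the article id to cite for one generated sentence.

    The generated citation is kept unless another retrieved article covers
    the sentence's keywords better by at least `min_gain` (an assumed
    threshold, not a value from the paper).
    """
    scores = {aid: overlap(sentence, text) for aid, text in articles.items()}
    best_id = max(scores, key=scores.get)
    if scores[best_id] - scores.get(cited_id, 0.0) >= min_gain:
        return best_id
    return cited_id
```

Because a check of this kind runs on the response after generation and needs no additional model call, it adds little latency or cost. A pure keyword check would miss paraphrases, which is presumably one motivation for pairing it with semantic matching.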