Abstract
Retrieval-augmented generation (RAG) mitigates hallucination in Large Language Models (LLMs) by using query pipelines to retrieve relevant external information and grounding responses in retrieved knowledge. However, query pipeline optimization for cancer patient question-answering (CPQA) systems requires separately optimizing multiple components with domain-specific considerations. We propose a novel three-aspect optimization approach for the RAG query pipeline in CPQA systems, utilizing public biomedical databases like PubMed and PubMed Central. Our optimization includes: (1) document retrieval, utilizing a comparative analysis of NCBI resources and introducing Hybrid Semantic Real-time Document Retrieval (HSRDR); (2) passage retrieval, identifying optimal pairings of dense retrievers and rerankers; and (3) semantic representation, introducing Semantic Enhanced Overlap Segmentation (SEOS) for improved contextual understanding. On a custom-developed dataset tailored for cancer-related inquiries, our optimized RAG approach improved the answer accuracy of Claude-3-haiku by 5.24% over chain-of-thought prompting and about 3% over a naive RAG setup. This study highlights the importance of domain-specific query optimization in realizing the full potential of RAG and provides a robust framework for building more accurate and reliable CPQA systems, advancing the development of RAG-based biomedical systems.