Abstract
We introduce EXIT, an extractive context compression framework that enhances both the effectiveness and efficiency of retrieval-augmented generation (RAG) in question answering (QA). Current RAG systems often struggle when retrieval models fail to rank the most relevant documents, leading to the inclusion of more context at the expense of latency and accuracy. While abstractive compression methods can drastically reduce token counts, their token-by-token generation process significantly increases end-to-end latency. Conversely, existing extractive methods reduce latency but rely on independent, non-adaptive sentence selection, failing to fully utilize contextual information. EXIT addresses these limitations by classifying sentences from retrieved documents - while preserving their contextual dependencies - enabling parallelizable, context-aware extraction that adapts to query complexity and retrieval quality. Our evaluations on both single-hop and multi-hop QA tasks show that EXIT consistently surpasses existing compression methods and even uncompressed baselines in QA accuracy, while also delivering substantial reductions in inference time and token count. By improving both effectiveness and efficiency, EXIT provides a promising direction for developing scalable, high-quality QA solutions in RAG pipelines. Our code is available at https://github.com/ThisIsHwang/EXIT