RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines

Abstract

Retrieval-augmented generation (RAG) pipelines have become the de facto approach for building AI assistants with access to external, domain-specific knowledge. Given a user query, RAG pipelines typically first retrieve (R) relevant information from external sources, before invoking a Large Language Model (LLM), augmented (A) with this information, to generate (G) responses. Modern RAG pipelines frequently chain multiple retrieval and generation components, in any order. However, developing effective RAG pipelines is challenging because retrieval and generation components are intertwined, making it hard to identify which component(s) cause errors in the eventual output. The parameters with the greatest impact on output quality often require hours of pre-processing after each change, creating prohibitively slow feedback cycles. To address these challenges, we present RAGGY, a developer tool that combines a Python library of composable RAG primitives with an interactive interface for real-time debugging. We contribute the design and implementation of RAGGY, insights into expert debugging patterns through a qualitative study with 12 engineers, and design implications for future RAG tools that better align with developers' natural workflows.