Abstract
Retrieval-augmented generation (RAG) empowers large language models to access external and private corpora, enabling factually consistent responses in specific domains. By exploiting the inherent structure of the corpus, graph-based RAG methods further enrich this process by building a knowledge graph index and leveraging the structural nature of graphs. However, current graph-based RAG approaches seldom prioritize the design of graph structures. Inadequately designed graphs not only impede the seamless integration of diverse graph algorithms but also result in workflow inconsistencies and degraded performance. To further unleash the potential of graphs for RAG, we propose NodeRAG, a graph-centric framework introducing heterogeneous graph structures that enable the seamless and holistic integration of graph-based methodologies into the RAG workflow. By aligning closely with the capabilities of LLMs, this framework ensures a fully cohesive and efficient end-to-end process. Through extensive experiments, we demonstrate that NodeRAG exhibits performance advantages over previous methods, including GraphRAG and LightRAG, not only in indexing time, query time, and storage efficiency but also in delivering superior question-answering performance on multi-hop benchmarks and open-ended head-to-head evaluations with minimal retrieval tokens. Our GitHub repository is available at https://github.com/Terry-Xu-666/NodeRAG.