Abstract
Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, which limits their factual accuracy. While recent works equip reinforcement learning (RL)-based LRMs with retrieval capabilities, they suffer from overthinking and lack robustness in reasoning, reducing their effectiveness in question answering (QA) tasks. To address this, we propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations. Our solution includes a novel data construction framework with an upper bound on the reasoning chain length. Specifically, we first leverage an LRM to generate deliberate thinking, then select an action from a predefined action space (Search and Finish). For a Search action, a query is executed against the RAG engine, and the result is returned as an observation to guide later reasoning steps. This process iterates until a Finish action is chosen. Benefiting from ReaRAG's strong reasoning capabilities, our approach outperforms existing baselines on multi-hop QA. Further analysis highlights its strong reflective ability to recognize errors and refine its reasoning trajectory. Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG).
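The iterative procedure described above (think, choose Search or Finish, observe, repeat, with a cap on chain length) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Step`, `run_chain`, `policy`, `retrieve`, and `max_steps` are hypothetical, the LRM and the RAG engine are replaced by toy stubs, and stopping the loop at a fixed step count is only one assumed way to enforce the upper bound.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    thought: str                       # deliberate reasoning produced before acting
    action: str                        # "search" or "finish"
    argument: str                      # query for "search", final answer for "finish"
    observation: Optional[str] = None  # retrieval result, set only for "search"


# Hypothetical interfaces: a policy maps (question, chain so far) to
# (thought, action, argument); a retriever maps a query to a text result.
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]
Retriever = Callable[[str], str]


def run_chain(question: str, policy: Policy, retrieve: Retriever,
              max_steps: int = 4) -> Tuple[Optional[str], List[Step]]:
    """Iterate thought -> action -> observation until Finish or the step cap."""
    chain: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, chain)
        step = Step(thought, action, argument)
        chain.append(step)
        if action == "finish":
            return argument, chain
        if action != "search":
            raise ValueError(f"unknown action: {action!r}")
        # Search: run the query and keep the result to guide later steps.
        step.observation = retrieve(argument)
    # Step cap reached without a Finish action (assumed handling: no answer).
    return None, chain


def toy_policy(question: str, chain: List[Step]) -> Tuple[str, str, str]:
    """Stub standing in for the LRM: search once, then finish."""
    if not chain:
        return ("I need to look this up.", "search", question)
    return ("The observation answers the question.", "finish",
            chain[-1].observation or "")


def toy_retrieve(query: str) -> str:
    """Stub standing in for the RAG engine."""
    return "stub passage for: " + query


answer, chain = run_chain("Who wrote Hamlet?", toy_policy, toy_retrieve)
```

With these stubs the chain has two steps, one Search followed by one Finish. Passing `max_steps=1` ends the loop before a Finish action is chosen and returns `None` as the answer, which shows how a cap on chain length limits the number of iterations.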