Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Abstract

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work has proposed advanced prompting techniques and highlighted the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. In light of this, self-correction and self-learning emerge as viable solutions, employing strategies that allow LLMs to refine their outputs and learn from self-assessed rewards. Yet the efficacy of LLMs in self-refining their responses, particularly in complex reasoning and planning tasks, remains dubious. In this paper, we introduce AlphaLLM for the self-improvement of LLMs, which integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop, thereby enhancing the capabilities of LLMs without additional annotations. Drawing inspiration from the success of AlphaGo, AlphaLLM addresses the unique challenges of combining MCTS with LLMs for self-improvement, including data scarcity, the vastness of the search spaces of language tasks, and the subjective nature of feedback in language tasks. AlphaLLM comprises a prompt synthesis component, an efficient MCTS approach tailored for language tasks, and a trio of critic models for precise feedback. Our experimental results in mathematical reasoning tasks demonstrate that AlphaLLM significantly enhances the performance of LLMs without additional annotations, showing the potential for self-improvement in LLMs.