o1-Coder: an o1 Replication for Coding

Abstract

The technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to enhance the model's System-2 thinking capabilities. The framework includes training a Test Case Generator (TCG) for standardized code testing, using MCTS to generate code data with reasoning processes, and iteratively fine-tuning the policy model to initially produce pseudocode and then generate the full code. The report also addresses the opportunities and challenges in deploying o1-like models in real-world applications, suggesting transitioning to the System-2 paradigm and highlighting the imperative for world model construction. Updated model progress and experimental results will be reported in subsequent versions. All source code, curated datasets, as well as the derived models are disclosed at https://github.com/ADaM-BJTU/O1-CODER .