Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT

Abstract

Automated Program Repair (APR) aims to automatically generate patches forbuggy programs. Recent APR work has been focused on leveraging modern LargeLanguage Models (LLMs) to directly generate patches for APR. Such LLM-based APRtools work by first constructing an input prompt built using the original buggycode and then queries the LLM to generate patches. While the LLM-based APRtools are able to achieve state-of-the-art results, it still follows theclassic Generate and Validate repair paradigm of first generating lots ofpatches and then validating each one afterwards. This not only leads to manyrepeated patches that are incorrect but also miss the crucial information intest failures as well as in plausible patches. To address these limitations, we propose ChatRepair, the first fullyautomated conversation-driven APR approach that interleaves patch generationwith instant feedback to perform APR in a conversational style. ChatRepairfirst feeds the LLM with relevant test failure information to start with, andthen learns from both failures and successes of earlier patching attempts ofthe same bug for more powerful APR. For earlier patches that failed to pass alltests, we combine the incorrect patches with their corresponding relevant testfailure information to construct a new prompt for the LLM to generate the nextpatch. In this way, we can avoid making the same mistakes. For earlier patchesthat passed all the tests, we further ask the LLM to generate alternativevariations of the original plausible patches. In this way, we can further buildon and learn from earlier successes to generate more plausible patches toincrease the chance of having correct patches. While our approach is general,we implement ChatRepair using state-of-the-art dialogue-based LLM -- ChatGPT.By calculating the cost of accessing ChatGPT, we can fix 162 out of 337 bugsfor \$0.42 each!