Revisiting the Plastic Surgery Hypothesis via Large Language Models

  • 2024-12-09 20:43:57
  • Chunqiu Steven Xia, Yifeng Ding, Lingming Zhang

Abstract

Automated Program Repair (APR) aspires to automatically generate patches for an input buggy program. Traditional APR tools typically focus on specific bug types and fixes through the use of templates, heuristics, and formal specifications. However, these techniques are limited in terms of the bug types and patch variety they can produce. As such, researchers have designed various learning-based APR tools, with recent work focused on directly using Large Language Models (LLMs) for APR. While LLM-based APR tools are able to achieve state-of-the-art performance on many repair datasets, the LLMs used for direct repair are not fully aware of the project-specific information such as unique variable or method names. The plastic surgery hypothesis is a well-known insight for APR, which states that the code ingredients to fix the bug usually already exist within the same project. Traditional APR tools have largely leveraged the plastic surgery hypothesis by designing manual or heuristic-based approaches to exploit such existing code ingredients. However, as recent APR research starts focusing on LLM-based approaches, the plastic surgery hypothesis has been largely ignored. In this paper, we ask the following question: How useful is the plastic surgery hypothesis in the era of LLMs? Interestingly, LLM-based APR presents a unique opportunity to fully automate the plastic surgery hypothesis via fine-tuning and prompting. To this end, we propose FitRepair, which combines the direct usage of LLMs with two domain-specific fine-tuning strategies and one prompting strategy for more powerful APR. Our experiments on the widely studied Defects4j 1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs (substantially outperforming the best-performing baseline by 15 and 8), respectively, demonstrating a promising future of the plastic surgery hypothesis in the era of LLMs.