Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models

Abstract

This paper studies how AI-assisted programming with large language models (LLMs) improves software developers' abilities through AI tools (LLM agents) such as GitHub Copilot and Amazon CodeWhisperer, and how human feedback gathered through crowd-sourced computation can strengthen reinforcement learning from human feedback (RLHF) for text-to-code generation. We further demonstrate that our Bayesian optimization framework supports AI alignment in code generation by distributing the feedback collection burden, which highlights the value of collecting high-quality human feedback. Our empirical evaluations demonstrate the efficacy of this approach and show how LLM agents can be trained effectively for improved text-to-code generation. The framework can be adapted to general domain-specific languages, promoting the alignment of LLM capabilities with human feedback in AI-assisted programming for code generation.