Abstract
Alignment of large language models (LLMs) is crucial for safe and trustworthy deployment in applications. Reinforcement learning from human feedback (RLHF) has emerged as an effective technique for aligning LLMs with human preferences and broader utilities, but it requires updating billions of model parameters, which is computationally expensive. Controlled decoding, by contrast, provides a mechanism for aligning a model at inference time without retraining. However, single-agent decoding approaches often struggle to adapt to diverse tasks because of the complexity and variability inherent in those tasks. To strengthen test-time performance with respect to the target task, we propose a mixture-of-agents decoding strategy that leverages existing off-the-shelf aligned LLM policies. Treating each prior policy as an agent, in the spirit of mixture-of-agents collaboration, we develop a decoding method that achieves inference-time alignment through a token-level selection strategy among multiple agents. For each token, the most suitable LLM is dynamically chosen from a pool of models based on a long-term utility metric. This policy-switching mechanism ensures optimal model selection at each step, enabling efficient collaboration and alignment among LLMs during decoding. Theoretical analysis of our proposed algorithm establishes optimal performance with respect to the target task, represented via a target reward, for the given off-the-shelf models. We conduct comprehensive empirical evaluations with open-source aligned models on diverse tasks and preferences, which demonstrate the merits of this approach over single-agent decoding baselines. Notably, Collab surpasses the current SoTA decoding strategy, achieving an improvement of up to 1.56x in average reward and 71.89% in GPT-4-based win-tie rate.
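The token-level selection idea can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the algorithm as specified in the paper: the names `collaborative_decode`, `agent_a`, `agent_b`, and `toy_utility` are hypothetical, each agent is reduced to a function that proposes one next token, and `toy_utility` is a stand-in for the long-term utility metric, whose actual form is not given in this abstract.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Token = str
# Hypothetical agent interface: given the context, propose one next token.
Policy = Callable[[Sequence[Token]], Token]
# Hypothetical stand-in for a long-term utility estimate of (context, token).
Utility = Callable[[Sequence[Token], Token], float]


def collaborative_decode(
    prompt: Sequence[Token],
    agents: Dict[str, Policy],
    utility: Utility,
    max_new_tokens: int,
    eos: Token = "<eos>",
) -> Tuple[List[Token], List[str]]:
    """At every step, keep the proposal of the agent with the highest utility."""
    tokens = list(prompt)
    chosen_agents: List[str] = []
    for _ in range(max_new_tokens):
        # Each agent proposes a candidate next token for the shared context.
        proposals = {name: policy(tokens) for name, policy in agents.items()}
        # Switch to whichever agent's proposal scores highest.
        best = max(proposals, key=lambda name: utility(tokens, proposals[name]))
        chosen_agents.append(best)
        if proposals[best] == eos:
            break
        tokens.append(proposals[best])
    return tokens[len(prompt):], chosen_agents


# Toy agents (assumptions): each is only "right" on alternating positions.
def agent_a(ctx: Sequence[Token]) -> Token:
    return "good" if len(ctx) % 2 == 0 else "bad"


def agent_b(ctx: Sequence[Token]) -> Token:
    return "bad" if len(ctx) % 2 == 0 else "good"


def toy_utility(ctx: Sequence[Token], tok: Token) -> float:
    return 1.0 if tok == "good" else 0.0


generated, chosen = collaborative_decode(
    ["start"], {"agent_a": agent_a, "agent_b": agent_b}, toy_utility, max_new_tokens=3
)
print(generated, chosen)
```

In this toy setting neither agent alone produces a good sequence, but switching between them at each token does, which is the intuition behind selecting among a pool of policies rather than committing to one.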