QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?

Abstract

Credit assignment has remained a fundamental challenge in multi-agent reinforcement learning (MARL). Previous studies have primarily addressed this issue through value decomposition methods under the centralized training with decentralized execution paradigm, where neural networks are utilized to approximate the nonlinear relationship between individual Q-values and the global Q-value. Although these approaches have achieved considerable success in various benchmark tasks, they still suffer from several limitations, including imprecise attribution of contributions, limited interpretability, and poor scalability in high-dimensional state spaces. To address these challenges, we propose a novel algorithm, \textbf{QLLM}, which facilitates the automatic construction of credit assignment functions using large language models (LLMs). Specifically, the concept of \textbf{TFCAF} is introduced, wherein the credit allocation process is represented as a direct and expressive nonlinear functional formulation. A custom-designed \textit{coder-evaluator} framework is further employed to guide the generation, verification, and refinement of executable code by LLMs, significantly mitigating issues such as hallucination and shallow reasoning during inference. Extensive experiments conducted on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines. Moreover, QLLM exhibits strong generalization capability and maintains compatibility with a wide range of MARL algorithms that utilize mixing networks, positioning it as a promising and versatile solution for complex multi-agent scenarios.
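To make the idea of an explicit credit assignment function concrete, the following is a minimal illustrative sketch, not the function QLLM generates. It assumes a hypothetical state layout with one scalar feature per agent (for example, a distance to a target) and combines individual Q-values into a global Q-value through a closed-form, state-dependent weighting instead of a trained mixing network. The name `credit_assignment` and the softmax weighting rule are assumptions made for illustration only.

```python
import math
from typing import Sequence


def credit_assignment(agent_qs: Sequence[float], state: Sequence[float]) -> float:
    """Hypothetical closed-form mapping from individual Q-values to a global Q-value.

    Assumption: `state` holds one scalar feature per agent, and a smaller
    feature value means the agent should receive more credit.
    """
    if len(agent_qs) != len(state):
        raise ValueError("expected one state feature per agent")

    # Softmax over negated features: non-negative weights that sum to 1.
    logits = [-s for s in state]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Because every weight is non-negative, the global value is monotonic
    # in each agent's individual Q-value.
    return sum(w * q for w, q in zip(weights, agent_qs))


if __name__ == "__main__":
    # Equal state features give equal weights, so the result is the mean.
    print(credit_assignment([1.0, 2.0], [0.0, 0.0]))  # 1.5
```

Because the weights are an explicit formula of the state, the contribution attributed to each agent can be read directly from the code, which is the kind of interpretability that a learned mixing network does not offer by default.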