Abstract
Ensemble reasoning that exploits the strengths of different LLM experts is critical to achieving consistent and satisfactory performance on diverse inputs across a wide range of tasks. However, existing LLM ensemble methods are either computationally intensive or unable to leverage the complementary knowledge among LLM experts for different inputs. In this paper, we propose a Dynamic Ensemble Reasoning paradigm, called DER, to integrate the strengths of multiple LLM experts conditioned on dynamic inputs. Specifically, we model the LLM ensemble reasoning problem as a Markov Decision Process (MDP), in which an agent sequentially takes inputs, requests knowledge from an LLM candidate, and passes the output to a subsequent LLM candidate. Moreover, we devise a reward function to train a DER-Agent to dynamically select an optimal answering route for each input question, aiming to achieve the highest performance with as few computational resources as possible. Finally, to fully transfer the expert knowledge from prior LLMs, we develop a Knowledge Transfer Prompt (KTP) that enables subsequent LLM candidates to transfer complementary knowledge effectively. Experiments demonstrate that our method achieves better performance than state-of-the-art baselines while using fewer computational resources.
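The toy sketch below illustrates the sequential routing idea: an agent picks one expert per step, each expert sees the previous answer through a knowledge-transfer prompt, and a reward trades quality gain against compute cost. This is not the DER implementation. The `Expert` type, the `ktp_prompt` wording, the reward form in `step_reward` (score gain minus `lam` times cost), the scoring function, and the hand-written fixed policy are all hypothetical stand-ins. In DER, a trained DER-Agent would take the place of the fixed policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Expert:
    name: str
    cost: float                     # hypothetical compute cost per call
    generate: Callable[[str], str]  # maps a prompt to an answer

def ktp_prompt(question: str, prior: Optional[str]) -> str:
    # Hypothetical knowledge-transfer prompt: forwards the previous answer.
    if prior is None:
        return question
    return (f"{question}\nA previous expert answered: {prior}\n"
            "Use this answer as a reference and improve it if needed.")

def step_reward(prev_score: float, new_score: float, cost: float, lam: float) -> float:
    # Assumed reward: quality gain minus a compute penalty weighted by lam.
    return (new_score - prev_score) - lam * cost

# A policy maps (question, current answer, step) to an expert index, or None to stop.
Policy = Callable[[str, Optional[str], int], Optional[int]]

def run_episode(question: str, experts: List[Expert], policy: Policy,
                score: Callable[[str, Optional[str]], float],
                lam: float = 0.1, max_steps: int = 3):
    answer: Optional[str] = None
    prev_score, total_reward, route = 0.0, 0.0, []
    for t in range(max_steps):
        action = policy(question, answer, t)
        if action is None:  # the agent chooses to stop and return the answer
            break
        expert = experts[action]
        answer = expert.generate(ktp_prompt(question, answer))
        new_score = score(question, answer)
        total_reward += step_reward(prev_score, new_score, expert.cost, lam)
        prev_score = new_score
        route.append(expert.name)
    return answer, route, total_reward

# Toy usage: a cheap expert answers wrongly, and a costlier one corrects it.
small = Expert("small", 1.0, lambda prompt: "5")
large = Expert("large", 4.0, lambda prompt: "4")
score = lambda q, a: 1.0 if a is not None and a.strip() == "4" else 0.0
fixed_route = lambda q, a, t: [0, 1][t] if t < 2 else None

answer, route, total = run_episode("What is 2+2?", [small, large], fixed_route, score)
print(answer, route, round(total, 2))
```

With `lam = 0.1`, the route `small -> large` earns -0.1 for the first step and 0.6 for the second. A route that called only `large` would score higher (0.6). A learned policy could discover this, which is the kind of cost-aware route selection that the reward is meant to encourage.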