WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Abstract

Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The gap between the prior knowledge of LLMs and the dynamics of a specific environment usually bottlenecks LLMs' performance as world models. To bridge this gap, we propose a training-free "world alignment" method that learns an environment's symbolic knowledge complementary to LLMs. The symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which are extracted by LLMs from exploration trajectories and encoded into executable code to regulate LLM agents' policies. We further propose an RL-free, model-based agent, "WALL-E 2.0", built on the model-predictive control (MPC) framework. Unlike classical MPC, which requires costly optimization on the fly, we adopt an LLM agent as an efficient look-ahead optimizer of future steps' actions by interacting with the neurosymbolic world model. While the LLM agent's strong heuristics make it an efficient planner in MPC, the quality of its planned actions is also secured by the accurate predictions of the aligned world model. Together, they considerably improve learning efficiency in a new environment. On open-world challenges in Mars (a Minecraft-like environment) and ALFWorld (embodied indoor environments), WALL-E 2.0 significantly outperforms existing methods, e.g., surpassing baselines in Mars by 16.1%-51.6% in success rate and by at least 61.7% in score. In ALFWorld, it achieves a new record success rate of 98% after only 4 iterations.
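To make the MPC loop above concrete, the following is a minimal sketch under several assumptions, not the authors' implementation. Action rules are written as plain Python functions that return a failure message when the world model predicts an action will fail (in WALL-E 2.0 such rules are extracted by an LLM from trajectories; here they are hand-written). The LLM planner is replaced by a hypothetical stub, `stub_llm_propose`, that revises its plan when given feedback. The names `State`, `simulate`, `check_plan`, `mpc_step`, and the crafting rules are all hypothetical, and knowledge graphs and scene graphs are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical toy state: inventory counts and whether the agent is near a table.
@dataclass
class State:
    inventory: dict = field(default_factory=dict)
    near_table: bool = False

# A symbolic action rule returns a failure message if the action is predicted to fail.
Rule = Callable[[State, str], Optional[str]]

def rule_craft_needs_table(state: State, action: str) -> Optional[str]:
    # Hypothetical rule: crafting a pickaxe requires a nearby table.
    if action == "craft_pickaxe" and not state.near_table:
        return "craft_pickaxe requires a nearby table"
    return None

def rule_craft_needs_wood(state: State, action: str) -> Optional[str]:
    # Hypothetical rule: crafting a pickaxe consumes 3 wood.
    if action == "craft_pickaxe" and state.inventory.get("wood", 0) < 3:
        return "craft_pickaxe requires 3 wood"
    return None

RULES: List[Rule] = [rule_craft_needs_table, rule_craft_needs_wood]

def simulate(state: State, action: str) -> State:
    # Hypothetical transition model used to roll a plan forward.
    inv, near = dict(state.inventory), state.near_table
    if action == "collect_wood":
        inv["wood"] = inv.get("wood", 0) + 1
    elif action == "go_to_table":
        near = True
    elif action == "craft_pickaxe":
        inv["wood"] -= 3
        inv["pickaxe"] = inv.get("pickaxe", 0) + 1
    return State(inventory=inv, near_table=near)

def check_plan(state: State, plan: List[str], rules: List[Rule],
               step: Callable[[State, str], State]) -> Optional[Tuple[int, str]]:
    """Roll the plan through the world model; return (index, message) of the first violation."""
    s = state
    for i, action in enumerate(plan):
        for rule in rules:
            msg = rule(s, action)
            if msg is not None:
                return i, msg
        s = step(s, action)
    return None

def mpc_step(state: State, propose, rules: List[Rule], step, max_revisions: int = 3) -> Optional[str]:
    """One MPC step: let the planner revise its look-ahead plan until the world model accepts it."""
    feedback: Optional[str] = None
    plan: List[str] = []
    for _ in range(max_revisions):
        plan = propose(state, feedback)
        violation = check_plan(state, plan, rules, step)
        if violation is None:
            break
        feedback = f"step {violation[0]}: {violation[1]}"
    # Receding horizon: execute only the first action (falls back to the last proposal).
    return plan[0] if plan else None

def stub_llm_propose(state: State, feedback: Optional[str]) -> List[str]:
    # Hypothetical stand-in for the LLM planner: a naive plan first, a fixed plan after feedback.
    if feedback is None:
        return ["craft_pickaxe"]
    return ["collect_wood"] * 3 + ["go_to_table", "craft_pickaxe"]
```

Calling `mpc_step(State(), stub_llm_propose, RULES, simulate)` rejects the naive one-step plan, feeds the rule violation back to the planner, accepts the revised plan, and returns its first action, `"collect_wood"`. The design point this illustrates is that the planner only proposes, while the executable rules decide which proposals are predicted to succeed.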