Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

  • 2025-03-24 15:50:20
  • Augusto B. Corrêa, André G. Pereira, Jendrik Seipp

Abstract

In recent years, large language models (LLMs) have shown remarkable capabilities in various artificial intelligence problems. However, they fail to plan reliably, even when prompted with a detailed definition of the planning task. Attempts to improve their planning capabilities, such as chain-of-thought prompting, fine-tuning, and explicit "reasoning", still yield incorrect plans and usually fail to generalize to larger tasks. In this paper, we show how to use LLMs to generate correct plans, even for out-of-distribution tasks of increasing size. For a given planning domain, we ask an LLM to generate several domain-dependent heuristic functions in the form of Python code, evaluate them on a set of training tasks within a greedy best-first search, and choose the strongest one. The resulting LLM-generated heuristics solve many more unseen test tasks than state-of-the-art domain-independent heuristics for classical planning. They are even competitive with the strongest learning algorithm for domain-dependent planning. These findings are especially remarkable given that our proof-of-concept implementation is based on an unoptimized Python planner and the baselines all build upon highly optimized C++ code. In some domains, the LLM-generated heuristics expand fewer states than the baselines, revealing that they are not only efficiently computable, but sometimes even more informative than the state-of-the-art heuristics. Overall, our results show that sampling a set of planning heuristic function programs can significantly improve the planning capabilities of LLMs.
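The selection loop described above (sample several candidate heuristics, run each inside greedy best-first search on training tasks, keep the strongest) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' planner: the STRIPS-style `Task`/`Action` encoding, the `(state, task)` heuristic signature, the names `gbfs` and `select_heuristic`, the selection rule (most training tasks solved, ties broken by fewer total expansions), and the toy corridor domain are all hypothetical. No LLM is called; the hand-written `h_distance` merely stands in for an LLM-generated, domain-dependent heuristic.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

State = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


@dataclass(frozen=True)
class Task:
    init: State
    goal: FrozenSet[str]
    actions: Tuple[Action, ...]


Heuristic = Callable[[State, Task], float]  # hypothetical signature: (state, task) -> estimate


def gbfs(task: Task, h: Heuristic, max_expansions: int = 10_000) -> Tuple[Optional[List[str]], int]:
    """Greedy best-first search ordered by h alone; returns (plan or None, expansions)."""
    tie = itertools.count()  # FIFO tie-breaking among states with equal h
    open_list = [(h(task.init, task), next(tie), task.init)]
    parent = {task.init: None}  # state -> (predecessor state, action name)
    expansions = 0
    while open_list:
        _, _, state = heapq.heappop(open_list)
        if task.goal <= state:  # goal test when a state is selected
            plan = []
            while parent[state] is not None:
                state, action_name = parent[state]
                plan.append(action_name)
            return plan[::-1], expansions
        if expansions >= max_expansions:
            break  # expansion budget exhausted
        expansions += 1
        for a in task.actions:
            if a.pre <= state:
                succ = (state - a.delete) | a.add
                if succ not in parent:  # each state is queued at most once
                    parent[succ] = (state, a.name)
                    heapq.heappush(open_list, (h(succ, task), next(tie), succ))
    return None, expansions


def select_heuristic(candidates: Dict[str, Heuristic], training: Sequence[Task],
                     max_expansions: int = 10_000) -> str:
    """Assumed rule: most training tasks solved, ties broken by fewer total expansions."""
    best_name, best_score = None, None
    for name, h in candidates.items():
        solved, total = 0, 0
        for task in training:
            try:
                plan, expanded = gbfs(task, h, max_expansions)
            except Exception:  # generated code may crash; count it as a failure
                plan, expanded = None, max_expansions
            solved += plan is not None
            total += expanded
        score = (-solved, total)
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    return best_name


def corridor_task(n: int, start: int, goal: int) -> Task:
    """Hypothetical toy domain: a robot moving between adjacent cells 0..n-1."""
    actions = []
    for i in range(n - 1):
        actions.append(Action(f"move-{i}-{i+1}", frozenset({f"at-{i}"}),
                              frozenset({f"at-{i+1}"}), frozenset({f"at-{i}"})))
        actions.append(Action(f"move-{i+1}-{i}", frozenset({f"at-{i+1}"}),
                              frozenset({f"at-{i}"}), frozenset({f"at-{i+1}"})))
    return Task(frozenset({f"at-{start}"}), frozenset({f"at-{goal}"}), tuple(actions))


def h_blind(state: State, task: Task) -> float:
    return 0


def h_goal_count(state: State, task: Task) -> float:
    return len(task.goal - state)  # domain-independent: number of unmet goal facts


def h_distance(state: State, task: Task) -> float:
    # Stand-in for an LLM-generated, domain-dependent heuristic: cells left to walk.
    pos = next(int(f.split("-")[1]) for f in state if f.startswith("at-"))
    target = next(int(f.split("-")[1]) for f in task.goal)
    return abs(pos - target)
```

For instance, on `corridor_task(50, 25, 49)` with `max_expansions=30`, GBFS guided by `h_distance` reaches the goal after 24 expansions, whereas `h_blind` and `h_goal_count` exhaust the budget, so `select_heuristic` returns `"distance"` for the candidates `{"blind": h_blind, "goal_count": h_goal_count, "distance": h_distance}`. Ranking by coverage first mirrors the abstract's emphasis on solving more tasks; the expansion tie-breaker is only one plausible choice.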