Can Test-Time Scaling Improve World Foundation Model?

Abstract

World foundation models (WFMs), which simulate the physical world by predicting future states from current observations and inputs, have become central to many applications in physical intelligence, including autonomous driving and robotics. However, these models require substantial computational resources for pretraining and are further constrained by the data available for post-training. As such, scaling computation at test time emerges as both a critical and practical alternative to traditional model enlargement or retraining. In this work, we introduce SWIFT, a test-time scaling framework tailored for WFMs. SWIFT integrates our extensible WFM evaluation toolkit with process-level inference strategies, including fast tokenization, probability-based Top-K pruning, and efficient beam search. Empirical results on the COSMOS model demonstrate that test-time scaling is beneficial even in a compute-optimal setting. Our findings reveal that test-time scaling laws hold for WFMs and that SWIFT provides a scalable and effective pathway for improving WFM inference without retraining or increasing model size. The code is available at https://github.com/Mia-Cong/SWIFT.git.
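The abstract names probability-based Top-K pruning and beam search as process-level inference strategies but does not spell out how they combine. The sketch below is a generic illustration only, not SWIFT's implementation. It assumes a discrete-token autoregressive model that returns next-token log-probabilities and an external verifier that scores partial rollouts. `toy_model`, `toy_verifier`, `TARGET`, and every function name here are hypothetical. Each beam is expanded only with its K most probable next tokens. Candidates are then ranked by verifier score, with model log-probability breaking ties.

```python
import math
from typing import Callable, List, Tuple

Tokens = Tuple[int, ...]


def top_k_tokens(log_probs: List[float], k: int) -> List[int]:
    """Probability-based Top-K pruning: keep only the k most likely next tokens."""
    return sorted(range(len(log_probs)), key=lambda t: log_probs[t], reverse=True)[:k]


def sequence_log_prob(model: Callable[[Tokens], List[float]], seq: Tokens) -> float:
    """Sum of per-step log-probabilities of seq under the autoregressive model."""
    return sum(model(seq[:i])[tok] for i, tok in enumerate(seq))


def beam_search(
    model: Callable[[Tokens], List[float]],
    verifier: Callable[[Tokens], float],
    steps: int,
    beam_width: int,
    k: int,
) -> Tokens:
    """Expand each beam with its Top-K tokens, then keep the beam_width best
    candidates. Verifier score ranks first; model log-probability breaks ties."""
    beams: List[Tokens] = [()]
    for _ in range(steps):
        candidates = [
            prefix + (tok,)
            for prefix in beams
            for tok in top_k_tokens(model(prefix), k)
        ]
        candidates.sort(
            key=lambda s: (verifier(s), sequence_log_prob(model, s)), reverse=True
        )
        beams = candidates[:beam_width]
    return beams[0]


# Hypothetical toy "world model": the same next-token distribution at every step.
def toy_model(prefix: Tokens) -> List[float]:
    return [math.log(p) for p in (0.5, 0.3, 0.2)]


# Hypothetical verifier: counts positions that match a desired rollout.
TARGET: Tokens = (1, 0, 1)


def toy_verifier(seq: Tokens) -> float:
    return float(sum(a == b for a, b in zip(seq, TARGET)))


if __name__ == "__main__":
    best = beam_search(toy_model, toy_verifier, steps=3, beam_width=2, k=2)
    print(best)  # (1, 0, 1)
```

With `k=1` and `beam_width=1` the search reduces to greedy decoding, which returns `(0, 0, 0)` for this toy model. Widening the beam lets the verifier steer toward a higher-scoring rollout. The cost is more model calls per step, which is the kind of compute trade-off that test-time scaling studies.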