Abstract
In the face of difficult exploration problems in reinforcement learning, we study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allows for more efficient learning. We find this problem is best solved hierarchically, by modelling items at a higher level of state abstraction than pixels and attribute changes at a higher level of temporal abstraction than primitive actions. This abstraction simplifies the transition dynamics by making specific future states easier to predict. We make use of this to propose a fully model-based algorithm that learns a discriminative world model, plans to explore efficiently with only a count-based intrinsic reward, and can subsequently plan to reach any discovered (abstract) state. We demonstrate the model's ability to (i) efficiently solve single tasks, (ii) transfer zero-shot and few-shot across item types and environments, and (iii) plan across long horizons. Across a suite of 2D crafting and MiniHack environments, we empirically show that our model significantly outperforms state-of-the-art low-level methods (without abstraction), as well as performant model-free and model-based methods using the same abstraction. Finally, we show how to learn low-level object-perturbing policies via reinforcement learning, and the object mapping itself via supervised learning.
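
To make the idea of a count-based intrinsic reward over abstract states concrete, the following is a minimal sketch and not the algorithm proposed here. It assumes an abstract state can be encoded as a hashable set of (item, attribute, value) triples, and it uses the common 1/sqrt(N) bonus; the class name `CountBonus`, the `scale` parameter, and the state encoding are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


class CountBonus:
    """Illustrative count-based intrinsic reward over hashable abstract states.

    Assumption: the bonus is scale / sqrt(N(s)), where N(s) is the number of
    visits to abstract state s, including the current one. The abstract above
    does not specify the exact form of the bonus.
    """

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = Counter()  # visit count per abstract state

    def reward(self, abstract_state):
        # Record the visit, then return a bonus that shrinks with each revisit.
        self.counts[abstract_state] += 1
        return self.scale / sqrt(self.counts[abstract_state])


# Hypothetical encoding: a frozenset of (item, attribute, value) triples.
bonus = CountBonus()
door_closed = frozenset({("door", "open", False)})
door_open = frozenset({("door", "open", True)})

first_visit = bonus.reward(door_closed)   # novel state, full bonus
second_visit = bonus.reward(door_closed)  # revisited state, smaller bonus
novel_visit = bonus.reward(door_open)     # different attribute value, full bonus
```

Because the counts are kept over abstract states rather than pixels, two observations that differ visually but share the same item attributes receive the same, decaying bonus under this sketch.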