Abstract
In modern chip design, placement aims at placing millions of circuit modules and is an essential step that significantly influences power, performance, and area (PPA) metrics. Recently, reinforcement learning (RL) has emerged as a promising technique for improving placement quality, especially for macro placement. However, current RL-based placement methods suffer from long training times, low generalization ability, and an inability to guarantee PPA results. A key issue lies in the problem formulation, i.e., using RL to place from scratch, which results in limited useful information and inaccurate rewards during the training process. In this work, we propose an approach that utilizes RL for the refinement stage, which allows the RL policy to learn how to adjust existing placement layouts, thereby receiving sufficient information to act and obtaining relatively dense and precise rewards. Additionally, we introduce the concept of regularity during training, which is considered an important metric in the chip design industry but is often overlooked in current RL placement methods. We evaluate our approach on the ISPD 2005 and ICCAD 2015 benchmarks, comparing the global half-perimeter wirelength and regularity of our proposed method against several competitive approaches. In addition, we test the PPA performance using commercial software, showing that RL as a regulator can achieve significant PPA improvements. Our RL regulator can fine-tune placements from any method and enhance their quality. Our work opens up new possibilities for the application of RL in placement, providing a more effective and efficient approach to optimizing chip design. Our code is available at \url{https://github.com/lamda-bbo/macro-regulator}.