Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation

Abstract

Large real-world robot datasets hold great potential to train generalist robot models, but scaling real-world human data collection is time-consuming and resource-intensive. Simulation has great potential in supplementing large-scale data, especially with recent advances in generative AI and automated data generation tools that enable scalable creation of robot behavior datasets. However, training a policy solely in simulation and transferring it to the real world often demands substantial human effort to bridge the reality gap. A compelling alternative is to co-train the policy on a mixture of simulation and real-world datasets. Preliminary studies have recently shown this strategy to substantially improve the performance of a policy over one trained on a limited amount of real-world data. Nonetheless, the community lacks a systematic understanding of sim-and-real co-training and what it takes to reap the benefits of simulation data for real-robot learning. This work presents a simple yet effective recipe for utilizing simulation data to solve vision-based robotic manipulation tasks. We derive this recipe from comprehensive experiments that validate the co-training strategy on various simulation and real-world datasets. Using two domains--a robot arm and a humanoid--across diverse tasks, we demonstrate that simulation data can enhance real-world task performance by an average of 38%, even with notable differences between the simulation and real-world data. Videos and additional results can be found at https://co-training.github.io/
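
As a rough illustration of what co-training on a mixture of simulation and real-world data can look like at the batch level, the sketch below fills each slot of a training batch from the simulation dataset with some probability and from the real-world dataset otherwise. This is a minimal sketch under stated assumptions, not the recipe from this work: the function name `sample_cotraining_batch`, the `sim_ratio` parameter, and the toy datasets are all hypothetical, and the abstract does not specify a mixing ratio or sampling scheme.

```python
import random


def sample_cotraining_batch(real_data, sim_data, batch_size, sim_ratio, rng=None):
    """Draw one mixed batch of training examples.

    Each slot is filled from `sim_data` with probability `sim_ratio`
    (a hypothetical mixing parameter) and from `real_data` otherwise.
    """
    if not 0.0 <= sim_ratio <= 1.0:
        raise ValueError("sim_ratio must lie in [0, 1]")
    if not real_data or not sim_data:
        raise ValueError("both datasets must be non-empty")
    rng = rng or random.Random()

    batch = []
    for _ in range(batch_size):
        # rng.random() is in [0, 1), so sim_ratio=0 never picks sim
        # and sim_ratio=1 always picks sim.
        source = sim_data if rng.random() < sim_ratio else real_data
        batch.append(rng.choice(source))
    return batch


# Toy stand-ins for (observation, action) demonstrations.
real_demos = [("real", i) for i in range(10)]
sim_demos = [("sim", i) for i in range(1000)]

batch = sample_cotraining_batch(
    real_demos, sim_demos, batch_size=32, sim_ratio=0.5, rng=random.Random(0)
)
```

Sampling by ratio rather than concatenating the two datasets keeps a small real-world dataset from being drowned out by a much larger simulated one; the appropriate value of `sim_ratio` would have to be determined empirically.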