RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

Abstract

Recent advances in robotic foundation models have enabled the development of generalist policies that can adapt to diverse tasks. While these models show impressive flexibility, their performance heavily depends on the quality of their training data. In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that leverages reinforcement learning to generate high-quality training data for finetuning generalist policies. Through extensive real-world experiments on precise manipulation tasks like connector insertion and assembly, we demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks. We also provide a detailed analysis that reveals this performance gain stems from both optimized action distributions and improved state coverage. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems that maintain the flexibility of foundation models while achieving the performance of specialized controllers. Videos and code can be found on our project website: https://generalist-distillation.github.io