Abstract
Hand-eye calibration aims to estimate the transformation between a camera and a robot. Traditional methods rely on fiducial markers, which require considerable manual effort and precise setup. Recent advances in deep learning have introduced markerless techniques, but these come with additional prerequisites, such as retraining networks for each robot and access to accurate mesh models for data generation. In this paper, we propose Kalib, an automatic and easy-to-set-up hand-eye calibration method that leverages the generalizability of visual foundation models to overcome these challenges. It has only two basic prerequisites: the robot's kinematic chain and a predefined reference point on the robot. During calibration, the reference point is tracked in the camera space. Its corresponding 3D coordinates in the robot coordinate frame can be inferred by forward kinematics. Then, a PnP solver directly estimates the transformation between the camera and the robot without training new networks or accessing mesh models. Evaluations in simulated and real-world benchmarks show that Kalib achieves good accuracy with a lower manual workload compared with recent baseline methods. We also demonstrate its application in multiple real-world settings with various robot arms and grippers. Kalib's user-friendly design and minimal setup requirements make it a possible solution for continuous operation in unstructured environments.
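
The pipeline summarized above (track the reference point in the image, obtain its robot-frame coordinates by forward kinematics, then solve PnP) can be illustrated on synthetic data. The following is a minimal, dependency-free sketch and not Kalib's implementation: the toy 3-DoF arm, the camera intrinsics, the ground-truth pose, and all function names are hypothetical; the pixel tracks are simulated by projection rather than produced by a tracking model; and a basic linear (DLT) solve stands in for the PnP solver.

```python
import math

def forward_kinematics(q, l1=0.4, l2=0.3, base_height=0.3):
    """Reference-point position in the robot base frame for a hypothetical
    3-DoF arm (base yaw, shoulder pitch, elbow pitch)."""
    yaw, shoulder, elbow = q
    reach = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    z = base_height + l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return (reach * math.cos(yaw), reach * math.sin(yaw), z)

def project(point, R, t, K):
    """Pinhole projection of a robot-frame point into pixel coordinates."""
    fx, fy, cx, cy = K
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    return (fx * xc / zc + cx, fy * yc / zc + cy)

def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def pnp_dlt(points_3d, pixels, K):
    """Linear PnP: estimate (R, t) mapping robot-frame points to the camera frame.
    Needs at least 6 non-coplanar points; assumes the robot is in front of the
    camera (t_z > 0). The rotation is not re-orthonormalized."""
    fx, fy, cx, cy = K
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, pixels):
        x, y = (u - cx) / fx, (v - cy) / fy  # normalized image coordinates
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z])
        rhs.append(x)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z])
        rhs.append(y)
    n = 11  # entries of [R | t] / t_z, with the last entry fixed to 1
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    p = solve_linear(AtA, Atb)  # least squares via the normal equations
    scale = 1.0 / math.sqrt(p[8] ** 2 + p[9] ** 2 + p[10] ** 2)  # equals t_z
    R = [[scale * p[4 * i + j] for j in range(3)] for i in range(3)]
    t = [scale * p[3], scale * p[7], scale]
    return R, t

# Hypothetical intrinsics (fx, fy, cx, cy) and ground-truth robot-to-camera pose.
K = (600.0, 600.0, 320.0, 240.0)
R_true = [[0.0, 1.0, 0.0], [0.6, 0.0, -0.8], [-0.8, 0.0, -0.6]]
t_true = [0.05, 0.1, 1.6]

# Joint configurations visited during calibration (assumed values).
joint_configs = [
    (0.0, 0.3, 0.5), (0.4, 0.8, -0.6), (-0.5, -0.2, -0.3), (0.9, 1.2, -1.5),
    (-0.9, 0.6, 0.2), (0.2, 0.5, -0.9), (-0.3, 0.4, 1.8), (0.7, -0.1, 1.0),
]

# 3D reference-point positions from forward kinematics.
points_3d = [forward_kinematics(q) for q in joint_configs]
# Simulated 2D tracks; in practice these would come from the tracker.
pixels = [project(p, R_true, t_true, K) for p in points_3d]

R_est, t_est = pnp_dlt(points_3d, pixels, K)
print("estimated translation:", [round(v, 4) for v in t_est])
```

With noise-free synthetic correspondences, the recovered pose should match the ground truth up to numerical error. A linear solve like this does not enforce an orthonormal rotation and is sensitive to noisy tracks, so with real data a robust PnP solver such as OpenCV's `cv2.solvePnP` would be the more usual choice.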