Abstract
3D single object tracking is essential in autonomous driving and robotics. Existing methods often struggle in scenarios with sparse and incomplete point clouds. To address these limitations, we propose a Multimodal-guided Virtual Cues Projection (MVCP) scheme that generates virtual cues to enrich sparse point clouds. Building on these virtual cues, we also introduce an enhanced tracker, MVCTrack. Specifically, the MVCP scheme integrates RGB sensors into LiDAR-based systems, using a set of 2D detections to create dense 3D virtual cues that substantially alleviate the sparsity of point clouds. These virtual cues integrate naturally with existing LiDAR-based 3D trackers and yield substantial performance gains. Extensive experiments demonstrate that our method achieves competitive performance on the nuScenes dataset.
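The abstract does not specify how MVCP lifts 2D detections into 3D virtual cues. Purely as an illustration, the sketch below follows a common virtual-point recipe and is not the authors' implementation. It makes several assumptions: the LiDAR points are already expressed in the camera frame, the camera follows a pinhole model with intrinsics `fx, fy, cx, cy`, pixels are sampled on a regular grid inside the 2D box, and each sampled pixel takes its depth from the nearest projected LiDAR point. The names `project` and `virtual_points` are hypothetical.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]          # (x, y, z) in the camera frame (assumed)
Box2D = Tuple[float, float, float, float]     # (u_min, v_min, u_max, v_max) in pixels


def project(p: Point3D, fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a camera-frame point; None if it lies behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def virtual_points(lidar_cam: List[Point3D], box: Box2D,
                   fx: float, fy: float, cx: float, cy: float,
                   grid: int = 4) -> List[Point3D]:
    """Hypothetical sketch: densify the points behind one 2D detection box."""
    u0, v0, u1, v1 = box
    # 1. Keep LiDAR points whose projection falls inside the 2D detection box,
    #    remembering their pixel location and depth.
    anchors = []
    for p in lidar_cam:
        uv = project(p, fx, fy, cx, cy)
        if uv is not None and u0 <= uv[0] <= u1 and v0 <= uv[1] <= v1:
            anchors.append((uv[0], uv[1], p[2]))
    if not anchors:
        return []  # no depth evidence for this box, so no virtual points
    out: List[Point3D] = []
    # 2. Sample a regular grid of pixels (cell centers) inside the box.
    for i in range(grid):
        for j in range(grid):
            u = u0 + (i + 0.5) * (u1 - u0) / grid
            v = v0 + (j + 0.5) * (v1 - v0) / grid
            # 3. Borrow the depth of the nearest projected LiDAR point in image space.
            _, _, z = min(anchors, key=lambda a: (a[0] - u) ** 2 + (a[1] - v) ** 2)
            # 4. Unproject the sampled pixel at that depth back into 3D.
            out.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return out


if __name__ == "__main__":
    # One real point at depth 10 m inside the box yields grid * grid virtual points.
    pts = virtual_points([(0.0, 0.0, 10.0)], (40, 40, 60, 60), 100, 100, 50, 50, grid=2)
    print(pts)
```

In this toy setup, a single real LiDAR return inside the box produces four virtual points at the same depth. This is the kind of densification the abstract describes, although the actual MVCP scheme may sample pixels, estimate depth, or fuse cues differently.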