LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

  • 2024-12-19 18:59:56
  • Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, Limin Wang

Abstract

The intuitive nature of drag-based interaction has led to its growing adoption for controlling object trajectories in image-to-video synthesis. Still, existing methods that perform dragging in the 2D space usually face ambiguity when handling out-of-plane movements. In this work, we augment the interaction with a new dimension, i.e., the depth dimension, such that users are allowed to assign a relative depth for each point on the trajectory. That way, our new interaction paradigm not only inherits the convenience of 2D dragging but also facilitates trajectory control in the 3D space, broadening the scope of creativity. We propose a pioneering method for 3D trajectory control in image-to-video synthesis by abstracting object masks into a few cluster points. These points, accompanied by the depth information and the instance information, are finally fed into a video diffusion model as the control signal. Extensive experiments validate the effectiveness of our approach, dubbed LeviTor, in precisely manipulating the object movements when producing photo-realistic videos from static images. Project page: https://ppetrichor.github.io/levitor.github.io/
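To make the idea of "abstracting object masks into a few cluster points" concrete, here is a minimal sketch, not the authors' implementation. It assumes that the clustering is plain k-means over the mask's pixel coordinates, that relative depth comes from a per-pixel depth map sampled at each centroid, and that the instance information is a single integer ID per object. The helper names `kmeans` and `mask_to_control_points` and the choice of `k=4` are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on an (N, 2) array; returns (k, 2) centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers


def mask_to_control_points(mask, depth, instance_id, k=4):
    """Hypothetical helper: abstract a binary mask into rows of (x, y, depth, id).

    Assumes `mask` is an (H, W) boolean array and `depth` an (H, W) array of
    relative depth values; both assumptions are for illustration only.
    """
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys], axis=1).astype(float)
    k = min(k, len(coords))
    centers = kmeans(coords, k)
    # Sample depth at the pixel nearest each centroid. For non-convex masks a
    # centroid can fall outside the object; this sketch does not correct that.
    cx = np.clip(np.rint(centers[:, 0]).astype(int), 0, mask.shape[1] - 1)
    cy = np.clip(np.rint(centers[:, 1]).astype(int), 0, mask.shape[0] - 1)
    d = depth[cy, cx]
    ids = np.full(k, instance_id, dtype=float)
    return np.column_stack([centers, d, ids])


if __name__ == "__main__":
    mask = np.zeros((64, 64), dtype=bool)
    mask[10:30, 20:50] = True  # a rectangular "object"
    depth = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))  # depth grows left to right
    print(mask_to_control_points(mask, depth, instance_id=1))
```

Each row is one control point for one object. In a setup like the one the abstract describes, a user-specified trajectory with a relative depth per point would then move these rows over time before they are passed to the video diffusion model. How that conditioning is encoded is not specified here and is left out of the sketch.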