Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

Abstract

We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Unlike mainstream approaches that train multi-view video diffusion models on large-scale 4D datasets, our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors. In essence, Reangle-A-Video operates in two stages. (1) Multi-View Motion Learning: An image-to-video diffusion transformer is synchronously fine-tuned in a self-supervised manner to distill view-invariant motion from a set of warped videos. (2) Multi-View Consistent Image-to-Images Translation: The first frame of the input video is warped and inpainted into various camera perspectives under an inference-time cross-view consistency guidance using DUSt3R, generating multi-view consistent starting images. Extensive experiments on static view transport and dynamic camera control show that Reangle-A-Video surpasses existing methods, establishing a new solution for multi-view video generation. We will publicly release our code and data. Project page: https://hyeonho99.github.io/reangle-a-video/