UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency

  • 2024-12-19 18:59:58
  • Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

Abstract

We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training. Existing supervised methods depend on datasets containing triplets of input image, edited image, and edit instruction. These are generated by either existing editing methods or human annotations, which introduce biases and limit their generalization ability. Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency (CEC), which applies forward and backward edits in one training step and enforces consistency in image and attention spaces. This allows us to bypass the need for ground-truth edited images and unlock training for the first time on datasets comprising either real image-caption pairs or image-caption-edit triplets. We empirically show that our unsupervised technique performs better across a broader range of edits with high fidelity and precision. By eliminating the need for pre-existing datasets of triplets, reducing biases associated with supervised methods, and proposing CEC, our work represents a significant advancement in unblocking scaling of instruction-based image editing.
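
The idea of applying a forward and a backward edit and checking that the result returns to the input can be illustrated with a toy sketch. This is not the paper's implementation: the names `toy_editor`, `cycle_reconstruction_loss`, and `l1_distance` are hypothetical, the "image" is a flat list of floats, and only an image-space term is shown (the attention-space consistency mentioned in the abstract is omitted).

```python
from typing import Callable, List

Image = List[float]  # toy stand-in for an image: a flat list of pixel values


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized toy images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_reconstruction_loss(
    editor: Callable[[Image, str], Image],
    image: Image,
    forward_instruction: str,
    reverse_instruction: str,
) -> float:
    """Apply a forward edit, then a reverse edit, and compare with the input.

    No ground-truth edited image is needed: the input image itself is the
    target for the reconstruction.
    """
    edited = editor(image, forward_instruction)
    reconstructed = editor(edited, reverse_instruction)
    return l1_distance(image, reconstructed)


def toy_editor(image: Image, instruction: str) -> Image:
    """Hypothetical editor: shifts every pixel up or down by a fixed amount."""
    shift = {"brighten": 0.25, "darken": -0.25}[instruction]
    return [pixel + shift for pixel in image]


image = [0.0, 0.25, 0.5]

# A forward edit followed by its true inverse reconstructs the input exactly.
consistent_loss = cycle_reconstruction_loss(toy_editor, image, "brighten", "darken")

# A backward edit that does not undo the forward edit is penalized.
inconsistent_loss = cycle_reconstruction_loss(toy_editor, image, "brighten", "brighten")
```

In a real training setup the editor would be a learned model and this quantity would be one term of the objective; here it only shows why a cycle can supervise editing without edited ground-truth images.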