FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching

  • 2024-12-19 18:59:31
  • Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

Abstract

Autoregressive (AR) modeling has achieved remarkable success in natural language processing by enabling models to generate text with coherence and contextual understanding through next token prediction. Recently, in image generation, VAR proposes scale-wise autoregressive modeling, which extends next token prediction to next scale prediction, preserving the 2D structure of images. However, VAR encounters two primary challenges: (1) its complex and rigid scale design limits generalization in next scale prediction, and (2) the generator's dependence on a discrete tokenizer with the same complex scale structure restricts modularity and flexibility in updating the tokenizer. To address these limitations, we introduce FlowAR, a general next scale prediction method featuring a streamlined scale design, where each subsequent scale is simply double the previous one. This eliminates the need for VAR's intricate multi-scale residual tokenizer and enables the use of any off-the-shelf Variational AutoEncoder (VAE). Our simplified design enhances generalization in next scale prediction and facilitates the integration of Flow Matching for high-quality image synthesis. We validate the effectiveness of FlowAR on the challenging ImageNet-256 benchmark, demonstrating superior generation performance compared to previous methods. Code will be available at https://github.com/OliverRensu/FlowAR.
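
To make the "each subsequent scale is simply double the previous one" design concrete, here is a minimal illustrative sketch in plain Python. It is not taken from the FlowAR code base: the function names `doubling_scales`, `downsample_2x`, and `build_pyramid` are hypothetical, and the use of 2x2 average pooling to derive coarser scales from a single-channel latent grid is an assumption made only for illustration.

```python
def doubling_scales(final_size, num_scales):
    """Return side lengths from coarsest to finest, each double the previous.

    Hypothetical helper: the abstract only states the doubling rule.
    """
    sizes = [final_size // (2 ** k) for k in reversed(range(num_scales))]
    # Reject sizes that cannot be reached by exact repeated doubling.
    if sizes[0] < 1 or sizes[0] * 2 ** (num_scales - 1) != final_size:
        raise ValueError("final_size must be divisible by 2**(num_scales - 1)")
    return sizes


def downsample_2x(grid):
    """Halve a 2D grid (list of rows) with 2x2 average pooling.

    Average pooling is an assumption here, not the paper's stated method.
    """
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
             + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]


def build_pyramid(grid, num_scales):
    """Build a coarse-to-fine list of grids by repeated 2x downsampling."""
    pyramid = [grid]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid[::-1]  # coarsest scale first


# Example: a 4x4 single-channel "latent" with values 0..15.
latent = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(latent, num_scales=3)
print(doubling_scales(16, 5))          # [1, 2, 4, 8, 16]
print([len(level) for level in pyramid])  # [1, 2, 4]
```

Under these assumptions, a scale-wise autoregressive generator would predict the pyramid levels in order from coarsest to finest, conditioning each level on the ones before it. Because the schedule depends only on the final latent size, it does not tie the generator to a tokenizer with a matching multi-scale structure.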