Abstract
While text-to-image generative models can synthesize diverse and faithful content, subject variation across multiple creations limits their application to long content generation. Existing approaches require time-consuming tuning, references for all subjects, or access to other creations. We introduce Contrastive Concept Instantiation (CoCoIns) to effectively synthesize consistent subjects across multiple independent creations. The framework consists of a generative model and a mapping network, which transforms input latent codes into pseudo-words associated with certain instances of concepts. Users can generate consistent subjects by reusing the same latent codes. To construct such associations, we propose a contrastive learning approach that trains the network to differentiate between combinations of prompts and latent codes. Extensive evaluations on human faces with a single subject show that CoCoIns performs comparably to existing methods while maintaining higher flexibility. We also demonstrate the potential of extending CoCoIns to multiple subjects and other object categories.