Abstract
Deep Learning has become a ubiquitous paradigm due to its extraordinary effectiveness and applicability in numerous domains. However, the approach suffers from the large amount of data required to realize the potential of this type of model. A rapidly growing sub-field of Artificial Intelligence, Image Synthesis, aims to address this limitation through the design of intelligent models capable of creating original and realistic images, an endeavour which could drastically reduce the need for real data. The Stable Diffusion generation paradigm recently propelled state-of-the-art approaches to exceed all previous benchmarks. In this work, we propose the ContRail framework, based on the novel Stable Diffusion model ControlNet, which we empower through a multi-modal conditioning method. We experiment with the task of synthetic railway image generation, where we improve the performance in rail-specific tasks, such as rail semantic segmentation, by enriching the dataset with realistic synthetic images.