Abstract
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data, and manual mask annotations can be costly and labor-intensive to acquire. This paper investigates leveraging generative models to synthesize training data for segmentation models on underrepresented modalities, particularly annotation-scarce MRI. Concretely, our contributions are threefold: (i) we introduce MRGen-DB, a large-scale radiology image-text dataset comprising extensive samples with rich metadata, including modality labels, attributes, regions, and organ information, with a subset having pixelwise mask annotations; (ii) we present MRGen, a diffusion-based data engine for controllable medical image synthesis, conditioned on text prompts and segmentation masks. MRGen can generate realistic images for diverse MRI modalities lacking mask annotations, facilitating segmentation training in low-source domains; (iii) extensive experiments across multiple modalities demonstrate that MRGen significantly improves segmentation performance on unannotated modalities by providing high-quality synthetic data. We believe that our method bridges a critical gap in medical image analysis, extending segmentation capabilities to scenarios where manual annotations are challenging to acquire.