Enhancing zero-shot learning in medical imaging: integrating CLIP with advanced techniques for improved chest X-ray analysis

Abstract

Due to the large volume of medical imaging data, advanced AI methodologies are needed to assist radiologists in diagnosing thoracic diseases from chest X-rays (CXRs). Existing deep learning models often require large, labeled datasets, which are scarce in medical imaging due to the time-consuming and expert-driven annotation process. In this paper, we extend the existing approach to enhance zero-shot learning in medical imaging by integrating Contrastive Language-Image Pre-training (CLIP) with Momentum Contrast (MoCo), resulting in our proposed model, MoCoCLIP. Our method addresses challenges posed by class-imbalanced and unlabeled datasets, enabling improved detection of pulmonary pathologies. Experimental results on the NIH ChestXray14 dataset demonstrate that MoCoCLIP outperforms the state-of-the-art CheXZero model, achieving a relative improvement of approximately 6.5%. Furthermore, on the CheXpert dataset, MoCoCLIP demonstrates superior zero-shot performance, achieving an average AUC of 0.750 compared to 0.746 for CheXZero, highlighting its enhanced generalization capabilities on unseen data.