Abstract
Multimodal sentiment analysis (MSA) is an emerging research topic that aims to understand and recognize human sentiment or emotions through multiple modalities. However, in real-world dynamic scenarios, the distribution of the target data keeps changing and differs from that of the source data used to train the model, which leads to performance degradation. Common adaptation methods usually need source data, which could pose privacy issues or storage overheads. Therefore, test-time adaptation (TTA) methods are introduced to improve the performance of the model at inference time. Existing TTA methods are based on probabilistic models and unimodal learning, and thus cannot be applied to MSA, which is often considered a multimodal regression task. In this paper, we propose two strategies, Contrastive Adaptation and Stable Pseudo-label generation (CASP), for test-time adaptation in multimodal sentiment analysis. The two strategies deal with the distribution shifts for MSA by enforcing consistency and minimizing empirical risk, respectively. Extensive experiments show that CASP brings significant and consistent improvements to the performance of the model across various distribution shift settings and with different backbones, demonstrating its effectiveness and versatility. Our code is available at https://github.com/zrguo/CASP.