Abstract
With the rapid development of multimedia, the shift from unimodal textual sentiment analysis to multimodal image-text sentiment analysis has attracted academic and industrial attention in recent years. However, multimodal sentiment analysis is affected by unimodal data bias, e.g., text sentiment can be misleading due to explicit sentiment semantics, leading to low accuracy in the final sentiment classification. In this paper, we propose a novel CounterFactual Multimodal Sentiment Analysis framework (CF-MSA) that uses causal counterfactual inference to construct a causal inference model for multimodal sentiment. CF-MSA mitigates the direct effect of unimodal bias and ensures heterogeneity across modalities by differentiating the treatment variables between modalities. In addition, considering the information complementarity and bias differences between modalities, we propose a new optimisation objective to effectively integrate different modalities and reduce the inherent bias from each modality. Experimental results on two public datasets, MVSA-Single and MVSA-Multiple, demonstrate that the proposed CF-MSA has superior debiasing capability and achieves new state-of-the-art performance. We will release the code and datasets to facilitate future research.