Abstract
Limited accessibility to neurological care leads to underdiagnosed Parkinson's Disease (PD), preventing early intervention. Existing AI-based PD detection methods primarily focus on unimodal analysis of motor or speech tasks, overlooking the multifaceted nature of the disease. To address this, we introduce a large-scale, multi-task video dataset consisting of 1102 sessions (each containing videos of finger tapping, facial expression, and speech tasks captured via webcam) from 845 participants (272 with PD). We propose a novel Uncertainty-calibrated Fusion Network (UFNet) that leverages this multimodal data to enhance diagnostic accuracy. UFNet employs independent task-specific networks, trained with Monte Carlo Dropout for uncertainty quantification, followed by self-attended fusion of features, with attention weights dynamically adjusted based on task-specific uncertainties. To ensure patient-centered evaluation, the participants were randomly split into three sets: 60% for training, 20% for model selection, and 20% for final performance evaluation. UFNet significantly outperformed single-task models in terms of accuracy, area under the ROC curve (AUROC), and sensitivity while maintaining non-inferior specificity. Withholding uncertain predictions further boosted the performance, achieving 88.0±0.3% accuracy, 93.0±0.2% AUROC, 79.3±0.9% sensitivity, and 92.6±0.3% specificity, at the expense of not being able to predict for 2.3±0.3% of the data (± denotes the 95% confidence interval). Further analysis suggests that the trained model does not exhibit any detectable bias across sex and ethnic subgroups and is most effective for individuals aged between 50 and 80. Requiring only a webcam and microphone, our approach facilitates accessible home-based PD screening, especially in regions with limited healthcare resources.
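
To make the idea of withholding uncertain predictions concrete, the following is a minimal sketch of Monte Carlo Dropout followed by abstention. It is not UFNet's implementation: a toy logistic model stands in for a task-specific network, and the function names, the dropout rate, the number of passes, and the `max_uncertainty` cutoff are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def mc_dropout_predict(features, weights, bias, n_passes=30, drop_prob=0.5, rng=None):
    """Return (mean probability, std) over stochastic forward passes.

    Toy stand-in for a task-specific network: a logistic model whose
    inputs are randomly dropped at inference time (dropout stays on).
    """
    rng = rng or random.Random(0)
    keep = 1.0 - drop_prob
    probs = []
    for _ in range(n_passes):
        z = bias
        for x, w in zip(features, weights):
            # Fresh dropout mask on every pass; kept inputs are rescaled
            # by 1/keep (inverted dropout) so the expected sum is unchanged.
            if rng.random() < keep:
                z += w * x / keep
        probs.append(1.0 / (1.0 + math.exp(-z)))
    # The spread across passes serves as the uncertainty estimate.
    return statistics.fmean(probs), statistics.pstdev(probs)


def predict_or_withhold(mean_prob, uncertainty, max_uncertainty=0.15, threshold=0.5):
    """Return 1 (PD) or 0 (non-PD), or None when the model is too uncertain."""
    if uncertainty > max_uncertainty:
        return None  # withhold: no prediction is issued for this sample
    return int(mean_prob >= threshold)


if __name__ == "__main__":
    mean_prob, std = mc_dropout_predict([3.0, -3.0], [2.0, 2.0], bias=0.0)
    print(mean_prob, std, predict_or_withhold(mean_prob, std))
```

Under these assumptions, a sample whose predicted probability varies widely across passes is returned as `None` rather than classified, which mirrors the trade-off stated in the abstract: higher accuracy on the retained samples at the cost of issuing no prediction for a small fraction of the data.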