Abstract
The complexity and variability inherent in high-resolution pathological images present significant challenges in computational pathology. While pathology foundation models leveraging AI have catalyzed transformative advancements, their development demands large-scale datasets, considerable storage capacity, and substantial computational resources. Furthermore, ensuring their clinical applicability and generalizability requires rigorous validation across a broad spectrum of clinical tasks. Here, we present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset comprising 300K pathological slides from 20 tissue and organ types across multiple centers. The model was rigorously evaluated on 112 clinical tasks using a combination of 61 private and 51 public datasets. These tasks encompass digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and the generation of structured reports. PathOrchestra demonstrated exceptional performance across 27,755 WSIs and 9,415,729 ROIs, achieving over 0.950 accuracy in 47 tasks, including pan-cancer classification across various organs, lymphoma subtype diagnosis, and bladder cancer screening. Notably, it is the first model to generate structured reports for high-incidence colorectal cancer and diagnostically complex lymphoma, areas that are infrequently addressed by foundational models but hold immense clinical potential. Overall, PathOrchestra exemplifies the feasibility and efficacy of a large-scale, self-supervised pathology foundation model, validated across a broad range of clinical-grade tasks. Its high accuracy and reduced reliance on extensive data annotation underline its potential for clinical integration, offering a pathway toward more efficient and high-quality medical services.