Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Abstract

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.
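
The partition-then-merge idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the uniform random partitioning below stands in for the probabilistic model, which is not specified here, and the Jaccard-based greedy merge is an assumed stand-in for the hierarchical merging step. The function names (`partition_matrix`, `merge_coclusters`), the `threshold` parameter, and the assumption that candidate co-clusters overlap (for example, because they come from several partitioning rounds) are all hypothetical.

```python
import random
from itertools import combinations


def partition_indices(n, k, rng):
    """Randomly split range(n) into k disjoint, nearly equal groups."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def partition_matrix(n_rows, n_cols, row_blocks, col_blocks, seed=0):
    """Return one (row_ids, col_ids) pair per submatrix.

    Assumption: uniform random partitioning, used here only as a
    placeholder for the paper's probabilistic configuration model.
    """
    rng = random.Random(seed)
    rows = partition_indices(n_rows, row_blocks, rng)
    cols = partition_indices(n_cols, col_blocks, rng)
    return [(r, c) for r in rows for c in cols]


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def merge_coclusters(coclusters, threshold=0.5):
    """Greedy agglomerative merge of (row_set, col_set) co-clusters.

    At each step the most similar pair is merged, where similarity is the
    smaller of the row and column Jaccard scores (an assumed criterion).
    Merging stops when no pair reaches the threshold.
    """
    clusters = [(set(r), set(c)) for r, c in coclusters]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            sim = min(jaccard(clusters[i][0], clusters[j][0]),
                      jaccard(clusters[i][1], clusters[j][1]))
            if sim >= best:
                best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair]
        clusters.append(merged)
    return clusters
```

In this sketch, each `(row_ids, col_ids)` pair returned by `partition_matrix` identifies a submatrix that could be co-clustered independently, for instance in a separate worker process, before the resulting co-clusters are passed to `merge_coclusters`.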