Abstract
Document structure analysis, also known as document layout analysis, is crucial for understanding both the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. Hierarchical Document Structure Analysis (HDSA) specifically aims to restore the hierarchical structure of documents created using authoring software with hierarchical schemas. Previous research has primarily followed two approaches: one tackles specific subtasks of HDSA in isolation, such as table detection or reading order prediction, while the other adopts a unified framework that uses multiple branches or modules, each designed to address a distinct task. In this work, we propose a unified relation prediction approach for HDSA, called UniHDSA, which treats various HDSA subtasks as relation prediction problems and consolidates relation prediction labels into a unified label space. This allows a single relation prediction module to handle multiple tasks simultaneously, whether for page-level or document-level structure analysis. To validate the effectiveness of UniHDSA, we develop a multimodal end-to-end system based on Transformer architectures. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance on a hierarchical document structure analysis benchmark, Comp-HRDoc, and competitive results on a large-scale document layout analysis dataset, DocLayNet, illustrating the superiority of our method across all subtasks.
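To make the idea of a unified label space concrete, the following is a minimal sketch, not the UniHDSA implementation. It assumes that a single predictor emits (source, target, label) triples and that each label belongs to one shared enumeration. The label names (`SAME_REGION`, `NEXT_IN_ORDER`, `PARENT_OF`), the element identifiers, and the helper `split_by_task` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    # Hypothetical unified label space; these names are illustrative assumptions.
    NONE = 0
    SAME_REGION = 1    # two text lines belong to the same layout region
    NEXT_IN_ORDER = 2  # target follows source in reading order
    PARENT_OF = 3      # source is the hierarchical parent of target


def split_by_task(relations):
    """Route (src, dst, label) triples from one predictor into per-task edge lists."""
    out = defaultdict(list)
    for src, dst, label in relations:
        if label is not Relation.NONE:  # NONE means "no relation", so drop it
            out[label].append((src, dst))
    return dict(out)


# Made-up predictions standing in for the output of a single relation module.
predicted = [
    ("line1", "line2", Relation.SAME_REGION),
    ("title", "para1", Relation.NEXT_IN_ORDER),
    ("section1", "para1", Relation.PARENT_OF),
    ("line1", "para1", Relation.NONE),
]
by_task = split_by_task(predicted)
```

Under these assumptions, one set of predictions serves several subtasks at once: the `SAME_REGION` edges feed region grouping, the `NEXT_IN_ORDER` edges feed reading order, and the `PARENT_OF` edges feed hierarchy reconstruction.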