DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms

Abstract

Dongba pictographs are the only pictographs still in use in the world. They have pictorial and ideographic features, and their symbols carry rich cultural and contextual information. Owing to the lack of relevant datasets, research on the semantic understanding of Dongba pictographs has been difficult to advance. To this end, we propose DongbaMIE, the first multimodal dataset for semantic understanding and extraction of Dongba pictographs, consisting of Dongba pictograph images and corresponding Chinese semantic annotations. DongbaMIE contains 23,530 sentence-level and 2,539 paragraph-level images, covering four semantic dimensions: objects, actions, relations, and attributes. We systematically evaluate multimodal large language models (MLLMs), such as GPT-4o, Gemini-2.0, and Qwen2-VL. Experimental results show that the best F1 scores of the proprietary models GPT-4o and Gemini on the object extraction task are only 3.16 and 3.11, respectively. The open-source model Qwen2-VL achieves only 11.49 after supervised fine-tuning. These results suggest that current MLLMs still face significant challenges in accurately recognizing diverse semantic information in Dongba pictographs.