Abstract
With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, gradually attracting widespread attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experiences. This paper provides a comprehensive survey of the research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and trends currently existing in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on the latest developments in the field.