A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Abstract

Mathematical reasoning, a core aspect of human cognition, is vital across many domains, from educational problem-solving to scientific advancements. As artificial general intelligence (AGI) progresses, integrating large language models (LLMs) with mathematical reasoning tasks is becoming increasingly significant. This survey provides the first comprehensive analysis of mathematical reasoning in the era of multimodal large language models (MLLMs). We review over 200 studies published since 2021 and examine the state-of-the-art developments in Math-LLMs, with a focus on multimodal settings. We categorize the field into three dimensions: benchmarks, methodologies, and challenges. In particular, we explore the multimodal mathematical reasoning pipeline, as well as the role of (M)LLMs and the associated methodologies. Finally, we identify five major challenges hindering the realization of AGI in this domain, offering insights into future directions for enhancing multimodal reasoning capabilities. This survey serves as a critical resource for the research community in advancing the capabilities of LLMs to tackle complex multimodal reasoning tasks.