ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

  • 2025-03-19 11:46:58
  • Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Wei Li, Shufei Zhang, Mao Su, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou

Abstract

Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks. Our model can be found at https://huggingface.co/AI4Chem/ChemVLM-26B.