Abstract
Engineering design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision-language models (VLMs), such as GPT-4V, enabling AI to impact many more types of tasks. Our work presents a comprehensive evaluation of VLMs across a spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Specifically in this paper, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems. Through this structured evaluation, we not only explore VLMs' proficiency in handling complex design challenges but also identify their limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.