Abstract
The rise of Large Language Models (LLMs) and generative visual analytics systems has transformed data-driven insights, yet significant challenges persist in accurately interpreting users' analytical and interaction intents. While language inputs offer flexibility, they often lack precision, making the expression of complex intents inefficient, error-prone, and time-intensive. To address these limitations, we investigate the design space of multimodal interactions for generative visual analytics through a literature review and pilot brainstorming sessions. Building on these insights, we introduce a highly extensible workflow that integrates multiple LLM agents for intent inference and visualization generation. We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses. By employing effective prompt engineering and contextual interaction linking, alongside intuitive visualization and interaction designs, InterChat bridges the gap between user interactions and LLM-driven visualizations, enhancing both interpretability and usability. Extensive evaluations, including two usage scenarios, a user study, and expert feedback, demonstrate the effectiveness of InterChat. Results show significant improvements in the accuracy and efficiency of handling complex visual analytics tasks, highlighting the potential of multimodal interactions to redefine user engagement and analytical depth in generative visual analytics.