Personalization Toolkit: Training Free Personalization of Large Vision Language Models

Abstract

Large Vision Language Models (LVLMs) have significant potential to provide personalized assistance by adapting to the unique needs and preferences of individual users. The personalization of LVLMs has emerged as a field that focuses on customizing models to recognize specific object instances and provide tailored responses. However, current methodologies depend on time-consuming test-time training for each user and object, which proves to be impractical. This paper introduces a novel, training-free approach to LVLM personalization by leveraging pre-trained vision foundation models to extract distinct features, retrieval-augmented generation (RAG) techniques to recognize instances in the visual input, and visual prompting methods. Our model-agnostic vision toolkit enables flexible and efficient personalization without the need for extensive retraining. We demonstrate state-of-the-art results, surpassing conventional training-based approaches, and set a new benchmark for LVLM personalization.
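
To make the retrieval step concrete, the following is a minimal sketch of recognizing a personal object by nearest-neighbor lookup over stored reference features, with no training involved. It is an illustration under stated assumptions, not the paper's implementation: the feature vectors are toy values standing in for embeddings from a pre-trained vision foundation model, and the names `cosine`, `retrieve_instance`, `build_prompt`, and the `threshold` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_instance(query, memory, threshold=0.8):
    """Return (name, score) of the best-matching stored instance.

    `memory` maps an instance name to a list of reference feature vectors.
    The name is None when no stored instance reaches `threshold`
    (an assumed cutoff, not a value taken from the paper).
    """
    best_name, best_score = None, -1.0
    for name, refs in memory.items():
        score = max(cosine(query, ref) for ref in refs)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


def build_prompt(question, name, context):
    """Compose a text prompt that injects the retrieved instance's context."""
    if name is None:
        return question
    return f"The marked object is <{name}>. {context}\n{question}"


# Toy memory: one reference vector per personal object (hypothetical values).
memory = {
    "bo": [[1.0, 0.0, 0.0]],
    "mug": [[0.0, 1.0, 0.0]],
}
name, score = retrieve_instance([0.9, 0.1, 0.0], memory)
prompt = build_prompt("What is this object doing?", name, "Bo is the user's dog.")
```

Adding a new personal object in this sketch only requires appending its reference features to `memory`, which is the property that makes a retrieval-based design training-free.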