Personalization Toolkit: Training Free Personalization of Large Vision Language Models

  • 2025-03-24 12:34:02
  • Soroush Seifi, Vaggelis Dorovatas, Daniel Olmeda Reino, Rahaf Aljundi

Abstract

Large Vision Language Models (LVLMs) have significant potential to provide personalized assistance by adapting to the unique needs and preferences of individual users. The personalization of LVLMs has emerged as a field that focuses on customizing models to recognize specific object instances and provide tailored responses. However, current methodologies depend on time-consuming test-time training for each user and object, which proves to be impractical. This paper introduces a novel, training-free approach to LVLM personalization by leveraging pre-trained vision foundation models to extract distinct features, retrieval-augmented generation (RAG) techniques to recognize instances in the visual input, and visual prompting methods. Our model-agnostic vision toolkit enables flexible and efficient personalization without the need for extensive retraining. We demonstrate state-of-the-art results, surpassing conventional training-based approaches, and set a new benchmark for LVLM personalization.
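
The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors stand in for features that a pre-trained vision foundation model would produce, and the names `retrieve_instance`, `build_prompt`, the memory layout, and the 0.8 similarity threshold are all assumptions made for illustration. The idea shown is only that stored reference features for each personal object can be matched against a query feature by cosine similarity, with no training, and that a match can then be passed to the LVLM as extra context in the prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_instance(query, memory, threshold=0.8):
    """Return (name, score) for the best-matching stored instance.

    `memory` maps a personal object name to a list of reference feature
    vectors. If the best score is below `threshold` (an assumed value),
    the name is None, meaning no personal object was recognized.
    """
    best_name, best_score = None, -1.0
    for name, refs in memory.items():
        for ref in refs:
            score = cosine(query, ref)
            if score > best_score:
                best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

def build_prompt(question, name):
    """Prepend the recognized instance name to the question (hypothetical prompt format)."""
    if name is None:
        return question
    return f"The highlighted object in the image is <{name}>. {question}"

# Toy reference features; real ones would come from a vision foundation model.
memory = {
    "my_mug": [[0.9, 0.1, 0.0]],
    "my_dog": [[0.0, 0.2, 0.95]],
}

name, score = retrieve_instance([0.85, 0.15, 0.05], memory)
prompt = build_prompt("Where is it?", name)
print(name, prompt)
```

Because recognition here is a lookup over stored features, adding a new personal object only requires appending its reference vectors to `memory`; nothing is retrained.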