When Text Embedding Meets Large Language Model: A Comprehensive Survey

Abstract

Text embedding has become a foundational technology in natural language processing (NLP) during the deep learning era, driving advancements across a wide array of downstream tasks. While many natural language understanding challenges can now be modeled using generative paradigms and leverage the robust generative and comprehension capabilities of large language models (LLMs), numerous practical applications - such as semantic matching, clustering, and information retrieval - continue to rely on text embeddings for their efficiency and effectiveness. Therefore, integrating LLMs with text embeddings has become a major research focus in recent years. In this survey, we categorize the interplay between LLMs and text embeddings into three overarching themes: (1) LLM-augmented text embedding, enhancing traditional embedding methods with LLMs; (2) LLMs as text embedders, adapting their innate capabilities for high-quality embedding; and (3) Text embedding understanding with LLMs, leveraging LLMs to analyze and interpret embeddings. By organizing recent works based on interaction patterns rather than specific downstream applications, we offer a novel and systematic overview of contributions from various research and application domains in the era of LLMs. Furthermore, we highlight the unresolved challenges that persisted in the pre-LLM era with pre-trained language models (PLMs) and explore the emerging obstacles brought forth by LLMs. Building on this analysis, we outline prospective directions for the evolution of text embedding, addressing both theoretical and practical opportunities in the rapidly advancing landscape of NLP.