Abstract
On-device large language models (LLMs), referring to running LLMs on edge devices, have raised considerable interest since they are more cost-effective, latency-efficient, and privacy-preserving compared with the cloud paradigm. Nonetheless, the performance of on-device LLMs is intrinsically constrained by resource limitations on edge devices. Sitting between cloud and on-device AI, mobile edge intelligence (MEI) presents a viable solution by provisioning AI capabilities at the edge of mobile networks, enabling end users to offload heavy AI computation to capable edge servers nearby. This article provides a contemporary survey on harnessing MEI for LLMs. We begin by illustrating several killer applications to demonstrate the urgent need for deploying LLMs at the network edge. Next, we present the preliminaries of LLMs and MEI, followed by resource-efficient LLM techniques. We then present an architectural overview of MEI for LLMs (MEI4LLM), outlining its core components and how it supports the deployment of LLMs. Subsequently, we delve into various aspects of MEI4LLM, extensively covering edge LLM caching and delivery, edge LLM training, and edge LLM inference. Finally, we identify future research opportunities. We hope this article inspires researchers in the field to leverage mobile edge computing to facilitate LLM deployment, thereby unleashing the potential of LLMs across various privacy- and delay-sensitive applications.