Abstract
Large language models (LLMs) use function calls to interface with external tools and data sources. However, the current approach to LLM function calling is inherently synchronous: each call blocks LLM inference, which limits both LLM operation and concurrent function execution. In this work, we propose AsyncLM, a system for asynchronous LLM function calling. AsyncLM improves the operational efficiency of LLMs by enabling them to generate and execute function calls concurrently. Instead of waiting for each call to complete, AsyncLM introduces an interrupt mechanism that asynchronously notifies the LLM in-flight when function calls return. We design an in-context protocol for function calls and interrupts, provide a fine-tuning strategy to adapt LLMs to the interrupt semantics, and implement these mechanisms efficiently in the LLM inference process. We demonstrate that AsyncLM can reduce end-to-end task completion latency by 1.6x-5.4x compared to synchronous function calling on a set of benchmark tasks from the Berkeley Function Calling Leaderboard (BFCL). Furthermore, we discuss how interrupt mechanisms can be extended to enable novel human-LLM or LLM-LLM interactions.
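To make the contrast with synchronous calling concrete, the following is a minimal toy sketch of the idea, not AsyncLM's implementation. It assumes a simulated decode loop in which function calls run as background tasks and their results are appended to the context as interrupts between tokens. The names `slow_tool` and `decode_async`, the timing parameters, and the `[CALL ...]` and `[INTR ...]` markers are all hypothetical placeholders and do not reflect the paper's actual protocol format.

```python
import asyncio


async def slow_tool(name: str, delay: float) -> str:
    """Hypothetical external function; the sleep stands in for I/O latency."""
    await asyncio.sleep(delay)
    return f"{name}:done"


async def decode_async(steps, tool_delay=0.02, token_time=0.01):
    """Toy decode loop: calls run in the background, results arrive as interrupts."""
    context = []   # token stream visible to the simulated model
    pending = []   # in-flight function calls, in launch order
    for step in steps:
        if step.startswith("CALL "):
            name = step.split(maxsplit=1)[1]
            # Launch the call without awaiting it, so decoding continues.
            pending.append(asyncio.create_task(slow_tool(name, tool_delay)))
            context.append(f"[CALL {name}]")
        else:
            await asyncio.sleep(token_time)  # simulated per-token decode time
            context.append(step)
        # Between tokens, inject an interrupt for every call that has returned.
        for task in [t for t in pending if t.done()]:
            context.append(f"[INTR {task.result()}]")
            pending.remove(task)
    # Calls still running when generation ends are awaited and reported last.
    for task in pending:
        context.append(f"[INTR {await task}]")
    return context
```

Running `asyncio.run(decode_async(["CALL a", "CALL b", "t0", "t1", "t2", "t3"]))` returns a context in which both calls are issued up front and their interrupts are interleaved with the ordinary tokens. In a synchronous design, each call would instead have to finish before the next token could be generated.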