Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. A promising but largely under-explored area is their potential to facilitate human coordination with many agents. Such capabilities would be useful in domains including disaster response, urban planning, and real-time strategy scenarios. In this work, we introduce (1) a real-time strategy game benchmark designed to evaluate these abilities and (2) a novel framework we term HIVE. HIVE empowers a single human to coordinate swarms of up to 2,000 agents using natural language dialog with an LLM. We present promising results on this multi-agent benchmark, with our hybrid approach solving tasks such as coordinating agent movements, exploiting unit weaknesses, leveraging human annotations, and understanding terrain and strategic points. However, our findings also highlight critical limitations of current models, including difficulties in processing spatial visual information and challenges in formulating long-term strategic plans. This work sheds light on the potential and limitations of LLMs in human-swarm coordination, paving the way for future research in this area. The HIVE project page, which includes videos of the system in action, can be found here: hive.syrkis.com.