Robo-Troj: Attacking LLM-based Task Planners

Abstract

Robots need task planning methods to achieve goals that require more than individual actions. Recently, large language models (LLMs) have demonstrated impressive performance in task planning. LLMs can generate a step-by-step solution using a description of actions and the goal. Despite the successes of LLM-based task planning, there is limited research studying the security aspects of those systems. In this paper, we develop Robo-Troj, the first multi-trigger backdoor attack for LLM-based task planners, which is the main contribution of this work. As a multi-trigger attack, Robo-Troj is trained to accommodate the diversity of robot application domains. For instance, one can use unique trigger words, e.g., "herical", to activate a specific malicious behavior, e.g., cutting a hand on a kitchen robot. In addition, we develop an optimization method for selecting the trigger words that are most effective. Through demonstrating the vulnerability of LLM-based planners, we aim to promote the development of secure robot systems.