A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents

Abstract

Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents due to their advanced reasoning and comprehension. However, the systemic safety of these agents remains an underexplored frontier. In this study, we present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. SafePlan-Bench establishes a comprehensive benchmark for evaluating task-planning safety, encompassing 2,027 daily tasks and corresponding environments distributed across 8 distinct hazard categories (e.g., Fire Hazard). Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors. To mitigate these hazards, we propose Safe-Align, a method designed to integrate physical-world safety knowledge into LLM-based embodied agents while maintaining task-specific performance. Experiments across a variety of settings demonstrate that Safe-BeAl provides comprehensive safety validation, improving safety by 8.55-15.22% compared to embodied agents based on GPT-4, while ensuring successful task completion.