AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Abstract

Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec successfully prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and enforces 100% compliance by autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overheads in milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and a recall of 70.96% for embodied agents, successfully identify 87.26% of the risky code, and prevent AVs from breaking laws in 5 out of 8 scenarios.
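
The rule structure described above (a trigger, a set of predicates, and an enforcement mechanism) can be illustrated with a minimal Python sketch. This is not AgentSpec's actual syntax or implementation: the `Rule` class, the `enforce` function, the event name `before_shell_command`, and the enforcement labels `stop` and `allow` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation of a proposed agent action.
Action = Dict[str, str]


@dataclass
class Rule:
    """Illustrative rule: fires on a trigger when every predicate holds."""
    name: str
    trigger: str                                # event that activates the rule
    predicates: List[Callable[[Action], bool]]  # all must be true for the rule to fire
    enforcement: str                            # hypothetical label, e.g. "stop"


def enforce(rules: List[Rule], event: str, action: Action) -> str:
    """Return the enforcement of the first matching rule, or "allow" if none match."""
    for rule in rules:
        if rule.trigger == event and all(p(action) for p in rule.predicates):
            return rule.enforcement
    return "allow"


# Assumed example rule: stop any shell command containing a recursive delete.
no_recursive_delete = Rule(
    name="no_recursive_delete",
    trigger="before_shell_command",
    predicates=[lambda a: "rm -rf" in a.get("command", "")],
    enforcement="stop",
)

rules = [no_recursive_delete]
blocked = enforce(rules, "before_shell_command", {"command": "rm -rf /tmp/data"})
allowed = enforce(rules, "before_shell_command", {"command": "ls /tmp"})
ignored = enforce(rules, "before_file_read", {"command": "rm -rf /tmp/data"})
```

In this sketch the check runs before the action executes, so a matching rule can intercept an unsafe step at runtime; an action on an event with no matching rule falls through to `allow`.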