Abstract
Inferring an adversary's goals from exhibited behavior is crucial for counterplanning and non-cooperative multi-agent systems in domains like cybersecurity, military, and strategy games. Deep Inverse Reinforcement Learning (IRL) methods based on maximum entropy principles show promise in recovering adversaries' goals but are typically offline, require large batch sizes with gradient descent, and rely on first-order updates, limiting their applicability in real-time scenarios. We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) approach to recover the cost function governing the adversary's actions and goals. Specifically, we minimize an upper bound on the standard Guided Cost Learning (GCL) objective using sequential second-order Newton updates, akin to the Extended Kalman Filter (EKF), leading to a learning algorithm that converges quickly. We demonstrate that RDIRL is able to recover cost and reward functions of expert agents in standard and adversarial benchmark tasks. Experiments on these benchmark tasks show that our proposed approach outperforms several leading IRL algorithms.
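
To give a feel for what a sequential, EKF-like second-order update looks like, the following is a minimal sketch on a toy problem. It is not the RDIRL algorithm or the GCL objective: it assumes a hypothetical linear cost model y ≈ theta · x observed without noise, for which the EKF update coincides with recursive least squares. The names `ekf_parameter_step`, `true_theta`, and the value of `r` are illustrative assumptions. For a deep cost network, the feature vector `x` would be replaced by the gradient of the network output with respect to its parameters.

```python
import numpy as np

def ekf_parameter_step(theta, P, x, y, r=1e-2):
    """One EKF-style update of parameters theta for the toy model y ≈ theta · x.

    theta: current parameter estimate, shape (d,)
    P:     current parameter covariance, shape (d, d)
    x:     feature vector (the Jacobian of the prediction w.r.t. theta), shape (d,)
    y:     observed scalar target
    r:     assumed observation-noise variance (hypothetical value)
    """
    residual = y - x @ theta          # innovation: observation minus prediction
    S = x @ P @ x + r                 # innovation variance (scalar)
    K = (P @ x) / S                   # Kalman gain, shape (d,)
    theta = theta + K * residual      # curvature-scaled (second-order-like) step
    P = P - np.outer(K, x @ P)        # covariance update
    return theta, P

# Process samples one at a time, as an online method would.
rng = np.random.default_rng(0)
true_theta = np.array([1.5, -0.7, 0.3])   # hypothetical ground-truth parameters
theta = np.zeros(3)
P = np.eye(3) * 10.0                      # broad initial uncertainty

for _ in range(200):
    x = rng.normal(size=3)
    y = true_theta @ x                    # noise-free toy observation
    theta, P = ekf_parameter_step(theta, P, x, y)

print(np.round(theta, 3))
```

Each step uses a single sample and rescales the correction by the covariance `P`, which is what distinguishes this family of updates from batch first-order gradient descent.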