Iterative Multi-Agent Reinforcement Learning: A Novel Approach Toward Real-World Multi-Echelon Inventory Optimization

Abstract

Multi-echelon inventory optimization (MEIO) is critical for effective supply chain management, but its inherent complexity can pose significant challenges. Heuristics are commonly used to address this complexity, yet they often face limitations in scope and scalability. Recent research has found deep reinforcement learning (DRL) to be a promising alternative to traditional heuristics, offering greater versatility by utilizing dynamic decision-making capabilities. However, since DRL is known to struggle with the curse of dimensionality, its relevance to complex real-life supply chain scenarios is still to be determined. This thesis investigates DRL's applicability to MEIO problems of increasing complexity. A state-of-the-art DRL model was replicated, enhanced, and tested across 13 supply chain scenarios, combining diverse network structures and parameters. To address DRL's challenges with dimensionality, additional models leveraging graph neural networks (GNNs) and multi-agent reinforcement learning (MARL) were developed, culminating in the novel iterative multi-agent reinforcement learning (IMARL) approach. IMARL demonstrated superior scalability, effectiveness, and reliability in optimizing inventory policies, consistently outperforming benchmarks. These findings confirm the potential of DRL, particularly IMARL, to address real-world supply chain challenges and call for additional research to further expand its applicability.