Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise

Abstract

Two-time-scale Stochastic Approximation (SA) is an iterative algorithm withapplications in reinforcement learning and optimization. Prior finite timeanalysis of such algorithms has focused on fixed point iterations with mappingscontractive under Euclidean norm. Motivated by applications in reinforcementlearning, we give the first mean square bound on non linear two-time-scale SAwhere the iterations have arbitrary norm contractive mappings and Markoviannoise. We show that the mean square error decays at a rate of $O(1/n^{2/3})$ inthe general case, and at a rate of $O(1/n)$ in a special case where the slowertimescale is noiseless. Our analysis uses the generalized Moreau envelope tohandle the arbitrary norm contractions and solutions of Poisson equation todeal with the Markovian noise. By analyzing the SSP Q-Learning algorithm, wegive the first $O(1/n)$ bound for an algorithm for asynchronous control of MDPsunder the average reward criterion. We also obtain a rate of $O(1/n)$ forQ-Learning with Polyak-averaging and provide an algorithm for learningGeneralized Nash Equilibrium (GNE) for strongly monotone games which convergesat a rate of $O(1/n^{2/3})$.