Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability

Abstract

Incomplete data, confounding effects, and violations of the Markov property are interrelated problems that are ubiquitous in Reinforcement Learning applications. We introduce the concept of ``partial ignorability'' and leverage it to establish a novel convergence theorem for adaptive Reinforcement Learning. This theoretical result relaxes the Markov assumption on the stochastic process underlying conventional $Q$-learning, deploying a generalized form of the Robbins-Monro stochastic approximation theorem to establish optimality. The result has downstream implications for most active subfields of Reinforcement Learning, with clear paths for extension to the field of Causal Inference.