Relaxing the Markov Requirements on Reinforcement Learning Under Weak Relative Ignorability

  • 2025-04-20 16:06:48
  • MaryLena Bleile

Abstract

Incomplete data, confounding effects, and violations of the Markov property are interrelated problems which are ubiquitous in Reinforcement Learning applications. We introduce the concept of "relative ignorability" and leverage it to establish a novel convergence theorem for adaptive Reinforcement Learning. This theoretical result relaxes the Markov assumption on the stochastic process underlying conventional $Q$-learning, deploying a generalized form of the Robbins-Monro stochastic approximation theorem to establish optimality. This result has clear downstream implications for most active subfields of Reinforcement Learning, with clear paths for extension to the field of Causal Inference.
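
For orientation, the conventional $Q$-learning update and the classical Robbins-Monro step-size conditions mentioned in the abstract are usually written as in the sketch below. The notation ($s_t$, $a_t$, $r_t$, $\alpha_t$, $\gamma$) is the standard textbook convention and is an assumption here, not taken from the paper. This is the textbook Markov setting, not the generalized result the paper establishes.

```latex
% Conventional tabular Q-learning update (standard textbook form; notation assumed)
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t)
  + \alpha_t \left[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right]

% Classical Robbins-Monro conditions on the step sizes \alpha_t
\sum_{t=0}^{\infty} \alpha_t = \infty,
\qquad
\sum_{t=0}^{\infty} \alpha_t^{2} < \infty
```

Here $\alpha_t$ is the learning rate and $\gamma \in [0, 1)$ the discount factor. A common choice satisfying both conditions is $\alpha_t = 1/(t+1)$.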