Relaxing the Markov Requirements on Reinforcement Learning Under Weak Partial Ignorability

  • 2025-04-16 12:57:23
  • MaryLena Bleile

Abstract

Incomplete data, confounding effects, and violations of the Markov property are interrelated problems that are ubiquitous in Reinforcement Learning applications. We introduce the concept of "partial ignorability" and leverage it to establish a novel convergence theorem for adaptive Reinforcement Learning. This theoretical result relaxes the Markov assumption on the stochastic process underlying conventional $Q$-learning, deploying a generalized form of the Robbins-Monro stochastic approximation theorem to establish optimality. This result has clear downstream implications for most active subfields of Reinforcement Learning, with clear paths for extension to the field of Causal Inference.
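
For orientation, the conventional setting that the abstract refers to can be sketched as follows. This is the standard textbook form of the $Q$-learning update together with the classical Robbins-Monro step-size conditions, not the generalized result of the paper; the symbols $s_t$, $a_t$, $r_{t+1}$, $\gamma$, and $\alpha_t$ are notation assumed here for illustration and are not taken from the source.

$$
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t \left[ r_{t+1} + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right]
$$

$$
\sum_{t=0}^{\infty} \alpha_t = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^2 < \infty
$$

Under the Markov assumption, bounded rewards, and the condition that every state-action pair is updated infinitely often, these step-size conditions are what the classical convergence argument for $Q$-learning relies on. The abstract's claim is that the Markov part of this assumption can be relaxed.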