Abstract
Incomplete data, confounding effects, and violations of the Markov property are interrelated problems that are ubiquitous in Reinforcement Learning applications. We introduce the concept of ``relative ignorability'' and leverage it to establish a novel convergence theorem for adaptive Reinforcement Learning. This theoretical result relaxes the Markov assumption on the stochastic process underlying conventional $Q$-learning, using a generalized form of the Robbins-Monro stochastic approximation theorem to establish optimality. The result has direct downstream implications for most active subfields of Reinforcement Learning, with clear paths for extension to the field of Causal Inference.
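For orientation, the sketch below shows the conventional Markovian baseline that this result relaxes, not the method proposed here: tabular $Q$-learning whose per-pair step sizes $\alpha_n = n(s,a)^{-\omega}$ satisfy the Robbins-Monro conditions ($\sum_n \alpha_n = \infty$, $\sum_n \alpha_n^2 < \infty$) for $1/2 < \omega \le 1$. The two-state MDP, the function names `optimal_q` and `q_learning`, and the choice $\omega = 0.8$ are illustrative assumptions only.

```python
import random


def optimal_q(rewards, next_state, gamma, iters=1000):
    """Reference Q* via value iteration on a small deterministic MDP."""
    n_s, n_a = len(rewards), len(rewards[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        # Bellman optimality backup; the right-hand side uses the previous q.
        q = [[rewards[s][a] + gamma * max(q[next_state[s][a]])
              for a in range(n_a)] for s in range(n_s)]
    return q


def q_learning(rewards, next_state, gamma, steps, omega=0.8, seed=0):
    """Tabular Q-learning with step sizes alpha_n = n(s, a) ** (-omega).

    For 1/2 < omega <= 1 these step sizes satisfy the Robbins-Monro
    conditions: sum(alpha_n) diverges while sum(alpha_n ** 2) converges.
    The environment here is Markov, i.e. the classical setting.
    """
    rng = random.Random(seed)
    n_s, n_a = len(rewards), len(rewards[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    visits = [[0] * n_a for _ in range(n_s)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_a)  # uniform exploratory behaviour policy
        s_next = next_state[s][a]
        visits[s][a] += 1
        alpha = visits[s][a] ** (-omega)
        target = rewards[s][a] + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


# Hypothetical toy MDP: action a moves the agent to state a; the only
# reward is 1 for taking action 1 while already in state 1.
REWARDS = [[0.0, 0.0], [0.0, 1.0]]
NEXT_STATE = [[0, 1], [0, 1]]
GAMMA = 0.5

# For this MDP, Q* = [[0.5, 1.0], [0.5, 2.0]] up to floating-point error.
q_star = optimal_q(REWARDS, NEXT_STATE, GAMMA)
q_hat = q_learning(REWARDS, NEXT_STATE, GAMMA, steps=50_000)
```

Under the Markov assumption, the Robbins-Monro step-size conditions together with sufficient exploration are what drive `q_hat` toward `q_star`; the theorem summarized above concerns retaining such a guarantee when that assumption is relaxed.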