Pub. online: 10 Dec 2025
Type: Data Science Reviews
Open Access
Journal: Journal of Data Science
Volume 24, Issue 1 (2026): Special Issue: Statistical aspects of Trustworthy Machine Learning, pp. 86–105
Abstract
Reinforcement Learning (RL) is a powerful framework for sequential decision-making, enabling agents to optimize actions through interaction with their environment. Although RL has been studied most widely in computer science, statisticians have advanced it by addressing challenges such as uncertainty quantification, sample efficiency, and interpretability. These contributions are particularly impactful in healthcare, where RL complements Dynamic Treatment Regimes (DTRs), which support personalized medicine by tailoring treatments to individuals based on their evolving characteristics. This paper serves as both a tutorial for statisticians new to RL and a review of its integration with statistical methodologies. It introduces foundational RL concepts, classical algorithms, and Q-learning variants, and shows how statistical perspectives, especially causal inference, address challenges in DTRs. By bridging RL and statistical perspectives, the paper identifies opportunities to enhance decision-making in high-stakes domains such as healthcare.
Precision medicine is an innovative approach that aims to customize medical treatments and interventions to patients based on their individual characteristics. Several estimation techniques, including Q-learning, have been developed to determine optimal treatment rules. However, the applicability of these methods depends on the availability of precisely measured variables. This study extends Q-learning to compound outcomes, departing from the commonly assumed univariate outcome, and further accommodates mismeasurement in both binary and continuous covariates. Two methods are described to mitigate the impact of mismeasurement. Numerical studies reveal that covariate mismeasurement leads to notable bias in the estimated parameters indexing the optimal treatment rule, whereas the methods that account for mismeasurement yield improved results.
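The bias described above can be illustrated with a minimal simulation sketch. It assumes a single-stage setting with a univariate outcome, a randomized binary treatment, one continuous covariate subject to classical additive error, and a linear Q-function. All names and parameter values (`fit_q`, `psi0`, `psi1`, `sigma_u2`) are hypothetical and chosen only for illustration. The correction shown is regression calibration with a known error variance, a standard technique used here as a stand-in; it is not claimed to be either of the two methods the study describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated single-stage data; every value here is an illustrative assumption.
x = rng.normal(0.0, 1.0, n)              # true (unobserved) covariate
a = rng.integers(0, 2, n)                # randomized binary treatment, 0 or 1
psi0, psi1 = 0.5, -1.0                   # parameters indexing the treatment rule
y = 1.0 + 0.5 * x + a * (psi0 + psi1 * x) + rng.normal(0.0, 1.0, n)

sigma_u2 = 1.0                           # measurement-error variance, assumed known
w = x + rng.normal(0.0, np.sqrt(sigma_u2), n)   # error-prone surrogate for x


def fit_q(y, cov, a):
    """Least-squares fit of Q(cov, a) = b0 + b1*cov + a*(psi0 + psi1*cov)."""
    design = np.column_stack([np.ones_like(cov), cov, a, a * cov])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # order: b0, b1, psi0, psi1


psi1_oracle = fit_q(y, x, a)[3]          # uses the true covariate
psi1_naive = fit_q(y, w, a)[3]           # ignores the measurement error

# Regression calibration: replace w by an estimate of E[x | w].
lam = (w.var() - sigma_u2) / w.var()     # estimated reliability ratio
x_hat = w.mean() + lam * (w - w.mean())
psi1_rc = fit_q(y, x_hat, a)[3]

print(f"oracle: {psi1_oracle:.3f}  naive: {psi1_naive:.3f}  calibrated: {psi1_rc:.3f}")
```

Under these assumptions the reliability ratio is about 0.5, so the naive interaction estimate is attenuated toward roughly half of `psi1`, while the calibrated fit recovers a value close to the one used to generate the data. The estimated rule treats a patient when `psi0 + psi1 * covariate` is positive, so attenuation of `psi1` shifts the decision boundary.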