Q-learning with Compound Outcome and Mixed Misclassification and Measurement Error in Covariates
Pub. online: 15 October 2025
Type: Statistical Data Science
Open Access
Received: 30 August 2024
Accepted: 21 September 2025
Published: 15 October 2025
Abstract
Precision medicine is an innovative approach that aims to customize medical treatments and interventions to patients based on their individual characteristics. Several estimation techniques, including Q-learning, have been developed to determine optimal treatment rules. However, the applicability of these methods depends on the availability of precisely measured variables. This study extends the scope of Q-learning to incorporate compound outcomes, deviating from the commonly assumed univariate outcomes, and further accommodates data with mismeasurement in both binary and continuous covariates. Two methods are described to mitigate the impact of mismeasurement. Numerical studies reveal that mismeasurement in covariates leads to notable estimation bias in parameters indexing the optimal treatment, yet the methods addressing the mismeasured effects yield improved results.
Supplementary material
S1. An Example of Constructing $S^{*}_{Kj}(\theta_{Kj}; Y_{Kji}, \overline{A}_{Ki}, \overline{X}^{*}_{Ki}, \overline{C}^{*}_{Ki}, \overline{Z}_{Ki})$
S2. Proportion of Optimally Treated Future Patients
S3. Simulation Results for Correction Strategies with Reduced Sample Size
S4. Simulation Results for Correction Strategies with Reduced Validation Subsample Size
S5. Data Analysis