Modern precision medicine aims to utilize real-world data to provide the best treatment for an individual patient. An individualized treatment rule (ITR) maps each patient’s characteristics to a recommended treatment scheme that maximizes the expected outcome of the patient. A challenge precision medicine faces is population heterogeneity, as studies on treatment effects are often conducted on source populations that differ from the populations of interest in terms of the distribution of patient characteristics. Our research goal is to explore a transfer learning algorithm that aims to address the population heterogeneity problem and obtain targeted, optimal, and interpretable ITRs. The algorithm incorporates a calibrated augmented inverse probability weighting estimator for the average treatment effect and employs value function maximization for the target population using Genetic Algorithm to produce our desired ITR. To demonstrate its practical utility, we apply this transfer learning algorithm to two large medical databases, eICU Collaborative Research Database and Medical Information Mart for Intensive Care III. We first identify the important covariates, treatment options, and outcomes of interest based on the two databases, and then estimate the optimal linear ITRs for patients with sepsis. Our research introduces and applies new techniques for data fusion to obtain data-driven ITRs that cater to patients’ individual medical needs in a population of interest. By emphasizing generalizability and personalized decision-making, this methodology extends its potential application beyond medicine to fields such as marketing, technology, social sciences, and education.
Abstract: Observational studies of relatively large data can have potentially hidden heterogeneity with respect to causal effects and propensity scores–patterns of a putative cause being exposed to study subjects. This underlying heterogeneity can be crucial in causal inference for any observational studies because it is systematically generated and structured by covariates which influence the cause and/or its related outcomes. Addressing the causal inference problem in view of data structure, machine learning techniques such as tree analysis can be naturally necessitated. Kang, Su, Hitsman, Liu and Lloyd-Jones (2012) proposed Marginal Tree (MT) procedure to explore both the confounding and interacting effects of the covariates on causal inference. In this paper, we extend the MT method to the case of binary responses along with a clear exposition of its relationship with established causal odds ratio. We assess the causal effect of dieting on emotional distress using both a real data set from the Lalonde’s National Supported Work Demonstration Analysis (NSW) and a simulated data set from the National Longitudinal Study of Adolescent Health (Add Health).