Modern precision medicine aims to utilize real-world data to provide the best treatment for an individual patient. An individualized treatment rule (ITR) maps each patient’s characteristics to a recommended treatment scheme that maximizes the expected outcome of the patient. A challenge precision medicine faces is population heterogeneity, as studies on treatment effects are often conducted on source populations that differ from the populations of interest in terms of the distribution of patient characteristics. Our research goal is to explore a transfer learning algorithm that aims to address the population heterogeneity problem and obtain targeted, optimal, and interpretable ITRs. The algorithm incorporates a calibrated augmented inverse probability weighting estimator for the average treatment effect and employs value function maximization for the target population using Genetic Algorithm to produce our desired ITR. To demonstrate its practical utility, we apply this transfer learning algorithm to two large medical databases, eICU Collaborative Research Database and Medical Information Mart for Intensive Care III. We first identify the important covariates, treatment options, and outcomes of interest based on the two databases, and then estimate the optimal linear ITRs for patients with sepsis. Our research introduces and applies new techniques for data fusion to obtain data-driven ITRs that cater to patients’ individual medical needs in a population of interest. By emphasizing generalizability and personalized decision-making, this methodology extends its potential application beyond medicine to fields such as marketing, technology, social sciences, and education.
Pub. online:20 Jan 2025Type:Computing In Data ScienceOpen Access
Journal:Journal of Data Science
Volume 23, Issue 4 (2025): Special Issue: Statistical Frontiers of Data Science, pp. 648–658
Abstract
Piecewise linear-quadratic (PLQ) functions are a fundamental function class in convex optimization, especially within the Empirical Risk Minimization (ERM) framework, which employs various PLQ loss functions. This paper provides a workflow for decomposing a general convex PLQ loss into its ReLU-ReHU representation, along with a Python implementation designed to enhance the efficiency of presenting and solving ERM problems, particularly when integrated with ReHLine (a powerful solver for PLQ ERMs). Our proposed package, plqcom, accepts three representations of PLQ functions and offers user-friendly APIs for verifying their convexity and continuity. The Python package is available at https://github.com/keepwith/PLQComposite.
Statistical learning methods have been growing in popularity in recent years. Many of these procedures have parameters that must be tuned for models to perform well. Research has been extensive in neural networks, but not for many other learning methods. We looked at the behavior of tuning parameters for support vector machines, gradient boosting machines, and adaboost in both a classification and regression setting. We used grid search to identify ranges of tuning parameters where good models can be found across many different datasets. We then explored different optimization algorithms to select a model across the tuning parameter space. Models selected by the optimization algorithm were compared to the best models obtained through grid search to select well performing algorithms. This information was used to create an R package, EZtune, that automatically tunes support vector machines and boosted trees.