Identification of Optimal Combined Moderators for Time to Relapse

Identifying treatment effect modifiers (i.e., moderators) plays an essential role in improving treatment efficacy when substantial treatment heterogeneity exists. However, studies are often underpowered for detecting treatment effect modifiers, and exploratory analyses that examine one moderator per statistical model often yield spurious interactions. Therefore, in this work, we focus on creating an intuitive and readily implementable framework to facilitate the discovery of treatment effect modifiers and to make treatment recommendations for time-to-event outcomes. To minimize the impact of a misspecified main effect and avoid complex modeling, we construct the framework by matching the treated with the controls and modeling the conditional average treatment effect via regressing the difference in the observed outcomes of a matched pair on the averaged moderators. Inverse-probability-of-censoring weighting is used to handle censored observations. As matching is the foundation of the proposed methods, we explore different matching metrics and recommend the use of Mahalanobis distance when both continuous and categorical moderators are present. After matching, the proposed framework can be flexibly combined with popular variable selection and prediction methods such as linear regression, least absolute shrinkage and selection operator (Lasso), and random forest to create different combinations of potential moderators. The optimal combination is determined by the out-of-bag prediction error and the area under the receiver operating characteristic curve in making correct treatment recommendations. We compare the performance of various combined moderators through extensive simulations and the analysis of real trial data. Our approach can be easily implemented using existing R packages, resulting in a straightforward optimal combined moderator to make treatment recommendations.


Introduction
Substantial heterogeneity of treatment effectiveness exists in some clinical studies, and identifying treatment effect modifiers plays an essential role in improving the treatment efficacy.
Here treatment effect modifiers (or moderators) are defined as variables measured at baseline that exhibit an interactive effect with treatment on outcomes (Kraemer et al., 2002). In practice, existing trial data and observational studies are often utilized to search for moderators by looking for all possible interactions with treatment through regression-based methods. However, the power to detect the treatment effect modifiers could be reduced by limited sample sizes, modest interaction effects, or a strong main effect that explains much of the variability in the outcome (Kraemer, 2013). On the other hand, exploratory analyses that examine one moderator per statistical model are prone to finding spurious interactions, especially when a long list of variables is tested for moderation effects. Thus, extant literature has focused on recommending an appropriate treatment by estimating effect modification via a systematic approach (Kraemer, 2013; Tian et al., 2014; Chen et al., 2017; Song et al., 2017; Liang and Yu, 2022; Yadlowsky et al., 2021; Park et al., 2022).
The goal of this article is to propose an intuitive and readily implementable approach to discovering treatment effect modifiers and making treatment recommendations for time-to-event outcomes. The work is motivated by one of our randomized controlled trials (RCTs), Strategies to Avoid Returning to Smoking (STARTS), which aimed to prevent postpartum smoking relapse (Levine et al., 2013). In STARTS, a cognitive behavioral treatment (CBT) was compared to a supportive behavioral treatment (SBT), and no significant differences were found in time to relapse during one year postpartum (Levine et al., 2016). Given null effects, particularly in studies with two active treatments, we are interested in whether one condition might benefit a subgroup more than the other, and we therefore search for moderators.
We start with the contrast function $\Delta(M) = E[Y \mid D = 1, M] - E[Y \mid D = 0, M]$, where $E[Y \mid D, M]$ is the expected outcome $Y$ given an intervention $D$ and a set of moderators $M$.
Adopting the potential outcome framework in causal inference, let $(Y^1, Y^0)$ denote the potential outcomes if a participant received a new treatment and a standard treatment, respectively. Then, under standard causal inference assumptions, $\Delta(M)$ is the conditional average treatment effect (CATE) and can be interpreted as a causal effect modifier (Rubin, 1974, 2005), i.e., $\Delta(M) = E[Y^1 - Y^0 \mid M]$. Kraemer (2013) developed a parametric framework based on matched pairs of treated and untreated subjects and modeled the difference in the paired outcomes as a linear combination of moderators. Similarly, based on the causal interpretation of the moderator effect, Tian et al. (2014) developed a framework that posited working models for estimating the moderator effect in RCT studies by directly modeling the outcome on modified moderators. Mo and Liu (2022) recently proposed an efficient learning framework for continuous outcomes, which includes the model by Tian et al. (2014) as a special case under homogeneous variance. However, the implementation of efficient learning is not straightforward.
In this work, we focus on developing an intuitive method to be used in practice. Kraemer's framework has been frequently implemented in psychiatric studies to detect treatment effect modifiers for eating disorders, anxiety, and depressive disorder, among others (Wallace et al., 2013, 2018; Wallace and Smagula, 2018; Smagula et al., 2016; Kaneriya et al., 2016; Niles et al., 2017; Hildebrandt et al., 2020; Chin Fatt et al., 2020). We will extend Kraemer's framework from a continuous outcome to the time-to-event setting, and construct a composite moderator from a list of candidates as an optimal causal effect modifier for time to relapse in STARTS. CATE is often the focus in detecting treatment effect modifiers for survival outcomes and is modeled based on standard survival models like the Cox proportional hazards (PH) model to handle censoring (Tian et al., 2014; Yadlowsky et al., 2021). In this work, we adopt Tian's interpretation by modeling CATE as the causal effect modifier and use inverse-probability-of-censoring weighting (IPCW) to handle censoring, as it can be flexibly combined with different frameworks without changing the model interpretations (Goldberg and Kosorok, 2012; Zhao et al., 2015; Cui et al., 2017).
We will model the CATE based on matched pairs to minimize the impact of misspecifying main effects and avoid complex modeling as in Mo and Liu (2022). The 1:1 nearest neighbor matching (NNM) algorithm is usually used to estimate the conditional average treatment effect on the treated (CATT) because it matches control individuals to the treated and discards unselected controls. Unlike the CATE, the CATT is the conditional expectation of the treatment effect over the subpopulation of treated individuals.
However, under random assignment, the CATT is equivalent to the CATE. Moreover, given similar sample sizes in each group, the 1:1 NNM algorithm discards few observations and thus entails only a limited reduction in power (Stuart, 2010). To reduce matching bias, it is recommended to convert categorical covariates to a series of binary indicators and standardize continuous covariates before matching (Kraemer, 2013).
After matching, we regress the weighted difference in the observed outcomes from a matched pair on the differences in potential effect modifiers, with or without their corresponding average scores, and select important factors to be included in the final linear combination using Z scores, Lasso, and random forest. As subsequent modeling depends on matching, it is important to explore the impact of matching on estimating causal effect modifiers. Two matching metrics are considered in this study: Mahalanobis distance (MD) and propensity score (PS). King et al. (2011) showed that using PS could degrade causal inferences as compared to unmatched methods if the two groups are already well balanced, while using the MD would achieve a lower imbalance. Thus, we will compare these two metrics via simulation studies under different scenarios to assess the impact of matching on our estimators.
The rest of the article is organized as follows: In Section 2, we propose various matched-weighting (MW) estimators for the causal effect modifier. In Section 3, simulation studies under different scenarios are conducted to compare the performance of MW estimators to comparative methods in estimating the treatment effect modification and making treatment recommendations. In Section 4, we illustrate the utility of our modeling framework by applying it to STARTS. Finally, the conclusion and discussion are provided in Section 5.
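As a rough illustration of 1:1 nearest neighbor matching on the Mahalanobis distance, the following Python sketch pairs each treated unit with its closest unused control. It is not the R implementation used in this work: the function name `mahalanobis_nn_match`, the greedy processing order, and the use of a pooled covariance matrix are assumptions made for the example, and categorical covariates are assumed to have been converted to binary indicators beforehand.

```python
import numpy as np


def mahalanobis_nn_match(X_treated, X_control):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Hypothetical helper: returns a list of (treated_index, control_index).
    """
    X_t = np.asarray(X_treated, dtype=float)
    X_c = np.asarray(X_control, dtype=float)
    # Pooled covariance of all covariates (an assumption of this sketch);
    # the pseudo-inverse guards against a singular covariance matrix.
    cov = np.cov(np.vstack([X_t, X_c]), rowvar=False)
    cov_inv = np.linalg.pinv(np.atleast_2d(cov))
    available = list(range(len(X_c)))
    pairs = []
    for i, x in enumerate(X_t):
        if not available:
            break  # more treated than controls: the rest stay unmatched
        diff = X_c[available] - x
        # Squared Mahalanobis distance from treated unit i to each free control.
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        best = available[int(np.argmin(d2))]
        pairs.append((i, best))
        available.remove(best)  # each control is used at most once
    return pairs
```

Because the matching is greedy, the resulting pairs can depend on the order in which treated units are processed; optimal matching would avoid this at a higher computational cost.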

Matched-Weighting Estimators
In the following, for individual $i$, let $\tilde{T}_i$ be the minimum of the event time $T_i$ and the independent censoring time $C_i$. Denote the event status as $\delta_i = 1\{T_i \le C_i\}$. Let $M_i$ be a $p$-dimensional vector of all potential moderators. The independent censoring assumption can be relaxed to conditional independence given the moderators. In addition, we center the treatment allocation $D_i$, which equals 0.5 if individual $i$ is in the treatment group and −0.5 if individual $i$ belongs to the control group. Then, the observed data consist of $n$ independent and identically distributed (i.i.d.) replications of $(\tilde{T}, \delta, D, M)$, namely $\{(\tilde{T}_i, \delta_i, D_i, M_i), i = 1, \ldots, n\}$.
In general, we assume that, given the moderators $M$, the treatment assignment $D$ is independent of the potential outcomes and is not deterministic, i.e., $(Y^1, Y^0) \perp D \mid M$ and $0 < P(D = d \mid M) < 1$ for all $d$ and $M$ (Strong Ignorability, SI). Furthermore, the Stable Unit Treatment Value Assumption (SUTVA) is assumed, where the potential outcomes for any individual are not affected by the treatment assigned to other individuals, and there is only one form of treatment for each treatment level.

A Model
Now we adopt the framework of Kraemer (2013) based on matched pairs, extend it to survival outcomes, and handle censoring by introducing IPCW weights $w_j$ to adjust for the bias caused by censoring. Consider a survival model for the event time $T$ of an individual with treatment $D$ and a vector of potential treatment modifiers $M$:
$$h(T) = \theta_0 + \theta_d D + \theta_{ma}' M + \theta_{mo}' M D + \epsilon, \qquad (1)$$
where $h$ is some monotone function of $T$, $\epsilon$ is a mean-zero error term, $\theta_0$ is the intercept, $\theta_d$ is the treatment effect, and $\theta_{ma}'$ and $\theta_{mo}'$ are the transposes ($'$) of two $p$-dimensional vectors, referring to the main and moderator effects, respectively. When $h = \log$, it becomes the familiar accelerated failure time (AFT) model. Then under the SUTVA assumption, we have
$$E\{h(T^1) - h(T^0) \mid M\} = \theta_d + \theta_{mo}' M,$$
where $\theta_d$ is the treatment effect when $M = 0$, and the moderator effect $\theta_{mo}$ becomes the coefficient for the causal effect modifier. Motivated by the above relationship, we now consider a matched pair of a treated subject and their control with event times and moderators $(T_1, M_1, T_0, M_0)$. With a perfect match, the difference in the potential moderators is $dM = M_1 - M_0 = 0$. Thus, if one works on the matched pairs and regresses the difference in the matched outcomes on the average of the two moderator vectors, $aM$, the slope will be an unbiased estimator of $\theta_{mo}$.
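The step from the individual-level model to the matched-pair regression can be written out as a short worked derivation. It assumes the linear form $h(T) = \theta_0 + \theta_d D + \theta_{ma}' M + \theta_{mo}' M D + \epsilon$ with the centered coding $D = \pm 0.5$; the subscripts 1 and 0 on the error terms are notation introduced here for the treated and control members of a pair.

```latex
\begin{align*}
h(T_1) &= \theta_0 + \tfrac{1}{2}\theta_d + \theta_{ma}' M_1
          + \tfrac{1}{2}\theta_{mo}' M_1 + \epsilon_1, \\
h(T_0) &= \theta_0 - \tfrac{1}{2}\theta_d + \theta_{ma}' M_0
          - \tfrac{1}{2}\theta_{mo}' M_0 + \epsilon_0, \\
h(T_1) - h(T_0) &= \theta_d
          + \theta_{ma}' \underbrace{(M_1 - M_0)}_{dM}
          + \theta_{mo}' \underbrace{\tfrac{1}{2}(M_1 + M_0)}_{aM}
          + (\epsilon_1 - \epsilon_0).
\end{align*}
```

The intercept $\theta_0$ cancels, the main effect $\theta_{ma}$ multiplies only the paired difference $dM$, and the moderator effect $\theta_{mo}$ multiplies the paired average $aM$. With a perfect match ($dM = 0$), the main-effect term drops out entirely, which is why misspecifying it has limited impact.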
For time-to-event data, not all outcomes are observed. To maintain a relatively large number of matched pairs, all censored observations from each treatment group are excluded before matching. For each matched pair $j$, denote the paired event times as $(T_{1j}, T_{0j})$ and the paired patient profiles as $(M_{1j}, M_{0j})$, for $j = 1, \ldots, n_p$, where $n_p$ is the number of pairs. After matching, we model the paired contrast, i.e., the IPCW-weighted difference between the transformed event times $h(T_{1j})$ and $h(T_{0j})$ of pair $j$. Based on model (1) and the relationship revealed above, model (2) takes this contrast to be linear in the paired average $aM_j$, with intercept $\theta_d \in \mathbb{R}$, slope $\theta_{mo} \in \mathbb{R}^p$, and i.i.d. mean-zero error terms $\epsilon_{dj}$. The modified moderators are the paired averages, $aM_j = (M_{1j} + M_{0j})/2$, which is due to the centering of the treatment allocation. The IPCW weight $w_j$ is calculated from the paired event times and the survival function of the censoring time, $S_c$. In practice, $S_c$ is typically unknown and can be estimated by the Kaplan-Meier estimator (Kaplan and Meier, 1958) or from a Cox proportional hazards model (Cox, 1972), denoted as $\hat{S}_c$.
Based on model (2), we propose an Ordinary Least Squares (OLS)-type matched-weighting (MW) estimator for the causal effect modifier, $\hat{\theta}_a$, obtained by fitting model (2) by least squares. Thus, the "A model" indicates that the modified outcome is fitted on the paired average only.
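The Kaplan-Meier step for the censoring distribution can be sketched as follows. This is a minimal Python illustration rather than the R packages referred to in this work. The function names are hypothetical, ties are handled naively, and the pair weight shown (the inverse of the product of the two estimated censoring survival probabilities) is an assumed form chosen for the example, not a formula taken from the text.

```python
def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function S_c.

    events[i] is 1 for an observed event and 0 for a censored time, so
    censored observations play the role of "failures" here.
    Returns a function t -> S_c(t-), the left limit at t.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n_at_risk = len(times)
    steps = []  # (time, estimated survival just after that time)
    surv = 1.0
    k = 0
    while k < len(order):
        t = times[order[k]]
        tied = [i for i in order[k:] if times[i] == t]
        n_cens = sum(1 for i in tied if events[i] == 0)
        if n_cens > 0:
            surv *= 1.0 - n_cens / n_at_risk
        steps.append((t, surv))
        n_at_risk -= len(tied)
        k += len(tied)

    def s_c(t):
        value = 1.0
        for time, s in steps:
            if time < t:  # only jumps strictly before t: left limit
                value = s
            else:
                break
        return value

    return s_c


def pair_weight(t1, t0, s_c):
    """ASSUMED weight form: 1 / (S_c(t1-) * S_c(t0-)) for an uncensored pair."""
    return 1.0 / (s_c(t1) * s_c(t0))
```

For example, with `s_c = km_censoring_survival(all_times, all_events)` estimated on the full sample, `pair_weight(T1j, T0j, s_c)` would up-weight pairs whose event times fall where censoring is more likely.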

DA Model
However, in practice, the misspecification of the statistical model or the covariate set when calculating matching metrics may lead to imbalanced baseline characteristics after matching. Therefore, considering the impact of matching imbalance on estimating the causal effect modifier, we adjust for the paired difference term $dM_j = M_{1j} - M_{0j}$ and name the result the "DA model": $dM_j$ enters model (2) as an additional regressor whose coefficient is the main effect $\theta_{ma} \in \mathbb{R}^p$. Subsequently, we propose another MW estimator, $\hat{\theta}_{da}$, obtained by fitting the DA model by least squares.
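A minimal Python sketch of the A and DA fits on matched pairs is given below. It assumes that the IPCW weight multiplies the outcome difference before an ordinary least-squares fit; that choice, the function name `fit_a_and_da`, and the returned dictionary layout are assumptions for illustration only.

```python
import numpy as np


def fit_a_and_da(y1, y0, M1, M0, w):
    """Least-squares fits of the A and DA models on matched pairs.

    y1, y0 : transformed event times h(T) of treated / control pair members
    M1, M0 : (n_pairs, p) moderator matrices of treated / control members
    w      : IPCW weight per pair (ASSUMED to scale the outcome difference)
    """
    y1, y0, w = (np.asarray(a, dtype=float) for a in (y1, y0, w))
    M1, M0 = np.asarray(M1, dtype=float), np.asarray(M0, dtype=float)
    contrast = w * (y1 - y0)   # weighted paired contrast
    aM = (M1 + M0) / 2.0       # paired average
    dM = M1 - M0               # paired difference
    ones = np.ones((len(contrast), 1))
    X_a = np.hstack([ones, aM])        # A model: intercept + aM
    X_da = np.hstack([ones, dM, aM])   # DA model: intercept + dM + aM
    coef_a, *_ = np.linalg.lstsq(X_a, contrast, rcond=None)
    coef_da, *_ = np.linalg.lstsq(X_da, contrast, rcond=None)
    p = aM.shape[1]
    return {
        "A": {"theta_d": coef_a[0], "theta_mo": coef_a[1:]},
        "DA": {
            "theta_d": coef_da[0],
            "theta_ma": coef_da[1:1 + p],
            "theta_mo": coef_da[1 + p:],
        },
    }
```

When matching is imperfect, the A fit absorbs the omitted `dM` term into its error, whereas the DA fit estimates it explicitly, which is the motivation for the adjustment described above.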

Moderator Selection
In Kraemer (2013), an optimal treatment effect modifier was constructed as a linear combination of modified continuous moderators, where those moderators were selected based on their correlations with the paired difference of continuous outcomes. A selection threshold was set after the univariate analysis, but the correlations calculated in practice are generally small. Therefore, we propose to use the standardized estimated coefficient of $aM$ in each univariate analysis as the selector and use the critical value of the corresponding distribution of the estimated coefficient as the threshold. In addition, to minimize the impact of imbalance caused by matching, $dM$ could also be screened and selected into the composite moderator based on its standardized estimated coefficient. These screening procedures yield the other two MW estimators, $\hat{\theta}_{Sa}(\alpha)$ and $\hat{\theta}_{Sda}(\alpha)$, where S stands for screening, and $\alpha$ denotes the screening threshold. $\hat{\theta}_{Sda}(\alpha)$ is defined similarly to $\hat{\theta}_{Sa}(\alpha)$, with the additional adjustment $\theta_{ma}(\alpha)' dM_j(\alpha)$ in the equation.
Similarly, the $L_1$ penalized (Lasso) estimators proposed by Tibshirani (1996) can be applied to this weighted matching framework to select important causal effect modifiers, with or without adjusting for the imbalance captured by $dM$. The penalized MW estimators with a shrinkage parameter $\lambda$ are denoted as $\hat{\theta}_{La}(\lambda_a)$ and $\hat{\theta}_{Lda}(\lambda_{da})$, without and with $dM$, respectively, and are calculated by minimizing the least-squares criterion of the corresponding model plus $\lambda$ times the $L_1$ norm $\|\cdot\|_1$ of the coefficients. If the focus is on making treatment recommendations rather than estimating treatment effect modifiers, one could adopt machine learning techniques like random forest or the decision tree under the MW framework (Breiman, 2001; Liaw et al., 2002).
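The univariate screening by standardized coefficients could look roughly like the following Python sketch. Each column of the paired averages is regressed on its own against the weighted paired contrast, and a moderator is kept when the absolute standardized slope exceeds the threshold. The function name `screen_moderators` and the default threshold of 1.96 are assumptions for illustration.

```python
import numpy as np


def screen_moderators(contrast, aM, threshold=1.96):
    """Return (selected column indices, z-scores) from univariate fits.

    contrast : weighted paired outcome differences, length n_pairs
    aM       : (n_pairs, p) matrix of paired averages
    threshold: hypothetical default; the text uses a critical value
    """
    y = np.asarray(contrast, dtype=float)
    aM = np.asarray(aM, dtype=float)
    n = len(y)
    yc = y - y.mean()
    selected, z_scores = [], []
    for k in range(aM.shape[1]):
        xc = aM[:, k] - aM[:, k].mean()
        sxx = float(xc @ xc)
        slope = float(xc @ yc) / sxx              # simple regression slope
        resid = yc - slope * xc
        sigma2 = float(resid @ resid) / (n - 2)   # residual variance
        z = slope / np.sqrt(sigma2 / sxx)         # standardized coefficient
        z_scores.append(float(z))
        if abs(z) > threshold:
            selected.append(k)
    return selected, z_scores
```

The selected columns would then enter a joint A or DA fit; the same screening could be applied to the paired differences `dM` when the adjustment terms are also being selected.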
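To make the penalized fit concrete, here is a small coordinate-descent sketch of the Lasso in Python. It minimizes (1/(2n)) times the residual sum of squares plus `lam` times the L1 norm of the slopes, with an unpenalized intercept. The scaling of the objective, the choice to penalize all slopes, and the function names are assumptions of this sketch; in practice an established package would be used.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the Lasso proximal step)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n))||y - b0 - Xb||^2 + lam * ||b||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean   # centering removes the intercept
    col_ss = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for k in range(p):
            if col_ss[k] == 0.0:
                continue  # constant column carries no information
            # Covariance of column k with the partial residual.
            rho = Xc[:, k] @ resid / n + col_ss[k] * b[k]
            new_bk = soft_threshold(rho, lam) / col_ss[k]
            resid += Xc[:, k] * (b[k] - new_bk)
            b[k] = new_bk
    b0 = y_mean - x_mean @ b
    return b0, b
```

Here `X` would hold the paired averages (and, for the DA variant, the paired differences), and `y` the weighted paired contrast. Larger values of `lam` set more coefficients exactly to zero.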

Evaluation
To evaluate the performance of our proposed MW estimators, we will tabulate the sample mean, the average of estimated standard errors (ASE), the empirical standard deviation (ESD), and the empirical coverage rate (CVRT) of the coefficients for causal effect modifiers. On the other hand, the out-of-bag prediction error (OOBPE) and the out-of-bag area under the receiver operating characteristic curve (OOBAUC) will be used to evaluate the performance in making personalized recommendations. The MW estimators with relatively larger OOBAUC and smaller OOBPE will be considered optimal ones to make treatment recommendations. In addition, we also calculated PE and area under the curve (AUC) under a two-sample (TS) setting to check the robustness of the OOB metrics, where the original simulated sample is treated as the training data, and another independent sample is simulated as the testing dataset.
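The two prediction metrics can be illustrated with a short Python sketch. The AUC is computed as the proportion of positive-negative pairs that the score ranks correctly. The definition of PE as a mean squared prediction error is an assumption of this sketch, as are the function names; the out-of-bag versions would apply the same functions to predictions for observations left out of each bootstrap sample, which is not shown here.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    labels are 1 (e.g., the new treatment is truly better) or 0;
    tied scores count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prediction_error(predicted, observed):
    """Mean squared prediction error (an ASSUMED definition of PE)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
```

In a simulation, `scores` could be the estimated treatment contrast for each pair and `labels` the indicator that the true contrast favors the new treatment.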

Simulation Setting
In this section, we performed numerical studies to investigate the finite sample performance of the proposed MW estimators in various settings. The causal effect modifiers were estimated under the MW framework in combination with the OLS (MW.O), Lasso (MW.L), and random forest (MW.RF) methods.
Here our method was evaluated and compared with existing approaches in estimating moderator effects, including the AFT model with prior knowledge of the error distribution and the Cox PH model. Both approaches were fitted on treatment allocation, each moderator individually (U) or all possible moderators, and their interactions with treatment. As a competing method, we also adopted a random survival forest (RSF) model with the log-rank score splitting rule and a Kaplan-Meier (KM) based OOB ensemble estimator (Ishwaran et al., 2008; Ishwaran and Kogalur, 2007). Although the Cox model could not provide a comparable estimate of the moderator effect when the PH assumption is violated, and the RSF predicts the survival probability only, one could still calculate the pair difference in the predicted hazard risks or survival probabilities and evaluate their performance in making treatment recommendations.
The outcome was generated from parametric survival models with the log-linear representation $\log(T_i) = \alpha + \beta D_i + \theta_{ma}' M_i + \theta_{mo}' M_i D_i + \sigma e_i$, where $e_i \sim F$ and $\sigma$ is the scale parameter, $i = 1, \ldots, 300$. Two treatment arms with equal group sizes were specified. We assumed that the first 5 of 15 moderator candidates had an interactive effect, such that $(\alpha, \beta) = (−6, 0.10)$, $\theta_{ma} = (0.15, −0.20, −0.50, 0.25, −0.20, 0.50, 0.25, 0, 0, −0.20, −0.20, 0, 0, 0, 0)$ and $\theta_{mo} = (−0.75, 0.50, 0.25, −1.25, 1, 0, \ldots, 0)$. The baseline covariates were generated independently from either the standard normal distribution or a Bernoulli distribution with mean 0.5. To study the properties of the proposed MW estimators, we considered different simulation scenarios with the following key aspects of interest: (1) The 1:1 NNM algorithm with two different metrics, MD and PS, was implemented using the R package "MatchIt" (Ho et al., 2011); (2) We considered two error distributions: the extreme value distribution (EV) and the standard logistic distribution (Logistic); (3) Two scales of error variance, $\sigma = 1/2$ and $1/6$, were used to determine the noise of the data; (4) We assumed the independent censoring time to follow a 50-50 mixture distribution of $\exp(\lambda_1)$ and $\exp(\lambda_2)$ and considered two censoring rates, 15% and 25%. One thousand simulations were performed under each simulation setting. Due to space limitations, in this section, we only present the results from representative scenarios and refer the reader to Supplementary Material for the remaining results.
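A data-generating step of this kind could be sketched in Python as follows. Several details are assumptions made for the example and are not specified in the text: the log-linear form with `alpha` as the intercept and `beta` as the treatment coefficient, which covariates are continuous (`n_continuous`), and the exponential censoring rates in `cens_rates`, which are placeholders and are not tuned to reproduce the 15% or 25% censoring rates.

```python
import math
import random

THETA_MA = [0.15, -0.20, -0.50, 0.25, -0.20, 0.50, 0.25,
            0.0, 0.0, -0.20, -0.20, 0.0, 0.0, 0.0, 0.0]
THETA_MO = [-0.75, 0.50, 0.25, -1.25, 1.0] + [0.0] * 10


def simulate(n=300, alpha=-6.0, beta=0.10, sigma=1.0 / 6.0, error="EV",
             cens_rates=(20.0, 200.0), n_continuous=8, seed=0):
    """Simulate one data set; returns a list of dicts, one per individual."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        d = 0.5 if i < n // 2 else -0.5   # centered treatment, equal arms
        # First n_continuous covariates are N(0, 1), the rest Bernoulli(0.5).
        m = [rng.gauss(0.0, 1.0) if k < n_continuous
             else float(rng.random() < 0.5)
             for k in range(len(THETA_MA))]
        if error == "EV":
            e = math.log(rng.expovariate(1.0))   # standard extreme value
        elif error == "Logistic":
            u = rng.random()
            e = math.log(u / (1.0 - u))          # standard logistic
        else:
            raise ValueError("error must be 'EV' or 'Logistic'")
        main = sum(a * x for a, x in zip(THETA_MA, m))
        inter = sum(b * x for b, x in zip(THETA_MO, m))
        t = math.exp(alpha + beta * d + main + d * inter + sigma * e)
        # 50-50 mixture of two exponential censoring distributions.
        rate = cens_rates[0] if rng.random() < 0.5 else cens_rates[1]
        c = rng.expovariate(rate)
        data.append({"time": min(t, c), "event": int(t <= c), "D": d, "M": m})
    return data
```

Calling `simulate(error="Logistic", sigma=0.5, seed=s)` for `s` in a range of seeds would give independent replicates for a noisier, heavier-tailed scenario.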

Simulation Results
We first compared the matching performance using MD and PS to study how the matching bias impacts our proposed methods. Ho et al. (2007) pointed out that one should try as many matching solutions as possible and choose the one that yields the best balance. Consequently, the inclusion of covariates depends not only on factors like the covariate distribution, covariate effect, sample size, etc., but also on the objective of matching and how the optimal balance is defined. In Stuart (2010), the method that achieves optimal balance can be defined as follows: (1) the one that yields the smallest standardized difference of means across the largest number of covariates; (2) the one that minimizes the standardized difference of means of a few particularly prognostic covariates; (3) the one that results in the fewest "large" standardized differences of means.
In this study, we aim to create matched pairs with similar characteristics. Thus, we included all covariates and evaluated matching performance by the standard pair difference (SPD) for each moderator, i.e., the average absolute within-pair difference of each covariate after matching (Ho et al., 2011). In addition, the proportion of "perfect match" is reported for categorical variables for both MD and PS. As shown in Table 1, the SPDs of moderators matched by the PS are generally larger than those matched by the MD, especially for categorical moderators, resulting in smaller proportions of "perfect match." When estimating the causal effect modifiers (comparing Table 2 to Table 3 under σ = 1/6), the estimated and empirical standard deviations increase as the matching bias increases. A slightly larger bias also appears in the estimators matched by the PS if selection is involved. We observe similar results from Tables S1 and S2 in Supplementary Material under σ = 1/2. When making treatment recommendations, comparing Table 4 to Table 5, a larger matching bias would degrade the performance of A models, as shown by smaller AUC and larger PE. For example, in Tables 4-5, with 25% censoring, an EV error distribution, and a smaller variance (σ = 1/6), the OOB (TS) AUC of the MW estimator with the OA model using the MD is 0.73 (0.75), which is larger than the AUC using PS, 0.66 (0.67). For the MW.ODA estimators, in contrast, the AUCs are quite close to each other: 0.91 (0.92) and 0.91 (0.90). This suggests that adjusting for the D terms could provide a more robust result in the presence of a larger matching bias.
Table 3: Estimated moderator effects under the setting: PS, σ = 1/6, 15% censoring rate; for each estimator of the five non-zero moderators, the cell above shows the estimated moderator effect (and coverage rate), the cell below shows the average standard error (and empirical standard deviation), and k = 1, 2, 3 in MW.SDAk and MW.SAk refer to the threshold value used to select meaningful moderators.
When we change censoring rates, we observe, by comparing Table 2 to Table 7 and Table 4 to Table S3 in Supplementary Material, that the MW estimators have a robust performance in estimating causal effect modifiers and making treatment recommendations as the censoring rate increases from 15% to 25%. The IPCW appears effective in overcoming the censoring issue when the censoring rate is modest. As illustrated by Tables 2-3 and 6-7, if the error term follows a logistic distribution, our proposed estimators' SDs become larger, as the logistic distribution has a heavier tail than the EV error in our setting. When the selection procedure is involved, the more skewed the error distribution, the larger the bias. In terms of prediction, a logistic error yields a smaller AUC and a larger PE than the EV error, according to Tables 4-5. Furthermore, when the scale parameter of the error term σ increases, i.e., the data become noisier, the performance of all methods degrades, especially for a heavier-tailed error, resulting in a larger bias, ASE, empirical SD, and PE, and a smaller AUC.

Table 6: Estimated moderator effects under the setting: MD, σ = 1/2, 25% censoring rate; for each estimator of the five non-zero moderators, the cell above shows the estimated moderator effect (and coverage rate), the cell below shows the average standard error (and empirical standard deviation), and k = 1, 2, 3 in MW.SDAk and MW.SAk refer to the threshold value used to select meaningful moderators.
Among all MW estimators, those with variable selection (MW.ODA.S, MW.OA.S and MW.LDA, MW.LA) have slightly larger biases and lower coverage probabilities than the estimators using all possible moderators (MW.ODA, MW.OA). Additionally, selections based on each moderator's standardized coefficient tend to underestimate the variability (i.e., the average standard error is smaller than the empirical standard deviation) as the threshold increases, while for the Lasso-based selections, the averaged standard errors are close to the empirical ones. When making treatment recommendations, ODA and LDA estimators seem to have larger AUCs and smaller PEs than other estimators across all scenarios, where the random forest method has the worst AUC. The MW estimators accounting for all possible moderators have a better performance than the "univariate" analysis (ODA.U, OA.U, AFT.U, and Cox.U).
Table 7: Estimated moderator effects under the setting: MD, σ = 1/6, 25% censoring rate; for each estimator of the five non-zero moderators, the cell above shows the estimated moderator effect (and coverage rate), the cell below shows the average standard error (and empirical standard deviation), and k = 1, 2, 3 in MW.SDAk and MW.SAk refer to the threshold value used to select meaningful moderators.
As the true model in our simulation studies, the AFT model performs well in general, with a smaller bias, estimated/empirical SD, and PE, and a larger AUC. Even though the Cox PH model tends to have a larger bias, slightly underestimate the variability, and consequently have a lower coverage probability than the AFT model and our proposed MW estimators, it still seems robust enough to classify patients to suitable treatments when the proportional hazards assumption is violated. The RSF approach yields a smaller AUC than both AFT and Cox, especially when the error variance is small. Compared to competing methods, when making recommendations, the ODA and LDA estimators have similar AUCs and PEs as the AFT and Cox PH models, although the MW estimators are more sensitive to the increase in noise. When estimating the causal effect modifiers, the AFT and Cox models have a smaller estimated standard error than the empirical standard deviation, and thus have a smaller coverage probability than the ODA and LDA estimators. In general, the ODA estimator and the AFT model have a smaller bias.

Real Data Application
The Strategies to Avoid Returning to Smoking (STARTS) study was conducted on 300 women from September 2007 to June 2014 in Pittsburgh, PA, USA (Levine et al., 2013). It was a randomized controlled trial aiming to assess the effect of a 24-week cognitive behavioral therapy (CBT) on postpartum smoking relapse prevention, as compared to a standard supportive behavioral therapy (SBT) with fewer interventions. The primary endpoint was the biochemically confirmed sustained tobacco abstinence within 52 weeks postpartum. The time to relapse was then determined by counting the number of days between delivery and the first day of 7 consecutive days of smoking.
To illustrate the use of our proposed methodology, we chose thirteen baseline variables as the moderator candidates after considering clinical rationales, missing data, and substantial collinearity with other variables: age in years, motivation to stay quit, the number of previous quit attempts, Fagerstrom (FAGR) test score for nicotine dependence, Smoking Self-Efficacy Questionnaire (SEQ-12) score, smoking year to age ratio, the number of cigarettes smoked daily, Edinburgh Postnatal Depression Scale (EPDS) (higher vs. not), Perceived Stress Scale (PSS) (higher vs. not), race (Black vs. others), income level (household income below $30k/yr vs. not), parity, and education background (high school or equivalent vs. not).
Among 268 women with complete data, the censoring rate is 22.8%. Then, 103 matched pairs were created via the 1:1 NNM algorithm with MD, as it yields a smaller matching bias. The analysis results on STARTS data, including the OOB PE/AUC of Cox, RSF, and MW estimators combined with different methods and their causal effect estimators, are tabulated in Table 8 and Table 9, respectively.
Based on Table 8, we observe that all MW estimators have similar PE and AUC. When compared with Cox and RSF, MW methods have slightly larger AUCs. The AUCs from all methods are generally around 0.60, indicating that the data are noisy. According to Tables 8 and 9, Cox fails to select any significant moderator. At the same time, MW estimators, in general, reveal that women with stronger motivation, fewer quit attempts, shorter smoking duration relative to their age, higher EPDS screening scores, and milder perceived stress and those who were not identified as African American would benefit more from the CBT than the SBT. If we used the combined MW estimator from the LDA model, the one with relatively larger OOBAUC and smaller OOBPE, as our optimal estimator to make treatment recommendations, then, for the 103 matched pairs, 47 of them would be assigned to the CBT group and the rest to SBT. Furthermore, the mean (SD) of time to smoking relapse among the 103 CBT-treated patients is 18.2 (15.3) weeks before re-assignment. After assigning the rest to SBT, the mean (SD) of the modified time to smoking relapse becomes 24.5 (16.1) weeks. The roughly six-week improvement suggests the usefulness of the recommendation by our proposed method.

Discussion
In this paper, we proposed an intuitive and readily implementable framework for estimating causal effect modifiers and making treatment recommendations for a study with survival outcomes. Our approach can be easily applied using well-established R packages, and the resulting optimal combined moderator has a clear and straightforward interpretation. Our framework is built upon matching, which might yield a non-negligible bias. We explored the impact of matching imbalance on the performance of our estimators of causal effect modifiers. With a larger matching imbalance, the bias and the estimated and empirical standard deviations also increase. When making treatment recommendations, a larger matching bias could degrade the performance, yielding a smaller AUC and a larger prediction error. However, adjusting for the paired differences (dM) in the model provides a more robust result in the presence of a larger matching bias, so that matching bias has a limited impact on the performance of our estimators. In the literature, there are other methods that do not require matching, e.g., the method in Tian et al. (2014). However, those methods are more complicated and less straightforward to interpret. Our goal was to find an intuitive method to be used by practitioners to make personalized recommendations for survival outcomes. Thus, we made the trade-off between some bias and the simplicity of the method.
In general, modeling the CATE on a composite of moderator candidates provides higher precision and a more significant effect than exploring the moderator effect univariately. The optimal MW estimator could achieve performance similar to that of the AFT model with prior knowledge of the error distribution. We also observe that even though the PH assumption is violated, the Cox PH model is often robust enough to make treatment recommendations.
The proposed methods can also be adapted to scenarios with nonlinear effects. If only the main effect exhibits nonlinearity, it will impact the estimation of the intercept $\theta_d$ but not the moderator effect $\theta_{mo}$, and our proposed methods remain valid. If both the main effect and the interactions are nonlinear, the matching framework simplifies the detection of the nonlinear pattern, as one can plot the residuals from a linear model versus a potential moderator. However, interpreting a nonlinear moderation effect can be notably challenging. In practice, a dichotomized moderator is often employed as a workaround.
One limitation of the MW estimator is that it fails to make precise treatment recommendations when the data contain considerable noise, a common problem faced by traditional methods as well. The other disadvantage of our MW framework is that it is subject to matching performance, where the imbalance can be enormous in a high-dimensional setting. Therefore, future studies could adopt high-dimensional matching methods with penalization methods like Lasso to extend the MW framework to a high-dimensional setting. Nevertheless, the proposed methods provide a straightforward and intuitive framework for practitioners to explore heterogeneous treatment effects. More importantly, although we used an RCT study as our data example, the matching framework is more useful for observational studies to draw causal inference on the CATE.

Table 2: Estimated moderator effects under the setting: MD, σ = 1/6, 15% censoring rate; for each estimator of the five non-zero moderators, the cell above shows the estimated moderator effect (and coverage rate), the cell below shows the average standard error (and empirical standard deviation), and k = 1, 2, 3 in MW.SDAk and MW.SAk refer to the threshold value used to select meaningful moderators.

Table 8: Prediction result of MW estimators on STARTS data.

Table 9: Analysis of STARTS data using the Cox and MW estimators.