A New Analytic Framework for Moderation Analysis — Moving Beyond Analytic Interactions

Conceptually, a moderator is a variable that modifies the effect of a predictor on a response. Analytically, the approach used in most moderation analyses is to add predictor-by-moderator interactions, in the form of cross-variable products, to a regression model and to test the significance of these terms. This narrow procedure is inconsistent with the broader conceptual definition of moderation and leads to confusion in interpreting study findings. In this paper, we develop a new analytic procedure that is consistent with the concept of moderation. The proposed framework defines moderation as a process that modifies an existing relationship between the predictor and the outcome, rather than simply as a test of a predictor-by-moderator interaction. The approach is illustrated with data from a real study.


Introduction
Moderation and mediation analyses are widely used in biomedical and psychosocial research (Baron and Kenny, 1986; Chaplin, 1991; Cole and Maxwell, 2003; Crits-Christoph et al., 2003; Holmbeck, 1997; Kraemer et al., 2001; 2002; Krull and MacKinnon, 2001; Rogosch et al., 1990; Rothman and Greenland, 1998). Although often implemented in correlational studies in social psychology and other fields of inquiry, moderation and mediation analyses have become increasingly popular and an integral part of data analysis in treatment research (Kraemer et al., 2002). In intervention studies, moderation analysis helps determine whether an intervention has a differential effect among subgroups defined by baseline characteristics. Thus, moderators provide useful information for making treatment decisions and maximizing treatment effects. In contrast, mediation analysis helps identify the mechanisms by which an intervention achieves its effect. By identifying the correct mediation process through which treatment affects study outcomes, we can not only further our understanding of the pathology of the disease and its treatment, but also provide information for developing new and alternative treatments that use resources efficiently. Moderation and mediation analyses are also often performed in epidemiologic studies to determine risk factors and elucidate the causes of a disease.
In a seminal paper, Baron and Kenny (1986) proposed a general framework for characterizing moderation and mediation processes. In particular, they laid a theoretical foundation for conceptualizing such processes and for approaching the underlying analytic problems. Their work also clarified the difference between the closely related, yet fundamentally distinct, notions of moderation and mediation. However, as Kraemer et al. (2001) recently pointed out, the limited analytic strategies proposed in that paper have been used and extrapolated to situations to which they often do not apply, leading to confusion in interpreting analysis results and even to conflict with the conceptual definitions of such processes. For example, Kraemer et al. (2001) showed that the presence of an analytic interaction between a moderator and a predictor (the product of the two variables) is model-dependent: the same data may show zero or non-zero moderator-by-predictor interactions depending on which analytic model (e.g., a logistic or a linear model) is used to fit the data. Thus, the popular approach of simply looking for non-zero interactions, as used in most moderation analyses, has limited applicability and often leads to dubious and uninterpretable results. Developing a general analytic definition consistent with the conceptual notion of moderation is the focus of this paper.
In this paper we restrict our attention to moderation analysis and discuss a new approach to address the limitations of current methods. More specifically, our approach models the effect of a moderator more broadly, so that it is not limited to analytic interactions. We show how analytic interactions can become uninterpretable as moderation effects and how a variable can be a moderator without its effect taking the form of such interactions. After describing the new analytic framework in Section 2, we illustrate the proposed approach with a real data example in Section 3, followed by concluding remarks.

Moderation Analysis
In this section, we first review existing models for moderation analysis and, in the process, outline the problems with such methods. We then propose our approach to address these issues.

Modeling moderation with interaction terms
For convenience, assume a relatively simple moderation process involving only one predictor, x, a response, y, and a moderator, z. Assume that y is continuous and consider the following linear model relating x to y:

$$ y_i = \alpha_0 + \alpha_1 x_i + \epsilon_i, \qquad \epsilon_i \sim (0, \sigma^2), \qquad 1 \le i \le n, \tag{2.1} $$

where i indexes the subjects from a sample of size n and $(0, \sigma^2)$ denotes a random variable with mean 0 and variance $\sigma^2$. For robust inference, we only assume that $\epsilon_i$ has a mean of 0 and is uncorrelated with $x_i$. The latter is known as the pseudo-isolation condition, which enables one to establish the influence of x on y isolated from $\epsilon$ in a causal relationship (e.g., Bollen, 1989).
A moderator is a variable that affects or modifies the relationship between x and y. In a conceptual sense, if z is a moderator, it interacts with the predictor x to alter the effect of the latter on the response y. Because of this "interaction" interpretation, a popular approach in many moderation analyses is to include first-order ($xz$) or even higher-order (e.g., $x^2 z$) moderator-by-predictor interactions to examine the moderation effect. For example, by including the first-order x-by-z interaction in (2.1), we obtain:

$$ y_i = \alpha_0 + \alpha_1 x_i + \alpha_2 x_i z_i + \epsilon_i. \tag{2.2} $$

Under this model, the effect of x on y, defined by $\alpha_1$ in (2.1), has been altered and replaced by a function of z, $\alpha_1 + \alpha_2 z_i$ (Aiken and West, 1991; Neter et al., 1990). Because of this, a moderator is also known as an effect modifier.
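As a minimal sketch, assuming simulated data with hypothetical coefficient values (not taken from any study), the Python code below fits (2.2) by ordinary least squares and evaluates the conditional effect of x, $\alpha_1 + \alpha_2 z$, at a few values of z (the "simple slopes" of Aiken and West, 1991). The small Gaussian-elimination solver is written out only to keep the block dependency-free; in practice a statistics package would be used.

```python
import random

def solve(A, b):
    """Solve the small dense system A @ beta = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * beta[j] for j in range(r + 1, n))
        beta[r] = (M[r][n] - s) / M[r][r]
    return beta

def fit_model_22(xs, zs, ys):
    """Ordinary least squares for (2.2): y = a0 + a1*x + a2*x*z + error."""
    rows = [(1.0, x, x * z) for x, z in zip(xs, zs)]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(A, b)

def effect_of_x(coefs, z):
    """Conditional effect (slope) of x on y at moderator value z: a1 + a2*z."""
    _, a1, a2 = coefs
    return a1 + a2 * z

# Simulated data with hypothetical true values a0 = 1, a1 = 2, a2 = 0.5.
random.seed(7)
n = 2000
xs = [random.gauss(0, 1) for _ in range(n)]
zs = [random.gauss(0, 1) for _ in range(n)]
ys = [1.0 + 2.0 * x + 0.5 * x * z + random.gauss(0, 1) for x, z in zip(xs, zs)]

coefs = fit_model_22(xs, zs, ys)
for z in (-1.0, 0.0, 2.0):
    print(z, round(effect_of_x(coefs, z), 2))  # true effects: 1.5, 2.0, 3.0
```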
Although moderation does translate into analytic interactions in the case of (2.2), it is a fundamentally different concept. Indeed, simply interpreting moderation as analytic interaction can have serious ramifications. In particular, not all interactions have a moderation interpretation. For example, consider the two scatter plots in Figure 1, in which the relationship between y and x is plotted for the two levels of a binary variable z, indicated by circles ($z_i = 0$) and squares ($z_i = 1$). The data in the left diagram are modeled by:

$$ y_i = \alpha_0 + \alpha_1 x_i + \delta_0 z_i + \delta_1 z_i x_i + \epsilon_i. \tag{2.3} $$

In this model, the interaction $z_i x_i$ changes the slopes of the linear relations between x and y at the two levels of z, and thus modifies the initial linear relationship in (2.1) to allow for a differential effect by z. Note that (2.3) has an extra term, $\delta_0 z_i$, and is thus different from (2.2). This additional main effect of z in (2.3) accommodates the different intercepts corresponding to the two levels of z. The data in the right diagram are modeled by a quadratic model:

$$ y_i = \alpha_0 + \alpha_1 x_i + \delta_0 z_i + \delta_1 z_i x_i + \delta_2 z_i x_i^2 + \epsilon_i. \tag{2.4} $$

Although similar, (2.4) is fundamentally different from (2.3) as a model for moderation. Unlike (2.3), it does not have a moderation interpretation with respect to (2.1): rather than merely modifying the effect of x on y, it postulates quite a different relationship between x and y by adding the quadratic term $z_i x_i^2$. Thus, z cannot be considered a modifier of the linear relationship between y and x as initially modeled in (2.1). An obvious difference between (2.3) and (2.4) is that the latter involves a higher-order interaction, $z_i x_i^2$. However, a moderation model for (2.1) is not restricted to first-order interactions. For example, the model below contains a higher-order interaction between x and z:

$$ y_i = \alpha_0 + \alpha_1 x_i + \alpha_2 z_i x_i + \alpha_3 z_i^2 x_i + \epsilon_i. \tag{2.5} $$

It is nevertheless a moderation model for (2.1), since the inclusion of the interaction $z_i^2 x_i$ does not change the initial linear relationship between y and x. In comparison to (2.2), it modifies the effect of x on y through a slightly more complex quadratic form, $\alpha_1 + \alpha_2 z_i + \alpha_3 z_i^2$.
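The distinction drawn above can be checked numerically. The sketch below uses hypothetical coefficient values and takes (2.3), (2.4) and (2.5) in the forms written in the function docstrings (with (2.4) being (2.3) plus a $z x^2$ term). For each fixed z it tests whether the mean is still a straight line in x, as (2.1) requires. It is a finite-grid check of a necessary condition, not a proof.

```python
# Hypothetical coefficient values, chosen only for illustration.
A0, A1, A2, A3 = 1.0, 2.0, 0.5, -0.3
D0, D1, D2 = 0.8, 0.4, 0.3

def m23(x, z):
    """Model (2.3): z shifts both the intercept and the slope of x."""
    return A0 + A1 * x + D0 * z + D1 * z * x

def m24(x, z):
    """Model (2.4): (2.3) plus a z * x^2 term, so the curve in x is no longer a line."""
    return A0 + A1 * x + D0 * z + D1 * z * x + D2 * z * x ** 2

def m25(x, z):
    """Model (2.5): the slope of x is the quadratic A1 + A2*z + A3*z^2."""
    return A0 + A1 * x + A2 * z * x + A3 * z ** 2 * x

def linear_in_x(m, zs=(0.0, 1.0, 2.5), xs=(-2.0, 0.0, 1.0, 3.0), tol=1e-9):
    """Return True if x -> m(x, z) has a constant slope on the x grid for every z."""
    for z in zs:
        slopes = [(m(xs[j + 1], z) - m(xs[j], z)) / (xs[j + 1] - xs[j])
                  for j in range(len(xs) - 1)]
        if max(slopes) - min(slopes) > tol:
            return False
    return True

for name, m in [("(2.3)", m23), ("(2.4)", m24), ("(2.5)", m25)]:
    print(name, linear_in_x(m))   # True, False, True
```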
Note that although (2.4) is not a moderation model for (2.1), it may be viewed as such a model for a different, quadratic relationship between y and x:

$$ y_i = \alpha_0 + \alpha_1 x_i + \alpha_2 x_i^2 + \epsilon_i, \tag{2.6} $$

since z in (2.4) modifies the effect of x on y by altering the coefficients associated with the linear and quadratic terms. In general, any model created by adding analytic interactions to (2.1) can be viewed as a moderation model for some model relating y to x. However, a moderation model for (2.1) should only alter the effect of x on y, without changing the original linear relationship between x and y. Thus, the essence of the definition of a moderator z is that the model form remains the same, while the coefficients may change and become functions of z.
The examples above show that not all interactions have a moderation interpretation. The reverse is also true: interaction in the conceptual sense has a broader meaning and is not limited to analytic interactions in the form of cross-variable products. Consider, for example, a non-linear model relating x to y:

$$ y_i = \alpha_0 e^{-\alpha_1 x_i} + \alpha_2 e^{-\alpha_3 x_i} + \epsilon_i. \tag{2.7} $$

This bi-exponential model, which is not a linear model because y is not a linear function of the coefficients $\alpha_1$ and $\alpha_3$, is widely used in biomedical research to model plasma concentration y as a function of time x (Neter et al., 1990; Davidian and Giltinan, 1995). Although the model is non-linear, the effect of x on y is still defined by the $\alpha_k$ ($0 \le k \le 3$); if z alters the effect of x on y, it must do so through changes in these coefficients. For example, if $\alpha_2$ is a linear function of z, $\alpha_2(z) = \alpha_{20} + \alpha_{21} z$, it follows that:

$$ y_i = \alpha_0 e^{-\alpha_1 x_i} + \alpha_{20} e^{-\alpha_3 x_i} + \alpha_{21} z_i e^{-\alpha_3 x_i} + \epsilon_i. \tag{2.8} $$

Compared with (2.7), this model has the extra term $\alpha_{21} z_i e^{-\alpha_3 x_i}$. Obviously, this term is not an analytic interaction of the form xz. Further, because the function of x that multiplies $z_i$ itself depends on the unknown parameter $\alpha_3$, the term cannot be expressed in the more general form of an analytic interaction $h(x)g(z)$, where h(x) and g(z) denote known functions of x and z, respectively. However, z still modifies the effect of x on y and thus conforms to the conceptual notion of moderation. As in the linear case, if we literally add analytic interactions to (2.8), we may end up with models that have no moderation interpretation. For example, if we simply add the analytic interaction xz to (2.8), we obtain:

$$ y_i = \alpha_0 e^{-\alpha_1 x_i} + \alpha_{20} e^{-\alpha_3 x_i} + \alpha_{21} z_i e^{-\alpha_3 x_i} + \alpha_4 x_i z_i + \epsilon_i. \tag{2.9} $$

As with (2.4), this actually represents a new model for the relationship between y and x, rather than a moderation model accounting for the effect of z on the relationship between x and y specified by the original model (2.7).
Note that the problem with interpreting conceptual interaction as simply analytic interaction involving cross-variable products has also been noted by Kraemer et al. (2001). By considering analytic interactions across different types of models (e.g., linear and logistic), they demonstrated that the presence of such interaction effects depends on the type of model being fitted. Our considerations above complement their findings by further elucidating the mechanism that causes such model dependency when moderation is defined through interactions.

A varying-coefficient-model-based general framework for moderation: moving beyond interactions
As illustrated in the preceding section, current methods for moderation analysis developed on the premise of analytic interactions are problematic. If used without caution, they may give rise to models that have no moderation interpretation in the conceptual sense. In addition, such interaction-based strategies generally do not work for non-linear models, since interactions between a predictor and a moderator need not take the form of cross-variable products. In this section, we address these issues systematically by proposing an approach that does not rely on analytic interactions.
Let us start with the linear model (2.1) again. As discussed earlier, this model is determined by the coefficients, or parameters, $\alpha_0$ and $\alpha_1$. Thus, if z interacts with x to alter the relationship between x and y, these parameters become functions of z, i.e.,

$$ y_i = \alpha_0(z_i) + \alpha_1(z_i) x_i + \epsilon_i. \tag{2.10} $$

Unlike the models in (2.2) and (2.3), no specific form is assumed for $\alpha_k(z)$ (k = 0, 1). Thus, (2.10) represents a general class of models for moderation derived from the original model (2.1). For example, if we know a priori that

$$ \alpha_0(z) = \alpha_0, \qquad \alpha_1(z) = \alpha_1 + \alpha_2 z, $$

then we immediately obtain the model in (2.2). The model in (2.10) automatically excludes models, such as (2.4), that contain analytic interactions but have no moderation interpretation. The linear model (2.10), with coefficients that are functions of a variable, is known as the varying-coefficient linear model (Fan and Zhang, 1999; Hastie and Tibshirani, 1993). Thus, our approach to defining the effect of a moderator z on the linear model (2.1) is to redefine the coefficients so that they become functions of z. This principle readily applies to more general models, such as generalized linear and non-linear models (McCullagh and Nelder, 1989; Davidian and Giltinan, 1995). For example, the generalized linear model for a binary response is expressed as:

$$ y_i \mid x_i \sim \mathrm{Bi}(p_i, 1), \qquad g(p_i) = \alpha_0 + \alpha_1 x_i, \tag{2.11} $$

where Bi(p, 1) denotes a binomial (Bernoulli) distribution with sample size 1 and probability of success p. Thus, in (2.11), the mean $p_i$ of $y_i$ is modeled as a function of $x_i$ through a link function $g(\cdot)$. The most popular choice is the logit function,

$$ g(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \alpha_0 + \alpha_1 x_i, \tag{2.12} $$

though other link functions, such as the probit link, are also often used (McCullagh and Nelder, 1989; Tu et al., 1999). Once a link function is chosen, the relationship between y and x is determined by the parameters $\alpha_0$ and $\alpha_1$. Thus, as in the linear case, we define the effect of a moderator z by letting $\alpha_k$ be a function of z:

$$ y_i \mid x_i, z_i \sim \mathrm{Bi}(p_i, 1), \qquad g(p_i) = \alpha_0(z_i) + \alpha_1(z_i) x_i. \tag{2.13} $$

In the logistic case, the $\alpha_k$ in (2.12) become functions of z.
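A sketch under the assumption of a binary moderator and simulated data with hypothetical coefficients: with z binary, $\alpha_k(z)$ in (2.13) can take only two values, so the model can be fitted by maximum likelihood separately within each level of z (the likelihood factorizes by level). The code does this with a hand-written Newton-Raphson routine for the logit link (2.12); a GLM routine from a statistics package would normally be used instead.

```python
import math
import random

def fit_logistic(xs, ys, iters=25):
    """Maximum likelihood for logit(p) = b0 + b1*x via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for b0
            g1 += (y - p) * x      # score for b1
            h00 += w               # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical varying coefficients for a binary moderator z in (2.13):
# (alpha0(0), alpha1(0)) = (-0.5, 1.0); (alpha0(1), alpha1(1)) = (0.0, 2.0).
TRUE = {0: (-0.5, 1.0), 1: (0.0, 2.0)}

random.seed(11)
fits = {}
for z, (a0, a1) in TRUE.items():
    xs = [random.gauss(0, 1) for _ in range(4000)]
    ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-(a0 + a1 * x))) else 0
          for x in xs]
    fits[z] = fit_logistic(xs, ys)   # separate fit within each level of z

for z, (b0, b1) in fits.items():
    print(z, round(b0, 2), round(b1, 2))
```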
The definition carries through in a straightforward fashion for non-linear models. For example, by modeling the coefficients in (2.7) as functions of z, we obtain:

$$ y_i = \alpha_0(z_i) e^{-\alpha_1(z_i) x_i} + \alpha_2(z_i) e^{-\alpha_3(z_i) x_i} + \epsilon_i. $$

As with (2.10) in the linear case, this model includes (2.8) as a special case but excludes (2.9) as a model for moderation analysis.
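Note that in semiparametric regression analysis, models are specified by the conditional mean of the response given the predictor (Robins et al., 1995):

$$ E(y_i \mid x_i) = h(x_i, \boldsymbol{\alpha}), \tag{2.14} $$

where $\boldsymbol{\alpha}$ is the vector of parameters, or coefficients, and $h(x, \boldsymbol{\alpha})$ is a function of x and $\boldsymbol{\alpha}$. Under the semiparametric regression setup, a moderator z can affect only $\boldsymbol{\alpha}$, without altering the functional form $h(x, \boldsymbol{\alpha})$, i.e.,

$$ E(y_i \mid x_i, z_i) = h(x_i, \boldsymbol{\alpha}(z_i)). $$

For example, by expressing (2.1) in the form (2.14), we obtain:

$$ E(y_i \mid x_i) = h(x_i, \boldsymbol{\alpha}) = \alpha_0 + \alpha_1 x_i, \qquad \boldsymbol{\alpha} = (\alpha_0, \alpha_1)^\top. $$

Thus, (2.2) and (2.3) are both moderation models for (2.1), since z alters only the parameter vector. However, (2.4) is not a moderation model for (2.1), since it also changes the functional form $h(x, \boldsymbol{\alpha})$.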
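The semiparametric definition translates directly into code: a moderator is allowed to act only through the parameter vector passed to an unchanged mean function h. The helper name `moderate` and all parameter values below are hypothetical, chosen to reproduce the structure of (2.2) and (2.8).

```python
import math

def moderate(h, alpha_of_z):
    """Return the moderated mean E(y | x, z) = h(x, alpha(z)).

    The functional form h is kept unchanged; only its parameter vector
    is allowed to depend on the moderator z.
    """
    return lambda x, z: h(x, alpha_of_z(z))

def h_linear(x, alpha):
    """Mean function of (2.1): alpha0 + alpha1 * x."""
    a0, a1 = alpha
    return a0 + a1 * x

def h_biexp(x, alpha):
    """Mean function of the bi-exponential model (2.7)."""
    a0, a1, a2, a3 = alpha
    return a0 * math.exp(-a1 * x) + a2 * math.exp(-a3 * x)

# Hypothetical alpha(z) choices reproducing (2.2) and (2.8), respectively.
mean_22 = moderate(h_linear, lambda z: (1.0, 2.0 + 0.5 * z))
mean_28 = moderate(h_biexp, lambda z: (5.0, 1.2, 2.0 + 0.7 * z, 0.3))

print(mean_22(3.0, 2.0))   # 1.0 + (2.0 + 0.5 * 2.0) * 3.0 = 10.0
print(round(mean_28(1.5, 2.0), 4))
```

By construction, no choice of `alpha_of_z` can turn `h_linear` into the quadratic form (2.4), which is exactly the exclusion the definition is meant to enforce.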

Inference for varying-coefficient models
Procedures for fitting varying-coefficient models are based on the idea of "local averaging." For example, for the linear varying-coefficient model in (2.10), we first fix a value of z and fit the model using only the data with moderator values close to it (within a window), treating z as a constant. By moving this value over the observed range of z, we obtain estimates of $\alpha_k(z)$ as functions of z. This approach enables us both to determine an appropriate functional form for $\alpha_k(z)$ and to make inference about it (e.g., Carroll et al., 1998; Fan and Zhang, 1999; Hart, 1997; Hastie and Tibshirani, 1993). To overcome varying degrees of sparseness and gaps in the distribution of z, methods have been developed that use all the data in estimating $\alpha_k(z)$ by employing varying window sizes and weighted averaging, with more weight attached to observations closer to z. This so-called "kernel smoothing" approach produces smooth estimates of $\alpha_k(z)$ that can be used to test whether $\alpha_k(z)$ actually depends on z and to suggest an appropriate functional form for modeling it. Since inference for general varying-coefficient models is an area of ongoing methodological research, we do not pursue estimation for general varying-coefficient models in this paper. Rather, we discuss several special cases of the varying-coefficient linear model and illustrate how such models can be fitted using standard procedures.
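To make the "local averaging" idea concrete, here is a minimal kernel-smoothing sketch for (2.10). It assumes simulated data in which $\alpha_0(z) = 1 + z$ and $\alpha_1(z) = \sin(2\pi z)$ (hypothetical choices), a Gaussian kernel, and a fixed bandwidth of 0.05 (also an assumption; bandwidth selection is not addressed). At each grid value $z_0$ it regresses y on x by weighted least squares, treating the coefficients as constant near $z_0$. This is the simple local-constant version; local-linear variants, which reduce bias near the edges of the range of z, are also widely used.

```python
import math
import random

def local_coefficients(z0, xs, zs, ys, bandwidth):
    """Kernel-weighted least squares estimate of (alpha0(z0), alpha1(z0)) in (2.10).

    Observations are weighted by a Gaussian kernel in (z - z0) / bandwidth,
    and y is regressed on x treating the coefficients as constant near z0.
    """
    w = [math.exp(-0.5 * ((z - z0) / bandwidth) ** 2) for z in zs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    a1 = sxy / sxx
    return my - a1 * mx, a1

# Hypothetical truth: alpha0(z) = 1 + z, alpha1(z) = sin(2*pi*z), z ~ U(0, 1).
random.seed(3)
n = 5000
zs = [random.random() for _ in range(n)]
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [1 + z + math.sin(2 * math.pi * z) * x + random.gauss(0, 0.5)
      for x, z in zip(xs, zs)]

grid = [0.1 * j for j in range(1, 10)]
curve = [(z0, local_coefficients(z0, xs, zs, ys, bandwidth=0.05)) for z0 in grid]
for z0, (a0, a1) in curve:
    print(f"z={z0:.1f}  alpha0={a0:.2f}  alpha1={a1:.2f}")
```

Plotting the estimated curve against z is what suggests a parametric form for $\alpha_k(z)$, such as linear or quadratic, in the sense described above.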

Binary and Categorical Moderator
Since the case with a binary z can be subsumed into the discussion of a categorical moderator, we consider only a categorical z and assume that z has a total of K categories.
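For such a moderator, (2.10) becomes:

$$ y_{ki} = \alpha_{0k} + \alpha_{1k} x_{ki} + \epsilon_{ki}, \qquad 1 \le i \le n_k, \quad 1 \le k \le K. \tag{2.15} $$

In this case, the original sample is partitioned into K sub-samples, the kth of size $n_k$, and a different linear relationship is postulated for each sub-sample, as characterized by the coefficients, or parameters, $\alpha_{0k}$ and $\alpha_{1k}$ ($1 \le k \le K$).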
Least squares or estimating equations can be used to estimate the parameters (McCullagh and Nelder, 1989). If z is truly a moderator, then at least two of the $\alpha_{1k}$'s will differ. Thus, we can ascertain the moderation role of z by testing the following hypothesis:

$$ H_0: \alpha_{11} = \alpha_{12} = \cdots = \alpha_{1K} \quad \text{vs.} \quad H_a: \alpha_{1k} \neq \alpha_{1l} \text{ for some } k \neq l. \tag{2.16} $$

Note that z may sometimes change only the intercepts without affecting the slope, i.e., $\alpha_{1k} = \alpha_1$ for all $1 \le k \le K$. In this case, z is a covariate rather than a moderator, since a moderator must change the effect of x on y.
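One standard way to carry out the test in (2.16) with least squares is a nested-model F test comparing (2.15) with the reduced model that keeps K intercepts but a common slope. The sketch below computes that F statistic from within-group sums of squares on simulated, hypothetical data. It relies on homoscedastic errors, so it stands in for, and is not the same as, the robust estimating-equation inference mentioned in the text. The p-value would come from the F(K − 1, N − 2K) distribution (with K = 3 groups of 200, the 5% critical value is roughly 3.0); that step is left to a statistics library.

```python
import random

def group_stats(xs, ys):
    """Centered sums of squares and cross-products within one category of z."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return n, sxx, sxy, syy

def equal_slopes_f(groups):
    """F statistic for H0 in (2.16): a common slope alpha_1k across K groups.

    groups is a list of (xs, ys) pairs, one per category of the moderator z.
    Full model (2.15): separate intercept and slope per group.
    Reduced model: separate intercepts, one common slope.
    """
    stats = [group_stats(xs, ys) for xs, ys in groups]
    k = len(stats)
    n_total = sum(s[0] for s in stats)
    rss_full = sum(syy - sxy ** 2 / sxx for _, sxx, sxy, syy in stats)
    pooled_sxx = sum(s[1] for s in stats)
    pooled_sxy = sum(s[2] for s in stats)
    rss_reduced = sum(s[3] for s in stats) - pooled_sxy ** 2 / pooled_sxx
    df1, df2 = k - 1, n_total - 2 * k
    f_stat = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f_stat, df1, df2

def simulate(slopes, n_per_group=200, seed=0):
    """Hypothetical data: group k has intercept k and the given slope."""
    rng = random.Random(seed)
    groups = []
    for k, slope in enumerate(slopes):
        xs = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ys = [k + slope * x + rng.gauss(0, 1) for x in xs]
        groups.append((xs, ys))
    return groups

f_null, df1, df2 = equal_slopes_f(simulate([1.0, 1.0, 1.0]))  # z only shifts intercepts
f_alt, _, _ = equal_slopes_f(simulate([1.0, 2.0, 3.0]))       # z changes the slope
print(df1, df2, round(f_null, 2), round(f_alt, 1))
```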

Binary and categorical predictor
As in the preceding section, we consider only a categorical predictor x with K levels. Let $n_k$ denote the sub-sample size for the kth level of x ($1 \le k \le K$). The varying-coefficient model in (2.10) reduces to:

$$ y_{ki} = \alpha_k(z_{ki}) + \epsilon_{ki}, \qquad 1 \le i \le n_k, \quad 1 \le k \le K. \tag{2.17} $$

As a special case, if $\alpha_k(z_{ki}) = \alpha_k$, we immediately obtain from (2.17) an analysis of variance (ANOVA) model, with $\alpha_k$ interpreted as the cell mean of the kth sub-sample, or group. Thus, the variable z in (2.17) can be viewed as modifying the cell or group means of an ANOVA. Now, consider a linear $\alpha_k(z_{ki}) = \gamma_{0k} + \gamma_{1k} z_{ki}$, in which case (2.17) becomes:

$$ y_{ki} = \gamma_{0k} + \gamma_{1k} z_{ki} + \epsilon_{ki}. \tag{2.18} $$

In this case, the difference between the means of two groups, k and r, at a given value z of the moderator is:

$$ E(y_{ki} \mid z) - E(y_{ri} \mid z) = (\gamma_{0k} - \gamma_{0r}) + (\gamma_{1k} - \gamma_{1r}) z. \tag{2.19} $$

The second term in (2.19) represents the differential effect of z on the means of the two groups and constitutes the moderation effect. If the slopes $\gamma_{1k}$ in (2.18) are equal across all groups, i.e., $\gamma_{1k} \equiv \gamma_1$ ($1 \le k \le K$), then this differential effect is zero and the model in (2.18) reduces to:

$$ y_{ki} = \gamma_{0k} + \gamma_1 z_{ki} + \epsilon_{ki}. \tag{2.20} $$

This is an analysis of covariance (ANCOVA) model, with $\gamma_1 z_{ki}$ the adjustment for the effect of $z_{ki}$ on the mean response (e.g., Neter et al., 1990). Unlike in (2.18), z in (2.20) exerts the same effect on the mean response across all groups. Thus, in (2.20), z still modifies the effect of x on y (in the form of the group means defined by the levels of x), but it does so uniformly across the groups (i.e., there is no x-by-z interaction). In real applications, one or more of the $\alpha_k(z)$ in (2.17) may be non-linear or more complex functions. For example, if K = 2, (2.17) with $\alpha_k(z)$ given by

$$ \alpha_k(z) = \gamma_{0k} + \gamma_{1k} I_{[z \le c]}, \qquad k = 1, 2, \tag{2.21} $$

can be used to model the scenario in which the effect of the dichotomous variable x on y follows a step function defined by some cut-off c of the moderator z, as depicted in Figure 2 of Baron and Kenny (1986), where $I_{[z \le c]}$ denotes the set indicator, with $I_{[z \le c]} = 1$ if $z \le c$ and 0 otherwise. An advantage of formulating the model as the varying-coefficient model (2.17) is that we can use smoothing techniques to estimate the functional form of $\alpha_k(z)$, so that the value of the cut-off c in (2.21) need not be specified a priori. We illustrate how such an approach works with a real study example in Section 3.
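For the linear case (2.18), the moderation effect in (2.19) can be read off directly from per-group least squares fits. A minimal sketch with toy, noise-free data (hypothetical values) follows; the difference function it returns is constant exactly when the two fitted slopes coincide, i.e. under the ANCOVA model (2.20).

```python
def fit_line(zs, ys):
    """Least squares intercept and slope of y on z within one group, as in (2.18)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    g1 = szy / szz
    return my - g1 * mz, g1

def group_difference(fit_k, fit_r):
    """Mean difference between groups k and r at moderator value z, as in (2.19)."""
    (g0k, g1k), (g0r, g1r) = fit_k, fit_r
    return lambda z: (g0k - g0r) + (g1k - g1r) * z

# Toy, noise-free data (hypothetical): group k has mean 2 + 0.5 z, group r has 1 + 0.1 z.
zs = [0.0, 1.0, 2.0, 3.0, 4.0]
fit_k = fit_line(zs, [2 + 0.5 * z for z in zs])
fit_r = fit_line(zs, [1 + 0.1 * z for z in zs])
diff = group_difference(fit_k, fit_r)
for z in (0.0, 2.5, 5.0):
    print(z, round(diff(z), 3))   # 1.0, 2.0, 3.0
```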

Relationship to linear mixed-effects model
The linear varying-coefficient model (2.10) is also closely related to the linear mixed-effects model (LMM), or hierarchical linear model (HLM) (Laird and Ware, 1982; Gibbons et al., 1994; Raudenbush, 1994). In particular, the varying coefficients $\alpha_k(z)$ can be viewed as the means of random individual coefficients given the moderator value z. In this sense, we can derive the model in (2.10) from the perspective of this popular modeling framework.
As in the usual derivation of the linear mixed-effects model, at the first level we assume a linear model with random individual effects:

$$ y_i = \alpha_{0i} + \alpha_{1i} x_i + \epsilon_i. \tag{2.22} $$

In this model, $\alpha_{0i}$ and $\alpha_{1i}$ are random variables and are assumed to be uncorrelated with $\epsilon_i$. In the usual linear mixed-effects model, $\alpha_{0i}$ and $\alpha_{1i}$ are assumed to have a joint normal distribution and $\epsilon_i$ is also assumed to be normal; in (2.22), we do not assume such parametric distributions. At the second level, we model each $\alpha_{ki}$ conditional on $z_i$ as:

$$ \alpha_{ki} = \alpha_k(z_i) + e_{ki}, \qquad k = 0, 1, \tag{2.23} $$

where $e_{ki}$ is assumed to have mean 0 and to be uncorrelated with both $z_i$ and $x_i$. It follows from the assumption on $\alpha_{ki}$ that $e_{ki}$ is also uncorrelated with $\epsilon_i$ for k = 0, 1. Combining (2.22) and (2.23), we immediately obtain the following mixed-effects model:

$$ y_i = \alpha_0(z_i) + \alpha_1(z_i) x_i + \tilde{\epsilon}_i, \tag{2.24} $$

where $\tilde{\epsilon}_i = e_{0i} + e_{1i} x_i + \epsilon_i$. As the random effects $e_{ki}$ are not of interest, they have been combined with $\epsilon_i$ to form the model error in (2.24). It is readily shown that $\tilde{\epsilon}_i$ is uncorrelated with $z_i$ and $x_i$, and thus satisfies the pseudo-isolation condition. Thus, (2.24) defines the same linear varying-coefficient model as (2.10), except that the error variance depends on $x_i$, through the term $e_{1i} x_i$, rather than being constant as in (2.10).
Note that if $e_{1i} = 0$ in (2.23), the error $\tilde{\epsilon}_i = e_{0i} + \epsilon_i$ no longer depends on $x_i$, and we obtain the same model as in (2.10).
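A small simulation, with hypothetical coefficient values and a binary moderator, illustrates the two-level derivation: data are generated from (2.22) and (2.23), and the combined model (2.24) is then fitted by least squares within each level of z. The estimated coefficients recover $\alpha_k(z)$, while the mean squared residual is clearly larger for large $|x|$ than for small $|x|$, reflecting the $e_{1i} x_i$ term in the combined error. The normal distributions used for the random terms are an assumption of the sketch only; the text does not require them.

```python
import random

random.seed(5)
ALPHA0 = {0: 1.0, 1: 2.0}   # hypothetical alpha_0(z) for z = 0, 1
ALPHA1 = {0: 0.5, 1: 1.5}   # hypothetical alpha_1(z) for z = 0, 1
SD = 0.5                    # common SD of e_0i, e_1i and epsilon_i (assumed)

data = []
for _ in range(20000):
    z = 1 if random.random() < 0.5 else 0
    x = random.uniform(-2.0, 2.0)
    a0_i = ALPHA0[z] + random.gauss(0, SD)      # (2.23), k = 0
    a1_i = ALPHA1[z] + random.gauss(0, SD)      # (2.23), k = 1
    y = a0_i + a1_i * x + random.gauss(0, SD)   # (2.22)
    data.append((z, x, y))

def fit_line(pairs):
    """Least squares intercept and slope for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Fit the combined model (2.24) within each level of the binary moderator.
fits = {z: fit_line([(x, y) for zz, x, y in data if zz == z]) for z in (0, 1)}

def mean_sq_residual(lo, hi):
    """Average squared residual from (2.24) for observations with lo <= |x| < hi."""
    r = [(y - fits[z][0] - fits[z][1] * x) ** 2
         for z, x, y in data if lo <= abs(x) < hi]
    return sum(r) / len(r)

print({z: (round(b0, 2), round(b1, 2)) for z, (b0, b1) in fits.items()})
print(round(mean_sq_residual(0.0, 0.5), 2), round(mean_sq_residual(1.5, 2.0), 2))
```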

Longitudinal data analysis
The extension to longitudinal data analysis is straightforward. As before, we consider only a continuous response y with a predictor x and a moderator z. For convenience, we model such a response using the linear mixed-effects model, as this approach is widely used for longitudinal data (e.g., Laird and Ware, 1982; Gibbons et al., 1994; Raudenbush, 1994).
Consider a longitudinal study with n subjects and m assessment points. For illustration purposes, we consider only linear growth-curve analysis, in which the trajectory of each subject is modeled as a linear function of time:

$$ y_{it} = (\alpha_0 + b_{0i}) + (\alpha_1 + b_{1i}) x_i + (\alpha_2 + b_{2i}) t + (\alpha_3 + b_{3i}) x_i t + \epsilon_{it}, \tag{2.25} $$

where t denotes time, the $\alpha_k$ are the fixed effects for the population mean, and the $b_{ki}$ are random effects accounting for individual differences ($0 \le k \le 3$). As is common in the literature, we assume that $\epsilon_{it}$ follows a normal distribution with mean 0 and variance $\sigma^2$, and that the $b_{ki}$ ($0 \le k \le 3$) follow a joint normal distribution with mean 0 and covariance matrix $\Sigma_b$. We define moderation as the effect of z on the fixed effects $\alpha_k$, i.e., $\alpha_k(z)$ is a function of z.
In most applications, interest lies in whether z moderates treatment differences. For example, suppose that x is a binary indicator for two treatment conditions. The varying-coefficient linear model in this case is given by:

$$ y_{it} = \alpha_0(z_i) + \alpha_1(z_i) x_i + \alpha_2(z_i) t + \alpha_3(z_i) x_i t + b_{0i} + b_{1i} x_i + b_{2i} t + b_{3i} x_i t + \epsilon_{it}. \tag{2.26} $$

In this model, $\alpha_0(z_i)$ moderates the within-treatment effect, $\alpha_2(z_i)$ the change over time due to the within-treatment effect, $\alpha_1(z_i)$ the between-treatment effect at baseline, and $\alpha_3(z_i)$ the change over time due to the between-treatment difference.
If all the varying coefficients are linear functions of z, $\alpha_k(z) = \gamma_{k0} + \gamma_{k1} z$, then (2.26) gives rise to the usual approach of modeling the moderation effect by including analytic interactions involving x and z. In this case, (2.26) simplifies to:

$$ y_{it} = \gamma_{00} + \gamma_{01} z_i + (\gamma_{10} + \gamma_{11} z_i) x_i + (\gamma_{20} + \gamma_{21} z_i) t + (\gamma_{30} + \gamma_{31} z_i) x_i t + b_{0i} + b_{1i} x_i + b_{2i} t + b_{3i} x_i t + \epsilon_{it}. \tag{2.27} $$

In most randomized trials, the mean response does not differ between treatment conditions at baseline, so that $\gamma_{10} = 0$. In addition, if randomization works effectively, z should not have a differential effect at baseline, which implies that $\gamma_{11} = 0$. So, (2.27) further simplifies to:

$$ y_{it} = \gamma_{00} + \gamma_{01} z_i + (\gamma_{20} + \gamma_{21} z_i) t + (\gamma_{30} + \gamma_{31} z_i) x_i t + b_{0i} + b_{1i} x_i + b_{2i} t + b_{3i} x_i t + \epsilon_{it}. \tag{2.28} $$

In this model, the treatment-by-time interaction, $\gamma_{30}$, represents the treatment difference over time in the absence of the moderator z (i.e., when z = 0), while the treatment-by-time-by-moderator interaction, $\gamma_{31}$, represents the moderation effect of z on the treatment difference. The model in (2.28) and its generalizations to multiple treatment conditions are widely used for testing moderation effects in longitudinal studies.
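As a rough sketch of how $\gamma_{30}$ and $\gamma_{31}$ in (2.28) relate to subject-level trajectories, the code below simulates data from a simplified version of (2.28) (hypothetical parameter values; only random intercepts and random time slopes are generated) and uses a two-stage summary-statistics approach: a least squares time slope for each subject, then a regression of those slopes on z within each arm. This is a swapped-in technique for illustration, not the mixed-model estimation described in the text; it is reasonable here only because every subject is observed at the same complete set of time points.

```python
import random

def ols_line(us, vs):
    """Intercept and slope from regressing v on u by least squares."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    slope = suv / suu
    return mv - slope * mu, slope

# Hypothetical parameters for (2.28); the b_1i*x_i and b_3i*x_i*t random
# effects are omitted from this simulation for simplicity.
G00, G01, G20, G21, G30, G31 = 10.0, 1.0, -1.0, 0.2, -0.5, 0.4
times = [0, 1, 2, 3, 4]

random.seed(2024)
by_arm = {0: ([], []), 1: ([], [])}   # arm -> (z values, per-subject time slopes)
for _ in range(2000):
    x = random.randint(0, 1)          # randomized binary treatment indicator
    z = random.gauss(0, 1)            # baseline moderator
    b0, b2 = random.gauss(0, 1.0), random.gauss(0, 0.3)
    ys = [G00 + G01 * z + (G20 + G21 * z) * t + (G30 + G31 * z) * x * t
          + b0 + b2 * t + random.gauss(0, 1.0) for t in times]
    _, slope = ols_line(times, ys)    # stage 1: subject-specific time slope
    by_arm[x][0].append(z)
    by_arm[x][1].append(slope)

# Stage 2: regress the time slopes on z within each arm.
c0, s0 = ols_line(*by_arm[0])         # approx (G20, G21)
c1, s1 = ols_line(*by_arm[1])         # approx (G20 + G30, G21 + G31)
gamma30_hat, gamma31_hat = c1 - c0, s1 - s0
print(round(gamma30_hat, 2), round(gamma31_hat, 2))
```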
As in the case of cross-sectional study designs, inference for varying-coefficient models can still be made using the usual estimation procedures when z or x or both are categorical variables.For example, we can use standard estimation procedures to fit the model in (2.28).When both z and x are continuous, inference becomes much more complex and smoothing methods may be used.Again, this issue will not be pursued here.Fortunately, in many randomized studies, treatment differences are modeled by binary indicators, in which case standard procedures can be used to fit linear mixed-effects models with varying-coefficients.

A Real Study Data Application
We illustrate the proposed methodology with data from the National Institute on Drug Abuse Collaborative Cocaine Treatment Study (Crits-Christoph et al., 1999). This randomized, multi-center project investigated the efficacy of psychosocial treatments for cocaine dependence in a sample of 487 patients, who were randomized to one of four treatment conditions: cognitive therapy (CT) plus group drug counseling (GDC), supportive-expressive (SE) therapy plus GDC, individual drug counseling (IDC) plus GDC, and GDC alone. Primary outcome analyses focused on the intent-to-treat sample and examined several measures of drug use (Crits-Christoph et al., 1999).
For illustration purposes, we applied the proposed methodology to data at six months post-treatment, using the drug use composite of the Addiction Severity Index (ASI; McLellan et al., 1992) as the response variable. As a significant difference was found among the four treatment groups, it was of interest to examine whether the treatment differences were moderated by baseline alcohol consumption, as measured by the ASI alcohol use composite.
Let y denote the drug use composite at six months post-treatment and z the pre-treatment alcohol use composite. We applied the ANOVA model with varying coefficients in (2.17) to examine moderation by z. To determine an appropriate analytic form for the mean response of each group, $\alpha_k(z_{ki})$ ($1 \le k \le 4$), we applied a smoothing procedure, LOWESS (locally weighted scatterplot smoothing), to the data from each treatment condition (e.g., Fox, 2000; Loader, 1999). Figure 2 shows the scatter plots together with the fitted LOWESS curves for each of the four treatment groups. The fitted LOWESS curves indicated a quadratic mean response for the SE group, but a linear response for each of the other three groups. Thus, to formally test for moderation by z, we fitted the following quadratic response model:

$$ y_{ki} = \gamma_{k0} + (\gamma_1 + \delta_k) z_{ki} + (\gamma_2 + \eta_k) z_{ki}^2 + \epsilon_{ki}, $$

where k = 1, 2, 3, 4 denote the IDC, CT, SE and GDC treatment groups, respectively. Here, $\gamma_{k0}$ represents the main effect of group k, $\gamma_1$ and $\gamma_2$ are the coefficients for the first- and second-order main effects of the moderator z, the $\delta$'s denote the coefficients for the first-order moderator-by-group interactions, and the $\eta$'s denote the second-order moderator-by-group interactions (for identifiability, one group serves as the reference, with its $\delta$ and $\eta$ equal to zero). For robust inference, we did not assume normality for $\epsilon_{ki}$ and estimated the parameters using estimating equations, or quasi-likelihood (e.g., McCullagh and Nelder, 1989). Table 1 shows the estimated parameters and the associated p-values. The estimated coefficients for the first- and second-order interactions are statistically significant only for the SE group (see the estimates of $\delta_2$ and $\eta_3$ and their associated p-values), indicating that, unlike for the other groups, the response for SE had a quadratic relationship with the moderator. Thus, pre-treatment alcohol use was a moderator, as it affected treatment response differentially between this and the other three treatment conditions.
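For readers who want to reproduce the kind of exploratory smoothing described above, here is a simplified local linear smoother with tricube weights in the spirit of LOWESS. It omits Cleveland's robustness iterations and is shown only on hypothetical, noise-free data; for real analyses a full implementation (for example, `lowess` in R or `statsmodels.nonparametric.smoothers_lowess.lowess` in Python) would be used. The "bend" printed at the end is an ad hoc curvature indicator introduced for illustration, not a formal test.

```python
import math

def lowess_simple(zs, ys, frac=0.5):
    """Local linear smoother with tricube weights (a simplified LOWESS).

    For each target point, the nearest ceil(frac * n) observations define the
    window, points inside it get tricube weights, and a weighted straight line
    is fitted. Cleveland's robustness iterations are omitted.
    """
    n = len(zs)
    k = max(3, math.ceil(frac * n))
    fitted = []
    for z0 in zs:
        dist = [abs(z - z0) for z in zs]
        h = sorted(dist)[k - 1]                       # distance to k-th nearest point
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        else:
            w = [1.0 if d == 0 else 0.0 for d in dist]
        sw = sum(w)
        mz = sum(wi * z for wi, z in zip(w, zs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        szz = sum(wi * (z - mz) ** 2 for wi, z in zip(w, zs))
        szy = sum(wi * (z - mz) * (y - my) for wi, z, y in zip(w, zs, ys))
        slope = szy / szz if szz > 0 else 0.0         # fall back to a local mean
        fitted.append(my + slope * (z0 - mz))
    return fitted

zs = [i / 10 for i in range(31)]                 # hypothetical moderator grid 0.0..3.0
linear_group = [2 + 3 * z for z in zs]           # hypothetical linear mean response
quadratic_group = [(z - 1.5) ** 2 for z in zs]   # hypothetical U-shaped mean response

for label, ys in [("linear", linear_group), ("quadratic", quadratic_group)]:
    fit = lowess_simple(zs, ys, frac=0.3)
    # Crude curvature check: smoothed midpoint minus the average of the two endpoints.
    bend = fit[15] - (fit[0] + fit[-1]) / 2
    print(label, round(bend, 3))
```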
It is interesting to note that when we applied the linear varying-coefficient model (2.18), without the second-order interactions, none of the coefficients were significantly different from 0 (see the estimates and associated p-values in Table 1). Thus, by looking only at first-order interactions, as is traditionally done, we would not have detected any moderation effect in this case. In this application, the varying-coefficient model (2.17) helped identify the correct analytic interactions for modeling moderation by z.
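The observation that first-order interactions can miss a quadratic moderation effect is easy to reproduce in a simulation. In the hypothetical setup below (simulated data, not the study data), z is symmetric around 0 and one group has a U-shaped mean response, so z is nearly uncorrelated with $z^2$. As a result, the group difference in linear slopes is close to zero, while the difference in quadratic coefficients is large.

```python
import random

def solve(A, b):
    """Solve the small dense system A @ beta = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * beta[j] for j in range(r + 1, n))
        beta[r] = (M[r][n] - s) / M[r][r]
    return beta

def poly_fit(zs, ys, degree):
    """Least squares coefficients (c0, c1, ..., c_degree) of y on 1, z, ..., z^degree."""
    rows = [[z ** d for d in range(degree + 1)] for z in zs]
    p = degree + 1
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(A, b)

random.seed(42)

def simulate(quadratic, n=500):
    """Hypothetical group data: z ~ U(-1, 1); optional centered U-shaped term 3*(z^2 - 1/3)."""
    zs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [1 + 0.2 * z + (3 * (z * z - 1 / 3) if quadratic else 0.0) + random.gauss(0, 0.5)
          for z in zs]
    return zs, ys

reference = simulate(quadratic=False)   # hypothetical group with a linear response
u_shaped = simulate(quadratic=True)     # hypothetical group with a U-shaped response

lin_diff = poly_fit(*u_shaped, 1)[1] - poly_fit(*reference, 1)[1]    # first-order only
quad_diff = poly_fit(*u_shaped, 2)[2] - poly_fit(*reference, 2)[2]   # second-order term
print(round(lin_diff, 2), round(quad_diff, 2))
```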

Discussion
In this paper, we discussed a general analytic framework for moderation analysis by defining moderation as a process that modifies an existing relationship between the predictor and outcome.As illustrated by both theoretical considerations and real data analyses, moderation can follow quite a complex process, which may not be modeled by simply including analytic interactions involving the moderator and predictor as in most moderation analyses.Since the relationship between the response and predictor is defined by the coefficients or parameters of a given model, it is logical to define the effect of a moderator through such model parameters.Thus, the proposed approach is consistent with the conceptual definition of a moderation process.
Although the moderation effect often manifests itself as an analytic interaction, especially in linear regression models, not all such interactions can be interpreted as moderation effects. By defining the moderation effect through the varying-coefficient model, we are able to distinguish analytic interactions that have a moderation interpretation from those that do not. Also, since moderation models are defined based on the original model relating the response and the predictor, they are consistent and readily interpretable. Thus, the model-dependence issue pointed out by Kraemer et al. (2001) does not arise. For example, if the original relationship between the response and the predictor is a linear model, the effect of a moderator is limited to modifying the coefficients of that linear model, which rules out other types of models, such as the logistic model, as candidates for moderation analysis.
Since our goal in this paper was to present an appropriate analytic framework for moderation analysis, we did not get into technical details about inference for general varying-coefficient regression models.When there are multiple continuous predictors and moderators, inference for such models may become quite complex, especially with longitudinal study data.We will address these issues in future research.

Figure 1: Two patterns of treatment response and fitted regression lines as a function of moderator z and treatment condition (circles for treatment 1 and squares for treatment 2).

Figure 2: Scatter plots with fitted Lowess curves for each of the four treatment conditions.

Table 1: Estimates of model parameters, standard errors and p-values for the varying-coefficient ANOVA model (2.20) with linear and quadratic mean responses.