Distortion Diagnostics for Covariate-adjusted Regression: Graphical Techniques Based on Local Linear Modeling

Linear regression models are often useful tools for exploring the relationship between a response and a set of explanatory (predictor) variables. When both the observed response and the predictor variables are contaminated/distorted by unknown functions of an observable confounder, inferring the underlying relationship between the latent (unobserved) variables is more challenging. Recently, Şentürk and Müller (2005) proposed the method of covariate-adjusted regression (CAR) analysis for this distorted data setting. In this paper, we describe graphical techniques for assessing departures from or violations of specific assumptions regarding the type and form of the data distortion. The type of data distortion is multiplicative, additive, or no-distortion. The form of the distortion encompasses a class of general smooth distorting functions. However, common confounding adjustment methods in regression analysis implicitly make distortion assumptions, such as assuming additive or multiplicative linear distortions. We illustrate graphical detection of departures from such assumptions on the distortion. The graphical diagnostic techniques are illustrated with numerical and real data examples. The proposed graphical assessment of distortion assumptions is feasible due to the CAR estimation method, which utilizes a local regression technique to estimate a set of transformed distorting functions (Şentürk and Nguyen, 2006).


Introduction

Examples of covariate adjustments in the health sciences
Regression modeling is a useful tool for exploring possible relationships between the primary response and explanatory variables of interest, especially for observational studies. For situations where both the predictors and the response in a regression model are not directly observed, but instead are observed after being contaminated by unknown functions of a common confounder, straightforward applications of regression models may result in misleading conclusions. Adjustment for the effect of the observed confounder is needed. Observable confounders, such as body mass index (BMI) and/or other measures of body configuration, are common in medical or health related studies because they are known confounding variables that affect the primary variables of interest.
The method of covariate-adjusted regression (CAR), proposed by Şentürk and Müller (2005), was designed to infer the underlying relationship between the (latent) primary variables of interest under a general multiplicative data distortion setting. Their method was originally motivated by data on inflammation protein markers in haemodialysis patients. More specifically, a primary outcome variable is elevated plasma fibrinogen level (Kaysen et al., 2003; Şentürk and Müller, 2005). Fibrinogen is a protein found in blood plasma and it is a risk factor for cardiovascular disease in haemodialysis patients. It is of interest to examine the relationship between fibrinogen concentration and other predictors, such as serum transferrin protein level. However, both primary variables of interest, fibrinogen and transferrin protein levels, are known to depend on body mass index (BMI), which exerts a confounding effect on the protein measurements. A common approach to adjust for confounders like BMI is to normalize the primary variables of interest by simply dividing them by the confounder. Şentürk and Nguyen (2006) provide another example of adjustment for BMI in exploring the underlying regression relationship between hypertensive variables and glycosylated hemoglobin (a diagnostic measurement for diabetes).
Adjustment for confounding/distorting covariates is also common in the assessment of the effects of environmental contaminants on human health risks from observational or epidemiological studies. For example, the relationship between exposure to lipophilic agents, such as polychlorinated biphenyls (PCBs), and health outcomes is often analyzed after adjustment for the distorting effect of serum lipid (SL; Schisterman et al., 2005). The covariate adjustment here involves the ratio $PCB/SL^{\rho}$, where the power $\rho$ allows for a more general relationship between PCB and SL.
To provide a more formal description of the above examples, in terms of the multiplicative distortion framework considered here, some notation is needed.
Denote the observed response, the $p$ predictors and the confounder by $\tilde{Y}$, $\{\tilde{X}_1, \ldots, \tilde{X}_p\}$, and $U$, respectively. The confounder $U$ in the two examples above is BMI and SL, respectively. The above examples further suggest that the distortion is believed to be a specific case of multiplicative distortion of the following type:

$$\tilde{Y} = \psi(U)\,Y \quad \text{and} \quad \tilde{X}_r = \phi_r(U)\,X_r, \qquad (1.1)$$

where $Y$ and $\{X_r\}_{r=1}^{p}$ are the underlying (latent) variables of interest. The functions $\psi(\cdot), \phi_1(\cdot), \ldots, \phi_p(\cdot)$ are unknown (smooth) distorting functions. The above distortions can induce an artificial relationship between the observed variables, $\tilde{Y}$ and $\{\tilde{X}_r\}$, that may not reflect the true underlying regression relationship between $Y$ and $\{X_r\}$. The underlying/latent regression relationship is given by $E(Y \mid X_1, \ldots, X_p) = \gamma_0 + \sum_{r=1}^{p} \gamma_r X_r$, where $\{\gamma_r\}_{r=0}^{p}$ are the parameters of interest. The goal is to estimate this latent regression relationship based on the available distorted data, namely $\tilde{Y}$, $\{\tilde{X}_r\}$, and the confounder $U$.
Note that the distortion framework (1.1) accommodates various forms of covariate adjustments. For instance, it allows for linear and/or possibly nonlinear distortion on both $Y$ and $X_1, \ldots, X_p$. In the PCB example, the predictor distortion is assumed to be nonlinear: $\phi(U) = U^{\rho}$. In the inflammation protein marker example, the distortions on the response and predictors are both assumed to be linear: $\psi(U) = \phi(U) = U = \text{BMI}$. This assumption of a common linear distortion is used in practice for its simplicity.
We emphasize that the distortion framework (1.1) allows for unknown contaminating functions. This is appealing from a practical point of view because, in practice, the precise nature of the multiplicative relationships between the confounder and the primary variables of interest is unknown. Lacking this precise knowledge, the practice of dividing by the confounder $U$, or equivalently assuming the specific linear distortion form $\psi(U) = U$ and $\phi_r(U) = U$ in (1.1), imposes unnecessarily rigid constraints on the form of the data distortion. Also, the assumption of a specific linear form under multiplicative distortion may be incorrect. We suggest simple graphical techniques that can be used to check whether this specific assumption, as well as other assumptions regarding the data distortion, is violated.
We point out here that CAR, an adjustment method under distortion framework (1.1), does not restrict the form of the distorting functions, assuming only that they are smooth functions. Using CAR, the regression relationship between the unobserved variables, $Y$ and $\{X_r\}_{r=1}^{p}$, can be consistently estimated based on the distorted data. In addition to allowing the forms of the distorting functions to be more general, CAR also accommodates different types of distortion models, namely: (a) multiplicative distortion (i.e. $\tilde{Y} = \psi(U)Y$ and $\tilde{X}_r = \phi_r(U)X_r$), (b) additive distortion (i.e. $\tilde{Y} = \psi(U) + Y$ and $\tilde{X}_r = \phi_r(U) + X_r$), and (c) no-distortion (i.e. $\tilde{Y} = Y$ and $\tilde{X}_r = X_r$). Under suitable identifiability conditions, given in Section 2, the consistency of the CAR estimators holds under these three types of distortion (Şentürk and Nguyen, 2006; Şentürk and Müller, 2005). Covariate-adjusted regression was originally proposed in Şentürk and Müller (2005) using a rough binning approach for estimation. A more refined estimation method to reduce the variance, based on local regression modeling, was proposed in Şentürk and Nguyen (2006). The asymptotic distributions of the CAR estimators were established in Şentürk and Müller (2006).
In this work, we examine graphical approaches for assessing specific assumptions regarding the types and forms of the data distortion. For example, violations of the assumption of a specific linear distortion, $\psi(U) = \phi_r(U) = U$, under multiplicative distortion can be checked graphically. Another example is the assumption that the above distortion only affects the predictors (i.e. $\psi(U) = 1$). Also, in some cases, it is possible to fully characterize the type of distortion (i.e. no-distortion, additive, or multiplicative) graphically. We describe graphical techniques to assess these and other related assumptions regarding the data distortion in the context of covariate-adjusted regression.
Finally, we note here that the multiplicative distortion framework (1.1) has similarities with measurement error modeling if the distortion by $U$ is thought of as an error affecting both the response and the predictors. However, a distinct difference from the measurement error literature is that the "measurement" error is a function of an observable confounder $U$. Although there is a vast literature on additive measurement error modeling, the work on multiplicative measurement error modeling is limited. Estimation procedures targeting the regression coefficients under multiplicative measurement error in the predictor variables were considered by, for example, Hwang (1986) and Iturria, Carroll and Firth (1999). To our knowledge, the case of multiplicative measurement errors in both the response and predictors has not been considered previously.

An example of the distortion effects
To further introduce and illustrate the potential distortion effects on the underlying regression relationship between $Y$ and $\{X_r\}_{r=1}^{p}$, we consider the following numerical example. Suppose that the underlying (unobserved) regression model of interest is

$$Y = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + e, \qquad (1.2)$$

where the predictors $\{X_1, X_2\}$ are bivariate normal with means $(2, 4)$, variances $(2^2, 1.8^2)$ and correlation $r(X_1, X_2) = 0.2$. Also, assume that the error term $e$ is normally distributed with mean 0 and variance $\sigma^2 = 0.5^2$. Suppose that we have $n = 500$ observations from model (1.2). Then the simple ordinary least squares (OLS) estimators will target the underlying regression parameters of interest: $\gamma^T = (\gamma_0, \gamma_1, \gamma_2) = (2, -1.5, 0.8)$. However, estimation of the relationship between $Y$ and $\{X_1, X_2\}$ is more difficult when the available data have been contaminated. More precisely, suppose that the observed response and predictor values for the $n$ observations are $\{\tilde{Y}_i, (\tilde{X}_{i1}, \tilde{X}_{i2})\}_{i=1}^{n}$. The observed (available) data are the result of multiplicative distortions on the response and the predictors:

$$\tilde{Y}_i = \psi(U_i)\,Y_i, \quad \tilde{X}_{i1} = \phi_1(U_i)\,X_{i1}, \quad \tilde{X}_{i2} = \phi_2(U_i)\,X_{i2},$$

where the unknown smooth distorting functions are $\psi(U_i) \propto U_i^3$, $\phi_1(U_i) \propto \exp(U_i - 4)$ and $\phi_2(U_i) \propto (U_i + 4)^2$. For illustration, we take the confounder $U_i$ to be uniformly distributed on the interval $[1, 6]$. Figures 1(a) and (b) show the distortion effects on the marginal relation of the response $Y$ to $X_1$ and $Y$ to $X_2$, respectively. Displayed are the undistorted data (black) and the available distorted data (gray/green in online version) along with the OLS regression fits. Although there is a strong negative marginal relationship between $Y$ and $X_1$, with $r(X_1, Y) = -0.76$, the strength of this relationship is substantially diminished after the distortion by $\psi(\cdot)$ and $\phi_1(\cdot)$. This is also reflected in the reduced estimated correlation based on the distorted observations: $r(\tilde{X}_1, \tilde{Y}) = -0.38$. On the other hand, the distortion can also artificially strengthen (or weaken) the observed relationship between the response and the predictor(s), when in fact the strength of association is weak (or completely lacking). For instance, in this example, the estimated sample correlation between $Y$ and $X_2$ is $r(X_2, Y) = 0.37$. However, based on the distorted data, the estimated correlation is higher ($r(\tilde{X}_2, \tilde{Y}) = 0.44$; see Figure 1(b)). The relationship between the response and the predictors estimated using OLS, if we were to have the original data, is close to the true relationship given by (1.2), as expected. However, the relationship estimated using OLS based on the distorted data is incorrect. The overall distortion effect on the relationship between $Y$ and $\{X_1, X_2\}$ is illustrated in Figure 1(c). CAR provides consistent estimation of the underlying relationship based on the distorted data, as detailed in the next section.
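The data-generating mechanism of this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `simulate` and `ols`, the random seed, and the use of exact normalizing constants (computed analytically for $U \sim \text{Uniform}[1, 6]$ so that each distorting function has mean one) are assumptions made here for illustration. Numerical results will vary with the seed and are not expected to reproduce the correlations reported above.

```python
import numpy as np

GAMMA = np.array([2.0, -1.5, 0.8])  # (gamma_0, gamma_1, gamma_2) from the example


def simulate(n=500, seed=0):
    """Draw latent (X, Y), the confounder U, and the distorted observations."""
    rng = np.random.default_rng(seed)
    sd1, sd2, corr = 2.0, 1.8, 0.2
    cov = np.array([[sd1**2, corr * sd1 * sd2],
                    [corr * sd1 * sd2, sd2**2]])
    X = rng.multivariate_normal([2.0, 4.0], cov, size=n)
    e = rng.normal(0.0, 0.5, size=n)
    Y = GAMMA[0] + X @ GAMMA[1:] + e
    U = rng.uniform(1.0, 6.0, size=n)

    # Exact means under U ~ Uniform[1, 6], used so that E{psi(U)} = E{phi_r(U)} = 1.
    w1 = (6.0**4 - 1.0) / 20.0                 # E[U^3]
    w2 = (np.exp(2.0) - np.exp(-3.0)) / 5.0    # E[exp(U - 4)]
    w3 = 25.0 / 12.0 + 7.5**2                  # E[(U + 4)^2]

    psi = U**3 / w1
    phi = np.column_stack([np.exp(U - 4.0) / w2, (U + 4.0)**2 / w3])
    return {"X": X, "Y": Y, "U": U, "psi": psi, "phi": phi,
            "X_obs": phi * X, "Y_obs": psi * Y}


def ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, slopes...)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


if __name__ == "__main__":
    d = simulate()
    print("OLS on latent data:   ", ols(d["X"], d["Y"]))
    print("OLS on distorted data:", ols(d["X_obs"], d["Y_obs"]))
```

Comparing the two printed coefficient vectors gives a quick sense of how far a naive OLS fit on the distorted data can drift from $(2, -1.5, 0.8)$.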

The basic CAR model
We formally describe the basic CAR model and review the estimation method based on local (linear) regression (Şentürk and Nguyen, 2006). The regression parameters of interest are $\{\gamma_r\}_{r=0}^{p}$ in the underlying (unobserved) regression model,

$$Y_i = \gamma_0 + \sum_{r=1}^{p} \gamma_r X_{ir} + e_i, \qquad (2.1)$$

where $Y_i$ and $\{X_{ir}\}_{r=1}^{p}$ are the response and predictor values corresponding to the $i$th subject, respectively. The error variable $e_i$ is assumed to have $E(e_i) = 0$ and $\text{var}(e_i) = \sigma^2$. Parameter estimation is based on $n$ distorted predictor and response observations, $\{\tilde{Y}_i, \tilde{X}_{i1}, \ldots, \tilde{X}_{ip}\}_{i=1}^{n}$, along with the confounding covariate $U$, where

$$\tilde{Y}_i = \psi(U_i)\,Y_i \quad \text{and} \quad \tilde{X}_{ir} = \phi_r(U_i)\,X_{ir}. \qquad (2.2)$$

The observations are assumed to be independent and identically distributed, and $X$, $e$ and $U$ are assumed to be mutually independent in the underlying model (2.1).
The problem of estimating the parameters, $\{\gamma_r\}_{r=0}^{p}$, is identifiable under some constraints on the unknown smooth distorting functions. A set of reasonable constraints for $\psi(\cdot)$ and $\{\phi_r(\cdot)\}_{r=1}^{p}$ is implied by the natural assumption that the mean distorting effect should correspond to no distortion (Şentürk and Müller, 2005), i.e.

$$E\{\psi(U)\} = 1 \quad \text{and} \quad E\{\phi_r(U)\} = 1. \qquad (2.3)$$

The multiplicative distortion model described collectively by (2.1)-(2.3) is referred to as the covariate-adjusted regression (CAR) model. From the CAR model (2.1)-(2.3), it appears that targeting the underlying regression parameters will first require the difficult task of estimating the distorting functions directly. However, a connection between the CAR model and varying coefficient models allows for consistent estimation of the underlying parameters without directly estimating $\psi(\cdot)$ and $\phi_r(\cdot)$. This relationship results from the following regression of $\tilde{Y}$ on $\{\tilde{X}_r\}_{r=1}^{p}$ (Şentürk and Müller, 2005; 2006),

$$E(\tilde{Y} \mid \tilde{X}_1, \ldots, \tilde{X}_p, U) = \beta_0(U) + \sum_{r=1}^{p} \beta_r(U)\,\tilde{X}_r, \qquad (2.4)$$

where

$$\beta_0(U) = \gamma_0\,\psi(U) \quad \text{and} \quad \beta_r(U) = \gamma_r\,\frac{\psi(U)}{\phi_r(U)}, \quad r = 1, \ldots, p. \qquad (2.5)$$

Therefore, a direct regression of the observed response on the set of observed predictors leads to the following multiple varying coefficient model,

$$\tilde{Y}_i = \beta_0(U_i) + \sum_{r=1}^{p} \beta_r(U_i)\,\tilde{X}_{ir} + \psi(U_i)\,e_i. \qquad (2.6)$$

Cleveland et al. (1991) and Hastie and Tibshirani (1993) proposed varying coefficient models to allow for more flexible regression modeling, where the variable $U$ changes the coefficient of $\tilde{X}_r$ through the unspecified function $\beta_r(U)$. Consequently, because the varying coefficient model (2.6) is completely observable, estimation techniques for varying coefficient models can be utilized in the CAR model estimation. One efficient approach is based on local regression modeling (Fan and Gijbels, 1996; Fan and Zhang, 1999; Cai, Fan and Li, 2000), as proposed in Şentürk and Nguyen (2006) and Şentürk (2006). We note that there is a vast literature on the theory and application of varying coefficient models. The literature includes Chen and Tsay (1993) for nonlinear time series, Chiang, Rice, and Wu (2001) for repeatedly measured responses, and Hoover et al. (1998), Wu and Chiang (2000), Wu and Yu (2002), and Şentürk (2006) for longitudinal data.
Based on the relationships in (2.5) between the varying coefficient functions, $\{\beta_r(\cdot)\}_{r=0}^{p}$, and the distorting functions, $\{\psi(\cdot), \phi_r(\cdot)\}_{r=1}^{p}$, the CAR method provides consistent estimation of the underlying (unobserved) regression relationship between $Y$ and $\{X_r\}_{r=1}^{p}$. Note that we can consider the $\{\beta_r(\cdot)\}$ as a set of transformed distorting functions. If we denote the estimators of the varying coefficient functions as $\{\hat{\beta}_r(\cdot)\}_{r=0}^{p}$, then the CAR estimators of the underlying regression parameters are

$$\hat{\gamma}_r = \frac{1}{\bar{\tilde{X}}_r}\,\frac{1}{n}\sum_{i=1}^{n} \hat{\beta}_r(U_i)\,\tilde{X}_{ir}, \quad r = 0, 1, \ldots, p,$$

where $\bar{\tilde{X}}_r = n^{-1}\sum_{i=1}^{n} \tilde{X}_{ir}$ and $\tilde{X}_{i0} \equiv 1$. (More details are given in Section 2.2 below.) The consistency of the estimators has been shown (Şentürk and Nguyen, 2006).
Furthermore, because of the relationships given by (2.5), we can directly use the estimated varying coefficient functions, $\{\hat{\beta}_r(\cdot)\}_{r=0}^{p}$, for diagnosing various types and forms of the distortion. We provide details of these graphical techniques in Section 2.2 below. However, we first provide a brief summary of the local linear regression estimators of $\beta_r(U)$, as these are the main quantities used for the graphical assessment of violations of distortion assumptions.
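The weighted-average form of the CAR estimators can be motivated by a short calculation. The sketch below assumes the multiplicative distortion $\tilde{X}_r = \phi_r(U)X_r$, the relationship $\beta_r(U) = \gamma_r\psi(U)/\phi_r(U)$ (with $\phi_0 \equiv 1$ and $X_0 \equiv 1$), the identifiability constraints $E\{\psi(U)\} = E\{\phi_r(U)\} = 1$, independence of $U$ and $X_r$, and $E(X_r) \neq 0$. It is intended only as an illustration of why the population version of the estimator targets $\gamma_r$, not as a replacement for the consistency proofs cited above.

```latex
\begin{align*}
E\{\beta_r(U)\,\tilde{X}_r\}
  &= E\Big\{\gamma_r\,\frac{\psi(U)}{\phi_r(U)}\,\phi_r(U)\,X_r\Big\}
   = \gamma_r\,E\{\psi(U)\}\,E(X_r)
   = \gamma_r\,E(X_r), \\
E(\tilde{X}_r)
  &= E\{\phi_r(U)\}\,E(X_r) = E(X_r), \\
\text{so that}\qquad
\gamma_r &= \frac{E\{\beta_r(U)\,\tilde{X}_r\}}{E(\tilde{X}_r)} .
\end{align*}
```

Replacing the expectations by sample averages and $\beta_r$ by an estimate $\hat{\beta}_r$ gives an estimator of the averaged form described above.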

CAR estimators based on local linear regression
Graphical assessment of specific assumptions regarding the forms and types of data distortion can be implemented based on the relationships described by (2.5). This involves the estimated varying coefficient functions $\{\hat{\beta}_r(\cdot)\}_{r=0}^{p}$. We use a simple local linear regression estimator (Fan and Gijbels, 1996) for estimating the varying coefficient functions as follows. For a given point $u$, the function $\beta_r(\cdot)$ can be approximated locally as

$$\beta_r(U) \approx \beta_r(u) + \beta_r'(u)(U - u) \equiv b_r + c_r(U - u)$$

for $U$ in a neighborhood of $u$. For simplicity, we consider local linear fits, although higher order polynomial approximations for $\beta_r(U)$ can also be used. However, our previous experience suggests that local linear fits are sufficiently accurate for CAR estimation purposes.
Explicit expressions for the local linear estimators of $\{\beta_r(\cdot)\}$ are obtained by minimizing the kernel-weighted sum of squares

$$\sum_{i=1}^{n}\Big[\tilde{Y}_i - \sum_{r=0}^{p}\{b_r + c_r(U_i - u)\}\tilde{X}_{ir}\Big]^2 K_h(U_i - u)$$

with respect to $\{b_r, c_r\}_{r=0}^{p}$, where $\tilde{X}_{i0} \equiv 1$, $K_h(\cdot) = K(\cdot/h)/h$, $K$ is a kernel function and $h$ is the bandwidth. Then an estimate of $\beta_r(\cdot)$ is $\hat{\beta}_r(u) = \hat{b}_r$ and an estimate of the derivative of $\beta_r(\cdot)$ is $\hat{\beta}_r'(u) = \hat{c}_r$. The bandwidth $h$ can be chosen, for instance, by generalized cross-validation (Wahba, 1977; Craven and Wahba, 1979). We will elaborate on the bandwidth choice subsequently.
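The local linear fit can be sketched as a kernel-weighted least squares problem at each evaluation point. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`epanechnikov`, `local_linear_vc`, `car_estimates`) are hypothetical, the Epanechnikov kernel is one possible choice of $K$, and the bandwidth `h` is supplied by the user rather than selected by generalized cross-validation.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2)_+ ."""
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)


def local_linear_vc(u0, U, X_obs, Y_obs, h):
    """Local linear fit of a varying coefficient model at the point u0.

    Returns (b, c): b[r] estimates beta_r(u0) and c[r] estimates its
    derivative, for r = 0, ..., p (r = 0 is the intercept function).
    """
    n = len(Y_obs)
    Z = np.column_stack([np.ones(n), X_obs])   # observed predictors with X_i0 = 1
    d = U - u0
    D = np.hstack([Z, Z * d[:, None]])         # columns for b_r, then for c_r
    w = epanechnikov(d / h) / h                # kernel weights K_h(U_i - u0)
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on sqrt-weighted data.
    coef, *_ = np.linalg.lstsq(D * sw[:, None], Y_obs * sw, rcond=None)
    q = Z.shape[1]
    return coef[:q], coef[q:]


def car_estimates(U, X_obs, Y_obs, h):
    """Weighted averages of the fitted coefficient functions at the observed U_i."""
    n = len(Y_obs)
    Z = np.column_stack([np.ones(n), X_obs])
    B = np.array([local_linear_vc(u, U, X_obs, Y_obs, h)[0] for u in U])
    return (B * Z).mean(axis=0) / Z.mean(axis=0)
```

Evaluating `local_linear_vc` over a grid of `u0` values and plotting each component of `b` against the grid gives the estimated coefficient curves used for the graphical checks. Too small a bandwidth leaves few observations with positive weight near the boundary of the range of $U$, so the fit there can be unstable.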
To illustrate the distorting functions, $\{\psi(U), \phi_r(U)\}_{r=1}^{p}$, and the estimation of the set of transformed distorting functions, $\{\beta_r(U)\}_{r=0}^{p}$, consider the example introduced in Section 1.2. The distorting functions are $\psi(U) = U^3/\omega_1$, $\phi_1(U) = \exp(U - 4)/\omega_2$, and $\phi_2(U) = (U + 4)^2/\omega_3$. The constants $(\omega_1, \omega_2, \omega_3) \approx (64.81, 1.47, 58.32)$ are chosen so that the distorting functions satisfy the identifiability constraints in (2.3). The local linear regression estimators $\hat{\beta}_0(U)$, $\hat{\beta}_1(U)$ and $\hat{\beta}_2(U)$ then target the corresponding varying coefficient functions, namely $\beta_0(U) = \gamma_0\psi(U)$ and $\beta_r(U) = \gamma_r\psi(U)/\phi_r(U)$, $r = 1, 2$. In the next section we describe how the estimated transformed distorting functions, namely $\{\hat{\beta}_r(U)\}$, can be used to graphically check for violation of specific assumptions about the form and type of data distortion. More precisely, we identify the structures of $\beta_r(\cdot)$ under various distortion assumptions. These structures can then be used to check for violations of model assumptions related to the distortion.

Assessing violation of assumptions on the type of distortion
We first consider the assessment of specific assumptions regarding the type of distortion, namely (1) multiplicative, (2) additive, or (3) no-distortion. Although the CAR model, described in Section 2.1, can account for all three types of distortion, it may be of interest in practice to examine the violation of specific assumptions on the types of distortion. This may lead to the use of simpler estimation procedures than CAR, such as ordinary least squares regression in some special cases. Graphical assessment of distortion assumptions in the context of CAR essentially makes use of the relationship between the unknown distorting functions and their transformed versions, namely the varying coefficient functions in (2.5).
For simplicity of exposition, let us first consider the case of a single predictor. The multiple predictors case is similar and we will address it at the end of this section. For a single predictor, the unobserved model is $Y_i = \gamma_0 + \gamma_1 X_i + e_i$. Under the assumption of additive distortion, the available observations are $\{\tilde{Y}_i, \tilde{X}_i\}_{i=1}^{n}$, where $\tilde{Y}_i = \psi(U_i) + Y_i$ and $\tilde{X}_i = \phi(U_i) + X_i$. Thus, substituting $Y_i = \tilde{Y}_i - \psi(U_i)$ and $X_i = \tilde{X}_i - \phi(U_i)$ into the underlying model, it follows that

$$\tilde{Y}_i = \gamma_0 + \nu(U_i) + \gamma_1 \tilde{X}_i + e_i, \qquad (3.1)$$

where $\nu(U_i) \equiv \psi(U_i) - \gamma_1\phi(U_i)$. One may recognize the above model to be a partly linear model (PLM; Heckman, 1986). Note that the PLM is a special case of the varying coefficient model, $\tilde{Y}_i = \beta_0(U_i) + \beta_1(U_i)\tilde{X}_i + e_i$, where $\beta_0(U_i) = \nu(U_i) + \gamma_0$ and $\beta_1(U_i) = \gamma_1$. Because the resulting slope varying coefficient function is constant under additive distortion, i.e. $\beta_1(U_i) = \gamma_1$, departures from the assumption of additive distortion can be detected graphically by examining $\hat{\beta}_1(U_i)$ for constancy.
In fact, more information can be obtained from the graphical examination of $\hat{\beta}_1(U_i)$ regarding the type of distortion. More precisely, if $\beta_1(U_i)$ is not constant, then the distortion type is consistent with a multiplicative form where $\psi(U_i) \neq \phi(U_i)$. This follows directly from the relationships in (2.5) and the resulting regression equation under additive distortion given in (3.1) above.
In the special case where the distortion processes on the response and predictor are the same, i.e. when $\psi(U_i) = \phi(U_i)$, the constancy of $\beta_1(U_i)$ implies that the distortion can be additive or multiplicative with $\psi(U_i) = \phi(U_i)$. More precisely, under multiplicative distortion with $\psi(U_i) = \phi(U_i)$, it follows from (2.5) that $\beta_1(U_i) = \gamma_1\psi(U_i)/\phi(U_i) = \gamma_1$. Next, consider the case of no-distortion under the additive model. That is, $\psi(U_i) = \phi(U_i) = 0$. Consequently, we have $\nu(U_i) = 0$, so both the intercept and slope varying coefficient functions are constants: $\beta_0(U_i) = \gamma_0$ and $\beta_1(U_i) = \gamma_1$. These varying coefficient functions are constants under the multiplicative distortion model (2.2) as well (i.e. $\psi(U_i) = \phi(U_i) = 1$). Thus, we can graphically check the estimated intercept and slope varying coefficient functions for no-distortion. Again, this is feasible under additive or multiplicative distortion models. Clearly, in the no-distortion case, measurements on $U$ can be ignored.
For the case of multiple predictors in the context of additive distortion, it follows similarly as in (3.1) that

$$\tilde{Y}_i = \gamma_0 + \nu(U_i) + \sum_{r=1}^{p}\gamma_r\tilde{X}_{ir} + e_i, \quad \text{where } \nu(U_i) \equiv \psi(U_i) - \sum_{r=1}^{p}\gamma_r\phi_r(U_i).$$

Graphical examination of the estimated coefficient functions $\{\hat{\beta}_r(U)\}_{r=1}^{p}$ for constancy, as in the single predictor case, is sufficient for diagnosing departures from the additive distortion assumption.
The following key points summarize our discussion of the graphical assessment of assumptions on the types of data distortion based on $\{\hat{\beta}_r(U_i)\}$.
• To detect departures from the additive distortion assumption, it is sufficient to examine whether the estimated slope varying coefficient functions $\hat{\beta}_r(U_i)$, $r \geq 1$, are constants.
• If $\beta_r(U_i) \neq$ constant, then the distortion is consistent with a multiplicative form where $\psi(U_i) \neq \phi_r(U_i)$.
• If $\beta_r(U_i)$, $r \geq 1$, are constants, then the distortion can be (a) additive or (b) multiplicative with $\psi(U_i) = \phi_r(U_i)$.
• If $\beta_0(U_i)$ and $\beta_r(U_i)$, $r \geq 1$, are all constants, then there is no distortion.
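The key points above can be illustrated by evaluating the true coefficient functions on a grid for a few distortion scenarios and checking which of them are constant. The sketch below assumes the relationships $\beta_0 = \gamma_0\psi$, $\beta_r = \gamma_r\psi/\phi_r$ under multiplicative distortion and $\beta_0 = \gamma_0 + \psi - \sum_r\gamma_r\phi_r$, $\beta_r = \gamma_r$ under additive distortion. The function names and the numerical tolerance are hypothetical. In practice the curves would be estimates, so constancy would be judged graphically rather than by an exact tolerance.

```python
import numpy as np


def beta_multiplicative(gamma, psi, phis, u):
    """True coefficient curves under multiplicative distortion."""
    rows = [gamma[0] * psi(u)]
    rows += [g * psi(u) / phi(u) for g, phi in zip(gamma[1:], phis)]
    return np.vstack(rows)


def beta_additive(gamma, psi, phis, u):
    """True coefficient curves under additive distortion."""
    nu = psi(u) - sum(g * phi(u) for g, phi in zip(gamma[1:], phis))
    rows = [gamma[0] + nu]
    rows += [np.full_like(u, g) for g in gamma[1:]]
    return np.vstack(rows)


def is_constant(curve, tol=1e-8):
    """Crude constancy check: range of the curve over the grid is below tol."""
    return (curve.max() - curve.min()) <= tol


def distortion_type(betas):
    """Map constancy of the intercept and slope curves to the summary above."""
    slopes_constant = all(is_constant(b) for b in betas[1:])
    if slopes_constant and is_constant(betas[0]):
        return "no distortion"
    if slopes_constant:
        return "additive, or multiplicative with psi = phi_r"
    return "multiplicative with psi != phi_r"
```

For example, with `u = np.linspace(1.0, 6.0, 50)` and a common distortion `psi = phi_r = lambda u: u / 3.5`, the slope curves are constant while the intercept curve is proportional to `u`, so `distortion_type` returns the second category.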

Assessing violations of assumptions on the form of distortion
We next consider the graphical assessment of some specific and common assumptions regarding the functional form of the data distortion using the estimated varying coefficient functions. Of specific interest is the assessment of whether the distortion form is linear under additive distortion. Under linear additive distortion, the distortion functions are $\psi(U_i) = a + bU_i$ and $\phi(U_i) = c + dU_i$. Thus, we have that

$$\tilde{Y}_i = \alpha_0 + \alpha_1 U_i + \gamma_1\tilde{X}_i + e_i, \qquad (3.2)$$

where the parameters are $\alpha_0 = \gamma_0 + (a - c\gamma_1)$ and $\alpha_1 = b - d\gamma_1$. As in (3.1), the resulting regression in (3.2) can be viewed as a varying coefficient model: $\tilde{Y}_i = \beta_0(U_i) + \beta_1(U_i)\tilde{X}_i + e_i$. As before, the slope varying coefficient function is constant: $\beta_1(U_i) = \gamma_1$. Additionally, the corresponding intercept varying coefficient function is linear: $\beta_0(U_i) = \alpha_0 + \alpha_1 U_i$. Thus, departures from or violations of the assumption of a linear additive distortion can be graphically examined by checking for constancy of $\hat{\beta}_1(U_i)$ and linearity of $\hat{\beta}_0(U_i)$. Note that the resulting model (3.2) implies that inclusion of the observable confounder $U$ in a direct regression model based on the distorted data (response and predictor data) will provide a consistent estimate of the underlying slope coefficient $\gamma_1$. Therefore, the commonly used adjustment method of including $U$ as an additional predictor is justified only under an additive linear distortion assumption.

Next, we consider two common assumptions on the functional form of the distortion under multiplicative distortion. The first case is linear regression models based on the adjusted response and predictor variables obtained via division by the confounder $U$. Such models implicitly assume that the distortion type is multiplicative and that the form is a special case of the linear distortion: $\psi(U) = \phi(U) \propto U$. One example, provided in the Introduction, involves dividing the observed response PFL (plasma fibrinogen level) and the predictor STP (serum transferrin protein) by the confounder $U = \text{BMI}$. The adjusted variables assumed to be free of the effect of BMI are PFL/BMI and STP/BMI. This assumption would hold if, in fact, the distortions on the protein markers are of the form $\psi(U) = \phi(U) \propto U$. Other examples that make this assumption are neurological studies comparing volumetric structures, such as amygdala and hippocampal volumes, obtained from magnetic resonance imaging (MRI). (See, for example, Pinter et al. (2001).) Typically, to compare across patients, the volumetric structures are normalized via division by total cranial volume ($\text{TCV} = U$) or total brain volume ($\text{TBV} = U$). This practice of division by the confounder implicitly assumes the special linear-multiplicative distortion of the forms $\psi(U) \propto U$ and $\phi(U) \propto U$. Thus, it follows directly from the relationships given by (2.5) that $\beta_r(U) \propto \gamma_r$ for $r \geq 1$ and $\beta_0(U) \propto \gamma_0 U$. Thus, violations of the assumption of this specific multiplicative linear distortion can be detected by checking for departures from linearity of $\hat{\beta}_0(U)$ and from constancy of $\hat{\beta}_r(U)$.

The second common assumption used in practice is that $\phi(U) \propto U$ and $\psi(U) = 1$. That is, the special linear distortion is believed to affect only the predictor variable, and the response variable is unaffected by the confounder. In this case, violation of the assumption can be determined by checking for departures from $\beta_0(U) =$ constant and $\beta_1(U) \propto U^{-1}$. Figure 3 illustrates the estimated varying coefficient functions, $\hat{\beta}_0(U)$ and $\hat{\beta}_r(U)$, $r = 1, 2$, for this case, and Figure 4 illustrates the above case where the distorting functions are proportional to the confounder $U$: $\psi(U) = \phi_r(U) \propto U$. In both cases, the data were generated using the same parameters as the motivating example introduced earlier in Section 1.2 (and also summarized in Figure 1).

Figure 3: Estimated varying coefficient functions, $\{\hat{\beta}_r(U)\}_{r=0}^{p}$ (with $h = 2.0$), for the case of special linear distortion on the predictor variables only: $\psi(U) = 1$, $\phi_r(U) \propto U$. These distorting functions correspond to $\beta_0(U) = \gamma_0$ and $\beta_r(U) \propto U^{-1}$ under the covariate-adjusted regression model. The light thin (green in online version) curves are reference curves $cU^{-1}$ for various constants of proportionality $c$. Departure from the distortion assumption occurs when the estimated curve (dashed) deviates from a reference curve.

The examples displayed in Figures 3 and 4 use the local linear regression estimation procedure described in Section 2.2. As discussed earlier, local linear regression modeling requires selection of the bandwidth $h$. We used generalized cross-validation (Wahba, 1977; Craven and Wahba, 1979), as previously described in Şentürk and Nguyen (2006), with the Epanechnikov kernel, $K(t) = 0.75(1 - t^2)_+$. That is, the bandwidth $h$ is chosen to minimize the generalized cross-validation criterion, which in its standard form (Craven and Wahba, 1979) is

$$\text{GCV}(h) = \frac{n^{-1}\sum_{i=1}^{n}\{\tilde{Y}_i - \hat{\tilde{Y}}_i(h)\}^2}{\{1 - \text{tr}(S_h)/n\}^2},$$

where $\hat{\tilde{Y}}_i(h)$ are the fitted values and $S_h$ is the smoother matrix mapping the observed responses to the fitted values.

Finally, we note that under multiplicative distortion, if the distortion processes on the response and predictors are the same, whether they are linear or nonlinear, then $\beta_r(U)$ ($r \geq 1$) are constants. Consequently, plotting the estimated intercept function $\hat{\beta}_0(U)$ provides the functional form of the common distortion, since $\beta_0(U) = \gamma_0\varphi(U)$, where $\varphi(U) \equiv \psi(U) = \phi_r(U)$.
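The remark in this section that including $U$ as an additional predictor recovers $\gamma_1$ under additive linear distortion can be checked with a small simulation. The sketch below is illustrative only: the distortion coefficients `a`, `b`, `c`, `d`, the sample size, the seed, and the helper `ols` are hypothetical choices, and the latent model borrows $\gamma_0 = 2$ and $\gamma_1 = -1.5$ from the earlier numerical example.

```python
import numpy as np


def ols(columns, y):
    """Ordinary least squares with an intercept; returns (intercept, slopes...)."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000
gamma0, gamma1 = 2.0, -1.5

# Latent single-predictor model Y = gamma0 + gamma1 * X + e.
X = rng.normal(2.0, 2.0, n)
Y = gamma0 + gamma1 * X + rng.normal(0.0, 0.5, n)
U = rng.uniform(1.0, 6.0, n)

# Hypothetical additive linear distortions psi(U) = a + b*U, phi(U) = c + d*U.
a, b, c, d = 0.5, 2.0, -1.0, 3.0
Y_obs = a + b * U + Y
X_obs = c + d * U + X

naive = ols([X_obs], Y_obs)        # ignores the confounder U
adjusted = ols([X_obs, U], Y_obs)  # includes U as an additional predictor

# Under additive linear distortion the adjusted fit should be close to
# (alpha_0, gamma_1, alpha_1) with alpha_0 = gamma0 + (a - c*gamma1)
# and alpha_1 = b - d*gamma1.
alpha0 = gamma0 + (a - c * gamma1)
alpha1 = b - d * gamma1
print("naive slope:   ", naive[1])
print("adjusted slope:", adjusted[1], "(target", gamma1, ")")
```

If the distortion were instead multiplicative or nonlinear, the adjusted slope would generally no longer target $\gamma_1$, which is the situation the graphical checks are designed to reveal.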

A data example: Graphical assessment of the distorting effect of BMI on cholesterol and blood pressure measurements
In this data example, we examine the distortion effect of body mass index (BMI) on the regression relationship between serum cholesterol (SC; mg/100ml) and blood pressure (BP) measurements, namely systolic BP (SBP; mm Hg) and diastolic BP (DBP; mm Hg). The underlying relationship under exploration is $SC = \gamma_0 + \gamma_1 SBP + \gamma_2 DBP + e$. It is postulated that both response and predictor measurements may be affected by each individual's body mass index, resulting in the observed data $\{\widetilde{SC}_i, \widetilde{SBP}_i, \widetilde{DBP}_i\}_{i=1}^{n}$, where $\{BMI_i\}_{i=1}^{n}$ are the measurements on the confounder for the $n$ individuals. As we discussed above, the distortion effect may be null (i.e., $\psi(BMI) = \phi_r(BMI) = 1$ for multiplicative distortion or $\psi(BMI) = \phi_r(BMI) = 0$ for additive distortion), additive, or multiplicative. We explore some of these possibilities as well as the functional form of the distortion.
The data examined here were obtained from the National Health and Nutrition Examination Survey (NHANES) and are available from Hosmer and Lemeshow (2000). For illustration, we analyzed a random subset of $n = 1000$ observations (from 7,344 complete observations available for male subjects). Based on the observed data, we fit the varying coefficient model $\widetilde{SC}_i = \beta_0(BMI_i) + \beta_1(BMI_i)\widetilde{SBP}_i + \beta_2(BMI_i)\widetilde{DBP}_i + \epsilon_i$, $i = 1, \ldots, 1000$. Using covariate-adjusted regression (Section 2), the estimated relationship between serum cholesterol and blood pressure, adjusted for the effect of BMI, is given by $\widehat{SC} = 131.21 + 0.3199\,SBP + 0.3904\,DBP$, i.e. with $\hat{\gamma} = (131.21, 0.3199, 0.3904)^T$. The standard error estimates for the $\hat{\gamma}_r$ can be obtained using the bootstrap, as described in Şentürk and Nguyen (2006). Based on 300 bootstrap samples, the standard error estimates corresponding to $\hat{\gamma}$ are $(13.196, 0.0996, 0.1486)$. Not surprisingly, predicted SC is positively related to BP, after adjusting for BMI. To explore the type and form of the distortion effects of BMI on SC, SBP, and DBP, we examine the corresponding estimated varying coefficient functions $\hat{\beta}_r(BMI)$, $r = 0, 1, 2$.
Figure 5 displays these estimated functions, obtained using a bandwidth of $h = 8$ from generalized cross-validation. Because the estimated varying coefficient functions corresponding to SBP and DBP (i.e. $\hat{\beta}_1(BMI)$ and $\hat{\beta}_2(BMI)$) are not both constants, the hypothesis of no-distortion effects of BMI on both the response and predictors is not supported. In fact, the estimated varying coefficient functions vary significantly with BMI, so the hypothesis/assumption that the distortion is additive is also not tenable. The estimated functions suggest a multiplicative distortion where $\psi(BMI) \neq \phi_r(BMI)$. Under multiplicative distortion, the assumption of no-distortion on the response variable only (i.e. $\psi(BMI) = 1$) and the assumption of a special linear distortion (i.e. $\psi(BMI) = \phi_r(BMI) \propto BMI$) are not compatible with the observed data. Furthermore, because $\beta_0(BMI) \propto \psi(BMI)$, the form of the distortion on the response variable, serum cholesterol, can be inferred directly from the plot of $\hat{\beta}_0(BMI)$. As can be seen from Figure 5, the distortion on the response is approximately linear and increasing in BMI over a wide range of observed body mass index (mean BMI ± 1.5 standard deviations: 19.6-33.4). Thus, increasing BMI has an overall monotonic increasing and linear-multiplicative effect on serum cholesterol in this range. The estimated varying coefficient functions corresponding to SBP and DBP suggest that the distortion structures on SBP and DBP are more complex and may not be strictly linear throughout the range of BMI.
Finally, we note that the assumption of a common distortion form that affects both the response and predictors, whether linear or nonlinear, is not compatible with the observed data. That is, the distortion effect of BMI on cholesterol appears to be different from the distortion on the blood pressure measurements (SBP and DBP).

Discussion
The covariate-adjusted regression model framework (2.1)-( 2.3) provides a consistent estimation procedure that is automatically adaptive to the case of nodistortion as well as linear and nonlinear additive or multiplicative distortion.Using this consistent estimation procedure as a basis, we have proposed simple graphical techniques to further assess violations of specific assumptions on the forms and types of distortion under the CAR model framework.In real data applications, various simpler adjustment methods are commonly used under specific assumptions on the distortion form and type.Diagnostic techniques presented here can be used to better understand the distortion structures and facilitate interpretation, as well as checking for departures from specific model assumptions.As illustrated with various examples, the approach is feasible due the simple local linear regression estimation of the varying coefficient functions.
When estimating the varying coefficient functions, β_r(U), the bandwidth h can be selected using the generalized cross-validation (GCV) criterion, for example. Generally, the choice of h is a trade-off between bias and variance. For estimation of the underlying parameters γ_r, GCV works well to balance the bias and variance (Şentürk and Nguyen, 2006). However, even with the use of GCV, the estimates of β_r(U) may not be sufficiently smooth for the graphical uses described here. There are various reasons for this, one of which is the different degrees of smoothness of the functions β_r(U), r = 0, ..., p. For graphical diagnostic purposes, one can reduce the variability by oversmoothing, using the GCV choice of h as an initial guideline for the amount of oversmoothing. For the data example above, oversmoothing gave results similar to those from the GCV choice of h = 8.0. However, our experience with other data sets suggests that this "oversmoothing" after GCV selection may work better for graphical assessment of distortion assumptions. Alternatively, one can use a two-step local linear approach to estimate the varying coefficient functions (Fan and Zhang, 1999), where the initial (first-step) estimate of β_r(U) is obtained by undersmoothing so that the bias is small. A re-estimation (re-smoothing) is done in the second step. Such an approach can be incorporated into the CAR estimation method and graphical diagnosis of assumptions on the data distortion.
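
The bandwidth selection and oversmoothing steps can be illustrated with a short sketch. It uses a generic GCV score, mean squared residual divided by (1 − tr(S_h)/n)^2 with S_h the smoother matrix, which may differ in detail from the criterion used in the cited work. The function name `gcv_score`, the Epanechnikov kernel, the candidate grid, the simulated data and the oversmoothing factor of 1.5 are all hypothetical choices made for illustration.

```python
import numpy as np

def gcv_score(U, X, Y, h):
    """Generic GCV score for a local linear varying coefficient fit with bandwidth h."""
    n = len(Y)
    Z = np.column_stack([np.ones(n), X])           # intercept plus predictors
    q = Z.shape[1]
    rss, trace = 0.0, 0.0
    for i in range(n):
        d = U - U[i]
        w = np.maximum(1.0 - (d / h) ** 2, 0.0)    # Epanechnikov weights, equal to 1 at d = 0
        D = np.hstack([Z, Z * d[:, None]])         # local level and local slope columns
        A_inv = np.linalg.pinv(D.T @ (D * w[:, None]))
        coef = A_inv @ (D.T @ (w * Y))
        row = np.concatenate([Z[i], np.zeros(q)])  # design row of observation i (d = 0)
        rss += (Y[i] - row @ coef) ** 2
        trace += row @ A_inv @ row                 # i-th diagonal entry of the smoother matrix
    return (rss / n) / (1.0 - trace / n) ** 2

# Simulated data (hypothetical): distortion on the predictor only, phi(U) proportional to U
rng = np.random.default_rng(1)
n = 400
U = rng.uniform(2.0, 6.0, n)
X = rng.normal(2.0, 1.0, n)
Y_obs = 1.0 + 2.0 * X + rng.normal(0.0, 0.25, n)   # psi(U) = 1: response is undistorted
X_obs = (U / 4.0) * X                              # so beta_1(U) is proportional to 1/U

candidates = np.array([0.3, 0.5, 0.8, 1.2, 2.0, 3.0])
scores = np.array([gcv_score(U, X_obs, Y_obs, h) for h in candidates])
h_gcv = candidates[np.argmin(scores)]              # GCV choice of bandwidth
h_over = 1.5 * h_gcv                               # oversmoothed bandwidth for plotting (assumed factor)
print("GCV scores:", np.round(scores, 4))
print("h_gcv =", h_gcv, " h_over =", h_over)
```

The factor applied to `h_gcv` is a judgment call; in practice one might compare the diagnostic plots at the GCV bandwidth and at one or two larger values before drawing conclusions about the distortion form.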

Figure 1: Example of distortion effects. Effects of the distorting functions ψ(U) ∝ U^3, φ_1(U) ∝ exp(U − 4) and φ_2(U) ∝ (U + 4)^2 on Y, X_1 and X_2, respectively. The solid lines are the (estimated) marginal relationships of (a) Y on X_1 and (b) Y on X_2 obtained using ordinary least squares (OLS) with the unobserved/undistorted data (black dots). The dotted lines are the OLS fits based on the distorted versions of Y, X_1 and X_2 (gray dots; green in the online version). (c) The true underlying regression relationship between Y and {X_1, X_2} (black) and the corresponding incorrect relationship estimated using OLS based on the distorted data (gray).

Figure 3: Linear distortion on the predictor. Displayed are the estimated varying coefficient functions, {β_r(U)}, r = 0, 1, 2 (with h = 2.0), for the case of special linear distortion on the predictor variables only: ψ(U) = 1, φ_r(U) ∝ U. These distorting functions correspond to β_0(U) = γ_0 and β_r(U) ∝ U^{-1} under the covariate-adjusted regression model. The light thin curves (green in the online version) are reference curves cU^{-1} for various constants of proportionality c. Departure from the distortion assumption occurs when the estimated curve (dashed) deviates from a reference curve.

Figure 4: Linear distortion on both response and predictor. Special linear distortion on both the response and predictors, namely ψ(U) = φ_r(U) ∝ U, leads to constant functions for β_r(U), r ≥ 1. The dashed lines are the corresponding local linear estimates with h = 2.0. Note that when the same nonlinear distortion affects both the response and the predictors (i.e., ψ(U) = φ_r(U) ≡ ϕ(U) with ϕ nonlinear), only the plot of β_0(U) (top left) will change, reflecting the nonlinearity of ϕ(U). The remaining plots of β_r(U), r > 0, are still constant.

Figure 5: Distortion in cholesterol-blood pressure data example. Displayed are the estimate of the distorting function β_0(U) ∝ ψ(U) on cholesterol (top left) and the transformed distortion functions (varying coefficient functions) corresponding to SBP (right) and DBP (below), with bandwidth h = 8.0.