Adjusting for Treatment Effect when Estimating or Testing Genetic Effect is of Main Interest

It is known that “standard methods for estimating the causal effect of a time-varying treatment on the mean of a repeated measures outcome (for example, GEE regression) may be biased when there are time-dependent variables that are simultaneously confounders of the effect of interest and are predicted by previous treatment” (Hernán et al. 2002). Inverse-probability of treatment weighted (IPTW) methods are developed in the literature of causal inference. In genetic studies, however, the main interest is to estimate or test the genetic effect rather than the treatment effect. In this work, we describe an IPTW method that provides unbiased estimate for the genetic effect, and discuss how to develop a family-based association test using IPTW for family-based studies. We apply the developed methods to systolic blood pressure data in Framingham Heart Study, where some subjects took antihypertensive treatment during the course of study.


Background
Originating in 1948, Framingham Heart Study (FHS, Dawber et al. 1951) is an ongoing prospective study of risk factors including genetic variants for cardiovascular disease.The clinical data from FHS includes blood pressure, total cholesterol, fasting HDL cholesterol, blood glucose and others, and the genetic data includes microsatellite markers and single-nucleotide polymorphisms (SNPs).This study provides opportunities to dissect genetic structure of blood pressure, cholesterol and other complex traits.
Since FHS started, one of the phenotypes, systolic blood pressure (SBP), has drawn many researchers' attention.For example, investigators in Levy et al. (2000) and Hunt et al. (2002) found genetic markers significantly associated with SBP.Interestingly, as pointed out in Shea et al. (1985), some subjects took antihypertensive treatment during the course of study.For these subjects, their underlying SBPs were distorted by the antihypertensive medication.In Levy et al. (2000) and Hunt et al. (2002), the SBP levels were adjusted for treatment before analysis by deriving adjusted residuals for subjects on treatment.It is well understood that the past SBP is a time-dependent confounder for estimating the treatment effect because it is a predictor of both the treatment assignment and the future SBP (Robins 1998).With the presence of such time-dependent confounder, standard approaches including adding confounder as a covariate do not work (Robins 1998).With estimating treatment effect as the main interest, Robins and colleagues proposed several remedies including inverse-probability of treatment weighting (IPTW).See, for example, Hernán et al. (2002) and Robins (1999).
In genetic studies, the main interest is the genetic effect that would have been observed if every subject was untreated.Consider a gene that increases SBP, therefore carriers are more likely to develop hypertension and more likely to receive antihypertensive medication.Assume that the treatment is effective in reducing subjects' SBPs.Then using the observed SBPs to estimate the genetic effect will create a downward bias because SBPs on carriers who are treated are lower than the underlying true values had they not received treatment.When the genetic effect is of interest, how does one adjust for subjects taking treatment which distorts their SBPs?Does the convenient practice of including confounder as a covariate work?
To answer these questions, understanding the unbalanced roles of the genetic effect and the treatment effect is essential: when the treatment effect is of primary interest, the genotype, which is not affected by the treatment, can be treated as a covariate; when the genetic effect is of primary interest, the treatment, which is predicted by the genotype through the underlying SBP, however, cannot be treated as a covariate.Therefore, including treatment as a covariate to adjust for bias is inappropriate.For cross-sectional studies, authors in Tobin et al. (2005) examined the biasedness of several standard methods and proposed to use censored regression model.It is shown in Tobin et al. (2005) that by including the treatment assignment as a covariate, some of the genetic effect is adjusted away.
In longitudinal studies, the baseline SBP measurements, based on which the treatment is assigned, may be available.If there are two measurements, one measured when the treatment is assigned and the other measured after the treatment is taken for some period, regular regression analysis with the baseline SBP as a covariate provides an unbiased estimate of the treatment effect, but a biased estimate of the genetic effect.When there are more than two measurements, regular regression analysis gives biased estimates for both the treatment and the genetic effect.
In the next section, we describe an IPTW method that provides asymptoti-cally unbiased estimate for the genetic effect.By an example and some simulation studies, we quantify the biases in some standard methods and compare them with the proposed methods.We then apply the developed methods to the SBP data in FHS.We conclude with some discussions on how to develop a family-based association test using IPTW for family-based studies.

An IPTW Method for Estimating the Genetic Effect
We describe an IPTW method for estimating the genetic effect of a candidate gene in longitudinal studies when subjects' phenotypes are distorted by taking treatment.The focus here is the genetic effect in the population where everyone is untreated.Let i index the subjects and j index the visits.Let Y ij denote the phenotype measurement at visit j of subject i, i = 1, • • • , n, and j = 0, 1, • • • , m i .Assume that measurements between subjects are independent, and at the baseline j = 0, no subject takes the treatment.Let Y i be the vector The predictors include the genotype at the candidate gene, the treatment assignment, and may include time-constant variables such as gender and time-dependent variables such as age.
The standard generalized estimating equation (GEE) regression solves where µ i (β) is the specified marginal means of Y i in terms of parameter β and the predictors, and V i is the working covariance matrix of Y i .For example in a candidate gene study, µ i (β) can be parametrized as a linear model, X i β, where X i includes carrier status at the gene and other important covariates such as gender and age to be adjusted.The standard GEE gives a consistent estimate for the genetic effect when there is no time-dependent confounder; see for example Liang and Zeger (1986).
It is shown in Robins (1986) that in the presence of time-varying confounder, the standard GEE may provide biased estimate for the treatment effect even after including past treatment history and other confounders as covariates.When the genetic effect is of interest, in Section 3 we demonstrate that the standard methods provide biased estimates for the genetic effect.We develop an IPTWtype method to find asymptotically unbiased estimate for the genetic effect.
Simply put, the IPTW method solves a weighted GEE with time-and subjectspecific weights.The basic idea is that, by properly weighting subjects, we construct a pseudo-population in which the past SBP and the treatment assignment are not associated, and therefore the genetic effect is not distorted by the treatment effect.It is then straightforward to estimate the genetic effect and/or its interaction with environmental effects in the pseudo-population.
To construct IPTW estimators, one can use nonstabilized weight, which is the inverse of the conditional probability of the observed treatment status given the corresponding subject's treatment history and observed time-dependent confounder.As an alternative, stabilized weight proposed in [6] is shown to be more efficient and stable, especially when the conditional probability is small.For subject i, the stabilized weights are defined as, where Z i comprises the baseline covariates believed to predict the treatment assignment, T i (k) is the indicator of taking treatment at visit k, Ti (k) is the treatment history up to visit k, C i (k) is the vector of time-varying variables including the confounder at visit k, and Ci (k) is the history of The IPTW method solves the weighted estimating equation, In practice, the weights (2.2) are unknown, but they can be estimated from the data by specifying a model (e.g., logistic model) for the treatment status T i (k); see [1] for more discussion.We illustrate how to obtain the weights in the next section.

Example
We assume there are 200 subjects, half with genotype G = 1 and the other half with G = 0.The threshold for defining high SBP is 130 mmHg.Let L indicate high SBP, with L = 1 being having high SBP and L = 0 being having low SBP.The probability of developing high SBP, P (L = 1|G), is higher in subjects with G = 1 than those with G = 0; they are 0.2 and 0.1, respectively.The assumptions and results are summarized in Table 1.Here we assume that the numbers of subjects within each categories of L and G are consistent perfectly with their expectations, as shown in the 6th column of Table 1.At time 1, the mean SBP of subjects with L = 1 is 140 mmHg, and those with L = 0 is 120 mmHg.Their SBPs are recorded before some subjects being prescribed and then starting to take treatment.Assume that among subjects with L = 1, 80% will receive treatment, and among those with L = 0, 10% will receive treatment.Assume the treatment reduces SBP by an average of 20 mmHg at time 2. For simplicity, no disturbance is added to the phenotype.The outcomes therefore coincide with their expectations.The expected bias is the same as shown here if adding a mean zero random disturbance to the SBP.The genetic effect is β = [20(140) + 80(120)]/100 − [10(140) + 90(120)]/100 = 2.If only using data at time 1, we can obtain an unbiased estimate of the genetic effect, but in practice it is more advantageous to use all the available data.
We consider four standard methods: (1) ignore that some subjects have taken treatments; (2) exclude subjects taking treatments; (3) treat T as a covariate; (4) treat both L and T as covariates.For method 1, the estimated genetic effect is Therefore method 1 underestimates the genetic effect by 0.7.Similar calculations show that method 2 and 3 underestimate the genetic effect by 0.58 and 0.33, respectively.Method 4 estimates the genetic effect as zero and the treatment effect as −20.Therefore, by adjusting for the baseline SBP and treatment assignment, the estimate of the treatment effect is unbiased but the estimate of the genetic effect is biased.Treating T as covariate has the smallest bias, but still non-negligible.
The IPTW method requires computing weights.At time 1, the weights are one.At time 2, the unstabilized weights are 1/P r(T |L), and the stabilized weights are P r(T )/P r(T |L).For instance, the stabilized weight in the first row of Table 1 is (16 + 8 + 8 + 9)/200/0.8= 0.26, and in the second row is (1 − (16 + 8 + 8 + 9)/200)/0.2= 3.98.These weights are summarized in the 11th column of Table 1.The weighted sample sizes N w and N sw are obtained by multiplying the original sample sizes by the corresponding weights and then normalize so that the total sample size remains as 200.These are summarized in the 10th and the 12th column.

Simulations
Simulation studies were conducted based on a modification of Example 1. Two hundred subjects were simulated.Genotypes were generated with P (G = 1) = 0.4.The effect of having G = 1 was assumed to increase a subject's probability of developing hypertension.Four scenarios were considered: subjects with G = 1 were simulated to be hypertensive (L = 1) at time 1 with probability 0.2, 0.3, 0.4 or 0.5, and subjects with G = 0 with probability 0.1.This is equivalent to that subjects with G = 1 on average had SBPs 2, 4, 6, or 8 mmHg higher than those with G = 0, provided that E(Y 1 |G, L) followed the 7th column of Table 1.
Treatments were assigned to subjects following probabilities P (T |L) in the 5th column of Table 1.Hypertensive subjects had higher probability to be treated.To account for a natural increasing trend of SBP as subjects get older, each subject was assumed to have underlying SBP 2 mmHg higher at time 2 than time 1.The treatment effect was generated from a normal distribution with mean -22 and standard deviation 3. The random error added to the underlying SBP at the two time points were generated from a bivariate normal distribution with mean 0, standard deviation 10 for both components, and correlation coefficient 0.5.Each experiment was repeated 1000 times.

Simulation results
Table 2 summarizes the results of estimating and testing the genetic effect.In all scenarios, the standard approaches (no adjustment, excluding subjects on treatment, and including treatment indicator as a covariate) underestimate the genetic effect.The bias increases with the strength of the genetic effect.Including treatment as a covariate has smaller bias than the other two approaches, but the bias is still non-ignorable.Both IPTW approaches with nonstabilized and stabilized weights provide estimates with minimal bias (absolute value ranging from 0.01 to 0.07), but the standard errors using stabilized weights are smaller than using nonstabilized weights.Comparing with the standard approaches, the IPTW approaches have larger standard errors.This reflects a tradeoff between the bias and the variance.Table 2 also shows the results of testing for genetic effect using Wald statistic based on sandwich variance estimate.The IPTW approach with stabilized weights has the largest power.Comparing to stabilized IPTW, the percentage loss of power of the other approaches ranges from 3% to 37%.The most popular standard approach of adjusting for treatment effect by including it as a covariate can have a power loss of 9.6% (scenario 2).
For estimating treatment effect, including treatment assignment as a covariate has large bias ranging from 8.95 to 9.47 (results not shown).The IPTW approach has negligible bias ranging from 0.01 to 0.15.The estimator of treatment effect behaves similarly across all scenarios regardless of the size of the genetic effect.

Data analysis results
We apply the proposed methods to analyze the SBP data in FHS.There were 5209 subjects recruited (2336 men and 2873 women) from Framingham, Massachusetts.FHS includes longitudinal observations from two cohorts: Cohort 1, the original Framingham cohort, was first examined in 1948 and had been examined every 2 years thereafter; Cohort 2, composed primarily of offspring of the original cohort and the spouses of these offspring, was examined first in 1971 and had been examined approximately every 4 years Dawber et al. (1951).
In the mid-1990s about 1800 members of the largest 330 pedigrees in FHS were selected for a genome scan conducted by the Mammalian Genotyping Service in Marshfield, Wisconsin (Cupples et al 2003).The genetic markers span the whole genome with a 10-cM density.Researchers in Levy et al. (2003) identified a marker, GATA25A04 (D17S1299) on chromosome 17 with a two point LOD score of 3.8 for the phenotype SBP.We focus on analyzing the genetic effect at this marker.We estimated the genetic effect using IPTW method and compared it with the standard methods.
We fitted three standard GEE regressions with marker alleles, age, sex and BMI as covariates: the first ignored that some subjects were treated by antihypertensive treatment; the second excluded treated subjects; and the third added the treatment as a covariate in addition to the original ones.We also applied the IPTW method, where the weights SW i were computed using (2.2).Here Z i includes sex and BMI, C i (k) includes the previously observed SBPs and age.The conditional probabilities in the numerator and the denominator of the weights were fitted by logistic regression models; for the numerator, the treatment assignment was outcome and sex and BMI were predictors; for the denominator, past SBPs, age, and age squared were added as predictors.
For all methods, each marker allele was coded as a binary variable of having at least one copy of the allele or not.We assumed that the observations from different families were independent and used independent working correlation matrix.Among 1701 subjects being genotyped, 1667 have valid data at the marker GATA25A04.The marker has eight alleles, with frequencies ranging from 0.1% to 38%.We combined the two alleles with least frequencies (alleles 184 and 212).Table 3 reports the results from the four methods.We found that the estimate of the genetic effect from the IPTW method was larger than those from the standard methods, except for the combination of alleles 184 and 212 where the frequency is 1%.The alleles 196 and 200 were not significant in the standard methods, but were significant in the IPTW method; for example, the allele 196 had an estimate of 3.40 (p = 0.01) in the IPTW method, but the standard methods failed to detect it.We also found that the allele 204 had a significant effect in all methods, but the IPTW provided the largest estimate.These results are consistent with the finding that the standard methods underestimate the genetic effect.

Discussions
Here we describe an IPTW-type method to provide asymptotically unbiased estimators for the genetic effect when the genetic effect is confounded by the treatment effect.By constructing proper weights, we break the correlation between the treatment and the past SBP and therefore the correlation between the treatment and genetic factors because genetic factors predict treatment assignment through SBP.For the method to be valid, the assumption of no unmeasured confounder is assumed.This assumption is debatable, but by adjusting for the observed confounders, the proposed methods can improve the standard methods such as GEE regression to some extent.For the SBP data in FHS, the assumption of no unmeasured confounder is that the treatment assignment depends completely on the measured past SBPs and other covariates.This assumption will not hold if the subjects in the study have their own physicians outside of the study who also monitor their SBPs and prescribe the antihypertensive treatment and the measurements taken outside of FHS are not available to the FHS investigators.However, since subjects are examined every two years in FHS, the SBP measured by an outside physician could be close to the one measured at the nearest preceding visit by the FHS researchers.In this case, it is still reasonable to assume that there are no unmeasured confounder.
Here we assume that the subjects are not taking the treatment at the beginning of the study.While this is true for Framingham study because it started in an era of no available antihypertensive treatment, in other applications this assumption may not hold.If some of the subjects are treated at the baseline because they already have high blood pressure and if their SBP measurements based on which the treatment is prescribed are not available, the weights SW i cannot be computed.These subjects will be excluded from the analysis.However, if the blood pressure measurements used to prescribe the treatments before the study begins are available, these subjects can still contribute to the analyses.
Using the IPTW estimator in (2.3) and its sandwich variance estimate, it is straightforward to construct a Wald test for the genetic effect.For multivariate or longitudinal data in family studies, to control for population stratification, a family-based association test using GEE regression (FBAT-GEE) is proposed in Lange et al. (2003).The test statistic is where is the candidate gene's genotype, and G i ∆ i is the slope of µ i (β) at the null hypothesis of no genetic effect, with µ i (β) and V i defined in (2.1).The expectation E(S|C) and variance V (S|C) are computed conditionally on the minimal sufficient statistic C under the null hypothesis; the minimal sufficient statistics are discussed in Rabinowitz and Laird (2000).Under the null hypothesis, the test statistic has a chi-square distribution with degrees of freedom equal to the number of the levels of G i minus one.
Although the FBAT approach has correct type-I error using any test statistic S, a carefully chosen statistic can be more powerful.For the SBP data, adjusting for the influence of antihypertensive treatment may improve testing power.To this aim, one can use an FBAT-type test based on IPTW (FBAT-IPTW).The test statistic is where , with G i , ∆ i and V i defined in (5.1) and SW i defined in (2.2).The expectation E(T |C) and variance V (T |C) are also computed conditionally on the minimal sufficient statistic C under the null hypothesis.Following the arguments in Lange et al. (2003) and Rabinowitz and Laird (2000), we see that the FBAT-IPTW is robust to population stratification.Note that because observed phenotypes are a part of C and the weights SW only involves observed phenotypes and covariates other than genotypes, the weights can be taken out of the conditional expectation.Therefore in practice, the FBAT-IPTW can be calculated using the software FBAT (http://www.biostat.harvard.edu/∼fbat/)by first computing a weighted phenotype SW i Y i and then inputting them as phenotype in the FBAT program.

Table 1 :
An example illustrating biases in various methodsG L T P (L|G) P (T |L) N um E(Y1|G, L) E(Y2|G, L, T ) W N w SW N sw

Table 2 :
Estimation of genetic effect (β) in four scenarios * SE † Power Loss ‡Note:* : Empirical Standard Error, * Empirical standard error, † Mean estimated standard error, ‡ Percentage decrease of power comparing with IPTW approach using stabilized weights.

Table 3 :
Estimating genetic effect of maker GATA25A04 * Allele frequency