A Bayesian Approach to Successive Comparisons

The present article discusses and compares multiple testing procedures (MTPs) for controlling the familywise error rate. Machekano and Hubbard (2006) proposed an empirical Bayes approach, a resampling-based multiple testing procedure that asymptotically controls the familywise error rate. In this paper we provide some additional work on their procedure, and we develop a resampling-based step-down procedure that asymptotically controls the familywise error rate for testing families of one-sided hypotheses. We apply these procedures to successive comparisons between treatment effects under a simple-order assumption; for example, the treatments may correspond to a sequence of increasing dose levels of a drug. Using simulations, we demonstrate that the proposed step-down procedure is less conservative than Machekano and Hubbard's procedure. The application of the procedure is illustrated with an example.


Introduction
In some studies in which the effects of k treatments on a response variable are examined, the researcher may believe before observing the data that the true means are simply ordered (µ_1 ≤ ... ≤ µ_k). For example, the treatments may be a sequence of increasing dose levels of a drug. In the framework of one-way analysis of variance (ANOVA), the problem of testing the null hypothesis H_0: µ_1 = ... = µ_k against the simply ordered alternative hypothesis H_1: µ_1 ≤ ... ≤ µ_k, with at least one strict inequality, has received considerable attention in the statistical literature; see Barlow et al. (1972) and Robertson, Wright and Dykstra (1988). More recently, Hayter (1990) developed a one-sided studentized range test which provides simultaneous one-sided lower confidence bounds for the ordered pairwise differences µ_i − µ_j, 1 ≤ j < i ≤ k. These and other simultaneous inference procedures are discussed in Miller (1981), Hochberg and Tamhane (1987) and Hsu (1996).
However, under the ordering assumption µ_1 ≤ ... ≤ µ_k, an experimenter may want to look directly at only the successive differences between the treatment effects, namely the subset of pairwise differences µ_2 − µ_1, ..., µ_k − µ_{k−1}. Liu, Miwa and Hayter (2000) derived simultaneous confidence intervals for this set of successive differences. Liu and Somerville (2004) developed a stepwise multiple testing procedure for successive comparisons of treatment effects which is uniformly more powerful than the simultaneous confidence interval procedure in terms of rejection of the null hypotheses. These procedures are not based on the assumption that the µ_i follow a simple ordering. However, if there is prior information about the order of the µ_i, such as µ_1 ≤ ... ≤ µ_k for the treatment effects, then it can be used to improve the precision of inference. On the other hand, the computation of critical values controlling the familywise error rate also becomes more complicated for large k. Recognizing such limitations, we propose a Bayesian method of multiple testing for successive comparisons of ordered treatment effects. This Bayesian development incorporates the ordering information through a suitable prior distribution for the parameters. The Bayesian approach to various statistical problems has become popular in recent years due to advances in computational power. In the context of comparing treatments, Machekano and Hubbard (2006) proposed a single-step procedure, with a common cut-off value, based on an empirical Bayes approach to control the familywise error rate for the multiple testing problem. It works by estimating the null distribution of the test statistics over the true nulls; the null distribution comes either from the distribution of the maximum of the k correlated test statistics or from the minimum of the p-values.
In the present study, we consider testing whether the two means in a pairwise comparison in the one-way ANOVA are equal (µ_i = µ_j) or satisfy the increasing order (µ_i < µ_j); that is, we develop a procedure for testing this family of hypotheses from a Bayesian viewpoint. When a test of hypothesis is conducted in a Bayesian framework and the parameter space under the null hypothesis consists of a single point, a point-mass prior, typically a discrete-continuous mixture, may be used. On the successive differences of the means, we place independent prior distributions that are mixtures of a truncated normal distribution and a discrete distribution with its entire mass at zero. For simultaneous testing of multiple point null hypotheses against one-sided alternatives, we propose a new step-down multiple testing procedure that also controls the familywise error rate; it is a step-down version of Machekano and Hubbard's single-step procedure. In the proposed procedure, the ordered test statistics, or the associated p-values, are compared with a set of critical values in a stepwise fashion in order to identify the sets of true and false null hypotheses, and the critical value used at each step incorporates the decisions made in the preceding steps. It therefore provides tighter control of the familywise error rate, and hence greater power, than a single-step procedure, in which the decision about each null hypothesis is made by simply comparing each test statistic, or the corresponding p-value, with a single critical value.
In the remainder of this article, we describe the hierarchical model, prior specifications, and posterior computations. We then discuss the findings of a simulation study comparing our procedure with other procedures that control the familywise error rate for multiple tests of successive comparisons of treatment effects. We demonstrate through this study that the proposed procedure is less conservative, while still controlling the familywise error rate, than the procedures currently available in the literature.

Multiple Testing Framework
Suppose the treatments are a sequence of increasing dose levels of a drug, and an experimenter wants to look directly at only the successive differences between the treatment effects. In order to assess how the treatment effects change with increasing dose levels, we consider testing the following family of hypotheses:

H_{j0}: µ_{j+1} − µ_j = 0 versus H_{j1}: µ_{j+1} − µ_j > 0, j = 1, ..., k − 1. (2.1)

Let T_1, ..., T_{k−1} be the corresponding test statistics and let p_1, ..., p_{k−1} be the associated p-values. In single-step hypothesis testing, H_{j0} is rejected if T_j > c_j(α), or if p_j ≤ α, for some chosen α ∈ (0, 1), j = 1, ..., k − 1. It is often assumed that, if H_{j0} is true, T_j comes from some standard theoretical distribution (e.g. normal or t-distribution), from which we get the critical values c_j(α).
A step-down test, in contrast to a single-step procedure, is based on k − 1 critical constants d_1(α) ≤ ... ≤ d_{k−1}(α) and proceeds as follows. Denote the ordered T-values as T_(1) ≤ ... ≤ T_(k−1) and the corresponding null hypotheses as H_(1)0, ..., H_(k−1)0. Start with the largest T-value: if T_(k−1) ≤ d_{k−1}(α), then stop and accept all the null hypotheses; otherwise reject H_(k−1)0 and go to the next step. In general, if testing has continued to the ith step (1 < i ≤ k − 1) and T_(k−i) ≤ d_{k−i}(α), then stop testing and accept all the remaining hypotheses H_(1)0, ..., H_(k−i)0; otherwise reject H_(k−i)0 and go to the (i+1)th step. Define S to be the set of all null hypotheses, S_0 the set of all true null hypotheses, and S_1 the set of false null hypotheses. In a multiple testing problem, we would like to accurately estimate the subset S_0, and thus its complement S_1, while probabilistically controlling some error rate at an assumed significance level α. In this article, the critical constants are chosen so that the familywise error rate (FWER), the probability of rejecting at least one true null hypothesis among all tests, is controlled at a pre-specified level α. Choosing the rejection regions, i.e., c_j(α) and d_j(α), j = 1, ..., k − 1, is challenging, because the theoretical joint null distribution of the k − 1 test statistics is not always obvious. Moreover, any difference between the theoretical and empirical null distributions affects the simultaneous inference.
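The step-down scheme just described can be sketched in code. This is a minimal illustration: the test statistics and the critical constants d_1(α) ≤ ... ≤ d_{k−1}(α) are taken as given inputs (placeholder values in the usage example), not computed here.

```python
def step_down_test(T, d):
    """Step-down test for one-sided hypotheses.

    T : list of k-1 observed test statistics T_1, ..., T_{k-1}
    d : list of k-1 critical constants d_1(alpha) <= ... <= d_{k-1}(alpha)
    Returns the set of (0-based) indices of rejected null hypotheses.
    """
    # Order the hypotheses by test statistic, largest first.
    order = sorted(range(len(T)), key=lambda i: T[i], reverse=True)
    rejected = set()
    for step, idx in enumerate(order):
        # At step i we compare the (k-1-i)th ordered statistic
        # with the matching critical constant d_{k-1-i}(alpha).
        crit = d[len(T) - 1 - step]
        if T[idx] <= crit:
            break              # stop and accept all remaining hypotheses
        rejected.add(idx)      # reject and go to the next step
    return rejected
```

For instance, `step_down_test([3.0, 1.0, 2.5], [1.5, 1.8, 2.0])` rejects hypotheses 1 and 3 (indices 0 and 2) and stops at the third step, exactly mirroring the verbal description above.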

Bayes Procedure
The single-step maxT multiple testing procedure, with a common cut-off, uses a null distribution based on the joint distribution of the test statistics. The null distribution comes either from the distribution of the maximum of the k − 1 correlated test statistics or from the minimum of the p-values. This null distribution is used to define the rejection regions as well as the p-values. Common cut-offs that guarantee control of the FWER are chosen from the distribution of the maximum test statistic. Machekano and Hubbard (2006) introduced a method for choosing the common cut-off c(α) which involves controlling the tail probability of a random variable

Ṽ = Σ_{i=1}^{k−1} I(T̃_i > c(α), i ∈ S_0n).

This random variable represents a guessed number of false positives among the rejections, obtained by drawing T̃ = (T̃_1, ..., T̃_{k−1}) from a null distribution for the test-statistic vector and independently drawing S_0n, a guess of the set of true null hypotheses S_0. The distribution of S_0n and the null distribution of T̃ are chosen so that Ṽ asymptotically dominates in distribution the true number of false positives V, where V = Σ_{i=1}^{k−1} I(T_i > c(α), i ∈ S_0). The procedure utilizes a Bayesian method to estimate the guessed set of null hypotheses S_0n; Machekano and Hubbard (2006) used an empirical Bayes approach. In their procedure, the critical value c(α) is given by the maxT distribution computed only over the guessed set of null hypotheses. We shall refer to this procedure as the maxT procedure. In the next subsection we use a hierarchical Bayes approach to estimate the guessed set of null hypotheses, and describe how the null distribution of T̃ is drawn. Under this hierarchical Bayes model, we define the maxT procedure and then present the new step-down procedure.

Hierarchical model
Consider the balanced one-way analysis of variance model X_ij = µ_i + ε_ij, i = 1, ..., k, j = 1, ..., n, where the ε_ij are independent N(0, σ²) random variables. Let X̄_i, 1 ≤ i ≤ k, be the ith sample mean based upon n observations, let X̄ and µ be the vectors of the X̄_i and µ_i, respectively, and let Σ_1 be the covariance matrix of X̄. We define A to be the matrix such that AX̄ = Y = (X̄_2 − X̄_1, ..., X̄_k − X̄_{k−1})'. The distribution of Y is then Y ~ N(δ, AΣ_1A'), where δ = (δ_1, ..., δ_{k−1})' with δ_i = µ_{i+1} − µ_i. The prior for each difference parameter δ_i, i = 1, ..., k − 1, is a mixture of a normal distribution truncated to (0, ∞) and a discrete distribution with its entire mass at δ_i = 0: with probability ρ we set δ_i = 0, and with probability 1 − ρ we draw δ_i from the truncated normal component, for 1 ≤ i ≤ k − 1, where I{.} denotes the ordinary indicator function.
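A draw from this mixture prior can be sketched as follows. This is a minimal illustration under stated assumptions: the prior scale `sigma_delta` is a placeholder hyperparameter, and a N(0, σ²) density truncated to (0, ∞) is sampled as the absolute value of a mean-zero normal draw.

```python
import random

def draw_delta(rho, sigma_delta, rng=random):
    """Draw delta_i from the mixture prior: a point mass at 0
    with probability rho, otherwise a normal distribution
    truncated to (0, infinity)."""
    if rng.random() < rho:
        return 0.0                           # point mass at zero
    # N(0, sigma^2) truncated to (0, inf) has the same law as |N(0, sigma^2)|.
    return abs(rng.gauss(0.0, sigma_delta))
```

With ρ = 0.9, roughly 90% of draws are exactly zero and the rest are strictly positive, matching the discrete-continuous mixture described above.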
In the next section, for performing the multiple comparisons, we use the posterior probability Pr(δ_i = 0 | y) to define a random set S_0n that represents a guess of the set of true null hypotheses S_0. To compute this probability we use Gibbs sampling, a particular implementation of the Markov chain Monte Carlo (MCMC) method that generates observations from the target posterior distribution through iterative sampling. To use Gibbs sampling, we must obtain the full conditional posterior distributions of all parameters and hyperparameters.
Due to conjugacy, the conditional posterior distributions of ρ and σ² are easily obtained. The conditional posterior distribution of δ_i is a mixture of a point mass at zero and a normal distribution truncated at zero, where e_ji is the (j, i)th element of the upper-triangular matrix given by the Cholesky decomposition of (AΣ_1A')^{−1}. The normalizing constant s_i involves Φ, the normal cumulative distribution function, with mean p_i and standard error σ(d_i + 1)^{−1/2}.
For our current hierarchical model, Pr(δ_i = 0 | y) is easily computed by counting the number of times δ_i = 0 is generated during the MCMC steps. We use this posterior probability as the Bernoulli probability of H_{i0} being true, and define a random set S_0n, representing a guess of the set of true null hypotheses S_0, accordingly. As discussed in van der Laan, Birkner and Hubbard (2005), in order to control the FWER, S_0n must contain the set of true null hypotheses, i.e., S_0 ⊂ S_0n: if some i ∈ S_0 is left out of S_0n, the chosen cut-off can be too small and control can fail, whereas including i ∈ S_0n when in reality i ∈ S_0^c only makes the chosen cut-off conservatively large. To address this, they replace a single guess of the set of true null hypotheses by a random guess drawn from a distribution which is asymptotically degenerate at the set of true null hypotheses. In this study, we use the posterior probability Pr(δ_i = 0 | y) as the Bernoulli probability of H_{i0} being true, which yields such a random guess of the set of true null hypotheses. Our simulations show that generating S_0n in this way, instead of by the empirical Bayes method of van der Laan, Birkner and Hubbard (2005), does not affect control of the familywise error rate at the nominal level α.
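Given the vector of posterior probabilities Pr(δ_i = 0 | y) from the MCMC output, the random guess S_0n is generated by independent Bernoulli draws. A minimal sketch (function name and the list-of-probabilities interface are our choices for illustration):

```python
import random

def guess_null_set(post_null_probs, rng=random):
    """Draw a random guess S_0n of the set of true null hypotheses:
    hypothesis i is included independently with probability
    Pr(delta_i = 0 | y)."""
    return {i for i, p in enumerate(post_null_probs)
            if rng.random() < p}
```

Hypotheses with posterior null probability near 1 are almost always included, so as the posterior concentrates, the random set degenerates to the true S_0, which is the behaviour required for asymptotic FWER control.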
The null distribution of T̃ is estimated by drawing a bootstrap sample X*_1, ..., X*_{nk} from the empirical distribution P_n of the original sample X_1, ..., X_{nk}, or from a model-based estimate P̂_n of the data-generating distribution P, and then calculating the test statistics based on this bootstrap sample. This is repeated B* times, resulting in a (k − 1) × B* matrix of test-statistic vectors, each column representing a draw of the test-statistic vector under the empirical P_n or the model-based estimate P̂_n. Subsequently, we compute the row means of the matrix and shift (center) the matrix by these means so that the row means after the shift equal the null value. This matrix represents a sample of B* draws from a null distribution; each column specifies a draw of T̃ = (T̃_j : j = 1, ..., k − 1). We shall refer to this matrix as T_0. It can also be scaled so that the row variances equal a null value (Dudoit, van der Laan and Pollard, 2004; Pollard and van der Laan, 2003). Machekano and Hubbard (2006) used the single-step maxT multiple testing procedure. This is a single-step approach, with a common cut-off, which uses the corrected null distribution of the test statistics as follows:
1. Generate k − 1 Bernoulli random variables with probabilities Pr(δ_i = 0 | y), i = 1, ..., k − 1.
2. Repeat step 1 B* times to obtain a (k − 1) × B* matrix B_0 of Bernoulli indicators.
3. Take the Hadamard product (entry-wise product) of B_0 and T_0, then take the maximum in each column, T^max_1, ..., T^max_{B*}, to obtain the maxT distribution over the guessed true null hypotheses only.
4. The critical value c(α) is the 1 − α quantile of this maxT distribution.
Reject H_{i0} if T_i ≥ c(α); otherwise do not reject H_{i0}.
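Steps 1-4 above can be sketched as follows. This is a minimal illustration under stated assumptions: T_0 and B_0 are stored as lists of B* columns (each column a list of k − 1 entries), and the 1 − α quantile is taken as a simple empirical order statistic; both layout and quantile convention are our choices, not prescribed by the procedure.

```python
def maxT_cutoff(T0, B0, alpha):
    """Common cut-off c(alpha): the 1 - alpha empirical quantile of the
    column-wise maxima of the entry-wise (Hadamard) product of B0 and T0.

    T0, B0 : lists of B* columns; each column is a list of k-1 entries.
    """
    col_maxima = []
    for t_col, b_col in zip(T0, B0):
        # Multiplying by the Bernoulli indicators restricts the maximum
        # to the guessed true null hypotheses in that draw.
        col_maxima.append(max(t * b for t, b in zip(t_col, b_col)))
    col_maxima.sort()
    # 1 - alpha empirical quantile of the maxT null distribution.
    idx = min(len(col_maxima) - 1,
              int((1.0 - alpha) * len(col_maxima)))
    return col_maxima[idx]
```

Note that zeroing out a hypothesis in B_0 can only lower the column maxima, so a smaller guessed null set yields a smaller (less conservative) cut-off.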

New step-down procedure
The family of hypotheses in (2.1) is non-hierarchical; that is, the truth or falsity of one hypothesis has no logical implication for the truth or falsity of any other hypothesis in the family. Liu (1996) discussed the determination of step-down and step-up critical constants for a family of non-hierarchical hypotheses.
We assume that the design is balanced, meaning that the sample sizes are equal for the k treatments. Since the hypotheses in our study are one-sided, we propose the following new step-down procedure, which retains the first two steps of Machekano and Hubbard's method and modifies the remaining steps used to compute the critical values:
3. Take the Hadamard product (entry-wise product) of B_0 and T_0, then compute the required column-wise statistics, where V_hj denotes the (h, j)th element of the Hadamard product of B_0 and T_0.

4. The critical value d_i(α), i = 1, ..., k − 1, is obtained from the resulting null distribution. Testing then proceeds step-down as in Section 2: if testing has continued to the ith step and T_(k−i) ≤ d_{k−i}(α), then stop testing and accept all the remaining hypotheses H_(1)0, ..., H_(k−i)0; otherwise reject H_(k−i)0 and go to the (i + 1)th step.
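Since the exact formula for d_i(α) is not reproduced above, the following sketch uses one common Westfall-Young-type construction as an illustration (our reading, not necessarily the precise definition intended): d_i(α) is taken as the 1 − α empirical quantile of the column-wise maxima of V restricted to the i hypotheses with the smallest observed statistics.

```python
def step_down_cutoffs(V, order, alpha):
    """Step-down critical constants d_1(alpha) <= ... <= d_{k-1}(alpha).

    V     : list of B* columns; column h holds (V_1h, ..., V_{k-1,h}),
            the Hadamard product of B0 and T0.
    order : hypothesis indices sorted by observed statistic, smallest
            first, so order[:i] are the i smallest.
    """
    d = []
    for i in range(1, len(order) + 1):
        subset = order[:i]
        # Maximum over the i "smallest" hypotheses, in each null draw.
        maxima = sorted(max(col[j] for j in subset) for col in V)
        idx = min(len(maxima) - 1, int((1.0 - alpha) * len(maxima)))
        d.append(maxima[idx])
    return d  # nondecreasing, since maxima over larger subsets dominate
```

Because each d_i(α) is a quantile of a maximum over a subset of the hypotheses, while c(α) is the same quantile of the maximum over all of them, d_i(α) ≤ c(α) holds by construction, consistent with the comparison made in the next paragraph.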
It is clear from the computation of d_i(α), i = 1, ..., k − 1, in the new step-down procedure and of c(α) in the maxT procedure that d_i(α) ≤ c(α) for 1 ≤ i ≤ k − 1. Consequently, the set of hypotheses rejected by the maxT procedure is a subset of the set rejected by the new step-down procedure; thus, the new step-down procedure is less conservative than the maxT procedure. Since the step-down procedure introduced by Liu and Somerville (2004) controls the FWER, the new step-down procedure introduced above guarantees asymptotic control of the FWER at a level less than or equal to α, provided the distribution of the null-centered, rescaled T̃ dominates the distribution of the observed test statistics (see Machekano and Hubbard, 2006; van der Laan, Birkner and Hubbard, 2005, for details).

Simulation Studies
The simulation study compares the procedure outlined above (new step-down) with the maxT procedure of Machekano and Hubbard (2006) and the step-down procedure of Liu and Somerville (2004).
In this simulation, the data are nk i.i.d. normally distributed observations Y_ij, i = 1, ..., k − 1 and j = 1, ..., n. Recall that the truncated normal component of the prior for δ_i = µ_{i+1} − µ_i, i = 1, ..., k − 1, has mean zero and variance σ², and that π(σ²) ∝ 1/σ². For k = 5, 8, 10 and 15 we place k − 2 hypotheses in the null group, and for k = 21 we place 18 hypotheses in the null group, so that the proportion of true null hypotheses is approximately 0.9. We select π(ρ) = Beta(9, 1); this prior makes Pr(δ_i = 0) = 0.9. σ² is generated uniformly from (0, 10000), and we then draw δ_i and Y_ij, i = 1, ..., k − 1, j = 1, ..., n, according to the above model. This process is repeated N times. For each generated data set, we compute the number of type I errors, and we estimate the FWER for each procedure as the sample proportion of simulations in which the number of type I errors is greater than or equal to 1.
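The FWER estimate just described, the proportion of replications with at least one type I error, can be sketched with a toy unadjusted procedure standing in for the methods being compared. This is deliberately simplified for illustration: the complete null holds (all successive differences zero), σ = 1 is known, and each of the k − 1 one-sided comparisons is tested with an unadjusted z-test at level α.

```python
import math
import random
from statistics import NormalDist

def estimate_fwer(n_reps, k, n, alpha, rng=random):
    """Monte Carlo FWER estimate under the complete null for a naive
    procedure testing each of the k-1 successive comparisons with an
    unadjusted one-sided z-test at level alpha (sigma = 1 known).
    Returns the proportion of replications with >= 1 false rejection."""
    z = NormalDist().inv_cdf(1.0 - alpha)   # one-sided critical value
    se = math.sqrt(2.0 / n)                 # SE of a difference of means
    hits = 0
    for _ in range(n_reps):
        means = [sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
                 for _ in range(k)]
        T = [(means[i + 1] - means[i]) / se for i in range(k - 1)]
        if any(t > z for t in T):
            hits += 1                       # at least one type I error
    return hits / n_reps
```

With k = 5 and α = 0.05, the unadjusted FWER comes out far above 0.05, which is precisely the inflation that the multiplicity-adjusted procedures compared in Tables 1 and 2 are designed to prevent.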
Using the steps described in Section 2.2, 100,000 MCMC samples are generated by Gibbs sampling after discarding the first 3,000 as burn-in; since convergence is fast, a burn-in of 3,000 samples is adequate.
The above simulation was run for n = 13 and n = 20, with α = 0.05 and N = 8,000. For bootstrap estimation of the null distribution of the test statistics, we use B* = 3,000 and the non-parametric bootstrap; that is, samples are drawn at random, with replacement, from the observed data. The results of the simulations for n = 13 are presented in Table 1. It is clear from this table that all three procedures control the FWER at the nominal level of α = 0.05 for the various k. As noted previously, the new step-down procedure is less conservative than the maxT procedure. We also see that the FWER of the new step-down procedure is close to the FWER of the step-down procedure. In many cases the FWER of the new step-down method is almost equal to the nominal type I error rate, which is ideal for a multiple testing procedure.
Table 2 contains simulation results for the above configurations with n = 20. In the entries denoted by "−", we did not compute the FWER of the step-down procedure, because the computation of its critical values is too complicated in those cases. As mentioned earlier, our goal is mainly to compare the maxT procedure with the new step-down procedure, and in particular to show the performance of the new step-down procedure relative to the maxT procedure.

Application
We take a problem from White and Froeb (1980) as our example. The effect of smoking on pulmonary health was investigated in a retrospective study in which subjects who had been evaluated during a physical fitness profile were assigned, based on their smoking habits, to one of five groups: non-smoker (NS), passive smoker (PS), light smoker (LS), moderate smoker (MS) and heavy smoker (HS). A sample of 1,000 female subjects, 200 from each group, was selected and data on pulmonary function (forced vital capacity, FVC) were recorded. It is reasonable to suspect that, for this variable, the mean values decrease from NS to PS, from PS to LS, and so on. Reversing the order, we assume that, with 1 = HS, 2 = MS, 3 = LS, 4 = PS and 5 = NS, the means are simply ordered: µ_1 ≤ µ_2 ≤ µ_3 ≤ µ_4 ≤ µ_5. The sample averages are x̄_1 = 2.55, x̄_2 = 2.80, x̄_3 = 3.15, x̄_4 = 3.23, x̄_5 = 3.35, and a pooled estimate S = 0.458 was obtained with v = 995 degrees of freedom. For this example, the one-sided step-down critical constants at α = 0.05 are found from Liu and Somerville (2004). The maxT procedure rejects the hypotheses H_10: µ_2 − µ_1 = 0, H_20: µ_3 − µ_2 = 0 and H_40: µ_5 − µ_4 = 0, but not H_30: µ_4 − µ_3 = 0. On the other hand, using the new step-down procedure and the step-down procedure, all four hypotheses are rejected. The new step-down procedure therefore rejects more hypotheses than the maxT procedure.
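For illustration, the observed successive-comparison statistics for this example can be reproduced as T_j = (x̄_{j+1} − x̄_j) / (S √(2/n)) with n = 200 per group. This assumes the usual pooled two-sample form of the statistic, which is our reading of the example rather than a formula spelled out above.

```python
import math

# Sample means in the order 1=HS, 2=MS, 3=LS, 4=PS, 5=NS (White and Froeb, 1980)
xbar = [2.55, 2.80, 3.15, 3.23, 3.35]
S, n = 0.458, 200                      # pooled SD estimate and per-group size

se = S * math.sqrt(2.0 / n)            # standard error of a mean difference
T = [(xbar[j + 1] - xbar[j]) / se for j in range(4)]
for j, t in enumerate(T, start=1):
    print(f"T_{j} = {t:.2f}")
```

This gives T ≈ (5.46, 7.64, 1.75, 2.62). Note that T_3 is by far the smallest, consistent with H_30 being the only hypothesis that the maxT procedure fails to reject.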

Concluding Remarks
This paper has introduced a new step-down procedure for multiple testing of simply ordered means in the balanced situation. The technique uses the generally valid null-value-shifted, resampling-based null distribution for the test statistics, as proposed by Pollard and van der Laan (2003) and Dudoit, van der Laan and Pollard (2004). For the maxT and new step-down procedures to control the FWER at the nominal level α, it is important that the guessed set of true null hypotheses contain the set of true null hypotheses. van der Laan, Birkner and Hubbard (2005) recommended against using a single guess of the true null hypotheses; they replace the single guess by a random guess drawn from a distribution which is asymptotically degenerate at the set of true null hypotheses. Our method uses a Bayesian model to generate random guesses of the set of true null hypotheses, which asymptotically converges to the mixture model used by van der Laan, Birkner and Hubbard (2005). Our simulation studies show that the new step-down procedure is less conservative than the maxT procedure but still gives proper control (FWER ≤ 0.05). Posterior distributions, and consequently the posterior probabilities, can be sensitive to the choice of priors for the unknown parameters, especially the variance parameter in our study. Gelman (2006) investigated various priors for variance parameters and recommended a uniform distribution on the standard deviation. In our study, we also explored σ ~ Uniform(0, a_0) for various values of a_0; the results were close to those presented in Tables 1 and 2.