Multiple Comparison Methods in Zero-dose Control Trials

This paper considers the problem of determining which treatments differ significantly from a zero-dose or placebo control in a dose-response study. Nonparametric methods developed for the multiple comparison problem in settings where the Jonckheere trend test (JT) is appropriate are extended to the problem of multiple comparisons to a control. We present four closed testing methods, two of which use an AUC regression model approach to determine the treatment arms that are statistically different from the zero-dose control. A simulation study compares the proposed methods with two existing rank-based nonparametric multiple comparison procedures. The methods are further illustrated using a problem from a clinical setting.


Introduction
Multiple comparison methods are used to determine individual group differences once a global test indicates overall group differences. A method for using the Jonckheere trend test statistic in the AUC regression setting is presented in Buros et al. (2017b). Their method is extended here to a problem that arises in dose-response clinical studies, in which one is interested in determining which dose arms are statistically different from a zero-dose or placebo control. A related problem is to determine the smallest dose for which there is a significant difference from the zero-dose control. This dose is referred to as the Minimum Effective Dose (MED) (Ruberg, 1989). The literature has several parametric (Dunnett (1955), … are completely separated, the AUC is 1. The AUC can be interpreted as $P(Y^{D} > Y^{\bar{D}})$, which is the probability that the score of a randomly chosen diseased subject is greater than the score of a randomly chosen non-diseased subject (Bamber, 1975).
The AUC can be estimated from the distribution functions of the two groups using the relationship between the ROC curve and the survival functions, $ROC(t) = S_Y(S_Z^{-1}(t))$, where $S_g(\cdot) = 1 - F_g(\cdot)$ is the survival function for group g, g ∈ {Y, Z}, and t ∈ [0, 1] (Pepe et al., 2009). The AUC is defined as
$$AUC = \int_0^1 ROC(t)\,dt.$$
The AUC has been shown to be related to the commonly used Mann-Whitney rank sum statistic (Bamber, 1975). Suppose that $Y_1, \ldots, Y_n$ and $Z_1, \ldots, Z_m$ are independent random samples from the two populations. The Mann-Whitney statistic is given by
$$U = \sum_{i=1}^{n}\sum_{j=1}^{m} I(Y_i > Z_j), \tag{3}$$
where $I(Y_i > Z_j)$ equals 1 if $Y_i > Z_j$, 1/2 if $Y_i = Z_j$, and 0 if $Y_i < Z_j$. The indicator $I(Y_i > Z_j)$ in (3) is used as the response in a generalized linear model by Dodd and Pepe (2003). Their semiparametric regression model for the AUC enables one to obtain a covariate-adjusted Mann-Whitney statistic.
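As a concrete illustration of the Mann-Whitney statistic in (3) and its link to the AUC, the following minimal Python sketch counts the pairs in which an observation from the first sample exceeds one from the second, scores ties as 1/2, and divides by the number of pairs. The function name `mann_whitney_auc` and the toy data are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def mann_whitney_auc(y, z):
    """Return (U, AUC estimate) for two samples y and z.

    U sums I(y_i > z_j) over all pairs, with ties scored as 1/2.
    The AUC estimate is U divided by the number of pairs.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    diff = y[:, None] - z[None, :]  # all pairwise differences y_i - z_j
    u = float(np.sum(diff > 0) + 0.5 * np.sum(diff == 0))
    return u, u / (y.size * z.size)

# Hypothetical toy data: one tie (3 versus 3) among the nine pairs.
u_stat, auc_hat = mann_whitney_auc([3, 4, 5], [1, 2, 3])
print(u_stat, auc_hat)
```

With these toy samples, eight pairs favor the first sample and one pair is tied, so U is 8.5 out of 9 pairs. When the two samples are completely separated, the estimate equals 1.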

AUC Regression Model
Let $Y_1^{D}, \ldots, Y_n^{D}$ denote a random sample of n subjects from the treatment group and $Y_1^{\bar{D}}, \ldots, Y_m^{\bar{D}}$ denote a random sample of m subjects from the control group. In the diagnostic testing literature the classifier (treatment) is said to be ineffective if $H_0: \theta = P(Y^{D} > Y^{\bar{D}}) = 0.5$. In the case where there are no covariates, a function of the Mann-Whitney statistic in (3) is an unbiased nonparametric estimate of the AUC (Bamber, 1975), given by
$$\hat{\theta} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} I(Y_i^{D} > Y_j^{\bar{D}}). \tag{4}$$
Since the performance of a classifier often depends on covariates, Dodd and Pepe (2003) proposed a semiparametric regression model for the AUC given by $g(\theta) = X^{T}\beta$, where g is a monotone link function, X is a vector of covariates, and $U_{ij} = I(Y_i^{D} > Y_j^{\bar{D}})$. Solutions to the score equations can be found using standard GLM software. The covariate-specific AUC can be expressed as
$$\theta_X = E(U_{ij} \mid X) = g^{-1}(X^{T}\beta). \tag{5}$$
Since the binary variables in (5) are correlated, the estimates of the regression coefficients, $\hat{\beta}$, are correct, but their standard errors are not. Dodd and Pepe suggested using the bootstrap to estimate the standard errors. Zhang et al. (2011) proposed a modification of the method of DeLong et al. (1988) that computes an estimate of the variance of the Mann-Whitney statistic and then estimates the variance of the parameters using the delta method.
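The remark that the AUC regression model can be fitted with standard GLM machinery can be sketched as follows. This is only an illustration under stated assumptions: a logit link is chosen for g, treatment and control observations are paired only when they share the same discrete covariate level, the model is saturated in that covariate, and the fit uses a hand-written iteratively reweighted least squares loop instead of any particular GLM package. The function names and the toy data are hypothetical.

```python
import numpy as np

def paired_indicators(y_trt, x_trt, y_ctl, x_ctl):
    """U_ij = I(y_trt[i] > y_ctl[j]) (ties 1/2) for pairs sharing a covariate level."""
    y_trt = np.asarray(y_trt, dtype=float)
    y_ctl = np.asarray(y_ctl, dtype=float)
    x_trt = np.asarray(x_trt)
    x_ctl = np.asarray(x_ctl)
    same = x_trt[:, None] == x_ctl[None, :]       # pairs with equal covariate level
    diff = y_trt[:, None] - y_ctl[None, :]
    u = np.where(diff > 0, 1.0, np.where(diff == 0, 0.5, 0.0))
    level = np.broadcast_to(x_trt[:, None], same.shape)
    return u[same], level[same]

def fit_auc_logit(u, design, n_iter=25):
    """Fit logit(theta) = design @ beta by iteratively reweighted least squares."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        eta = design @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                        # binomial working weights
        z = eta + (u - mu) / w                     # working response
        xtw = design.T * w
        beta = np.linalg.solve(xtw @ design, xtw @ z)
    return beta

# Hypothetical toy data with two covariate levels (1 and 2).
y_trt, x_trt = [2, 4, 6, 5, 7, 9], [1, 1, 1, 2, 2, 2]
y_ctl, x_ctl = [1, 3, 5, 2, 4, 6], [1, 1, 1, 2, 2, 2]

u, level = paired_indicators(y_trt, x_trt, y_ctl, x_ctl)
design = np.column_stack([np.ones(u.size), (level == 2).astype(float)])
beta = fit_auc_logit(u, design)

expit = lambda v: 1.0 / (1.0 + np.exp(-v))
auc_level1 = expit(beta[0])            # covariate-specific AUC at level 1
auc_level2 = expit(beta[0] + beta[1])  # covariate-specific AUC at level 2
print(auc_level1, auc_level2)
```

Because the design is saturated, the fitted covariate-specific AUCs coincide with the Mann-Whitney estimates computed separately within each covariate level. The standard errors that such a fit would report are not valid, since the indicators are correlated.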
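The asymptotic one-sided (1 − α)100% confidence interval for the covariate-adjusted AUC (Bamber, 1975) is given by the lower confidence limit
$$\hat{\theta} - z_{\alpha}\,\mathrm{s.e.}(\hat{\theta}), \tag{6}$$
where $z_{\alpha}$ denotes the upper α critical value of the standard normal distribution. The estimate of the AUC is obtained from an AUC regression model (Dodd and Pepe, 2003), and the standard error of the AUC is calculated using a combination of DeLong's method and the delta method (Zhang et al., 2011).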
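A one-sided lower confidence limit of this kind can be sketched in Python for the simplest case of no covariates. The sketch assumes the variance estimate of DeLong et al. (1988) for the Mann-Whitney AUC and a normal critical value. It does not implement the covariate-adjusted delta-method standard error of Zhang et al. (2011), and the function name `auc_lower_limit` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def auc_lower_limit(y, z, alpha=0.05):
    """Return (AUC estimate, standard error, one-sided lower limit).

    Uses the DeLong et al. (1988) variance for the no-covariate case;
    each sample needs at least two observations.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    diff = y[:, None] - z[None, :]
    psi = np.where(diff > 0, 1.0, np.where(diff == 0, 0.5, 0.0))
    auc = psi.mean()
    v10 = psi.mean(axis=1)  # one placement value per observation in y
    v01 = psi.mean(axis=0)  # one placement value per observation in z
    var = v10.var(ddof=1) / y.size + v01.var(ddof=1) / z.size
    se = float(np.sqrt(var))
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha normal critical value
    return float(auc), se, float(auc) - z_crit * se

# Hypothetical toy data.
auc_hat, se_hat, lower = auc_lower_limit([3, 4, 5], [1, 2, 3], alpha=0.05)
print(auc_hat, se_hat, lower)
```

A treatment arm would be declared different from the control when the lower limit exceeds 0.5, which is the sense in which the interval "contains 0.5" or not.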

Jonckheere-Terpstra Statistic
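The AUC regression model … with at least one strict inequality. Jonckheere (1954) and Terpstra (1952) independently developed the test statistic for this hypothesis, known as the Jonckheere-Terpstra statistic (JTS), given by
$$V = \sum_{i<j} U_{ij},$$
where $U_{ij}$ is the Mann-Whitney statistic in (3) for groups i and j. The JTS is more powerful than the Kruskal-Wallis procedure when the alternative hypothesis is monotone (Randles and Wolfe, 1991). The limiting null distribution of V is normal with mean and variance given by
$$E(V) = \frac{N^2 - \sum_{i=0}^{K} n_i^2}{4}, \qquad \mathrm{Var}(V) = \frac{N^2(2N + 3) - \sum_{i=0}^{K} n_i^2(2n_i + 3)}{72},$$
where $n_i$ is the sample size of group i and $N = \sum_{i=0}^{K} n_i$. The approach of Buros et al. (2017a) makes use of an alternative method of calculating the JTS introduced by Odeh (1971) as for where  1 * ,  2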

Multiple Comparisons with JTS
A nonparametric multiple comparison procedure based on AUC regression and the JT statistic in (11) was developed by Buros et al. (2017b) to identify individual median differences whenever the global null hypothesis in (7) is rejected at the α level. In that case, the problem of interest is to determine where the strict inequalities (breaks) are located while preserving the family-wise error (FWE) rate at α. Rejection implies that there is a strict inequality between the K treatment groups and the control.
The objective is to find its location and to determine whether there are any additional strict inequalities. The next step is, for each s, to compute $p_s = P(W \ge W_s^* \mid H_0 \text{ is true})$, where W is asymptotically normal (Mann and Whitney, 1947) with mean and variance …
2. Let $s_1$ be the smallest index such that $p_s \le \alpha$. In that case, group $s_1$ is the smallest index value for which a strict inequality holds when testing (7). If $s_1 < K$, continue to the next step; otherwise the procedure has identified the single strict inequality between groups (K − 1) and K.

Test the new hypothesis
at the α/2 level. Repeat steps (1) and (2) with (16) to identify the index $s_2 > s_1$ as the smallest index value satisfying $p_s \le \alpha/2$. Note that one must recompute $W_s^*$, since the first $s_1 - 1$ groups are no longer used in testing (16).
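The first pass of this search can be sketched in Python. The sketch rests on assumptions that should be read as such: each $W_s^*$ is taken to be the Mann-Whitney count comparing group s with the pooled groups 0, …, s − 1, and its p-value uses the usual no-ties normal approximation with mean nm/2 and variance nm(n + m + 1)/12. The function names and the toy data are hypothetical, and the global test of (7) is not reproduced here.

```python
import numpy as np
from statistics import NormalDist

def mw_count(a, b):
    """Number of pairs with a_i > b_j, with ties scored as 1/2."""
    diff = np.asarray(a, dtype=float)[:, None] - np.asarray(b, dtype=float)[None, :]
    return float(np.sum(diff > 0) + 0.5 * np.sum(diff == 0))

def first_break(groups, alpha=0.05):
    """groups[0] is the control and groups[1..K] are the ordered dose groups.

    Returns (W*, p-values, s1), where s1 is the smallest s with p_s <= alpha,
    or None when no p-value falls at or below alpha.
    """
    w_star, p_vals = [], []
    for s in range(1, len(groups)):
        pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups[:s]])
        n, m = len(groups[s]), pooled.size
        w = mw_count(groups[s], pooled)
        mean = n * m / 2.0
        sd = (n * m * (n + m + 1) / 12.0) ** 0.5   # assumed no-ties null variance
        w_star.append(w)
        p_vals.append(1.0 - NormalDist().cdf((w - mean) / sd))
    s1 = next((s for s, p in enumerate(p_vals, start=1) if p <= alpha), None)
    return w_star, p_vals, s1

# Hypothetical toy data: control, a dose similar to control, and a clearly higher dose.
groups = [[1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5, 5.5], [10, 11, 12, 13, 14]]
w_star, p_vals, s1 = first_break(groups, alpha=0.05)

# The W* values add up to the Jonckheere-Terpstra sum of pairwise Mann-Whitney counts.
jt = sum(mw_count(groups[j], groups[i])
         for i in range(len(groups)) for j in range(i + 1, len(groups)))
print(w_star, s1, jt)
```

In this toy example the first break is found at the second dose group. A second pass would be run on the remaining groups at the α/2 level, with the statistics recomputed as noted above.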

Proposed Methods
Suppose that the hypothesis of interest is given by (7). This paper considers the following problems:
1. Determine the groups for which $\theta_0 < \theta_i$.
2. Determine the MED as defined in Ruberg (1989) by finding the smallest index j such that $\theta_0 < \theta_j$, where < indicates statistical significance while controlling the FWE at α.
Several methods for addressing these problems are presented, where each method is contained within a family of closed null hypotheses (Tamhane et al., 1996), … where $\theta_i$ is the location parameter for treatment i = 1, …, K. This family of null hypotheses is said to be closed under intersection if $H_{0i} \in \mathcal{H}$ and $H_{0j} \in \mathcal{H}$ imply that $H_{0i} \cap H_{0j} \in \mathcal{H}$.
The next two procedures use the asymptotic one-sided (1 − α)100% confidence intervals for the covariate-adjusted AUC given in equation (6).

Method 2 - AUC Step-Down (aucd)
This procedure is similar to Method 1, except that the one-sided confidence interval on the AUC is used instead of the p-value for the Mann-Whitney statistic. Each comparison of a specified treatment arm versus the zero-dose (placebo) control is made by determining whether the AUC interval from (6) at each discrete covariate level contains 0.5. The procedure is as follows:
• STEP 1: Test (7) at significance level α. If $H_0$ is rejected, continue to step 2; otherwise stop.
- If p-value < $\alpha^*$, stop and let j = i.

Method 3 - AUC Step-Up (aucu)
The step-up AUC procedure is similar to the step-down AUC procedure. Instead of stepping down sequentially from a comparison between the largest dose and the control, as in Method 2, the step-up procedure starts with a comparison between the smallest dose group and the control and proceeds by comparing the zero-dose control with increasing dose groups. The procedure is as follows:
• STEP 1: Test (7) at significance level α. If $H_0$ is rejected, continue to step 2; otherwise stop.

Method 4 - Adjusted Buros (bur)
The first three procedures are alternatives to Shirley's (1977) nonparametric procedure for multiple comparisons to a control. The four procedures can be used to identify the MED.
Their performance in identifying the MED is compared to the method given in Jan and Shieh (2004). A description of the procedures found in Jan and Shieh (2004) … Let $s_1$ be the smallest index such that $p_s \le \alpha$.
• CONCLUDE: … and the MED is 3. In scenario 3, a break is expected between each treatment group and the control group, and the MED is 1. It should be noted that the relationship given by < is intended to indicate a statistically significant ordering. However, in some cases statistical significance at the desired breaks is not realized at each covariate level. The results for the multiple comparisons to control are summarized in Figure 1 and those for the identification of the minimum effective dose in Figure 2.
The simulation results for the multiple comparisons to the zero-dose control are given in Figure 1. In scenario 1, when there are no differences between any of the treatment arms and the control, each of the four procedures identified false breaks in less than 5% of the trials.
For scenario 2, there should be a break between the third treatment group and the control, as indicated by the 03 column. At covariate level 1, each of the four procedures correctly identifies the 03 break in 50% of the trials. In the other 50% of the trials, the overall JT null hypothesis could not be rejected. When the covariate level is 2, the break between the third treatment and the control is found in about 75% of the trials, whereas in the other 25% of the trials the overall null could not be rejected. When the covariate level is 3, the break is identified in about 90% of the trials, with Shirley's method performing the best by a small margin.
In scenario 3, the breaks should be found between all three treatments and the control, as indicated by the 01, 02, and 03 columns. When the covariate level is 1, the step-up procedures (aucu and mwu) perform the best in identifying the smallest break from the control, finding the 01 break in about 60% of the trials, followed by the step-down AUC procedure (aucd) with 50% of the trials. All three proposed methods outperform Shirley's method, which finds the 01 break in about 30% of the trials. The differences between the methods are less pronounced in the identification of the 02 and 03 breaks. The 02 break is identified in about 90% of the trials, with the step-down AUC method performing the best. Each of the methods identifies the 03 break in 100% of the trials. At covariate levels 2 and 3 the trend is the same as at level 1. The step-up procedures perform the best in identifying the 01 break, and all three methods outperform Shirley's method. The proposed procedures find the 01 break in about 20% more of the trials than Shirley's procedure at each covariate level. The other breaks at covariate levels 2 and 3 are each found in nearly 100% of the trials.
The simulation results for the identification of the MED are given in Figure 2. In scenario 1, there is no minimum effective dose, with all treatments being equal to the control at all three covariate levels. The MED is correctly identified as the zero-dose by each of the methods in about 95% of the trials.
For scenario 2, the MED is the third treatment group. At covariate level 1, the MED is correctly identified as 3 in about 45% of the trials. Each method outperformed the Jan and Shieh method by a margin of about 10%. Recall that in this scenario the global JT null hypothesis is not rejected in 50% of the trials. The percentage of trials in which the MED is identified as 3 increases to about 75% at the second covariate level and to about 85% at the third covariate level.
In scenario 3, the MED should be 1. When the covariate level is 1, each of the procedures correctly identifies the MED as 1 in about 50-60% of the trials, with the step-up AUC method performing the best by a slight margin. At the second covariate level, the difference between treatment groups increases with the increase in the covariate effect, and the MED is identified as 1 in about 80% of the trials. This percentage increases to about 95% at the third covariate level. As a whole, the proposed methods identify the MED correctly and outperform the Jan and Shieh method.

Type II Diabetes Mellitus Application
In this section, the proposed methods are illustrated using results from a clinical trial (NCT00749190) for Type 2 diabetes mellitus as described in Section 2. The objectives of this study were to determine the efficacy and safety of Empagliflozin in a Phase 2 trial with 5 increasing dosage levels and a zero-dose control. The proposed methods are used to determine the dosage levels that demonstrated a statistical improvement when compared to the zero-dose control and to identify the MED.
The results from the study are reproduced in Table 1. The summary statistics from the study were used to simulate the data presented in this section. The design for the data generation is similar to that given in Section 6, with an adjustment as described in Appendix A. The boxplots for the simulated data are depicted in Figure 3, where the data are adjusted so that a larger response corresponds to a more effective treatment. The dosage groups are shown from left to right in increasing order: placebo, 1mg, 5mg, 10mg, 25mg, and 50mg of Empagliflozin. The response, x, represents the negative change from baseline of HbA1c. An analysis of these data gives $s_1 = 1$ as the MED. These results are not shown.
To better illustrate the proposed methods, the simulated data were modified by decreasing the sample size for each group to 20 and increasing the variability within each group. The boxplots of the adjusted simulated data are given in Figure 4. The summary statistics for the adjusted simulated data are given in Table 2. The objective of the study is to determine the dosage levels that demonstrate statistical improvement when compared to the control. A secondary objective … simultaneous confidence intervals on the AUC using (6). For each comparison, the true AUC is within two standard errors of the estimate obtained using AUC regression.

Adjusted Buros (bur).
The next step in this procedure is to identify the smallest index, s, such that $p_s \le 0.05$, where $P(W \ge W_s^* \mid H_0 \text{ is true}) = p_s$ … procedure when the MED is expected at a low dose and to use the step-down AUC procedure when the MED is expected at a high dose.
An example of a Type II diabetes dose-response study is included to illustrate the ability of the methods to identify the MED. In this example, the MED is expected at a low dose level. The step-up AUC procedure identifies the MED as the 5mg dose, whereas the step-down AUC procedure identifies the MED as the next higher dose, 10mg.
In conclusion, four nonparametric methods for multiple comparisons to a control and for identifying the minimum effective dose are presented. … The pairwise contrasts are defined within … for each of K + 1 increasing dose levels. Let $Y_{ij}$ denote the response for treatment i and subject j. When comparing the i-th treatment group to the control group, let $R_{sj}(i)$ denote the rank of observation $Y_{sj}$ within the combination of the first i treatment groups with the control group, for i = 1, …, K, s = 0, …, i, and j = 1, …, n. Let $R_s(i) = \sum_{j=1}^{n} R_{sj}(i)$ denote the sum of ranks for the s-th dose level.
A pairwise contrast is defined as $P_i = R_i(i) - R_0(i)$ for i = 1, …, K. The proposed statistic to compare the i-th dose level to the control is defined as
$$Z_i = \frac{P_i}{\sqrt{\mathrm{Var}(P_i)}},$$
where the null variance of $P_i$ is given by $\mathrm{Var}(P_i) = nN_i(N_i + 1)/6$ with $N_i = (i + 1)n$.
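The pairwise contrasts and their null variance can be illustrated with a short Python sketch. It assumes equal group sizes n, uses midranks within each combined sample, and takes the comparison statistic to be the contrast divided by its null standard deviation, which is an assumption about the form of the statistic. No tie correction is applied, and the function names and toy data are hypothetical.

```python
import numpy as np

def midranks(values):
    """Ranks of the values, with tied observations sharing their average rank."""
    v = np.asarray(values, dtype=float)
    less = (v[None, :] < v[:, None]).sum(axis=1)
    equal = (v[None, :] == v[:, None]).sum(axis=1)
    return less + (equal + 1) / 2.0

def pairwise_contrast_z(groups):
    """groups[0] is the control; all groups are assumed to have the same size n.

    For each i, ranks are taken within the control plus the first i treatment
    groups, P_i = R_i(i) - R_0(i), and Var(P_i) = n * N_i * (N_i + 1) / 6.
    """
    n = len(groups[0])
    z_stats = []
    for i in range(1, len(groups)):
        combined = np.concatenate([np.asarray(g, dtype=float) for g in groups[: i + 1]])
        ranks = midranks(combined)
        r_control = ranks[:n].sum()                # R_0(i)
        r_dose = ranks[i * n:(i + 1) * n].sum()    # R_i(i)
        p_i = r_dose - r_control
        n_i = (i + 1) * n
        var_p = n * n_i * (n_i + 1) / 6.0
        z_stats.append(p_i / var_p ** 0.5)
    return z_stats

# Hypothetical toy data: control followed by two increasing dose groups.
z_stats = pairwise_contrast_z([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(z_stats)
```

For the first comparison in this toy example, the rank sums are 15 and 6, giving a contrast of 9 with null variance 21.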
In the presence of ties, the null variance is adjusted by replacing $N_i + 1$ with $N_i + 1 - \sum_{j=1}^{g} t_j(t_j^2 - 1)/[N_i(N_i - 1)]$, where $t_j$ is the number of observations in the j-th of g groups of tied values. Let $Z = (Z_1, \ldots, Z_K)'$. If the global hypothesis holds, then $Z \sim N_K(0, R)$, where R is given by … The MED can be found using the step-down closed testing scheme suggested by

Repeat the above step until one can no longer reject the new null hypothesis at the $\alpha/i$ level for the i-th comparison. The breaks are located between groups $s_i$ and $s_i - 1$ for i = 1, …, M, where M is the final comparison, while controlling the FWE rate at α.
Closure under intersection is as defined in Marcus et al. (1976). A closed testing scheme strongly controls the family-wise error rate (FWE) (Marcus et al., 1976), where the FWE is the probability of rejecting at least one true $H_0$ (Tamhane et al., 1996). Strong control of the FWE is defined as control of the FWE for any combination of true or false $H_0$ (Hochberg and Tamhane, 1987). The four methods are described in the next section. The first method is a simple modification of the Mann-Whitney statistics used in computing the JTS. The next two methods, a step-up and a step-down version, are obtained directly from the AUC regression model, where the direction of the step is determined by the relative location of the MED, in much the same sense as choosing between the FORWARD and BACKWARD selection procedures in model selection. The fourth method is a modification of the procedure given by Buros et al. (2017b) for differences between the treatment groups and the control.

Method 1 - MW Step-Up (mwu)
This procedure utilizes the relationship between the AUC and the Mann-Whitney statistic as suggested by Zhang et al. (2011), where a step-up closed testing scheme suggested by Tamhane et al. (1996) with a Sidak adjustment is used to control the FWE rate. At each stage in the step-up procedure, the Mann-Whitney statistic is used to test for equality of the specified treatment arm versus the zero-dose (placebo) control. Let $\theta_i$ denote the median of population i. The procedure is as follows: … $= \hat{\theta}_{0(K-i)} - z(\alpha^{*})\,\mathrm{s.e.}(\hat{\theta}_{0(K-i)})$.
The adjusted Buros method utilizes the Buros et al. (2017a) and Buros et al. (2017b) multiple comparison procedure presented in Section 4. Recall that Odeh (1971) derived an alternative form for the Jonckheere-Terpstra statistic. The individual components of the JTS are $W_s^*$, s = 1, …, K, where $W_s^*$ is the Mann-Whitney statistic for comparing the s-th group with a group formed by combining the first (s − 1) groups with the control group. The alternative form for the JTS is defined as the sum of the individual Mann-Whitney statistics, given by $V_2 = \sum_{s=1}^{K} W_s^*$. Buros et al. (2017a) utilize the Mann-Whitney statistics defined in (18) to identify all possible differences between treatments in a step-up procedure. The MED is the first break identified by the Buros et al. (2017a) method. The procedure is as follows:

Figure 3: T2DM simulated treatment groups based on original summary statistics of negative change from baseline in HbA1C.

Figure 4: T2DM simulated treatment groups based on adjusted summary statistics of negative change from baseline in HbA1C.
… Dodd and Pepe (2003) with an analytic solution for the standard errors of the AUC and Mann-Whitney (Zhang et al., 2011) to adjust the Jonckheere … Suppose that one has sample data from K + 1 > 2 populations, where K is the number of active treatment arms and 0 denotes the zero-dose or non-active placebo arm. Let $U_{ij}$, for i < j and j = 1, …, K, denote the Mann-Whitney statistic in (3) for the i-th and j-th groups. The test of interest becomes

Table 1: Summary statistics for the original negative change from baseline of HbA1c at week 12.

Table 2: Summary statistics for the adjusted simulated negative HbA1c change from

2 Jan and Shieh's Multiple Comparison (js)
Jan and Shieh (2004) … the family-wise error rate, allows for adjustment of discrete covariates, and is competitive with available methods. … where … = [ … ( … + 1)/12 (1/ … + 1/ … )]^{1/2}. The Shirley multiple comparison test compares each treatment level to the zero-dose control group using either equation (26) or (27) and the critical points given by Williams (1971).
Jan and Shieh (2004) propose a step-down closed testing procedure based on contrasts of the Kruskal-Wallis test to identify the MED.

Table 4: Critical Values for Jan and Shieh Procedure.