Inference for Semiparametric AUC Regression Models with Discrete Covariates

In this paper we consider clinical trials with two treatments and a non-normally distributed response variable. In addition, we focus on applications which include only discrete covariates and their interactions. For such applications, the semi-parametric Area Under the ROC Curve (AUC) regression model proposed by Dodd and Pepe (2003) can be used. However, because a logistic regression procedure is used to obtain parameter estimates and a bootstrap method is needed for computing parameter standard errors, their method may be cumbersome to implement. In this paper we propose to use a set of AUC estimates to obtain parameter estimates, and to combine DeLong's method and the delta method for computing parameter standard errors. Our new method avoids the heavy computation associated with Dodd and Pepe's method and hence is easy to implement. We conduct simulation studies to show that the two methods yield similar results. Finally, we illustrate our new method using data from urinary incontinence clinical trials.


Introduction
The Wilcoxon-Mann-Whitney test is a widely used nonparametric method for comparing two treatments in clinical trials. In the presence of a discrete confounding stratum effect, the van Elteren (vE) test (van Elteren, 1960) is used to adjust for the stratum effect. However, the vE test does not handle the interaction between treatment and the stratum effect. For this case, Dodd and Pepe (2003) proposed an area under the curve (AUC) regression model which can test the interaction. Their method can also be applied to models with both discrete and continuous covariates.
The AUC regression model utilizes concepts relevant to the Receiver Operating Characteristic (ROC) curve. The ROC curve is a widely used statistical tool for assessing the performance of a binary classifier with continuous or ordinal variables. Its use has gained increased attention in various areas of biostatistics, such as evaluating diagnostic tests, finding potential biomarkers, and analyzing controlled clinical trials. This statistical method has been extensively developed since the 1990s. Pepe (2003) and Zhou et al. (2002) provide excellent reviews of the ROC curve and its use.
A useful application of the ROC curve is in diagnostic testing, where a continuous variable $Y$ is used to classify subjects into either the diseased ($D$) or non-diseased ($\bar{D}$) group according to the classification rule $Y > c$ for a threshold $c$. The ROC curve is obtained by plotting $P(Y > c \mid D)$ against $P(Y > c \mid \bar{D})$ in the unit square with vertices $(0, 0)$ and $(1, 1)$ for all possible thresholds $c$. It displays how the true positive rate (TPR) changes with the false positive rate (FPR) and provides a visual assessment of the accuracy of the diagnostic test, which helps in choosing the optimal threshold given the relative importance of sensitivity and specificity in the application (Dodd and Pepe, 2003). Consider the two most extreme situations. If the distribution of $Y$ in the diseased group coincides with the distribution of $Y$ in the non-diseased group, the ROC curve is the diagonal line from $(0, 0)$ to $(1, 1)$, which means that the TPR equals the FPR at every threshold and the test is useless for classification. If the distributions of $Y$ in the two groups are completely separated, the ROC curve passes through the vertex $(0, 1)$, which means that a threshold can be chosen that gives perfect sensitivity and specificity.
The most commonly used summary index for the ROC curve is the area under the ROC curve (AUC). It can be shown that $AUC = P(Y_D > Y_{\bar{D}})$, where $Y_D$ and $Y_{\bar{D}}$ denote responses from the diseased and non-diseased groups, respectively. Therefore, in the worst case, the AUC is 0.5; that is, the test classifies a subject into the correct group no better than chance. In the perfect case, the AUC equals 1, which means that the probability of correct classification for a proper threshold is 1.
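The relationship between the ROC curve and the probability $P(Y_D > Y_{\bar{D}})$ can be checked numerically. The following is a minimal sketch in Python with NumPy; the function names `roc_points` and `auc_pairwise` and the toy data are illustrative assumptions, not part of the original analysis. It builds the empirical ROC curve from the rule $Y > c$ and compares the trapezoidal area with the proportion of (diseased, non-diseased) pairs in which the diseased response is larger.

```python
import numpy as np

def roc_points(y_d, y_nd):
    """Empirical (FPR, TPR) pairs for the rule Y > c over all thresholds c."""
    y_d, y_nd = np.asarray(y_d, float), np.asarray(y_nd, float)
    # Thresholds run from +inf (no one classified positive) through every
    # observed value, down to -inf (everyone classified positive).
    observed = np.unique(np.concatenate((y_d, y_nd)))[::-1]
    cuts = np.concatenate(([np.inf], observed, [-np.inf]))
    tpr = np.array([(y_d > c).mean() for c in cuts])   # P(Y > c | D)
    fpr = np.array([(y_nd > c).mean() for c in cuts])  # P(Y > c | non-D)
    return fpr, tpr

def auc_pairwise(y_d, y_nd):
    """Proportion of pairs with Y_D > Y_nonD, counting ties as one half."""
    diff = np.subtract.outer(np.asarray(y_d, float), np.asarray(y_nd, float))
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy data (hypothetical values chosen only for illustration).
y_d = [2.1, 3.4, 1.9, 4.0]
y_nd = [1.0, 2.5, 1.2]

fpr, tpr = roc_points(y_d, y_nd)
auc_trapezoid = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
auc_mw = auc_pairwise(y_d, y_nd)
print(auc_trapezoid, auc_mw)
```

For these toy data, 10 of the 12 pairs have the diseased response larger, so both quantities equal 10/12.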
The estimated AUC can be derived from the Mann-Whitney U statistic for testing the equality of two distributions. Based on this property, Dodd and Pepe (2003) proposed an AUC regression model for data with a non-normally distributed response variable, which can adjust for continuous and discrete covariates. In the model, the response variable is a cross-correlated Bernoulli variable. Because the usual standard errors derived from a logistic regression model are incorrect, they proposed to use a bootstrap method to estimate the variance of the non-parametric AUC and the model parameters. Because a logistic regression procedure is used to obtain parameter estimates and a bootstrap method is needed for computing parameter standard errors, their method may be cumbersome to implement.
In practice, there are many applications which involve only discrete covariates and their interactions. Such applications include testing the interaction between treatment and a subgroup variable in a typical subgroup analysis in clinical trials. For such applications, we aim to alleviate the computational burden associated with Dodd and Pepe's method. We propose to use a set of AUC estimates to obtain parameter estimates, and to combine DeLong's method and the delta method for computing parameter standard errors.
The remainder of this paper is organized as follows. The AUC regression model by Dodd and Pepe is introduced in Section 2. DeLong's method (DeLong, DeLong and Clarke, 1988) for standard error estimation of the unadjusted AUC is described in Section 3. Our newly proposed method for computing parameter estimates and standard errors is developed in Section 4. Simulation studies comparing Dodd and Pepe's method and our method are presented in Section 5. A real data example is shown in Section 6 to illustrate our new method. Some discussion can be found in Section 7.

Semi-Parametric AUC Regression Model
In this section we review the semi-parametric AUC regression model proposed by Dodd and Pepe. Suppose that one needs to adjust the AUC for a covariate $X$. The covariate-specific AUC can be expressed as

$$AUC_{ij} = P(Y_{Di} > Y_{\bar{D}j} \mid X_i, X_j),$$

where $Y_{Di}$ is the $i$th response in the diseased (or treatment) group with covariate value $X_i$ ($i = 1, \dots, N_D$) and $Y_{\bar{D}j}$ is the $j$th response in the non-diseased (or control) group with covariate value $X_j$ ($j = 1, \dots, N_{\bar{D}}$). Often one is interested in estimating the AUC at a specified covariate level, i.e., $P(Y_{Di} > Y_{\bar{D}j} \mid X_i = X_j = X)$. Dodd and Pepe placed this quantity in the Generalized Linear Model (GLM) framework, which allows one to model the AUC with covariates. Their model can be written as

$$g(AUC_{ij}) = X_{ij}^{T}\beta, \qquad (2.1)$$

where $g$ is a monotone link function such as the probit or logit link, $X_{ij}$ is a vector function of $X_i$ and $X_j$, and $\beta$ is a vector of fixed and unknown parameters to be estimated.
Note that $E[I(Y_{Di} > Y_{\bar{D}j}) \mid X_i, X_j] = P(Y_{Di} > Y_{\bar{D}j} \mid X_i, X_j) = AUC_{ij}$. Thus, for estimating the parameters in the model, Dodd and Pepe proposed the use of the logistic regression model where the response variable is the Bernoulli variable $I_{ij} = I(Y_{Di} > Y_{\bar{D}j})$. Dodd and Pepe demonstrated that the parameter estimates are found as the solution to the usual score equations given by

$$\sum_{i=1}^{N_D} \sum_{j=1}^{N_{\bar{D}}} \frac{\partial AUC_{ij}}{\partial \beta}\, v(AUC_{ij})^{-1}\, (I_{ij} - AUC_{ij}) = 0, \qquad (2.2)$$

where $v(AUC_{ij}) = AUC_{ij}(1 - AUC_{ij})$. Therefore, one obtains this estimate using standard statistical software, such as SAS PROC GENMOD or PROC LOGISTIC. However, the usual standard errors of the estimates cannot be used since the binary variables $I_{ij}$ in equation (2.2) are not independent. Dodd and Pepe recommended the bootstrap for obtaining the needed standard errors. In the presence of covariates, their procedure can be summarized in the following steps:
1. Stratify the range of the covariate into $S$ strata. If the covariate is discrete, each level of the covariate becomes a stratum. For a continuous covariate, it is impossible to make each covariate value a stratum, so adjacent values are clustered into a stratum to ensure enough data for fitting in each stratum.
2. For a discrete covariate, within each stratum $s$ ($s = 1, \dots, S$), generate all of the 0/1 indicator data $I_{ijs} = I(Y_{Dis} > Y_{\bar{D}js})$. In this case, the model is $g(AUC_{ij}) = \beta_0 + \beta_s$.
3. If there is a continuous covariate in addition to the discrete covariate, other parameters should be included in the model in order to fit the data obtained by comparing two responses from different covariate values, such as $I(Y_{Dis} > Y_{\bar{D}js})$ where the $i$th and $j$th outputs come from different covariate values but the same stratum $s$.
4. Use the standard logistic regression procedure to fit the data with strata as the covariate to obtain parameter estimates.
5. Bootstrap the original data within each stratum to compute the parameter standard errors.
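The steps above can be sketched in Python for the case of a single discrete covariate and the logit link. This is only an illustrative sketch, not Dodd and Pepe's implementation: the function names (`pairwise_design`, `fit_logistic`, `dodd_pepe_bootstrap`), the reference-cell coding with the first stratum as baseline, the Newton-Raphson fit, and the simulated normal data are all assumptions made for the example.

```python
import numpy as np

def pairwise_design(strata):
    """Stack indicators I(Y_D > Y_nonD) for all within-stratum pairs,
    with an intercept and reference-cell stratum dummies as covariates."""
    rows, resp = [], []
    for s, (y_d, y_nd) in enumerate(strata):
        y_d, y_nd = np.asarray(y_d, float), np.asarray(y_nd, float)
        ind = (y_d[:, None] > y_nd[None, :]).ravel().astype(float)
        x = np.zeros((ind.size, len(strata)))
        x[:, 0] = 1.0        # intercept beta_0
        if s > 0:
            x[:, s] = 1.0    # stratum effect beta_s (first stratum is baseline)
        rows.append(x)
        resp.append(ind)
    return np.vstack(rows), np.concatenate(resp)

def fit_logistic(x, y, n_iter=25, tol=1e-10):
    """Solve the logistic score equations by Newton-Raphson."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-x @ beta))
        w = p * (1.0 - p)
        step = np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def dodd_pepe_bootstrap(strata, n_boot=200, seed=0):
    """Point estimates from the pairwise logistic fit; standard errors from
    resampling subjects (not pairs) with replacement within each stratum."""
    rng = np.random.default_rng(seed)
    beta_hat = fit_logistic(*pairwise_design(strata))
    boots = []
    for _ in range(n_boot):
        resampled = [(rng.choice(y_d, len(y_d)), rng.choice(y_nd, len(y_nd)))
                     for y_d, y_nd in strata]
        boots.append(fit_logistic(*pairwise_design(resampled)))
    return beta_hat, np.std(boots, axis=0, ddof=1)

# Simulated example: 3 strata, 30 subjects per group (hypothetical data).
rng = np.random.default_rng(1)
strata = [(rng.normal(m, 1.0, 30), rng.normal(0.0, 1.0, 30))
          for m in (0.2, 0.7, 1.2)]
beta_hat, boot_se = dodd_pepe_bootstrap(strata, n_boot=200)
print(beta_hat, boot_se)
```

Subjects are resampled rather than pairs because the pairwise indicators that share a subject are correlated; this is the reason the model-based logistic standard errors are not valid here.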
The above procedure involves the bootstrap, so it can be cumbersome to implement. For models with only discrete covariates and their interactions, we aim to simplify the above model fitting procedure. We propose a new algorithm which involves computing the variance of a non-parametric AUC estimate, as first proposed by DeLong, DeLong and Clarke (1988). The variance computation is described in detail in the next section.
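
DeLong's Method for Computing the Variance of Unadjusted AUC
Several approaches have been proposed to compute the variance of the unadjusted AUC. See Hanley and Hajian-Tilaki (1997) for a review. Among them, the method provided by DeLong, DeLong, and Clarke (1988) is the most widely used; it has a simple analytical structure that relies entirely on the Mann-Whitney statistic. Bamber (1975) established the equivalence between the Mann-Whitney two-sample rank sum statistic and the empirical estimate of the AUC. When the outputs in the diseased and non-diseased groups have ties, the nonparametric AUC can be expressed as

$$\widehat{AUC} = \frac{1}{N_D N_{\bar{D}}} \sum_{i=1}^{N_D} \sum_{j=1}^{N_{\bar{D}}} \psi(Y_{Di}, Y_{\bar{D}j}), \qquad (3.1)$$

where

$$\psi(Y_{Di}, Y_{\bar{D}j}) = \begin{cases} 1, & Y_{Di} > Y_{\bar{D}j}, \\ 1/2, & Y_{Di} = Y_{\bar{D}j}, \\ 0, & Y_{Di} < Y_{\bar{D}j}. \end{cases}$$

The variance of (3.1) by DeLong's method involves two components, which are defined as

$$V_{Di} = \frac{1}{N_{\bar{D}}} \sum_{j=1}^{N_{\bar{D}}} \psi(Y_{Di}, Y_{\bar{D}j}), \quad i = 1, \dots, N_D,$$

and

$$V_{\bar{D}j} = \frac{1}{N_D} \sum_{i=1}^{N_D} \psi(Y_{Di}, Y_{\bar{D}j}), \quad j = 1, \dots, N_{\bar{D}}.$$

Here $V_{Di}$ is the proportion of the $Y_{\bar{D}}$'s that $Y_{Di}$ is greater than or equal to (with ties counted as one half). It measures the relative rank of the $i$th output of the diseased group within the non-diseased group, i.e., its relative percentile when $Y_{Di}$ is placed in the non-diseased group. The interpretation of $V_{\bar{D}j}$ is similar. An estimate of the variance of the nonparametric AUC is

$$\widehat{var}(\widehat{AUC}) = \frac{s_D^2}{N_D} + \frac{s_{\bar{D}}^2}{N_{\bar{D}}}, \qquad (3.2)$$

where $s_D^2$ and $s_{\bar{D}}^2$ are the sample variances of $V_{Di}$ ($i = 1, \dots, N_D$) and $V_{\bar{D}j}$ ($j = 1, \dots, N_{\bar{D}}$), respectively.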
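The nonparametric AUC and DeLong's variance estimate described in this section amount to a few array operations. The sketch below is a minimal Python/NumPy illustration; the function name `delong_auc` and the toy data are assumptions introduced here. It computes the pairwise kernel with ties scored as one half, the two sets of placement values, and the variance as the sum of their sample variances divided by the respective group sizes.

```python
import numpy as np

def delong_auc(y_d, y_nd):
    """Nonparametric AUC and DeLong variance estimate for two samples.

    y_d  : responses in the diseased (or treatment) group
    y_nd : responses in the non-diseased (or control) group
    """
    y_d, y_nd = np.asarray(y_d, float), np.asarray(y_nd, float)
    diff = y_d[:, None] - y_nd[None, :]
    # Kernel: 1 if Y_D > Y_nonD, 1/2 if tied, 0 otherwise.
    psi = (diff > 0) + 0.5 * (diff == 0)
    v_d = psi.mean(axis=1)    # one placement value per diseased subject
    v_nd = psi.mean(axis=0)   # one placement value per non-diseased subject
    auc = float(psi.mean())
    var = float(v_d.var(ddof=1) / y_d.size + v_nd.var(ddof=1) / y_nd.size)
    return auc, var

# Toy data (hypothetical values chosen only for illustration).
auc_hat, var_hat = delong_auc([2.1, 3.4, 1.9, 4.0], [1.0, 2.5, 1.2])
print(auc_hat, var_hat ** 0.5)
```

For these toy data the placement values are (2/3, 1, 2/3, 1) and (1, 1/2, 1), giving an AUC of 5/6 and a variance estimate of 1/27. The estimate needs at least two subjects per group, since each sample variance uses the divisor $N - 1$.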

A New Algorithm for Estimating Parameters and Standard Errors for the AUC Regression Model
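Our new algorithm is best described using an example. Let the covariate be $X$ with 2 levels specified as 0 and 1. The regression model with link function $g$ can be expressed as

$$g(AUC_X) = \beta_0 + \beta_1 X,$$

so that

$$\hat{\beta}_0 = g(\widehat{AUC}_0), \qquad \hat{\beta}_1 = g(\widehat{AUC}_1) - g(\widehat{AUC}_0),$$

where $\widehat{AUC}_0$ and $\widehat{AUC}_1$ are computed using (3.1) on the subsets of observations with $X = 0$ and $X = 1$, respectively. We see that the parameter estimates are explicit functions of the AUC estimates at each stratum. Therefore, our new method avoids the logistic regression procedure required by Dodd and Pepe's method.
In the following we describe how to compute the standard errors for the parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$. Because the observations from the two strata are independent, the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ are, respectively,

$$var(\hat{\beta}_0) = var[g(\widehat{AUC}_0)]$$

and

$$var(\hat{\beta}_1) = var[g(\widehat{AUC}_1)] + var[g(\widehat{AUC}_0)].$$

The above variances can be estimated by combining the delta method, $var[g(\widehat{AUC})] \approx [g'(AUC)]^2\, var(\widehat{AUC})$, with (3.2). In what follows we provide the variance estimates of $g(\widehat{AUC}_k)$, $k = 0, 1$, for the logit and probit links; the standard errors of the parameter estimates are the square roots of the resulting variance estimates. When $g$ is the logit function,

$$\widehat{var}[g(\widehat{AUC}_k)] = \frac{\widehat{var}(\widehat{AUC}_k)}{[\widehat{AUC}_k (1 - \widehat{AUC}_k)]^2}.$$

Let $f(\cdot)$ and $\Phi(\cdot)$ be the PDF and the CDF of the standard normal distribution, respectively. When $g$ is the probit function, $g = \Phi^{-1}$ and

$$\widehat{var}[g(\widehat{AUC}_k)] = \frac{\widehat{var}(\widehat{AUC}_k)}{[f(\Phi^{-1}(\widehat{AUC}_k))]^2}.$$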
The above procedure can be readily generalized to models with more than one discrete covariate and their interactions. First, compute AUC estimates using (3.1) for each stratum resulting from all possible combinations of the covariates. Then, by equating the model parameters to these AUC estimates through the link function, solve for the parameter estimates. Finally, combine (3.2) and the delta method to obtain standard error estimates. In spirit, this approach is very similar to the analysis of variance in normal-theory linear models. Therefore, we term our method the nonparametric analysis of variance (NAOV) method.
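The three steps just described can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function names `delong_auc` and `naov_fit`, the use of a square invertible design matrix to encode a saturated model (one row per cell), and the simulated data are all introduced here for the example. Because the cells are independent, the covariance of the transformed AUC estimates is diagonal, and the parameter covariance follows by a linear transformation.

```python
import numpy as np
from statistics import NormalDist

def delong_auc(y_d, y_nd):
    """Nonparametric AUC (ties scored 1/2) and DeLong variance estimate."""
    y_d, y_nd = np.asarray(y_d, float), np.asarray(y_nd, float)
    diff = y_d[:, None] - y_nd[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)
    var = psi.mean(axis=1).var(ddof=1) / y_d.size \
        + psi.mean(axis=0).var(ddof=1) / y_nd.size
    return float(psi.mean()), float(var)

def naov_fit(cells, design, link="logit"):
    """Closed-form estimates and standard errors for a saturated model.

    cells  : list of (treatment responses, control responses), one per cell
    design : square invertible matrix X with g(AUC_cell) = X @ beta
    """
    std_normal = NormalDist()
    g_vals, g_vars = [], []
    for y_d, y_nd in cells:
        auc, var = delong_auc(y_d, y_nd)   # requires 0 < auc < 1
        if link == "logit":
            g = np.log(auc / (1.0 - auc))
            dg = 1.0 / (auc * (1.0 - auc))       # derivative of logit
        elif link == "probit":
            g = std_normal.inv_cdf(auc)
            dg = 1.0 / std_normal.pdf(g)         # derivative of probit
        else:
            raise ValueError("link must be 'logit' or 'probit'")
        g_vals.append(g)
        g_vars.append(dg ** 2 * var)             # delta method
    x_inv = np.linalg.inv(np.asarray(design, float))
    beta = x_inv @ np.array(g_vals)
    cov = x_inv @ np.diag(g_vars) @ x_inv.T
    return beta, np.sqrt(np.diag(cov))

# Two-level covariate: rows of the design are (1, X) for X = 0 and X = 1.
rng = np.random.default_rng(0)
cells = [(rng.normal(0.3, 1.0, 40), rng.normal(0.0, 1.0, 40)),
         (rng.normal(0.8, 1.0, 40), rng.normal(0.0, 1.0, 40))]
beta, se = naov_fit(cells, [[1, 0], [1, 1]], link="logit")
lower, upper = beta - 1.96 * se, beta + 1.96 * se   # Wald 95% intervals
print(beta, se)
```

With the design matrix above, the result reproduces the two-level example: the intercept is the transformed AUC in the first cell and the slope is the difference of the transformed AUCs. The sketch fails if any cell has an AUC estimate of exactly 0 or 1, since both links are undefined there.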

Simulation Study
We conduct simulation studies to compare the Dodd and Pepe method and our new NAOV method for estimating model parameters and their standard errors. In addition, we compare the two methods in terms of coverage probabilities of the 95% confidence intervals for each parameter. Data are generated from models with probit link and logit link, respectively. For each link function, we illustrate the model using a discrete covariate with 3 strata.

Probit Link
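When the link function is probit, data are generated such that $Y_{1i} \sim N(\mu_{1i}, \sigma_1^2)$ and $Y_{2i} \sim N(\mu_{2i}, \sigma_2^2)$ for stratum $i$, where $\mu_{1i} = \delta_0 + \delta_{2i}$ and $\mu_{2i} = (\delta_0 + \delta_1) + (\delta_{2i} + \delta_{3i})$. The parameters of the model with probit link can be derived based on

$$\Phi^{-1}(AUC_i) = \frac{\mu_{2i} - \mu_{1i}}{\sqrt{\sigma_1^2 + \sigma_2^2}} = \beta_0 + \beta_i, \qquad (5.1)$$

where $\beta_0 = \delta_1 / \sqrt{\sigma_1^2 + \sigma_2^2}$ and $\beta_i = \delta_{3i} / \sqrt{\sigma_1^2 + \sigma_2^2}$. We choose $\delta_0 = 0$, $\delta_1 = 0.15$, $\delta_{2i} = 0$, $\delta_{32} = 0.5$, $\delta_{33} = 1$, $\sigma_1 = 1$ and $\sigma_2 = 1.2$ to compare the two methods. The number of simulated data sets is 1,000 and the number of bootstrap samples is 200. Table 1 gives the comparison results for $n = 30$ and $n = 100$. We see that both methods produce almost identical parameter estimates and very similar standard error estimates. In addition, the coverage probabilities of the 95% CIs are close to the nominal level for both methods.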
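The mapping from the data-generating constants to the probit model parameters can be checked numerically. The sketch below rests on assumptions that are not spelled out in the text: the responses are taken to be normal, the AUC is taken as $P(Y_2 > Y_1)$ within each stratum, and $\delta_{31} = 0$ is assumed for the first stratum. Under these assumptions $AUC_i = \Phi\big((\delta_1 + \delta_{3i}) / \sqrt{\sigma_1^2 + \sigma_2^2}\big)$, which the code compares against a Monte Carlo estimate.

```python
import numpy as np
from statistics import NormalDist

# Constants from the probit simulation setting; delta_31 = 0 is an assumption.
delta1 = 0.15
delta3 = {1: 0.0, 2: 0.5, 3: 1.0}
sigma1, sigma2 = 1.0, 1.2

# Implied probit-scale parameters under the normality assumption.
scale = float(np.sqrt(sigma1 ** 2 + sigma2 ** 2))
beta0 = delta1 / scale
beta = {i: d / scale for i, d in delta3.items()}
true_auc = {i: NormalDist().cdf(beta0 + b) for i, b in beta.items()}

# Monte Carlo check: proportion of pairs with Y_2 > Y_1 in each stratum.
rng = np.random.default_rng(2024)
n = 1500
empirical_auc = {}
for i, d3 in delta3.items():
    y1 = rng.normal(0.0, sigma1, n)           # mu_1i = delta_0 + delta_2i = 0
    y2 = rng.normal(delta1 + d3, sigma2, n)   # mu_2i = mu_1i + delta_1 + delta_3i
    empirical_auc[i] = float((y2[:, None] > y1[None, :]).mean())

for i in delta3:
    print(i, round(true_auc[i], 3), round(empirical_auc[i], 3))
```

With these constants the scale factor is $\sqrt{2.44} \approx 1.562$, so the implied intercept is about 0.096 and the implied stratum effects are about 0.320 and 0.640 on the probit scale.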

Logit Link
When the link function is logit, the parameters in the model can be derived from the distribution of the generated responses (Balakrishnan and Nevzorov, 2003).
When $i = 1$, let $\delta_{2i} = 0$, so that $\beta_0 = \delta_0$. We choose $\delta_0 = 0.15$, $\delta_{1i} = 0$, $\delta_{22} = 0.5$ and $\delta_{23} = 1$. The number of bootstrap samples is 200. Table 2 gives the comparison results for $n = 30$ and $n = 100$ in each group at each level. We see that both methods produce almost identical parameter estimates and very similar standard error estimates. In addition, the coverage probabilities of the 95% CIs are close to the nominal level for both methods.

Real Data Example
In this section we illustrate our new NAOV method using real data from clinical trials. The purpose of the clinical trials is to investigate the efficacy of an active drug for treating stress urinary incontinence in women, in comparison with placebo. The data are provided by a pharmaceutical company and are not publicly available. The response variable is the percent (relative) reduction in incontinence episode frequency (PIEF) from baseline to the last postbaseline visit. There is a variable reflecting the disease severity at baseline (BIEF) with 4 strata. The variable BIEF takes values from 1 (mild) to 4 (severe). Also, the consistency of the treatment effect across another covariate of interest, HORM50, needs to be considered. The variable HORM50 indicates whether a woman has had hormone replacement (yes or no). In summary, we are interested in analyzing the joint predictive and prognostic effects of BIEF and HORM50. For this analysis a total of 4940 subjects with nonmissing PIEF, BIEF, and HORM50 are included.
Table 3 shows the AUC estimates, the 95% CIs of the AUC, and the expression of the AUC in terms of the model parameters for each combination of levels of the two covariates. By equating the third column and the fifth column of Table 3, we can solve for estimates of the model parameters $\beta_0, \dots, \beta_7$. These parameter estimates are presented in the third column of Table 4. The standard errors and 95% CIs for the model parameters are computed using the procedure described in Section 4 and displayed in the fourth and fifth columns of Table 4. By examining these CIs we see that the main effect of HORM50 and the interaction effects are not significant at the 0.05 level, indicating that there are no interactions between BIEF and HORM50 and that HORM50 is not predictive of treatment effect. The CI for $\beta_3$ does not include 0, indicating that BIEF is predictive of treatment effect.
We fit another model with only the BIEF main effect, and the results are shown in Table 5. The CIs indicate that $\beta_1$ and $\beta_2$ are significantly different from 0 at the 0.05 level, which means that BIEF is predictive of treatment effect. Note that $\beta_3$ is not significantly different from 0 at the 0.05 level, possibly due to the fact that only a small percentage of patients are in that very severe stratum. Parameters $\beta_1$ to $\beta_3$ have meaningful interpretations. For example, $e^{\beta_1}$ is the ratio of the odds that the active drug is better than placebo in the second stratum of BIEF to the corresponding odds in the first stratum of BIEF. In this case, that odds ratio is estimated as $e^{\hat{\beta}_1} = 1.4$.
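The odds ratio interpretation follows directly from the logit link. As a short worked derivation, assume the main-effect model is coded with the first BIEF stratum as the reference cell (an assumption about the coding, which the text does not state explicitly), and write $AUC_s$ for the AUC in stratum $s$:

```latex
\begin{align*}
\operatorname{logit}(AUC_1) &= \beta_0, \qquad
\operatorname{logit}(AUC_2) = \beta_0 + \beta_1, \\
\beta_1 &= \log\frac{AUC_2}{1 - AUC_2} - \log\frac{AUC_1}{1 - AUC_1}, \\
e^{\beta_1} &= \frac{AUC_2 / (1 - AUC_2)}{AUC_1 / (1 - AUC_1)} .
\end{align*}
```

Since $AUC_s$ is the probability that a response under the active drug exceeds a response under placebo in stratum $s$, the ratio $AUC_s / (1 - AUC_s)$ is the odds of that event, and an estimate of $e^{\hat{\beta}_1} = 1.4$ corresponds to $\hat{\beta}_1 = \log 1.4 \approx 0.34$.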

Discussion
In this article, we developed an analytical NAOV method for computing parameter estimates and standard errors for the semi-parametric AUC regression model with only discrete covariates. The NAOV method involves only straightforward computations and is much easier to implement than the Dodd and Pepe method. Simulation studies have shown that both methods yield similar results.
The NAOV method involves computing AUC estimates and standard errors at each cell formed by the combinations of levels of all the covariates. Therefore, it requires a reasonable number of observations in each cell. When there are empty cells, a saturated model cannot be fitted and so some terms need to be dropped from the model. This is the same problem encountered by linear models and by Dodd and Pepe's method.
Note that in Tables 1 and 2, we set the true parameter values to be different from zero. However, even when the sample size is 100, some of the 95% confidence intervals of these parameters contain zero. This may lead to the conclusion that the parameters are not significantly different from zero. We remark that this is a phenomenon often encountered when the sample size is not sufficiently large. As the sample size increases, these confidence intervals become narrower and will eventually exclude zero.
Note that Tables 1 and 2 are based on different models, so it may not be appropriate to compare results between Tables 1 and 2. In our experience, for the same model, using the probit or logit link gives very similar results. This is consistent with the literature on generalized linear models. In general, researchers tend to use the logit link because the parameters are easier to interpret.
Although Dodd and Pepe's method can be used for models with both discrete and continuous covariates, their method involves somewhat arbitrary grouping of observations in the presence of continuous covariates. For future research, we intend to generalize the NAOV method to models with both discrete and continuous covariates.

Table 1: Comparison of parameter estimates, standard errors, and 95% CIs for the model with probit link, with 30 or 100 samples per group at each level. Results are based on 1000 realizations of the model with 200 bootstrap samples each.

Table 2: Comparison of parameter estimates, standard errors, and 95% CIs for the model with logit link, with 30 or 100 samples per group at each level. Results are based on 1000 realizations of the model with 200 bootstrap samples each.

Table 3: Estimates of AUC for the example.

Table 4: Parameter estimates and 95% CIs by NAOV with interactions for the example.