Abstract: Large amounts of data are now routinely stored, which calls for specialized methods that analyze such data in an understandable way. In medical data analysis, many potential factors are usually introduced to explain an outcome (response) variable. The main objectives of variable selection are to improve the prediction performance of the predictor variables and to identify, correctly and parsimoniously, the faster and more cost-effective predictors that have an important influence on the response. Various variable selection techniques are used to improve predictability and to obtain the “best” model from a screening procedure. In this study, we propose a variable subset selection method that extends the idea of variable selection to the classification case and combines a nonparametric criterion with a likelihood-based criterion. The Area Under the ROC Curve (AUC) criterion is used from another viewpoint in order to determine the important factors more directly. The proposed method yields a modification (BIC) of the modified Bayesian Information Criterion (mBIC). The introduced BIC is compared with existing variable selection methods in simulation experiments, in which the Type I and Type II error rates are calculated. Additionally, the proposed method is applied successfully to the analysis of high-dimensional Trauma data, and its good predictive properties are confirmed.
Abstract: In this paper, we consider clinical trials with two treatments and a non-normally distributed response variable. We focus on applications that include only discrete covariates and their interactions. For such applications, the semi-parametric Area Under the ROC Curve (AUC) regression model proposed by Dodd and Pepe (2003) can be used. However, because a logistic regression procedure is used to obtain parameter estimates and a bootstrap method is needed to compute parameter standard errors, their method may be cumbersome to implement. We propose instead to use a set of AUC estimates to obtain parameter estimates, and to combine DeLong’s method with the delta method to compute parameter standard errors. Our new method avoids the heavy computation associated with Dodd and Pepe’s method and hence is easy to implement. We conduct simulation studies to show that the two methods yield similar results. Finally, we illustrate our new method using data from urinary incontinence clinical trials.
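As a rough illustration of the two ingredients named above, the following Python sketch computes a nonparametric AUC estimate with a DeLong-type variance and then applies the delta method to obtain a standard error on a transformed scale. It is a minimal sketch and not the authors' implementation: the function names `delong_auc` and `logit_auc_se` are hypothetical, and the logit link is an assumption made only for illustration, since the abstract does not state which link is used.

```python
import numpy as np


def delong_auc(x, y):
    """Nonparametric AUC estimate and DeLong-type variance estimate.

    x: responses in one treatment group, y: responses in the other.
    The AUC estimates P(X > Y) + 0.5 * P(X = Y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise kernel: 1 if x_i > y_j, 0.5 if tied, 0 otherwise.
    psi = (x[:, None] > y[None, :]) + 0.5 * (x[:, None] == y[None, :])
    auc = psi.mean()
    # Structural components: one per observation in each group.
    v10 = psi.mean(axis=1)
    v01 = psi.mean(axis=0)
    var = v10.var(ddof=1) / len(x) + v01.var(ddof=1) / len(y)
    return auc, var


def logit_auc_se(auc, var):
    """Delta-method standard error of logit(AUC).

    Assumes a logit link (an illustrative choice, not taken from the
    abstract); d/dA logit(A) = 1 / (A * (1 - A)).
    """
    return np.sqrt(var) / (auc * (1.0 - auc))
```

For example, `delong_auc([3, 4, 5], [1, 2, 3.5])` returns the AUC estimate together with its estimated variance, and passing both values to `logit_auc_se` gives the corresponding standard error on the logit scale. The delta-method step is undefined when the estimated AUC is exactly 0 or 1.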
In this article, we considered the analysis of data with a non-normally distributed response variable. In particular, we extended an existing Area Under the Curve (AUC) regression model that handles only two discrete covariates to a general AUC regression model that can be used to analyze data with an unrestricted number of discrete covariates. Compared with similar methods that require iterative algorithms and bootstrap procedures, our method involves only closed-form formulae for parameter estimation. We also discussed the issue of model identifiability. Our model has broad applicability in clinical trials because its parameters are easy to interpret. We applied the model to a clinical trial evaluating the effects of educational brochures for preventing Fetal Alcohol Spectrum Disorders (FASD). Finally, across a variety of simulation scenarios, our method produced parameter estimates with small biases and confidence intervals with nominal coverage probabilities.