We introduce the stepp packages for R and Stata, which implement the subpopulation treatment effect pattern plot (STEPP) method. STEPP is a nonparametric graphical tool for examining possible heterogeneous treatment effects across subpopulations defined on a continuous covariate or composite score. More specifically, STEPP considers overlapping subpopulations defined with respect to a continuous covariate (or risk index) and estimates a treatment effect for each subpopulation. It also produces confidence regions and tests for treatment effect heterogeneity among the subpopulations. The original method has been extended in several directions, including other survival settings, other outcome types, and more efficient procedures for identifying the overlapping subpopulations. In this paper, we also introduce a novel method that determines the number of subjects in each subpopulation by choosing the parameter combination that minimizes the variability of the resulting subpopulation sizes. We illustrate the packages using both synthetic data and publicly available data sets. In R, the most computationally intensive steps are implemented in Fortran, while the Stata version relies on the Mata language.
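To make the idea of overlapping subpopulations concrete, here is a minimal Python sketch of a rank-based sliding window. It assumes each window holds `r2` subjects and consecutive windows share `r1` subjects, which mirrors the sliding-window parameterization commonly used for STEPP. The treatment effect is a plain difference in mean outcomes, so the sketch assumes a continuous outcome rather than the survival settings the packages support. The function names are hypothetical and are not the stepp package API. Ties in the covariate are split arbitrarily, and the size-variability criterion described above is not reproduced here.

```python
import math
import statistics
from typing import List, Sequence


def sliding_windows(z: Sequence[float], r2: int, r1: int) -> List[List[int]]:
    """Index lists of overlapping subpopulations ordered by covariate z.

    Hypothetical simplification: windows are built on ranks, each holds r2
    subjects, and consecutive windows share r1 subjects (ties split arbitrarily).
    """
    n = len(z)
    if not 0 <= r1 < r2 <= n:
        raise ValueError("require 0 <= r1 < r2 <= len(z)")
    order = sorted(range(n), key=lambda i: z[i])  # subjects by covariate value
    step = r2 - r1
    windows = [order[s:s + r2] for s in range(0, n - r2 + 1, step)]
    # If the largest covariate values were left out, add one final window
    # (it may overlap its predecessor by more than r1 subjects).
    if windows[-1][-1] != order[-1]:
        windows.append(order[n - r2:])
    return windows


def subpopulation_effects(z: Sequence[float], y: Sequence[float],
                          treated: Sequence[bool], r2: int, r1: int) -> List[float]:
    """Difference in mean outcome (treated minus control) within each window."""
    effects = []
    for idx in sliding_windows(z, r2, r1):
        t = [y[i] for i in idx if treated[i]]
        c = [y[i] for i in idx if not treated[i]]
        effects.append(statistics.mean(t) - statistics.mean(c) if t and c else math.nan)
    return effects
```

Plotting these effects against, for example, the median covariate value of each window gives the basic STEPP picture. Confidence regions and the heterogeneity test would need a resampling or variance step on top of this sketch.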
Abstract: Large amounts of data are now routinely stored, which calls for specialized methods that analyze them in an interpretable way. In medical data analysis, many potential factors are usually considered when determining an outcome (response) variable. The main objectives of variable selection are to improve the predictive performance of the model and to identify, correctly and parsimoniously, the faster and more cost-effective predictors that have an important influence on the response. Various variable selection techniques are used to improve predictability and to obtain the “best” model from a screening procedure. In this study, we propose a variable subset selection method that extends the idea of variable selection to the classification case and combines a nonparametric criterion with a likelihood-based criterion. The Area Under the ROC Curve (AUC) criterion is used from a different viewpoint in order to identify the important factors more directly. The proposed method leads to a modification (BIC) of the modified Bayesian Information Criterion (mBIC). The introduced BIC is compared with existing variable selection methods through simulation experiments, in which Type I and Type II error rates are calculated. Additionally, the proposed method is successfully applied to a high-dimensional trauma data set, where its good predictive properties are confirmed.
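The nonparametric half of the criterion rests on the empirical AUC of a candidate predictor. The following minimal Python sketch shows that ingredient only: the Mann-Whitney estimate of AUC and a simple screening step that ranks predictors by `max(AUC, 1 - AUC)`, so that inversely related predictors also count. How the paper combines this with the likelihood-based (mBIC) term is not reproduced. The ranking rule and the function names are hypothetical illustrations.

```python
from typing import Dict, List, Sequence, Tuple


def empirical_auc(x: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of P(X_case > X_control), counting ties as 1/2."""
    pos = [v for v, lab in zip(x, labels) if lab == 1]
    neg = [v for v, lab in zip(x, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(pos) * len(neg))


def rank_by_auc(columns: Dict[str, Sequence[float]],
                labels: Sequence[int]) -> List[Tuple[str, float]]:
    """Hypothetical screening step: sort predictors by max(AUC, 1 - AUC)."""
    scores = {}
    for name, col in columns.items():
        auc = empirical_auc(col, labels)
        scores[name] = max(auc, 1.0 - auc)  # direction-free discrimination
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A predictor with no discriminating power scores 0.5, and a perfect separator scores 1.0 in either direction.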
Abstract: In this paper we consider clinical trials with two treatments and a non-normally distributed response variable. In addition, we focus on applications that include only discrete covariates and their interactions. For such applications, the semi-parametric Area Under the ROC Curve (AUC) regression model proposed by Dodd and Pepe (2003) can be used. However, because their method uses a logistic regression procedure to obtain parameter estimates and requires bootstrapping to compute parameter standard errors, it may be cumbersome to implement. In this paper we propose using a set of AUC estimates to obtain the parameter estimates, and combining DeLong's method with the delta method to compute parameter standard errors. Our new method avoids the heavy computation associated with Dodd and Pepe's method and is therefore easy to implement. Simulation studies show that the two methods yield similar results. Finally, we illustrate our new method using data from urinary incontinence clinical trials.
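The following minimal Python sketch shows how DeLong's variance and the delta method can fit together for a single discrete covariate. It computes the AUC comparing treatment-arm and control-arm responses within a stratum, estimates its variance from DeLong placement values, and maps both to the logit scale with the delta method. It assumes a logit link for the AUC regression and a binary covariate, with the two covariate strata treated as independent samples. The contrast function is a hypothetical illustration, not the paper's estimator for general designs with interactions.

```python
import math
import statistics
from typing import Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if x > y, 1/2 on ties, 0 otherwise."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def delong_auc(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """AUC = P(A > B) + P(A = B)/2 with DeLong's variance estimate."""
    m, n = len(a), len(b)
    if m < 2 or n < 2:
        raise ValueError("each arm needs at least two responses")
    v10 = [sum(_psi(x, y) for y in b) / n for x in a]  # placement values of arm A
    v01 = [sum(_psi(x, y) for x in a) / m for y in b]  # placement values of arm B
    auc = sum(v10) / m
    var = statistics.variance(v10) / m + statistics.variance(v01) / n
    return auc, var


def logit_auc(auc: float, var: float) -> Tuple[float, float]:
    """Delta method: Var(logit AUC) is approx. Var(AUC) / (AUC (1 - AUC))^2."""
    if not 0.0 < auc < 1.0:
        raise ValueError("logit is undefined at AUC = 0 or 1")
    return math.log(auc / (1.0 - auc)), math.sqrt(var) / (auc * (1.0 - auc))


def binary_covariate_effect(a0: Sequence[float], b0: Sequence[float],
                            a1: Sequence[float], b1: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical contrast logit(AUC | Z=1) - logit(AUC | Z=0) and its SE,
    assuming the two covariate strata are independent samples."""
    est0, se0 = logit_auc(*delong_auc(a0, b0))
    est1, se1 = logit_auc(*delong_auc(a1, b1))
    return est1 - est0, math.hypot(se0, se1)
```

No resampling is involved. Each standard error comes from a closed-form variance, which is the computational saving the abstract describes relative to bootstrapping.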
Abstract: Various statistical models have been proposed for analyzing fMRI data. The usual goal is to make inferences about effects related to an external stimulus. This paper focuses on statistical methods for detecting ‘significantly activated’ brain regions in response to event-related stimuli. Most of these methods share a common property: they require estimating the hemodynamic response function (HRF) as part of the deterministic component of the statistical model. We propose and investigate a new approach that detects ‘activated’ voxels without fitting an HRF. We argue that the method not only avoids fitting a specific HRF but also accounts for the fact that the unknown response is delayed and smeared in time. The method also adapts to differences in the BOLD response across brain regions and experimental sessions. The proposed test statistic is the maximum cross-correlation between the kernel-smoothed stimulus sequence and shifted (lagged) values of the observed response. Using realistic simulations and real data, we show that our recommended approach achieves better sensitivity than simple correlation methods using the SPM2 default settings. The simulation experiment incorporates different HRFs determined empirically from real data, and the noise models include AR(3) fits and fractional Gaussian noise models, also estimated from real data. We conclude that the proposed method is more powerful than simple correlation procedures because it is robust to variation in the HRF.
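The test statistic can be sketched in a few lines of Python. The stimulus sequence is smoothed with a kernel to model temporal smearing. The sketch then correlates the smoothed stimulus at time t with the voxel response at time t + lag, which models the delay, and keeps the largest correlation over a range of lags. The Gaussian kernel, its bandwidth, and the maximum lag are assumptions chosen for illustration. The abstract does not specify them here, and the sketch omits any null distribution or thresholding.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_smooth(x: Sequence[float], bandwidth: float) -> List[float]:
    """Normalized Gaussian kernel smoother, truncated at 3 bandwidths (assumed kernel)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    half = max(1, math.ceil(3 * bandwidth))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / bandwidth) ** 2) for k in offsets]
    out = []
    for t in range(len(x)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= t + k < len(x):  # renormalize near the series edges
                num += w * x[t + k]
                den += w
        out.append(num / den)
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return 0.0 if saa == 0 or sbb == 0 else sab / math.sqrt(saa * sbb)


def max_cross_correlation(stimulus: Sequence[float], response: Sequence[float],
                          bandwidth: float, max_lag: int) -> Tuple[float, int]:
    """Max over lags 0..max_lag of corr(smoothed stimulus[t], response[t + lag])."""
    if len(stimulus) != len(response):
        raise ValueError("stimulus and response must have equal length")
    s = gaussian_smooth(stimulus, bandwidth)
    n = len(s)
    if not 0 <= max_lag < n - 1:
        raise ValueError("max_lag must leave at least two overlapping samples")
    best_r, best_lag = -math.inf, 0
    for lag in range(max_lag + 1):
        r = pearson(s[:n - lag], response[lag:])
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag
```

Because the lag is chosen per voxel, a response that peaks earlier or later than a canonical HRF still produces a high statistic. This illustrates the robustness argument made in the abstract.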