Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 550–580
Abstract
The COVID-19 pandemic has triggered explosive activities in the search for cures, including vaccines against SARS-CoV-2 infection. As of April 30, 2020, there are at least 102 COVID-19 vaccine development programs worldwide, the majority of which are in preclinical development phases, five are in phase I trials, and three are in phase I/II trials. Experts caution against rushing COVID-19 vaccine development, not only because the knowledge about SARS-CoV-2 is lacking (albeit rapidly accumulating), but also because vaccine development is a complex, lengthy process with its own rules and timelines. Clinical trials are critically important in vaccine development, usually starting from small-scale phase I trials and gradually moving to the next phases (II and III) after the primary objectives are met. This paper is intended to provide an overview of design considerations for vaccine clinical trials, with a special focus on COVID-19 vaccine development. Given the current pandemic paradigm and the unique features of vaccine development, our recommendations from a statistical design perspective for COVID-19 vaccine trials include: (1) novel trial designs (e.g., master protocols) to expedite the simultaneous evaluation of multiple candidate vaccines or vaccine doses, (2) human challenge studies to accelerate clinical development, (3) adaptive design strategies (e.g., group sequential designs) for early termination due to futility, efficacy, and/or safety, (4) extensive modeling and simulation to characterize and establish long-term efficacy based on early-phase or short-term follow-up data, (5) safety evaluation as one of the primary focuses throughout all phases of clinical trials, (6) leveraging real-world data and evidence in vaccine trial design and analysis to establish vaccine effectiveness, and (7) global collaboration to form a joint development effort for more efficient use of resources and expertise and for data sharing.
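To make recommendations (3) and (4) concrete, the following is a minimal Monte Carlo sketch of a two-arm vaccine trial with a single group-sequential interim look for futility and efficacy. All inputs (attack rate, assumed vaccine efficacy, sample size, and the approximate O'Brien-Fleming-type boundaries) are hypothetical illustration values, not design parameters from any actual COVID-19 trial.

```python
# Illustrative Monte Carlo sketch of a two-arm vaccine trial with one interim
# group-sequential look (futility and efficacy stopping). All design inputs
# below are hypothetical.
import numpy as np

rng = np.random.default_rng(2020)

N_PER_ARM = 5000          # final enrollment per arm (hypothetical)
P_CONTROL = 0.02          # assumed attack rate in the placebo arm
TRUE_VE = 0.60            # assumed true vaccine efficacy
P_VACCINE = P_CONTROL * (1 - TRUE_VE)
Z_EFFICACY = 2.80         # approximate O'Brien-Fleming-type interim boundary
Z_FINAL = 1.98            # approximate final boundary (one-sided)
Z_FUTILITY = 0.0          # stop early if no sign of benefit at the interim look

def z_two_proportions(x_c, n_c, x_v, n_v):
    """One-sided z-statistic for a lower infection rate in the vaccine arm."""
    p_pool = (x_c + x_v) / (n_c + n_v)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    return (x_c / n_c - x_v / n_v) / se if se > 0 else 0.0

def simulate_trial():
    n_half = N_PER_ARM // 2
    ctrl = rng.binomial(1, P_CONTROL, N_PER_ARM)
    vacc = rng.binomial(1, P_VACCINE, N_PER_ARM)
    # Interim look after half of the participants are observed.
    z_int = z_two_proportions(ctrl[:n_half].sum(), n_half,
                              vacc[:n_half].sum(), n_half)
    if z_int >= Z_EFFICACY:
        return "efficacy_interim", n_half * 2
    if z_int <= Z_FUTILITY:
        return "futility_interim", n_half * 2
    z_fin = z_two_proportions(ctrl.sum(), N_PER_ARM, vacc.sum(), N_PER_ARM)
    return ("efficacy_final" if z_fin >= Z_FINAL else "fail_final"), N_PER_ARM * 2

results = [simulate_trial() for _ in range(2000)]
outcomes = [r[0] for r in results]
print("P(success):", np.mean([o.startswith("efficacy") for o in outcomes]))
print("expected total sample size:", np.mean([r[1] for r in results]))
```

Repeating such simulations under a grid of assumed efficacies and attack rates is how operating characteristics (power, expected sample size, early-stopping probabilities) would typically be characterized before enrollment.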
Abstract: A new generalized two-parameter Lindley distribution that offers more flexibility in modeling lifetime data is proposed, and some of its mathematical properties, such as the density function, cumulative distribution function, survival function, hazard rate function, mean residual life function, moment generating function, quantile function, moments, Rényi entropy, and stochastic ordering, are obtained. The maximum likelihood method is used to estimate the parameters of the proposed distribution, and a simulation study is carried out to examine the performance and accuracy of the maximum likelihood estimators of the parameters. Finally, an application of the proposed distribution to a real lifetime data set is presented, and its fit is compared with the fit attained by some existing lifetime distributions.
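The generalized density proposed in the paper is not reproduced in the abstract. As a hedged illustration of the estimation step only, the sketch below fits the classical two-parameter Lindley form f(x; θ, α) = θ²/(θ+α)(1+αx)e^{-θx}, used purely as a stand-in, by maximum likelihood on simulated lifetimes.

```python
# A minimal MLE sketch for a two-parameter Lindley-type density. The specific
# generalized density proposed in the paper is not reproduced here; the
# classical two-parameter form
#   f(x; theta, alpha) = theta^2 / (theta + alpha) * (1 + alpha * x) * exp(-theta * x)
# is used purely as an illustrative stand-in.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def rlindley2(n, theta, alpha):
    """Sample via the exponential/gamma mixture representation of the density."""
    w = theta / (theta + alpha)                  # weight of the Exp(theta) component
    comp = rng.random(n) < w
    exp_part = rng.exponential(1 / theta, n)     # Exp(theta) draws
    gam_part = rng.gamma(2.0, 1 / theta, n)      # Gamma(2, 1/theta) draws
    return np.where(comp, exp_part, gam_part)

def negloglik(params, x):
    theta, alpha = params
    if theta <= 0 or alpha <= 0:
        return np.inf
    return -np.sum(2 * np.log(theta) - np.log(theta + alpha)
                   + np.log1p(alpha * x) - theta * x)

x = rlindley2(500, theta=1.5, alpha=0.8)         # simulated lifetime data
fit = minimize(negloglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print("MLEs (theta, alpha):", fit.x)
```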
Abstract: This paper discusses the selection of the smoothing parameter necessary to implement a penalized regression using a nonconcave penalty function. The proposed method can be derived from a Bayesian viewpoint, and the resultant smoothing parameter is guaranteed to satisfy the sufficient conditions for the oracle properties of a one-step estimator. The results of simulation and application to some real data sets reveal that our proposal works efficiently, especially for discrete outputs.
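The Bayesian smoothing-parameter selector itself is not given in the abstract. As background, the sketch below implements the SCAD nonconcave penalty of Fan and Li (2001), whose tuning parameter λ is the kind of smoothing parameter discussed, together with its closed-form thresholding rule for an orthonormal design; the values of λ and a below are illustrative inputs, not values chosen by the paper's method.

```python
# A hedged illustration of the SCAD nonconcave penalty and its closed-form
# thresholding rule under an orthonormal design. The Bayesian smoothing-parameter
# selector proposed in the paper is not reproduced here.
import numpy as np

def scad_penalty_derivative(t, lam, a=3.7):
    """Derivative p'_lam(t) of the SCAD penalty for t >= 0."""
    t = np.asarray(t, dtype=float)
    return lam * ((t <= lam)
                  + np.clip(a * lam - t, 0, None) / ((a - 1) * lam) * (t > lam))

def scad_threshold(z, lam, a=3.7):
    """One-step SCAD estimate of a coefficient whose least-squares value is z."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.clip(np.abs(z) - lam, 0, None)        # |z| <= 2*lam
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)         # 2*lam < |z| <= a*lam
    return np.where(np.abs(z) <= 2 * lam, soft,
                    np.where(np.abs(z) <= a * lam, mid, z))      # |z| > a*lam: unchanged

z = np.array([-3.0, -1.2, -0.3, 0.1, 0.8, 2.5, 4.0])
print("penalty derivative:", scad_penalty_derivative(np.array([0.2, 1.0, 1.5]), lam=0.5))
print("thresholded coefficients:", scad_threshold(z, lam=0.5))
```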
Abstract: We analyze the cross-correlation between logarithmic returns of 1108 stocks listed on the Shanghai and Shenzhen Stock Exchanges of China over the period 2005 to 2010. The results suggest that the estimated distribution of correlation coefficients is shifted to the right during tumble periods of the Chinese stock market. Because the maximum eigenvalue accounts for a large share of the spectrum, the principal correlation component in the Chinese stock market is dominant, and the other components have only trivial effects on the market condition. The same-signed elements of the corresponding eigenvector lead us to propose the maximum eigenvalue series as an indicator of collective behavior in the equity market. We provide evidence that the largest eigenvalue series can be used as an effective indicator of the collective behavior of stock returns, which is found to be positively correlated with market volatility. Using time-varying windows, we find that this positive correlation diminishes when market volatility reaches either its highest or its lowest level. By defining a stability rate, we show that the collective behavior of stocks tends to be more homogeneous during crises than in regular times. This study has implications for the ongoing discussion of correlation risk.
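A minimal sketch of the largest-eigenvalue indicator: within a moving window, compute the correlation matrix of log returns and track its maximum eigenvalue alongside realized volatility. The one-factor simulated returns below are a stand-in for the 1108 Shanghai and Shenzhen stock return series analyzed in the paper.

```python
# Rolling-window largest eigenvalue of a return correlation matrix, tracked
# alongside realized market volatility. Returns are simulated from a simple
# one-factor model; in the paper, real Chinese stock returns are used.
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks, window = 750, 100, 60

# Hypothetical one-factor returns: common market factor plus idiosyncratic noise.
market = 0.01 * rng.standard_normal(n_days)
returns = 0.8 * market[:, None] + 0.01 * rng.standard_normal((n_days, n_stocks))

max_eig, volatility = [], []
for start in range(0, n_days - window + 1, 5):            # slide in 5-day steps
    block = returns[start:start + window]
    corr = np.corrcoef(block, rowvar=False)               # stocks in columns
    eigvals = np.linalg.eigvalsh(corr)                     # symmetric -> real spectrum
    max_eig.append(eigvals[-1])                            # largest eigenvalue
    volatility.append(market[start:start + window].std()) # realized factor volatility

print("corr(max eigenvalue, market volatility):",
      np.corrcoef(max_eig, volatility)[0, 1])
```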
Abstract: Recently, He and Zhu (2003) derived an omnibus goodness-of-fit test for linear or nonlinear quantile regression models based on a CUSUM process of the gradient vector, and they suggested using a particular simulation method for determining critical values for their test statistic. But despite the speed of modern computers, execution time can be high. One goal in this note is to suggest a slight modification of their method that eliminates the need for simulations among a collection of important and commonly occurring situations. For a broader range of situations, the modification can be used to determine a critical value as a function of the sample size (n), the number of predictors (q), and the quantile of interest (γ). This is in contrast to the He and Zhu approach where the critical value is also a function of the observed values of the q predictors. As a partial check on the suggested modification in terms of controlling the Type I error probability, simulations were performed for the same situations considered by He and Zhu, and some additional simulations are reported for a much wider range of situations.
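As a rough, hedged illustration of the ingredients involved (a quantile regression fit, the gradient signs ψγ(e) = γ − 1{e < 0}, a cumulative-sum process, and a simulation-based critical value), the sketch below builds a simplified CUSUM-type statistic. It is not the exact He and Zhu (2003) statistic nor the modification proposed in the note.

```python
# A simplified CUSUM-type diagnostic built from quantile-regression gradient
# signs, with a naive Monte Carlo critical value. Illustrative only; this is not
# the He and Zhu (2003) test statistic or its critical-value method.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, gamma = 300, 0.5
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)             # data from a linear model

X = sm.add_constant(x)
res = sm.QuantReg(y, X).fit(q=gamma)
psi = gamma - (y - res.fittedvalues < 0)                # gradient signs at the fit

order = np.argsort(res.fittedvalues)                    # accumulate along fitted values
stat = np.max(np.abs(np.cumsum(psi[order]))) / np.sqrt(n)

# Naive calibration: psi* = gamma with prob 1-gamma, gamma-1 with prob gamma,
# drawn independently of the covariates.
sims = []
for _ in range(2000):
    psi_star = np.where(rng.random(n) < gamma, gamma - 1.0, gamma)
    sims.append(np.max(np.abs(np.cumsum(psi_star))) / np.sqrt(n))
print("statistic:", stat, "approx. 95% critical value:", np.quantile(sims, 0.95))
```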
Abstract: The present article discusses and compares multiple testing procedures (MTPs) for controlling the familywise error rate. Machekano and Hubbard (2006) proposed an empirical Bayes approach, a resampling-based multiple testing procedure that asymptotically controls the familywise error rate. In this paper we provide some additional work on their procedure, and we develop a resampling-based step-down procedure that asymptotically controls the familywise error rate for testing families of one-sided hypotheses. We apply these procedures to make successive comparisons between treatment effects under a simple-order assumption. For example, the treatment means may correspond to a sequence of increasing dose levels of a drug. Using simulations, we demonstrate that the proposed step-down procedure is less conservative than Machekano and Hubbard's procedure. The application of the procedure is illustrated with an example.
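The sketch below illustrates the general flavor of a resampling-based step-down maxT procedure for one-sided comparisons of successive dose-group means. It is a generic Westfall-Young-style construction on simulated dose-group data, not the empirical Bayes procedure of Machekano and Hubbard (2006) or the authors' exact proposal.

```python
# Generic resampling-based step-down maxT procedure for one-sided successive
# comparisons of ordered group means. Illustrative construction on simulated data.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: four increasing dose groups, 20 subjects each.
groups = [rng.normal(mu, 1.0, 20) for mu in (0.0, 0.1, 0.6, 1.1)]

def successive_t(groups):
    """One-sided t-type statistics for H_j: mu_{j+1} <= mu_j, j = 1..k-1."""
    stats = []
    for a, b in zip(groups[:-1], groups[1:]):
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        stats.append((b.mean() - a.mean()) / se)
    return np.array(stats)

t_obs = successive_t(groups)

# Bootstrap the joint null distribution from group-wise centered data.
B = 2000
t_boot = np.empty((B, len(t_obs)))
centered = [g - g.mean() for g in groups]
for b in range(B):
    resampled = [rng.choice(g, size=len(g), replace=True) for g in centered]
    t_boot[b] = successive_t(resampled)

# Step-down maxT adjusted p-values.
order = np.argsort(-t_obs)                  # hypotheses sorted by decreasing evidence
adj = np.empty_like(t_obs)
running_max = 0.0
for rank, j in enumerate(order):
    remaining = order[rank:]                # hypotheses not yet stepped through
    p = np.mean(t_boot[:, remaining].max(axis=1) >= t_obs[j])
    running_max = max(running_max, p)       # enforce monotonicity of adjusted p-values
    adj[j] = running_max
print("successive one-sided statistics:", np.round(t_obs, 2))
print("step-down adjusted p-values:   ", np.round(adj, 3))
```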
Abstract: This paper aims to propose a suitable statistical model for the age distribution of prostate cancer detection. Descriptive studies suggest that the onset of prostate cancer occurs after 37 years of age, with the maximum diagnosis age at around 70 years. The major deficiency of descriptive studies is that their results cannot be generalized to all types of populations, which usually face non-identical environmental conditions. The suitability of the proposed model is checked using statistical tools such as the Akaike Information Criterion, the Bayesian Information Criterion, the Kolmogorov-Smirnov distance, and the χ² statistic. The maximum likelihood estimates of the parameters of the proposed model, along with their asymptotic confidence intervals, are obtained for the real data set considered.
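The proposed model is not specified in the abstract. The sketch below only illustrates the model-checking toolkit mentioned (maximum likelihood fitting, AIC, BIC, and the Kolmogorov-Smirnov distance), using a log-normal candidate and simulated detection ages, both of which are assumptions for illustration.

```python
# Fitting a candidate age-at-detection model by maximum likelihood and scoring
# it with AIC, BIC, and the Kolmogorov-Smirnov distance. The log-normal
# candidate and the simulated ages are illustrative stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
ages = rng.normal(70, 8, 200).clip(40, 95)            # hypothetical detection ages

# Fit a candidate model by maximum likelihood and score it.
shape, loc, scale = stats.lognorm.fit(ages, floc=0)
loglik = np.sum(stats.lognorm.logpdf(ages, shape, loc, scale))
k, n = 2, len(ages)                                    # free parameters (loc fixed at 0)
aic = 2 * k - 2 * loglik
bic = k * np.log(n) - 2 * loglik
ks = stats.kstest(ages, lambda a: stats.lognorm.cdf(a, shape, loc, scale))
print(f"AIC={aic:.1f}  BIC={bic:.1f}  KS distance={ks.statistic:.3f}")
```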
Abstract: The development and application of computational data mining techniques in financial fraud detection and business failure prediction has become a popular cross-disciplinary research area in recent times, involving financial economists, forensic accountants and computational modellers. Some of the computational techniques popularly used in the context of financial fraud detection and business failure prediction can also be effectively applied to the detection of fraudulent insurance claims and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper are US-based, the computational techniques we have tested can be adapted and applied to detect similar insurance fraud in other countries with an organized automotive insurance industry.
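A minimal sketch of the kind of comparative exercise described: a few off-the-shelf classifiers scored by cross-validated AUC on a synthetic, imbalanced stand-in for claims data. The real US automotive insurance fraud data and the specific techniques benchmarked in the paper are not reproduced here.

```python
# Comparing several generic classifiers on a synthetic, imbalanced
# fraud-detection task using cross-validated AUC. Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: roughly 5% "fraudulent" claims, 20 claim-level features.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95],
                           random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:20s} mean AUC = {auc.mean():.3f}")
```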
Abstract: Through a series of carefully chosen illustrations from biometry and biomedicine, this note underscores the importance of using appropriate analytical techniques to increase power in statistical modeling and testing. These examples also serve to highlight some of the important recent developments in applied statistics of use to practitioners.
Abstract: Supervised classification of biological samples based on genetic information (e.g., gene expression profiles) is an important problem in biostatistics. In order to find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new effect size estimation method that is conceptually simple and at the same time computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and variable selection methods are competitive. In particular, they lead to both compact and interpretable feature sets. Program files to be used with the statistical software R implementing the variable selection approaches presented in this article are available from my homepage: http://b-klaus.de.
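Although the author's implementation is distributed in R, the Python sketch below conveys the general idea under stated assumptions: rank features by a simple standardized mean difference (a stand-in for the effect size estimators reviewed and proposed in the article), keep the top-ranked features, and estimate the misclassification rate of LDA by cross-validation on simulated expression-like data.

```python
# Effect-size-based variable selection for LDA on simulated expression-like
# data. The simple standardized mean difference below is an illustrative
# stand-in for the article's effect size estimators.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n_per_class, n_features, n_informative = 50, 500, 10
X0 = rng.standard_normal((n_per_class, n_features))
X1 = rng.standard_normal((n_per_class, n_features))
X1[:, :n_informative] += 1.0                    # shifted means in informative features
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n_per_class)

# Simple standardized mean-difference effect sizes (illustrative only).
pooled_sd = np.sqrt((X0.var(axis=0, ddof=1) + X1.var(axis=0, ddof=1)) / 2)
effect = np.abs(X1.mean(axis=0) - X0.mean(axis=0)) / pooled_sd

# Keep the 20 largest effects; for honest error estimates this selection step
# should be nested inside the cross-validation rather than done on all data.
top = np.argsort(-effect)[:20]
acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, top], y, cv=5)
print("estimated misclassification rate:", 1 - acc.mean())
```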