Abstract: We investigate whether the posterior predictive p-value can detect unknown hierarchical structure. We select several common discrepancy measures (namely the mean, median, standard deviation, and χ² goodness-of-fit statistic) whose choice is not motivated by knowledge of the hierarchical structure. We show that if we use the entire data set, these discrepancy measures do not detect hierarchical structure. However, if we make use of the subpopulation structure, many of these discrepancy measures are effective. The use of this technique is illustrated by studying the case where the data come from a two-stage hierarchical regression model while the fitted model does not include this feature.
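As a point of reference for the quantity under study, the posterior predictive p-value for a discrepancy measure D(y, θ) is conventionally written as (standard notation, not taken from the abstract itself)

$$
p_B = \Pr\{ D(y^{\mathrm{rep}}, \theta) \ge D(y, \theta) \mid y \}
    = \int \Pr\{ D(y^{\mathrm{rep}}, \theta) \ge D(y, \theta) \mid \theta \}\, p(\theta \mid y)\, d\theta ,
$$

where y^rep denotes data replicated from the posterior predictive distribution; in practice p_B is approximated by the proportion of posterior draws for which D(y^rep, θ) ≥ D(y, θ).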
Pub. online: 4 Aug 2022. Type: Research Article. Open Access.
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 409–432
Abstract
We develop a health informatics toolbox that enables timely analysis and evaluation of the time-course dynamics of a range of infectious disease epidemics. As a case study, we examine the novel coronavirus (COVID-19) epidemic using publicly available data from the China CDC. This toolbox is built upon a hierarchical epidemiological model in which two observed time series of daily proportions of infected and removed cases are generated from the underlying infection dynamics governed by a Markov Susceptible-Infectious-Removed (SIR) infectious disease process. We extend the SIR model to incorporate various types of time-varying quarantine protocols, including government-level ‘macro’ isolation policies and community-level ‘micro’ social distancing (e.g. self-isolation and self-quarantine) measures. We develop a calibration procedure for underreported infected cases. This toolbox provides forecasts, in both online and offline forms, as well as simulations of the overall dynamics of the epidemic. An R software package is made available to the public, and examples of its use are illustrated. Some possible extensions of our novel epidemiological models are discussed.
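For readers who want a feel for the infection dynamics underlying such a toolbox, here is a minimal deterministic discrete-time SIR sketch in Python with a time-varying transmission rate standing in for quarantine effects; it is not the hierarchical Markov model or the released R package, and all parameter values below are illustrative assumptions.

```python
import numpy as np

def simulate_sir(beta, gamma, s0, i0, n_days):
    """Deterministic discrete-time SIR skeleton in population proportions.

    `beta` may be a scalar or a length-n_days array, so a time-varying
    transmission rate can mimic quarantine protocols taking effect.
    """
    beta = np.broadcast_to(beta, (n_days,)).astype(float)
    s, i, r = [s0], [i0], [1.0 - s0 - i0]
    for t in range(n_days - 1):
        new_inf = beta[t] * s[-1] * i[-1]   # new infections on day t
        new_rem = gamma * i[-1]             # new removals on day t
        s.append(s[-1] - new_inf)
        i.append(i[-1] + new_inf - new_rem)
        r.append(r[-1] + new_rem)
    return np.array(s), np.array(i), np.array(r)

# Hypothetical example: transmission rate halved after day 30.
beta_t = np.where(np.arange(100) < 30, 0.30, 0.15)
s, i, r = simulate_sir(beta_t, gamma=0.10, s0=0.999, i0=0.001, n_days=100)
```

In the hierarchical model described in the abstract, the observed daily proportions of infected and removed cases would be treated as noisy observations of latent dynamics of this general SIR form.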
The generalized gamma model has been used in several applied areas such as engineering, economics, and survival analysis. We provide an extension of this model called the transmuted generalized gamma distribution, which includes as special cases some lifetime distributions. The proposed density function can be represented as a mixture of generalized gamma densities. Some mathematical properties of the new model such as the moments, generating function, mean deviations and Bonferroni and Lorenz curves are provided. We estimate the model parameters using maximum likelihood. We show, by means of a real data set, that the proposed distribution can be a competitive model in lifetime applications.
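For orientation, the transmuted construction referred to here is typically obtained from the quadratic rank transmutation map: if G(x) denotes a baseline cdf (here the generalized gamma), the transmuted cdf is

$$
F(x) = (1 + \lambda)\, G(x) - \lambda\, G(x)^2, \qquad |\lambda| \le 1,
$$

which reduces to the baseline generalized gamma model at λ = 0. This is the generic transmutation formula; the abstract does not spell out the authors' exact parametrization, so the notation above is illustrative.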
Abstract: Different models are used in practice for describing binary longitudinal data. In this paper we consider the joint probability models, the marginal models, and the combined models for describing such data, and ask which describes the data best. The combined model consists of a joint probability model and a marginal model at two different levels. We present some striking empirical observations on the closeness of the estimates and their standard errors for some parameters of the models considered in describing data from Fitzmaurice and Laird (1993), and consequently give new insight into these data. We present the data in a complete factorial arrangement with 4 factors at 2 levels. We introduce the concept of “data representing a model completely” and explain “data balance” as well as “chance balance”. We also consider the problem of selecting the best model for describing these data and use the Search Linear Model concepts known in Fractional Factorial Design research (Srivastava (1975)).
This article discusses the estimation of the Generalized Power Weibull parameters using the maximum product spacing (MPS) method, the maximum likelihood (ML) method, and Bayesian estimation under the squared error loss function. The estimation is done under progressive type-II censored samples, and a comparative study among the three methods is made using Monte Carlo simulation. A Markov chain Monte Carlo (MCMC) method has been employed to compute the Bayes estimators of the Generalized Power Weibull distribution. The optimal censoring scheme has been suggested using different optimality criteria (mean squared error, bias, and relative efficiency). A real data set is used to study the performance of the estimation process under this optimal scheme in practice for illustrative purposes. Finally, we discuss a method of obtaining the optimal censoring scheme.
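As background on the MPS method mentioned above, for a complete ordered sample x_(1) ≤ ⋯ ≤ x_(n) from a distribution with cdf F(x; θ), the MPS estimator maximizes the average log-spacing

$$
S(\theta) = \frac{1}{n+1} \sum_{i=1}^{n+1} \log D_i(\theta), \qquad
D_i(\theta) = F\bigl(x_{(i)}; \theta\bigr) - F\bigl(x_{(i-1)}; \theta\bigr),
$$

with the conventions F(x_(0); θ) = 0 and F(x_(n+1); θ) = 1. The progressive type-II censored setting used in the article requires a modified form of these spacings; the complete-sample criterion is shown here only for orientation.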
Abstract: It is always useful to have a confidence interval, along with a single estimate of the parameter of interest. We propose a new algorithm for kernel-based interval estimation of a density, with the aim of minimizing the coverage error. The bandwidth used in the estimator is chosen by minimizing a bootstrap estimate of the absolute value of the coverage error. The resulting confidence interval seems to perform well, in terms of coverage accuracy and length, especially for large sample sizes. We illustrate our methodology with data on the eruption durations of the Old Faithful geyser in the USA. This appears to be the first bandwidth selector in the literature for kernel-based interval estimation of a density.
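In schematic notation of our own (the abstract does not give formulas), the bandwidth selector described above can be summarized as

$$
\widehat{h} = \arg\min_{h} \bigl| \widehat{\mathrm{CE}}^{*}(h) \bigr|, \qquad
\mathrm{CE}(h) = \Pr\bigl\{ f(x) \in I_h(x) \bigr\} - (1 - \alpha),
$$

where I_h(x) is the kernel-based confidence interval for the density f(x) built with bandwidth h, CE(h) is its coverage error at nominal level 1 − α, and the starred term is the bootstrap estimate of that error.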
Abstract: Various statistical models have been proposed to analyze fMRI data. The usual goal is to make inferences about the effects that are related to an external stimulus. The primary focus of this paper is on those statistical methods that enable one to detect ‘significantly activated’ regions of the brain due to event-related stimuli. Most of these methods share a common property, requiring estimation of the hemodynamic response function (HRF) as part of the deterministic component of the statistical model. We propose and investigate a new approach that does not require HRF fits to detect ‘activated’ voxels. We argue that the method not only avoids fitting a specific HRF but also takes into account that the unknown response is delayed and smeared in time. This method also adapts to variation in the BOLD response across different brain regions and experimental sessions. The maximum cross-correlation between the kernel-smoothed stimulus sequence and shifted (lagged) values of the observed response is the proposed test statistic. Using our recommended approach, we show through realistic simulations and with real data that we obtain better sensitivity than simple correlation methods using default values of SPM2. The simulation experiment incorporates different HRFs empirically determined from real data. The noise models are AR(3) fits and fractional Gaussian noise models estimated from real data. We conclude that our proposed method is more powerful than simple correlation procedures because of its robustness to variation in the HRF.
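The proposed test statistic lends itself to a compact illustration. Below is a minimal Python sketch of a maximum lagged correlation between a kernel-smoothed stimulus sequence and the observed BOLD response; the function name, the Gaussian smoothing kernel, and the lag range are our own illustrative choices, not values taken from the paper.

```python
import numpy as np

def max_lagged_correlation(stimulus, bold, max_lag=10, smooth_sd=2.0):
    """Maximum correlation between a kernel-smoothed stimulus sequence
    and lagged copies of the observed BOLD response (units: scans)."""
    # Gaussian smoothing of the 0/1 stimulus sequence.
    half = int(np.ceil(3 * smooth_sd))
    kern = np.exp(-0.5 * (np.arange(-half, half + 1) / smooth_sd) ** 2)
    kern /= kern.sum()
    smoothed = np.convolve(stimulus, kern, mode="same")

    # Scan over non-negative lags and keep the largest correlation.
    best = -np.inf
    for lag in range(max_lag + 1):
        if lag == 0:
            r = np.corrcoef(smoothed, bold)[0, 1]
        else:
            r = np.corrcoef(smoothed[:-lag], bold[lag:])[0, 1]
        best = max(best, r)
    return best

# Toy usage with a synthetic delayed, noisy response.
rng = np.random.default_rng(0)
stim = (np.arange(200) % 20 == 0).astype(float)
bold = np.roll(np.convolve(stim, np.ones(5) / 5, "same"), 4) + 0.1 * rng.standard_normal(200)
print(max_lagged_correlation(stim, bold))
```

Scanning over lags is what lets the statistic accommodate a response that is delayed and smeared in time without committing to a specific HRF shape.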