Abstract: This paper develops a model that uses inverse sampling of binary data subject to false-positive misclassification to estimate a proportion. From this model, both the proportion of success and the false-positive misclassification rate may be estimated. In addition, three first-order likelihood-based confidence intervals for the proportion of success are mathematically derived and studied via a Monte Carlo simulation. The simulation results indicate that the score and likelihood-ratio intervals are generally preferable to the Wald interval. Lastly, the model is applied to a medical data set.
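As a simplified illustration of the three interval types compared above, the sketch below computes Wald, score (Wilson) and likelihood-ratio intervals for an ordinary binomial proportion; the paper's intervals are instead derived under inverse sampling with false-positive misclassification, which this sketch does not reproduce, and the data values are made up.

```python
# Hedged illustration: Wald, score (Wilson), and likelihood-ratio confidence
# intervals for an ordinary binomial proportion (not the paper's
# inverse-sampling/misclassification model).
import numpy as np
from scipy.stats import norm, chi2
from scipy.optimize import brentq

def binom_loglik(p, x, n):
    return x * np.log(p) + (n - x) * np.log(1 - p)

def wald_interval(x, n, alpha=0.05):
    phat = x / n
    z = norm.ppf(1 - alpha / 2)
    se = np.sqrt(phat * (1 - phat) / n)
    return phat - z * se, phat + z * se

def score_interval(x, n, alpha=0.05):
    # Wilson interval: invert the score test
    phat = x / n
    z = norm.ppf(1 - alpha / 2)
    mid = (phat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return mid - half, mid + half

def lr_interval(x, n, alpha=0.05):
    # Invert the likelihood-ratio test: solve 2*(l(phat) - l(p)) = chi2 cutoff
    phat = x / n
    cut = chi2.ppf(1 - alpha, df=1)
    g = lambda p: 2 * (binom_loglik(phat, x, n) - binom_loglik(p, x, n)) - cut
    lo = brentq(g, 1e-10, phat - 1e-10)
    hi = brentq(g, phat + 1e-10, 1 - 1e-10)
    return lo, hi

x, n = 18, 50  # made-up data: 18 successes in 50 trials
for name, ci in [("Wald", wald_interval(x, n)),
                 ("Score", score_interval(x, n)),
                 ("LR", lr_interval(x, n))]:
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f})")
```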
Abstract: Quick identification of severe injury crashes can help Emergency Medical Services (EMS) better allocate their scarce resources to improve the survival of severely injured crash victims by providing them with a fast and timely response. Data broadcast from a vehicle's Event Data Recorder (EDR) provide an opportunity to capture crash information and transmit it to EMS in near real-time. A key feature of EDR data is a longitudinal measure of crash deceleration. We used functional data analysis (FDA) to extract key features of the deceleration trajectories (the absolute integral, the absolute integral of its slope, and the residual variance) to develop and validate a risk prediction model for serious (AIS 3+) injuries. We used data from the 2002-2012 EDR reports and the National Automotive Sampling System (NASS) Crashworthiness Data System (CDS) datasets available on the National Highway Traffic Safety Administration (NHTSA) website. We considered a variety of approaches to modeling the deceleration data, including non-penalized and penalized splines and a variable selection method, ultimately obtaining a model with a weighted AUC of 0.93. A novel feature of our approach is the use of residual variance as a measure of predictive risk. Our model can be viewed as an important first step toward developing a real-time prediction model capable of predicting the risk of severe injury in any motor vehicle crash.
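The feature construction described above can be sketched as follows, using simulated deceleration pulses in place of the EDR/NASS-CDS data (which are not reproduced here); the pulse shape, sampling grid, smoother and outcome threshold are all hypothetical.

```python
# A minimal sketch: extract the three FDA-style features named in the
# abstract (absolute integral, absolute integral of the slope, residual
# variance around a smooth fit) from simulated deceleration pulses and
# feed them to a logistic risk model. All data and thresholds are made up.
import numpy as np
from scipy.integrate import trapezoid
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.15, 150)  # hypothetical 150 ms crash pulse

def features(decel):
    """Absolute integral, absolute integral of the slope, residual variance."""
    abs_int = trapezoid(np.abs(decel), t)
    slope = np.gradient(decel, t)
    abs_int_slope = trapezoid(np.abs(slope), t)
    smooth = np.polyval(np.polyfit(t, decel, 5), t)  # crude polynomial smoother
    resid_var = np.var(decel - smooth)
    return abs_int, abs_int_slope, resid_var

# Simulate pulses: severity scales the peak deceleration, plus noise
severity = rng.uniform(0.2, 1.0, size=400)
X = np.array([features(-60 * s * np.sin(np.pi * t / 0.15) +
                       rng.normal(0, 2, t.size)) for s in severity])
y = (severity + rng.normal(0, 0.15, 400) > 0.7).astype(int)  # stand-in "AIS 3+" label

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X[:300], y[:300])
print("AUC on held-out pulses:",
      roc_auc_score(y[300:], model.predict_proba(X[300:])[:, 1]))
```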
Abstract: Air pollution is a serious problem in large Turkish cities, especially during the winter season. Particulate atmospheric pollution in urban areas is considered to have a significant impact on human health, so the ability to make accurate predictions of ambient particulate concentrations is important for public awareness and air quality management. Ambient PM10 (i.e., particulate matter less than 10 μm in diameter) has negative impacts on human health and is influenced by meteorological conditions. In this study, partial least squares regression, principal component regression, ridge regression and multiple linear regression are compared for modeling and predicting daily mean PM10 concentrations on the basis of various meteorological parameters obtained for the city of Ankara, Turkey. The analysed period is February 2007. The results show that while multiple linear regression and ridge regression fit this data set somewhat better, principal component regression and partial least squares regression predict future PM10 values better than both. In addition, partial least squares regression stands out for its predictive ability, performing comparably to principal component regression with fewer factors.
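The four-way comparison can be sketched with scikit-learn on synthetic stand-ins for the meteorological predictors and PM10 response; the variable structure and component counts below are hypothetical, chosen only to show the mechanics of each method.

```python
# A minimal sketch comparing MLR, ridge, PCR and PLS on synthetic,
# collinear predictors standing in for meteorological variables.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n, p = 120, 8                                          # e.g. daily records, 8 met variables
X = rng.normal(size=(n, p))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=n)     # induce collinearity
y = 50 + 10 * X[:, 0] - 5 * X[:, 2] + rng.normal(0, 5, n)  # stand-in "PM10"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "MLR":   make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "PCR":   make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression()),
    "PLS":   make_pipeline(StandardScaler(), PLSRegression(n_components=2)),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    pred = np.ravel(m.predict(X_te))  # PLS returns a 2-D array
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name}: test RMSE = {rmse:.2f}")
```

Under collinear predictors, PCR and PLS compress the predictor space into a few components, which is the kind of behavior the study exploits for out-of-sample prediction.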
Abstract: In this paper we propose a new five-parameter bivariate distribution obtained by taking the geometric maximum of generalized exponential distributions. Several properties of this new bivariate distribution and its marginals are investigated. It is observed that the maximum likelihood estimators of the unknown parameters cannot be obtained in closed form: five non-linear equations must be solved simultaneously. We propose to use the EM algorithm to compute the maximum likelihood estimators of the unknown parameters, and it is computationally quite tractable. We performed an extensive simulation study of the effectiveness of the proposed algorithm, and its performance is quite satisfactory. We analyze one data set for illustrative purposes. Finally, we propose some open problems.
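The geometric-maximum construction underlying the proposal can be sketched for a single margin with hypothetical parameter values, taking the generalized exponential (GE) CDF as F(x) = (1 - e^{-λx})^α; the paper's five-parameter bivariate version pairs two such maxima driven by a common geometric counter, which is not reproduced here.

```python
# A minimal sketch of the geometric maximum: N ~ Geometric(p),
# X = max(X_1, ..., X_N) with X_i i.i.d. GE(alpha, lam). Parameter
# values are hypothetical.
import numpy as np

rng = np.random.default_rng(42)

def rge(size, alpha, lam):
    """Sample GE(alpha, lam) by inverting F(x) = (1 - exp(-lam*x))**alpha."""
    u = rng.uniform(size=size)
    return -np.log(1.0 - u ** (1.0 / alpha)) / lam

def rgeom_max_ge(size, p, alpha, lam):
    """Geometric maximum of GE variables."""
    n = rng.geometric(p, size=size)  # N >= 1, P(N=n) = p*(1-p)**(n-1)
    return np.array([rge(k, alpha, lam).max() for k in n])

x = rgeom_max_ge(10_000, p=0.3, alpha=2.0, lam=1.5)

# Sanity check against the closed-form CDF of the geometric maximum:
# P(X <= t) = p*F(t) / (1 - (1-p)*F(t))
F = lambda t: (1 - np.exp(-1.5 * t)) ** 2.0
t0 = 1.0
theo = 0.3 * F(t0) / (1 - 0.7 * F(t0))
print("empirical:", (x <= t0).mean(), " theoretical:", round(theo, 4))
```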
Abstract: In recent years, many modifications of the Weibull distribution have been proposed. Some of these modifications have a large number of parameters, so their real benefits over simpler modifications are questionable. Here, we use two data sets exhibiting a modified unimodal (unimodal followed by increasing) hazard function to compare the exponentiated Weibull and generalized modified Weibull distributions. We find no evidence that the generalized modified Weibull distribution can provide a better fit than the exponentiated Weibull distribution for data sets exhibiting the modified unimodal hazard function. In a related issue, we consider Carrasco et al. (2008), a widely cited paper proposing the generalized modified Weibull distribution and illustrating two real data applications. We point out that some of the results in both real data applications in Carrasco et al. (2008) are incorrect.
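As a hedged illustration of this kind of model comparison, the sketch below fits the exponentiated Weibull to made-up lifetime data by maximum likelihood with scipy and reports the AIC; the generalized modified Weibull has no off-the-shelf scipy implementation and is not fitted here.

```python
# A minimal sketch: ML fit of the exponentiated Weibull (scipy's exponweib)
# to toy lifetime data, with the AIC one would use to compare competing
# Weibull modifications. Data and parameter values are made up.
import numpy as np
from scipy.stats import exponweib

data = exponweib.rvs(a=2.0, c=0.8, scale=1.5, size=200, random_state=7)

# Fit with the location fixed at 0, as is usual for lifetime data
a, c, loc, scale = exponweib.fit(data, floc=0)
k = 3  # free parameters: a, c, scale
loglik = np.sum(exponweib.logpdf(data, a, c, loc=loc, scale=scale))
aic = 2 * k - 2 * loglik
print(f"a={a:.3f}, c={c:.3f}, scale={scale:.3f}, AIC={aic:.1f}")
```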
Abstract: A new distribution, called the Odds Generalized Exponential-Exponential Distribution (OGEED), is proposed for modeling lifetime data. A comprehensive account of the mathematical properties of the new distribution, including estimation and simulation issues, is presented. A data set is analyzed to illustrate its applicability.
Abstract: In this paper, we introduce a Bayesian analysis for bivariate geometric distributions applied to lifetime data in the presence of covariates, censored data and a cure fraction, using Markov chain Monte Carlo (MCMC) methods. We show that the use of a discrete bivariate geometric distribution can bring computational advantages over standard bivariate exponential lifetime distributions for continuous lifetime data introduced in the literature, such as the Block and Basu bivariate exponential distribution. Posterior summaries of interest are obtained using the popular OpenBUGS software. A numerical illustration is presented using a medical data set on diabetic retinopathy.
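A minimal MCMC sketch in this spirit, using PyMC in place of OpenBUGS and made-up data, is given below; it fits only a univariate geometric lifetime model with a single covariate, whereas the paper's model is bivariate and also handles censoring and a cure fraction.

```python
# A minimal sketch, assuming PyMC rather than OpenBUGS: discrete geometric
# lifetimes with one covariate on the logit of the per-period event
# probability, fit by MCMC. All data are simulated.
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(3)
x = rng.normal(size=100)                        # hypothetical covariate
p_true = 1 / (1 + np.exp(-(-1.0 + 0.5 * x)))    # per-period event probability
y = rng.geometric(p_true)                       # discrete lifetimes, y >= 1

with pm.Model():
    b0 = pm.Normal("b0", 0, 2)
    b1 = pm.Normal("b1", 0, 2)
    p = pm.Deterministic("p", pm.math.sigmoid(b0 + b1 * x))
    pm.Geometric("y", p=p, observed=y)
    trace = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(az.summary(trace, var_names=["b0", "b1"]))
```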
Abstract: This paper extends the analysis of the bivariate Seemingly Unrelated Regression (SUR) Tobit model by modeling its nonlinear dependence structure through a copula and assuming non-normal marginal error distributions. For model estimation, copula methods enable the use of the (classical) Inference Function for Margins (IFM) method of Joe and Xu (1996), which is more computationally attractive (feasible) than the full maximum likelihood approach. However, our simulation study shows that the IFM method yields a biased estimate of the copula parameter in the presence of censored observations in both margins. To obtain an unbiased estimate of the copula association parameter, we develop a modified version of the IFM method, which we refer to as Inference Function for Augmented Margins (IFAM). Since the usual asymptotic approach, that is, the computation of the asymptotic covariance matrix of the parameter estimates, is troublesome, we propose resampling procedures (bootstrap methods) to obtain confidence intervals for the copula-based SUR Tobit model parameters. The satisfactory results from the simulation and empirical studies indicate the adequate performance of the proposed model and methods. We illustrate our procedure using bivariate data on consumption of salad dressings and lettuce by U.S. individuals.
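The two-stage IFM idea can be sketched on fully observed (uncensored) data with normal margins and a Clayton copula, as below; the IFAM bias correction for censored Tobit margins is the paper's contribution and is not reproduced here.

```python
# A minimal sketch of classical two-stage IFM: (1) fit each margin by ML,
# (2) maximize the copula log-likelihood in theta with the margins fixed.
# Margins, copula family and parameter values are hypothetical.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)

# Simulate from a Clayton copula (theta = 2) via conditional inversion,
# then apply normal margins
theta_true = 2.0
u = rng.uniform(size=(500, 2))
v1 = u[:, 0]
v2 = (v1 ** -theta_true * (u[:, 1] ** (-theta_true / (1 + theta_true)) - 1) + 1) ** (-1 / theta_true)
y1 = norm.ppf(v1, loc=1.0, scale=2.0)
y2 = norm.ppf(v2, loc=-0.5, scale=1.0)

# Stage 1 (IFM): ML estimates of each normal margin separately
m1, s1 = y1.mean(), y1.std()   # ddof=0 gives the ML estimates
m2, s2 = y2.mean(), y2.std()
u1, u2 = norm.cdf(y1, m1, s1), norm.cdf(y2, m2, s2)

# Stage 2 (IFM): maximize the Clayton copula log-likelihood in theta
def neg_ll(theta):
    t = u1 ** -theta + u2 ** -theta - 1
    return -np.sum(np.log(1 + theta)
                   - (theta + 1) * (np.log(u1) + np.log(u2))
                   - (2 + 1 / theta) * np.log(t))

res = minimize_scalar(neg_ll, bounds=(0.01, 20), method="bounded")
print("IFM estimate of theta:", round(res.x, 3), "(true value 2.0)")
```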
Abstract: In clinical studies, subjects or patients may be exposed to a succession of diagnostic tests or medications over time, and interest lies in determining whether there is progressive remission of conditions, disease or symptoms that are measured collectively as quality-of-life or outcome scores. In addition, study participants may be required, perhaps early in an experiment, to improve significantly in their success rates at the current trial relative to the immediately preceding trial, otherwise withdrawal or dropping out is inevitable. A common research interest is then to determine some critical minimum marginal success rate to guide management in deciding whether to implement certain policies; success rates lower than this minimum would indicate a need for remedial action. In this article, a method of estimating these rates is proposed, assuming the requirement applies at the second trial of a study. Pairwise comparisons of subjects' proportions of success or failure in a repeated outcome measure setting are considered to determine which subject or combination of subjects is responsible for rejection of the null hypothesis. The proposed method is illustrated with a data set of palliative care outcome scores (POS) of cancer patients.
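As a generic illustration of comparing paired success rates between two consecutive trials (not the paper's estimation procedure for the critical minimum rate), the sketch below applies McNemar's test to hypothetical paired binary outcomes.

```python
# A generic, hypothetical illustration: McNemar's test on paired binary
# outcomes from two consecutive trials. The paper's own method for the
# critical minimum marginal success rate is not implemented here.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(11)
trial1 = rng.binomial(1, 0.55, size=80)  # success/failure at trial 1
trial2 = rng.binomial(1, 0.70, size=80)  # success/failure at trial 2

# 2x2 table of paired outcomes: rows = trial 1, columns = trial 2
table = np.array([[np.sum((trial1 == 1) & (trial2 == 1)),
                   np.sum((trial1 == 1) & (trial2 == 0))],
                  [np.sum((trial1 == 0) & (trial2 == 1)),
                   np.sum((trial1 == 0) & (trial2 == 0))]])

result = mcnemar(table, exact=True)
print(table)
print(f"McNemar p-value: {result.pvalue:.4f}")
```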
Abstract: The problem of estimating lifetime distribution parameters under general progressive censoring originated in the context of reliability. Traditionally, it is assumed that the data available from this censoring scheme are recorded as exact numbers. However, in many life-testing and reliability studies, it is not possible to measure the outcomes of a statistical experiment exactly, but it is possible to classify them into fuzzy sets. This paper deals with the estimation of lifetime distribution parameters under a general progressive Type-II censoring scheme when the lifetime observations are reported as fuzzy numbers. A new method is proposed to determine the maximum likelihood estimates of the parameters of interest. The methodology is illustrated with two popular models in lifetime analysis, the Rayleigh and lognormal distributions.
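The fuzzy-likelihood idea can be sketched for complete (uncensored) Rayleigh data reported as triangular fuzzy numbers: each density value in the likelihood is replaced by the integral of the density against the membership function, and the result is maximized numerically. The extension to general progressive Type-II censoring is not shown, and all fuzzy observations below are made up.

```python
# A minimal sketch: fuzzy-data maximum likelihood for the Rayleigh
# distribution with triangular fuzzy observations (a, b, c). Complete
# samples only; the paper's censoring scheme is not reproduced.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def rayleigh_pdf(x, sigma):
    return (x / sigma**2) * np.exp(-x**2 / (2 * sigma**2))

def tri_membership(x, a, b, c):
    """Triangular fuzzy number with support [a, c] and peak at b."""
    return np.where(x < b, np.clip((x - a) / (b - a), 0, 1),
                    np.clip((c - x) / (c - b), 0, 1))

# Hypothetical fuzzy observations: "about b" lifetimes
fuzzy_obs = [(0.5, 1.0, 1.5), (1.2, 1.8, 2.4), (0.8, 1.3, 1.9),
             (2.0, 2.6, 3.1), (0.3, 0.7, 1.1)]

def neg_log_lik(sigma):
    ll = 0.0
    for a, b, c in fuzzy_obs:
        # Each contribution: integral of f(x; sigma) * membership(x)
        val, _ = quad(lambda x: rayleigh_pdf(x, sigma) * tri_membership(x, a, b, c), a, c)
        ll += np.log(val)
    return -ll

res = minimize_scalar(neg_log_lik, bounds=(0.1, 10), method="bounded")
print("fuzzy MLE of Rayleigh sigma:", round(res.x, 3))
```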