Abstract: Unknown or unobservable risk factors in survival analysis cause heterogeneity between individuals. Frailty models are used in survival analysis to account for the unobserved heterogeneity in individual risks of disease and death. Shared frailty models have been suggested for analyzing bivariate data on related survival times. The most common shared frailty model is one in which the frailty acts multiplicatively on the hazard function. In this paper, we introduce the shared inverse Gaussian frailty model based on the reversed hazard rate, with the generalized inverted exponential distribution and the generalized exponential distribution as baseline distributions. We develop a Bayesian estimation procedure using the Markov chain Monte Carlo (MCMC) technique to estimate the parameters of the models. We present a simulation study to compare the true values of the parameters with the estimated values. We also apply the proposed models to the Australian twin data set and suggest the better-fitting model.
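A minimal sketch of the shared frailty structure referred to above, under the convention that the frailty Z acts multiplicatively on the reversed hazard rate; the baseline is written in one common parameterization, which need not match the authors' notation:
\[
\bar{h}(t \mid Z) = Z\,\bar{h}_0(t) \;\Longrightarrow\; F(t \mid Z) = \bigl[F_0(t)\bigr]^{Z},
\qquad
F(t_1,t_2) = E_Z\!\left[\{F_{01}(t_1)\}^{Z}\{F_{02}(t_2)\}^{Z}\right]
           = L_Z\!\bigl(-\ln F_{01}(t_1) - \ln F_{02}(t_2)\bigr),
\]
where \(\bar{h}_0\) and \(F_0\) denote the baseline reversed hazard rate and distribution function, \(L_Z\) is the Laplace transform of the inverse Gaussian frailty, and, for example, the generalized exponential baseline has \(F_0(t) = (1 - e^{-\lambda t})^{\alpha}\).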
Abstract: This paper proposes Bayesian estimation of the parameters of a Markov-based logistic model for analyzing longitudinal binary data. In Bayesian estimation, the selection of an appropriate loss function and prior density is a key ingredient. Symmetric and asymmetric loss functions are used for estimating the parameters of the two-state Markov model, and the Bayesian estimates under the squared error loss function show better performance.
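As an illustration of how the loss function enters, the Bayes estimator of a parameter \(\theta\) under the squared error loss is the posterior mean, while under the LINEX loss (taken here only as a representative asymmetric loss; the abstract does not name the asymmetric loss actually used):
\[
L_{\mathrm{SE}}(\theta,\hat{\theta}) = (\hat{\theta}-\theta)^{2} \;\Rightarrow\; \hat{\theta}_{\mathrm{SE}} = E(\theta \mid \text{data}),
\qquad
L_{\mathrm{LINEX}}(\theta,\hat{\theta}) = e^{a(\hat{\theta}-\theta)} - a(\hat{\theta}-\theta) - 1 \;\Rightarrow\; \hat{\theta}_{\mathrm{LINEX}} = -\frac{1}{a}\ln E\!\left(e^{-a\theta} \mid \text{data}\right).
\]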
Abstract: Using financial ratio data from 2006 and 2007, this study uses a three-fold cross-validation scheme to compare the classification and prediction of bankrupt firms by robust logistic regression with the Bianco and Yohai (BY) estimator versus maximum likelihood (ML) logistic regression. With both the 2006 and 2007 data, BY robust logistic regression improves both the classification of bankrupt firms in the training set and the prediction of bankrupt firms in the testing set. In an out-of-sample test, the BY robust logistic regression correctly predicts bankruptcy for Lehman Brothers, whereas the ML logistic regression never predicts bankruptcy for Lehman Brothers with either the 2006 or 2007 data. Our analysis indicates that if the BY robust logistic regression significantly changes the estimated regression coefficients relative to ML logistic regression, then the BY robust logistic regression method can significantly improve the classification and prediction of bankrupt firms. At worst, the BY robust logistic regression makes no change in the estimated regression coefficients and has the same classification and prediction results as ML logistic regression. This is strong evidence that BY robust logistic regression should be used as a robustness check on ML logistic regression, and if a difference exists, then BY robust logistic regression should be used as the primary classifier.
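A minimal sketch (not the authors' code) of the three-fold cross-validation comparison described above. scikit-learn's LogisticRegression stands in for the ML fit; the Bianco-Yohai (BY) robust fit is not available in scikit-learn and would be substituted at the marked line. The data below are synthetic placeholders, not the financial ratio data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                                            # placeholder financial ratios
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 1.5).astype(int)   # placeholder bankruptcy flag

# Three-fold cross-validation: fit on the training folds, evaluate on the held-out fold.
for train_idx, test_idx in StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y):
    ml_fit = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    # by_fit = ...  # a BY robust logistic fit would be trained here for comparison
    print(confusion_matrix(y[test_idx], ml_fit.predict(X[test_idx])))
```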
Abstract: The motivation behind this paper is to investigate the use of the Softmax model for classification. We show that the Softmax model is a nonlinear generalization of logistic discrimination that can approximate the posterior probabilities of the classes, an ability that other artificial neural network (ANN) models lack. We show that the Softmax model has more flexibility than logistic discrimination in terms of correct classification. To demonstrate the performance of the Softmax model, a medical data set on thyroid gland state is used. The results also show that the Softmax model may suffer from overfitting.
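For reference, a standard form of the Softmax (multinomial logistic) model for \(K\) classes, which reduces to logistic discrimination when \(K = 2\):
\[
P(y = k \mid x) = \frac{\exp\!\left(w_k^{\top}x + b_k\right)}{\sum_{j=1}^{K}\exp\!\left(w_j^{\top}x + b_j\right)}, \qquad k = 1,\dots,K.
\]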
Abstract: Background: Brass developed a procedure for converting the proportions dead among children ever born, as reported by women of childbearing age, into estimates of the probability of dying before attaining certain exact childhood ages. The method has become very popular in less developed countries where direct mortality estimation is not possible due to incomplete death registration. However, the estimates of q(x), the probability of dying before age x, obtained by Trussell's variant of the Brass method are sometimes unrealistic, with q(x) not monotonically increasing in x. Method: State-level child mortality estimates obtained by Trussell's variant of the Brass method from the 1991 and 2001 Indian census data were made monotonically increasing by logit smoothing. Using two of the smoothed child mortality estimates, an infant mortality estimate is obtained by fitting a two-parameter Weibull survival function. Results: It has been found that in many states and union territories infant mortality rates increased between 1991 and 2001. Cross-checking against the 1991 and 2001 census data on the increase or decrease of the percentage of children who died establishes the reliability of the estimates. Conclusion: We have reason to suspect the declining trend in infant mortality reported by various agencies and researchers.
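A sketch of the Weibull step, in one common parameterization (scale \(b\), shape \(c\)); the two smoothed child mortality estimates determine the two parameters, and the infant mortality rate follows as \(q(1)\):
\[
q(x) = 1 - \exp\!\left\{-(x/b)^{c}\right\}
\;\Longrightarrow\;
\ln\!\bigl[-\ln\{1 - q(x_i)\}\bigr] = c\ln x_i - c\ln b, \qquad i = 1,2,
\]
a linear system in \(c\) and \(c\ln b\), after which the infant mortality estimate is \(q(1) = 1 - \exp(-b^{-c})\).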
Abstract: Modeling the Internet has been an active research area over the past ten years. From the "rich get richer" behavior to the "winners don't take all" property, existing models depend on explicit attributes described in the network. This paper discusses the modeling of non-scale-free network subsets such as bulletin forums. A new evolution mechanism, driven by implicit attributes "hidden" in the network, leads to a slight increase in the page sizes of the front-rank forums. Because these implicit attributes are difficult to quantify, two potential models are suggested. The first model introduces a content ratio that is patched onto the lognormal model, while the second model partitions the data into groups according to their regional specialties and fits the data within each group with power-law models. A Taiwan-based bulletin forum is used for illustration, and the data are fitted via four models. Statistical diagnostics show that the two suggested models outperform the traditional models in data fitting and prediction. In particular, the second model generally performs better than the first.
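A minimal sketch (not the authors' code) of the group-wise power-law fit in the second model: within one regional group, the tail P(S > s) of the page sizes is taken as proportional to s^(-alpha), and alpha is estimated by least squares on the log-log empirical complementary CDF. The data here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = np.sort((rng.pareto(a=1.5, size=500) + 1.0) * 10.0)  # synthetic heavy-tailed page sizes
ccdf = np.arange(sizes.size, 0, -1) / sizes.size             # empirical P(S >= s), sizes ascending

# Least-squares fit of log(ccdf) on log(size); the slope estimates -alpha.
slope, _ = np.polyfit(np.log(sizes), np.log(ccdf), 1)
print("estimated power-law exponent alpha:", -slope)
```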
Abstract: Registrations in epidemiological studies suffer from incompleteness, so the general consensus is to use capture-recapture models. Inclusion of covariates that relate to the capture probabilities has been shown to improve the estimate of population size. The covariates used have to be measured by all the registrations. In this article, we show how multiple imputation can be used in the capture-recapture problem when some lists do not measure some of the covariates or, alternatively, when some covariates are unobserved for some individuals. The approach is then applied to data on neural tube defects from the Netherlands.
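For orientation (background only, not the authors' full model): with two registrations of sizes \(n_1\) and \(n_2\) and \(m\) cases appearing on both, the basic capture-recapture (Petersen) estimate of the total number of cases is
\[
\hat{N} = \frac{n_1 n_2}{m},
\]
and covariates enter by allowing the capture probabilities, and hence the estimate, to vary across covariate strata; the multiple imputation step supplies covariate values that a given registration does not record.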
Abstract: This paper considers the statistical problems of editing and imputing data of multiple time series generated by repetitive surveys. The case under study is that of the Survey of Cattle Slaughter in Mexico's Municipal Abattoirs. The proposed procedure consists of two phases: first, the data of each abattoir are edited to correct gross inconsistencies; second, the missing data are imputed by means of restricted forecasting. This method uses all the historical and current information available for the abattoir, as well as multiple time series models from which efficient estimates of the missing data are obtained. Some empirical examples are shown to illustrate the usefulness of the method in practice.
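A hedged sketch of the restricted-forecasting idea in one standard formulation (our notation, not necessarily the authors'): let \(\hat{Y}\) be the unrestricted forecast of the vector containing the missing observations from the fitted multiple time series model, with forecast error covariance \(\Sigma\), and let \(CY = r\) collect the linear restrictions implied by the available current information. The restricted forecast used for imputation is then
\[
\hat{Y}_R = \hat{Y} + \Sigma C^{\top}\left(C\Sigma C^{\top}\right)^{-1}\left(r - C\hat{Y}\right).
\]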
Abstract: This paper presents an empirical study of a recently compiled workforce analytics data set modeling employment outcomes of Engineering students. The contributions reported in this paper won the data challenge of the ACM IKDD 2016 Conference on Data Science. Two problems are addressed: regression using heterogeneous information types, and the extraction of insights and trends from the data to make recommendations; these goals are supported by a range of visualizations. Whereas the data set is specific to one nation, the underlying techniques and visualization methods are generally applicable. Gaussian processes are proposed to model and predict salary as a function of heterogeneous independent attributes. The key novelties the GP approach brings to the domain of workforce analytics are (a) a statistically sound, data-dependent notion of prediction uncertainty, (b) automatic relevance determination of the various independent attributes to the dependent variable (salary), (c) seamless incorporation of both numeric and string attributes within the same regression framework without dichotomization, where string attributes include single-word categorical attributes (e.g. gender), nominal attributes (e.g. college tier), and multi-word attributes (e.g. specialization), and (d) treatment of all data as being correlated when making predictions. Insights from both the predictive modeling approaches and the data analysis were used to suggest factors that, if improved, might lead to better starting salaries for Engineering students. A range of visualization techniques were used to extract key employment patterns from the data.
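A minimal sketch (not the authors' pipeline) of Gaussian process salary regression with automatic relevance determination: an anisotropic RBF kernel learns one length-scale per attribute, and larger learned length-scales indicate attributes that matter less for salary. The data and dimensions below are synthetic placeholders, and string attributes are assumed already numerically encoded here, whereas the paper handles them without dichotomization.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                          # placeholder candidate attributes
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)    # placeholder salary; only attribute 0 matters

kernel = RBF(length_scale=np.ones(4)) + WhiteKernel()  # one length-scale per attribute -> ARD
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mean, std = gp.predict(X[:5], return_std=True)         # data-dependent predictive uncertainty
print("learned length-scales:", gp.kernel_.k1.length_scale)
print("predictive mean:", mean, "predictive std:", std)
```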