Abstract: A randomly truncated sample arises when the independent variables T and L are observable only if L < T. The truncated version of the Kaplan-Meier estimator is the standard estimation method for the marginal distribution of T or L. The inverse probability weighted (IPW) estimator was suggested as an alternative, and its agreement with the truncated version of the Kaplan-Meier estimator has been proved. This paper centers on the weak convergence of IPW estimators and variance decomposition. The paper shows that the asymptotic variance of an IPW estimator can be decomposed into two sources. The variation of the IPW estimator using known weight functions is the primary source, and the variation due to estimated weights should be included as well. Variance decomposition establishes the connection between a truncated sample and a biased sample with known probabilities of selection. A simulation study was conducted to investigate the practical performance of the proposed variance estimators, as well as the relative magnitude of the two sources of variation for various truncation rates. A blood transfusion data set is analyzed to illustrate the nonparametric inference discussed in the paper.
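The "known weight functions" case mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that the probability of observing a unit with T = t is a known function (here called `select_prob`, a hypothetical name), for example G(t) = P(L < t) when the distribution of L is known. It is not the paper's estimator with estimated weights, only the basic inverse probability weighting step.

```python
from typing import Callable, List, Tuple


def ipw_cdf(times: List[float],
            select_prob: Callable[[float], float]) -> List[Tuple[float, float]]:
    """Inverse probability weighted estimate of the CDF of T.

    times       : observed values of T from the truncated (biased) sample
    select_prob : ASSUMED known probability that a unit with T = t is observed
    Returns a list of (t, F_hat(t)) pairs sorted by t.
    """
    # Each observation is weighted by the inverse of its selection probability.
    weights = [1.0 / select_prob(t) for t in times]
    total = sum(weights)

    cdf = []
    running = 0.0
    for t, w in sorted(zip(times, weights)):
        running += w
        # Normalising by the total weight makes the estimate end at 1.
        cdf.append((t, running / total))
    return cdf


# Toy illustration (made-up numbers): L assumed Uniform(0, 1), so G(t) = min(t, 1).
toy_cdf = ipw_cdf([0.5, 0.2, 1.0], lambda t: min(t, 1.0))
```

Small observed values of T are under-represented in the truncated sample, so they receive the largest weights; with tied times the list contains one entry per observation rather than one per distinct time.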
A graphical tool for choosing the number of nodes for a neural network is introduced. The idea is to fit the neural network with a range of numbers of nodes at first, and then generate a jump plot using a transformation of the mean square errors of the resulting residuals. A theorem is proven to show that the jump plot will select several candidate numbers of nodes among which one is the true number of nodes. Then a single node only test, which has been theoretically justified, is used to rule out erroneous candidates. The method has a sound theoretical background, yields good results on simulated datasets, and shows wide applicability to datasets from real research.
In this paper, we advance new families of bivariate copulas constructed by distributional distortions of existing bivariate copulas. The distortions under consideration are based on the unit gamma distribution of two forms. When the initial copula is Archimedean, the induced copula is also Archimedean under the admissible parameter space. Properties such as Kendall’s tau coefficient, tail dependence coefficients and tail orders for the new families of copulas are derived. An empirical application to economic indicator data is presented.
Abstract: Air pollution is a serious problem in big cities in Turkey, especially in winter. Particulate atmospheric pollution in urban areas is considered to have a significant impact on human health. Therefore, the ability to make accurate predictions of ambient particulate concentrations is important for improving public awareness and air quality management. Ambient PM10 (i.e., particulate matter less than 10 μm in diameter) pollution has negative impacts on human health and is influenced by meteorological conditions. In this study, partial least squares regression, principal component regression, ridge regression and multiple linear regression methods are compared in modeling and predicting daily mean PM10 concentrations on the basis of various meteorological parameters obtained for the city of Ankara, Turkey. The analysed period is February 2007. The results show that while multiple linear regression and ridge regression yield somewhat better fits to this dataset, principal component regression and partial least squares regression are better than both in predicting PM10 values for future datasets. In addition, partial least squares regression stands out in terms of predictive ability, as its performance is close to that of principal component regression even with fewer factors.
Abstract: A new six-parameter extension of the generalized gamma distribution, called the Kummer beta generalized gamma distribution, is introduced and studied. It contains at least 28 special models, such as the beta generalized gamma, beta Weibull, beta exponential, generalized gamma, Weibull and gamma distributions, and thus could be a better model for analyzing positively skewed data. The new density function can be expressed as a linear combination of generalized gamma densities. Various mathematical properties of the new distribution are derived, including explicit expressions for the ordinary and incomplete moments, generating function, mean deviations, entropy, and the density function of the order statistics and their moments. The elements of the observed information matrix are provided. We discuss the method of maximum likelihood and a Bayesian approach to fit the model parameters. The superiority of the new model is illustrated by means of three real data sets.
We propose distributed generalized linear models for the purpose of incorporating lagged effects. The model class provides a more accurate statistical measure of the relationship between the dependent variable and a series of covariates. The estimators from the proposed procedure are shown to be consistent. Simulation studies not only confirm the asymptotic properties of the estimators, but also exhibit the adverse effects of model misspecification on the accuracy of model estimation and prediction. The application is illustrated by analyzing the 2016 presidential election data.
The analysis of sports data, especially cricket, is an interesting field for statisticians. Every year, a large number of cricket tournaments take place among the cricket-playing nations. It is of interest to study their performance when they play against each other in a one-day international (ODI) match or a test match. In this study, we assess the performance of the top ten cricket teams in ODI cricket and make a comparison among them. The abilities of teams change over time. As a result, no single team dominates the game over a long period. Therefore, a paired comparison method is more reliable and appropriate for comparing more than two teams at the same time based on the outcomes of the matches they play. Arguably, a team's performance also depends on whether it plays at home or away. In this study, we consider the Bradley-Terry model, a widely accepted model for pairwise comparison. Within it, we include home and away effects to demonstrate how the home advantages differ among these teams.
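A Bradley-Terry model with a home effect can be sketched as a logistic model in which the log-odds of a home win equal the difference in team abilities plus a home-advantage term. The sketch below is a simplification under stated assumptions: it uses a single home-advantage parameter shared by all teams (the study lets home advantages differ among teams), plain gradient ascent on the log-likelihood, and made-up toy results. The names `fit_bradley_terry_home`, `toy_counts` and `toy_matches` are hypothetical.

```python
import math
from typing import List, Tuple


def fit_bradley_terry_home(matches: List[Tuple[int, int, int]],
                           n_teams: int,
                           lr: float = 0.5,
                           n_iter: int = 2000) -> Tuple[List[float], float]:
    """Fit P(home beats away) = sigmoid(ability[home] - ability[away] + home_adv).

    matches : list of (home_team, away_team, home_won) with home_won in {0, 1}
    Returns (abilities centred to sum to zero, shared home advantage).
    """
    ability = [0.0] * n_teams
    home_adv = 0.0
    n = len(matches)

    for _ in range(n_iter):
        grad_ability = [0.0] * n_teams
        grad_home = 0.0
        for home, away, home_won in matches:
            eta = ability[home] - ability[away] + home_adv
            p_home = 1.0 / (1.0 + math.exp(-eta))
            resid = home_won - p_home  # score contribution of this match
            grad_ability[home] += resid
            grad_ability[away] -= resid
            grad_home += resid

        # Gradient ascent step on the average log-likelihood.
        ability = [a + lr * g / n for a, g in zip(ability, grad_ability)]
        home_adv += lr * grad_home / n

        # Abilities are identified only up to a constant, so centre them.
        mean_ability = sum(ability) / n_teams
        ability = [a - mean_ability for a in ability]

    return ability, home_adv


# Made-up toy data: (home, away, games played, home wins) for three teams.
toy_counts = [(0, 1, 4, 3), (1, 0, 4, 2), (0, 2, 4, 4),
              (2, 0, 4, 1), (1, 2, 4, 3), (2, 1, 4, 2)]
toy_matches = [(h, a, 1 if k < wins else 0)
               for h, a, games, wins in toy_counts
               for k in range(games)]

toy_ability, toy_home_adv = fit_bradley_terry_home(toy_matches, n_teams=3)
```

In the toy data, team 0 wins most of its matches and home sides win more often than away sides, so the fitted abilities are ordered accordingly and the home-advantage estimate is positive. Team-specific home effects could be added by replacing the scalar `home_adv` with one parameter per team.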
Abstract: Breast cancer is the second most common type of cancer in the world (World Cancer Report, 2014 a, b). Advances in breast cancer treatment usually allow patients to live longer, but in many cases also lead to a relapse of the disease. Medical researchers are usually interested in analyzing data denoting the time until the occurrence of an event of interest, such as the time to death from cancer, in the presence of right-censored data and some covariates. In some situations, we could have two lifetimes associated with the same patient, for example, the disease-free time until recurrence and the total lifetime of the patient. In this case, it is important to assume a bivariate lifetime distribution that describes the possible dependence between the two observations. As an application, we consider different parametric bivariate lifetime distributions to analyze a breast cancer data set, treating the data as continuous or discrete. Inferences of interest are obtained under a Bayesian approach. We obtain the posterior summaries of interest using existing MCMC (Markov Chain Monte Carlo) methods. The main goal of the study is to compare the bivariate continuous and discrete distributions and determine which better describe the breast cancer lifetimes.
Abstract: Price limits are applied to control risks in various futures markets. In this research, we proposed an adapted autoregressive model for the observed futures return by introducing dummy variables that represent limit moves. We also proposed a stochastic volatility model with dummy variables. These two models are used to investigate the existence of a delayed price discovery effect and a volatility spillover effect from price limits. We give an empirical study of the impact of price limits on copper and natural rubber futures on the Shanghai Futures Exchange (SHFE) using an MCMC method. It is found that price limits are efficient in controlling the copper futures price, but the rubber futures price is distorted significantly. This implies that the effects of price limits are significant for products with large fluctuations and frequent limit hits.
Abstract: Nowadays, extensive amounts of data are stored, which requires the development of specialized methods for analyzing data in an understandable way. In medical data analysis, many potential factors are usually introduced to determine an outcome response variable. The main objective of variable selection is to enhance the prediction performance of the predictor variables and to identify, correctly and parsimoniously, the faster and more cost-effective predictors that have an important influence on the response. Various variable selection techniques are used to improve predictability and obtain the “best” model derived from a screening procedure. In our study, we propose a variable subset selection method that extends the idea of selecting variables to the classification case and combines a nonparametric criterion with a likelihood-based criterion. In this work, the Area Under the ROC Curve (AUC) criterion is used from another viewpoint in order to determine the important factors more directly. The proposed method yields a modification (BIC) of the modified Bayesian Information Criterion (mBIC). The introduced BIC is compared to existing variable selection methods through simulation experiments, and the Type I and Type II error rates are calculated. Additionally, the proposed method is applied successfully to a high-dimensional Trauma data analysis, and its good predictive properties are confirmed.
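The AUC ingredient of such a criterion can be illustrated with a minimal sketch of the empirical AUC for a single predictor, computed as the proportion of (case, control) pairs in which the case has the higher value, with ties counted as one half. This is only the standard nonparametric AUC, not the proposed selection criterion; the function name `empirical_auc` and the toy numbers are hypothetical.

```python
from typing import Sequence


def empirical_auc(pos_scores: Sequence[float],
                  neg_scores: Sequence[float]) -> float:
    """Empirical AUC (Mann-Whitney form) of one predictor.

    pos_scores : predictor values for subjects with the outcome
    neg_scores : predictor values for subjects without the outcome
    """
    concordant = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                concordant += 1.0   # case ranked above control
            elif p == q:
                concordant += 0.5   # ties count as half
    return concordant / (len(pos_scores) * len(neg_scores))


# Made-up toy values for one candidate predictor.
toy_auc = empirical_auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

A predictor with an AUC near 0.5 carries little discriminating information on its own, while values near 0 or 1 indicate strong separation between the two outcome groups.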