A new four-parameter extreme value distribution is defined and studied. Various structural properties of the proposed distribution are investigated, including ordinary and incomplete moments, generating functions, residual and reversed residual life functions, and order statistics. Some useful characterizations are presented, based on two truncated moments, on the reverse hazard function, and on certain functions of the random variable. The maximum likelihood method is used to estimate the model parameters. Further, we propose a new extended regression model based on the logarithm of the new distribution. The new distribution is applied to three real data sets to demonstrate its flexibility empirically.
In the linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new and better variants. Specifically, we advocate assigning weights to components based on their correlations with the response, which may lead to enhanced predictive performance. Both simulation studies and real data examples are provided to assess and illustrate the advantages of the proposed methods.
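The way ridge regression and principal components regression fit into one weighted-components form can be sketched numerically. The block below is a minimal illustration and not the paper's implementation: it assumes the orthogonal components are the left singular vectors of a centered design matrix, and the names `wocr_fit`, `ridge_weights`, and `pcr_weights` are hypothetical.

```python
import numpy as np

def wocr_fit(X, y, weights):
    """Fitted values as a weighted sum of projections of y onto the
    orthogonal components (left singular vectors) of X."""
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    return U @ (weights(d) * (U.T @ y))

def ridge_weights(lam):
    # Ridge shrinks component j by d_j^2 / (d_j^2 + lam).
    return lambda d: d**2 / (d**2 + lam)

def pcr_weights(k):
    # Singular values come back sorted in descending order, so the first
    # k entries correspond to the k leading principal components.
    return lambda d: (np.arange(d.size) < k).astype(float)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=50)
y -= y.mean()

lam = 3.0
fit_wocr_ridge = wocr_fit(X, y, ridge_weights(lam))
fit_direct_ridge = X @ np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Keeping every component with weight 1 reproduces ordinary least squares.
fit_wocr_pcr_all = wocr_fit(X, y, pcr_weights(4))
fit_ols = X @ np.linalg.lstsq(X, y, rcond=None)[0]
```

Under these assumptions the weighted-components fit agrees with the closed-form ridge fit, and the 0/1 weights with all components retained agree with least squares. Other weight functions, such as ones driven by each component's correlation with the response, would plug into the same `weights` argument.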
Abstract: Cancer is a complex disease in which various types of molecular aberrations drive the development and progression of malignancies. Among these diverse molecular aberrations, inherited and somatic mutations in DNA sequences are considered major drivers of oncogenesis. The complexity of somatic alterations has been revealed by large-scale investigations of cancer genomes and robust methods for inferring the function of genes. In this review, we describe sequence mutations of several cancer-related genes and discuss their functional implications in cancer. In addition, we introduce the online resources for accessing and analyzing sequence mutations in cancer. We also provide an overview of the statistical and computational approaches, and of future prospects, for conducting comprehensive analyses of the somatic alterations in cancer genomes.
Abstract: A randomly truncated sample arises when the independent variables T and L are observable only if L < T. The truncated version of the Kaplan-Meier estimator is known to be the standard estimation method for the marginal distribution of T or L. The inverse probability weighted (IPW) estimator was suggested as an alternative, and its agreement with the truncated version of the Kaplan-Meier estimator has been proved. This paper centers on the weak convergence of IPW estimators and on variance decomposition. The paper shows that the asymptotic variance of an IPW estimator can be decomposed into two sources. The variation of the IPW estimator using known weight functions is the primary source, and the variation due to estimated weights should be included as well. Variance decomposition establishes the connection between a truncated sample and a biased sample with known probabilities of selection. A simulation study was conducted to investigate the practical performance of the proposed variance estimators, as well as the relative magnitude of the two sources of variation for various truncation rates. A blood transfusion data set is analyzed to illustrate the nonparametric inference discussed in the paper.
A graphical tool for choosing the number of nodes for a neural network is introduced. The idea is to fit the neural network with a range of numbers of nodes at first, and then generate a jump plot using a transformation of the mean square errors of the resulting residuals. A theorem is proven to show that the jump plot will select several candidate numbers of nodes among which one is the true number of nodes. Then a single node only test, which has been theoretically justified, is used to rule out erroneous candidates. The method has a sound theoretical background, yields good results on simulated datasets, and shows wide applicability to datasets from real research.
In this paper, we advance new families of bivariate copulas constructed by distributional distortions of existing bivariate copulas. The distortions under consideration are based on the unit gamma distribution of two forms. When the initial copula is Archimedean, the induced copula is also Archimedean under the admissible parameter space. Properties such as Kendall’s tau coefficient, tail dependence coefficients and tail orders for the new families of copulas are derived. An empirical application to economic indicator data is presented.
Abstract: Air pollution is a serious problem in big cities in Turkey, especially in winter. Particulate atmospheric pollution in urban areas is considered to have a significant impact on human health. Therefore, the ability to make accurate predictions of ambient particulate concentrations is important for improving public awareness and air quality management. Ambient PM10 (i.e., particulate matter less than 10 µm in diameter) pollution has negative impacts on human health and is influenced by meteorological conditions. In this study, partial least squares regression, principal component regression, ridge regression and multiple linear regression methods are compared in modeling and predicting daily mean PM10 concentrations on the basis of various meteorological parameters obtained for the city of Ankara, Turkey. The analysed period is February 2007. The results show that while multiple linear regression and ridge regression yield somewhat better fits to this dataset, principal component regression and partial least squares regression outperform both of them in predicting PM10 values for future datasets. In addition, partial least squares regression stands out in terms of predictive ability, as it performs comparably to principal component regression even with fewer factors.
Abstract: A new six-parameter extension of the generalized gamma distribution, called the Kummer beta generalized gamma distribution, is introduced and studied. It contains at least 28 special models, such as the beta generalized gamma, beta Weibull, beta exponential, generalized gamma, Weibull and gamma distributions, and thus could be a better model for analyzing positively skewed data. The new density function can be expressed as a linear combination of generalized gamma densities. Various mathematical properties of the new distribution are derived, including explicit expressions for the ordinary and incomplete moments, generating function, mean deviations, entropy, and the density function of the order statistics and their moments. The elements of the observed information matrix are provided. We discuss the method of maximum likelihood and a Bayesian approach to fit the model parameters. The superiority of the new model is illustrated by means of three real data sets.
We propose distributed generalized linear models for the purpose of incorporating lagged effects. The model class provides a more accurate statistical measure of the relationship between the dependent variable and a series of covariates. The estimators from the proposed procedure are shown to be consistent. Simulation studies not only confirm the asymptotic properties of the estimators, but also exhibit the adverse effects of model misspecification on the accuracy of model estimation and prediction. The application is illustrated by analyzing the presidential election data of 2016.
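One way to picture lagged effects entering a generalized linear model is through a design matrix whose columns hold the current and past values of a covariate. The sketch below is an assumption about the general idea, not the authors' model: the helper names, the logit link, and the coefficient values are all hypothetical.

```python
import math

def lagged_design(x, max_lag):
    """Rows [1, x_t, x_{t-1}, ..., x_{t-max_lag}] for every t that has
    a full history of max_lag earlier observations."""
    return [[1.0] + [float(x[t - lag]) for lag in range(max_lag + 1)]
            for t in range(max_lag, len(x))]

def mean_response(row, beta):
    """Mean under a logit link (an assumed link, chosen for illustration)."""
    eta = sum(b * v for b, v in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical covariate series and coefficients: intercept, then lags 0, 1, 2.
x = [1, 2, 3, 4]
beta = [0.0, 0.5, 0.25, 0.125]

design = lagged_design(x, max_lag=2)
mu = [mean_response(row, beta) for row in design]
```

Each lag gets its own coefficient here, so a misspecified `max_lag` drops or adds whole columns of the design, which is one concrete form the misspecification mentioned above could take.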
The analysis of sports data, especially cricket, is an interesting field for statisticians. Every year, a large number of cricket tournaments take place among the cricket-playing nations. It is of interest to study their performance when they play against each other in a one-day international (ODI) match or a test match. In this study, we assess the performance of the top ten cricket teams in ODI matches and compare them. The abilities of teams change over time. As a result, no single team dominates the game over a long period. Therefore, a paired comparison method is more reliable and appropriate for comparing more than two teams at the same time based on the outcomes of the matches they play. Arguably, a team's performance also depends on whether it plays at home or away. In this study, we consider the Bradley-Terry model, a widely accepted model for pairwise comparison. Within it, we include home and away effects to demonstrate how the home advantage differs among these teams.
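A Bradley-Terry model with a home effect can be written down in a few lines. The sketch below assumes a common log-scale parameterization with a team-specific home advantage added to the home side's ability; the function names, team labels, and parameter values are hypothetical, and ties or abandoned matches are ignored.

```python
import math

def win_prob_home(ability, home_adv, home, away):
    """P(home team beats away team): logistic function of the ability
    difference plus the home team's own home-advantage term."""
    eta = home_adv[home] + ability[home] - ability[away]
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(ability, home_adv, matches):
    """matches: iterable of (home, away, home_won), home_won in {0, 1}."""
    total = 0.0
    for home, away, home_won in matches:
        p = win_prob_home(ability, home_adv, home, away)
        total += math.log(p) if home_won else math.log(1.0 - p)
    return total

# Hypothetical teams, parameters, and results, for illustration only.
ability = {"A": 0.4, "B": 0.0, "C": -0.3}
home_adv = {"A": 0.2, "B": 0.5, "C": 0.1}
matches = [("A", "B", 1), ("B", "A", 1), ("C", "A", 0), ("B", "C", 1)]

p_ab = win_prob_home(ability, home_adv, "A", "B")  # A hosting B
p_ba = win_prob_home(ability, home_adv, "B", "A")  # B hosting A
ll = log_likelihood(ability, home_adv, matches)
```

In practice the abilities and home effects would be estimated by maximizing `log_likelihood` over observed match results, with one ability fixed (for example at zero) for identifiability.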