Abstract: In maximum likelihood exploratory factor analysis, the estimates of unique variances can often turn out to be zero or negative, which makes no sense from a statistical point of view. To overcome this difficulty, we employ a Bayesian approach by specifying a prior distribution for the variances of unique factors. The factor analysis model is estimated by the EM algorithm, for which we provide the expectation and maximization steps within a general framework of EM algorithms. A crucial issue in the Bayesian factor analysis model is the choice of adjusted parameters, including the number of factors and the hyper-parameters of the prior distribution. The choice of these parameters can be viewed as a model selection and evaluation problem. We derive a model selection criterion for evaluating a Bayesian factor analysis model. Monte Carlo simulations are conducted to investigate the effectiveness of the proposed procedure. A real data example is also given to illustrate our procedure. We observe that our modeling procedure prevents the occurrence of improper solutions and also chooses the appropriate number of factors objectively.
We introduce the four-parameter Kumaraswamy Gompertz distribution. We obtain the moments, generating and quantile functions, Shannon and Rényi entropies, mean deviations and Bonferroni and Lorenz curves. We provide a mixture representation for the density function of the order statistics. We discuss the estimation of the model parameters by maximum likelihood. We provide an application to a real data set that illustrates the usefulness of the new model.
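As a rough illustration of how a quantile function for such a model can be obtained by inversion, the sketch below applies the standard Kumaraswamy-G construction, F(x) = 1 - (1 - G(x)^a)^b, to a Gompertz baseline. The Gompertz parameterization G(x) = 1 - exp(-(theta/gamma)(exp(gamma x) - 1)) and the names `a`, `b`, `theta`, `gamma` are assumptions made for this example; the abstract does not state the parameterization, so the paper's own may differ.

```python
from math import exp, log


def gompertz_cdf(x, theta, gamma):
    """Baseline Gompertz CDF (assumed parameterization, theta > 0, gamma > 0)."""
    return 1.0 - exp(-(theta / gamma) * (exp(gamma * x) - 1.0))


def kw_gompertz_cdf(x, a, b, theta, gamma):
    """Kumaraswamy-G CDF with a Gompertz baseline: 1 - (1 - G(x)^a)^b."""
    g = gompertz_cdf(x, theta, gamma)
    return 1.0 - (1.0 - g ** a) ** b


def kw_gompertz_quantile(u, a, b, theta, gamma):
    """Invert the CDF in closed form for 0 < u < 1."""
    # Step 1: undo the Kumaraswamy layer to recover the baseline probability.
    g = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    # Step 2: undo the Gompertz baseline.
    return log(1.0 - (gamma / theta) * log(1.0 - g)) / gamma


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    median = kw_gompertz_quantile(0.5, a=2.0, b=1.5, theta=0.5, gamma=0.3)
    print(median, kw_gompertz_cdf(median, 2.0, 1.5, 0.5, 0.3))
```

Because the inversion is in closed form, the same function can be used to draw random variates by feeding it uniform random numbers on (0, 1).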
Abstract: In epidemiological studies where subjects are seen periodically on follow-up visits, interval-censored data occur naturally. The exact time at which the change of state (such as HIV seroconversion) occurs is not known; it is known only that it occurred sometime within a specific time interval. This paper considers estimation of parameters when HIV infection times are interval-censored and correlated. It is assumed that each sexual partnership has a specific unobservable random effect that induces association between infection times. Parameters are estimated using the expectation-maximization algorithm and the Gibbs sampler. The results from the two methods are compared. Both methods yield fixed effects and baseline hazard estimates that are comparable. However, the expectation-maximization algorithm underestimates the standard errors and the frailty variance compared to the Gibbs sampler. The Gibbs sampler is considered a plausible alternative to the expectation-maximization algorithm.
Abstract: In this paper we propose a new three-parameter lifetime distribution with decreasing hazard function, the long-term exponential geometric distribution. The new distribution arises in latent competing risks scenarios, where the lifetime associated with a particular risk is not observable; rather, we observe only the minimum lifetime value among all risks, and long-term survivors are present. The properties of the proposed distribution are discussed, including its probability density function and explicit algebraic formulas for its survival and hazard functions, order statistics, Bonferroni function and the Lorenz curve. The parameter estimation is based on the usual maximum likelihood approach. We compare the new distribution with its particular case, the long-term exponential distribution, as well as with the long-term Weibull distribution on two real datasets, observing its potential and competitiveness in comparison with a usual lifetime distribution.
In this paper, we consider a new generalization of the paralogistic distribution, which we call the three-parameter paralogistic distribution. Some properties of the new distribution are obtained, including the survival function, hazard function, quantile function, moments, Rényi entropy and the maximum likelihood estimation (MLE) of its parameters. A simulation study shows that the MLE of the parameters of the new distribution is consistent and asymptotically unbiased. The new three-parameter paralogistic distribution was applied to a real lifetime data set alongside some related existing distributions, namely the Paralogistic, Gamma, Transformed Beta, Log-logistic and Inverse paralogistic distributions. The results show that the new three-parameter paralogistic distribution was superior to the other distributions in terms of the Akaike information criterion (AIC) and Kolmogorov-Smirnov (K-S) statistic values. This claim was further supported by the density plots, P-P plots and Q-Q plots of the distributions for the data set under study.
A new four-parameter model called the Marshall-Olkin extended generalized Gompertz distribution is introduced. Its hazard rate function can be constant, increasing, decreasing, upside-down bathtub or bathtub-shaped depending on its parameters. Some mathematical properties of this model such as expansion for the density function, moments, moment generating function, quantile function, mean deviations, mean residual life, order statistics and Rényi entropy are derived. The maximum likelihood technique is used to estimate the unknown model parameters and the observed information matrix is determined. The applicability of the proposed model is shown by means of a real data set.
Abstract: In recent years Singular Spectrum Analysis (SSA), used as a powerful technique in time series analysis, has been developed and applied to many practical problems. In this paper, the performance of the SSA technique has been considered by applying it to a well-known time series data set, namely, monthly accidental deaths in the USA. The results are compared with those obtained using Box-Jenkins SARIMA models, the ARAR algorithm and the Holt-Winters algorithm (as described in Brockwell and Davis (2002)). The results show that the SSA technique gives a much more accurate forecast than the other methods indicated above.
In this paper, subsampling of the data is used to learn how individual data points influence inference on the parameters of a fitted logistic regression model. The alternative, alternative regularized, alternative regularized lasso, and alternative regularized ridge estimators are proposed for the parameter estimation of logistic regression models and are then compared with the maximum likelihood estimators. The proposed alternative regularized estimators are obtained by using a tuning parameter, whereas the proposed alternative estimators are not regularized. The proposed alternative regularized lasso estimators are the standard lasso estimators averaged over subsets of groups, and the alternative regularized ridge estimators are likewise the standard ridge estimators averaged over subsets of groups, where the number of subsets could be smaller than the number of parameters. The values of the tuning parameters are chosen to make the alternative regularized estimators very close to the maximum likelihood estimators, and the process is explained with two real data sets as well as a simulation study. The alternative and alternative regularized estimators always have closed-form expressions in terms of the observations, which the maximum likelihood estimators do not have. When the maximum likelihood estimators do not have closed-form expressions, the alternative regularized estimators thus obtained provide approximate closed-form expressions for them.
Abstract: In any sports competition, there is a strong interest in knowing which team will be the champion at the end of the championship. Besides this, the end result of a match, the chance of a team qualifying for a specific tournament, the chance of being relegated, the best attack and the best defense, among others, are also of interest. In this paper we present a simple method with good predictive quality, easy implementation and low computational effort, which allows the calculation of all the quantities of interest above. Following Lee (1997), we estimate the average goals scored by each team by assuming that the number of goals scored by a team in a match follows a univariate Poisson distribution, but we consider linear models that express the sum and the difference of goals scored in terms of five covariates: the goal average in a match, the home-team advantage, the team’s offensive power, the opponent team’s defensive power and a crisis indicator. The methodology is applied to the 2008-2009 English Premier League.
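To make the Poisson assumption concrete, the following minimal sketch turns two expected goal counts into win, draw and loss probabilities. It assumes the two teams' goal counts are independent and takes the Poisson means as given; the function names, the truncation at `max_goals` and the example means are hypothetical, and the paper's linear models for the sum and difference of goals are not reproduced here.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return exp(-mean) * mean ** k / factorial(k)


def match_outcome_probs(home_mean, away_mean, max_goals=15):
    """Return (P(home win), P(draw), P(away win)).

    Assumes independent Poisson goal counts; scores above max_goals
    are ignored, which is negligible for realistic means.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        p_home = poisson_pmf(h, home_mean)
        for a in range(max_goals + 1):
            p = p_home * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


if __name__ == "__main__":
    # Hypothetical expected goals for the home and away team.
    print(match_outcome_probs(1.6, 1.1))
```

Quantities such as championship or relegation chances could then be approximated by simulating the remaining fixtures many times from these per-match probabilities.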
Abstract: Information regarding small area prevalence of chronic disease is important for public health strategy and resourcing equity. This paper develops a prevalence model taking account of survey and census data to derive small area prevalence estimates for diabetes. The application involves 32000 small area subdivisions (zip code census tracts) of the US, with the prevalence estimates taking account of information from the US-wide Behavioral Risk Factor Surveillance System (BRFSS) survey on population prevalence differentials by age, gender, ethnic group and education. The effects of such aspects of population composition on prevalence are widely recognized. However, the model also incorporates spatial or contextual influences via spatially structured effects for each US state; such contextual effects are allowed to differ between ethnic groups and other demographic categories using a multivariate spatial prior. A Bayesian estimation approach is used and analysis demonstrates the considerably improved fit of a fully specified compositional-contextual model as compared to simpler ‘standard’ approaches which are typically limited to age and area effects.