Abstract: Relative entropy identities yield basic decompositions of categorical data log-likelihoods. These decompositions naturally lead to the development of information models, in contrast to hierarchical log-linear models. A recent study by the authors clarified the principal difference between the two model types in the analysis of data likelihoods. The proposed log-likelihood decomposition introduces a prototype of linear information models, with which a basic model-selection scheme can be formulated. Empirical studies with high-way contingency tables are presented to illustrate the natural selection of information models in contrast to hierarchical log-linear models.
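The simplest instance of such an identity is the two-way table: the maximized log-likelihood splits into marginal entropy terms plus a mutual-information term, and the likelihood-ratio statistic for independence equals 2N times the empirical mutual information. The sketch below illustrates only this textbook two-way case, not the authors' high-way scheme; the function names `entropy` and `decompose_two_way` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def decompose_two_way(table):
    """Split the log-likelihood of a two-way count table into entropy terms.

    Saturated model:    sum n_ij log(n_ij / N)            = -N H(X,Y)
    Independence model: sum n_ij log(n_i+ n_+j / N^2)     = -N [H(X) + H(Y)]
    Their difference gives G2 = 2 N I(X;Y), with I(X;Y) = H(X) + H(Y) - H(X,Y).
    """
    n = sum(sum(row) for row in table)
    joint = [c / n for row in table for c in row]
    row_marg = [sum(row) / n for row in table]
    col_marg = [sum(col) / n for col in zip(*table)]
    h_x, h_y, h_xy = entropy(row_marg), entropy(col_marg), entropy(joint)
    loglik_saturated = -n * h_xy
    loglik_independence = -n * (h_x + h_y)
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "I(X;Y)": h_x + h_y - h_xy,
        "G2": 2.0 * (loglik_saturated - loglik_independence),
    }

print(decompose_two_way([[10, 20], [30, 40]]))
```

For a table whose cell proportions are exactly the products of its margins, such as `[[10, 20], [20, 40]]`, the mutual-information term is zero up to rounding.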
Abstract: We apply model-based cluster analysis to data concerning types of democracies, creating an instrument for typologies. Noting several advantages of model-based clustering over traditional clustering methods, we fit a normal mixture model for types of democracy in the context of the majoritarian-consensus contrast using Lijphart’s (1999) data on ten variables for 36 democracies. The model for the full period (1945-1996) finds four types of democracies: two types representing a majoritarian-consensus contrast, and two mixed ones lying between the extremes. The four-cluster solution shows that most of the countries have high cluster membership probabilities, and the solution is found to be quite stable with respect to possible measurement error in the variables included in the model. For the recent-period (1971-1996) data, most countries remain in the same clusters as for the full-period data.
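Cluster membership probabilities in a normal mixture are the posterior component probabilities computed in the E-step of the EM algorithm. The sketch below is a deliberately reduced, univariate version of that idea, assuming user-supplied starting means and a fixed number of iterations; the study itself fits a multivariate mixture to ten variables, and a real analysis would also choose the number of clusters (for example by BIC), which this sketch does not do. The name `fit_normal_mixture` is hypothetical, and the code does not guard against components that become empty.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def fit_normal_mixture(data, init_means, n_iter=200, min_var=1e-6):
    """EM for a univariate normal mixture with len(init_means) components.

    Returns (weights, means, variances, memberships), where memberships[i][j]
    is the posterior probability, from the last E-step, that observation i
    belongs to component j.
    """
    k, n = len(init_means), len(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [max(var_all, min_var)] * k
    memberships = []
    for _ in range(n_iter):
        # E-step: membership probabilities, computed in log space for stability.
        memberships = []
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            expd = [math.exp(l - top) for l in logs]
            total = sum(expd)
            memberships.append([e / total for e in expd])
        # M-step: responsibility-weighted proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in memberships)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(memberships, data)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(memberships, data)) / nj)
    return weights, means, variances, memberships

data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
w, m, v, r = fit_normal_mixture(data, init_means=[1.0, 9.0])
print(w, m)
```

With two well-separated groups, each observation ends up with a membership probability close to 1 for its own component, which is the situation the abstract describes as "high cluster membership probabilities".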
Abstract: In this paper we introduce bivariate Weibull distributions derived from copula functions in the presence of a cure fraction, censored data, and covariates. Two copula functions are explored: the Farlie-Gumbel-Morgenstern (FGM) copula and the Gumbel copula. Inference for the proposed models is carried out under the Bayesian approach using standard Markov chain Monte Carlo (MCMC) methods. The proposed methodology is illustrated with a medical data set.
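A common way to build such a model is to plug the two marginal survival functions into a copula, S(t1, t2) = C(S1(t1), S2(t2)). The sketch below uses the standard FGM and Gumbel copula formulas, but two further choices are assumptions not stated in the abstract: Weibull margins in a shape/scale parameterization, S(t) = exp(-(t/scale)^shape), and a mixture cure formulation S(t) = π + (1 - π) S_W(t). Covariates, censoring, and MCMC estimation are omitted, and all function names are hypothetical.

```python
import math

def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula, valid for -1 <= theta <= 1."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("FGM copula requires -1 <= theta <= 1")
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def gumbel_copula(u, v, theta):
    """Gumbel copula, valid for theta >= 1 (theta = 1 gives independence)."""
    if theta < 1.0:
        raise ValueError("Gumbel copula requires theta >= 1")
    if u == 0.0 or v == 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def weibull_survival(t, shape, scale):
    """Weibull survival function (assumed shape/scale parameterization)."""
    return math.exp(-((t / scale) ** shape))

def cure_survival(t, cure_prob, shape, scale):
    """Assumed mixture cure model: a fraction cure_prob never has the event."""
    return cure_prob + (1.0 - cure_prob) * weibull_survival(t, shape, scale)

def joint_survival(t1, t2, copula, theta, margin1, margin2):
    """S(t1, t2) = C(S1(t1), S2(t2)); margins are (cure_prob, shape, scale)."""
    s1 = cure_survival(t1, *margin1)
    s2 = cure_survival(t2, *margin2)
    return copula(s1, s2, theta)

print(joint_survival(1.0, 1.5, gumbel_copula, 1.5, (0.2, 1.3, 2.0), (0.3, 0.8, 1.0)))
print(joint_survival(1.0, 1.5, fgm_copula, 0.5, (0.2, 1.3, 2.0), (0.3, 0.8, 1.0)))
```

As both times grow, the joint survival tends to C(π1, π2), the probability that both event times are "cured"; the FGM family allows only weak dependence, while the Gumbel family captures stronger positive dependence.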
Abstract: DNA fingerprinting is a microbiological technique widely used to find a DNA sequence specific to a microbe. It involves slicing the genome of the microbe into DNA fragments of manageable size, sorting the fragments by length, and finally identifying a DNA sequence unique to the microbe using probe-based assays. This unique DNA is referred to as the DNA fingerprint of the microbe under study. In this paper, we introduce a probabilistic model to estimate the chance of identifying the DNA fingerprint from the genome of a microbe when the DNA fingerprinting method is employed. We derive a closed-form functional relationship between the chance of finding the fingerprint and factors that can be experimentally controlled fully, partly, or not at all. Because experimental design can improve the odds of finding a specific DNA fingerprint only to a certain degree, we show that, in a broader sense, the discovery of a DNA fingerprint is a process governed more by chance than by design. Nevertheless, the results can potentially be used to guide experiments toward maximizing the chance of finding a DNA fingerprint of interest.
Football, or soccer, is considered one of the most important collective sports in the world. Managers, specialists, and fans are always trying to identify the keys to building a good team. Evaluating team quality may involve many variables and subjective concepts, and for this reason it is not simple to answer the question: how should quality be defined? Another point to consider is the relative importance of the offensive and defensive aspects: which one matters more in measuring the quality of a football team? For this task, we propose a causal model with latent variables as a tool to measure the subjective notion of team quality and how it is affected by other aspects. Data from the four most important football leagues in the world (England, Germany, Italy, and Spain) were collected over three seasons (2011-2012, 2012-2013, and 2013-2014). We defined the latent variables in the model and evaluated the relationships among them. The results show that the offensive aspect exerts more influence on team quality than the defensive aspect, which bears directly on clubs' strategies in the player market.
Summary: Longitudinal binary data often arise in clinical trials when repeated measurements, positive or negative on certain tests, are made on the same subject over time. To account for the serial correlation within subjects, we propose a marginal logistic model implemented with the Generalized Estimating Equation (GEE) approach, using working correlation matrices of several widely used forms. The aim of this paper is to identify robust working correlation matrices that consistently give a good fit to the data. Model fit is assessed using the modified expected utility of Walker & Gutiérrez Peña (1999). To evaluate how the length of the time series and the strength of serial correlation affect the robustness of the various working correlation matrices, the models are applied to three data sets containing, respectively, only short time series, only long time series, and time series of varying length. We identify factors that affect the choice of robust working correlation matrices and give suggestions for different situations.
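The "widely used forms" of working correlation in GEE typically include independence, exchangeable, and first-order autoregressive (AR(1)) structures. The sketch below builds these matrices and gives a simplified moment estimate of the exchangeable parameter from binary Pearson residuals, r = (y - μ)/sqrt(μ(1 - μ)). It assumes a dispersion parameter of 1 and omits the degrees-of-freedom corrections that GEE software often applies; the function names are hypothetical, and the abstract does not say which exact set of structures the paper compares.

```python
import math

def working_correlation(structure, n_times, alpha=0.0):
    """Return an n_times x n_times working correlation matrix as nested lists."""
    if structure == "independence":
        return [[1.0 if i == j else 0.0 for j in range(n_times)] for i in range(n_times)]
    if structure == "exchangeable":
        # Same correlation alpha between any two occasions.
        return [[1.0 if i == j else alpha for j in range(n_times)] for i in range(n_times)]
    if structure == "ar1":
        # Correlation decays geometrically with the time lag |i - j|.
        return [[alpha ** abs(i - j) for j in range(n_times)] for i in range(n_times)]
    raise ValueError("unknown structure: " + structure)

def pearson_residual(y, mu):
    """Pearson residual for a binary outcome y with fitted probability mu."""
    return (y - mu) / math.sqrt(mu * (1.0 - mu))

def estimate_exchangeable_alpha(resids_by_subject):
    """Average of r_it * r_is over all within-subject pairs t < s.

    Subjects may have series of different lengths; dispersion is assumed to be 1.
    """
    total, pairs = 0.0, 0
    for r in resids_by_subject:
        for t in range(len(r)):
            for s in range(t + 1, len(r)):
                total += r[t] * r[s]
                pairs += 1
    return total / pairs if pairs else 0.0

print(working_correlation("ar1", 4, 0.6))
```

Because the estimator pools pairs across subjects, subjects with long series contribute many more pairs than subjects with short ones, which is one reason series length can influence which working structure performs well.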
Abstract: This investigation used a robust logistic regression method (BYLOGREG) to study CEO bonuses prior to the 2007-2009 financial crisis. The robust logistic regression analysis determined that the year and CEO tenure affected the probability that a CEO received a bonus in the 2004-2006 study period. The analysis refuted the view that “management entrenchment” widely influenced CEO bonus compensation, because the probability of receiving a bonus was negatively related to CEO tenure. The probability of receiving a bonus declined over the 2004-2006 study period, with the percentage of CEOs receiving a bonus lowest in 2006. The robust logistic regression analysis found that the current-year stock return was positively and statistically significantly related to the probability that a CEO received a bonus. The analysis also showed that managerial (financial) performance in sales growth, ROE, and earnings-per-share growth increased the probability that a CEO received a bonus. In this investigation, firm size and the growth rate of equity were not statistically significant. Overall, robust logistic regression correctly classified 77% of the observations on the basis of the model variables, which indicated that most CEO bonuses could be explained by firm, CEO, and financial variables. The BY robust logistic regression proved to be robust to outliers in the CEO bonus sample studied. Interestingly, the relationship between stock return and the probability of a bonus was completely missed by a maximum likelihood (ML) logistic regression fitted to the full CEO bonus sample, which contained outliers. After the CEO bonus data set was trimmed to remove outliers, the ML logistic regression coefficients changed dramatically, whereas the BY robust logistic regression coefficients changed very little.
Use of the residuals from the BY robust logistic equation should facilitate further inquiry into CEOs that received a bonus but were predicted to have a low probability of a bonus.
Abstract: Copulas have recently emerged as practical tools for multivariate modeling. To our knowledge, only a limited amount of work has applied copula-based modeling in this context. In this study, we generalize the Clayton copula using an appropriate weight function. Several examples of bivariate distributions generalized by means of the weighted Clayton copula are given, and the properties of the generalized Clayton copula are provided. Neither the Clayton copula nor the weighted Clayton model can be used for negative dependence; both are used to study lower (left) tail dependence, which is stronger in the weighted Clayton model than in the ordinary Clayton copula. We also show that the generalized Clayton copula is suitable for probabilistic modeling of hydrological data.
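For reference, the ordinary Clayton copula with θ > 0 is C(u, v) = (u^(-θ) + v^(-θ) - 1)^(-1/θ); it has lower tail dependence λ_L = 2^(-1/θ), no upper tail dependence, and Kendall's τ = θ/(θ + 2). The sketch below implements only the ordinary Clayton copula and checks λ_L numerically as the limit of C(u, u)/u for small u; the weighted generalization studied in the paper is not reproduced, and the function names are hypothetical.

```python
def clayton_copula(u, v, theta):
    """Ordinary Clayton copula for theta > 0 (positive dependence only)."""
    if theta <= 0:
        raise ValueError("this sketch requires theta > 0")
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_lower_tail_dependence(theta):
    """Closed-form lower tail dependence coefficient, lambda_L = 2^(-1/theta)."""
    return 2.0 ** (-1.0 / theta)

def clayton_kendall_tau(theta):
    """Kendall's tau for the Clayton copula."""
    return theta / (theta + 2.0)

theta = 2.0
u = 1e-8
# C(u, u) / u approaches lambda_L as u -> 0.
print(clayton_copula(u, u, theta) / u, clayton_lower_tail_dependence(theta))
print(clayton_kendall_tau(theta))
```

The ratio C(u, u)/u for a very small u agrees closely with 2^(-1/θ), which is the "left tail dependence" property the abstract refers to; larger θ gives stronger clustering of joint low values, such as simultaneous low flows in hydrological series.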