Abstract: Student retention is an important issue for all university policy makers due to the potential negative impact on the image of the university and the career path of the dropouts. Although this issue has been thoroughly studied by many institutional researchers using parametric techniques, such as regression analysis and logit modeling, this article attempts to bring in a new perspective by exploring the issue with three data mining techniques, namely, classification trees, multivariate adaptive regression splines (MARS), and neural networks. The data mining procedures identify transferred hours, residency, and ethnicity as crucial factors in retention. Carrying transferred hours into the university implies that the students have taken college-level classes somewhere else, suggesting that they are more academically prepared for university study than those who have no transferred hours. Although residency was found to be a crucial predictor of retention, one should not go so far as to interpret this finding as evidence that retention is affected by proximity to the university location. Instead, this is a typical example of Simpson’s Paradox. The geographical information system analysis indicates that non-residents from the east coast tend to be more persistent in enrollment than their west coast schoolmates.
Abstract: Some scientists prefer to exercise substantial judgment in formulating a likelihood function for their data. Others prefer to try to get the data to tell them which likelihood is most appropriate. We suggest here that one way to reduce the judgment component of the likelihood function is to adopt a mixture of potential likelihoods and let the data determine the weights on each likelihood. We distinguish several different types of subjectivity in the likelihood function and show with examples how these subjective elements may be given more equitable treatment.
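One way to read "let the data determine the weights" is as posterior weighting over a small set of candidate likelihoods. The sketch below is only an illustration of that reading, not the authors' procedure: the two candidates (a standard normal and a standard Laplace), their fixed parameters, the equal prior weights, and the function names are all assumptions.

```python
import math

def normal_logpdf(x, mu=0.0, sigma=1.0):
    # Log-density of a normal distribution (parameters fixed by assumption).
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def laplace_logpdf(x, mu=0.0, b=1.0):
    # Log-density of a Laplace distribution (heavier tails than the normal).
    return -math.log(2 * b) - abs(x - mu) / b

def likelihood_weights(data, logpdfs, prior=None):
    """Posterior weight of each candidate likelihood given the data.

    Each weight is proportional to prior weight times the likelihood of
    the whole sample under that candidate (hypothetical helper).
    """
    k = len(logpdfs)
    prior = prior or [1.0 / k] * k
    log_post = [
        math.log(p) + sum(f(x) for x in data)
        for p, f in zip(prior, logpdfs)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Usage: a sample with one large outlier should favour the Laplace candidate.
weights = likelihood_weights([0.1, -0.2, 0.3, 8.0], [normal_logpdf, laplace_logpdf])
```

In a fuller treatment the parameters of each candidate would be estimated or integrated out rather than fixed; they are held constant here only to keep the weighting step visible.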
Abstract: The scheme of doubly type-II censored sampling is an important method of obtaining data in lifetime studies. Statistical analysis of lifetime distributions under this censoring scheme is based on precise lifetime data. However, some collected lifetime data might be imprecise and are represented in the form of fuzzy numbers. This paper deals with the problem of estimating the scale parameter of the Rayleigh distribution under a doubly type-II censoring scheme when the lifetime observations are fuzzy and are assumed to be related to the underlying crisp realization of a random sample. We propose a new method to determine the maximum likelihood estimate of the parameter of interest. The asymptotic variance of the ML estimate is then derived by using the missing information principle. The performance of the proposed estimate is then assessed through Monte Carlo simulations. Finally, an illustrative example with real data concerning 25 ball bearings in a life test is presented.
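For orientation, the crisp-data version of this problem has a standard likelihood, sketched below. The notation is an assumption and may differ from the paper's: n units are on test, the r smallest and s largest lifetimes are censored, and the Rayleigh distribution is parameterized by the scale σ. The fuzzy-data likelihood that the paper develops is not reproduced here.

```latex
% Rayleigh density and distribution function (assumed parameterization)
f(x;\sigma) = \frac{x}{\sigma^{2}} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right),
\qquad
F(x;\sigma) = 1 - \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right),
\qquad x > 0.

% Likelihood from the observed order statistics x_{(r+1)} \le \dots \le x_{(n-s)}
L(\sigma) = \frac{n!}{r!\,s!}
\left[F\!\left(x_{(r+1)};\sigma\right)\right]^{r}
\left[1 - F\!\left(x_{(n-s)};\sigma\right)\right]^{s}
\prod_{i=r+1}^{n-s} f\!\left(x_{(i)};\sigma\right).
```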
Abstract: As an extension to previous research efforts, the product partition model (PPM) is applied to the identification of multiple change points in the parameter that indexes the regular exponential family. We define the PPM for Yao’s prior cohesions and contiguous blocks. Because the exponential family provides a rich set of models, we also present the PPM for some particular members of this family in both continuous and discrete cases. The PPM is then applied to identify multiple change points in real data. First, multiple changes are identified in the crime rates of one of the biggest cities in Brazil. Then, to illustrate the continuous case, multiple changes are identified in the volatility (variance) and in the expected return (mean) of the return series of some Latin American emerging markets.
We propose a lifetime distribution with a flexible hazard rate, called the cubic rank transmuted modified Burr III (CRTMBIII) distribution. We develop the proposed distribution on the basis of the cubic ranking transmutation map. The density function of the CRTMBIII distribution can be symmetrical, right-skewed, left-skewed, exponential, arc-shaped, J-shaped, or bimodal. The flexible hazard rate of the proposed model can accommodate almost all types of shapes, such as unimodal, bimodal, arc, increasing, decreasing, decreasing-increasing-decreasing, inverted bathtub, and modified bathtub. To show the importance of the proposed model, we present mathematical properties such as moments, incomplete moments, inequality measures, the residual life function, and the stress-strength reliability measure. We characterize the CRTMBIII distribution via techniques. We address the maximum likelihood method for the model parameters. We evaluate the performance of the maximum likelihood estimates (MLEs) via a simulation study. We establish empirically that the proposed model is suitable for the strengths of glass fibers. We apply goodness-of-fit statistics and graphical tools to examine the potentiality and utility of the CRTMBIII distribution.
This paper presents a new generalization of the extended Gompertz distribution. We define the so-called exponentiated generalized extended Gompertz (EGEG) distribution, which has at least three important advantages: (i) it includes the exponential, Gompertz, extended exponential, and extended Gompertz distributions as special cases; (ii) it adds two parameters to the base distribution, but does not use any complicated functions to that end; and (iii) its hazard function includes inverted bathtub and bathtub shapes, which are particularly important because of their broad applicability in real-life situations. The work derives several mathematical properties for the new model and discusses a maximum likelihood estimation method. For the main formulas related to our model, we present numerical studies that demonstrate the practicality of computational implementation using statistical software. We also present a Monte Carlo simulation study to evaluate the performance of the maximum likelihood estimators for the EGEG model. Three real-world data sets are used as applications to illustrate the usefulness of our proposal.
This paper considers the problem of determining which treatments differ significantly from a zero-dose or placebo control in a dose-response study. Nonparametric methods developed for the commonly used multiple comparison problem whenever the Jonckheere trend test (JT) is appropriate are extended to the multiple-comparisons-to-control problem. We present four closed testing methods, of which two use an AUC regression model approach for determining the treatment arms that are statistically different from the zero-dose control. A simulation study is performed to compare the proposed methods with two existing rank-based nonparametric multiple comparison procedures. The methods are further illustrated using a problem from a clinical setting.
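As background for the trend test named above, the Jonckheere statistic can be computed directly from its definition as a sum of pairwise Mann-Whitney counts over dose groups taken in increasing order. This is a minimal sketch of the statistic only; the closed testing procedures and the AUC regression approach of the paper are not shown, and the function name and input layout are assumptions.

```python
def jonckheere_statistic(groups):
    """Jonckheere trend statistic for samples listed in increasing dose order.

    For every pair of groups (a, b) with a before b, count the pairs (x, y),
    x from group a and y from group b, with x < y; ties count one half.
    """
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j += 1.0
                    elif x == y:
                        j += 0.5
    return j

# Usage: the control group comes first, followed by increasing doses.
stat = jonckheere_statistic([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Large values of the statistic point to responses that increase with dose; a p-value would still require the null distribution, by permutation or a normal approximation, which is omitted here.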
Abstract: Relative entropy identities yield basic decompositions of categorical data log-likelihoods. These naturally lead to the development of information models, in contrast to the hierarchical log-linear models. A recent study by the authors clarified the principal difference in the data likelihood analysis between the two model types. The proposed scheme of log-likelihood decomposition introduces a prototype of linear information models, with which a basic scheme of model selection can be formulated. Empirical studies with high-way contingency tables illustrate the natural selection of information models in contrast to hierarchical log-linear models.
Abstract: We apply model-based cluster analysis to data concerning types of democracies, creating an instrument for typologies. Noting several advantages of model-based clustering over traditional clustering methods, we fit a normal mixture model for types of democracy in the context of the majoritarian-consensus contrast using Lijphart’s (1999) data on ten variables for 36 democracies. The model for the full period (1945-1996) finds four types of democracies: two types representing a majoritarian-consensus contrast, and two mixed ones lying between the extremes. The four-cluster solution shows that most of the countries have high cluster membership probabilities, and the solution is found to be quite stable with respect to possible measurement error in the variables included in the model. For the recent-period (1971-1996) data, most countries remain in the same clusters as for the full-period data.