Influential observations pose a major threat to the performance of a regression model. Several influence statistics, including Cook's Distance and DFFITS, have been introduced in the literature based on Ordinary Least Squares (OLS). The efficiency of these measures is affected by the presence of multicollinearity in linear regression, yet both problems can jointly exist in a regression model. New diagnostic measures based on the Two-Parameter Liu-Ridge Estimator (TPE) defined by Ozkale and Kaciranlar (2007) are proposed as alternatives to the existing ones, and approximate deletion formulas for the detection of influential cases under the TPE are derived. Finally, the diagnostic measures are illustrated with two real-life datasets.
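As background for the OLS-based diagnostics the abstract mentions, the following sketch computes Cook's Distance and DFFITS for simple linear regression using the standard closed-form leverages. The data values are hypothetical, chosen so that the last observation is influential; the paper's TPE-based versions are not reproduced here.

```python
import math

def influence_measures(x, y):
    """OLS influence diagnostics (Cook's Distance, DFFITS) for simple
    linear regression. Returns (cooks, dffits), one entry per case."""
    n, p = len(x), 2                      # p = number of coefficients
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    s2 = sse / (n - p)                    # residual variance estimate
    cooks, dffits = [], []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx             # leverage
        cooks.append(e ** 2 * h / (p * s2 * (1 - h) ** 2))
        s2_del = (sse - e ** 2 / (1 - h)) / (n - p - 1)  # leave-one-out variance
        t = e / math.sqrt(s2_del * (1 - h))              # studentized residual
        dffits.append(t * math.sqrt(h / (1 - h)))
    return cooks, dffits

# The last point combines high leverage (x = 10) with a vertical outlier.
x = [1, 2, 3, 4, 10]
y = [1, 2, 2, 4, 0]
cooks, dffits = influence_measures(x, y)
```

The usual flags are Cook's D above 1 and |DFFITS| above 2*sqrt(p/n); here only the last observation exceeds both cutoffs.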
Compositional data consist of composition vectors whose components are positive, lie in the interval (0,1), and represent proportions or fractions of a "whole"; the components must sum to one. Compositional data arise in many fields, including geology, economics, and medicine. In this paper, we propose a new statistical tool for volleyball data: we introduce a Bayesian analysis for compositional regression applying the additive log-ratio (ALR) transformation and assuming uncorrelated and correlated errors. The Bayesian inference procedure is based on Markov Chain Monte Carlo (MCMC) methods. The methodology is applied to an artificial data set and a real volleyball data set.
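The ALR transformation mentioned above maps a D-part composition to unconstrained coordinates in R^(D-1), where standard regression can be applied; a minimal sketch with an illustrative (hypothetical) composition:

```python
import math

def alr(x):
    """Additive log-ratio transform of a D-part composition
    (positive components summing to 1), using the last
    component as the reference part."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]

def alr_inv(z):
    """Map a vector in R^(D-1) back to the simplex."""
    w = [math.exp(zi) for zi in z] + [1.0]
    total = sum(w)
    return [wi / total for wi in w]

# Illustrative composition: the proportions of a team's points
# scored by attack, block, and serve (hypothetical values).
comp = [0.6, 0.3, 0.1]
z = alr(comp)        # unconstrained coordinates for regression
back = alr_inv(z)    # round-trips to the original composition
```

The transform is invertible, so fitted values on the ALR scale can always be mapped back to proportions that sum to one.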
In this paper, we introduce a new generalized family of distributions derived from the Topp-Leone distribution with bounded support (0,1), namely, the Topp-Leone-G family. Some mathematical properties of the proposed family are studied. The new density function can be symmetric, left-skewed, right-skewed, or reverse-J shaped, and the hazard rate function can be constant, increasing, decreasing, J-shaped, or bathtub-shaped. Three special models are discussed. We obtain simple expressions for the ordinary and incomplete moments, quantile and generating functions, mean deviations, and entropies. The method of maximum likelihood is used to estimate the model parameters. The flexibility of the new family is illustrated by means of three real data sets.
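For reference, the Topp-Leone-G construction commonly used in this literature composes a baseline CDF $G(x)$ with the Topp-Leone CDF on $(0,1)$; a sketch of the standard form (the paper's exact parameterization may differ):

```latex
F(x) = \left[1 - \bigl(1 - G(x)\bigr)^{2}\right]^{\alpha}, \qquad \alpha > 0,
```

with density $f(x) = 2\alpha\, g(x)\,\bigl(1 - G(x)\bigr)\left[1 - \bigl(1 - G(x)\bigr)^{2}\right]^{\alpha - 1}$, where $g$ is the baseline density. The shape flexibility described above comes from the interplay of $\alpha$ and the baseline $G$.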
In this article, we introduce a new class of five-parameter models called the Exponentiated Weibull Lomax, arising from the Exponentiated Weibull generated family. The new class contains some existing distributions as well as some new models. Explicit expressions for its moments, distribution and density functions, and moments of the residual life function are derived. Furthermore, Rényi and q-entropies, probability weighted moments, and order statistics are obtained. Three estimation procedures, namely maximum likelihood, least squares, and weighted least squares, are used to obtain point estimators of the model parameters. A simulation study is performed to compare the performance of the different estimates in terms of their relative biases and standard errors. In addition, applications to two real data sets demonstrate the usefulness of the new model in comparison with other recent models.
In this paper, we consider the problem of determining which treatments are statistically significant when compared with a zero-dose or placebo control in a dose-response study. Nonparametric methods developed for the commonly used multiple comparison problem, whenever the Jonckheere trend test (JT) is appropriate, are extended to the multiple-comparisons-to-control problem. We present four closed testing methods, two of which use an AUC regression model approach, for determining the treatment arms that are statistically different from the zero-dose control. A simulation study is performed to compare the proposed methods with two existing rank-based nonparametric multiple comparison procedures. The methods are further illustrated using a problem from a clinical setting.
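The JT statistic underlying these procedures is the sum of pairwise Mann-Whitney counts over all ordered pairs of dose groups, with ties counted as one half; a minimal sketch on hypothetical dose-arm data (the paper's closed testing procedures are not reproduced here):

```python
def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra trend statistic for groups listed in
    increasing dose order: sum over group pairs i < j of the
    number of pairs (u, v), u from group i, v from group j,
    with u < v, counting ties as 1/2."""
    jt = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for u in groups[i]:
                for v in groups[j]:
                    if u < v:
                        jt += 1.0
                    elif u == v:
                        jt += 0.5
    return jt

# Hypothetical responses: placebo and two increasing dose arms.
doses = [[1, 2, 3], [2, 4, 5], [6, 7, 8]]
jt = jonckheere_terpstra(doses)
```

Large values of the statistic (relative to its null distribution) indicate an increasing trend across the ordered dose levels.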
In this work, we study the odd Lindley Burr XII model initially introduced by Silva et al. [29]. This model has the advantage of being capable of modeling various shapes of aging and failure criteria. Some of its statistical structural properties, including ordinary and incomplete moments, quantile and generating functions, and order statistics, are derived. The odd Lindley Burr XII density can be expressed as a simple linear mixture of Burr XII densities. Useful characterizations are presented. The maximum likelihood method is used to estimate the model parameters, and simulation results assessing the performance of the maximum likelihood estimators are discussed. We demonstrate empirically the importance and flexibility of the new model in modeling various types of data. Bayesian estimation is performed by obtaining the marginal posterior distributions and by Markov Chain Monte Carlo (MCMC) simulation, using the Metropolis-Hastings algorithm within each step of the Gibbs sampler. The trace plots and estimated conditional posterior distributions are also presented.
Getting a machine to understand the meaning of language is an important goal for a wide variety of fields, from advertising to entertainment. In this work, we focus on YouTube comments from the top two hundred trending videos as a source of user text data. Previous sentiment analysis models rely on hand-labelled data or predetermined lexicons. Our goal is to train a model to label comment sentiment with emoticons by training on other user-generated comments containing emoticons. Naive Bayes and Recurrent Neural Network models are both investigated and implemented in this study, and the validation accuracies for the Naive Bayes and Recurrent Neural Network models are found to be 0.548 and 0.812, respectively.
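The Naive Bayes approach described above can be sketched as a multinomial bag-of-words classifier whose labels are the emoticons users included; this toy version (hypothetical training comments, Laplace smoothing) stands in for the paper's full pipeline:

```python
import math
from collections import Counter

def train_nb(docs):
    """Train multinomial Naive Bayes on (text, label) pairs,
    where the label is the emoticon found in the comment."""
    class_counts, word_counts, vocab = Counter(), {}, set()
    for text, label in docs:
        class_counts[label] += 1
        wc = word_counts.setdefault(label, Counter())
        for w in text.split():
            wc[w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the emoticon label with the highest log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)   # log prior
        n_words = sum(word_counts[label].values())
        for w in text.split():
            # Laplace-smoothed word likelihood
            lp += math.log((word_counts[label][w] + 1) /
                           (n_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical emoticon-labelled comments (stand-ins for real data).
train = [("i love this video", ":)"), ("great content", ":)"),
         ("this is terrible", ":("), ("worst video ever", ":(")]
model = train_nb(train)
```

Because the emoticons themselves supply the labels, no hand annotation is needed, which is the key point of the paper's setup.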
In this paper, we introduce some new families of generalized Pareto distributions using the T-R{Y} framework. These families of distributions are named T-Pareto{Y} families, and they arise from the quantile functions of the exponential, log-logistic, logistic, extreme value, Cauchy, and Weibull distributions. The shapes of these T-Pareto families can be unimodal or bimodal, left-skewed, or right-skewed with a heavy tail. Some general properties of the T-Pareto{Y} family are investigated, including the moments, modes, mean deviations from the mean and from the median, and Shannon entropy. Several new generalized Pareto distributions are also discussed. Four real data sets from engineering, biomedicine, and social science are analyzed to demonstrate the flexibility and usefulness of the T-Pareto{Y} families of distributions.
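For reference, the T-R{Y} construction typically composes the CDF of a random variable $T$, the quantile function $Q_Y$ of $Y$, and the CDF of the base distribution $R$ (here Pareto); a sketch of the usual form, with a common Pareto parameterization assumed:

```latex
F_X(x) = F_T\!\bigl(Q_Y\bigl(F_R(x)\bigr)\bigr),
\qquad
F_R(x) = 1 - \left(\frac{\theta}{x}\right)^{k}, \quad x \ge \theta > 0,\; k > 0.
```

Each choice of $Y$ in the list above supplies a different quantile function $Q_Y$, which is what generates the six T-Pareto{Y} families.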
In this paper, we introduce a new four-parameter distribution called the transmuted Weibull power function (TWPF) distribution, which extends the transmuted family proposed by Shaw and Buckley [1]. The hazard rate function of the TWPF distribution can be constant, increasing, decreasing, unimodal, upside-down bathtub shaped, or bathtub shaped. Some mathematical properties are derived, including quantile functions, an expansion of the density function, moments, the moment generating function, the residual life function, the reversed residual life function, mean deviations, and inequality measures. The estimation of the model parameters is carried out using the maximum likelihood method. The importance and flexibility of the proposed model are demonstrated empirically using real data sets.
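The transmuted family referred to here is the quadratic rank transmutation map of Shaw and Buckley, which turns a baseline CDF $G(x)$ into

```latex
F(x) = (1 + \lambda)\,G(x) - \lambda\,\bigl[G(x)\bigr]^{2},
\qquad |\lambda| \le 1,
```

recovering the baseline at $\lambda = 0$. In the TWPF case the baseline $G$ is a Weibull power function distribution, and $\lambda$ supplies the fourth parameter.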
In the DEA framework, there are many techniques for finding a common set of efficient weights depending on the input and output values of a set of peer Decision-Making Units (DMUs). Many papers have discussed multiple-criteria decision-making techniques and multiple-objective decision criteria for modeling. The objective function for a common set of weights is defined like the individual efficiency of a single DMU, with one basic difference: it tries to maximize the efficiency of all DMUs simultaneously, with unchanged restrictions. An ideal solution for a common set of weights is the set closest to the individual solutions derived for each DMU. One question then arises: are the closest set and the minimized set, which is found by most of the techniques, different? They differ when the variance among the weights generated for a specific input (output) by the n DMUs is large. In this case, we apply Singular Value Decomposition (SVD): first, the degree of importance of the weights for each input (output) is defined and found; then, the Common Set of Weights (CSW) is found as the set closest to these weights. The degree-of-importance values directly affect the CSW of each DMU.
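One way to read the SVD step above is that the leading right singular vector of the n x m matrix of DMU-generated weights summarizes how consistently each input (output) is weighted; the sketch below computes it by power iteration on W^T W with made-up weight values, and is only an illustration of that step, not the paper's full CSW procedure:

```python
import math

def importance_weights(W, iters=200):
    """Degree-of-importance scores for the columns of an n x m
    matrix W of DMU-generated weights: the leading right singular
    vector of W, found by power iteration on A = W^T W and
    normalized to sum to one."""
    m = len(W[0])
    # A = W^T W is m x m, symmetric, and nonnegative here, so the
    # leading eigenvector has nonnegative entries (Perron-Frobenius).
    A = [[sum(row[i] * row[j] for row in W) for j in range(m)]
         for i in range(m)]
    v = [1.0] * m
    for _ in range(iters):
        v = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
    total = sum(v)
    return [vi / total for vi in v]

# Hypothetical weights three DMUs assigned to two inputs; input 1
# receives consistently larger weights, so it dominates.
W = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
imp = importance_weights(W)
```

The scores sum to one and can then be used to pick the common set of weights closest to the individually derived ones, as the abstract describes.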