The normal distribution is the most widely used model in applications to real data. We propose a new extension of this distribution, called the Kummer beta normal distribution, which offers greater flexibility for modeling skewed data. The new probability density function (pdf) can be represented as a linear combination of exponentiated normal pdfs. We also derive analytical expressions for some mathematical quantities: ordinary and incomplete moments, mean deviations and order statistics. Parameters are estimated by the method of maximum likelihood and by Bayesian analysis. Likelihood ratio statistics and formal goodness-of-fit tests are used to compare the proposed distribution with some of its sub-models and with non-nested models. A real data set is used to illustrate the importance of the proposed model.
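To make the phrase "linear combination of exponentiated normal pdfs" concrete, here is a minimal Python sketch. It assumes the usual definition of an exponentiated normal density with power a > 0, namely the derivative of Phi(z)^a. The function names, and the weights and powers passed to `mixture_pdf`, are hypothetical placeholders for illustration; they are not the coefficients derived for the Kummer beta normal distribution.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cdf, written with erfc so the lower tail stays accurate."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def exp_normal_pdf(x, a, mu=0.0, sigma=1.0):
    """Exponentiated normal density with power a > 0 (assumed definition):
    the derivative of normal_cdf(x)**a with respect to x."""
    return a * normal_pdf(x, mu, sigma) * normal_cdf(x, mu, sigma) ** (a - 1.0)


def mixture_pdf(x, weights, powers, mu=0.0, sigma=1.0):
    """Linear combination sum_k w_k * exp_normal_pdf(x, a_k).
    The weights and powers are illustrative inputs only."""
    return sum(w * exp_normal_pdf(x, a, mu, sigma)
               for w, a in zip(weights, powers))
```

With a = 1 the exponentiated normal reduces to the ordinary normal density, which gives a quick sanity check. Any combination whose weights sum to one integrates to one, even if some weights are negative, although negative weights do not by themselves guarantee a non-negative function.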
New distributions are always valuable to statisticians. A new three-parameter distribution, called the gamma normal distribution, is defined and studied. Various structural properties of the new distribution are derived, including explicit expressions for the moments, quantile and generating functions, mean deviations, probability weighted moments and two types of entropy. We also investigate the order statistics and their moments. Maximum likelihood techniques are used to fit the new model and to show its potential by means of two examples of real data. Based on three criteria, the proposed distribution provides a better fit than the skew-normal distribution.
Technological advances in software development have taken care of many technical details, making life easier for data analysts, but they have also allowed nonexperts in statistics and computer science to analyze data. As a result, medical research suffers from preventable statistical errors, such as errors in choosing a hypothesis test and in checking model assumptions. Our objective is to create an automated data analysis software package that can help practitioners run non-subjective, fast, accurate and easily interpretable analyses. We used machine learning to predict the normality of a distribution as an alternative to normality tests and graphical methods, in order to avoid their downsides. We implemented methods for detecting outliers, imputing missing values, and choosing a threshold for cutting numerical variables to correct for non-linearity before running a linear regression. We showed that data analysis can be automated. Our normality prediction algorithm outperformed the Shapiro-Wilk test in small samples, with a Matthews correlation coefficient of 0.5 vs. 0.16. The biggest drawback was that we did not find alternatives to the statistical tests of linear regression assumptions, which are problematic in large datasets. We also applied our work to a dataset about smoking in teenagers. Because of the open-source nature of our work, these algorithms can be used in future research and projects.
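For readers unfamiliar with the Matthews correlation coefficient used in the comparison above, the following Python sketch computes it from the four cells of a binary confusion matrix, using the standard formula. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives. The result lies in [-1, 1].
    When any marginal total is zero the coefficient is undefined;
    returning 0.0 in that case is a common convention, assumed here.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: a classifier that labels samples "normal" vs "non-normal".
score = matthews_corrcoef(tp=40, tn=35, fp=15, fn=10)
```

A value of 1 means perfect agreement with the true labels, 0 means no better than chance, and -1 means total disagreement, which is why the coefficient is often preferred to accuracy when the two classes are unbalanced.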