A new flexible extension of the inverse Rayleigh model is proposed and studied. Some of its fundamental statistical properties are derived. We assess the performance of the maximum likelihood method via a simulation study. The importance of the new model is shown via three applications to real data sets. In these applications, the new model fits better than other important competing models.
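As a rough illustration of the kind of simulation study mentioned above, the following sketch works with the baseline one-parameter inverse Rayleigh model only, not with the proposed extension, whose form is not stated here. It assumes the parametrization F(x) = exp(-theta / x^2) for x > 0; all function names are hypothetical.

```python
import math
import random


def inverse_rayleigh_quantile(u, theta):
    """Quantile of the assumed CDF F(x) = exp(-theta / x**2), for 0 < u < 1."""
    return math.sqrt(-theta / math.log(u))


def inverse_rayleigh_mle(sample):
    """Closed-form MLE of theta under the assumed parametrization: n / sum(x**-2)."""
    return len(sample) / sum(x ** -2 for x in sample)


def simulate_mle(theta, n, replications, seed=0):
    """Average the MLE over repeated samples of size n drawn by inverse-transform sampling."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = []
        while len(sample) < n:
            u = rng.random()
            if 0.0 < u < 1.0:  # skip u == 0, where log(u) is undefined
                sample.append(inverse_rayleigh_quantile(u, theta))
        estimates.append(inverse_rayleigh_mle(sample))
    return sum(estimates) / replications
```

Under this parametrization the estimator has expectation n * theta / (n - 1), so the simulated average should sit slightly above the true value for small n and approach it as n grows.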
Technological advances in software development have taken care of many technical details, which has made life easier for data analysts but has also allowed nonexperts in statistics and computer science to analyze data. As a result, medical research suffers from statistical errors that could otherwise be prevented, such as errors in choosing a hypothesis test and in checking model assumptions. Our objective is to create an automated data analysis software package that can help practitioners run non-subjective, fast, accurate and easily interpretable analyses. We used machine learning to predict the normality of a distribution as an alternative to normality tests and graphical methods, in order to avoid their downsides. We implemented methods for detecting outliers, imputing missing values, and choosing a threshold for cutting numerical variables to correct for non-linearity before running a linear regression. We showed that data analysis can be automated. Our normality prediction algorithm outperformed the Shapiro-Wilk test in small samples, with a Matthews correlation coefficient of 0.5 vs. 0.16. The biggest drawback was that we did not find alternatives to the statistical tests of linear regression assumptions, which are problematic in large datasets. We also applied our work to a dataset about smoking in teenagers. Because our work is open source, these algorithms can be used in future research and projects.
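For readers unfamiliar with the Matthews correlation coefficient used in the comparison above, here is a minimal sketch of how it is computed from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a normality classifier evaluated on 100 samples.
score = matthews_corrcoef(tp=40, tn=35, fp=15, fn=10)
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and unlike accuracy it stays informative when the two classes are unbalanced.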
Abstract: Estimating the transmissibility of the influenza virus during its growing phase is important for understanding how the virus propagates. Estimation procedures for transmissibility are usually based on data generated in flu seasons. The data-generating process of an influenza outbreak has many features: the data are generated not only by a biological process but also by control measures such as flu vaccination. We discuss the estimation by considering these aspects of the data-generating process and by using a model that captures the essential characteristics of flu transmission during the growing phase of a flu season.
Abstract: In this short note we establish some new explicit expressions for ratio and inverse moments of lower generalized order statistics for the Marshall-Olkin extended Burr type XII distribution. These explicit expressions can be used to develop relationships for moments of ordinary order statistics, record statistics and other ordered random variable techniques. Further, a characterization result for this distribution is obtained using the conditional moment of the lower generalized order statistics.
Abstract: We study a new five-parameter model called the extended Dagum distribution. The proposed model contains as special cases the log-logistic and Burr III distributions, among others. We derive the moments, generating and quantile functions, mean deviations and Bonferroni, Lorenz and Zenga curves. We obtain the density function of the order statistics. The parameters are estimated by the method of maximum likelihood. The observed information matrix is determined. An application to real data illustrates the importance of the new model.
Abstract: Several statistical approaches have been proposed to consider circumstances under which one universal distribution is not capable of fitting the whole domain. This paper studies Bayesian detection of multiple interior epidemic/square waves in the interval domain, featured by two identical statistical distributions at both ends. We introduce a simple dimension-matching parameter proposal to implement the sampling-based posterior inference for special cases where each segmented distribution on a circle has the same set of regulating parameters. Molecular biology research reveals that cancer progression may involve DNA copy number alteration at genome regions, and that connecting two biologically inactive chromosome ends results in a circle holding multiple epidemic/square waves. A slight modification of a simple novel Bayesian change point identification algorithm, random grafting-pruning Markov chain Monte Carlo (RGPMCMC), is proposed by adjusting the original change point birth/death symmetric transition probability with a differ-by-one change point number ratio. The algorithm performance is studied through simulations with connection to DNA copy number alteration detection, which promises potential application to cancer diagnosis at the genome level.
Abstract: Data systems collecting information from different sources or over long periods of time can receive multiple reports from the same individual. An important example is public health surveillance systems that monitor conditions with long natural histories. Several state-level systems for surveillance of one such condition, the human immunodeficiency virus (HIV), use codes composed of combinations of non-unique personal characteristics such as birth date, soundex (a code based on last name), and sex as patient identifiers. As a result, these systems cannot distinguish between several different individuals having identical codes and a unique individual erroneously represented several times. We applied results for occupancy models to estimate the potential magnitude of duplicate case counting for AIDS cases reported to the Centers for Disease Control and Prevention with only non-unique partial personal identifiers. Occupancy models with equal and unequal occupancy probabilities are considered. Unbiased estimators for the numbers of true duplicates within and between case reporting areas are provided. Formulas to calculate estimators’ variances are also provided. These results can be applied to evaluating duplicate reporting in other data systems that have no unique identifier for each individual.
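To give a feel for the equal-probability occupancy setting mentioned above, the sketch below computes how many coincidental code matches to expect when distinct individuals are assigned non-unique codes. This is the textbook occupancy expectation, not the estimators of the paper; the function names are hypothetical and all codes are assumed equally likely.

```python
def expected_occupied_codes(n_individuals, n_codes):
    """Expected number of distinct codes in use when each individual independently
    receives one of n_codes equally likely codes (equal-occupancy assumption)."""
    return n_codes * (1.0 - (1.0 - 1.0 / n_codes) ** n_individuals)


def expected_coincidental_matches(n_individuals, n_codes):
    """Expected number of individuals in excess of one per occupied code, i.e.
    distinct people whose reports would look like duplicates of an earlier report."""
    return n_individuals - expected_occupied_codes(n_individuals, n_codes)
```

Comparing the observed number of repeated codes with this expectation gives a first indication of how many apparent duplicates could be explained by chance alone. Unequal code probabilities, which the abstract also considers, would increase the expected number of coincidental matches.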
Abstract: The Weibull distribution is the most important distribution for problems in reliability. We study some mathematical properties of the new wider Weibull-G family of distributions. Some special models in the new family are discussed. The properties derived hold for any distribution in this family. We obtain general explicit expressions for the quantile function, ordinary and incomplete moments, generating function and order statistics. We discuss the estimation of the model parameters by maximum likelihood and illustrate the potential of the extended family with two applications to real data.
Many researchers have studied random fields whose finite-dimensional marginal distributions are multivariate closed skew-normal or multivariate extended skew-t, in both time and spatial domains. In this paper, a necessary and sufficient condition is provided for the applicability of such random fields in spatial interpolation, based on the marginal distributions. Two deficiencies of the random fields generated by some well-known multivariate distributions are pointed out and, in contrast, a suitable skew and heavy-tailed random field is proposed. The efficiency of the proposed random field is illustrated through the interpolation of a real data set.
Abstract: Interval estimation for the proportion parameter in one-sample misclassified binary data has attracted much interest in the literature. Recently, an approximate Bayesian approach has been proposed. This approach is simpler to implement and performs better than existing frequentist approaches. However, because a normal approximation to the marginal posterior density was used in this Bayesian approach, some efficiency may be lost. We develop a closed-form fully Bayesian algorithm which draws a posterior sample of the proportion parameter from the exact marginal posterior distribution. We conducted simulations to show that our fully Bayesian algorithm is easier to implement and has better coverage than the approximate Bayesian approach.