MATLAB, Python, and R have all been used successfully to teach college students the fundamentals of mathematics and statistics. In today’s data-driven environment, big data analytics is a powerful tool for statistical decision making. MATLAB can be used to teach introductory mathematics such as calculus and statistics. Both Python and R can be used to make decisions involving big data. Python is well suited to teaching introductory statistics in a data-rich environment. R is a little more involved, but it offers many customizable programs that support fairly involved decisions through prepackaged, preprogrammed statistical analysis.
A family of distributions is proposed by using the Kumaraswamy-G (Kw-G) distribution as the baseline distribution in the generalized Marshall-Olkin (GMO) construction. By expanding the probability density function and the survival function as infinite series, the proposed family is seen to be an infinite mixture of Kw-G distributions. Series expansions of the density function for order statistics are also obtained. Moments, the moment generating function, Rényi entropy, the quantile function, random sample generation, asymptotes, shapes, and stochastic orderings are also investigated. Maximum likelihood estimators, their large-sample standard errors, confidence intervals, and the method of moments are presented. Three real life illustrations of comparative data modeling applications with some of the important sub mode
Semi-parametric Cox regression and parametric methods have been used to analyze cancer survival data; however, no study has compared survival models in genetic association analysis of age at onset (AAO) of cancer. The hepatocyte nuclear factor-1-beta (HNF1B) gene has been associated with risk of endometrial and prostate cancers; however, no study has focused on the effect of the HNF1B gene on the AAO of cancer. This study examined 23 single nucleotide polymorphisms (SNPs) within the HNF1B gene in the Marshfield sample with 716 cancer cases and 2,848 non-cancer controls. Cox proportional hazards models in PROC PHREG and parametric survival models (including exponential, Weibull, log-normal, log-logistic, and gamma models) in PROC LIFEREG in SAS 9.4 were used to detect the genetic association of the HNF1B gene with the AAO. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to compare the Cox models and parametric survival models. Both AIC and BIC values showed that the Weibull distribution is the best model for all 23 SNPs and that the gamma distribution is the second best. Based on the Weibull model, the top two SNPs are rs4239217, with time ratio (TR) = 1.08 (p < 0.0001 for each of the AA and AG genotypes), and rs7501939, with TR = 1.07 (p = 0.0004 and 0.0002 for the CC and CT genotypes, respectively). This study shows that the parametric Weibull distribution is the best model for the genetic association of AAO of cancer and provides the first evidence of several genetic variants within the HNF1B gene associated with AAO of cancer.
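The model comparison in the abstract above rests on two standard formulas, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters, n the number of observations, and L the maximized likelihood; lower values indicate a better trade-off between fit and complexity. The sketch below shows how such a ranking could be computed. The log-likelihoods and parameter counts are hypothetical placeholders, not results from the study, and using n = 716 + 2,848 = 3,564 as the observation count for BIC is an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) pairs.
# These are illustrative placeholders, NOT values from the study.
candidates = {
    "exponential": (-1520.4, 2),
    "weibull": (-1498.7, 3),
    "log-normal": (-1505.2, 3),
}

# Assumed observation count: 716 cases + 2,848 controls from the abstract.
n_obs = 716 + 2848

# Rank models from best (lowest criterion) to worst.
aic_ranking = sorted(candidates, key=lambda m: aic(*candidates[m]))
bic_ranking = sorted(candidates, key=lambda m: bic(*candidates[m], n_obs))

for name in aic_ranking:
    ll, k = candidates[name]
    print(f"{name:12s} AIC={aic(ll, k):.1f} BIC={bic(ll, k, n_obs):.1f}")
```

With these placeholder inputs, both criteria rank the same model first; with real fits the two can disagree, because BIC penalizes extra parameters more heavily once n exceeds about 7.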
We introduce a new class of distributions called the generalized odd generalized exponential family. Some of its mathematical properties are derived, including explicit expressions for the ordinary and incomplete moments, quantile and generating functions, Rényi, Shannon, and q-entropies, order statistics, and probability weighted moments. We also propose bivariate generalizations. We construct a simple-type copula and introduce a useful stochastic property. The maximum likelihood method is used for estimating the model parameters. The importance and flexibility of the new family are illustrated by means of two applications to real data sets. We assess the performance of the maximum likelihood estimators in terms of biases and mean squared errors via a simulation study.
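A simulation study of the kind mentioned in the abstract above typically draws many samples from a known distribution, refits the model each time, and summarizes the estimates by bias and mean squared error. The sketch below illustrates that loop. It uses the exponential distribution as a stand-in, because its maximum likelihood estimator has a closed form; this is an assumption made for brevity and is not the family proposed in the abstract. The function name and its parameters are hypothetical.

```python
import random
import statistics


def simulate_mle_performance(true_rate, n, replications, seed=0):
    """Monte Carlo bias and MSE of the exponential-rate MLE (1 / sample mean).

    The exponential model is a stand-in chosen for its closed-form MLE.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimates.append(1.0 / statistics.fmean(sample))
    # Bias: average estimate minus the true value.
    bias = statistics.fmean(estimates) - true_rate
    # MSE: average squared deviation from the true value.
    mse = statistics.fmean([(e - true_rate) ** 2 for e in estimates])
    return bias, mse


for n in (25, 100, 400):
    bias, mse = simulate_mle_performance(true_rate=1.0, n=n, replications=2000)
    print(f"n={n:4d}  bias={bias:+.4f}  mse={mse:.4f}")
```

For a model without a closed-form estimator, the line that computes each estimate would be replaced by a numerical likelihood maximization; the bias and MSE summaries stay the same.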
We introduce a four-parameter distribution, called the Zografos-Balakrishnan Burr XII distribution. Our purpose is to provide a Burr XII generalization that may be useful in still more complex situations. The new distribution may be an interesting alternative for describing income distributions and can also be applied in actuarial science, finance, bioscience, telecommunications, and the modelling of lifetime data, for example. It contains as special models some well-known distributions, such as the log-logistic, Weibull, Lomax, and Burr XII distributions, among others. Some of its structural properties are investigated. The method of maximum likelihood is used for estimating the model parameters, and a simulation study is conducted. We provide two applications to real data to demonstrate the usefulness of the proposed distribution. Since the Ristić-Balakrishnan Burr XII distribution has a similar structure to the studied distribution, we also present some of its properties and expansions.
Recent decades have witnessed a series of damages in the financial sector caused by adverse price movements beyond certain limits. These movements are commonly termed financial bubbles. The formation and burst of a bubble cause great damage in the field of finance. Hence, to protect the market from such damage, the detection and modeling of financial bubbles is essential. We propose improved test procedures for detecting financial bubbles by combining the existing Max test and the Supremum Augmented Dickey-Fuller (SADF) test, which are generally used for detecting bubbles. The performance of the proposed test is compared with that of existing tests via Monte Carlo simulation. It is observed that the proposed test has higher power than the existing tests for detecting collapsible bubbles, irrespective of window length and collapse probability. Further, the power of the proposed test increases as the window size decreases. An empirical study of S&P 500 monthly data from January 2006 to December 2010 is carried out to demonstrate the advantages of the proposed test procedures over existing tests.
In this article, various mathematical and statistical properties of the Burr type XII distribution are derived, such as quantiles, moments, moment generating function, hazard rate, conditional moments, mean residual lifetime, mean past lifetime, mean deviation about mean and median, stochastic ordering, stress-strength parameter, various entropies, Bonferroni and Lorenz curves, and order statistics. We discuss some exact expressions and recurrence relations for the single and product moments of upper record values. Further, using relations of single moments, we tabulate the means and variances of upper record values from samples of sizes up to 10 for various values of α and β. Finally, a characterization of this distribution based on conditional moments of record values and a recurrence relation of kth record values is presented.
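As a concrete illustration of the quantile function mentioned in the abstract above, the sketch below uses the common two-parameter form of the Burr XII distribution, with CDF F(x) = 1 - (1 + x^c)^(-k) for x > 0. Inverting it gives Q(u) = ((1 - u)^(-1/k) - 1)^(1/c), which also yields inverse-transform sampling. The shape names c and k and the function names are assumptions made for this example; how they map to the article's α and β is not stated in the abstract.

```python
import random


def burr12_cdf(x, c, k):
    """Burr XII CDF, F(x) = 1 - (1 + x**c)**(-k), for shape parameters c, k > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + x ** c) ** (-k)


def burr12_quantile(u, c, k):
    """Inverse of the CDF: Q(u) = ((1 - u)**(-1/k) - 1)**(1/c), for 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)


def burr12_sample(n, c, k, seed=None):
    """Draw n variates by inverse-transform sampling: X = Q(U), U ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    return [burr12_quantile(rng.random(), c, k) for _ in range(n)]


# Illustrative shape values, chosen arbitrarily.
c, k = 2.0, 3.0
median = burr12_quantile(0.5, c, k)
draws = burr12_sample(5, c, k, seed=1)
print("median:", median)
print("draws:", draws)
```

Evaluating the CDF at the computed median returns 0.5 up to floating-point error, which is a quick check that the quantile function inverts the CDF.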
In this paper, we introduce a new lifetime model, called the Generalized Weibull-Burr XII distribution. We discuss some of its mathematical properties, such as the density, hazard rate function, quantile function, and moments. The maximum likelihood method is used to estimate the model parameters. A simulation study is performed to assess the performance of the maximum likelihood estimators by means of biases and mean squared errors. Finally, we show that the proposed distribution is a very competitive model relative to other classical models by means of an application to a real data set.
This paper presents an empirical study of a recently compiled workforce analytics data set modeling employment outcomes of Engineering students. The contributions reported in this paper won the data challenge of the ACM IKDD 2016 Conference on Data Science. Two problems are addressed: regression using heterogeneous information types, and the extraction of insights and trends from data to make recommendations. These goals are supported by a range of visualizations. Whereas the data set is specific to a nation, the underlying techniques and visualization methods are generally applicable. Gaussian processes (GPs) are proposed to model and predict salary as a function of heterogeneous independent attributes. Key novelties the GP approach brings to workforce analytics are (a) a statistically sound, data-dependent notion of prediction uncertainty, (b) automatic relevance determination of the various independent attributes to the dependent variable (salary), (c) seamless incorporation of both numeric and string attributes within the same regression framework without dichotomization, where string attributes include single-word categorical attributes (e.g. gender), nominal attributes (e.g. college tier), and multi-word attributes (e.g. specialization), and (d) treatment of all data as being correlated when making predictions. Insights from both predictive modeling approaches and data analysis were used to suggest factors that, if improved, might lead to better starting salaries for Engineering students. A range of visualization techniques was used to extract key employment patterns from the data.