We define and study a three-parameter model with positive real support called the exponentiated generalized extended Pareto distribution. We provide a comprehensive mathematical treatment and show that the formulas related to the new model are simple and manageable. We study the behaviour of the maximum likelihood estimates of the model parameters using Monte Carlo simulation. Finally, we offer two applications to real data sets that show empirically how well the new model fits compared with twelve other lifetime distributions.
Abstract: The five-parameter Kumaraswamy generalized gamma model (Pascoa et al., 2011) includes some important distributions as special cases and is very useful for modeling lifetime data. We propose an extended version of this distribution by allowing a shape parameter to take negative values. The new distribution can accommodate increasing, decreasing, bathtub-shaped and unimodal hazard functions. A second advantage is that it also includes reciprocal distributions, such as the reciprocal gamma and reciprocal Weibull distributions, as special models. A third advantage is that it can represent the error distribution of the log-Kumaraswamy generalized gamma regression model. We provide a mathematical treatment of the new distribution, including explicit expressions for the moments, generating function, mean deviations and order statistics. We also obtain the moments of the log-transformed distribution. The new regression model can be used more effectively in the analysis of survival data because it includes several widely known regression models as sub-models. Maximum likelihood and a Bayesian procedure are used to estimate the model parameters for censored data. Overall, the new regression model is very useful for the analysis of real data.
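The following sketch shows how a negative shape parameter produces the reciprocal special cases. It assumes that the parameter allowed to be negative is the power parameter `tau` of the generalized gamma (GG) baseline. It also assumes the Kumaraswamy-G construction F(x) = 1 - (1 - G(x)^lam)^phi. The parameter names `alpha`, `tau`, `k`, `lam` and `phi` are our own labels, and the code is not taken from the paper. With Y = (x/alpha)^tau distributed as Gamma(k, 1), a negative `tau` reverses the direction of the transformation, so G(x) becomes an upper tail probability of Y.

```python
import math

from scipy.special import gammainc, gammaincc


def gg_cdf(x, alpha, tau, k):
    """Generalized gamma cdf with scale alpha > 0, power tau != 0 and shape k > 0.

    Y = (x / alpha) ** tau is Gamma(k, 1). For tau > 0 the map x -> Y is increasing,
    so G(x) = P(Y <= y); for tau < 0 it is decreasing, so G(x) = P(Y >= y).
    """
    if tau == 0:
        raise ValueError("tau must be non-zero")
    if x <= 0:
        return 0.0
    y = (x / alpha) ** tau
    return gammainc(k, y) if tau > 0 else gammaincc(k, y)


def kum_gg_cdf(x, alpha, tau, k, lam, phi):
    """Kumaraswamy-G cdf F(x) = 1 - (1 - G(x)**lam)**phi with a GG baseline G."""
    g = gg_cdf(x, alpha, tau, k)
    return 1.0 - (1.0 - g**lam) ** phi


# With lam = phi = 1 and tau = -1 the model reduces to the reciprocal gamma; for
# k = 2 its cdf has the closed form exp(-1/x) * (1 + 1/x).
x = 1.5
print(kum_gg_cdf(x, 1.0, -1.0, 2.0, 1.0, 1.0), math.exp(-1 / x) * (1 + 1 / x))
```

Setting `k = 1` and `tau < 0` in the same way gives the reciprocal Weibull cdf exp(-(x/alpha)^tau).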
Forward regression has been heavily criticised, partly because of its speed and its stopping criteria. The main focus of this paper is to demonstrate how to make it efficient using R. Our method works for continuous predictor variables only, because partial correlation plays the central role.
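Below is a plain NumPy sketch of forward selection driven by partial correlations. It is not the authors' fast R implementation. At each step it computes the partial correlation between `y` and every remaining predictor given the already selected set, using the correlation of least-squares residuals. It then adds the strongest predictor. The stopping rule, a Fisher z-test at level `alpha`, is our assumption about a reasonable criterion. The function names are hypothetical.

```python
import math

import numpy as np


def residualize(v, X, selected):
    """Residuals of v after least-squares regression on an intercept and X[:, selected]."""
    Z = np.column_stack([np.ones(len(v))] + [X[:, j] for j in selected])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


def forward_partial_corr(X, y, alpha=0.05):
    """Add the predictor with the largest |partial correlation| with y given the
    selected set, while its Fisher z-test p-value stays below alpha."""
    n, p = X.shape
    selected = []
    while len(selected) < p:
        ry = residualize(y, X, selected)
        best_j, best_r = None, 0.0
        for j in range(p):
            if j in selected:
                continue
            rx = residualize(X[:, j], X, selected)
            r = np.corrcoef(rx, ry)[0, 1]
            if abs(r) > abs(best_r):
                best_j, best_r = j, r
        if best_j is None:
            break
        best_r = float(np.clip(best_r, -1 + 1e-12, 1 - 1e-12))
        # Fisher z-test for a partial correlation given len(selected) variables.
        z = 0.5 * math.log((1 + best_r) / (1 - best_r)) * math.sqrt(n - len(selected) - 3)
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        if p_value >= alpha:
            break
        selected.append(best_j)
    return selected


# Usage: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(size=500)
print(forward_partial_corr(X, y, alpha=0.01))
```

Each step refits all the residualizations from scratch. Speeding up exactly this step is what the paper targets, so this version should be treated as a readable baseline rather than an efficient one.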
Abstract: This paper extends the analysis of the bivariate Seemingly Unrelated Regression (SUR) Tobit model by modeling its nonlinear dependence structure through a copula and by assuming non-normal marginal error distributions. For model estimation, the copula approach enables the use of the classical Inference Function for Margins (IFM) method of Joe and Xu (1996), which is computationally more attractive than the full maximum likelihood approach. However, our simulation study shows that the IFM method gives a biased estimate of the copula parameter when both margins contain censored observations. To obtain an unbiased estimate of the copula association parameter, we propose a modified version of the IFM method, which we call Inference Function for Augmented Margins (IFAM). Because the usual asymptotic approach, that is, computing the asymptotic covariance matrix of the parameter estimates, is troublesome, we use resampling procedures (bootstrap methods) to obtain confidence intervals for the parameters of the copula-based SUR Tobit model. The results of the simulation and empirical studies indicate that the proposed model and methods perform adequately. We illustrate our procedure using bivariate data on the consumption of salad dressings and lettuce by U.S. individuals.
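As a hedged illustration of the two-stage IFM idea and the bootstrap intervals, the sketch below makes simplifying assumptions. It uses uncensored data, a Gaussian copula and a single SciPy margin family shared by both variables. It does not handle censoring, so it does not reproduce IFAM. According to the abstract, plain IFM of this kind is biased when both margins are censored. The function names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar


def gaussian_copula_loglik(rho, a, b):
    """Sum of log Gaussian-copula densities at normal scores a, b."""
    r2 = rho * rho
    return np.sum(-0.5 * np.log1p(-r2) - (r2 * (a**2 + b**2) - 2 * rho * a * b) / (2 * (1 - r2)))


def ifm_rho(x, y, margin=stats.norm):
    """IFM: (1) fit each margin by ML, (2) maximize the copula log-likelihood in rho
    with the fitted margins held fixed."""
    eps = 1e-10
    u = np.clip(margin.cdf(x, *margin.fit(x)), eps, 1 - eps)
    v = np.clip(margin.cdf(y, *margin.fit(y)), eps, 1 - eps)
    a, b = stats.norm.ppf(u), stats.norm.ppf(v)
    res = minimize_scalar(lambda r: -gaussian_copula_loglik(r, a, b),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x


def bootstrap_ci(x, y, margin=stats.norm, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval for the IFM copula parameter (resampling pairs)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        reps.append(ifm_rho(x[idx], y[idx], margin))
    tail = (1 - level) / 2
    return np.quantile(reps, [tail, 1 - tail])


# Usage: logistic (non-normal) margins joined by a Gaussian copula with rho = 0.6.
rng = np.random.default_rng(7)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=800)
x = stats.logistic.ppf(stats.norm.cdf(z[:, 0]))
y = stats.logistic.ppf(stats.norm.cdf(z[:, 1]))
rho_hat = ifm_rho(x, y, margin=stats.logistic)
lo, hi = bootstrap_ci(x, y, margin=stats.logistic, n_boot=100)
print(rho_hat, lo, hi)
```

Each bootstrap replicate repeats both IFM stages. This lets the interval reflect the uncertainty from estimating the margins, which a naive copula-only standard error would ignore.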
Abstract: Despite its often unrealistic assumption that the features are independent, the naive Bayes classifier is simple and competes well with more sophisticated classifiers under the zero-one loss function when assigning an observation to a class given the observed features. However, naive Bayes has been shown to perform poorly in both estimation and classification in some cases where the features are correlated. Researchers have therefore developed many approaches that relax this basic but rarely satisfied assumption. In this paper, we propose a new classifier that is also free of the independence assumption: it evaluates the dependence among features through pair copulas constructed via a graphical model called a D-vine tree. This tree structure decomposes the multivariate dependence into many bivariate dependencies and thus makes it possible to evaluate the dependence among features easily and efficiently, even for high-dimensional data with large sample sizes. We further extend the proposed method to features with discrete values. Experimental studies show that the proposed method performs well in both the continuous and the discrete cases.
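The sketch below shows how a D-vine can correct naive Bayes for continuous features. It makes several assumptions that are not the paper's choices: normal margins, Gaussian pair copulas, the vine order given by the column order, and each pair parameter estimated sequentially as the correlation of normal scores. Tree t links variables i and i + t + 1 given those in between. The conditional margins for the next tree come from the Gaussian h-function. All names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

EPS = 1e-10


def h(u, v, rho):
    """Gaussian pair-copula h-function: conditional cdf F(u | v)."""
    z = (norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2)
    return np.clip(norm.cdf(z), EPS, 1 - EPS)


def log_c(u, v, rho):
    """Log density of the bivariate Gaussian copula."""
    a, b = norm.ppf(u), norm.ppf(v)
    return -0.5 * np.log1p(-rho**2) - (rho**2 * (a**2 + b**2) - 2 * rho * a * b) / (2 * (1 - rho**2))


def dvine(U, rhos=None):
    """Per-row log copula density of a Gaussian D-vine; estimates rhos if None."""
    d = U.shape[1]
    fit = rhos is None
    rhos = [] if fit else rhos
    v1, v2 = U[:, :-1], U[:, 1:]  # conditional cdfs of the two ends of each edge
    logdens = np.zeros(len(U))
    for t in range(d - 1):
        m = v1.shape[1]
        if fit:
            rhos.append([float(np.clip(np.corrcoef(norm.ppf(v1[:, i]), norm.ppf(v2[:, i]))[0, 1],
                                       -0.99, 0.99)) for i in range(m)])
        r = rhos[t]
        for i in range(m):
            logdens += log_c(v1[:, i], v2[:, i], r[i])
        if t < d - 2:  # conditional margins feeding the next tree
            nv1 = [h(v1[:, i], v2[:, i], r[i]) for i in range(m - 1)]
            nv2 = [h(v2[:, i + 1], v1[:, i + 1], r[i + 1]) for i in range(m - 1)]
            v1, v2 = np.column_stack(nv1), np.column_stack(nv2)
    return rhos, logdens


class DVineBayes:
    """Naive Bayes with normal margins, corrected by a Gaussian D-vine copula."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.params = []
        for c in self.classes:
            Xc = X[y == c]
            mu, sd = Xc.mean(axis=0), Xc.std(axis=0)
            U = np.clip(norm.cdf((Xc - mu) / sd), EPS, 1 - EPS)
            rhos, _ = dvine(U)
            self.params.append((np.log(len(Xc) / len(X)), mu, sd, rhos))
        return self

    def predict(self, X):
        scores = []
        for log_prior, mu, sd, rhos in self.params:
            U = np.clip(norm.cdf((X - mu) / sd), EPS, 1 - EPS)
            _, log_cop = dvine(U, rhos)
            scores.append(log_prior + norm.logpdf(X, mu, sd).sum(axis=1) + log_cop)
        return self.classes[np.argmax(scores, axis=0)]


# Usage: both classes have identical N(0, 1) margins and differ only in dependence.
rng = np.random.default_rng(1)
S0 = np.array([[1, 0.8, 0.64], [0.8, 1, 0.8], [0.64, 0.8, 1]])
S1 = S0 * np.outer([1, -1, 1], [1, -1, 1])  # flip the sign of feature 1's correlations


def draw(n):
    X = np.vstack([rng.multivariate_normal(np.zeros(3), S0, n),
                   rng.multivariate_normal(np.zeros(3), S1, n)])
    return X, np.repeat([0, 1], n)


Xtr, ytr = draw(1000)
Xte, yte = draw(1000)
acc = np.mean(DVineBayes().fit(Xtr, ytr).predict(Xte) == yte)
print(acc)
```

In this example, the marginal (naive Bayes) part of the score cannot separate the classes beyond chance, because the margins coincide. Any gain in accuracy therefore comes from the vine term.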
In this study we consider different methods for estimating the unknown parameters of the two-parameter unit-Gamma (UG) distribution from a frequentist point of view. First, we briefly describe several frequentist approaches: maximum likelihood estimators, moment estimators, least squares estimators, maximum product of spacings estimators, the Cramér-von Mises method, and the Anderson-Darling method together with four of its variants. Monte Carlo simulations are then performed to compare the performance of these estimation methods in both small and large samples, in terms of bias and root mean squared error. In addition, for each method of estimation, we consider interval estimation using the bootstrap method and calculate the coverage probability and the average width of the bootstrap confidence intervals. The study reveals that the maximum product of spacings estimators and the Anderson-Darling 2 (AD2) estimators are highly competitive with the maximum likelihood estimators in both small and large samples. Finally, two real data sets are analyzed for illustrative purposes.
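Two of the compared estimators, maximum likelihood and maximum product of spacings (MPS), can be sketched as follows. We assume the parameterization X = exp(-Y) with Y ~ Gamma(shape a, rate b), which gives the density b^a / Gamma(a) * x^(b-1) * (-log x)^(a-1) on (0, 1). The paper's parameterization may differ. Both objectives are minimized over (log a, log b) with Nelder-Mead, which keeps the parameters positive. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaincc, gammaln


def ug_logpdf(x, a, b):
    """Unit-Gamma log density for X = exp(-Y), Y ~ Gamma(shape a, rate b)."""
    return a * np.log(b) - gammaln(a) + (b - 1) * np.log(x) + (a - 1) * np.log(-np.log(x))


def ug_cdf(x, a, b):
    """P(X <= x) = P(Y >= -log x): the regularized upper incomplete gamma."""
    return gammaincc(a, -b * np.log(x))


def neg_loglik(x, a, b):
    return -np.sum(ug_logpdf(x, a, b))


def neg_mps(x, a, b):
    """Negative mean log spacing of the fitted cdf at the ordered sample."""
    F = np.concatenate(([0.0], ug_cdf(np.sort(x), a, b), [1.0]))
    return -np.mean(np.log(np.maximum(np.diff(F), 1e-300)))


def fit(x, objective):
    """Minimize an objective over (log a, log b) so both parameters stay positive."""
    res = minimize(lambda t: objective(x, *np.exp(t)), x0=np.zeros(2), method="Nelder-Mead")
    return np.exp(res.x)


# Usage: simulate from a = 2, b = 3 and compare the two estimators.
rng = np.random.default_rng(42)
x = np.exp(-rng.gamma(shape=2.0, scale=1 / 3.0, size=2000))
a_ml, b_ml = fit(x, neg_loglik)
a_mps, b_mps = fit(x, neg_mps)
print(a_ml, b_ml, a_mps, b_mps)
```

Any of the other distance-based estimators (least squares, Cramér-von Mises, Anderson-Darling) fits the same `fit(x, objective)` pattern: only the objective function built from `ug_cdf` changes.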
Abstract: This paper presents a permutation test for the incomplete pairs setting. This situation arises in both observational and experimental studies when some of the data form a paired sample and the rest comprise two independent samples. The proposed method uses the data from both types of samples to test the difference between the mean responses. Our test statistic combines the observed mean difference for the complete pairs with the difference between the means of the two independent samples. The randomizations are carried out as in standard permutation tests for paired and independent samples. A simulation study shows that our statistic performs well compared with other methods.
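The randomization scheme can be sketched as follows. In each permutation, the sign of every within-pair difference is flipped at random, which is equivalent to swapping the pair members. The pooled unpaired observations are also reshuffled between the two groups. The statistic is a weighted sum of the paired mean difference and the two-sample mean difference. The weight `w` used here is a hypothetical choice based on sample sizes, not necessarily the combination used in the paper.

```python
import numpy as np


def combined_stat(d, x, y, w):
    """Weighted combination of the paired mean difference and the two-sample mean difference."""
    return w * d.mean() + (1 - w) * (x.mean() - y.mean())


def incomplete_pairs_test(pairs, x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value. `pairs` is an (n, 2) array of complete pairs;
    `x` and `y` are the unpaired observations under the two conditions."""
    rng = np.random.default_rng(seed)
    d = pairs[:, 0] - pairs[:, 1]
    n_p = len(d)
    # Hypothetical weight: number of pairs vs the average independent-sample size.
    w = n_p / (n_p + (len(x) + len(y)) / 2)
    observed = combined_stat(d, x, y, w)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=n_p)  # random swaps within pairs
        perm = rng.permutation(pooled)              # reshuffle the group labels
        stat = combined_stat(signs * d, perm[:len(x)], perm[len(x):], w)
        count += abs(stat) >= abs(observed)
    return (count + 1) / (n_perm + 1)


# Usage: a mean shift of 1.0 in both the paired and the independent parts.
rng = np.random.default_rng(3)
base = rng.normal(size=20)
pairs = np.column_stack([base + 1.0 + rng.normal(scale=0.5, size=20),
                         base + rng.normal(scale=0.5, size=20)])
x = rng.normal(loc=1.0, size=15)
y = rng.normal(size=15)
p = incomplete_pairs_test(pairs, x, y)
print(p)
```

Adding 1 to both the count and the number of permutations keeps the Monte Carlo p-value strictly positive and valid as a test.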
The study of semiparametric families is useful because it provides ways of extending families of distributions to add flexibility in fitting data. The main aim of this paper is to introduce a class of bivariate semiparametric families of distributions. One special bivariate family within this class is discussed in detail, together with its sub-models and various properties. In most cases, the joint probability density function, the joint cumulative distribution function and the joint hazard function can be expressed in compact forms. Maximum likelihood and Bayesian estimation are considered for the vector of unknown parameters. For illustrative purposes, a data set is re-analyzed, and the results are quite satisfactory. A simulation study is performed to assess the performance of the estimators.
Abstract: Observational studies with relatively large data sets may contain hidden heterogeneity in causal effects and in propensity scores, that is, in the patterns by which study subjects are exposed to a putative cause. This underlying heterogeneity can be crucial for causal inference in any observational study because it is systematically generated and structured by covariates that influence the cause and/or its related outcomes. Addressing the causal inference problem in light of this data structure naturally calls for machine learning techniques such as tree analysis. Kang, Su, Hitsman, Liu and Lloyd-Jones (2012) proposed the Marginal Tree (MT) procedure to explore both the confounding and the interacting effects of covariates on causal inference. In this paper, we extend the MT method to binary responses and clearly explain its relationship with the established causal odds ratio. We assess the causal effect of dieting on emotional distress using both a real data set from Lalonde’s National Supported Work Demonstration Analysis (NSW) and a simulated data set from the National Longitudinal Study of Adolescent Health (Add Health).
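To make the link to the causal odds ratio concrete, here is a sketch of the standard marginal causal odds ratio obtained by standardization over covariate-defined strata. The strata could, for example, be the leaves of a tree. This is not the MT procedure itself. The function name and the toy counts are hypothetical. Within each stratum the treated and control response rates are averaged with weights P(S = s), which gives the marginal potential-outcome rates p1 and p0. These rates are then combined into an odds ratio.

```python
import numpy as np


def standardized_causal_or(y, t, strata):
    """Marginal causal odds ratio via standardization over strata.

    Assumes no unmeasured confounding within strata and that every stratum
    contains both treated (t == 1) and control (t == 0) units.
    """
    y, t, strata = map(np.asarray, (y, t, strata))
    p1 = p0 = 0.0
    for s in np.unique(strata):
        in_s = strata == s
        treated, control = in_s & (t == 1), in_s & (t == 0)
        if not treated.any() or not control.any():
            raise ValueError(f"stratum {s!r} lacks treated or control units")
        weight = in_s.mean()  # estimated P(S = s)
        p1 += weight * y[treated].mean()
        p0 += weight * y[control].mean()
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Toy data: stratum A is mostly treated and has higher risk; B is mostly control.
strata = np.repeat(["A", "A", "B", "B"], [80, 20, 20, 80])
t = np.repeat([1, 0, 1, 0], [80, 20, 20, 80])
y = np.concatenate([np.repeat([1, 0], [40, 40]), np.repeat([1, 0], [5, 15]),
                    np.repeat([1, 0], [4, 16]), np.repeat([1, 0], [8, 72])])
print(standardized_causal_or(y, t, strata))
```

Here p1 = 0.35 and p0 = 0.175, so the standardized odds ratio is 33/13, or about 2.54. The crude odds ratio from the pooled table (44/100 treated versus 13/100 control events) is about 5.26. The gap shows how confounding by the strata inflates the unadjusted estimate.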
A technique is proposed to estimate the conception rate using the distribution of the first-birth interval among recently married women. The proposed technique adjusts for the truncation and selection effects present in cross-sectional data. Real data from NFHS-3 and NFHS-4 are used for illustration.