Abstract: This paper extends the analysis of the bivariate Seemingly Unrelated Regression (SUR) Tobit model by modeling its nonlinear dependence structure through a copula and assuming non-normal marginal error distributions. For model estimation, copula methods enable the use of the (classical) Inference Function for Margins (IFM) method of Joe and Xu (1996), which is computationally more attractive (feasible) than the full maximum likelihood approach. However, our simulation study shows that the IFM method provides a biased estimate of the copula parameter in the presence of censored observations in both margins. To obtain an unbiased estimate of the copula association parameter, we propose a modified version of the IFM method, which we refer to as Inference Function for Augmented Margins (IFAM). Since the usual asymptotic approach, that is, the computation of the asymptotic covariance matrix of the parameter estimates, is troublesome, we propose the use of resampling procedures (bootstrap methods) to obtain confidence intervals for the copula-based SUR Tobit model parameters. The satisfactory results from the simulation and empirical studies indicate the adequate performance of our proposed model and methods. We illustrate our procedure using bivariate data on consumption of salad dressings and lettuce by U.S. individuals.
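The two-stage idea behind the IFM method can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not taken from the paper: the margins are normal and fully observed (no censoring, so it is not the SUR Tobit setting and not the proposed IFAM method), the copula is a Clayton copula, and the function names `clayton_log_density`, `simulate_clayton`, and `ifm_fit` are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def clayton_log_density(u, v, theta):
    """Log-density of the Clayton copula for theta > 0."""
    return (np.log1p(theta)
            - (1.0 + theta) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(u ** (-theta) + v ** (-theta) - 1.0))


def simulate_clayton(n, theta, rng):
    """Draw (u, v) from a Clayton copula by inverting the conditional CDF."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v


def ifm_fit(y1, y2):
    """Two-stage IFM sketch: fit margins first, then the copula parameter.

    ASSUMPTION: normal, uncensored margins and a Clayton copula; these are
    illustrative choices, not the model of the paper.
    """
    # Stage 1: maximum likelihood for each margin separately.
    m1, s1 = stats.norm.fit(y1)
    m2, s2 = stats.norm.fit(y2)
    # Probability integral transform with the fitted margins.
    u = np.clip(stats.norm.cdf(y1, m1, s1), 1e-10, 1 - 1e-10)
    v = np.clip(stats.norm.cdf(y2, m2, s2), 1e-10, 1 - 1e-10)
    # Stage 2: maximize the copula log-likelihood with the margins held fixed.
    res = optimize.minimize_scalar(
        lambda t: -np.sum(clayton_log_density(u, v, t)),
        bounds=(1e-3, 20.0), method="bounded")
    return (m1, s1), (m2, s2), res.x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u, v = simulate_clayton(2000, theta=2.0, rng=rng)
    y1 = stats.norm.ppf(u, loc=1.0, scale=2.0)
    y2 = stats.norm.ppf(v, loc=-1.0, scale=0.5)
    print(ifm_fit(y1, y2))
```

With censored observations in both margins, the second-stage transform above is no longer available for the censored points, which is the situation in which the abstract reports bias for the plain IFM estimate.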
Abstract: Despite its unreasonable assumption of feature independence, the naive Bayes classifier offers a simple way to assign an observation to a class given the observed features, and it competes well with more sophisticated classifiers under the zero-one loss function. However, it has been shown that naive Bayes performs poorly in both estimation and classification in some cases when the features are correlated. Researchers have therefore developed many approaches to free naive Bayes from this basic assumption, which is rarely satisfied in the real world. In this paper, we propose a new classifier that is also free of the independence assumption: it evaluates the dependence among features through pair copulas constructed via a graphical model called a D-Vine tree. This tree structure decomposes the multivariate dependence into many bivariate dependencies and thus makes it possible to evaluate the dependence of features easily and efficiently, even for data with high dimension and large sample size. We further extend the proposed method to features with discrete-valued entries. Experimental studies show that the proposed method performs well in both the continuous and discrete cases.
In this study, we consider different methods of estimating the unknown parameters of a two-parameter unit-Gamma (UG) distribution from the frequentist point of view. First, we briefly describe the different frequentist approaches: maximum likelihood estimators, moments estimators, least squares estimators, maximum product of spacings estimators, the Cramér-von Mises method, and the Anderson-Darling method together with four variants of the Anderson-Darling test. Monte Carlo simulations are then performed to compare the performance of these estimation methods for both small and large samples in terms of bias and root mean squared error. Also, for each method of estimation, we consider interval estimation using the bootstrap method and calculate the coverage probability and the average width of the bootstrap confidence intervals. The study reveals that the maximum product of spacings estimators and the Anderson-Darling 2 (AD2) estimators are highly competitive with the maximum likelihood estimators in small and large samples. Finally, two real data sets are analyzed for illustrative purposes.
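As an illustration of one of the estimators listed above, the following is a minimal sketch of maximum product of spacings estimation. It assumes the common parameterization in which X = exp(-Y) with Y following a Gamma distribution with the given shape and rate; the abstract does not state the parameterization, so this is an assumption, and the function names `unit_gamma_cdf` and `mps_fit` are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def unit_gamma_cdf(x, shape, rate):
    """CDF of X = exp(-Y), Y ~ Gamma(shape, rate): F(x) = P(Y >= -log x)."""
    return stats.gamma.sf(-np.log(x), shape, scale=1.0 / rate)


def mps_fit(x):
    """Maximum product of spacings estimate of (shape, rate).

    ASSUMPTION: unit-Gamma defined through X = exp(-Y) as described above.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = -np.log(x)
    # Method-of-moments starting values on the Gamma scale.
    start = np.array([y.mean() ** 2 / y.var(), y.mean() / y.var()])

    def neg_mean_log_spacing(log_params):
        shape, rate = np.exp(log_params)  # optimize on the log scale
        cdf = np.concatenate(([0.0], unit_gamma_cdf(x, shape, rate), [1.0]))
        spacings = np.clip(np.diff(cdf), 1e-300, None)  # guard against ties
        return -np.mean(np.log(spacings))

    res = optimize.minimize(neg_mean_log_spacing, np.log(start),
                            method="Nelder-Mead")
    shape_hat, rate_hat = np.exp(res.x)
    return float(shape_hat), float(rate_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = np.exp(-rng.gamma(shape=2.0, scale=1.0 / 3.0, size=500))
    print(mps_fit(sample))
```

A bootstrap confidence interval of the kind mentioned in the abstract could be obtained by resampling `sample` with replacement, calling `mps_fit` on each resample, and taking percentiles of the resulting estimates.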
Abstract: This paper presents a permutation test for the incomplete pairs setting. This situation arises in both observational and experimental studies when some of the data are in the form of a paired sample and the rest of the data comprise two independent samples. The proposed method uses the data from the two types of samples to test the difference between the mean responses. Our test statistic combines the observed mean difference for the complete pairs with the difference between the two means of the independent samples. The randomizations are carried out as is typically done with standard permutation tests for paired and independent samples. We show by a simulation study that our statistic performs well in comparison to other methods.
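The procedure described above can be sketched as follows. This is a rough illustration, not the authors' exact statistic: the abstract does not say how the paired and independent components are combined, so the weight `w` (the share of paired observations) is an assumption, and the function name `incomplete_pairs_perm_test` is hypothetical. Pairs are randomized by sign flips of the within-pair differences and the independent samples by shuffling group labels, as in the standard permutation tests.

```python
import numpy as np


def incomplete_pairs_perm_test(pair_x, pair_y, ind_x, ind_y,
                               n_perm=2000, seed=0):
    """Two-sided permutation test combining paired and independent samples."""
    rng = np.random.default_rng(seed)
    d = np.asarray(pair_x, dtype=float) - np.asarray(pair_y, dtype=float)
    ind_x = np.asarray(ind_x, dtype=float)
    ind_y = np.asarray(ind_y, dtype=float)
    pooled = np.concatenate([ind_x, ind_y])
    n_x = ind_x.size
    # ASSUMPTION: weight the paired component by its share of observations.
    w = d.size / (d.size + pooled.size)

    def statistic(diffs, values):
        paired_part = diffs.mean()
        indep_part = values[:n_x].mean() - values[n_x:].mean()
        return w * paired_part + (1.0 - w) * indep_part

    observed = statistic(d, pooled)
    exceed = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=d.size)  # swap within pairs
        shuffled = rng.permutation(pooled)            # reassign group labels
        if abs(statistic(signs * d, shuffled)) >= abs(observed):
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    stat, p = incomplete_pairs_perm_test(
        rng.normal(2, 1, 20), rng.normal(0, 1, 20),
        rng.normal(2, 1, 15), rng.normal(0, 1, 15))
    print(stat, p)
```

Adding one to both the numerator and the denominator of the p-value counts the observed arrangement among the permutations, so the estimate is never exactly zero.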
The study of semiparametric families is useful because it provides methods of extending families of distributions to add flexibility in fitting data. The main aim of this paper is to introduce a class of bivariate semiparametric families of distributions. One special bivariate family from this class is discussed in detail, together with its sub-models and various properties. In most cases, the joint probability distribution, joint distribution, and joint hazard functions can be expressed in compact forms. Maximum likelihood and Bayesian estimation are considered for the vector of unknown parameters. For illustrative purposes, a data set is re-analyzed, and the performance of the model is quite satisfactory. A simulation study is performed to assess the performance of the estimators.
Abstract: Observational studies of relatively large data sets can have hidden heterogeneity with respect to causal effects and propensity scores, that is, the patterns by which study subjects are exposed to a putative cause. This underlying heterogeneity can be crucial for causal inference in any observational study because it is systematically generated and structured by covariates that influence the cause and/or its related outcomes. When the causal inference problem is addressed in view of this data structure, machine learning techniques such as tree analysis arise naturally. Kang, Su, Hitsman, Liu and Lloyd-Jones (2012) proposed the Marginal Tree (MT) procedure to explore both the confounding and interacting effects of the covariates on causal inference. In this paper, we extend the MT method to the case of binary responses, along with a clear exposition of its relationship with the established causal odds ratio. We assess the causal effect of dieting on emotional distress using both a real data set from Lalonde’s National Supported Work Demonstration Analysis (NSW) and a simulated data set from the National Longitudinal Study of Adolescent Health (Add Health).
A technique is proposed to estimate the conception rate using the distribution of the first birth interval of recently married women. The proposed technique adjusts for the truncation and selection effects present in cross-sectional data. Real data from NFHS-3 and NFHS-4 are used for illustration.
Abstract: In this paper, we introduce a Bayesian analysis for bivariate geometric distributions applied to lifetime data in the presence of covariates, censored data, and a cure fraction, using Markov Chain Monte Carlo (MCMC) methods. We show that the use of a discrete bivariate geometric distribution can bring some computational advantages compared to the standard bivariate exponential lifetime distributions introduced in the literature for continuous lifetime data, such as the Block and Basu bivariate exponential distribution. Posterior summaries of interest are obtained using the popular OpenBUGS software. A numerical illustration is given using a medical data set on diabetic retinopathy.
Abstract: Ranked set sampling and some of its variants have been applied successfully in areas such as industrial statistics, economics, environmental and ecological studies, biostatistics, and statistical genetics. Ranked set sampling is a sampling method that is more efficient than simple random sampling. It is also well known that, in parametric inference, the Fisher information about the unknown parameter of the underlying distribution is larger in a ranked set sample (RSS) than in a simple random sample (SRS) of the same size. In this paper, we consider the Farlie-Gumbel-Morgenstern (FGM) family and study information measures of RSS data such as Shannon’s entropy, Rényi entropy, mutual information, and Kullback-Leibler (KL) information. We also investigate their properties and compare them with those of SRS data.
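To make one of these information measures concrete, the sketch below numerically evaluates the mutual information implied by the FGM copula, whose density on the unit square is c(u, v) = 1 + θ(1 − 2u)(1 − 2v) with |θ| ≤ 1. It covers only the underlying bivariate dependence, not the RSS versus SRS comparison studied in the paper, and the function names `fgm_density` and `fgm_mutual_information` are hypothetical.

```python
import numpy as np


def fgm_density(u, v, theta):
    """Density of the Farlie-Gumbel-Morgenstern copula on the unit square."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def fgm_mutual_information(theta, grid=400):
    """Mutual information (in nats) of an FGM pair with continuous margins.

    For continuous margins the mutual information equals the integral of
    c * log(c) over the unit square; it is approximated here with a
    midpoint rule on a grid-by-grid lattice.
    """
    if abs(theta) > 1.0:
        raise ValueError("the FGM copula requires |theta| <= 1")
    mid = (np.arange(grid) + 0.5) / grid   # cell midpoints, never 0 or 1
    u, v = np.meshgrid(mid, mid)
    c = fgm_density(u, v, theta)           # strictly positive at midpoints
    return float(np.mean(c * np.log(c)))


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        print(theta, fgm_mutual_information(theta))
```

For small θ the value is close to θ²/18, which reflects the well-known fact that the FGM family can only represent weak dependence.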
The odd inverse Pareto-Weibull distribution is introduced as a new lifetime distribution based on the inverse Pareto distribution and the T-X family. Some mathematical properties of the new distribution are studied. The method of maximum likelihood is used to estimate the model parameters, and the observed Fisher information matrix is derived. The importance and flexibility of the proposed model are assessed using a real data set.