On the classical estimation of bivariate copula-based seemingly unrelated Tobit models through the proposed inference function for augmented margins method
Imbalanced datasets present a significant challenge for machine learning models, often leading to predictions biased toward the majority class. To address this issue, data augmentation techniques are widely used to generate new samples for the minority class. In this paper, however, we challenge the common assumption that data augmentation is necessary to improve predictions on imbalanced datasets. Instead, we argue that adjusting the classifier cutoff without data augmentation can produce results similar to those of oversampling techniques. Our study provides theoretical and empirical evidence to support this claim. Our findings contribute to a better understanding of the strengths and limitations of different approaches to dealing with imbalanced data, and help researchers and practitioners make informed decisions about which methods to use for a given task.
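The cutoff argument can be made concrete under an idealized assumption: if a model is perfectly calibrated and k-fold oversampling of the minority class acts purely as a change of class prior, then the oversampled posterior odds are k times the original odds, so thresholding the oversampled posterior at 0.5 is the same as thresholding the original posterior at $1/(1+k)$. The sketch below checks this equivalence numerically; the function names and the choice $k = 9$ (a 10% minority rebalanced to 50/50) are illustrative assumptions, not the paper's own procedure or proof.

```python
import numpy as np


def oversampled_posterior(p, k):
    """Posterior P(y=1 | x) that an idealized, calibrated model would report
    after replicating the minority class (y=1) k times, given the original
    posterior p. Oversampling multiplies the prior odds, hence the posterior
    odds, by k."""
    p = np.asarray(p, dtype=float)
    return k * p / (k * p + (1.0 - p))


def equivalent_cutoff(k, base_cutoff=0.5):
    """Cutoff on the ORIGINAL posterior that reproduces `base_cutoff` applied
    to the oversampled posterior: solve k p / (k p + 1 - p) = c for p."""
    c = base_cutoff
    return c / (c + k * (1.0 - c))


# Usage: posteriors from a hypothetical model trained on the original data.
rng = np.random.default_rng(0)
p = rng.uniform(size=1_000)
k = 9.0  # hypothetical: a 10% minority class rebalanced to 50/50

rule_oversample = oversampled_posterior(p, k) >= 0.5
rule_cutoff = p >= equivalent_cutoff(k)
print(equivalent_cutoff(k), np.array_equal(rule_oversample, rule_cutoff))
```

Because the oversampled posterior is a monotone transform of the original one, the two decision rules should coincide apart from floating-point ties exactly at the cutoff. Real models are rarely perfectly calibrated, which is why the empirical comparison in the paper still matters.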
Inspired by the impressive successes of compressed sensing-based machine learning algorithms, we develop efficient data augmentation-based Gibbs samplers for Bayesian high-dimensional classification models by compressing the design matrix to a much lower dimension. Particular care is taken in choosing the projection mechanism, and an adaptive voting rule is employed to reduce sensitivity to the random projection matrix. Focusing on the high-dimensional probit regression model, we note that the naive implementation of the data augmentation-based Gibbs sampler is not robust to collinearity in the design matrix, a setting ubiquitous in $n < p$ problems. We demonstrate that a simple fix based on joint updates of parameters in the latent space circumvents this issue. With a computationally efficient MCMC scheme in place, we introduce an ensemble classifier by creating R ($\sim 25$–50) projected copies of the design matrix and then running R classification models, one on each projected design matrix, in parallel. We combine the output from the R replications via an adaptive voting scheme. Our scheme is inherently parallelizable and can take advantage of modern computing environments, which are often equipped with multiple cores. The empirical success of our methodology is illustrated through extensive simulations and gene expression data applications. We also extend our methodology to a high-dimensional logistic regression model and carry out numerical studies to showcase its efficacy.
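A minimal sketch of the two ingredients described above, under stated assumptions: a Gaussian random projection $\Phi$ with entries $N(0, 1/d)$ compresses the $n \times p$ design matrix to $n \times d$, and the standard Albert-Chib data-augmentation Gibbs sampler fits a probit model with an $N(0, \tau^2 I)$ prior on the projected design. This is the naive sampler, not the authors' joint latent-space update, and plain majority voting stands in for their adaptive voting rule; the function names, $d$, $\tau^2$, and the iteration counts are hypothetical choices.

```python
import numpy as np
from scipy.stats import norm, truncnorm


def probit_gibbs(Z, y, n_iter=200, burn=50, tau2=1.0, rng=None):
    """Albert-Chib data-augmentation Gibbs sampler for probit regression
    with a N(0, tau2 * I) prior on the coefficients. Returns kept draws."""
    rng = np.random.default_rng(rng)
    n, d = Z.shape
    A = Z.T @ Z + np.eye(d) / tau2  # posterior precision of beta given z
    L = np.linalg.cholesky(A)
    beta = np.zeros(d)
    draws = []
    for it in range(n_iter):
        mu = Z @ beta
        # Latent z_i ~ N(mu_i, 1), truncated to (0, inf) if y_i = 1
        # and to (-inf, 0) if y_i = 0; truncnorm takes standardized bounds.
        a = np.where(y == 1, -mu, -np.inf)
        b = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(a, b, loc=mu, scale=1.0, size=n, random_state=rng)
        # beta | z ~ N(A^{-1} Z^T z, A^{-1}); L^{-T} e has covariance A^{-1}.
        mean = np.linalg.solve(A, Z.T @ z)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(d))
        if it >= burn:
            draws.append(beta)
    return np.array(draws)


def projected_ensemble(X, y, X_new, R=5, d=10, rng=0, **gibbs_kwargs):
    """Fit R probit models on Gaussian random projections of X and combine
    them by plain majority vote (a stand-in for an adaptive voting rule)."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    votes = []
    for _ in range(R):
        Phi = rng.normal(scale=1.0 / np.sqrt(d), size=(d, p))  # d x p projection
        draws = probit_gibbs(X @ Phi.T, y, rng=rng, **gibbs_kwargs)
        # Posterior predictive P(y = 1 | x_new), averaged over kept draws.
        prob = norm.cdf(X_new @ Phi.T @ draws.T).mean(axis=1)
        votes.append(prob > 0.5)
    return (np.mean(votes, axis=0) > 0.5).astype(int)
```

The R replicates are independent given the data, so the loop could be distributed across cores (for example with `concurrent.futures.ProcessPoolExecutor`) to mirror the parallel structure described above. An odd R avoids ties in the majority vote.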
Abstract: The association between bivariate binary responses has been studied using Pearson's correlation coefficient, the odds ratio, and the tetrachoric correlation coefficient. This paper introduces a copula to model the association. Numerical comparisons between the proposed method and the existing methods are presented, and the results show that these methods perform comparably. However, the copula method has a clearer interpretation and is easier to extend to bivariate responses with three or more ordinal categories. In addition, a goodness-of-fit test is used for model selection. Applications of the method to two real data sets are also presented.
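To make the copula construction concrete: one common way to attach a copula to binary margins is to set $Y_j = 1\{U_j \le p_j\}$ with $(U_1, U_2)$ drawn from a copula $C$, so that $P(Y_1 = 1, Y_2 = 1) = C(p_1, p_2)$ and the other three cells follow from the margins. The sketch below uses the Clayton family purely as an illustrative assumption (the abstract does not name the family) and reports the implied odds ratio, one of the association measures the copula approach is compared against; all function names are hypothetical.

```python
import numpy as np


def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) for theta > 0 (illustrative family)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def binary_cell_probs(p1, p2, theta):
    """2x2 table [[P(0,0), P(0,1)], [P(1,0), P(1,1)]] for binary margins
    P(Y_j = 1) = p_j, with Y_j = 1{U_j <= p_j} and (U1, U2) ~ Clayton(theta)."""
    p11 = clayton_cdf(p1, p2, theta)
    p10 = p1 - p11            # P(Y1 = 1, Y2 = 0)
    p01 = p2 - p11            # P(Y1 = 0, Y2 = 1)
    p00 = 1.0 - p1 - p2 + p11
    return np.array([[p00, p01], [p10, p11]])


def odds_ratio(table):
    """Odds ratio P(0,0) P(1,1) / (P(0,1) P(1,0)) of a 2x2 probability table."""
    (p00, p01), (p10, p11) = table
    return p00 * p11 / (p01 * p10)


table = binary_cell_probs(p1=0.3, p2=0.4, theta=2.0)
print(table)
print(odds_ratio(table))
```

Because every copula lies between the Fréchet bounds, all four cells are guaranteed to be nonnegative. As $\theta \to 0$ the Clayton copula tends to the independence copula, so $P(Y_1 = 1, Y_2 = 1) \to p_1 p_2$ and the odds ratio tends to 1.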
Abstract: Copulas have recently emerged as practical methods for multivariate modeling. To our knowledge, only a limited amount of work has been done to apply copula-based modeling in context analysis. In this study, we generalize the Clayton copula using an appropriate weight function. Several examples of bivariate distributions generalized with the weighted Clayton copula are given, and the properties of the generalized Clayton copula are presented. Neither the Clayton copula nor the weighted Clayton model can be used for negative dependence; both have been used to study lower (left) tail dependence, a property that is stronger in the weighted Clayton model than in the ordinary Clayton copula. It is also shown that the generalized Clayton copula is suitable for probabilistic modeling of hydrological data.
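The tail behaviour mentioned above can be checked for the ordinary Clayton copula $C(u, v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ with $\theta > 0$, whose lower tail dependence coefficient is $\lambda_L = \lim_{u \to 0} C(u, u)/u = 2^{-1/\theta}$ (its upper tail coefficient is 0). The sketch below compares the closed form with $C(u, u)/u$ at shrinking $u$. The weighted extension itself is not reproduced, since its weight function is not given here; the function names are illustrative.

```python
import numpy as np


def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula in the usual theta > 0 parameterization
    (positive dependence only)."""
    if theta <= 0:
        raise ValueError("this sketch only covers theta > 0")
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def lower_tail_dependence(theta):
    """Closed form lambda_L = lim_{u -> 0} C(u, u) / u = 2^(-1/theta)."""
    return 2.0 ** (-1.0 / theta)


def kendall_tau(theta):
    """Kendall's tau of the Clayton copula: theta / (theta + 2)."""
    return theta / (theta + 2.0)


theta = 2.0
for u in (1e-2, 1e-4, 1e-6):
    # Empirical ratio C(u, u) / u approaches lambda_L as u -> 0.
    print(u, clayton_cdf(u, u, theta) / u)
print("lambda_L:", lower_tail_dependence(theta), "tau:", kendall_tau(theta))
```

Since $C(u, u)/u = (2 - u^{\theta})^{-1/\theta}$, the printed ratios approach $2^{-1/2} \approx 0.7071$ for $\theta = 2$. Kendall's $\tau$ offers a scale-free way to compare dependence strength across parameter values when contrasting the ordinary and weighted models.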