Improving statistical learning models to solve classification and regression problems more efficiently is a goal pursued by the scientific community. In particular, the support vector machine has become one of the most successful algorithms for these tasks. Despite the strong predictive capacity of the support vector approach, its performance relies on the selection of the model's hyperparameters, such as the kernel function to be used. The traditional procedures for choosing the kernel function are, in general, computationally expensive and become infeasible for certain datasets. In this paper, we propose a novel framework for kernel function selection called Random Machines. The results show improved accuracy and reduced computational time, evaluated over simulation scenarios and real-data benchmarks.
Abstract: Bivariate data analysis plays a key role in several areas where the variables of interest are obtained in paired form, leading to the consideration of possible association measures between them. In most cases, it is common to use well-known statistical measures such as the Pearson correlation and Kendall’s and Spearman’s coefficients. However, these measures may not represent the real correlation or dependence structure between the variables. Fisher and Switzer (1985) proposed a rank-based graphical tool, the so-called chi-plot, which, in conjunction with its Monte Carlo based confidence interval, can help detect the presence of association in a random sample from a continuous bivariate distribution. In this article we construct the asymptotic confidence interval for the chi-plot. Via a Monte Carlo simulation study we find that the coverage probabilities of the asymptotic and the Monte Carlo based confidence intervals are similar. An immediate advantage of the asymptotic confidence interval over the Monte Carlo based one is that it is computationally less expensive and allows any confidence level to be chosen. Moreover, it can be implemented straightforwardly in existing statistical software. The chi-plot approach is illustrated on data on average intelligence and atheism rates across nations.
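As a rough illustration of the rank-based construction behind the chi-plot, the sketch below computes the plotted pairs (lambda_i, chi_i) from empirical distribution values, following the usual textbook description of Fisher and Switzer's statistic. The function name `chi_plot_points` is hypothetical, and the confidence limits discussed in the abstract (Monte Carlo based or asymptotic) are deliberately not reproduced here.

```python
import math


def _sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def chi_plot_points(xs, ys):
    """Compute (lambda_i, chi_i) pairs for a chi-plot.

    Hypothetical helper, assuming the standard definitions:
      F_i, G_i: marginal empirical distribution values at point i,
      H_i:      joint empirical distribution value at point i,
    each computed from the other n - 1 observations.
    Points where the denominator of chi_i vanishes are skipped.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("xs and ys must have equal length of at least 3")
    points = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        f = sum(xs[j] <= xs[i] for j in others) / (n - 1)
        g = sum(ys[j] <= ys[i] for j in others) / (n - 1)
        h = sum(xs[j] <= xs[i] and ys[j] <= ys[i] for j in others) / (n - 1)
        denom = f * (1 - f) * g * (1 - g)
        if denom <= 0:
            continue  # chi_i is undefined at the extremes of the sample
        chi = (h - f * g) / math.sqrt(denom)
        s = _sign((f - 0.5) * (g - 0.5))
        lam = 4 * s * max((f - 0.5) ** 2, (g - 0.5) ** 2)
        points.append((lam, chi))
    return points


# Usage: perfectly increasing data should give chi_i close to 1.
if __name__ == "__main__":
    data = list(range(1, 11))
    for lam, chi in chi_plot_points(data, data):
        print(round(lam, 3), round(chi, 3))
```

Under independence the chi_i values are expected to scatter around zero, so a band around zero is what a confidence interval for the chi-plot is meant to calibrate.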
Abstract: In this paper we propose a new bivariate long-term distribution based on the Farlie-Gumbel-Morgenstern copula model. The proposed model allows for the presence of censored data and covariates in the cure parameter. For inferential purposes, a Bayesian approach via Markov Chain Monte Carlo (MCMC) is considered. Further, some discussion of model selection criteria is given. In order to examine outlying and influential observations, we develop Bayesian case-deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated on artificial and real HIV data.
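For readers unfamiliar with the Farlie-Gumbel-Morgenstern (FGM) copula named above, the following minimal sketch evaluates its standard distribution function C(u, v) = uv[1 + theta(1 - u)(1 - v)] and density, with theta in [-1, 1]. The function names are hypothetical, and the sketch covers only the copula itself, not the long-term (cure) model, censoring, or covariates of the paper.

```python
def _check_fgm_args(u, v, theta):
    """Validate the arguments of the FGM copula."""
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    if not (-1.0 <= theta <= 1.0):
        raise ValueError("theta must lie in [-1, 1]")


def fgm_cdf(u, v, theta):
    """FGM copula distribution function C(u, v)."""
    _check_fgm_args(u, v, theta)
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_density(u, v, theta):
    """FGM copula density c(u, v) = 1 + theta (1 - 2u)(1 - 2v)."""
    _check_fgm_args(u, v, theta)
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


# Usage: theta = 0 recovers independence, C(u, v) = u * v.
if __name__ == "__main__":
    print(fgm_cdf(0.3, 0.4, 0.0))
    print(fgm_cdf(0.3, 0.4, 0.8))
```

The FGM family can only represent fairly weak dependence, which is worth keeping in mind when interpreting the association parameter.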
Abstract: In this paper we propose a new three-parameter lifetime distribution with decreasing hazard function, the long-term exponential geometric distribution. The new distribution arises in latent competing risks scenarios, where the lifetime associated with a particular risk is not observable; rather, we observe only the minimum lifetime value among all risks, and long-term survivors are present. The properties of the proposed distribution are discussed, including its probability density function and explicit algebraic formulas for its survival and hazard functions, order statistics, Bonferroni function and the Lorenz curve. Parameter estimation is based on the usual maximum likelihood approach. We compare the new distribution with its particular case, the long-term exponential distribution, as well as with the long-term Weibull distribution on two real datasets, observing its potential and competitiveness in comparison with a usual lifetime distribu
Abstract: In any sports competition, there is a strong interest in knowing which team will be the champion at the end of the championship. Besides this, the final result of a match, the chance of a team qualifying for a specific tournament, the chance of being relegated, the best attack and the best defense, among others, are also of interest. In this paper we present a simple method with good predictive quality, easy implementation and low computational effort, which allows the calculation of all the quantities of interest above. Following Lee (1997), we estimate the average goals scored by each team by assuming that the number of goals scored by a team in a match follows a univariate Poisson distribution. However, we consider linear models that express the sum and the difference of goals scored in terms of five covariates: the goal average in a match, the home-team advantage, the team’s offensive power, the opponent team’s defensive power and a crisis indicator. The methodology is applied to the 2008-2009 English Premier League.
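To make the Poisson assumption concrete, here is a minimal sketch of how match-outcome probabilities could be obtained once the two teams' average goals are available. It assumes, for illustration only, that the two goal counts are independent; the function names, the truncation at `max_goals`, and the example means are hypothetical and are not taken from the paper, whose linear models for the sum and difference of goals are not reproduced.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_outcome_probs(home_mean, away_mean, max_goals=15):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson goal counts (an illustrative assumption)
    and truncates each score at max_goals, which is negligible for
    typical football scoring rates.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        p_home = poisson_pmf(h, home_mean)
        for a in range(max_goals + 1):
            p = p_home * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Usage with made-up scoring averages.
if __name__ == "__main__":
    print(match_outcome_probs(1.5, 1.1))
```

Quantities such as championship or relegation chances can then be approximated by simulating the remaining fixtures many times with these match probabilities.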
Abstract: This paper extends the analysis of the bivariate Seemingly Unrelated (SUR) Tobit by modeling its nonlinear dependence structure through copulas and assuming non-normal marginal error distributions. For model estimation, copula methods enable the use of the (classical) Inference Function for Margins (IFM) method by Joe and Xu (1996), which is computationally more attractive (feasible) than the full maximum likelihood approach. However, our simulation study shows that the IFM method provides a biased estimate of the copula parameter in the presence of censored observations in both margins. In order to obtain an unbiased estimate of the copula association parameter, we propose a modified version of the IFM method, which we refer to as Inference Function for Augmented Margins (IFAM). Since the usual asymptotic approach, that is, the computation of the asymptotic covariance matrix of the parameter estimates, is troublesome, we propose the use of resampling procedures (bootstrap methods) to obtain confidence intervals for the copula-based SUR Tobit model parameters. The satisfactory results from the simulation and empirical studies indicate the adequate performance of our proposed model and methods. We illustrate our procedure using bivariate data on consumption of salad dressings and lettuce by U.S. individuals.
Compositional data consist of vectors whose components are positive and lie in the interval (0,1), representing proportions or fractions of a “whole”. The sum of these components must be equal to one. Compositional data are present in different areas of knowledge, such as geology, economics and medicine, among many others. In this paper, we propose a new statistical tool for volleyball data, i.e., we introduce a Bayesian analysis for compositional regression applying the additive log-ratio (ALR) transformation and assuming uncorrelated and correlated errors. The Bayesian inference procedure is based on Markov Chain Monte Carlo (MCMC) methods. The methodology is applied to an artificial data set and a real volleyball data set.
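The additive log-ratio transformation mentioned above maps a composition with D positive parts summing to one into D - 1 unconstrained real coordinates, which is what makes ordinary regression errors plausible. The sketch below is a minimal illustration, assuming the last component is used as the reference part; the function names and the example composition are hypothetical.

```python
import math


def alr(composition):
    """Additive log-ratio transform with the last part as reference."""
    if any(part <= 0 for part in composition):
        raise ValueError("all components must be strictly positive")
    if not math.isclose(sum(composition), 1.0, abs_tol=1e-9):
        raise ValueError("components must sum to one")
    reference = composition[-1]
    return [math.log(part / reference) for part in composition[:-1]]


def alr_inverse(coords):
    """Map ALR coordinates back to a composition that sums to one."""
    exps = [math.exp(y) for y in coords]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]


# Usage with a made-up three-part composition.
if __name__ == "__main__":
    shares = [0.2, 0.3, 0.5]
    coords = alr(shares)
    print(coords)
    print(alr_inverse(coords))
```

The choice of reference part changes the coordinates but not the fitted compositions, so it is usually made for interpretability.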
In this paper a new two-parameter distribution is proposed. This new model provides more flexibility in modeling data with increasing and bathtub-shaped hazard rate functions. Several statistical and reliability properties of the proposed model are also presented, such as moments, moment generating function, order statistics and stress-strength reliability. The maximum likelihood estimators of the parameters are discussed, as well as a bias-corrective approach based on bootstrap techniques. A numerical simulation is carried out to examine the bias and the mean squared error of the proposed estimators. Finally, an application using a real data set is presented to illustrate our model.