Abstract: In this study, we compared various block bootstrap methods in terms of parameter estimation and the bias and mean squared error (MSE) of the bootstrap estimators. The comparison is based on four real-world examples and an extensive simulation study covering a range of sample sizes, parameter values and block lengths. Our results reveal that the ordered and sufficient ordered non-overlapping block bootstrap methods proposed by Beyaztas et al. (2016) yield more accurate parameter estimates with smaller MSE than conventional methods. In addition, the sufficient non-overlapping block bootstrap method and its ordered version attain the smallest MSE for the sample mean among all methods considered.
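As a concrete point of reference for the conventional baseline against which the ordered and sufficient variants are compared, the sketch below implements a plain non-overlapping block bootstrap of the sample mean. The function name and defaults are illustrative assumptions; this is not the ordered/sufficient procedure of Beyaztas et al. (2016) itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def nbb_mean(series, block_len, n_boot=500, rng=rng):
    """Non-overlapping block bootstrap of the sample mean.

    The series is cut into consecutive non-overlapping blocks of length
    block_len; each bootstrap series is built by resampling whole blocks
    with replacement, which preserves short-range dependence within blocks.
    Returns the mean and variance of the bootstrap distribution of the mean.
    """
    series = np.asarray(series, dtype=float)
    n_blocks = len(series) // block_len
    # drop the tail that does not fill a complete block
    blocks = series[: n_blocks * block_len].reshape(n_blocks, block_len)
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_blocks, size=n_blocks)
        means[b] = blocks[idx].mean()
    return means.mean(), means.var()
```

The block length trades bias against variance: longer blocks capture more of the dependence structure but leave fewer blocks to resample from, which is why the abstract's comparison varies block length alongside sample size.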
It is well known that, under certain regularity conditions, the bootstrap sampling distributions of common statistics are consistent with their true sampling distributions. However, these consistency results rely heavily on the underlying regularity conditions, and a failure to satisfy some of them may lead to a serious departure from consistency. Consequently, sampling distributions based on the 'sufficient bootstrap' method (which uses only the distinct units in a bootstrap sample in order to reduce the computational burden for larger sample sizes) will also be inconsistent. In this paper, we combine the ideas of the sufficient and m-out-of-n (m/n) bootstrap methods to regain consistency. We further propose an iterated version of this bootstrap method for non-regular cases, and our simulation study reveals that the proposed iterated sufficient m/n bootstrap attains coverage accuracies similar to, or even better than, percentile bootstrap confidence intervals, with less computational time in each case.
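The resampling step being combined here can be sketched as follows, taking the sample mean as the statistic of interest. The function name and defaults are illustrative assumptions, and the iterated (double-bootstrap) calibration step described in the abstract is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def sufficient_mn_bootstrap(sample, m, n_boot=500, rng=rng):
    """Sketch of one round of a sufficient m-out-of-n bootstrap.

    Each resample draws m (< n) observations with replacement from the
    sample, then keeps only the distinct units, following the 'sufficient
    bootstrap' idea of discarding repeated draws to cut computation.
    Returns the bootstrap distribution of the mean of the distinct units.
    """
    sample = np.asarray(sample, dtype=float)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        draw = rng.choice(sample, size=m, replace=True)
        distinct = np.unique(draw)  # keep distinct units only
        stats[b] = distinct.mean()
    return stats
```

A percentile confidence interval then comes from the empirical quantiles of the returned statistics; choosing m with m/n → 0 as n grows is what restores consistency in the m/n framework.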
Abstract: We propose two classes of nonparametric point estimators of θ = P(X < Y ) in the case where (X, Y ) are paired, possibly dependent, absolutely continuous random variables. The proposed estimators are based on nonparametric estimators of the joint density of (X, Y ) and the distribution function of Z = Y − X. We explore the use of several density and distribution function estimators and characterise the convergence of the resulting estimators of θ. We consider the use of bootstrap methods to obtain confidence intervals. The performance of these estimators is illustrated using simulated and real data. These examples show that not accounting for pairing and dependence may lead to erroneous conclusions about the relationship between X and Y.
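A minimal empirical version of this idea, assuming the simple plug-in estimate P(Z > 0) rather than the smoother density-based estimators of the paper, together with a pair-preserving percentile bootstrap interval, might look like:

```python
import numpy as np

rng = np.random.default_rng(2)

def theta_hat(x, y):
    """Plug-in estimate of theta = P(X < Y) for paired data.

    Since P(X < Y) = P(Z > 0) with Z = Y - X, the empirical analogue is
    simply the fraction of positive paired differences.
    """
    z = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    return np.mean(z > 0)

def paired_boot_ci(x, y, n_boot=1000, alpha=0.05, rng=rng):
    """Percentile bootstrap CI that resamples (x, y) PAIRS jointly.

    Resampling pairs, rather than x and y separately, preserves the
    dependence between X and Y that the abstract warns against ignoring.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = theta_hat(x[idx], y[idx])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```

Resampling x and y independently instead would implicitly assume X ⊥ Y, which is exactly the erroneous-conclusion scenario the examples in the paper illustrate.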
Ensemble techniques have been gaining traction among machine learning models for supervised tasks due to their strong predictive capacity compared with traditional approaches. The random forest is considered one of the best off-the-shelf algorithms owing to its flexibility and robust performance on both regression and classification tasks. In this paper, the random machines method is applied to simulated and benchmark data sets and compared with established random forest models. The results on simulated data show that the random machines method has better predictive performance than random forest on most of the investigated data sets. Three real-data applications demonstrate that random machines may be used to solve real-world problems with competitive performance.
Improving statistical learning models to increase efficiency in solving classification or regression problems is a goal pursued by the scientific community. In particular, the support vector machine has become one of the most successful algorithms for this task. Despite the strong predictive capacity of the support vector approach, its performance relies on the selection of the model's hyperparameters, such as the kernel function to be used. Traditional procedures for choosing the kernel function are computationally expensive and, in general, become infeasible for certain datasets. In this paper, we propose a novel framework for kernel function selection called Random Machines. The results show improved accuracy and reduced computational time, evaluated over simulation scenarios and real-data benchmarks.
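Under the assumption that a random-machines-style ensemble amounts to bagged support vector machines whose kernels are sampled at random (uniformly here; the published method weights kernels by their estimated accuracy, a step this sketch omits), a minimal illustration using scikit-learn could be:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)

KERNELS = ("linear", "rbf", "poly", "sigmoid")

def random_machines_fit(X, y, n_estimators=25, rng=rng):
    """Fit an ensemble of SVCs, each on a bootstrap sample of the data
    and with a kernel drawn at random from KERNELS.

    This sidesteps an exhaustive (and expensive) search over kernels by
    letting the ensemble mix them instead.
    """
    models = []
    n = len(y)
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)          # bootstrap sample
        kernel = KERNELS[rng.integers(0, len(KERNELS))]
        models.append(SVC(kernel=kernel).fit(X[idx], y[idx]))
    return models

def random_machines_predict(models, X):
    """Majority vote over the base learners (binary 0/1 labels assumed)."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Because each base learner sees a different bootstrap sample and kernel, the ensemble hedges against any single poorly suited kernel choice, which is the intuition behind avoiding a costly kernel-selection search.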