Sampling-based Gaussian Mixture Regression for Big Data
Volume 21, Issue 1 (2023), pp. 158–172
Pub. online: 9 August 2022
Type: Statistical Data Science
Open Access
Received: 29 May 2022
Accepted: 2 July 2022
Published: 9 August 2022
Abstract
This paper proposes a nonuniform subsampling method for finite mixtures of regression models that reduces the computational burden of fitting them to large data sets. A general estimator based on a subsample is investigated, and its asymptotic normality is established. We derive optimal subsampling probabilities for the data points, chosen to minimize the asymptotic mean squared errors of the general estimator and of linearly transformed estimators. Since these probabilities depend on unknown parameters, an implementable two-step algorithm is developed. We first approximate the optimal subsampling probabilities using a pilot sample. We then select a subsample using the approximated probabilities and compute estimates from that subsample. We evaluate the proposed method in a simulation study and present a real data example using appliance energy data.
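The two-step procedure outlined in the abstract (pilot sample, approximate probabilities, weighted subsample fit) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation, which is written in R. In particular, the sampling probabilities here are taken proportional to the norm of each observation's score with respect to the regression coefficients and are mixed with a uniform distribution for stability; the paper's actual optimal probabilities are derived in the article and may differ. The function names `responsibilities`, `weighted_em`, `score_norm_probs`, and `two_step_subsample`, the `mix` parameter, and the simulated data are all hypothetical.

```python
import numpy as np

def responsibilities(X, y, beta, sigma2, pi):
    """Posterior component probabilities and residuals, both of shape (n, K)."""
    res = y[:, None] - X @ beta.T
    logp = (np.log(np.clip(pi, 1e-300, None))
            - 0.5 * np.log(2 * np.pi * sigma2) - 0.5 * res**2 / sigma2)
    logp -= logp.max(axis=1, keepdims=True)      # stabilise the exponentials
    R = np.exp(logp)
    return R / R.sum(axis=1, keepdims=True), res

def weighted_em(X, y, w, K=2, n_iter=100, seed=0):
    """Weighted EM for a K-component Gaussian mixture of linear regressions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    beta = rng.normal(size=(K, d))
    sigma2 = np.full(K, np.var(y))
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        R, _ = responsibilities(X, y, beta, sigma2, pi)
        for k in range(K):
            wk = w * R[:, k]                     # sampling weight x responsibility
            XtW = X.T * wk
            beta[k] = np.linalg.solve(XtW @ X + 1e-8 * np.eye(d), XtW @ y)
            r = y - X @ beta[k]
            sigma2[k] = max((wk @ r**2) / max(wk.sum(), 1e-12), 1e-8)
        pi = (w @ R) / w.sum()
    return beta, sigma2, pi

def score_norm_probs(X, y, beta, sigma2, pi, mix=0.1):
    """ASSUMPTION: probabilities proportional to the norm of the score with
    respect to the regression coefficients, mixed with uniform sampling."""
    R, res = responsibilities(X, y, beta, sigma2, pi)
    g = R * res / sigma2                         # per-component score factor
    norms = np.linalg.norm(g, axis=1) * np.linalg.norm(X, axis=1)
    return (1 - mix) * norms / norms.sum() + mix / len(y)

def two_step_subsample(X, y, n_pilot, n_sub, K=2, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    # Step 1: uniform pilot sample gives rough parameter estimates.
    idx0 = rng.choice(n, size=n_pilot, replace=True)
    pilot = weighted_em(X[idx0], y[idx0], np.ones(n_pilot), K, seed=seed)
    # Step 2: approximate probabilities on the full data, then subsample.
    p = score_norm_probs(X, y, *pilot)
    idx1 = rng.choice(n, size=n_sub, replace=True, p=p)
    # Inverse-probability weights correct for the nonuniform sampling.
    fit = weighted_em(X[idx1], y[idx1], 1.0 / p[idx1], K, seed=seed)
    return fit, p

# Simulated two-component data (illustrative only).
rng = np.random.default_rng(1)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.random(n) < 0.4
y = np.where(z, X @ np.array([1.0, 2.0]), X @ np.array([-1.0, -2.0]))
y = y + 0.5 * rng.normal(size=n)

(beta, sigma2, pi), p = two_step_subsample(X, y, n_pilot=500, n_sub=2000)
print(beta, sigma2, pi)
```

Only the probability computation touches the full data, and it costs a single pass; both EM fits run on samples much smaller than the full data set, which is where the computational saving comes from.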
Supplementary material
• Software: The R code used for the proposed methods and algorithms is available on GitHub at https://github.com/pedigree07/OPTMixture.
• Supplementary document: The supplementary document provides the proofs of the theorems.