Exact Inference for Transformed Large-Scale Varying Coefficient Models with Applications
Pub. online: 23 April 2025
Type: Statistical Data Science
Open Access
Received
16 August 2024
Accepted
20 March 2025
Published
23 April 2025
Abstract
Studying migration patterns driven by extreme environmental events is crucial for building a sustainable society and a stable economy. Motivated by a real dataset on human migration, this paper develops a transformed varying coefficient model for origin-destination (OD) regression to elucidate the complex associations of migration patterns with spatio-temporal dependencies and socioeconomic factors. Existing studies often overlook the dynamic effects of these factors in OD regression. Furthermore, as OD data become easier to collect, OD regression datasets are typically large, which calls for methods that can fit large-scale migration data efficiently. We address this challenge by proposing a new Bayesian interpretation of the proposed OD models that leverages sufficient statistics for efficient big data computation. Although motivated by migration studies, our method promises broad applicability across various fields and contributes to refined statistical analysis techniques. Extensive numerical studies are provided, and insights from the real data analysis are shared.
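To illustrate the general idea of using sufficient statistics for large-scale fitting, the sketch below works with an ordinary Gaussian linear regression rather than the transformed varying coefficient model proposed in the paper. It assumes the data arrive in chunks and accumulates X'X, X'y, y'y, and the sample size, from which the least squares fit can be recovered exactly without holding all rows in memory. The function names and the simulated data are hypothetical and chosen only for this example.

```python
import numpy as np


def accumulate_sufficient_stats(chunks):
    """Sum X'X, X'y, y'y and n over an iterable of (X, y) chunks."""
    xtx, xty = None, None
    yty, n = 0.0, 0
    for X, y in chunks:
        if xtx is None:
            p = X.shape[1]
            xtx = np.zeros((p, p))
            xty = np.zeros(p)
        xtx += X.T @ X
        xty += X.T @ y
        yty += float(y @ y)
        n += X.shape[0]
    return xtx, xty, yty, n


def fit_from_stats(xtx, xty, yty, n):
    """Least squares estimate and residual variance from the summed statistics."""
    beta = np.linalg.solve(xtx, xty)
    rss = yty - beta @ xty  # residual sum of squares at the least squares solution
    sigma2 = rss / (n - len(beta))
    return beta, sigma2


# Simulated data (an assumption for illustration, not the migration dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=10_000)

# Process the data in chunks of 1,000 rows.
chunks = ((X[i:i + 1000], y[i:i + 1000]) for i in range(0, 10_000, 1000))
beta_chunked, sigma2 = fit_from_stats(*accumulate_sufficient_stats(chunks))

# Reference fit using all rows at once.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the accumulated quantities are sums over rows, the chunked estimate matches the full-data estimate up to floating-point error, and each chunk can be discarded once its contribution has been added.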
Supplementary material
We provide additional technical details, simulation results, and real data analysis as a PDF file in the supplementary material. The data files and simulation code used in the article can also be found in the supplementary material.