
Journal of Data Science


Exact Inference for Transformed Large-Scale Varying Coefficient Models with Applications
Volume 23, Issue 2 (2025): Special Issue: the 2024 Symposium on Data Science and Statistics (SDSS), pp. 353–369
Tianyu Chen, Robert Habans, Thomas Douthat, et al. (6 authors)

https://doi.org/10.6339/25-JDS1181
Pub. online: 23 April 2025      Type: Statistical Data Science      Open Access

Received: 16 August 2024
Accepted: 20 March 2025
Published: 23 April 2025

Abstract

Studying migration patterns driven by extreme environmental events is crucial for building a sustainable society and stable economy. Motivated by a real dataset about human migrations, this paper develops a transformed varying coefficient model for origin and destination (OD) regression to elucidate the complex associations of migration patterns with spatio-temporal dependencies and socioeconomic factors. Existing studies often overlook the dynamic effects of these factors in OD regression. Furthermore, with the increasing ease of collecting OD data, the scale of current OD regression data is typically large, necessitating the development of methods for efficiently fitting large-scale migration data. We address the challenge by proposing a new Bayesian interpretation for the proposed OD models, leveraging sufficient statistics for efficient big data computation. Our method, inspired by migration studies, promises broad applicability across various fields, contributing to refined statistical analysis techniques. Extensive numerical studies are provided, and insights from real data analysis are shared.

Supplementary material

Additional technical details, simulation results, and real data analysis are provided as a PDF file in the supplementary material. The data files and simulation code used in the article are also included there.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
big data computation; dynamic dependencies; nonparametric regression

Funding
This study was supported by the U.S. Department of the Treasury through the Louisiana Coastal Protection and Restoration Authority’s Center of Excellence Research Grants Program under the Resources and Ecosystems Sustainability, Tourist Opportunities, and Revived Economies of the Gulf Coast States Act of 2012 (RESTORE Act) (Award No. 1 RCEGR260007-01-00). The statements, findings, conclusions, and recommendations are those of the authors and do not necessarily reflect the views of the Department of the Treasury.

Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
