Journal of Data Science

A Scalable Spatial Decorrelation Preprocessing Approach for Machine and Deep Learning
Matthew J. Heaton, Andrew Millane, Jake S. Rhodes

https://doi.org/10.6339/25-JDS1210
Pub. online: 9 December 2025
Type: Statistical Data Science
Open Access

Received: 13 June 2025
Accepted: 25 November 2025
Published: 9 December 2025

Abstract

Spatial data display correlation between observations collected at nearby locations. Generally, machine and deep learning methods either do not account for this correlation or do so indirectly through correlated features. To account for spatial correlation, we propose preprocessing the data using a spatial decorrelation transform motivated by properties of a multivariate Gaussian distribution and Vecchia approximations. The preprocessed, transformed data can then be passed to a machine or deep learning tool. After model fitting on the transformed data, the output can be spatially re-correlated via the corresponding inverse transformation. We show that including this spatial adjustment results in higher predictive accuracy on simulated and real spatial datasets.
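The decorrelate, fit, re-correlate workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a dense exponential covariance with made-up parameters (`sigma2`, `phi`, `nugget`), uses an exact Cholesky factor rather than a Vecchia approximation, and the function names `exp_cov`, `whiten`, and `recorrelate` are hypothetical. It only shows that multiplying by the inverse Cholesky factor removes the assumed correlation and that the inverse map restores it.

```python
import numpy as np

def exp_cov(coords, sigma2=1.0, phi=0.2, nugget=1e-6):
    """Exponential covariance matrix (assumed kernel, illustrative parameters)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi) + nugget * np.eye(len(coords))

def whiten(v, L):
    """Decorrelate v by computing L^{-1} v, where Sigma = L L^T.
    A dedicated triangular solver would normally be used for speed."""
    return np.linalg.solve(L, v)

def recorrelate(z, L):
    """Inverse transformation: map whitened values back to the original scale."""
    return L @ z

rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))      # hypothetical spatial locations
Sigma = exp_cov(coords)
L = np.linalg.cholesky(Sigma)            # lower-triangular factor

y = L @ rng.standard_normal(200)         # simulated spatially correlated response
z = whiten(y, L)                         # decorrelated values for a learner
y_back = recorrelate(z, L)               # re-correlated output
```

In a full workflow, a machine or deep learning model would be fit to the whitened data, and its output would be passed through `recorrelate`. The dense factorization here costs on the order of n³ operations, which is the motivation for scalable approximations such as Vecchia's.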

Supplementary material

This material is based upon work supported by the National Aeronautics and Space Administration under Grant/Contract/Agreement No. 10053957-01 and by the National Science Foundation under Grant No. 2053188.

R and Python implementations of the proposed spatial whitening transformation are available as a zip file or at https://github.com/amillane/spatialtransform. The contents are organized as follows:

  • README.md: A brief overview of the repository structure and usage instructions.
  • R Function/
      – Functions/TransformFunctions.R: R implementation of the whitening and inverse-whitening transformations.
      – demo.R: Example code demonstrating use of the R transformation functions.
      – SimulatedData1.RData: Example simulated dataset for demonstration.
      – SimulatedData2.RData: Second example simulated dataset.
  • Python Function/
      – Functions/SpatialTransform.py: Python implementation of the whitening and inverse-whitening transformations.
      – Functions/matern.py: Matern covariance utility functions.
      – Functions/mknnIndx.py: Nearest-neighbor index construction for Vecchia approximation.
      – demo.ipynb: Jupyter notebook illustrating how to use the Python implementation.
      – NonLinSimDataSet17.json: Example nonlinear simulated dataset used in demonstrations.

Together, these materials provide complete code and example data needed to reproduce the spatial whitening transformation and the analyses described in the manuscript.
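To give a sense of how nearest-neighbor indexing and a Vecchia-style whitening fit together, here is a rough sketch. It does not reproduce the repository's code or API: every function name below is hypothetical, an exponential covariance (the Matern family with smoothness 0.5) stands in for the general Matern utilities, the points are conditioned in the order given, and all parameter values are made up. Each observation is standardized using only its conditional mean and variance given up to `m` previously ordered neighbors, so no n-by-n factorization is needed.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=0.2):
    """Exponential cross-covariance between two sets of locations (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def neighbor_index(coords, m):
    """For point i, indices of up to m nearest points among points 0..i-1."""
    nn = [np.empty(0, dtype=int)]
    for i in range(1, len(coords)):
        d = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nn.append(np.argsort(d)[:m])
    return nn

def vecchia_weights(coords, nn, nugget=1e-6):
    """Conditional regression weights b_i and conditional variances f_i."""
    B, f = [], np.empty(len(coords))
    for i, idx in enumerate(nn):
        c_ii = exp_cov(coords[i:i + 1], coords[i:i + 1])[0, 0] + nugget
        if idx.size == 0:                       # first point has no neighbors
            B.append(np.empty(0))
            f[i] = c_ii
            continue
        C_nn = exp_cov(coords[idx], coords[idx]) + nugget * np.eye(idx.size)
        c_ni = exp_cov(coords[idx], coords[i:i + 1])[:, 0]
        b = np.linalg.solve(C_nn, c_ni)
        B.append(b)
        f[i] = c_ii - c_ni @ b
    return B, f

def vecchia_whiten(y, nn, B, f):
    """z_i = (y_i - b_i' y_N(i)) / sqrt(f_i)."""
    z = np.empty_like(y)
    for i, idx in enumerate(nn):
        z[i] = (y[i] - B[i] @ y[idx]) / np.sqrt(f[i])
    return z

def vecchia_recorrelate(z, nn, B, f):
    """Sequential inverse: y_i = sqrt(f_i) z_i + b_i' y_N(i)."""
    y = np.empty_like(z)
    for i, idx in enumerate(nn):
        y[i] = np.sqrt(f[i]) * z[i] + B[i] @ y[idx]
    return y

rng = np.random.default_rng(1)
coords = rng.uniform(size=(60, 2))              # hypothetical locations
Sigma = exp_cov(coords, coords) + 1e-6 * np.eye(60)
y = np.linalg.cholesky(Sigma) @ rng.standard_normal(60)

nn = neighbor_index(coords, m=10)
B, f = vecchia_weights(coords, nn)
z = vecchia_whiten(y, nn, B, f)
y_back = vecchia_recorrelate(z, nn, B, f)
```

When `m` is large enough that every point conditions on all earlier points, this transform coincides with exact Cholesky whitening under the same covariance. Smaller `m` trades some accuracy for a cost that grows linearly in the number of observations.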



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
Gaussian process, predictive accuracy, Vecchia approximation, whitening transformation



Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
