References

JDS

Journal of Data Science

ISSN: 1683-8602, 1680-743X

School of Statistics, Renmin University of China

JDS1070

10.6339/22-JDS1070

Data Science in Action

On the Use of Deep Neural Networks for Large-Scale Spatial Prediction

Skyler D. Gray
https://orcid.org/0000-0003-4654-9827
Matthew J. Heaton (1, ∗)
Dan S. Bolintineanu (2)
Aaron Olson (3)

1 Department of Statistics, Brigham Young University, 2152 WVB, Provo, UT 84602, United States
2 Fluid and Reactive Processes Department, Sandia National Laboratories, Albuquerque, NM 87185, USA
3 Radiation Effects Theory Department, Sandia National Laboratories, Albuquerque, NM 87185, USA

∗Corresponding author. Email: mheaton@stat.byu.edu.

Volume 20, Issue 4, pages 493–511 (2022)

Received 28 July 2022; accepted 27 September 2022; published 3 October 2022

2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.


Open access article under the CC BY license.

For spatial kriging (prediction), the Gaussian process (GP) has been the go-to tool of spatial statisticians for decades. However, the GP is plagued by computational intractability, rendering it infeasible for use on large spatial data sets. Neural networks (NNs), on the other hand, have arisen as a flexible and computationally feasible approach for capturing nonlinear relationships. To date, NNs have seen only limited use in spatial statistics, although their use is beginning to take root. In this work, we argue for equivalence between an NN and a GP and demonstrate how to implement NNs for kriging from large spatial data. We compare the computational efficacy and predictive power of NNs with that of GP approximations across a variety of big spatial Gaussian, non-Gaussian and binary data applications of up to size n = 10^6. Our results suggest that fully-connected NNs perform similarly to state-of-the-art, GP-approximated models for short-range predictions but can suffer for longer-range predictions.
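
As a rough illustration of the computational bottleneck described above, the following sketch implements zero-mean simple kriging with NumPy. The exponential covariance, its parameters (sigma2, phi), the nugget value, the helper names (exp_cov, krige) and the simulated data are all assumptions chosen for illustration and are not taken from the paper. The point is only that the predictor requires a solve against an n x n covariance matrix, whose cost grows roughly as n^3.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=0.2):
    """Exponential covariance between two sets of 2-D locations (assumed form)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def krige(s_obs, y_obs, s_new, nugget=1e-2):
    """Zero-mean simple kriging: predicted mean at s_new given y_obs at s_obs."""
    K = exp_cov(s_obs, s_obs) + nugget * np.eye(len(s_obs))  # n x n covariance
    k_star = exp_cov(s_new, s_obs)                            # m x n cross-covariance
    # The solve against the n x n matrix is the step that scales as O(n^3).
    return k_star @ np.linalg.solve(K, y_obs)

# Simulated data (hypothetical): 200 observed sites on the unit square.
rng = np.random.default_rng(0)
s_obs = rng.uniform(size=(200, 2))
y_obs = np.sin(4 * s_obs[:, 0]) + np.cos(4 * s_obs[:, 1])
s_new = rng.uniform(size=(5, 2))

pred = krige(s_obs, y_obs, s_new)
print(pred.shape)
```

At n = 200 the solve is instantaneous, but the same dense computation at n = 10^6 would need a covariance matrix with 10^12 entries, which is why GP approximations or alternatives such as NNs are considered for data of that size.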

Keywords: big data; fully-connected neural network; grid search


This research was supported by NASA grant 80NSSC20K1594 and by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

References

Allaire, Chollet (2022). keras: R Interface to ‘Keras’. R package version 2.9.0.

Banerjee, Gelfand, Finley, Sang (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 70(4): 825–848.

Chen, Li, Reich, Sun (2022). DeepKriging: Spatially dependent deep neural networks for spatial prediction. Statistica Sinica. https://doi.org/10.5705/ss.202021.0277.

Cressie, Johannesson (2008a). Fixed rank Kriging for very large spatial data sets. Journal of the Royal Statistical Society, Series B, 70: 209–226.

Cressie, Johannesson (2008b). Fixed rank Kriging for very large spatial data sets. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 70(1): 209–226.

Cressie, Wikle (2015). Statistics for Spatio-Temporal Data. John Wiley & Sons.

Datta, Banerjee, Finley, Gelfand (2016a). Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111(514): 800–812.

Datta, Banerjee, Finley, Gelfand (2016b). On nearest-neighbor Gaussian process models for massive spatial data. Wiley Interdisciplinary Reviews: Computational Statistics, 8(5): 162–171.

Diggle, Ribeiro, Christensen (2003). An introduction to model-based geostatistics. In: Spatial Statistics and Computational Methods, 43–86. Springer.

Diggle, Tawn, Moyeed (1998). Model-based geostatistics. Journal of the Royal Statistical Society, Series C, Applied Statistics, 47(3): 299–350.

El Bannany, Khedr, Sreedharan, Kanakkayil (2021). Financial distress prediction based on multi-layer perceptron with parameter optimization. IAENG International Journal of Computer Science, 48: 3.

Furrer, Genton, Nychka (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3): 502–523.

Gelfand, Schliep (2016). Spatial statistics and Gaussian processes: A beautiful marriage. Spatial Statistics, 18: 86–104. Spatial Statistics Avignon: Emerging Patterns.

Genton, Kleiber (2015). Cross-covariance functions for multivariate geostatistics. Statistical Science, 30(2): 147–163.

Gerber, Nychka (2021). Fast covariance parameter estimation of spatial Gaussian process models using neural networks. Stat, 10(1): e382.

Heaton, Datta, Finley, Furrer, Guinness, Guhaniyogi, et al. (2019). A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological, and Environmental Statistics, 24(3): 398–425.

Higdon (1998). A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environmental and Ecological Statistics, 5(2): 173–190.

Huang, Abdulah, Sun, Ltaief, Keyes, Genton (2021a). Competition on spatial statistics for large datasets. Journal of Agricultural, Biological, and Environmental Statistics, 26(4): 580–595.

Huang, Blake, Katzfuss, Hammerling (2021b). Nonstationary spatial modeling of massive global satellite data. arXiv preprint: https://arxiv.org/abs/2111.13428.

Hughes, Haran (2013). Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 75(1): 139–159.

Jabbar, Khan (2015). Methods to avoid over-fitting and under-fitting in supervised machine learning (comparative study). Computer Science, Communication and Instrumentation Devices, 70: 163–172.

Katzfuss (2017). A multi-resolution approximation for massive spatial datasets. Journal of the American Statistical Association, 112(517): 201–214.

Katzfuss, Guinness (2021). A general framework for Vecchia approximations of Gaussian processes. Statistical Science, 36(1): 124–141.

Kaufman, Schervish, Nychka (2008). Covariance tapering for likelihood-based estimation in large spatial data sets. Journal of the American Statistical Association, 103(484): 1545–1555.

Lee, Sohl-Dickstein, Pennington, Novak, Schoenholz, Bahri (2018). Deep neural networks as Gaussian processes. In: International Conference on Learning Representations.

Lenzi, Bessac, Rudi, Stein (2021). Neural networks for parameter estimation in intractable models. arXiv preprint: https://arxiv.org/abs/2107.14346.

Liu, Ong, Shen, Cai (2020). When Gaussian process meets big data: A review of scalable GPs. IEEE Transactions on Neural Networks and Learning Systems, 31(11): 4405–4423.

Matthews, Rowland, Hron, Turner, Ghahramani (2018). Gaussian process behaviour in wide deep neural networks. arXiv preprint: https://arxiv.org/abs/1804.11271.

Mesa, Vasquez, Aguirre, Valencia JSB (2019). Sensor fusion for distance estimation under disturbance with reflective optical sensors using multi layer perceptron (MLP). IEEE Latin America Transactions, 17(09): 1418–1423.

Molnar, Freiesleben, König, Casalicchio, Wright, Bischl (2021). Relating the partial dependence plot and permutation feature importance to the data generating process. arXiv preprint: https://arxiv.org/abs/2109.01433.

Neal (1994). Priors for infinite networks (Technical Report CRG-TR-94-1). University of Toronto.

Nuanmeesri, Sriurai (2021). Multi-layer perceptron neural network model development for chili pepper disease diagnosis using filter and wrapper feature selection methods. Engineering, Technology & Applied Science Research, 11(5): 7714–7719.

Nwankpa, Ijomah, Gachagan, Marshall (2018). Activation functions: Comparison of trends in practice and research for deep learning. arXiv preprint: https://arxiv.org/abs/1811.03378.

Nychka, Bandyopadhyay, Hammerling, Lindgren, Sain (2015). A multiresolution Gaussian process model for the analysis of large spatial datasets. Journal of Computational and Graphical Statistics, 24(2): 579–599.

R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Ramachandran, Zoph, Le (2017). Searching for activation functions. arXiv preprint: https://arxiv.org/abs/1710.05941.

Sang, Huang (2012). A full scale approximation of covariance functions for large spatial data sets. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 74(1): 111–132.

Sauer, Cooper, Gramacy (2022). Vecchia-approximated deep Gaussian processes for computer experiments. arXiv preprint: https://arxiv.org/abs/2204.02904.

Sauer, Gramacy, Higdon (2022). Active learning for deep Gaussian process surrogates. Technometrics. https://doi.org/10.1080/00401706.2021.2008505.

Victoria, Maragatham (2021). Automatic tuning of hyperparameters using Bayesian optimization. Evolving Systems, 12(1): 217–223.

Wikle, Zammit-Mangion (2022). Statistical deep learning for spatial and spatio-temporal data. arXiv preprint: https://arxiv.org/abs/2206.02218.

Xu, Zhang, Li, Du SS, Kawarabayashi Ki, Jegelka (2020). How neural networks extrapolate: From feedforward to graph neural networks. arXiv preprint: https://arxiv.org/abs/2009.11848.

Yarotsky (2018). Optimal approximation of continuous functions by very deep ReLU networks. In: Conference on Learning Theory (Bubeck, Perchet, Rigollet, eds.), 639–649. PMLR.

Zammit-Mangion, Ng TLJ, Vu, Filippone (2021). Deep compositional spatial models. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2021.1887741.

Zammit-Mangion, Wikle (2020). Deep integro-difference equation models for spatio-temporal forecasting. Spatial Statistics, 37: 100408.

Zhang, Jia, Gao, Song, Leung (2018). Short-term rainfall forecasting using multi-layer perceptron. IEEE Transactions on Big Data, 6(1): 93–106.