Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes

Saha, Arkajyoti; Datta, Abhirup; Banerjee, Sudipto

doi:10.6339/22-JDS1073

Journal of Data Science

Volume 20, Issue 4 (2022): Special Issue: Large-Scale Spatial Data Science, pp. 533–544

Pub. online: 3 November 2022 Type: Statistical Data Science

Open Access

Received
16 August 2022

Accepted
6 October 2022

Published
3 November 2022

Abstract

Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for the analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov Chain Monte Carlo sampling. Alternative approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that only involves sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.
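To make the phrase "sparse or small matrix computations" concrete, the following is a minimal sketch of a generic nearest-neighbor (Vecchia-type) approximation to a Gaussian Process precision matrix. It is not the prediction algorithm of the article: the exponential kernel, its parameters `sigma2` and `phi`, the coordinate ordering, and the function names `exp_cov` and `nngp_precision` are all assumptions made for illustration. Each location is regressed on at most `m` of its nearest preceding locations, so every linear solve involves a matrix of size at most m × m.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two sets of 2-D locations (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def nngp_precision(coords, m, cov=exp_cov):
    """Approximate GP precision matrix as (I - B)^T F^{-1} (I - B).

    Row i of B holds the regression weights of location i on at most m
    nearest neighbours among the locations preceding it in the ordering;
    F is the diagonal matrix of the resulting conditional variances.
    """
    n = coords.shape[0]
    B = np.zeros((n, n))   # stored densely for clarity; sparse in practice
    f = np.empty(n)        # conditional variances (diagonal of F)
    f[0] = cov(coords[:1], coords[:1])[0, 0]
    for i in range(1, n):
        prev = coords[:i]
        dist = np.linalg.norm(prev - coords[i], axis=1)
        nb = np.argsort(dist)[:m]                   # nearest predecessors
        C_nn = cov(prev[nb], prev[nb])              # at most m x m
        c_in = cov(prev[nb], coords[i:i + 1])[:, 0]
        b = np.linalg.solve(C_nn, c_in)             # small linear solve
        B[i, nb] = b
        f[i] = cov(coords[i:i + 1], coords[i:i + 1])[0, 0] - c_in @ b
    I_minus_B = np.eye(n) - B
    return I_minus_B.T @ (I_minus_B / f[:, None])

rng = np.random.default_rng(0)
coords = rng.uniform(size=(30, 2))      # simulated locations, not real data
C = exp_cov(coords, coords)
Q_full = nngp_precision(coords, m=29)   # every predecessor used: exact inverse
Q_nn = nngp_precision(coords, m=5)      # nearest-neighbor approximation
```

When `m` is at least the number of preceding locations, the construction reproduces the exact inverse covariance; smaller `m` trades accuracy for sparsity. Because each row of `B` depends only on its own neighbor set, the loop iterations are independent of one another and could be run in parallel.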

Supplementary material

This supplementary material contains a discussion of why it is infeasible to directly use Monte Carlo sampling to estimate p(Y) in (4), an evaluation of the algorithms under consideration with respect to misclassification error, and details of the code and data used in the article.

© 2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

binary data, generalized linear mixed models, spatial, Gaussian processes

Funding

Abhirup Datta was partially supported by the National Institute of Environmental Health Sciences (NIEHS) grant R01 ES033739 and by the National Science Foundation (NSF) Division of Mathematical Sciences grant DMS-1915803. Sudipto Banerjee was partially supported by the National Science Foundation (NSF) through grants NSF/DMS 1916349 and NSF/IIS 1562303, and by the National Institute of Environmental Health Sciences (NIEHS) through grants R01ES030210 and 5R01ES027027.
