Journal of Data Science

Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes
Volume 20, Issue 4 (2022): Special Issue: Large-Scale Spatial Data Science, pp. 533–544
Arkajyoti Saha, Abhirup Datta, Sudipto Banerjee

https://doi.org/10.6339/22-JDS1073
Type: Statistical Data Science. Open access.

Received 16 August 2022; accepted 6 October 2022; published online 3 November 2022.

Abstract

Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian process prior, are widely used for the analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov chain Monte Carlo sampling. Alternative approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the multivariate normal cdf that characterizes the marginal distribution of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This yields a scalable prediction algorithm for spGLMM that involves only sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.
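The identity the abstract relies on, that the probability of an observed binary vector under a probit model with a Gaussian latent process is a multivariate normal cdf and that a prediction is therefore a ratio of two such cdfs, can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the authors' algorithm: an exponential covariance with fixed `sigma2` and `phi`, a known `beta`, unit-variance probit noise, and a simple "condition only on the `m` nearest observed sites" shortcut standing in for the NNGP approximation. The helpers `exp_cov`, `prob_binary` and `predict_prob` are hypothetical names; the cdfs are evaluated with SciPy's `multivariate_normal.cdf`, which is based on Genz's numerical integration method, so results carry a small numerical error.

```python
import numpy as np
from scipy.stats import multivariate_normal


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance) between two point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def prob_binary(y, mu, K):
    """P(Y = y) when y_i = 1{z_i > 0} and z ~ N(mu, K), as a multivariate normal cdf.

    With D = diag(2y - 1): P(Y = y) = P(D z > 0) = Phi_n(D mu; 0, D K D).
    """
    s = 2.0 * np.asarray(y, dtype=float) - 1.0
    cov = K * np.outer(s, s)  # elementwise form of D K D
    return multivariate_normal(mean=np.zeros(len(s)), cov=cov).cdf(s * mu)


def predict_prob(s0, x0, coords, X, y, beta, sigma2=1.0, phi=3.0, m=None):
    """P(y(s0) = 1 | observed y) as a ratio of two multivariate normal cdfs.

    If m is given, condition only on the m observed sites nearest to s0
    (a hypothetical neighbor shortcut, not the paper's NNGP construction).
    """
    if m is not None:
        idx = np.argsort(np.linalg.norm(coords - s0, axis=1))[:m]
        coords, X, y = coords[idx], X[idx], y[idx]
    pts = np.vstack([coords, s0[None, :]])
    mu = np.append(X @ beta, x0 @ beta)
    # Latent covariance: spatial GP covariance plus unit-variance probit noise
    K = exp_cov(pts, pts, sigma2, phi) + np.eye(len(mu))
    joint = prob_binary(np.append(y, 1), mu, K)      # P(Y = y, y(s0) = 1)
    marginal = prob_binary(y, mu[:-1], K[:-1, :-1])  # P(Y = y)
    return joint / marginal


# Usage on simulated data (all settings are illustrative assumptions)
rng = np.random.default_rng(0)
n = 8
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.2, 1.0])
K = exp_cov(coords, coords) + np.eye(n)
y = (rng.multivariate_normal(X @ beta, K) > 0).astype(int)

s0, x0 = np.array([0.5, 0.5]), np.array([1.0, 0.3])
p_exact = predict_prob(s0, x0, coords, X, y, beta)    # conditions on all n sites
p_nn = predict_prob(s0, x0, coords, X, y, beta, m=4)  # conditions on 4 nearest sites
print(p_exact, p_nn)
```

Because each prediction location only touches its own small neighbor set, calls for different locations are independent and could run in parallel, which mirrors the property the abstract emphasizes. How closely this naive shortcut tracks the exact probability depends on `m` and the spatial range; the paper's NNGP-based construction is the method actually studied.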

Supplementary material

This supplementary material contains a discussion of why it is infeasible to estimate p(Y) in (4) directly by Monte Carlo sampling, an evaluation of the algorithms under consideration with respect to misclassification error, and details of the code and data used in the article.
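One simple way to see why plain Monte Carlo struggles with p(Y) (our own illustration, not necessarily the argument made in the supplement): a naive estimator counts how often a simulated latent vector reproduces the observed binary vector exactly, and the probability of any single binary vector shrinks rapidly as n grows. The sketch below uses independent, zero-mean latent variables, where the exact answer is 0.5**n for every y, so the estimates can be checked. The helper `naive_mc_prob` is hypothetical.

```python
import numpy as np


def naive_mc_prob(y, mu, K, n_sims, rng):
    """Naive Monte Carlo estimate of P(Y = y) under y_i = 1{z_i > 0}, z ~ N(mu, K):
    the fraction of simulated latent vectors whose sign pattern matches y exactly."""
    L = np.linalg.cholesky(K)
    z = mu + rng.standard_normal((n_sims, len(mu))) @ L.T
    target = np.asarray(y) == 1
    hits = np.all((z > 0) == target, axis=1)
    return hits.mean()


rng = np.random.default_rng(1)
results = {}
for n in (5, 20, 60):
    # Independent, zero-mean latent variables: the exact answer is 0.5 ** n for any y
    y = rng.integers(0, 2, size=n)
    results[n] = naive_mc_prob(y, np.zeros(n), np.eye(n), n_sims=100_000, rng=rng)
    print(n, results[n], 0.5 ** n)
```

With 100,000 draws the n = 5 estimate lands close to 1/32, whereas for n = 60 the true probability is below 1e-18 and the estimate is almost surely 0, so the relative error is useless long before n reaches the sizes of real spatial datasets.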

Copyright
2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
binary data, generalized linear mixed models, spatial, Gaussian processes

Funding
Abhirup Datta was partially supported by the National Institute of Environmental Health Sciences (NIEHS) grant R01 ES033739 and by the National Science Foundation (NSF) Division of Mathematical Sciences grant DMS-1915803. Sudipto Banerjee was partially supported by the National Science Foundation (NSF) through grants NSF/DMS 1916349 and NSF/IIS 1562303, and by the National Institute of Environmental Health Sciences (NIEHS) through grants R01ES030210 and 5R01ES027027.

Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
