Journal of Data Science


Bayesian Inference for Spatial Count Data that May be Over-Dispersed or Under-Dispersed with Application to the 2016 US Presidential Election
Volume 20, Issue 3 (2022): Special Issue: Data Science Meets Social Sciences, pp. 325–337
Hou-Cheng Yang, Jonathan R. Bradley

https://doi.org/10.6339/21-JDS1032
Pub. online: 29 December 2021 | Type: Statistical Data Science | Open Access

Received: 30 August 2021
Accepted: 27 November 2021
Published: 29 December 2021

Abstract

We propose a method of spatial prediction for count data that can reasonably be modeled with the Conway-Maxwell Poisson (COM-Poisson) distribution. The COM-Poisson model is a two-parameter generalization of the Poisson distribution that offers the flexibility needed to model count data that are either over- or under-dispersed. The computationally limiting factor of the COM-Poisson distribution is that the likelihood function contains multiple intractable normalizing constants, so evaluating it within Markov chain Monte Carlo (MCMC) techniques is not always feasible. Thus, we develop a prior distribution for the parameters of the COM-Poisson that avoids the intractable normalizing constant. In addition, allowing for spatial random effects induces additional variability that makes it unclear whether a spatially correlated COM-Poisson random variable is over- or under-dispersed. We propose a computationally efficient hierarchical Bayesian model that addresses these issues. In particular, in our model the parameters of the COM-Poisson do not include spatial random effects, which would introduce additional variability that changes the dispersion properties of the data; instead, the parameters are spatially smoothed in subsequent levels of the Bayesian hierarchical model. Furthermore, the spatially smoothed parameters have a simple regression interpretation that facilitates computation. We demonstrate the applicability of our approach using simulated examples and a motivating application to 2016 US presidential election voting data for the state of Florida, obtained from the Florida Division of Elections.
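As a rough illustration of the dispersion behavior described in the abstract, the sketch below evaluates the standard COM-Poisson probability mass function, P(Y = y) ∝ λ^y / (y!)^ν, by truncating the normalizing sum at a fixed number of terms. This is not the authors' method (their model is designed to avoid the normalizing constant, and their supplementary code is in R); the function names, the truncation point `max_terms`, and the parameter values are assumptions chosen only for illustration.

```python
import math


def com_poisson_pmf(lam, nu, max_terms=200):
    """Approximate COM-Poisson probabilities for y = 0, ..., max_terms - 1.

    The normalizing constant Z(lam, nu) = sum_j lam**j / (j!)**nu has no
    closed form in general, so it is approximated here by a truncated sum.
    """
    # Unnormalized log-weights: y * log(lam) - nu * log(y!)
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_terms)]
    # Subtract the maximum before exponentiating to avoid overflow
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, 2, ... given as a list."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


# nu < 1 should give variance > mean (over-dispersion), nu = 1 recovers the
# Poisson (variance = mean), and nu > 1 gives variance < mean (under-dispersion).
for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(com_poisson_pmf(lam=3.0, nu=nu))
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}, ratio={var / mean:.3f}")
```

The truncated sum is adequate for a single distribution with moderate parameters, but it has to be recomputed for every parameter value visited by a sampler, which is one way to see why the normalizing constant becomes a computational bottleneck in MCMC.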

Supplementary material

The real data and R code needed to reproduce the results in this paper can be found in the supplementary materials.



Copyright
2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
Bayesian inference, Conway-Maxwell, count data, dispersion, Poisson distribution, spatial statistics

Funding
Jonathan Bradley’s research was partially supported by the US National Science Foundation (NSF) grant SES-1853099.


Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
