Journal of Data Science


A Bayesian Negative Binomial-Bernoulli Model with Tensor Decomposition: Application to Jointly Analyzing Shot Attempts and Shot Successes in Basketball Games
Kwok-Wah Ho  

https://doi.org/10.6339/25-JDS1196
Pub. online: 9 July 2025      Type: Data Science In Action      Open Access

Received: 17 September 2024
Accepted: 30 June 2025
Published: 9 July 2025

Abstract

We propose a Bayesian Negative Binomial-Bernoulli model to jointly analyze the patterns behind field goal attempts and the factors influencing shot success. We apply nonnegative CANDECOMP/PARAFAC tensor decomposition to study shot patterns and use logistic regression to predict successful shots. To maintain the conditional conjugacy of the model, we employ a double Pólya-Gamma data augmentation scheme and devise an efficient variational inference algorithm for estimation. The model is applied to shot chart data from the National Basketball Association, focusing on the regular seasons from 2015–16 to 2022–23. We consistently identify three latent features in shot patterns across all seasons and verify a popular claim from recent years about the increasing importance of three-point shots. Additionally, we find that the home court advantage in field goal accuracy disappears in the 2020–21 regular season, which was the only full season under strict COVID-19 crowd control, aside from the short bubble period in 2019–20. This finding contributes to the literature on the influence of crowd effects on home advantage in basketball games.
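
As a rough illustration of the two-part structure described in the abstract, the Python sketch below simulates shot attempts from a negative binomial distribution whose mean comes from a rank-3 nonnegative CP tensor, and then simulates makes as Bernoulli trials with a logistic success probability. It is not the author's implementation. The tensor modes (player, court zone, game period), the use of the tensor entry as the negative binomial mean, the dispersion value, the single home-court covariate, and all coefficient values are assumptions chosen for illustration only. The article's own code is in the repository listed under Supplementary material.

```python
import math
import random


def cp_entry(a_i, b_j, c_k):
    """One entry of a rank-R CP tensor: sum over r of a_i[r] * b_j[r] * c_k[r]."""
    return sum(x * y * z for x, y, z in zip(a_i, b_j, c_k))


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_negative_binomial(rng, mean, dispersion):
    """Negative binomial draw as a gamma-Poisson mixture with the given mean."""
    lam = rng.gammavariate(dispersion, mean / dispersion)  # shape, scale
    return sample_poisson(rng, lam)


def simulate(n_players=4, n_zones=3, n_periods=2, rank=3, dispersion=2.0, seed=0):
    """Simulate attempts and makes for every (player, zone, period) cell."""
    rng = random.Random(seed)

    def factors(n):
        # Strictly positive entries keep the CP tensor nonnegative.
        return [[rng.uniform(0.2, 1.5) for _ in range(rank)] for _ in range(n)]

    A, B, C = factors(n_players), factors(n_zones), factors(n_periods)
    intercept, home_effect = -0.1, 0.15  # hypothetical logistic coefficients
    records = []
    for i in range(n_players):
        for j in range(n_zones):
            for k in range(n_periods):
                # Assumption: the tensor entry is used directly as the mean.
                mean_attempts = cp_entry(A[i], B[j], C[k])
                attempts = sample_negative_binomial(rng, mean_attempts, dispersion)
                home = rng.random() < 0.5
                p_make = 1.0 / (1.0 + math.exp(-(intercept + home_effect * home)))
                # Each attempt is an independent Bernoulli trial.
                made = sum(rng.random() < p_make for _ in range(attempts))
                records.append({
                    "cell": (i, j, k),
                    "home": home,
                    "mean_attempts": mean_attempts,
                    "attempts": attempts,
                    "made": made,
                })
    return records
```

Calling `simulate(seed=1)` returns one record per tensor cell. Summing `attempts` and `made` over the records gives a simulated field goal percentage that can be compared between the `home` and away cells. The sketch only generates data; it does not perform the Pólya-Gamma augmentation or the variational inference used in the article.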

Supplementary material

The supplementary material gives the details of the Pólya-Gamma augmentation and of the variational EM algorithm outlined in Section 3, together with additional graphs for the empirical analysis. The code for downloading the shot chart data and for generating the major results is available at https://github.com/kwho1/NBA_JDS.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
logistic regression; Pólya-Gamma; variational inference


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
