A Bayesian Negative Binomial-Bernoulli Model with Tensor Decomposition: Application to Jointly Analyzing Shot Attempts and Shot Successes in Basketball Games
Pub. online: 9 July 2025
Type: Data Science In Action
Open Access
Received
17 September 2024
Accepted
30 June 2025
Published
9 July 2025
Abstract
We propose a Bayesian Negative Binomial-Bernoulli model to jointly analyze the patterns behind field goal attempts and the factors influencing shot success. We apply nonnegative CANDECOMP/PARAFAC tensor decomposition to study shot patterns and use logistic regression to predict successful shots. To maintain the conditional conjugacy of the model, we employ a double Pólya-Gamma data augmentation scheme and devise an efficient variational inference algorithm for estimation. The model is applied to shot chart data from the National Basketball Association, focusing on the regular seasons from 2015–16 to 2022–23. We consistently identify three latent features in shot patterns across all seasons and verify a popular claim from recent years about the increasing importance of three-point shots. Additionally, we find that the home court advantage in field goal accuracy disappears in the 2020–21 regular season, which was the only full season under strict COVID-19 crowd control, aside from the short bubble period in 2019–20. This finding contributes to the literature on the influence of crowd effects on home advantage in basketball games.
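The generative structure described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: the tensor modes (player, court location, game period), the gamma-distributed factors, the dispersion value, the two covariates, and the coefficient vector `beta` are all hypothetical choices made for illustration. Attempt counts are drawn from a negative binomial distribution whose mean is a rank-3 nonnegative CP reconstruction, and made shots are drawn with a logistic success probability. Summing the per-shot Bernoulli outcomes into a binomial draw assumes every shot in a cell shares the same covariates, which is a simplification.

```python
import numpy as np


def simulate_shots(n_players=5, n_locations=4, n_periods=3,
                   rank=3, dispersion=2.0, seed=0):
    """Simulate attempts and made shots from a toy NB-Bernoulli model."""
    rng = np.random.default_rng(seed)

    # Nonnegative CP factors, one matrix per (hypothetical) tensor mode.
    A = rng.gamma(2.0, 1.0, size=(n_players, rank))
    B = rng.gamma(2.0, 1.0, size=(n_locations, rank))
    C = rng.gamma(2.0, 1.0, size=(n_periods, rank))

    # CP reconstruction: mean[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r].
    mean = np.einsum("ir,jr,kr->ijk", A, B, C)

    # NumPy parameterizes the negative binomial by (n, p); choosing
    # p = dispersion / (dispersion + mean) gives expected count `mean`.
    p = dispersion / (dispersion + mean)
    attempts = rng.negative_binomial(dispersion, p)

    # Logistic regression for shot success with two made-up covariates.
    x = rng.normal(size=(n_players, n_locations, n_periods, 2))
    beta = np.array([0.3, -0.5])  # hypothetical coefficients
    prob = 1.0 / (1.0 + np.exp(-(x @ beta)))

    # Sum of Bernoulli outcomes over the attempts in each cell.
    made = rng.binomial(attempts, prob)
    return mean, attempts, made


mean, attempts, made = simulate_shots()
print(attempts.shape, int(attempts.sum()), int(made.sum()))
```

In this toy setup, the columns of `A`, `B`, and `C` play the role of the latent shot-pattern features, and the home court effect discussed in the abstract would enter as one of the logistic regression covariates.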
Supplementary material
The supplementary material provides details of the Pólya-Gamma augmentation and of the variational EM algorithm outlined in Section 3, along with additional graphs for the empirical analysis. The code for downloading the shot chart data and for generating the main results is available at https://github.com/kwho1/NBA_JDS.