A Mixed-Membership Model for Social Network Clustering
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 508–522
Pub. online: 7 August 2023
Type: Statistical Data Science
Open Access
Received
7 November 2021
7 November 2021
Accepted
4 July 2023
4 July 2023
Published
7 August 2023
7 August 2023
Abstract
We propose a simple mixed membership model for social network clustering in this paper. A flexible function is adopted to measure affinities among a set of entities in a social network. The model not only allows each entity in the network to possess more than one membership, but also provides accurate statistical inference about network structure. We estimate the membership parameters using an MCMC algorithm. We evaluate the performance of the proposed algorithm by applying our model to two empirical social network data, the Zachary club data and the bottlenose dolphin network data. We also conduct some numerical studies based on synthetic networks for further assessing the effectiveness of our algorithm. In the end, some concluding remarks and future work are addressed briefly.
Supplementary material
Supplementary MaterialThe codes for Algorithm 1 and the implementations can be found on the journal website. The results of empirical data applications are saved in RDS files.
References
Betancourt M (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint: https://arxiv.org/abs/1701.02434
Bickle PJ, Chen A (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America, 160(50): 21068–21073. https://doi.org/10.1073/pnas.0907096106
Boguñá M, Pastor-Satorras R, Díaz-Guilera A, Arenas A (2004). Models of social networks based on social distance attachment. Physical Review E, 70(5): 056122. https://doi.org/10.1103/PhysRevE.70.056122
Cartwright D, Harary F (1956). Structure balance: A generalization of Heider’s theory. Psychological Review, 63(5): 277–293. https://doi.org/10.1037/h0046049
Casella G, George EI (1992). Explaining the Gibbs sampler. The American Statistician, 46(3): 167–174. https://doi.org/10.1080/00031305.1992.10475878
Freeman LC (1977). A set of measures of centrality based on betweenness. Sociometry, 40(1): 35–41. https://doi.org/10.2307/3033543
Fronczak P, Fronczak A, Bujok M (2013). Exponential random graph models for networks with community structure. Physical Review E, 88(3): 032810. https://doi.org/10.1103/PhysRevE.88.032810
Gelfand AE, Smith AFE (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410): 398–409. https://doi.org/10.1080/01621459.1990.10476213
Geman S, Genman D (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6): 721–741. https://doi.org/10.1109/TPAMI.1984.4767596
Geng J, Bhattacharya A, Pati D (2019). Probabilistic community detection with unknown number of communities. Journal of the American Statistical Association, 114(526): 893–905. https://doi.org/10.1080/01621459.2018.1458618
Gilbert EN (1959). Random graphs. The Annals of Mathematical Statistics, 30(4): 1141–1144. https://doi.org/10.1214/aoms/1177706098
Girvan M, Newman MEJ (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12): 7821–7826. https://doi.org/10.1073/pnas.122653799
Handcock MS, Raftery AE (2007). Model-based clustering for social networks. Journal of the Royal Statistical Society. Series A. Statistics in Society, 170: 301–354. https://doi.org/10.1111/j.1467-985X.2007.00471.x
Harary F (1953). On the notion of balance of a signed graph. The Michigan Mathematical Journal, 2(2): 143–146. https://doi.org/10.1307/mmj/1028989917
Hoff PD, Raftery AE, Handcock MS (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460): 1090–1098. https://doi.org/10.1198/016214502388618906
Holland PW, Laskey KB, Leinhardt S (1983). Stochastic blockmodels: First steps. Social Networks, 5(2): 109–137. https://doi.org/10.1016/0378-8733(83)90021-7
Huang W, Liu Y, Chen Y (2020). Mixed membership stochastic blockmodels for heterogeneous networks. Bayesian Analysis, 15(3): 711–736. https://doi.org/10.1214/19-BA1163
Hunter DR, Handcock MS, Butts CT, Goodreau Morris M SM (2008). ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software, 24(3): 1–29. https://doi.org/10.18637/jss.v024.i03
Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM (2003). The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54: 396–405. https://doi.org/10.1007/s00265-003-0651-y
Lyzinski V, Tang M, Athreya A, Park Y, Priebe CE (2017). Community detection and classification in hierarchical stochastic blockmodels. IEEE Transactions on Network Science and Engineering, 4(1): 13–26. https://doi.org/10.1109/TNSE.2016.2634322
Marchette DJ, Priebe CE (2008). Predicting unobserved links in incompletely observed networks. Computational Statistics & Data Analysis, 52(3): 1373–1386. https://doi.org/10.1016/j.csda.2007.03.016
Meilǎ M (2007). Comparing clustering—an information based distance. Journal of Multivariate Analysis, 98(5): 873–895. https://doi.org/10.1016/j.jmva.2006.11.013
Newman MEJ (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98(2): 404–409. https://doi.org/10.1073/pnas.98.2.404
Newman MEJ (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 103: 8577–8582 (2006). https://doi.org/10.1073/pnas.0601602103
Newman MEJ, Strogatz SH, Watts DJ (2001). Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64(2): 026118. https://doi.org/10.1103/PhysRevE.64.026118
Newman MEJ, Watts DJ, Strogatz SH (2002). Random graph models of social networks. Proceedings of the National Academy of Sciences of the United States of America, 99(supp(1): 2566–2572. https://doi.org/10.1073/pnas.012582999
Noroozi M, Pensky M (2022). The hierarchy of block models. Sankhya. Series A, 84: 64–107. https://doi.org/10.1007/s13171-021-00247-2
Nowicki K, Snijders TAB (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455): 1077–1087. https://doi.org/10.1198/016214501753208735
Ouyang G, Dipak DK, Zhang P (2020). Clique-based method for social network clustering. Journal of Classification, 37: 254–274. https://doi.org/10.1007/s00357-019-9310-5
Rand WM (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336): 846–850. https://doi.org/10.1080/01621459.1971.10482356
Rapoport A (1949a). Outline of a probabilistic approach to animal sociology: I. The Bulletin of Mathematical Biophysics, 11(3): 183–196. https://doi.org/10.1007/BF02478364
Rapoport A (1949b). Outline of a probabilistic approach to animal sociology: II. The Bulletin of Mathematical Biophysics, 11(4): 273–281. https://doi.org/10.1007/BF02477980
Rapoport A (1950). Outline of a probabilistic approach to animal sociology: III. The Bulletin of Mathematical Biophysics, 12(1): 7–17. https://doi.org/10.1007/BF02477340
Sengupta S, Chen Y (2018). A block model for node popularity in networks with community structure. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 80(2): 365–386. https://doi.org/10.1111/rssb.12245
Sewell DK, Chen Y (2017). Latent space approaches to community detection in dynamic networks. Bayesian Analysis, 12(2): 351–377. https://doi.org/10.1214/16-BA1000
Shi J, Malik J (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 888–905. https://doi.org/10.1109/34.868688
Snijders TAB (2001). Statistical models for social networks. Annual Review of Sociology, 37: 131–153. https://doi.org/10.1146/annurev.soc.012809.102709
Snijders TAB, Nowicki K (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14: 75–100. https://doi.org/10.1007/s003579900004
Snijders TAB, Pattison PE, Robins GL, Handcock MS (2006). New specifications for exponential random graph models. Sociological Methodology, 36(1): 99–153. https://doi.org/10.1111/j.1467-9531.2006.00176.x
Toivonen R, Kovanen L, Kivelä M, Onnela JP, Saramäki J, Kaski K (2009). A comparative study of social network models: Network evolution models and nodal attribute models. Social Networks, 31(4): 240–254. https://doi.org/10.1016/j.socnet.2009.06.004
Watts DJ, Strogatz SH (1998). Collective dynamics of “small-world” networks. Nature, 393: 440–442. https://doi.org/10.1038/30918
Xie J, Kelley S, Szymański BK (2013). Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Computing Surveys, 45(4): 43. https://doi.org/10.1145/2501654.2501657
Zachary WW (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4): 452–473. https://doi.org/10.1086/jar.33.4.3629752