A Mixed-Membership Model for Social Network Clustering

Ouyang, Guang; Dey, Dipak K.; Zhang, Panpan

doi:10.6339/23-JDS1109

Journal of Data Science

Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 508–522

Type: Statistical Data Science

Open Access

Received: 7 November 2021

Accepted: 4 July 2023

Published online: 7 August 2023

Abstract

We propose a simple mixed membership model for social network clustering. A flexible function is adopted to measure affinities among the entities in a social network. The model not only allows each entity in the network to hold more than one membership, but also provides accurate statistical inference about the network structure. We estimate the membership parameters using an MCMC algorithm. We evaluate the performance of the proposed algorithm by applying the model to two empirical social network data sets, the Zachary karate club data and the bottlenose dolphin network data. We also conduct numerical studies on synthetic networks to further assess the effectiveness of the algorithm. We close with brief concluding remarks and directions for future work.
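
As a rough illustration of the ideas named in the abstract and keywords (mixed membership, an affinity function, cosine similarity), the following minimal sketch gives each entity a membership vector on the simplex and measures pairwise affinity by cosine similarity. It is not the authors' model: the flat Dirichlet draw, the rule that links two entities with probability equal to their affinity, and all function names are assumptions made only for this example.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_membership(k, rng):
    """Draw a membership vector over k groups (flat Dirichlet, an assumption).

    Normalizing independent Gamma(1, 1) draws yields weights that are
    nonnegative and sum to one, so an entity can belong to several groups.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_network(memberships, rng):
    """Hypothetical link rule: connect i and j with probability equal to
    the cosine similarity of their membership vectors."""
    n = len(memberships)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = cosine_similarity(memberships[i], memberships[j])
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # undirected, no self-loops
    return adj


rng = random.Random(0)  # fixed seed so the sketch is reproducible
memberships = [sample_membership(3, rng) for _ in range(6)]
adjacency = sample_network(memberships, rng)
```

Because membership weights are nonnegative, the cosine similarity lies between 0 and 1 and can be read directly as a link probability in this toy setup; the paper's actual affinity function and likelihood may differ.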

Supplementary material

The code for Algorithm 1 and its implementations is available on the journal website. The results of the empirical data applications are saved in RDS files.

© 2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

cosine similarity, MCMC algorithm, mixed membership, social network clustering, stochastic blockmodels
