Pub. online:19 Apr 2022Type:Statistical Data ScienceOpen Access
Journal:Journal of Data Science
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 470–489
Networks are ubiquitous in today’s world. Community structure is a well-known feature of many empirical networks, and a lot of statistical methods have been developed for community detection. In this paper, we consider the problem of community extraction in text networks, which is greatly relevant in medical errors and patient safety databases. We adapt a well-known community extraction method to develop a scalable algorithm for extracting groups of similar documents in large text databases. The application of our method on a real-world patient safety report system demonstrates that the groups generated from community extraction are much more accurate than manual tagging by frontline workers.