Scalable Community Extraction of Text Networks for Automated Grouping in Medical Databases
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 470–489
Pub. online: 19 April 2022
Type: Statistical Data Science
Open Access
Received
4 November 2021
Accepted
13 February 2022
Published
19 April 2022
Abstract
Networks are ubiquitous in today’s world. Community structure is a well-known feature of many empirical networks, and many statistical methods have been developed for community detection. In this paper, we consider the problem of community extraction in text networks, which is highly relevant to medical error and patient safety databases. We adapt a well-known community extraction method to develop a scalable algorithm for extracting groups of similar documents in large text databases. Applying our method to a real-world patient safety report system demonstrates that the groups generated by community extraction are much more accurate than manual tagging by frontline workers.
Supplementary material
The online supplementary material includes R code for implementing the proposed method.