<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1038</article-id>
<article-id pub-id-type="doi">10.6339/22-JDS1038</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Scalable Community Extraction of Text Networks for Automated Grouping in Medical Databases</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Komolafe</surname><given-names>Tomilayo</given-names></name><xref ref-type="aff" rid="j_jds1038_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Fong</surname><given-names>Allan</given-names></name><xref ref-type="aff" rid="j_jds1038_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-6889-8599</contrib-id>
<name><surname>Sengupta</surname><given-names>Srijan</given-names></name><email xlink:href="mailto:ssengup2@ncsu.edu">ssengup2@ncsu.edu</email><xref ref-type="aff" rid="j_jds1038_aff_003">3</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1038_aff_001"><label>1</label><institution>Qlik</institution>, 211 S Gulph Rd, King of Prussia, PA 19406, <country>USA</country></aff>
<aff id="j_jds1038_aff_002"><label>2</label><institution>MedStar Health Research Institute</institution>, Hyattsville, Maryland, <country>USA</country></aff>
<aff id="j_jds1038_aff_003"><label>3</label>Department of Statistics, <institution>North Carolina State University</institution>, Raleigh, NC, 27695, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:ssengup2@ncsu.edu">ssengup2@ncsu.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>19</day><month>4</month><year>2022</year></pub-date><volume>21</volume><issue>3</issue><fpage>470</fpage><lpage>489</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1038_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The online supplementary material includes R code for implementing the proposed method.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>4</day><month>11</month><year>2021</year></date><date date-type="accepted"><day>13</day><month>2</month><year>2022</year></date></history>
<permissions><copyright-statement>2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Networks are ubiquitous in today’s world. Community structure is a well-known feature of many empirical networks, and many statistical methods have been developed for community detection. In this paper, we consider the problem of community extraction in text networks, which is highly relevant to medical error and patient safety databases. We adapt a well-known community extraction method to develop a scalable algorithm for extracting groups of similar documents from large text databases. Applying our method to a real-world patient safety reporting system demonstrates that the groups generated by community extraction are much more accurate than manual tagging by frontline workers.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>community structure</kwd>
<kwd>community extraction</kwd>
<kwd>natural language processing</kwd>
<kwd>patient safety</kwd>
</kwd-group>
<funding-group><award-group><funding-source xlink:href="https://doi.org/10.13039/100000092">National Library of Medicine</funding-source><award-id>1R01LM013309</award-id></award-group><funding-statement>We acknowledge support from NIH R01 grant 1R01LM013309 from the National Library of Medicine.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1038_reflist_001">
<title>References</title>
<ref id="j_jds1038_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Aizawa</surname> <given-names>A</given-names></string-name> (<year>2003</year>). <article-title>An information-theoretic perspective of tf–idf measures</article-title>. <source><italic>Information Processing &amp; Management</italic></source>, <volume>39</volume>(<issue>1</issue>): <fpage>45</fpage>–<lpage>65</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Amini</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bickel</surname> <given-names>PJ</given-names></string-name>, <string-name><surname>Levina</surname> <given-names>E</given-names></string-name> (<year>2013</year>). <article-title>Pseudo-likelihood methods for community detection in large sparse networks</article-title>. <source><italic>Ann. Statist.</italic></source>, <volume>41</volume>(<issue>4</issue>): <fpage>2097</fpage>–<lpage>2122</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_003">
<mixed-citation publication-type="chapter"> <string-name><surname>Aspden</surname> <given-names>P</given-names></string-name>, <string-name><surname>Corrigan</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Wolcott</surname> <given-names>J</given-names></string-name>, <string-name><surname>Erickson</surname> <given-names>SM</given-names></string-name>, <etal>et al.</etal> (<year>2004</year>). <chapter-title>Patient safety reporting systems and applications</chapter-title>. In: <source><italic>Patient Safety: Achieving a New Standard for Care</italic></source>. <publisher-name>National Academies Press (US)</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_004">
<mixed-citation publication-type="other"> <string-name><surname>Beasley</surname> <given-names>JE</given-names></string-name> (1998). Heuristic algorithms for the unconstrained binary quadratic programming problem. <italic>Technical report, Citeseer</italic>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Bickel</surname> <given-names>PJ</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>A</given-names></string-name> (<year>2009</year>). <article-title>A nonparametric view of network models and Newman–Girvan and other modularities</article-title>. <source><italic>Proceedings of the National Academy of Sciences</italic></source>, <volume>106</volume>: <fpage>21068</fpage>–<lpage>21073</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Chang</surname> <given-names>A</given-names></string-name>, <string-name><surname>Schyve</surname> <given-names>PM</given-names></string-name>, <string-name><surname>Croteau</surname> <given-names>RJ</given-names></string-name>, <string-name><surname>O’Leary</surname> <given-names>DS</given-names></string-name>, <string-name><surname>Loeb</surname> <given-names>JM</given-names></string-name> (<year>2005</year>). <article-title>The JCAHO patient safety event taxonomy: a standardized terminology and classification schema for near misses and adverse events</article-title>. <source><italic>International Journal for Quality in Health Care</italic></source>, <volume>17</volume>(<issue>2</issue>): <fpage>95</fpage>–<lpage>105</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Clarke</surname> <given-names>JR</given-names></string-name> (<year>2006</year>). <article-title>How a system for reporting medical errors can and cannot improve patient safety</article-title>. <source><italic>The American Surgeon</italic></source>, <volume>72</volume>(<issue>11</issue>): <fpage>1088</fpage>–<lpage>1091</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Dong</surname> <given-names>R</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name> (<year>2020</year>). <article-title>Overlapping community detection in weighted temporal text networks</article-title>. <source><italic>IEEE Access</italic></source>, <volume>8</volume>: <fpage>58118</fpage>–<lpage>58129</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Dovey</surname> <given-names>S</given-names></string-name>, <string-name><surname>Meyers</surname> <given-names>D</given-names></string-name>, <string-name><surname>Phillips</surname> <given-names>R</given-names></string-name>, <string-name><surname>Green</surname> <given-names>L</given-names></string-name>, <string-name><surname>Fryer</surname> <given-names>G</given-names></string-name>, <string-name><surname>Galliher</surname> <given-names>J</given-names></string-name>, <etal>et al.</etal> (<year>2002</year>). <article-title>A preliminary taxonomy of medical errors in family practice</article-title>. <source><italic>BMJ Quality &amp; Safety</italic></source>, <volume>11</volume>(<issue>3</issue>): <fpage>233</fpage>–<lpage>238</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Dumais</surname> <given-names>ST</given-names></string-name> (<year>2004</year>). <article-title>Latent semantic analysis</article-title>. <source><italic>Annual Review of Information Science and Technology</italic></source>, <volume>38</volume>(<issue>1</issue>): <fpage>188</fpage>–<lpage>230</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Fortunato</surname> <given-names>S</given-names></string-name> (<year>2010</year>). <article-title>Community detection in graphs</article-title>. <source><italic>Physics Reports</italic></source>, <volume>486</volume>(<issue>3</issue>): <fpage>75</fpage>–<lpage>174</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_012">
<mixed-citation publication-type="chapter"> <string-name><surname>Glover</surname> <given-names>F</given-names></string-name>, <string-name><surname>Laguna</surname> <given-names>M</given-names></string-name> (<year>1998</year>). <chapter-title>Tabu search</chapter-title>. In: <source><italic>Handbook of Combinatorial Optimization</italic></source>, <fpage>2093</fpage>–<lpage>2229</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Gong</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Song</surname> <given-names>HY</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Hua</surname> <given-names>L</given-names></string-name> (<year>2015</year>). <article-title>Identifying barriers and benefits of patient safety event reporting toward user-centered design</article-title>. <source><italic>Safety in Health</italic></source>, <volume>1</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>9</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Griffey</surname> <given-names>RT</given-names></string-name>, <string-name><surname>Schneider</surname> <given-names>RM</given-names></string-name>, <string-name><surname>Todorov</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Yaeger</surname> <given-names>L</given-names></string-name>, <string-name><surname>Sharp</surname> <given-names>BR</given-names></string-name>, <string-name><surname>Vrablik</surname> <given-names>MC</given-names></string-name>, <etal>et al.</etal> (<year>2019</year>). <article-title>Critical review, development, and testing of a taxonomy for adverse events and near misses in the emergency department</article-title>. <source><italic>Academic Emergency Medicine</italic></source>, <volume>26</volume>(<issue>6</issue>): <fpage>670</fpage>–<lpage>679</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Günther</surname> <given-names>F</given-names></string-name>, <string-name><surname>Dudschig</surname> <given-names>C</given-names></string-name>, <string-name><surname>Kaup</surname> <given-names>B</given-names></string-name> (<year>2015</year>). <article-title>LSAfun: An R package for computations based on latent semantic analysis</article-title>. <source><italic>Behavior Research Methods</italic></source>, <volume>47</volume>(<issue>4</issue>): <fpage>930</fpage>–<lpage>944</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Guo</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sengupta</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hong</surname> <given-names>M</given-names></string-name>, <string-name><surname>Mitra</surname> <given-names>T</given-names></string-name> (<year>2020</year>). <article-title>Online social deception and its countermeasures: A survey</article-title>. <source><italic>IEEE Access</italic></source>, <volume>9</volume>: <fpage>1770</fpage>–<lpage>1806</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Harrington</surname> <given-names>MM</given-names></string-name> (<year>2005</year>). <article-title>Revisiting medical error: Five years after the IOM report, have reporting systems made a measurable difference?</article-title> <source><italic>Health Matrix</italic></source>, <volume>15</volume>: <fpage>329</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_018">
<mixed-citation publication-type="chapter"> <string-name><surname>Hofmann</surname> <given-names>T</given-names></string-name> (<year>1999</year>). <chapter-title>Probabilistic latent semantic indexing</chapter-title>. In: <source><italic>Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</italic></source>, <fpage>50</fpage>–<lpage>57</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Huberman</surname> <given-names>BA</given-names></string-name>, <string-name><surname>Adamic</surname> <given-names>LA</given-names></string-name> (<year>1999</year>). <article-title>Internet: growth dynamics of the World-Wide Web</article-title>. <source><italic>Nature</italic></source>, <volume>401</volume>: <fpage>131</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Jin</surname> <given-names>J</given-names></string-name> (<year>2015</year>). <article-title>Fast community detection by SCORE</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>43</volume>(<issue>1</issue>): <fpage>57</fpage>–<lpage>89</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Johnson</surname> <given-names>C</given-names></string-name> (<year>2003</year>). <article-title>How will we get the data and what will we do with it then? Issues in the reporting of adverse healthcare events</article-title>. <source><italic>BMJ Quality &amp; Safety</italic></source>, <volume>12</volume>(<issue>suppl</issue>): <fpage>ii64</fpage>–<lpage>ii67</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Jonsson</surname> <given-names>PF</given-names></string-name>, <string-name><surname>Cavanna</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zicha</surname> <given-names>D</given-names></string-name>, <string-name><surname>Bates</surname> <given-names>PA</given-names></string-name> (<year>2006</year>). <article-title>Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis</article-title>. <source><italic>BMC Bioinformatics</italic></source>, <volume>7</volume>(<issue>1</issue>): <fpage>2</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_023">
<mixed-citation publication-type="book"> <string-name><surname>Kohn</surname> <given-names>LT</given-names></string-name>, <string-name><surname>Corrigan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Donaldson</surname> <given-names>MS</given-names></string-name> (<year>2000</year>). <source><italic>To Err is Human: Building a Safer Health System</italic></source>. <publisher-name>National Academy Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_024">
<mixed-citation publication-type="journal"> <string-name><surname>Landauer</surname> <given-names>TK</given-names></string-name>, <string-name><surname>Foltz</surname> <given-names>PW</given-names></string-name>, <string-name><surname>Laham</surname> <given-names>D</given-names></string-name> (<year>1998</year>). <article-title>An introduction to latent semantic analysis</article-title>. <source><italic>Discourse Processes</italic></source>, <volume>25</volume>(<issue>2–3</issue>): <fpage>259</fpage>–<lpage>284</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Leitch</surname> <given-names>J</given-names></string-name>, <string-name><surname>Alexander</surname> <given-names>KA</given-names></string-name>, <string-name><surname>Sengupta</surname> <given-names>S</given-names></string-name> (<year>2019</year>). <article-title>Toward epidemic thresholds on temporal networks: a review and open questions</article-title>. <source><italic>Applied Network Science</italic></source>, <volume>4</volume>(<issue>1</issue>): <fpage>105</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_026">
<mixed-citation publication-type="journal"> <string-name><surname>Lynall</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Bassett</surname> <given-names>DS</given-names></string-name>, <string-name><surname>Kerwin</surname> <given-names>R</given-names></string-name>, <string-name><surname>McKenna</surname> <given-names>PJ</given-names></string-name>, <string-name><surname>Kitzbichler</surname> <given-names>M</given-names></string-name>, <string-name><surname>Muller</surname> <given-names>U</given-names></string-name>, <etal>et al.</etal> (<year>2010</year>). <article-title>Functional connectivity and brain networks in schizophrenia</article-title>. <source><italic>Journal of Neuroscience</italic></source>, <volume>30</volume>(<issue>28</issue>): <fpage>9477</fpage>–<lpage>9487</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_027">
<mixed-citation publication-type="other"> <string-name><surname>Makary</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Daniel</surname> <given-names>M</given-names></string-name> (2016). Medical error—the third leading cause of death in the US. <italic>BMJ</italic>, 353.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_028">
<mixed-citation publication-type="other"> <string-name><surname>Mikolov</surname> <given-names>T</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>K</given-names></string-name>, <string-name><surname>Corrado</surname> <given-names>G</given-names></string-name>, <string-name><surname>Dean</surname> <given-names>J</given-names></string-name> (2013). Efficient estimation of word representations in vector space. arXiv preprint: <uri>https://arxiv.org/abs/1301.3781</uri>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_029">
<mixed-citation publication-type="other"> <string-name><surname>Mikolov</surname> <given-names>T</given-names></string-name>, <string-name><surname>Grave</surname> <given-names>E</given-names></string-name>, <string-name><surname>Bojanowski</surname> <given-names>P</given-names></string-name>, <string-name><surname>Puhrsch</surname> <given-names>C</given-names></string-name>, <string-name><surname>Joulin</surname> <given-names>A</given-names></string-name> (2017). Advances in pre-training distributed word representations. arXiv preprint: <uri>https://arxiv.org/abs/1712.09405</uri>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_030">
<mixed-citation publication-type="journal"> <string-name><surname>Newman</surname> <given-names>MEJ</given-names></string-name>, <string-name><surname>Girvan</surname> <given-names>M</given-names></string-name> (<year>2004</year>). <article-title>Finding and evaluating community structure in networks</article-title>. <source><italic>Physical Review E</italic></source>, <volume>69</volume>(<issue>2</issue>): <fpage>026113</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_031">
<mixed-citation publication-type="journal"> <string-name><surname>Pagani</surname> <given-names>GA</given-names></string-name>, <string-name><surname>Aiello</surname> <given-names>M</given-names></string-name> (<year>2013</year>). <article-title>The power grid as a complex network: A survey</article-title>. <source><italic>Physica A: Statistical Mechanics and its Applications</italic></source>, <volume>392</volume>(<issue>11</issue>): <fpage>2688</fpage>–<lpage>2700</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_032">
<mixed-citation publication-type="journal"> <string-name><surname>Papadimitriou</surname> <given-names>CH</given-names></string-name>, <string-name><surname>Raghavan</surname> <given-names>P</given-names></string-name>, <string-name><surname>Tamaki</surname> <given-names>H</given-names></string-name>, <string-name><surname>Vempala</surname> <given-names>S</given-names></string-name> (<year>2000</year>). <article-title>Latent semantic indexing: A probabilistic analysis</article-title>. <source><italic>Journal of Computer and System Sciences</italic></source>, <volume>61</volume>(<issue>2</issue>): <fpage>217</fpage>–<lpage>235</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_033">
<mixed-citation publication-type="book"> <string-name><surname>Pronovost</surname> <given-names>PJ</given-names></string-name>, <string-name><surname>Morlock</surname> <given-names>LL</given-names></string-name>, <string-name><surname>Sexton</surname> <given-names>JB</given-names></string-name>, <string-name><surname>Miller</surname> <given-names>MR</given-names></string-name>, <string-name><surname>Holzmueller</surname> <given-names>CG</given-names></string-name>, <string-name><surname>Thompson</surname> <given-names>DA</given-names></string-name>, <etal>et al.</etal> (<year>2008</year>). <source><italic>Improving the value of patient safety reporting systems</italic></source> <series><italic>Advances in Patient Safety: New Directions and Alternative Approaches (Vol. 1: Assessment)</italic></series>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_034">
<mixed-citation publication-type="journal"> <string-name><surname>Puthumana</surname> <given-names>JS</given-names></string-name>, <string-name><surname>Fong</surname> <given-names>A</given-names></string-name>, <string-name><surname>Blumenthal</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ratwani</surname> <given-names>RM</given-names></string-name> (<year>2021</year>). <article-title>Making patient safety event data actionable: understanding patient safety analyst needs</article-title>. <source><italic>Journal of Patient Safety</italic></source>, <volume>17</volume>(<issue>6</issue>): <fpage>e509</fpage>–<lpage>e514</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_035">
<mixed-citation publication-type="chapter"> <string-name><surname>Ramos</surname> <given-names>J</given-names></string-name>, <etal>et al.</etal> (<year>2003</year>). <chapter-title>Using tf-idf to determine word relevance in document queries</chapter-title>. In: <source><italic>Proceedings of the First Instructional Conference on Machine Learning</italic></source>, volume <volume>242</volume>, <fpage>29</fpage>–<lpage>48</lpage>. <publisher-name>Citeseer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_036">
<mixed-citation publication-type="journal"> <string-name><surname>Rohe</surname> <given-names>K</given-names></string-name>, <string-name><surname>Chatterjee</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>B</given-names></string-name> (<year>2011</year>). <article-title>Spectral clustering and the high-dimensional stochastic blockmodel</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>39</volume>(<issue>4</issue>): <fpage>1878</fpage>–<lpage>1915</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_037">
<mixed-citation publication-type="book"> <string-name><surname>Rosenthal</surname> <given-names>J</given-names></string-name>, <string-name><surname>Booth</surname> <given-names>M</given-names></string-name> (<year>2005</year>). <source><italic>Maximizing the Use of State Adverse Event Data to Improve Patient Safety</italic></source>. <publisher-name>National Academy for State Health Policy</publisher-name>, <publisher-loc>Portland, ME</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_038">
<mixed-citation publication-type="journal"> <string-name><surname>Sengupta</surname> <given-names>S</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name> (<year>2015</year>). <article-title>Spectral clustering in heterogeneous networks</article-title>. <source><italic>Statistica Sinica</italic></source>, <volume>25</volume>: <fpage>1081</fpage>–<lpage>1106</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_039">
<mixed-citation publication-type="journal"> <string-name><surname>Sengupta</surname> <given-names>S</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name> (<year>2018</year>). <article-title>A block model for node popularity in networks with community structure</article-title>. <source><italic>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</italic></source>, <volume>80</volume>(<issue>2</issue>): <fpage>365</fpage>–<lpage>386</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_040">
<mixed-citation publication-type="other"> <string-name><surname>The White House</surname></string-name> (2020). <italic>Clinton-Gore Administration announces new actions to improve patient safety and assure health care quality.</italic> <uri>https://clintonwhitehouse4.archives.gov/textonly/WH/New/html/20000222_1.html</uri>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_041">
<mixed-citation publication-type="chapter"> <string-name><surname>Turney</surname> <given-names>PD</given-names></string-name> (<year>2001</year>). <chapter-title>Mining the web for synonyms: PMI-IR versus LSA on TOEFL</chapter-title>. In: <source><italic>European Conference on Machine Learning</italic></source>, <fpage>491</fpage>–<lpage>502</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_042">
<mixed-citation publication-type="journal"> <string-name><surname>Vijayarani</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ilamathi</surname> <given-names>MJ</given-names></string-name>, <string-name><surname>Nithya</surname> <given-names>M</given-names></string-name>, <etal>et al.</etal> (<year>2015</year>). <article-title>Preprocessing techniques for text mining-an overview</article-title>. <source><italic>International Journal of Computer Science &amp; Communication Networks</italic></source>, <volume>5</volume>(<issue>1</issue>): <fpage>7</fpage>–<lpage>16</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_043">
<mixed-citation publication-type="other"> <string-name><surname>Wild</surname> <given-names>F</given-names></string-name>, <string-name><surname>Stahl</surname> <given-names>C</given-names></string-name>, <string-name><surname>Stermsek</surname> <given-names>G</given-names></string-name>, <string-name><surname>Neumann</surname> <given-names>G</given-names></string-name> (2005). Parameters driving effectiveness of automated essay scoring with LSA.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_044">
<mixed-citation publication-type="other"> <string-name><surname>Yan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jia</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name> (2021). Overlapping community detection in temporal text networks.</mixed-citation>
</ref>
<ref id="j_jds1038_ref_045">
<mixed-citation publication-type="journal"> <string-name><surname>Zhao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Levina</surname> <given-names>E</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name> (<year>2011</year>). <article-title>Community extraction for social networks</article-title>. <source><italic>Proceedings of the National Academy of Sciences</italic></source>, <volume>108</volume>(<issue>18</issue>): <fpage>7321</fpage>–<lpage>7326</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
