<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1140</article-id>
<article-id pub-id-type="doi">10.6339/24-JDS1140</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>A Two-Stage Classification for Dealing with Unseen Clusters in the Testing Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-6683-1822</contrib-id>
<name><surname>Lee</surname><given-names>Jung Wun</given-names></name><email xlink:href="mailto:jwlee@hsph.harvard.edu">jwlee@hsph.harvard.edu</email><xref ref-type="aff" rid="j_jds1140_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1054-3055</contrib-id>
<name><surname>Harel</surname><given-names>Ofer</given-names></name><email xlink:href="mailto:ofer.harel@uconn.edu">ofer.harel@uconn.edu</email><xref ref-type="aff" rid="j_jds1140_aff_002">2</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1140_aff_001"><label>1</label>Department of Biostatistics, <institution>Harvard University</institution>, Boston, MA, 02115, <country>USA</country></aff>
<aff id="j_jds1140_aff_002"><label>2</label>Department of Statistics, <institution>University of Connecticut</institution>, Storrs, CT, 06269, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:jwlee@hsph.harvard.edu">jwlee@hsph.harvard.edu</ext-link> or <ext-link ext-link-type="uri" xlink:href="mailto:ofer.harel@uconn.edu">ofer.harel@uconn.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>2</day><month>7</month><year>2024</year></pub-date><volume>23</volume><issue>1</issue><fpage>188</fpage><lpage>207</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1140_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>
<list>
<list-item id="j_jds1140_li_001">
<label>•</label>
<p>Supplementary document: The supplementary document provides proofs of Theorems 1, 2, and 3, along with additional numerical study results.</p>
</list-item>
<list-item id="j_jds1140_li_002">
<label>•</label>
<p>Software: R code for the proposed methods and algorithms.</p>
</list-item>
</list> 
</p>
</caption>
</supplementary-material><history><date date-type="received"><day>21</day><month>1</month><year>2024</year></date><date date-type="accepted"><day>28</day><month>4</month><year>2024</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Classification is an important statistical tool whose importance has grown since the emergence of the data science revolution. However, a training data set that does not capture all underlying population subgroups (or clusters) will result in biased estimates or misclassification. In this paper, we introduce a statistical and computational solution to the bias that can arise when classification is implemented on estimated population clusters. The unseen-cluster problem denotes the case in which the training data does not contain all underlying clusters in the population. Such a scenario may occur for various reasons, such as sampling errors, selection bias, or emerging and disappearing population clusters. When an unseen-cluster problem occurs, a testing observation from an unobserved cluster will be misclassified, because a classification rule based on the sample cannot capture a cluster that is absent from the training data. To address this issue, we propose a two-stage classification method. We also propose a test to identify the unseen-cluster problem, and we demonstrate the performance of the two-stage tailored classifier using simulations and a public data example.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>classification</kwd>
<kwd>cluster analysis</kwd>
<kwd>open set recognition</kwd>
<kwd>outlier detection</kwd>
</kwd-group>
<funding-group><funding-statement>This work was partially supported by the National Science Foundation under grant DMS-2015320.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1140_reflist_001">
<title>References</title>
<ref id="j_jds1140_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Bartlett</surname> <given-names>PL</given-names></string-name>, <string-name><surname>Wegkamp</surname> <given-names>MH</given-names></string-name> (<year>2008</year>). <article-title>Classification with a reject option using a hinge loss</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>9</volume>(<issue>8</issue>): <fpage>1823</fpage>–<lpage>1840</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_002">
<mixed-citation publication-type="chapter"> <string-name><surname>Bendale</surname> <given-names>A</given-names></string-name>, <string-name><surname>Boult</surname> <given-names>T</given-names></string-name> (<year>2015</year>). <chapter-title>Towards open world recognition</chapter-title>. In: <source><italic>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</italic></source>, <fpage>1893</fpage>–<lpage>1902</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Bethlehem</surname> <given-names>J</given-names></string-name> (<year>2010</year>). <article-title>Selection bias in web surveys</article-title>. <source><italic>International Statistical Review</italic></source>, <volume>78</volume>(<issue>2</issue>): <fpage>161</fpage>–<lpage>188</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.1751-5823.2010.00112.x" xlink:type="simple">https://doi.org/10.1111/j.1751-5823.2010.00112.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Bouveyron</surname> <given-names>C</given-names></string-name> (<year>2014</year>). <article-title>Adaptive mixture discriminant analysis for supervised learning with unobserved classes</article-title>. <source><italic>Journal of Classification</italic></source>, <volume>31</volume>: <fpage>49</fpage>–<lpage>84</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s00357-014-9147-x" xlink:type="simple">https://doi.org/10.1007/s00357-014-9147-x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Cappozzo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Greselin</surname> <given-names>F</given-names></string-name>, <string-name><surname>Murphy</surname> <given-names>TB</given-names></string-name> (<year>2020</year>). <article-title>Anomaly and novelty detection for robust semi-supervised learning</article-title>. <source><italic>Statistics and Computing</italic></source>, <volume>30</volume>(<issue>5</issue>): <fpage>1545</fpage>–<lpage>1571</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11222-020-09959-1" xlink:type="simple">https://doi.org/10.1007/s11222-020-09959-1</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Clifton</surname> <given-names>DA</given-names></string-name>, <string-name><surname>Hugueny</surname> <given-names>S</given-names></string-name>, <string-name><surname>Tarassenko</surname> <given-names>L</given-names></string-name> (<year>2011</year>). <article-title>Novelty detection with multivariate extreme value statistics</article-title>. <source><italic>Journal of Signal Processing Systems</italic></source>, <volume>65</volume>(<issue>3</issue>): <fpage>371</fpage>–<lpage>389</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11265-010-0513-6" xlink:type="simple">https://doi.org/10.1007/s11265-010-0513-6</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Dempster</surname> <given-names>AP</given-names></string-name>, <string-name><surname>Laird</surname> <given-names>NM</given-names></string-name>, <string-name><surname>Rubin</surname> <given-names>DB</given-names></string-name> (<year>1977</year>). <article-title>Maximum likelihood from incomplete data via the EM algorithm</article-title>. <source><italic>Journal of the Royal Statistical Society, Series B, Methodological</italic></source>, <volume>39</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>22</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.2517-6161.1977.tb01600.x" xlink:type="simple">https://doi.org/10.1111/j.2517-6161.1977.tb01600.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Denti</surname> <given-names>F</given-names></string-name>, <string-name><surname>Cappozzo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Greselin</surname> <given-names>F</given-names></string-name> (<year>2021</year>). <article-title>A two-stage Bayesian semiparametric model for novelty detection with robust prior information</article-title>. <source><italic>Statistics and Computing</italic></source>, <volume>31</volume>(<issue>4</issue>): <fpage>42</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11222-021-10017-7" xlink:type="simple">https://doi.org/10.1007/s11222-021-10017-7</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_009">
<mixed-citation publication-type="chapter"> <string-name><surname>Doan</surname> <given-names>T</given-names></string-name>, <string-name><surname>Kalita</surname> <given-names>J</given-names></string-name> (<year>2017</year>). <chapter-title>Overcoming the challenge for text classification in the open world</chapter-title>. In: <source><italic>2017 IEEE 7th Annual Computing and Communication Workshop and Conference (CCWC)</italic></source>, <fpage>1</fpage>–<lpage>7</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_010">
<mixed-citation publication-type="other"> <string-name><surname>Feinman</surname> <given-names>R</given-names></string-name>, <string-name><surname>Curtin</surname> <given-names>RR</given-names></string-name>, <string-name><surname>Shintre</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gardner</surname> <given-names>AB</given-names></string-name> (<year>2017</year>). Detecting adversarial samples from artifacts. arXiv preprint: <uri>https://arxiv.org/abs/1703.00410</uri>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Geng</surname> <given-names>C</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>SJ</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>S</given-names></string-name> (<year>2020</year>). <article-title>Recent advances in open set recognition: A survey</article-title>. <source><italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic></source>, <volume>43</volume>(<issue>10</issue>): <fpage>3614</fpage>–<lpage>3631</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/TPAMI.2020.2981604" xlink:type="simple">https://doi.org/10.1109/TPAMI.2020.2981604</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_012">
<mixed-citation publication-type="other"> <string-name><surname>Grosse</surname> <given-names>K</given-names></string-name>, <string-name><surname>Manoharan</surname> <given-names>P</given-names></string-name>, <string-name><surname>Papernot</surname> <given-names>N</given-names></string-name>, <string-name><surname>Backes</surname> <given-names>M</given-names></string-name>, <string-name><surname>McDaniel</surname> <given-names>P</given-names></string-name> (<year>2017</year>). On the (statistical) detection of adversarial examples. arXiv preprint: <uri>https://arxiv.org/abs/1702.06280</uri>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>He</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>S</given-names></string-name> (<year>2003</year>). <article-title>Discovering cluster-based local outliers</article-title>. <source><italic>Pattern Recognition Letters</italic></source>, <volume>24</volume>(<issue>9–10</issue>): <fpage>1641</fpage>–<lpage>1650</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/S0167-8655(03)00003-5" xlink:type="simple">https://doi.org/10.1016/S0167-8655(03)00003-5</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Hodge</surname> <given-names>V</given-names></string-name>, <string-name><surname>Austin</surname> <given-names>J</given-names></string-name> (<year>2004</year>). <article-title>A survey of outlier detection methodologies</article-title>. <source><italic>Artificial Intelligence Review</italic></source>, <volume>22</volume>(<issue>2</issue>): <fpage>85</fpage>–<lpage>126</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1023/B:AIRE.0000045502.10941.a9" xlink:type="simple">https://doi.org/10.1023/B:AIRE.0000045502.10941.a9</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_015">
<mixed-citation publication-type="chapter"> <string-name><surname>Klawonn</surname> <given-names>F</given-names></string-name>, <string-name><surname>Höppner</surname> <given-names>F</given-names></string-name>, <string-name><surname>Jayaram</surname> <given-names>B</given-names></string-name> (<year>2012</year>). <chapter-title>What are clusters in high dimensions and are they difficult to find?</chapter-title> In: <source><italic>Clustering High-Dimensional Data</italic></source>, <fpage>14</fpage>–<lpage>33</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Koklu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ozkan</surname> <given-names>IA</given-names></string-name> (<year>2020</year>). <article-title>Multiclass classification of dry beans using computer vision and machine learning techniques</article-title>. <source><italic>Computers and Electronics in Agriculture</italic></source>, <volume>174</volume>: <fpage>105507</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.compag.2020.105507" xlink:type="simple">https://doi.org/10.1016/j.compag.2020.105507</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_017">
<mixed-citation publication-type="chapter"> <string-name><surname>Lee</surname> <given-names>K</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>K</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>H</given-names></string-name>, <string-name><surname>Shin</surname> <given-names>J</given-names></string-name> (<year>2018</year>). <chapter-title>A simple unified framework for detecting out-of-distribution samples and adversarial attacks</chapter-title>. In: <source><italic>Advances in Neural Information Processing Systems</italic></source>, volume <volume>31</volume> (<string-name><given-names>S</given-names> <surname>Bengio</surname></string-name>, <string-name><given-names>H</given-names> <surname>Wallach</surname></string-name>, <string-name><given-names>H</given-names> <surname>Larochelle</surname></string-name>, <string-name><given-names>K</given-names> <surname>Grauman</surname></string-name>, <string-name><given-names>N</given-names> <surname>Cesa-Bianchi</surname></string-name>, <string-name><given-names>R</given-names> <surname>Garnett</surname></string-name>, eds.).</mixed-citation>
</ref>
<ref id="j_jds1140_ref_018">
<mixed-citation publication-type="other"> <string-name><surname>Liang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Srikant</surname> <given-names>R</given-names></string-name> (<year>2017</year>). Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint: <uri>https://arxiv.org/abs/1706.02690</uri>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Lo</surname> <given-names>AY</given-names></string-name> (<year>1984</year>). <article-title>On a class of Bayesian nonparametric estimates: I. Density estimates</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>12</volume>(<issue>1</issue>): <fpage>351</fpage>–<lpage>357</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_020">
<mixed-citation publication-type="other"> <string-name><surname>Lonij</surname> <given-names>V</given-names></string-name>, <string-name><surname>Rawat</surname> <given-names>A</given-names></string-name>, <string-name><surname>Nicolae</surname> <given-names>MI</given-names></string-name> (<year>2017</year>). Open-world visual recognition using knowledge graphs. arXiv preprint: <uri>https://arxiv.org/abs/1708.08310</uri>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_021">
<mixed-citation publication-type="other"> <string-name><surname>Ma</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Erfani</surname> <given-names>SM</given-names></string-name>, <string-name><surname>Wijewickrema</surname> <given-names>S</given-names></string-name>, <string-name><surname>Schoenebeck</surname> <given-names>G</given-names></string-name>, <etal>et al.</etal> (<year>2018</year>). Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv preprint: <uri>https://arxiv.org/abs/1801.02613</uri>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_022">
<mixed-citation publication-type="chapter"> <string-name><surname>Miller</surname> <given-names>DJ</given-names></string-name>, <string-name><surname>Browning</surname> <given-names>J</given-names></string-name> (<year>2003</year>). <chapter-title>A mixture model framework for class discovery and outlier detection in mixed labeled/unlabeled data sets</chapter-title>. In: <source><italic>2003 IEEE XIII Workshop on Neural Networks for Signal Processing (IEEE Cat. No. 03TH8718)</italic></source>, <fpage>489</fpage>–<lpage>498</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_023">
<mixed-citation publication-type="other"> <string-name><surname>Papernot</surname> <given-names>N</given-names></string-name>, <string-name><surname>McDaniel</surname> <given-names>P</given-names></string-name> (<year>2018</year>). Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint: <uri>https://arxiv.org/abs/1803.04765</uri>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_024">
<mixed-citation publication-type="journal"> <string-name><surname>Pimentel</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Clifton</surname> <given-names>DA</given-names></string-name>, <string-name><surname>Clifton</surname> <given-names>L</given-names></string-name>, <string-name><surname>Tarassenko</surname> <given-names>L</given-names></string-name> (<year>2014</year>). <article-title>A review of novelty detection</article-title>. <source><italic>Signal Processing</italic></source>, <volume>99</volume>: <fpage>215</fpage>–<lpage>249</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.sigpro.2013.12.026" xlink:type="simple">https://doi.org/10.1016/j.sigpro.2013.12.026</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Redner</surname> <given-names>RA</given-names></string-name>, <string-name><surname>Walker</surname> <given-names>HF</given-names></string-name> (<year>1984</year>). <article-title>Mixture densities, maximum likelihood and the EM algorithm</article-title>. <source><italic>SIAM Review</italic></source>, <volume>26</volume>(<issue>2</issue>): <fpage>195</fpage>–<lpage>239</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1137/1026034" xlink:type="simple">https://doi.org/10.1137/1026034</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_026">
<mixed-citation publication-type="journal"> <string-name><surname>Rousseeuw</surname> <given-names>PJ</given-names></string-name> (<year>1987</year>). <article-title>Silhouettes: A graphical aid to the interpretation and validation of cluster analysis</article-title>. <source><italic>Journal of Computational and Applied Mathematics</italic></source>, <volume>20</volume>: <fpage>53</fpage>–<lpage>65</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/0377-0427(87)90125-7" xlink:type="simple">https://doi.org/10.1016/0377-0427(87)90125-7</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_027">
<mixed-citation publication-type="chapter"> <string-name><surname>Schölkopf</surname> <given-names>B</given-names></string-name>, <string-name><surname>Williamson</surname> <given-names>RC</given-names></string-name>, <string-name><surname>Smola</surname> <given-names>A</given-names></string-name>, <string-name><surname>Shawe-Taylor</surname> <given-names>J</given-names></string-name>, <string-name><surname>Platt</surname> <given-names>J</given-names></string-name> (<year>1999</year>). <chapter-title>Support vector method for novelty detection</chapter-title>. In: <source><italic>Advances in Neural Information Processing Systems</italic></source>, volume <volume>12</volume> (<string-name><given-names>S</given-names> <surname>Solla</surname></string-name>, <string-name><given-names>T</given-names> <surname>Leen</surname></string-name>, <string-name><given-names>K</given-names> <surname>Müller</surname></string-name>, eds.).</mixed-citation>
</ref>
<ref id="j_jds1140_ref_028">
<mixed-citation publication-type="journal"> <string-name><surname>Schwarz</surname> <given-names>G</given-names></string-name> (<year>1978</year>). <article-title>Estimating the dimension of a model</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>6</volume>(<issue>2</issue>): <fpage>461</fpage>–<lpage>464</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_029">
<mixed-citation publication-type="journal"> <string-name><surname>Scrucca</surname> <given-names>L</given-names></string-name>, <string-name><surname>Fop</surname> <given-names>M</given-names></string-name>, <string-name><surname>Murphy</surname> <given-names>TB</given-names></string-name>, <string-name><surname>Raftery</surname> <given-names>AE</given-names></string-name> (<year>2016</year>). <article-title>mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models</article-title>. <source><italic>The R Journal</italic></source>, <volume>8</volume>(<issue>1</issue>): <fpage>289</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.32614/RJ-2016-021" xlink:type="simple">https://doi.org/10.32614/RJ-2016-021</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_030">
<mixed-citation publication-type="book"> <string-name><surname>Shewhart</surname> <given-names>WA</given-names></string-name>, <string-name><surname>Deming</surname> <given-names>WE</given-names></string-name> (<year>1986</year>). <source><italic>Statistical Method from the Viewpoint of Quality Control</italic></source>. <publisher-name>Courier Corporation</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_031">
<mixed-citation publication-type="journal"> <string-name><surname>Sun</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>K</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>XF</given-names></string-name>, <string-name><surname>Lafyatis</surname> <given-names>R</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>Y</given-names></string-name>, <etal>et al.</etal> (<year>2018</year>). <article-title>DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data</article-title>. <source><italic>Bioinformatics</italic></source>, <volume>34</volume>(<issue>1</issue>): <fpage>139</fpage>–<lpage>146</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/bioinformatics/btx490" xlink:type="simple">https://doi.org/10.1093/bioinformatics/btx490</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_032">
<mixed-citation publication-type="journal"> <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name>, <string-name><surname>Walther</surname> <given-names>G</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name> (<year>2001</year>). <article-title>Estimating the number of clusters in a data set via the gap statistic</article-title>. <source><italic>Journal of the Royal Statistical Society, Series B, Statistical Methodology</italic></source>, <volume>63</volume>(<issue>2</issue>): <fpage>411</fpage>–<lpage>423</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/1467-9868.00293" xlink:type="simple">https://doi.org/10.1111/1467-9868.00293</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_033">
<mixed-citation publication-type="journal"> <string-name><surname>Wankhade</surname> <given-names>KK</given-names></string-name>, <string-name><surname>Jondhale</surname> <given-names>KC</given-names></string-name>, <string-name><surname>Thool</surname> <given-names>VR</given-names></string-name> (<year>2018</year>). <article-title>A hybrid approach for classification of rare class data</article-title>. <source><italic>Knowledge and Information Systems</italic></source>, <volume>56</volume>(<issue>1</issue>): <fpage>197</fpage>–<lpage>221</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10115-017-1114-5" xlink:type="simple">https://doi.org/10.1007/s10115-017-1114-5</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_034">
<mixed-citation publication-type="journal"> <string-name><surname>Wu</surname> <given-names>CJ</given-names></string-name> (<year>1983</year>). <article-title>On the convergence properties of the EM algorithm</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>11</volume>(<issue>1</issue>): <fpage>95</fpage>–<lpage>103</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1140_ref_035">
<mixed-citation publication-type="journal"> <string-name><surname>Xu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Qiao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>L</given-names></string-name> (<year>2016</year>). <article-title>Reviews on determining the number of clusters</article-title>. <source><italic>Applied Mathematics &amp; Information Sciences</italic></source>, <volume>10</volume>(<issue>4</issue>): <fpage>1493</fpage>–<lpage>1512</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18576/amis/100428" xlink:type="simple">https://doi.org/10.18576/amis/100428</ext-link></mixed-citation>
</ref>
<ref id="j_jds1140_ref_036">
<mixed-citation publication-type="journal"> <string-name><surname>Yong</surname> <given-names>SP</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>JD</given-names></string-name>, <string-name><surname>Purvis</surname> <given-names>MK</given-names></string-name> (<year>2012</year>). <article-title>Novelty detection in wildlife scenes through semantic context modelling</article-title>. <source><italic>Pattern Recognition</italic></source>, <volume>45</volume>(<issue>9</issue>): <fpage>3439</fpage>–<lpage>3450</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.patcog.2012.02.036" xlink:type="simple">https://doi.org/10.1016/j.patcog.2012.02.036</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
