<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JDS</journal-id>
      <journal-title-group>
        <journal-title>Journal of Data Science</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1680-743X</issn>
      <issn pub-type="ppub">1680-743X</issn>
      <publisher>
        <publisher-name>SOSRUC</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">120206</article-id>
      <article-id pub-id-type="doi">10.6339/JDS.201404_12(2).0006</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>On Classifying At Risk Latent Zeros Using Zero Inflated Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Dwivedi</surname>
            <given-names>Alok Kumar</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_000"/>
        </contrib>
        <aff id="j_JDS_aff_000">Texas Tech University Health Sciences Center</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Rao</surname>
            <given-names>MB</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_001"/>
        </contrib>
        <aff id="j_JDS_aff_001">University of Cincinnati</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Dwivedi</surname>
            <given-names>Sada Nand</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_002"/>
        </contrib>
        <aff id="j_JDS_aff_002">All India Institute of Medical Sciences</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Deo</surname>
            <given-names>S.V.S.</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_003"/>
        </contrib>
        <aff id="j_JDS_aff_003">All India Institute of Medical Sciences</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Shukla</surname>
            <given-names>Rakesh</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_004"/>
        </contrib>
        <aff id="j_JDS_aff_004">University of Cincinnati</aff>
      </contrib-group>
      <volume>12</volume>
      <issue>2</issue>
      <fpage>307</fpage>
      <lpage>323</lpage>
      <permissions>
        <ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/>
      </permissions>
      <abstract>
        <p>Count data in many clinical studies often contain excess zeros. These zeros usually represent a “disease-free state”. Although disease (event) free at the time of observation, some of these subjects might be at high risk of the putative outcome, while others may be at low or no such risk. We postulate that each zero is one of two types, either a ‘low risk’ or a ‘high risk’ zero for the disease process in question. Low risk zeros can arise from the absence of risk factors for disease initiation/progression and/or from a very early stage of the disease. High risk zeros can arise from the presence of significant risk factors for disease initiation/progression or, in rare situations, from misclassification, the need for more specific diagnostic tests, or disease below the level of detection. We use zero inflated models, which allow us to assume that zeros arise from one of two separate latent processes, one giving low-risk zeros and the other high-risk zeros, and subsequently propose a strategy to identify and classify them as such. To illustrate, we use data on the number of involved nodes in breast cancer patients. Of the 1152 patients studied, 38.8% were node-negative (zeros). The model predicted that about a third of these node-negative patients (11.4% of all patients) are at “high risk” and the remainder (27.4% of all patients) are at “low risk” of nodal positivity. Posterior probability based classification was more appropriate than the other methods considered. Our approach indicates that some node-negative patients may be re-assessed for their nodal status and/or for the future clinical management of their disease. The approach developed here is applicable to any scenario where the disease or outcome can be characterized by count data.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>Count data</kwd>
        <kwd>Classification</kwd>
        <kwd>Low-risk zeros</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
