<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JDS</journal-id>
      <journal-title-group>
        <journal-title>Journal of Data Science</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1680-743X</issn>
      <issn pub-type="ppub">1680-743X</issn>
      <publisher>
        <publisher-name>SOSRUC</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">050104</article-id>
      <article-id pub-id-type="doi">10.6339/JDS.2007.05(1).316</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Using Occupancy Models to Estimate the Number of Duplicate Cases in a Data System without Unique Identifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Song</surname>
            <given-names>Ruiguang</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_000"/>
        </contrib>
        <aff id="j_JDS_aff_000">Centers for Disease Control and Prevention</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Green</surname>
            <given-names>Timothy</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_001"/>
        </contrib>
        <aff id="j_JDS_aff_001">Centers for Disease Control and Prevention</aff>
        <contrib contrib-type="author">
          <name>
            <surname>McKenna</surname>
            <given-names>Matthew</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_002"/>
        </contrib>
        <aff id="j_JDS_aff_002">Centers for Disease Control and Prevention</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Glynn</surname>
            <given-names>K. Kathleen</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_003"/>
        </contrib>
        <aff id="j_JDS_aff_003">Centers for Disease Control and Prevention</aff>
      </contrib-group>
      <volume>5</volume>
      <issue>1</issue>
      <fpage>53</fpage>
      <lpage>66</lpage>
      <permissions>
        <ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/>
      </permissions>
      <abstract>
        <p>Data systems collecting information from different sources or over long periods of time can receive multiple reports from the same individual. An important example is public health surveillance systems that monitor conditions with long natural histories. Several state-level systems for surveillance of one such condition, the human immunodeficiency virus (HIV), use codes composed of combinations of non-unique personal characteristics such as birth date, soundex (a code based on last name), and sex as patient identifiers. As a result, these systems cannot distinguish between several different individuals having identical codes and a unique individual erroneously represented several times. We applied results for occupancy models to estimate the potential magnitude of duplicate case counting for AIDS cases reported to the Centers for Disease Control and Prevention with only non-unique partial personal identifiers. Occupancy models with equal and unequal occupancy probabilities are considered. Unbiased estimators for the numbers of true duplicates within and between case reporting areas are provided. Formulas to calculate estimators’ variances are also provided. These results can be applied to evaluating duplicate reporting in other data systems that have no unique identifier for each individual.</p>
      </abstract>
    </article-meta>
  </front>
</article>
