Journal of Data Science

Using Occupancy Models to Estimate the Number of Duplicate Cases in a Data System without Unique Identifiers
Volume 5, Issue 1 (2007), pp. 53–66
Ruiguang Song, Timothy Green, Matthew McKenna (3 of 4 authors listed)

https://doi.org/10.6339/JDS.2007.05(1).316
Pub. online: 4 August 2022      Type: Research Article      Open Access

Abstract

Data systems collecting information from different sources or over long periods of time can receive multiple reports from the same individual. An important example is public health surveillance systems that monitor conditions with long natural histories. Several state-level systems for surveillance of one such condition, the human immunodeficiency virus (HIV), use codes composed of combinations of non-unique personal characteristics such as birth date, soundex (a code based on last name), and sex as patient identifiers. As a result, these systems cannot distinguish between several different individuals having identical codes and a unique individual erroneously represented several times. We applied results for occupancy models to estimate the potential magnitude of duplicate case counting for AIDS cases reported to the Centers for Disease Control and Prevention with only non-unique partial personal identifiers. Occupancy models with equal and unequal occupancy probabilities are considered. Unbiased estimators for the numbers of true duplicates within and between case reporting areas are provided. Formulas to calculate estimators’ variances are also provided. These results can be applied to evaluating duplicate reporting in other data systems that have no unique identifier for each individual.
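
The intuition behind the equal-probability occupancy model can be illustrated with a short sketch. This is not the authors' estimator. It uses only the classical occupancy result that, when n distinct individuals are each assigned one of M equally likely codes, the expected number of codes in use is M(1 - (1 - 1/M)^n). The function names and the example values are hypothetical and chosen purely for illustration.

```python
def expected_occupied_codes(n_individuals: int, n_codes: int) -> float:
    """Expected number of distinct codes in use when each individual
    independently receives one of n_codes equally likely codes
    (classical equal-probability occupancy model)."""
    return n_codes * (1.0 - (1.0 - 1.0 / n_codes) ** n_individuals)


def expected_coincidental_matches(n_individuals: int, n_codes: int) -> float:
    """Expected number of records that share a code with an earlier record
    even though every record belongs to a different person."""
    return n_individuals - expected_occupied_codes(n_individuals, n_codes)


if __name__ == "__main__":
    # Hypothetical values: 5,000 distinct people and 1,000,000 possible codes.
    n, m = 5_000, 1_000_000
    print(f"Expected coincidental matches: {expected_coincidental_matches(n, m):.2f}")
```

Under these assumptions, apparent duplicates in excess of the coincidental matches expected by chance would point toward true duplicate reports. The paper's unbiased estimators, the unequal-probability case, and the variance formulas are not reproduced here.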


Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
