On Classifying At Risk Latent Zeros Using Zero Inflated Models

Dwivedi, Dwivedi; Rao, MB; Dwivedi, Sada Nand; Deo, S.V. S.; Shukla, Rakesh

doi:10.6339/JDS.201404_12(2).0006

Journal of Data Science

Volume 12, Issue 2 (2014), pp. 307–323

https://doi.org/10.6339/JDS.201404_12(2).0006

Pub. online: 4 August 2022 Type: Research Article

Open Access

Abstract

Count data in many clinical studies contain excess zeros. These zeros usually represent a “disease-free state”. Although disease (event) free at the time of observation, some of these subjects may be at high risk of the putative outcome, while others may be at low or no such risk. We postulate that each zero is one of two types, either a ‘low risk’ or a ‘high risk’ zero for the disease process in question. Low risk zeros can arise from the absence of risk factors for disease initiation/progression and/or from a very early stage of the disease. High risk zeros can arise from the presence of significant risk factors for disease initiation/progression or, in rare situations, from misclassification, more specific diagnostic tests, or values below the level of detection. We use zero inflated models, which allow us to assume that zeros arise from one of two separate latent processes, one giving low-risk zeros and the other high-risk zeros, and subsequently propose a strategy to identify and classify them as such. To illustrate, we use data on the number of involved nodes in breast cancer patients. Of the 1152 patients studied, 38.8% were node-negative (zeros). The model predicted that about a third of these node-negative patients (11.4% of all patients) are at “high risk” and the remainder (27.4% of all patients) are at “low risk” of nodal positivity. Posterior probability based classification was more appropriate than the other methods considered. Our approach indicates that some node-negative patients may be re-assessed for their diagnosis of nodal positivity and/or for future clinical management of their disease. The approach developed here is applicable to any scenario where the disease or outcome can be characterized by count data.
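As a rough illustration of posterior probability based classification, the sketch below assumes a zero-inflated Poisson model with two parameters that are not given in the abstract: `pi_structural`, the mixing probability of the always-zero process, and `lam`, the Poisson mean of the count process. It also assumes that zeros from the always-zero process correspond to "low risk" and zeros from the count process to "high risk". The function names and the 0.5 threshold are hypothetical choices for this example, not the authors' procedure.

```python
import math


def posterior_high_risk(pi_structural, lam):
    """Posterior probability that an observed zero came from the count process.

    Assumes a zero-inflated Poisson model:
      P(Y = 0) = pi_structural + (1 - pi_structural) * exp(-lam)
    By Bayes' rule, the share of that mass due to the count process is the
    posterior probability of an "at risk" (high-risk) zero.
    """
    p_zero_from_count = (1.0 - pi_structural) * math.exp(-lam)
    return p_zero_from_count / (pi_structural + p_zero_from_count)


def classify_zero(pi_structural, lam, threshold=0.5):
    """Label an observed zero; the 0.5 cut-off is an illustrative assumption."""
    if posterior_high_risk(pi_structural, lam) >= threshold:
        return "high risk"
    return "low risk"


if __name__ == "__main__":
    # Hypothetical parameter values for two patients with zero involved nodes.
    for pi_s, lam in [(0.2, 1.0), (0.3, 3.0)]:
        p = posterior_high_risk(pi_s, lam)
        print(f"pi={pi_s}, lambda={lam}: P(high risk | zero)={p:.3f}, "
              f"label={classify_zero(pi_s, lam)}")
```

In a fitted model, `pi_structural` and `lam` would typically depend on each patient's covariates, so the posterior probability and the resulting label vary from patient to patient.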

Keywords

Count data; Classification; Low-risk zeros
