On Classifying At Risk Latent Zeros Using Zero Inflated Models
Volume 12, Issue 2 (2014), pp. 307–323
Pub. online: 4 August 2022
Type: Research Article
Open Access
Published
4 August 2022
4 August 2022
Abstract
Abstract: Count data often have excess zeros in many clinical studies. These zeros usually represent “disease-free state”. Although disease (event) free at the time, some of them might be at a high risk of having the putative outcome while others may be at low or no such risk. We postulate these zeros as a one of the two types, either as ‘low risk’ or as ‘high risk’ zeros for the disease process in question. Low risk zeros can arise due to the absence of risk factors for disease initiation/progression and/or due to very early stage of the disease. High risk zeros can arise due to the presence of significant risk factors for disease initiation/ progression or could be, in rare situations, due to misclassification, more specific diagnostic tests, or below the level of detection. We use zero inflated models which allows us to assume that zeros arise from one of the two separate latent processes-one giving low-risk zeros and the other high-risk zeros and subsequently propose a strategy to identify and classify them as such. To illustrate, we use data on the number of involved nodes in breast cancer patients. Of the 1152 patients studied, 38.8% were node- negative (zeros). The model predicted that about a third (11.4%) of negative nodes are “high risk” and the remaining (27.4%) are at “low risk” of nodal positivity. Posterior probability based classification was more appropriate compared to other methods. Our approach indicates that some node negative patients may be re-assessed for their diagnosis about nodal positivity and/or for future clinical management of their disease. The approach developed here is applicable to any scenario where the disease or outcome can be characterized by count-data.