Modeling County-Level Rare Disease Prevalence Using Bayesian Hierarchical Sampling Weighted Zero-Inflated Regression✩
Volume 21, Issue 1 (2023), pp. 145–157
Pub. online: 22 June 2022
Type: Statistical Data Science
Open Access
✩ The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Received
14 December 2021
Accepted
26 April 2022
Published
22 June 2022
Abstract
Estimates of county-level disease prevalence have a variety of applications. Such estimation is often done via model-based small-area estimation using survey data. However, for conditions with low prevalence (i.e., rare diseases or newly diagnosed diseases), counties with a high fraction of zero counts in surveys are common. These zeros often occur more frequently than the assumed model would predict; such zeros are called ‘excess zeros’. Excess zeros can be structural (there are no cases to find) or sampling (there are cases, but none were selected for sampling). These issues are often addressed by combining multiple years of data. However, this approach can obscure trends in annual estimates and prevent estimates from being timely. Using single-year survey data, we proposed a Bayesian weighted Binomial Zero-inflated (BBZ) model to estimate county-level rare disease prevalence. The BBZ model accounts for excess zero counts, incorporates the sampling weights, and uses a power prior. We evaluated BBZ with American Community Survey results and simulated data. BBZ yielded less bias and smaller variance than estimates based on the binomial distribution, a common approach to this problem. Because BBZ uses only a single year of survey data, it produces more timely county-level prevalence estimates. These timely estimates help pinpoint areas of county-level need and help medical researchers and public health practitioners promptly evaluate rare disease trends and associations with other health conditions.
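The distinction between structural and sampling zeros can be illustrated with a generic zero-inflated binomial mixture. The sketch below is not the authors' BBZ model: it ignores the sampling weights and the power prior, and the names `n`, `p`, `pi` and the numeric values are hypothetical choices made only for illustration. It shows that mixing in a structural-zero probability raises the chance of observing a zero count above what a plain binomial model would predict.

```python
from math import comb


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def zib_pmf(y, n, p, pi):
    """Zero-inflated binomial pmf.

    With probability pi the county is a structural zero (no cases to find);
    otherwise the count follows Binomial(n, p), which can still yield a
    sampling zero (cases exist, but none were sampled).
    """
    base = (1 - pi) * binomial_pmf(y, n, p)
    return pi + base if y == 0 else base


# Hypothetical values: 50 respondents in a county, 1% prevalence,
# 30% chance that the county is a structural zero.
n, p, pi = 50, 0.01, 0.3

p_zero_binomial = binomial_pmf(0, n, p)  # sampling zeros only
p_zero_zib = zib_pmf(0, n, p, pi)        # structural + sampling zeros
total = sum(zib_pmf(y, n, p, pi) for y in range(n + 1))

print(f"P(Y=0), binomial:      {p_zero_binomial:.4f}")
print(f"P(Y=0), zero-inflated: {p_zero_zib:.4f}")
```

In this toy setting the zero-inflated probability of a zero count is `pi + (1 - pi) * (1 - p)**n`, which always exceeds the binomial value `(1 - p)**n` when `pi > 0`. That gap is the sense in which a plain binomial model under-predicts zeros.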
Supplementary material
Figure 4: Agreement between BRFSS model-based estimates and ACS 1-year reports of county-level DDRS based on 225 selected counties in 2015. The reference line marks where the model-based estimates and the standard reference (e.g., ACS 1-year report) are identical. Among the four models (BHBI, BZBI, BPLW and BBZ), estimates from BHBI and BZBI show both large variance and large bias, and most counties have a positive estimated bias. Estimates from BBZ tend to stay closer to the reference line, with the least bias and variance. These results are consistent with those for 2019.
Figure 5: Agreement between BRFSS model-based estimates and ACS 1-year reports of county-level DDRS based on 225 selected counties in 2016. The reference line marks where the model-based estimates and the standard reference (e.g., ACS 1-year report) are identical. Among the four models (BHBI, BZBI, BPLW and BBZ), estimates from BHBI and BZBI show both large variance and large bias, and most counties have a positive estimated bias. Estimates from BBZ tend to stay closer to the reference line, with the least bias and variance. These results are consistent with those for 2019.