Estimates of county-level disease prevalence have a variety of applications. Such estimation is often done via model-based small-area estimation using survey data. However, for conditions with low prevalence (i.e., rare diseases or newly diagnosed diseases), counties with a high fraction of zero counts in surveys are common. Such zero counts are often more common than the assumed model would predict; the surplus zeros are called ‘excess zeros’. Excess zeros can be structural (there are no cases to find) or sampling (there are cases, but none were selected for sampling). These issues are often addressed by combining multiple years of data. However, this approach can obscure trends in annual estimates and prevent estimates from being timely. Using single-year survey data, we propose a Bayesian weighted Binomial Zero-inflated (BBZ) model to estimate county-level rare disease prevalence. The BBZ model accounts for excess zero counts, incorporates the sampling weights, and uses a power prior. We evaluated BBZ with American Community Survey results and simulated data. We showed that BBZ estimates had less bias and smaller variance than estimates based on the binomial distribution, a common approach to this problem. Since BBZ uses only a single year of survey data, it produces more timely county-level prevalence estimates. These timely estimates help pinpoint specific areas of county-level need and help medical researchers and public health practitioners promptly evaluate rare disease trends and associations with other health conditions.
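To make the notion of excess zeros concrete, the following is a minimal sketch of a generic zero-inflated binomial probability mass function. It is not the BBZ model itself, which also involves sampling weights and a power prior. The function names, the mixing probability `pi_zero` (the chance of a structural zero), and the illustrative values of `n` and `p` are assumptions introduced here.

```python
from math import comb


def binomial_pmf(y: int, n: int, p: float) -> float:
    """Ordinary binomial probability of y cases among n respondents."""
    return comb(n, y) * p**y * (1.0 - p) ** (n - y)


def zib_pmf(y: int, n: int, p: float, pi_zero: float) -> float:
    """Zero-inflated binomial probability of y cases among n respondents.

    With probability pi_zero the county is a structural zero (no cases
    to find); otherwise the count follows Binomial(n, p), which can
    still produce a sampling zero.
    """
    point_mass = pi_zero if y == 0 else 0.0
    return point_mass + (1.0 - pi_zero) * binomial_pmf(y, n, p)


if __name__ == "__main__":
    # Hypothetical county: 30 respondents, 1% prevalence, 40% structural zeros.
    n, p, pi_zero = 30, 0.01, 0.4
    print("P(zero count), binomial:     ", binomial_pmf(0, n, p))
    print("P(zero count), zero-inflated:", zib_pmf(0, n, p, pi_zero))
```

Under this mixture, the probability of a zero count is always at least as large as under the plain binomial, which is why a binomial model fitted to such data tends to misstate prevalence.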
Bayesian hierarchical regression (BHR) is often used in small area estimation (SAE). Because BHR conditions on the sample, it uses neither the sampling design nor the survey weights when data come from a complex sample survey. This can introduce bias, inflate variance, or both. Further, if non-informative priors are used, BHR often requires the combination of multiple years of data to produce sample sizes that yield adequate precision; this can result in poor timeliness and can obscure trends. To address bias and variance, we propose a design-assisted, model-based approach for SAE that integrates adjusted sample weights. To address timeliness, we use historical data to define informative priors (a power prior); this allows estimates to be derived from a single year of data. Using American Community Survey (ACS) data for validation, we applied the proposed method to Behavioral Risk Factor Surveillance System data and estimated the prevalence of disability for all U.S. counties. We show that our method can produce estimates that are both more timely than those arising from widely used alternatives and closer to the ACS direct estimates, particularly for low-data counties. Our method can be generalized to estimate the county-level prevalence of other health-related measurements.
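The role of the power prior can be illustrated with a minimal sketch in a conjugate beta-binomial setting. This is an assumption made here for simplicity: it is not the hierarchical regression described above, and it ignores survey weights. The historical likelihood is raised to a discount `a0` between 0 and 1, so historical counts enter the posterior down-weighted by `a0`. All function names, parameter names, and numbers are hypothetical.

```python
def power_prior_posterior(y, n, y0, n0, a0, alpha=1.0, beta=1.0):
    """Beta posterior for a prevalence under a power prior.

    y, n     : current-year cases and respondents
    y0, n0   : historical cases and respondents
    a0       : discount in [0, 1] applied to the historical likelihood
    alpha, beta : parameters of the initial Beta prior (assumed Beta(1, 1))

    Raising the historical binomial likelihood to the power a0 and
    multiplying by the initial Beta prior gives
    Beta(alpha + a0*y0, beta + a0*(n0 - y0)); the current data then
    update it in the usual conjugate way.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    post_alpha = alpha + a0 * y0 + y
    post_beta = beta + a0 * (n0 - y0) + (n - y)
    return post_alpha, post_beta


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total**2 * (total + 1.0))


if __name__ == "__main__":
    # Hypothetical low-data county: 2 of 40 this year, 30 of 400 historically.
    for a0 in (0.0, 0.5, 1.0):
        a, b = power_prior_posterior(y=2, n=40, y0=30, n0=400, a0=a0)
        print(a0, beta_mean(a, b), beta_variance(a, b))
```

Setting `a0 = 0` discards the historical data, while `a0 = 1` pools it fully with the current year; intermediate values borrow strength from history while still letting the single current year dominate when it is informative.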