Estimates of county-level disease prevalence have a variety of applications. Such estimation is often done via model-based small-area estimation using survey data. However, for conditions with low prevalence (i.e., rare diseases or newly diagnosed diseases), counties with a high fraction of zero counts in surveys are common. Zero counts are often more common than the model would lead one to expect; such zeros are called ‘excess zeros’. Excess zeros can be structural (there are no cases to find) or sampling (there are cases, but none were selected for sampling). These issues are often addressed by combining multiple years of data. However, this approach can obscure trends in annual estimates and prevent estimates from being timely. Using single-year survey data, we proposed a Bayesian weighted Binomial Zero-inflated (BBZ) model to estimate the county-level prevalence of rare diseases. The BBZ model accounts for excess zero counts, incorporates the sampling weights, and uses a power prior. We evaluated BBZ with American Community Survey results and simulated data. We showed that BBZ yielded less bias and smaller variance than estimates based on the binomial distribution, a common approach to this problem. Since BBZ uses only a single year of survey data, BBZ produces more timely county-level incidence estimates. These timely estimates help pinpoint areas of particular need at the county level and help medical researchers and public health practitioners promptly evaluate trends in rare diseases and their associations with other health conditions.
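To make the distinction between structural and sampling zeros concrete, the following is a minimal sketch of a standard zero-inflated binomial probability mass function. It is not the BBZ model itself: the survey weighting and the power prior are omitted, and the names `zib_pmf` and `pi_zero` as well as the numeric inputs are assumptions chosen for illustration.

```python
from math import comb


def zib_pmf(y, n, p, pi_zero):
    """P(Y = y) under a zero-inflated binomial (illustrative sketch).

    n       : number of respondents sampled in the county
    p       : prevalence among counties that can have cases
    pi_zero : probability of a structural zero (hypothetical name)
    """
    if not 0 <= y <= n:
        return 0.0
    binom = comb(n, y) * p**y * (1 - p) ** (n - y)
    if y == 0:
        # A zero arises either structurally or as a sampling zero.
        return pi_zero + (1 - pi_zero) * binom
    return (1 - pi_zero) * binom


# Hypothetical county: 50 respondents, prevalence 1%.
p_zero_plain = zib_pmf(0, 50, 0.01, 0.0)     # ordinary binomial
p_zero_inflated = zib_pmf(0, 50, 0.01, 0.3)  # 30% structural zeros
```

With `pi_zero = 0` the function reduces to the ordinary binomial, so any positive `pi_zero` raises the probability of observing a zero count above what the binomial alone predicts.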
Bayesian hierarchical regression (BHR) is often used in small area estimation (SAE). BHR conditions on the samples. Therefore, when data are from a complex sample survey, neither survey sampling design nor survey weights are used. This can introduce bias and/or cause large variance. Further, if non-informative priors are used, BHR often requires the combination of multiple years of data to produce sample sizes that yield adequate precision; this can result in poor timeliness and can obscure trends. To address bias and variance, we propose a design-assisted, model-based approach for SAE that integrates adjusted sample weights. To address timeliness, we use historical data to define informative priors (power priors); this allows estimates to be derived from a single year of data. Using American Community Survey (ACS) data for validation, we applied the proposed method to Behavioral Risk Factor Surveillance System data. We estimated the prevalence of disability for all U.S. counties. We show that our method can produce estimates that are both more timely than those arising from widely used alternatives and closer to the ACS direct estimates, particularly for low-data counties. Our method can be generalized to estimate the county-level prevalence of other health-related measures.
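The idea of a power prior can be illustrated with a conjugate toy example: the historical likelihood is raised to a power `a0` between 0 and 1, so historical data count as a discounted number of extra observations. The sketch below assumes a simple beta-binomial setting with unweighted counts; it is not the hierarchical, weight-adjusted model described above, and all function names and inputs are hypothetical.

```python
def power_prior_posterior(y, n, y0, n0, a0, alpha=1.0, beta=1.0):
    """Beta posterior for a proportion under a power prior (sketch).

    y, n     : current-year cases and sample size
    y0, n0   : historical cases and sample size
    a0       : discount in [0, 1]; 0 ignores history, 1 pools it fully
    alpha, beta : parameters of the initial Beta prior (assumed Beta(1, 1))
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    post_alpha = alpha + a0 * y0 + y
    post_beta = beta + a0 * (n0 - y0) + (n - y)
    return post_alpha, post_beta


def beta_mean(a, b):
    return a / (a + b)


def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Hypothetical county: 3 of 20 this year, 30 of 200 in historical data.
no_history = power_prior_posterior(3, 20, 30, 200, a0=0.0)
half_history = power_prior_posterior(3, 20, 30, 200, a0=0.5)
```

Comparing `beta_var(*no_history)` with `beta_var(*half_history)` shows how borrowing from historical data tightens the posterior for a single year of data, which is the motivation given above for using informative priors in low-data counties.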
Abstract: In the United States, diabetes is common and costly. Programs to prevent new cases of diabetes are often carried out at the level of the county, a unit of local government. Thus, efficient targeting of such programs requires county-level estimates of diabetes incidence (the fraction of the nondiabetic population who received their diagnosis of diabetes during the past 12 months). Previously, only estimates of prevalence (the overall fraction of the population who have the disease) have been available at the county level. Counties with high prevalence might or might not be the same as counties with high incidence, due to spatial variation in mortality and relocation of persons with incident diabetes to another county. Existing methods cannot be used to estimate county-level diabetes incidence, because the fraction of the population who receive a diabetes diagnosis in any year is too small. Here, we extend previously developed methods of Bayesian small-area estimation of prevalence, using diffuse priors, to estimate diabetes incidence for all U.S. counties based on data from a survey designed to yield state-level estimates. We found high incidence in the southeastern United States, the Appalachian region, and in scattered counties throughout the western U.S. Our methods might be applicable in other circumstances in which all cases of a rare condition also must be cases of a more common condition (in this analysis, “newly diagnosed cases of diabetes” and “cases of diabetes”). If appropriate data are available, our methods can be used to estimate the proportion of the population with the rare condition at greater geographic specificity than the data source was designed to provide.
Abstract: The National Immunization Survey (NIS) is the United States’ primary tool for assessing immunization coverage among 19- to 35-month-old children. Although annual estimates from the NIS are quite precise at the national level, US State-level estimates have much larger sampling error than national-level estimates. We combined two independent unbiased estimates of US State-level coverage within a given year to obtain new estimates that are more precise than previously published estimates. We first calculated a model-based estimate for each State for 2001 using multiple years of NIS data. Next, we combined each model-based estimate with the corresponding, previously reported NIS estimate for 2001. Our resulting estimates of State-level immunization coverage had smaller standard errors than the previously published estimates. To make similar improvements in precision by increasing sample size would, depending on State, require an increase in sample size of 30% to 120%.
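One common way to combine two independent unbiased estimates is inverse-variance weighting; the abstract does not state which combination rule was used, so the sketch below is an assumption rather than the authors' procedure. It also shows the arithmetic behind an "equivalent sample size increase", assuming variance is inversely proportional to sample size. The function names and the numeric inputs are hypothetical and are not NIS results.

```python
def combine_independent(est1, se1, est2, se2):
    """Inverse-variance weighted combination of two independent,
    unbiased estimates (assumed rule, for illustration only)."""
    w1 = 1.0 / se1**2
    w2 = 1.0 / se2**2
    combined = (w1 * est1 + w2 * est2) / (w1 + w2)
    combined_se = (1.0 / (w1 + w2)) ** 0.5
    return combined, combined_se


def equivalent_sample_increase(se_old, se_new):
    """Fractional increase in sample size that would give the same
    precision gain, assuming variance scales as 1 / n."""
    return (se_old / se_new) ** 2 - 1.0


# Hypothetical State: direct estimate 0.78 (SE 0.03),
# model-based estimate 0.80 (SE 0.04).
coverage, coverage_se = combine_independent(0.78, 0.03, 0.80, 0.04)
increase = equivalent_sample_increase(0.03, coverage_se)
```

Under this rule the combined standard error is always smaller than either input, and `increase` equals the ratio of the two weights, so a more precise model-based estimate translates directly into a larger equivalent gain in sample size.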