Increasing the Precision of Estimates of Immunization Coverage Among 19- to 35-Month-Old Children in the United States

The National Immunization Survey (NIS) is the United States’ primary tool for assessing immunization coverage among 19- to 35-month-old children. Although annual estimates from the NIS are quite precise at the national level, US State-level estimates have much larger sampling error than national-level estimates. We combined two independent unbiased estimates of US State-level coverage within a given year to obtain new estimates that are more precise than previously published estimates. We first calculated a model-based estimate for each State for 2001 using multiple years of NIS data. Next, we combined each model-based estimate with the corresponding, previously reported NIS estimate for 2001. Our resulting estimates of State-level immunization coverage had smaller standard errors than the previously published estimates. Achieving similar improvements in precision by increasing sample size would require, depending on State, an increase in sample size of 30% to 120%.


Background
Childhood immunizations in the United States have made many diseases that once caused enormous morbidity and mortality rare. Maintaining high rates of immunization in each new birth cohort is the key to continued success. Coverage among infants and young children is monitored by the National Immunization Survey (NIS), which estimates coverage nationally, for the 51 US States (we treat the District of Columbia as a State, resulting in one more State than the usual 50), and for selected urban areas. The NIS produces annual estimates of immunization coverage for each geographic area considered.
At the national level, annual NIS coverage estimates are quite precise, with standard errors smaller than 0.5 percentage points. At the level of US States and urban areas, which have smaller sample sizes than the nation as a whole, standard errors are as large as 3 percentage points. However, precise estimates are most needed for States and local areas. Concentrations of unimmunized children could result in the transmission, or even re-establishment, of disease; such concentrations might be more likely in States or cities with lower immunization coverage. Further, preventive measures are more easily implemented at the State or local level than at the national level.
We present a method for improving the precision of annual NIS estimates, using as an example coverage for the 4:3:1:3:3 vaccine series (4 or more doses of diphtheria and tetanus toxoids and pertussis vaccine; three or more doses of poliovirus vaccine; one or more doses of measles-containing vaccine; three or more doses of Haemophilus influenzae type b vaccine; and 3 or more doses of hepatitis B vaccine) in the 51 US States for the year 2001. This method might be applicable to other ongoing surveys, both in and outside the US.

Methods
We used a basic statistical principle (combining two unbiased, independent estimates of the same parameter into a new estimate with smaller variance) to obtain State-level immunization coverage estimates that are more precise than those previously published. We first used NIS data for 1998-2001 to calculate, for each State, a model-based estimate of 2001 coverage that was statistically independent of the previously published estimate. Next, we combined each model-based estimate with the corresponding, previously reported NIS estimate for 2001.

NIS methods
The NIS, conducted by the Centers for Disease Control and Prevention, is a large, ongoing telephone survey used to provide annual estimates of immunization coverage among 19- to 35-month-old children. A random-digit-dialing telephone survey identifies households with an age-eligible child. In eligible households, the respondent is asked for demographic and socioeconomic information and for permission to contact the child's immunization provider(s). Later, the surveyed child's immunization providers are asked to submit the child's vaccination record, and this provider information is used to determine the number of doses of each vaccine that the child received. The sample is weighted to represent the population of children 19-35 months old during a particular calendar year. Sampling weights account for multiple voice telephone lines in the household, telephone nonresponse, provider nonresponse, vital statistics natality data, and non-telephone households. Each area surveyed (urban area, State, or State excluding surveyed urban areas within that State) yields about 300 children with usable provider data per year. Further details of NIS methodology, including sample sizes, are described in Smith et al. (2001).

Principle of combining independent unbiased estimates
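Consider two unbiased estimates of an arbitrary parameter $\theta$, say $Y$ and $Z$, that are statistically independent, and let the estimates have respective variances $\sigma_Y^2$ and $\sigma_Z^2$. For any constant $w$, the combination $wY + (1 - w)Z$ is also unbiased for $\theta$, with variance $w^2\sigma_Y^2 + (1 - w)^2\sigma_Z^2$, which is minimized at $w = \sigma_Z^2/(\sigma_Y^2 + \sigma_Z^2)$. Thus, an estimate with minimum variance, among linear combinations of $Y$ and $Z$, can be obtained from:

$$\hat{\theta} = \frac{\sigma_Z^2\,Y + \sigma_Y^2\,Z}{\sigma_Y^2 + \sigma_Z^2}, \qquad \operatorname{Var}(\hat{\theta}) = \frac{\sigma_Y^2\,\sigma_Z^2}{\sigma_Y^2 + \sigma_Z^2},$$

a variance smaller than both $\sigma_Y^2$ and $\sigma_Z^2$. If sample variances replace population variances in this expression, the combination is no longer linear. However, if sample variances are obtained from a sufficiently large sample, the practical consequences of this are small.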
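The combination rule can be sketched in a few lines of Python. This is a minimal illustration, not code used to produce the NIS estimates; the function name `combine_independent` is hypothetical, and the inputs (80 and 70 percentage points, with standard errors 3 and 4) are made-up numbers rather than NIS values.

```python
import math


def combine_independent(y, se_y, z, se_z):
    """Minimum-variance linear combination of two independent unbiased
    estimates of the same parameter (inverse-variance weighting).
    Returns (combined_estimate, combined_standard_error)."""
    var_y, var_z = se_y ** 2, se_z ** 2
    w = var_z / (var_y + var_z)                  # weight on y
    estimate = w * y + (1.0 - w) * z
    variance = var_y * var_z / (var_y + var_z)   # smaller than var_y and var_z
    return estimate, math.sqrt(variance)


est, se = combine_independent(80.0, 3.0, 70.0, 4.0)
print(round(est, 2), round(se, 2))  # 76.4 2.4
```

The combined standard error (2.4) is smaller than either input standard error, which is the source of the precision gain described in this paper.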

Application to the NIS
For each State, our first estimate of coverage in a given year, $Y$, is obtained by constructing a model based on four years of NIS data, 1998 through 2001; these data appear in Centers for Disease Control (2002a and 2002b). These years were chosen because, before 1998, State acceptance of hepatitis B vaccine, part of the 4:3:1:3:3 series, varied widely among States (Centers for Disease Control 2002b). When we performed this analysis, 2001 was the most recent data year available.
States vary in their immunization coverage. Some States achieve 4:3:1:3:3 coverage that approaches or exceeds 80% year after year, whereas others achieve 4:3:1:3:3 coverage of less than 70%. Within a State, estimated coverage is relatively stable over time. Thus, any reasonable model of immunization coverage by State and year must reflect State effects. Further, fitting a parameter for each State accounts for State-level demographics (e.g., some States are more urbanized than others, and some have a higher poverty rate than others) without requiring decisions about which demographic factors to consider.
Coverage changes over time. National-level policy changes can affect vaccine coverage, as can changes in demographics (e.g., the percentage of persons of Hispanic ethnicity in the US has been increasing steadily). Thus, the model should also include a year effect.
Consider the model

$$Y_{ij} = \beta + \gamma_i + \delta_j + \varepsilon_{ij},$$

where $Y_{ij}$ is the published NIS estimate of 4:3:1:3:3 coverage in State $i$ during year $j$, $i = 1, \ldots, 51$ and $j = 1, \ldots, 4$; $\beta$ is the modeled coverage in the reference State in the reference year; $\gamma_i$ is the amount by which modeled coverage in State $i$ differs from that of the reference State ($\gamma_i$ is zero for the reference State); $\delta_j$ is the amount by which modeled coverage in year $j$ differs from that of the reference year ($\delta_j$ is zero for the reference year); and $\varepsilon_{ij}$ is the residual, assumed independently and identically normally distributed with constant variance. Without loss of generality, any State and any year can serve as the references. This model has 54 free linear parameters: one intercept, one parameter for each of the 50 non-reference States, and one parameter for each of the three non-reference years. The model assumes that there are no year × State interactions (i.e., the year effect is the same in all States); this assumption is examined later. Models with logistic and Box-Cox transforms applied to the dependent variable were also considered. The Box-Cox transform's improvement in fit over the simple untransformed model was tiny, and the logistic transform produced no improvement in fit. Thus, we used the simple model described here.
The selected model can be described as follows: a given State's coverage in a given year is modeled as a reference coverage plus a State effect plus a year effect. Under the assumption that the $\varepsilon_{ij}$ are independently normally distributed with constant variance (this assumption is examined later), this is a textbook linear model (dependent variable equals intercept plus additive effects of independent variables). Let $\hat{\beta}$, $\hat{\gamma}_i$, and $\hat{\delta}_j$ be least-squares estimates of the model parameters; in linear models with independent, normally distributed residuals of constant variance, least-squares estimates are both maximum likelihood and uniformly minimum variance unbiased estimates.
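To make the parameter count concrete, the following Python sketch (assuming NumPy; the helper names `design_matrix` and `fit_additive` are hypothetical) builds a reference-cell design matrix for the additive model and confirms that, for 51 States and 4 years, it has 204 rows and 54 linearly independent columns.

```python
import numpy as np


def design_matrix(n_states, n_years, ref_state=0, ref_year=0):
    """Reference-cell coding for Y_ij = beta + gamma_i + delta_j + eps_ij.
    Rows are ordered (state, year); columns are the intercept (beta), one dummy
    per non-reference State (gamma_i) and one per non-reference year (delta_j)."""
    rows = []
    for i in range(n_states):
        for j in range(n_years):
            row = [1.0]
            row += [float(i == k) for k in range(n_states) if k != ref_state]
            row += [float(j == m) for m in range(n_years) if m != ref_year]
            rows.append(row)
    return np.array(rows)


def fit_additive(Y):
    """Least-squares fit of the additive State + year model to a complete
    I x J table; returns the table of fitted values."""
    Y = np.asarray(Y, dtype=float)
    X = design_matrix(*Y.shape)
    coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)  # rows match ravel order
    return (X @ coef).reshape(Y.shape)


X = design_matrix(51, 4)
print(X.shape, np.linalg.matrix_rank(X))  # (204, 54) 54
```

For a complete table, the least-squares fitted values coincide with row mean plus column mean minus grand mean, which gives a quick check on an implementation like this one.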
In the selected model, coverage in the reference State and year is modeled simply as $\beta$ (because State and year effects are zero in the reference State and year). To obtain a $\hat{\beta}$ independent of the observed coverage in the State and year of interest (respectively, State $i$ and year $j$), we excluded $Y_{ij}$ from this four-year data set, resulting in $51 \times 4 - 1 = 203$ observations. The modified data set $\{Y_{k\ell}\}_{k = 1, \ldots, 51;\ \ell = 1, \ldots, 4;\ (k, \ell) \neq (i, j)}$ contains some observations $Y_{k\ell}$ with $k = i$ and others with $\ell = j$ (although not $Y_{ij}$ itself), so it contains information about State and year effects for all States and all years. In particular, $\hat{\beta}$, fitted with the State and year of interest as the reference State and year, can be estimated. This statistic fulfils the requirement for our first needed estimate, $Y$: an unbiased estimate of coverage in State $i$ during year $j$.
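The leave-one-out step can be sketched as follows, again assuming NumPy. The name `loo_model_estimate` is hypothetical, and the standard error uses the usual $s^2 (X^\top X)^{-1}$ formula for least-squares coefficients. Because State $i$ and year $j$ are the reference levels, the fitted intercept is the model-based estimate $\hat{\beta}$ for the omitted cell.

```python
import numpy as np


def loo_model_estimate(Y, i, j):
    """Model-based estimate of cell (i, j) of an I x J table, fitted without
    Y[i, j], using State i and year j as reference levels.
    Returns (beta_hat, estimated standard error of beta_hat)."""
    Y = np.asarray(Y, dtype=float)
    n_states, n_years = Y.shape
    rows, obs = [], []
    for k in range(n_states):
        for m in range(n_years):
            if (k, m) == (i, j):
                continue  # exclude the published estimate for the cell of interest
            row = [1.0]
            row += [float(k == s) for s in range(n_states) if s != i]
            row += [float(m == t) for t in range(n_years) if t != j]
            rows.append(row)
            obs.append(Y[k, m])
    X, y = np.array(rows), np.array(obs)
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = X.shape[0] - rank                # (I*J - 1) - (I + J - 1)
    s2 = resid @ resid / df               # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)     # estimated covariance of coefficients
    return float(coef[0]), float(np.sqrt(cov[0, 0]))
```

In the setting described here, one would call `loo_model_estimate(Y, i, 3)` for each of the 51 rows of a 51 × 4 table `Y` whose last column holds the 2001 estimates, then combine each result with the corresponding published estimate and its standard error using inverse-variance weights.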
Fifty-one models were fit, one with each State as the reference State and 2001 as the reference year. This yielded the 51 needed model-based estimates.
Our second estimate, $Z$, was the previously published NIS estimate of coverage (Centers for Disease Control, 2002a). Since $Y$ was calculated omitting $Z$ from the collection of year and State coverages, $Y$ and $Z$ were statistically independent.
The sample variance for $Z$ and the sample estimate of $\operatorname{Var}(\hat{\beta})$ were substituted for the population variances in the weighting factor for combining estimates. In this application, the number of degrees of freedom for estimating variance is large enough for sample variances to approximate their population counterparts reasonably well.
Sample variances determine the weights given to the two estimates being combined; thus, the proposed estimate is not linear in the 204 observations. The stated confidence intervals account neither for the sampling uncertainty in the variances nor for whatever bias the estimator's non-linearity might induce. However, given the number of degrees of freedom, one can reasonably expect both the bias and the unaccounted-for variance to be small.

Model checking
We cannot prove a hypothesized model correct; we can only test its assumptions. If the assumptions were not rejected, we concluded that the data were consistent with the hypothesized model. The model considered here involved four assumptions: no year × State interactions, constant residual variance, normality of residuals, and independence of residuals.
We tested the assumption of no year × State interactions using Tukey's test for interactions (Tukey, 1949). We tested the assumption of constant residual variance with a two-way analysis of variance (ANOVA), with squared residuals as the dependent variable and dummy variables for State and year as the independent variables. We tested normality of residuals with the correlation test, in which one calculates the correlation between the ordered residuals and their expected values under normality (Filliben, 1975). We justified the independence assumption from process knowledge.

Model checking results
Tukey's test for interactions yielded a p-value of 0.41, which indicated no evidence for interactions. While we cannot prove that no interactions exist, the test result suggested that any interactions were unlikely to be strong. The two-way ANOVA used to test for constant error variance yielded a p-value of 0.39. While this finding does not prove that the residual variance depends on neither State nor year, it shows that the constant-variance assumption is reasonably consistent with the data. The correlation test, applied to residuals, yielded a p-value of 0.73; hence, the data are consistent with at least approximately normally distributed residuals. The $\{Y_{ij}\}_{i = 1, \ldots, 51;\ j = 1, \ldots, 4}$ were calculated using data collected in non-overlapping time intervals (years) in non-overlapping geographic areas (States), making independence a reasonable assumption. Therefore, the data are consistent with the assumed model.

Table 1: Comparison of customary (taken directly from the National Immunization Survey) and proposed estimates of State-level coverage of the 4:3:1:3:3 vaccine series (4 or more doses of diphtheria and tetanus toxoids and pertussis vaccine, three or more doses of poliovirus vaccine, one or more doses of measles-containing vaccine, three or more doses of Haemophilus influenzae type b vaccine, and 3 or more doses of hepatitis B vaccine) among children who were 19-35 months old in 2001.

More precise estimates of State-level 4:3:1:3:3 coverage for the 2001 NIS
Table 1 lists previously published NIS estimates for 2001, along with the estimates obtained as described above and the standard errors of both. Standard errors of the new estimates (ignoring the small additional variance due to the uncertainty in the weights given to the two combined estimates) were 11 to 33 percent smaller.
Increasing sample size would also increase the precision of NIS estimates. If the hypothetical increased sample size did not change the design effect that results from the NIS' complex sample survey design, a sample that was

$$\left(\frac{\text{standard error of published estimator}}{\text{standard error of new estimator}}\right)^{2}$$

times as large as the current sample would reduce the standard error as much as the method described here, because the standard error is inversely proportional to the square root of the sample size. The sample size multiplier appears in Table 1; achieving equivalent precision via a sample size increase would require, depending on State and ignoring the small increase in variance from substituting sample for population variances, 30 to 120 percent more observations than were collected in 2001.
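The sample size multiplier is simple arithmetic; the hypothetical helper below makes the inverse-square-root relationship explicit. The input standard errors are illustrative, not Table 1 values.

```python
def equivalent_sample_multiplier(se_published, se_new):
    """Factor by which the sample would have to grow, with the design effect
    held fixed, for the published estimator's standard error to fall to se_new.
    Uses SE proportional to 1 / sqrt(sample size)."""
    return (se_published / se_new) ** 2


mult = equivalent_sample_multiplier(2.5, 2.0)
print(mult, f"{(mult - 1) * 100:.0f} percent more observations")
# 1.5625 56 percent more observations
```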
Enlarging the survey in this manner would have increased overall cost by more than 50 percent.

Discussion
While standard errors were reduced, estimates of State-level 4:3:1:3:3 coverage were close to the customary estimates (Figure 1). For every US State except Louisiana, the proposed estimate was within one standard error of the customary estimate. Previously instituted changes in NIS methodology have had similarly sized impacts on point estimates (Abt Associates, 2000; Abt Associates, 2003). The new estimates were neither consistently higher nor lower than the customary estimates: retaining one more significant digit than reported, twenty-six were greater than the customary estimates and twenty-five were smaller, and the mean difference was 0.1 percentage points. This is not surprising, since the customary estimates are relatively precise, even at the State level.
Standard errors for the new estimates tended to be smaller in States in which estimated immunization coverage was most stable over time. This is as one might expect. There is no reason to expect large differences among years within a fixed State, since immunization coverage reacts slowly to policy changes and other events that influence coverage. The customary estimate ignores a State's past performance, whereas the proposed estimate uses this information, and past performance is a better predictor in States where immunization coverage is most stable over time.
The method used here has at least three drawbacks. First, this method might not be usable for future data sets. Although no evidence of violation of assumptions was found, future data might not support the assumptions; substantial violation of assumptions would invalidate the estimates. Second, these estimates are less suitable than the customary estimates for making State-to-State and year-to-year comparisons. These estimates tend to be closer than the customary estimates to each State's average coverage over time and to each year's average coverage over States; thus, differences across States and over time are blurred.
Finally, the customary estimates are statistically independent among years and States, whereas the proposed estimates involve complex statistical dependencies, since each estimate depends, in a non-linear manner, on all States and years. The expression for the standard error of the difference of two estimates (e.g., the same State in different years, or two States in the same year) is complicated and is omitted for brevity. However, any two estimates derived using the proposed method are positively correlated.
Ignoring this correlation when calculating a confidence interval for the difference of two coverages yields an easily computed, conservative confidence interval, because positive correlation reduces the variance of a difference. Thus, although exact confidence intervals for differences involve messy calculations, conservative (up to the small amount of ignored variance) confidence intervals are easily constructed.
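A sketch of the conservative interval follows, assuming a normal approximation with critical value 1.96 for 95 percent coverage (an assumption; the text does not state the confidence level). The function name is hypothetical and the inputs are illustrative.

```python
import math


def conservative_diff_ci(est1, se1, est2, se2, z=1.96):
    """Confidence interval for est1 - est2 that ignores their covariance.
    Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B), so when Cov(A, B) >= 0 the
    interval below is at least as wide as one that accounts for the covariance."""
    diff = est1 - est2
    half_width = z * math.sqrt(se1 ** 2 + se2 ** 2)
    return diff - half_width, diff + half_width


low, high = conservative_diff_ci(80.0, 3.0, 75.0, 4.0)
print(round(low, 2), round(high, 2))  # -4.8 14.8
```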

Conclusion
We can obtain more precise State-level immunization coverage estimates from NIS data than are currently produced. However, for a given State, these more precise estimates dampen year-to-year differences and, for a given year, dampen State-to-State variability. NIS estimates are commonly used to compare coverage among States within a given year, to compare coverage among years within a given State, and to provide an estimate of coverage in a given State in a given year. The proposed method is probably less suitable for the first two purposes than the customary method; for this reason, we do not suggest that this method replace the customary method. However, in some cases, one is interested only in coverage in one State in one year (e.g., when point estimates are compared to cut points, such as the Healthy People 2010 goal of 90 percent immunization coverage (United States Department of Health and Human Services, 2000), or when determining whether a State's coverage falls below the level needed for herd immunity, that is, the level of immunization coverage required to prevent outbreaks of disease). In such cases, highly precise point estimates are essential: true and estimated immunization coverage can fall on opposite sides of the cut point, and increased precision substantially reduces the likelihood of this happening. For cases in which one is interested in one State and one year, we recommend that the methods proposed here be considered.
We recommend that this method be considered for use in other annual surveys. These methods can yield improved estimates of annual values of some characteristic in each of $I$ regions over $J$ years if (1) the assumptions of no region × year interactions and of independent, identically distributed normal residuals are consistent with the data, and (2) $IJ - 1$ (the number of observations after removal of one observation) is sufficiently larger than $I + J - 1$ (the number of free linear parameters in the model) for the estimated variance to be a reasonable approximation to the true variance. The number of States guarantees that (2) is satisfied for the NIS; this condition might or might not be satisfied for surveys with fewer geographic areas.
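Condition (2) can be checked by computing the residual degrees of freedom left after dropping one cell and fitting $I + J - 1$ parameters. The helper below is hypothetical, and how many residual degrees of freedom count as enough remains a judgment call that the text leaves open.

```python
def residual_df(n_regions, n_years):
    """Residual degrees of freedom when one cell of an I x J table is dropped
    and the additive region + year model (I + J - 1 parameters) is fitted."""
    n_obs = n_regions * n_years - 1
    n_params = n_regions + n_years - 1
    return n_obs - n_params


print(residual_df(51, 4))  # 149  (NIS: 51 States, 1998-2001)
print(residual_df(5, 4))   # 11   (a hypothetical survey with 5 regions)
```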
Finally, we make no claims for the optimality of the proposed estimator. It is quite possible that a formal empirical Bayes framework, or some other splitting of the data into subsets, could provide better estimates. However, such investigations are beyond this paper's scope.

Figure 1: Plot of proposed and customary estimates of State-level coverage of the 4:3:1:3:3 vaccine series (4 or more doses of diphtheria and tetanus toxoids and pertussis vaccine, three or more doses of poliovirus vaccine, one or more doses of measles-containing vaccine, three or more doses of Haemophilus influenzae type b vaccine, and 3 or more doses of hepatitis B vaccine) among children who were 19-35 months old in 2001. On the dashed line, proposed and customary estimates are equal.