Power Calculations for ZIP and ZINB Models

: We present power calculations for zero-inﬂated Poisson (ZIP) and zero-inﬂated negative-binomial (ZINB) models. We detail direct computations for a ZIP model based on a two-sample Wald test using the expected information matrix. We also demonstrate how Lyles, Lin, and Williamson’s method (2006) of power approximation for categorical and count outcomes can be extended to both zero-inﬂated models. This method can be used for power calculations based on the Wald test (via the observed information matrix) and the likelihood ratio test, and can accommodate both categorical and continuous covariates. All the power calculations can be conducted when covariates are used in the modeling of both the count data and the “excess zero” data, or in either part separately. We present simulations to detail the performance of the power calculations. Analysis of a malaria study is used for illustration.


Introduction
Often the outcome of interest in public health and medical studies is a count variable, which is usually assumed to follow the Poisson distribution and can be modeled accordingly.However, the count variable may contain excess zeroes above what is to be expected from the Poisson model.These excess zeroes may be due to 1) the presence of a subpopulation with only zero counts, 2) overdispersion, or 3) chance (Campbell, Machin, and D'Arcangues, 1991).A common approach for analyzing such data is the zero-inflated Poisson (ZIP) model (Mullahy, 1986;Lambert, 1992), which is an extension of Cohen's (1960) modified Poisson distribution.ZIP modeling has been used to analyze data on caries prevention (Bohning, Dietz, Schlattmann, Mendonca, and Kirchner, 1999), early growth failure in children (Cheung, 2002), and sudden infant death syndrome (Dalrymple, Hudson, and Ford, 2003), as well as many other diseases.ZIP modeling also can accommodate the extent of individual exposure (Lee, Wang, and Yau, 2001).A score test has been proposed for determining whether the number of zeros is too large for a Poisson distribution to fit the data well, indicating that a ZIP model may be a better choice (van den Broek, 1995).ZIP models also have been extended to the bivariate (Walhin, 2001) and multivariate settings (Li, Lu, Park, Brinkley, and Peterson, 1999).Hall (2000) and Yau and Lee (2001) have used random-effects approaches to extend the ZIP model to analyze longitudinal count data.
If count data are overdispersed such that the variance of the count variable is greater than the mean, then the Poisson assumption is violated.A negative binomial distribution may then be used for modeling purposes, as it uses an additional parameter in describing the variance of the count variable.If the data are still zero inflated, a zero-inflated negative binomial (ZINB) model may be fit.A score test for a ZIP regression model against a ZINB alternative (Ridout, Hinde, and Demetrio, 2001) has been proposed.Hall and Berenhaut (2002) provide an alternative to Ridout, Hinde, and Demetrio's (2001) score test by testing that the variance and covariance components associated with random effects in a ZIP model are equal to zero, implying that a ZINB model does not fit the data better than the ZIP alternative.
Studies without a sufficient sample size often will result in a failure to detect a significant effect when it exists; however, there is often a high cost associated with recruiting and evaluating large samples of subjects.This consideration makes sample size (power) calculations a crucial step in designing clinical research and public health studies, many of which are known to have a ZIP-or ZINB-distributed response.Sample size calculations are especially important for zero-inflated models because a larger sample size is required to detect a significant effect with these models than with the standard Poisson or negative-binomial models.To date, methods for power calculations for ZIP and ZINB models are scarce.
Here we present an example of a study with a zero-inflated count response.Weekly household mosquito counts were obtained for a longitudinal study of malaria risk factors in Western Kenya (Bloland et al., 1999).In this area, some houses are reputed to be "malaria houses" -where the risk of disease is great.A goal of the study was to see if basic sanitation and construction practices would affect malaria risk.A major outcome of the study was the weekly household mosquito count, which has an excess number of 0 values as some houses never have mosquitos.
In Section 2, we review the ZIP and ZINB models.We use Lyles, Lin, and Williamson's (2006) approach for computing power for generalized linear models to compute power for both zero-inflated models in Section 3. The power calculations can be based on the Wald or the likelihood ratio tests.We also present power computations for a two-group comparison with a ZIP model based on a Wald test where the variance-covariance matrix is calculated from the expected information matrix.The latter approach was not feasible for the ZINB model because of the difficulty in taking the expected value of the second derivative matrix.We present simulation studies to detail the methods' performance in Section 4 and illustrate their use with a public health example in Section 5. We conclude with a short discussion on the merits of these sample-size calculations.

Zero-Inflated Models
Let the response Y i denote a non-negative integer count for the ith observation, i = 1, • • • , N. The probability of an excess zero is denoted by π i , 0 ≤ π i ≤ 1.Following Cheung (2002), the random variable Y i follows a ZIP distribution if The mean and variance of the ZIP random variable are E(Y The mean and variance of the ZINB random variable are , where κ is an overdispersion parameter.The ZINB model reduces to the ZIP model as κ → 0 (that is, equation 2 and equation 1 are then equivalent).
For both models, we assume that π i will be modeled with logistic regression, logit(π i ) = X i β, where X i is a 1 × p row vector of covariates (including an intercept) and β is the corresponding p × 1 column vector of parameters.This is a natural choice for a binary response and results in an odds ratio interpretation for the parameters.We model the mean of the count part as log(λ i ) = Z i γ, where Z i is a 1×q row vector of covariates and γ is the corresponding q × 1 column vector of parameters.Let θ = [ β , γ ] .The covariate vectors X i and Z i may contain the same or differing predictors.Lyles, Lin, and Williamson (2006) proposed a simple and flexible method for estimating conditional power (that is, power given any prespecified fixed covariate design matrix) for binary, ordinal, and count outcomes.This method requires only standard software for fitting the desired generalized linear model.The model is fit to a representative expanded data set using easily calculated weights that represent response probabilities given the assumed values of the parameters.The resulting variance-covariance matrix is used for power calculations based on the Wald test.Power based on the likelihood ratio statistic can be approximated by refitting the model under the null hypothesis.Lyles, Lin and Williamson's (2006) approach proceeds as follows.Assume that the zero-inflated response variable Y takes J possible values 0, 1, • • • , J − 1 where J is chosen such that the P r(Y ≥ J|X i = x i , Z i = z i ) for any specific vectors x i and z i in their respective design matrices is negligible.For example, one could use the criterion that P r(Y < J|X i = x i , Z i = z i ) ≥ 0.999 for each distinct covariate set.The representative expanded data set proposed here represents a generalization of an "exemplary" data set proposed for other power calculations (O'Brien, 1986;O'Brien, 1988;and O'Brien and Muller, 1993).The expanded data set is comprised of one record for each of the J response values per combination of covariate values with an associated weight 1) and ( 2).We are interested in testing the following hypothesis:

Power Calculations for Zero-Inflated Models
where H is an (h × (p + q)) matrix of full rank and h 0 is a (h × 1) vector of constant terms.The Wald and likelihood ratio (LR) test statistics are given by and where θ is the unrestricted maximum likelihood estimate of θ, θ * is the corresponding maximum likelihood estimator under H 0 , and l(.) denotes the loglikelihood function.Under H 0 , both test statistics are asymptotically distributed as central chi-square random variables with h degrees of freedom.Under H A , and following Wald (1943) and Stroud (1973), T W is asymptotically distributed as non-central χ 2 h,λ , where the non-centrality parameter λ is (Hθ , where the non-centrality parameter is ν = −2[l * (θ) − l(θ)], l(θ) being the log-likelihood under the true θ, and l * (θ) is the loglikelihood when θ is evaluated under H 0 (Wald, 1943;Stuart, Ord, and Arnold, 1999).Let α represent the specified type-I error rate and χ 2 h:1−α the critical value from the central χ 2 h distribution.The power for testing H 0 with the Wald or likelihood test is where η is the type-II error rate, f (u; h, c) is the probability density function of the non-central χ 2 h, c distribution, and c is the appropriate degrees of freedom for either test.
The expanded data set is created as follows.For i = 1, • • • , N, specify x i and z i and calculate the weight w ij for specified θ (and also κ for the ZINB model).Choose J such that the probability that Y is greater than or equal to J is negligible.Each record in the data set will correspond to an observation with response j (j = 0, 1, • • • , J − 1).For example, assume that we are calculating the power for a sample of N = 100 and have chosen J = 16.Additionally, we assume that one covariate is used in the "excess zero" modeling, x i , and a different covariate is used in the count modeling, z i .The expanded data set will include N × J = 1600 records with the 16 records for the i th experimental unit given as follows: The third column represents the response and 15 j=0 w ij ≈ 1.0.If hypothetically x i and z i were specified to be 2 different binary covariates (50% in each category for each covariate), then the first 25 experimental units (400 records) would have x i = z i = 0, the next 25 x i = 0 and z i = 1, the following 25 x i = 1 and z i = 0, and the last 25 x i = z i = 1.The expanded data set can be shortened for categorical covariates by multiplying the appropriate weights by the number of observations with the corresponding covariate pattern.For the above example, the shortened data set would have 64 (16×4 distinct covariate patterns) records with the weights multiplied by 25.
When the predictor of interest is a continuous variable, we use a representative covariate data set along with the expanded data set.Let x i be a realization of a continuous covariate X whose cumulative distribution function F (x) is assumed known.Using the Blom adjustment (Blom, 1958), we create a representative covariate data set consisting of the expected quantiles: 25 .An example is found in the second set of simulations in Section 4 where such a data set is illustrated in a table.See Lyles, Lin, and Williamson (2006) for a more detailed explanation.

Power calculations for 2-sample test with ZIP data
Here we demonstrate that power for ZIP models can also be calculated using the traditional approach where the variance-covariance matrix of parameter estimates is calculated from the expected information matrix.In contrast, a similar approach for the ZINB model is not feasible due to the difficulty in taking the expected value of its second derivative matrix.Although this method is extendible to multiple groups, we focus here on a two-group comparison and present power calculations for ZIP models using the Wald test.For most sample size calculations conducted for medical and public health studies, the main hypothesis of interest is usually a comparison between two groups, such as treatment (new drug versus standard drug or placebo) or gender (male versus female).A binary covariate x i will be used in the modeling of both parts of the ZIP model.The probability π i of an excess zero will be modeled as logit(π i ) = β 0 + β 1 x i and the mean of the Poisson part as log(λ i ) = γ 0 + γ 1 x i .Therefore, the parameters of interest here are θ We present the score functions and the information matrix for this ZIP model in Appendix 1.Let θ be the solution to the score equations.Then, var( θ) = (1/N )I −1 θ (θ), which can be estimated by substituting θ for θ.Specify the hypothesis for θ as (3).For example, if the hypothesis of interest is that the two groups (x i = 0, 1) have the same probability of an excess zero, then H = [ 0 1 0 0 ]and h 0 = 0.If one is interested in testing for an overall difference between the two groups, then one may use H = 0 1 0 0 0 0 0 1 and h 0 =[ 0 0 ].One only has to specify the sample sizes (N 0 and N 1 ), the probabilities of an excess zero (π 0 and π 1 ), and the means of the Poisson counts (λ 0 and λ 1 ) for the two groups in order to conduct these power calculations, as θ is easily solved in terms of these parameters for the two-group comparison.The advantage here is that it is easier for a medical investigator to specify probabilities and mean counts then parameters on the logit or log scales.

Simulations
To assess the performance of our proposed power calculations for both approaches, we conducted several studies using simulated data.For the first set of simulations, we randomly generated ZIP-distributed data for 200 observations with a binary covariate (x = 0, 1 for 100 observations each) for each simulated data set.The parameters used to generate the data (β and γ) were specified in terms of π 0 , π 1 , λ 0 , and λ 1 .One thousand data sets were generated for various (but not exhaustive) combinations of π 0 , π 1 , λ 0 , and λ 1 .See Table 1.Power calculations were conducted using Lyles, Lin, and Williamson's (2006) approach (both the Wald and likelihood ratio tests) and the proposed two-sample power computation.The data sets were analyzed with the ZIP model that was used to generate the data (i.e., the binary covariate was incorporated into both the logit and Poisson parts of the model), so that the model was correctly specified.To calculate the empirical power for each set of simulations, we estimated β = [β 0 , β 1 ] and γ = [γ 0 , γ 1 ] to test H 0 : β 1 = γ 1 = 0.The resulting Wald test statistic was compared to a chi-square distribution with two degrees of freedom.
Then the empirical power based on the Wald test was calculated as the number of data sets resulting in the rejection of H 0 : β 1 = γ 1 = 0 (α = 0.05) divided by 1000.
Table 1 summarizes the simulation results.Of the 81 scenarios, the calculated power for the two-sample Wald test based on the expected information matrix was less than the empirical power 40 times, greater than the empirical power 39 times, and equal to the empirical power twice.The maximum difference between the calculated power and the empirical power was 0.090 (π 0 = 0.15, π 1 = 0.25, λ 0 = 10.0, λ 1 = 11.0), but there were only six scenarios (out of 81) for which the calculated power was different from the empirical power by more than 0.05.The results were excellent for moderate and large values of π.The largest discrepancies between the calculated power using the Wald test with the expected information matrix and the empirical power were for small values of π.
Power calculations using Lyles, Lin, and Williamson's (2006) method with the Wald test (observed information matrix) were not shown because the difference between this approach and the power calculation using the Wald test with the variance-covariance matrix calculated from the expected information matrix never exceeded 0.2%.Power calculations using Lyles, Lin, and Williamson's method with the likelihood ratio test were very similar to those based on the Wald test.In the instances where there was a slight difference, the power calculations based on the likelihood ratio test tended to be higher than those based on the Wald test.
For a second set of simulations, we randomly generated ZINB-distributed data (Krishnamoorthy, 2001) with samples of size 100 and 500.The probability π i of an excess zero was modeled as logit(π i ) = β 0 + β 1 x i and the mean of the count part as log(λ i ) = γ 0 + γ 1 x i .The covariate x i was distributed as a N (0, 1) random variable or a U (−0.5, 0.5) random variable.The parameters used to generate the data were β 0 = −0.4055(corresponding to 40% excess zeros when x i = 0), and γ 0 = 1.6094 (corresponding to a mean count of 5 when x i = 0).The parameters β 1 and γ 1 varied according to the sample size and distribution of x i .In general, we chose β 1 and γ 1 such that the resulting power is within practical range.Two thousand data sets were generated for each combination of the parameters.We found that taking J = 31 for both the N (0, 1) and U (−0.5, 0.5) distributions ensured that the w ij 's summed very close to one for each x i .We used the criterion that J−1 j=0 w ij ≥ 0.999 ∀ x i .For specified θ, the criterion can be checked directly for each unique x i .Alternatively, it is often easy to identify which x i assigns highest probability to large Y , in which case one could choose J to ensure that the criterion holds for that x i .See Table 2 for a partial listing of the expanded data set of size 100×31 = 3100 for the power calculation when N = 100 and x i is distributed as a N (0, 1) random variable using the Blom adjustment.
See Table 3 for the simulation results.Power calculations were conducted using Lyles, Lin, and Williamson's (2006) approach (both the Wald and likelihood ratio tests) for the tests H 0 : β 1 = γ 1 = 0, H 0 : β 1 = 0 and H 0 : γ 1 = 0.The empirical power was calculated as the number of data sets resulting in the rejection of the appropriate null hypothesis (α = 0.05) divided by 2000, for the respective Wald or likelihood ratio test.Both sets of simulations were conducted via SAS IML 1 .Overall, the results were satisfactory.The largest difference be-tween calculated power and empirical power for data sets of size 100 was 0.052 for the Wald test (x i ∼ N (0, 1) and testing H 0 : β 1 = 0).For data sets of size 100 and the likelihood ratio test, there were no instances where the calculated power was different than the empirical power by more than 0.03.There were no instances where the calculated power was different than the empirical power by more than 0.04 for either test and data sets of size 500.For sample size of 100, the likelihood ratio test had greater power than the Wald test for H 0 : β 1 = 0, and the opposite was true for H 0 : γ 1 = 0.However, these differences were negligible when the sample size increased to 500.

Western Kenya Malaria Cohort Study
Suppose that we want to calculate the sample size for a new study based on the parameter values estimated from the data collected in a similar previous study.One of the objectives of the Western Kenya Malaria Cohort study was to determine if basic household construction and sanitation practices were related to malaria risk (Bloland et al., 1999).All participating households were visited and information on construction, animal ownership, and sanitation practices (such as the presence of a separate pit latrine structure) were recorded.Since only two of the many species of mosquitoes in this area transmit malaria, each participating household then had weekly mosquito trapping sessions, using a standard CDC-developed mosquito trap during the time that these mosquitoes are actively feeding (generally 11:00 PM to 2:00 AM).Trapped mosquitoes were separated by species and malaria-transmitting species were counted to give the dependent measure used in this analysis.Before the initiation of the study, local residents had commented that certain homes were "malaria houses" while other nearby homes "had no mosquitoes" for reasons that were not well known.A simple change of sanitation or home construction procedure that resulted in a decrease to the inhabitants' likelihood of getting malaria would be of tremendous benefit.Virtually all infants in this area get malaria multiple times before their first birthday.Of particular interest was whether or not a separate pit latrine structure, which has clear sanitation benefits, was associated with any difference in the numbers of mosquitoes that transmit malaria.
The mosquito counts are from the first weekly visit to each house in September, 1994 and are presented in Table 4.In the sample of 492 houses, 286 (58%) had no mosquitoes, much greater than the expected count of 111 predicted from a standard Poisson distribution.Here we are concerned with whether mosquito counts are the same for houses with a separate pit latrine as those without one.We fit a ZIP model with the covariate x = 0, 1 corresponding to whether or not the house had a separate pit latrine.The resulting parameter estimates were β = [0.279,-0.020] and γ = [1.136,0.171] , which yielded π = [0.569,0.564] and λ = [3.113, 3.693] .The joint test of H 0 :[β 1 , γ 1 ] = [0, 0] was nonsignificant (χ 2 2 = 4.63, p-value= 0.099) as was the marginal test of H 0 : β 1 = 0 (χ 2 1 = 0.012, p-value= 0.91).However, the marginal test of H 0 : γ 1 = 0 was significant (χ 2 1 = 4.53, p-value= 0.033).The results indicate that having a separate pit latrine structure is not likely to increase mosquito counts to any meaningful extent.
Using these parameter estimates and based on the Wald test with expected information matrix, a new study would require sample sizes of 505, 165,000 and 419 in each group to achieve 80% power with a type-I error rate of 0.05 for the following three tests H 0 :[β 1 , γ 1 ] = [0, 0] , H 0 : β 1: = 0, and H 0 : γ 1 = 0, respectively.For the Wald test with Lyles, Lin, and Williamson's method with, the sample sizes calculated were 499, 163,400 and 414, respectively.The sample sizes calculated based on the likelihood ratio test were 496, 163,350 and 411.If we assumed that the mosquito count followed the usual Poisson distribution, a sample of size 323 would be required for 80% power.This example underlines the potential differences, which can be substantial, in sample size calculations based on the traditional Poisson model and the ZIP model.When the data clearly have excess zeros, it is important to use sample size calculations based on a zero-inflated model to avoid an underpowered study.

Discussion
Here we present power calculations for ZIP and ZINB models based on the Wald and likelihood ratio tests.If interested in a sample-size calculation instead of a power calculation, one would first specify the power and the percentage of the sample in each group.Then, by trial and error, one could calculate N 0 and N 1 using the above procedure.Example SAS programs and macros, and the malaria data set from Section 5, are available 2 .and the information matrix is comprised of and

Table 1 (
continued): Power (%) to test H 0 : [β 1 , γ 1 ] = 0 for a two-sample ZIP model at α = 0.05 (two sided) with N 0 = N 1 = 100.The calculated power using the Wald test with expected information matrix is the main entry (bold).The middle entry is the calculated power using Lyles et al.'s method based on the likelihood ratio test (italicized).The empirical power based on the Wald test from the 1000 simulations is presented below.

Table 2 :
Expanded data set of 3100 lines (N × J) with N = 100 and J = 31 for ZINB simulation with x i ∼ N (0, 1).The excess zero part of the model is logit(π i ) = −0.406+ 0.65x i and the count part is log(λ i ) = 1.609 + 0.25x i .

Table 4 :
Mosquito count from 492 houses in Western Kenya