The Poisson Inverse Gaussian Regression Model in the Analysis of Clustered Counts Data

We explore the possibility of modeling clustered count data using the Poisson Inverse Gaussian distribution. We develop a regression model, which relates the number of mastitis cases in a sample of dairy farms in Ontario, Canada, to various farm level covariates, to illustrate the methodology. Residual plots are constructed to explore the quality of the fit. We compare the results with a negative binomial regression model using maximum likelihood estimation, and to the generalized linear mixed regression model fitted in SAS.


Introduction
Mastitis may be defined as inflammation of the mammary gland of dairy cows, and may be categorized into subclinical and clinical mastitis.Clinical mastitis (CM) shows alterations of the milk in the form of flakes or pus, and is associated with a considerable increase in somatic cell counts (SCC).On the other hand, sub-clinical mastitis is characterized by a slight increase in the SCC and cannot be seen by eye.Identification of udders infected with sub-clinical mastitis may be performed with bacteriology and California Mastitis Test (CMT).The CMT is a simple rapid means for detecting mammary gland infection and has had wide acceptance and use by veterinarians and dairymen in routine mastitis prevention and control programs.
Two main groups of mastitis causing pathogens can be distinguished, environmental and contagious pathogens.Environmental pathogens are always present in the surroundings of the cow.Infection of the udder by these environmental pathogens is accompanied with a sudden increase in SCC.Contagious pathogens are transmitted during the milking process, and are usually spread from infected to non-infected susceptible animals during that process.The most common organism involved in dairy mammary disease is Staphylococcus aureus, commonly found on the skin of human hands and the udder skin of cows.Cows with contagious mastitis are usually culled to prevent the spread of the disease to other healthy cows in the herd.Due to these losses, the disease is of considerable economic importance.
Farm management practices play an important role in controlling and reducing the incidence of the disease.Hygiene is identified as an important factor.The use of organic (straw and sawdust) versus inorganic (sand) is associated with an increase in intra-mammary infections.Tail docking is practiced to prevent the tail from hitting the udder and spread of the disease causing pathogen.Milking technique is also identified as a route to spread mastitis from cow to cow.Some farmers milk the teats by hand (post milking) after the machine milking to reduce residual milk in the quarters, and such practice might serve as an ideal growth flora for bacteria.
We intend to identify the extent of the relationship between the above risk factors and the incidence of sub-clinical mastitis.The data are available from 57 Ontario dairy farms.These herds were willing to participate in the sentinel dairy herd project, which was instigated to identify the reasons behind the increase in SCC over time.From July 1997 to February 1999, each farm was visited three to six times.At each visit, composite samples were taken from each milking cow in the herd, and bacteriology was performed at the Animal Health Laboratory, Ontario Veterinary College, at the University of Guelph, to identify the pathogen causing sub-clinical mastitis.The primary response variable was the number of cows tested positive for the disease.Table 1 shows a summary of the data by visit.Herd level covariates were: type of bedding, tail docking, and post milking.These factors were coded as 0 when the producer adopted the factor, otherwise the factor was coded as 1.The fundamental feature of the data is its hierarchical or multilevel (clustered) structure, which frequently arises in many areas of investigations such as veterinary epidemiology, agriculture, and community trials.The highest hierarchy was the herd, and the lowest hierarchy was the visit.At each visit, the number of cows cultured, and the number of cows tested positive (the response variable) were recorded.
There are two general classes of models for analyzing clustered data: clusterspecific (CS) and marginal or population averaged (PA).The PA approach specifies the marginal of the response of the jth unit in the ith cluster y ij , and the Generalized Estimating Equation is the method of choice (Liang and Zeger 1986;Zeger et al. 1988).The most common CS approaches are Generalized Linear Mixed Models (GLMMS), which extend the class of generalized linear models by including random effects in the linear predictor (Breslow and Clayton 1993;  1991;McGilchrist 1994;Goldstein 1991 andLongford 1994).The above CS approach fits models to data using either the Taylor approximation of the log likelihood function (Goldstein 1991 andLongford 1994) or the penalized quasilikelihood (PQL) (Breslow and Clayton's 1993).Wolfinger and O'Connell (1993) describe an algorithm for computing PQL estimates and implement it with a SAS macro called GLIMMIX.Neuhaus and Segal (1997) showed that these approaches may yield highly biased estimates of covariate effects and the variance components of the random effects distribution.Recently, Nehaus (2001) recommended that care must be taken when using approximate methods in the analysis of multilevel data.To avoid most of the problems outlined by Neuhaus (2001), we use an exact modeling procedure.We consider a random effects Poisson model where the mixing distribution is the inverse-Gaussian.Details of the derivations, the method for incorporation of the covariates, and the procedures used to obtain estimates of the parameters are given in Section 2. In Section 3 we fit the model to the mastitis data to illustrate the potential differences between the Poisson inverse-Gaussian Model and other models such as the negative binomial and the generalized linear mixed model.A general discussion is presented in Section 4.

Over-dispersion test
Assume k clusters the ith of which has n i observations.Define Y ij to be the number of events of interest (cows with clinical mastitis) in the ith farm at visit j.Because of the hierarchical structure of the data, one should expect that the counts Y ij within a particular farm to be correlated.The result of such within-farm correlation is that the variance of counts would be much larger than the mean, a phenomenon known "over-dispersion."This over-dispersion may be modeled as follows (Cox, 1983): first we assume that conditional on the ith cluster (farm) effect, denoted by ν i we have Second, we assume that the random cluster effect ν i has E(ν i ) = 1, and var regardless of distribution assigned to ν i .Here ν i is a random variable that summarizes the effect of our inability to model all pertinent explanatory factors and other sources of variation.The above set-up has been discussed by many authors (Breslow 1984, Thall 1988, Stukel 1993, Burnett et al. 1992, and Zeger et al. 1988).Therefore λ > 0 indicates the presence of over-dispersion, λ < 0 under-dispersion, and λ = 0 equi-dispersion.Under equi-dispersion, the Poisson model is the most commonly used model to analyze count data.It seems natural then that our attempting to construct a plausible parametric model should be preceded by a test for over-dispersion.
Following Cameron and Trivedi (1998, p.78), a test of over-dispersion can empirically be carried out by using the method of least squares to fit the regression equation: where, ȳi• = n i j=1 y ij /n i , and ij is an error term.The reported t-statistic for λ is asymptotically normal under the null hypothesis H 0 : λ = 0 versus H 1 : λ > 0. For the mastitis data, it was found that λ = 0.1169 and the standard error SE( λ) = 0.0232, from which t = 0.1169/0.0232= 5.03 (p-value < 0.001).Therefore, based on the evidence in the data, we conclude that over-dispersion should not be ignored.
We therefore need to develop models for count data, which properly account for over-dispersion.In the next section, we develop a PIGM as a competitor to the negative binomial model (NBM).
There are several reasons for comparing the PIGM to a regression model assuming a negative binomial distribution for the counts.First, although the NBM is commonly used to model over-dispersed count data, most data analysts are not familiar with the PIGM, as it has not been fitted to clustered count data.Secondly; there is not one single model that provides a reasonable fit to a given data set, particularly in the absence of knowledge of the biological mechanism generating the responses.Therefore, it is not uncommon for data analysts to attempt fitting several models and investigate the advantages and the disadvantages of each.For example, Stukel (1993) analyzed clinical trial data of non-melanoma skin cancer and familial polyposis, using several models for overdispersed longitudinal count data.As well, Have and Hartzel (1995), and more recently Molenberghs and Geys (2001) discussed several modeling strategies to analyze clustered binary data in developmental toxicity studies.

Model specifications
Assume that conditional on the ith cluster (farm) the within cluster observations are conditionally independent so that their joint density is Following the generalized linear model approach, we relate the parameters µ ij to the covariates x i ∈ R p through the log-link function so that where β = (β 0 , β 1 , β 2 , β 3 , β 4 ) T is the vector of parameters, num ij is the number of animals cultured from ith farm at visit j, and the rest of the covariates are as explained in Section 1.
The specification of the joint distribution of T is completed by assigning a distribution for ν i .If we assign gamma distribution for ν i , the resulting marginal density of Y i is the negative binomial (see, Collings andMargolin 1985 andMcCullagh andNelder 1989).Alternatively, we assume that ν i has an inverse-Gaussian distribution, parameterized to have E(ν i ) = 1 and var(ν i ) = λ > 0 so that the pdf of ν i is given by The evaluation of the (marginalized) likelihood for the ith farm involves integrating out ν i and is equal to where where From Gardshteyn and Ryzhik (1980, p. 970) where K j (•) is the modified Bessel function of the second kind.Therefore L i can be written as where where c(λ) = (2/πλ) 1/2 e 1/λ .The Bessel functions has the following important properties: where from which it follows that Moreover, The importance of equations (2.5) and (2.6) will be apparent when we derive the estimating equations of the model parameters in Appendix I. We will show that the partial derivatives of the log-likelihood function with respect to the model parameters do not depend on the Bessel functions, rather they depend on M (y) which can be recursively computed.
To obtain the maximum likelihood estimators of the parameters we construct the total (marginalized) likelihood L, which is the product of k terms of equation (2.3).
The log-likelihood function = log(L) is given by: In Appendix I, we provide the estimating equations for β and λ.

Estimating unknown parameters
As in the nonlinear mixed models, the maximum likelihood equations cannot be explicitly solved.It is natural to use the Newton-Raphson algorithm or any of its variants to estimate the parameter vector θ = (β 0 , β 1 , β 2 , β 3 , β 4 , λ) T .However, like other iterative techniques this method depends on the validity of a quadratic approximation to the surface of , and for many nonlinear mixed-effects models, straightforward approximations do not provide consistent estimates regardless of the number of clusters, or the number of observations per cluster (see, Demidenko 1997).Therefore, we needed an optimization algorithm that could efficiently handle this complicated log-likelihood function.After some exploration, we have chosen to use "fminunc" (unconstrained function minimization) routine of the Optimization Toolbox of the MATLAB T M Software (The MathWorks, Inc., Natick, MA) to locate the minimum of the negative log-likelihood function.The routine implements a subspace trust region method which is based on the interior-reflective Newton method described in Coleman andLi (1994, 1996).Each iteration in this "large-scale optimization" algorithm involves the approximate solution of a large linear system using the method of reconditioned conjugate gradients.
The routine requires the user to provide an initial estimate for the parameter vector θ.Since PIGM reduces to Poisson regression model when λ = 0, we obtained the initial estimate by fitting a Poisson regression model to mastitis data using SAS (Genmod procedure).The result of the optimization, i.e., loglikelihood maximization, was: θ = (3.212,0.130, −0.034, −0.816, −0.798, 0.392) T To ensure that the global maximum was located, we perturbed the initial parameter estimates and verified that this solution was indeed the optimum.The optimization routine also produced the Hessian matrix from which we calculated Fisher's information matrix, I(θ).The asymptotic standard errors of the estimated parameters are the square roots of the elements on the diagonal of I(θ).

Analysis of Mastitis Data
Clinical mastitis (CM) is one of the endemic diseases and conditions of dairy cattle in many countries.Such disease causes significant losses to the dairy industry both in terms of the reduction in output levels and wastage of resources incurred, in addition to the costs of disease prevention and treatment.Various studies, although followed different methods of assessments, have estimated direct costs associated with CM, but not the wider effects of disease, such as impact on human health, animal welfare and livestock markets.In the UK, Bennett et al. (1992) demonstrated that mastitis treatment costs are substantial, owing to the large number of cows requiring treatment and the relatively high costs of treatment (which includes milk withdrawal following antibiotic use).Therefore, effective control and prevention are functions of our ability to identify potential farm level management practices that affect the distribution of the disease.
In Table 2, we provide a summary fitting of the data using the PIGM, NBM and the generalized linear mixed model (GLMM) with log-link (Schall 1991, McGilchrist 1994).The estimating equations of the two models are given in Appendices I & II.The MATLAB codes for fitting the PIGM and the NBM are available from the authors.The GLIMMIX is a SAS macro (1996) that fits the GLMM.It uses the pseudo likelihood method for estimating parameters of the GLMM as proposed by Wolfinger and O'Connell (1993).In summary, the pseudo likelihood approach fits a linearized pseudo variable (i.e., a transformation of Y ij onto to the linear scale using the weighted normal mixed model.)The name pseudo likelihood was used because the likelihood function maximized at each iteration, is that of the linearized (pseudo) variable and not that of the original data (Breslow and Clayton, 1993).
For the GLIMMIX, the residual log-likelihood, rather than the log-likelihood, is provided by SAS, and is not included in the summary.Note that, the only non-categorical covariate included in the model was the "number cultured."This covariate z ij was standardized so that it is modeled as is the number of animals cultured in the ith farm at the jth visit, zi is the average number of animals cultured in the ith farm, and s i is the standard deviation.As can be seen, the PIGM and NBM models are equivalent as they comprise the same significant covariates and the values of their maximized log-likelihood functions are almost the same.However, there is a difference in the sign of the "bedding" covariate.This covariate has been coded in such a way that the code "1" is given to farms that used sands, and code "0" for farms that used organic material.Therefore a positive sign for the regression coefficient of bedding means that, for a randomly selected farm, the use of sand should increase the mean number of diseased animals (on the log-scale) on that farm, as was shown by the NBM and the GLIMMIX.This is in contrast to what the PIGM fit gives, which shows that the use of sand as a bedding material, should decrease the number mastitis cases.This is in support of what is widely known among veterinarians.
Although we found statistically significant relationship between CM and management practices, those management practices are not necessarily causally related to the CM on the farm.Clearly "postmilk" which is directly related to udder health, should be either stopped or preceded by proper preparation (e.g., through hand washing and use of disinfectants).Moreover, the risk of transmitting the pathogens may be substantially reduced if "tail-docking" is practiced.Although previous studies (Barkema et al. 1999) demonstrated strong association between types of bedding and the distribution of CM, such an association was absent in our data, as can be seen from Table 2.The justification is that in our data, only one farm did not use organic bedding, and that particular farm was visited three times only.Therefore, this lack of association may be attributed to insufficient data.
To test the adequacy of the models, we provide two sets of residual plots.As was advocated by Davison and Gigli (1989) we plot Pearson's residuals r ij , where μij = exp(x T ij β), against the normal quantiles.Deviation of the plot of r ij against the normal quantiles from a straight line, indicates that the residuals are not normally distributed.In Figure 1, we notice, for both models, a slight deviation from a straight line at the upper tails of the distribution as would be expected for count data.
We also provided plots of the "deviance residuals" for over-dispersed count data, defined as  (McCullagh andNelder 1989, p.39 andCameron andTrivedi 1998, p.142). Figure 2 shows the plots of d ij versus the observed counts y ij for both models.As can be seen, both models provide adequate fit to the data.

Discussion
During the last fifteen years, hierarchical statistical modeling of epidemiological data has been an area of intense research.Depending on the objectives of the investigation, an appropriate model is selected.If interest is focused on the efficient estimation of the parameters of the regression function on the expense of the other parameters that characterize the components of dispersion, then the Generalized Estimation Equation approach of Liang and Zeger (1986) would be an appropriate tool, and the regression coefficients would have a population average interpretation.On the other hand, if in addition to the regression coefficients, the dispersion parameter is of interest, the cluster specific model is more appropriate.We would like to emphasize that had the main objectives of the study been the estimation of the disease prevalence, a binomial regression with over-dispersion parameter would be appropriate.However the main objective of the present project was the identification and estimation of the effects of the herd level covariates that influence the variability of the disease.Due to the fact that the number of cultured cows was large (excluding the sixth visit), and that the percentage of positive cows did not change significantly over time, we assumed, conditional on the herd, that the number of mastitis cases can be approximately modeled by the Poisson distribution, whose mean is a function of the herd level covariates, and an unobservable random effect.
The inverse-Gaussian distribution may be an attractive choice as a mixing distribution for models of count data that exhibit over dispersion caused by the hierarchical structure.When suitably parameterized it can be used as an alternative to the widely used gamma distribution.Although the use of the negative binomial, which results from mixing the Poisson with the gamma is long standing, the Poisson inverse-Gaussian can be used to analyze data with this structure  (see; Wilmot 1987).Moreover, we were able to fit the PIGM quite easily.
The GLIMMIX, although uses a general approximation to the integral in equation (2.1), to our surprise produced somewhat similar results to those of the PIGM and NBM.Evidence from the literature suggested that the GLIMMIX fitting procedure of Schall, McGilchrist and Breslow and Clayton are biased, and a full mixture model should be employed to analyze clustered data.We would like to point out that the difference in the sign of one of the coefficients (bedding) is not unexpected and has been reported in similar context by Lesaffre and Spiessens (2001) and Neuhaus and Segal (1997).Although the two models give close fits, common knowledge regarding the effect of the "bedding" factor, supports the selection of the PIGM over the NBM. where x ijr β r ), s i = y i• − 1/2 and z i = (1 + 2λµ i• ) 1/2 /λ.Then, The recurrence relations where z = (1 + 2λµ) 1/2 /λ.(Note that subscripts have been dropped for convenience.)Therefore, Figure 1.Upper panel: Quantiles of the Pearson's residuals of the PIGM against the normal quantiles.Lower panel: Quantiles of the Pearson's residuals of the NBM against the normal quantiles.

Figure 2 .
Figure 2. Upper panel: Deviance residuals of the PIGM versus the observed counts y ij .Lower panel: Deviance residuals of the NBM versus the observed counts y ij .

Table 1 :
Summary Mastitis Data collected from 57 farms at six visits to each farm.

Table 2 :
Summary of the PIGM, NBM and GLIMMIX (SAS) analysis of the mastitis data.