A Mixed Effects Model for Overdispersed Zero Inflated Poisson Data with an Application in Animal Breeding

Response variables that are scored as counts, for example, number of mastitis cases in dairy cattle, often arise in quantitative genetic analysis. When the number of zeros exceeds the amount expected such as under the Poisson density, the zero-inflated Poisson (ZIP) model is more appropriate. In using the ZIP model in animal breeding studies, it is necessary to accommodate genetic and environmental covariances. For that, this study proposes to model the mixture and Poisson parameters hierarchically, each as a function of two random effects, representing the genetic and environmental sources of variability, respectively. The genetic random effects are allowed to be correlated, leading to a correlation within and between clusters. The environmental effects are introduced by independent residual terms, accounting for overdispersion above that caused by extra-zeros. In addition, an inter correlation structure between random genetic effects affecting mixture and Poisson parameters is used to infer pleiotropy, an expression of the extent to which these parameters are influenced by common genes. The methods described here are illustrated with data on number of mastitis cases from Norwegian Red cows. Bayesian analysis yields posterior distributions useful for studying environmental and genetic variability, as well as genetic correlation.


Introduction
Some traits in animal breeding are scored as counts, for example, litter size in pigs, embryo yield produced after superovulation, and number of mastitis cases in dairy cattle.When the number of zeros exceeds the amount expected under a certain density, as for example, the Poisson density, a possibility for modeling the extra-zeros has been proposed by Lambert (1992).In using the ZIP model in animal breeding studies, it is necessary to accommodate genetic and environmental covariances.For that, this study proposes to model the mixture and Poisson parameters hierarchically, each as a function of two random effects, representing the genetic and environmental sources of variability, respectively.
Models for zero-inflated count data with random effects accounting for intragroup correlation and dependence of clustered observations either in the logistic regression model of the mixture parameter and/or the in log-linear model of the Poisson parameter have been discussed by Welsh et al. (1996), Hall et al. (2000), Wang et al. (2002), Kuhnert et al. (2005), and Min and Agresti (2005).In this article, modeling genetic effects of zero-inflated count data presents special challenges; in addition of the problem of extra zeros, the correlation within and between clusters, e.g., half-sibs families, needs to be taken into account.We explore a model where the correlated genetic random effects not only accommodate within but between cluster correlation.The ZIP model accomodates overdispersion (variance > mean) caused by extra zeros, however independent environmental effects (residual effects) accommodate overdispersion above that.Both genetic and environmental random effects are independent, and considered at the level of the mixture and Poisson parameters.As pleiotropy is the main genetic cause of correlation (Falconer, 1989), a correlation structure is also introduced between the genetic effects on these parameters, similar to the correlation between direct and maternal effects (Willham, 1963).The degree of correlation from pleiotropy express the extent to which the mixture and Poisson parameters are influenced by common genes.In a ZIP model, the mixture parameter p is interpreted as the "perfect state" probability (Rodrigues-Motta et al., 2007), so a negative correlation between p and the Poisson parameter would be expected, meaning that genes in favor of the perfect state act against to the imperfect Poisson state.In the same way, a correlation close to zero indicates that there are no common genes affecting those parameters simultaneously.
A hierarchical ZIP model with correlated parameters is developed to an application to number of mastitis cases in Norwegian Red cows in first lactation.First, a hierarchical Bayes structure is presented.Second, estimation of the parameters are suggested via a Gibbs-sampling approach.Third, a model checking was conducted via an analysis of residuals and predictive ability.

The Mixed Effects ZIP Model with Independent Residual Effects
Let y = (y 1 , y 2 , ..., y n ) be a vector containing the number of cases of an event per animal (e.g., number of mastitis cases in dairy cows).It is assumed that, given λ i and p i , the distribution of observation y i on animal i follows a zeroinflated Poisson distribution, and that all such observations are conditionally independent.The density is then k>0) , (2.1) Here, p i is the probability that a 0 is from the "perfect" state, and λ i is the parameter of the "imperfect" state (Poisson distribution).This model tends to a conditional Poisson model as p i → 0. Let λ * = (log(λ 1 ), . . ., log(λ n )) and p * = (logit(p 1 ), . . ., logit(p n )) be vectors of unobservable parameters.Further, suppose that λ * and p * satisfy the linear mixed model where β λ and β p are vectors of fixed effects; u 1 and u 2 are vectors of random effects, and X λ , X p , Z λ and Z p are known incidence matrices (matrices of 0 and 1 s).Factors included in β λ may or may not be the same as those in β p .In (2.2), ε p and ε λ are vectors of residuals which are assumed to follow a multivariate normal distribution on R 2n with mean zero and covariance matrix , I n is an identity matrix of order n, and σ 2 ε λ and σ 2 εp are residual variances reflecting overdispersion over and above that caused by extra zeros.The distribution of ε λ and ε p induces the distribution where λ * and p * are as in (2.2).
Here, u = (u 1 , u 2 ) ∼ N (0, Σ u ).In this study, we partition the Z incidence matrices as Z λ = ( Z 1,λ 0 Z 2,λ 0 ) and Z p = ( 0 Z 1,p 0 Z 2,p ), respectively; these matrices relate the random effects u 1 = (h λ , h p ) and u 2 = (a λ , a p ) to λ * and p * .The vectors h λ and h p , each of dimension n h , are non-genetic effects (e.g., herd effects); the vectors a λ and a p , each of dimension n a , are additive genetic effects.Additionally, where .
Further, A is a known additive genetic relationship matrix; σ 2 h λ and σ 2 hp are variances of non-genetic (say herd) effects affecting λ * and p * , respectively; σ 2 a λ and σ 2 ap are the additive genetic variances, and σ a λ,p is the additive genetic covariance.The genetic correlation is ρ = σ a λ,p / √ σ 2 a λ σ 2 ap .

Parameter estimation
The likelihood, priors and the joint posterior distribution Let τ = {λ * , p * , β λ , β p , h λ , h p , a λ , a p , H, G, Σ} be a set of unknown quantities.The conditional (given τ ) likelihood for the ZIP regression model is To achieve a reasonably vague prior, an uniform distribution is assigned to each element of β's, with large absolute values of the bounds β min,λ , β max,λ , β min,p , and β max,p .Independent scale inverse chi-square distributions with degrees of freedom (scale parameter) ν ε (δ ε ) are assigned to σ 2 ε λ and σ 2 εp , respectively, and independent inverse chi-square distributions with degrees of freedom (scale parameter) ν h (δ h ) are assigned to σ 2 h λ and σ 2 hp , respectively.An inverse Wishart distribution with parameter matrix (degrees of freedom) V G (ν G ) is assigned to G. Replacing (2.4) by the models in (2.2), the joint posterior density can be written as (2.5) where x i;λ , z i;1,λ , z i;2,λ , x i;p , z i;1,p and z i;2,p and are the i th rows of matrices X λ , Z 1,λ , Z 2,λ , X p , Z 1,p and Z 2,p , respectively, and The joint posterior distribution with density as in (2.5) is not recognizable, and can be written only up to a proportionality constant.Also, marginal posterior distributions cannot be obtained analytically.Therefore, a Metropolis-Gibbs sampling scheme was tailored to sample from the marginal posterior distributions.
Sampling the parameters λ * i and p * i From (2.5), the conditional posterior density of the vector of log-Poisson pa- , given all other parameters ("ELSE").The λ * i 's are assumed independent, a priori, and their prior densities are normal with parameters according to (2.3).Hence, .The p * i 's are assumed independent, a priori, and their prior densities are normal according to (2.3).The fully conditional distribution of these parameters are independent, i.e., The densities in (2.6) and (2.7) do not have any obviously recognizable form.A Metropolis-Hastings algorithm was therefore tailored for drawing the λ * i = log(λ i ) and p i = e p * i /(1 + e p * i ), one at a time.
Sampling location effects affecting λ * and p * From (2.5), the fully conditional posterior distribution of the β, h and a location parameters is p(β, h, a|ELSE, y) ∝ p(λ * , p * |β, h, a)p(β)p(h|H)p(a|G).This is the conditional posterior density of location parameters in a bivariate Gaussian model with known dispersion structure, in which λ * and p * play the role of "traits", and the only source of correlation is through genetic effects.The derivation of the fully posterior distribution of the location parameters is given in Sorensen and Gianola (2002).Let θ = (β , h , a ) and ] .
Note that this implies sorting of individuals within λ * and p * , respectively.Then, the standard mixed model equations of animal breeders are given by C θ = t, where ] .
The fully conditional posterior distribution of θ is the multivariate normal process θ|ELSE, y ∼ Normal( θ, C −1 ) and, for any sub-vector Here, C i,i is an appropriate sub-matrix of C; t i is the corresponding sub vector of t; C i,−i is a block of C linking the "θ i equations" to the "θ −i equations", and θ −i is θ with θ i removed.The Gibbs sampler is implemented in a scalar mode, drawing from the appropriated fully conditional posterior distribution one element of θ at a time.In this case, θi and C i,i are scalars and C i,−i is a row vector.

Conditional posterior distribution of the residual variances
From (2.5), and the fact that Σ is a diagonal matrix, it follows that These are the densities of two independent scaled inverse chi-square random variables.Sampling is straightforward.

Conditional posterior distribution of the non-genetic variances
From (2.5), it follows directly that the two non-genetic variances are conditionally independent.In particular, These are densities of two independent scale inverse chi-square random variables.

Conditional posterior distribution of genetic variance matrix G
From (2.5) it follows directly that where

The Gibbs-Metropolis algorithm and convergence criteria
Our Gibbs-Metropolis sampling algorithm consisted of cyclic sampling through all components of τ , drawing each parameter or subset of parameters, conditionally on the realized value of all other parameters, at each iteration of the algorithm.At iteration t, an ordering of the components of τ was chosen and elements of τ were sampled sequentially from their conditional distribution, given the current value of all other elements of τ .A normal distribution with mean equal to the value of λ * i,t at iteration t and appropriate variance was used as proposal distribution for sampling from (2.6).Similarly, a normal distribution with mean equal to the value of p * i,t at iteration t and some appropriate variance was used as proposal distribution for sampling from (2.7).The variances of the proposal distributions were chosen to attain acceptance rates between 30% and 50% (Gelman et al., 2004).For the proposed model, a possible implementation of the algorithm at each step t is as follows, where τ 0 would be some starting value: , y] from the scaled inverse chi-square distributions (2.8) and (2.9), respectively.
Above, "ELSE t−1 " stands for all parameters, other than those that have been updated in the preceding conditional distribution.Visual inspection of trace plots of the MCMC run and the scale reduction factor diagnostic suggested by Gelman and Rubin (1992) were used to determine the length of the burn-in period and the total number of iterations for the Gibbs-Metropolis procedure.Two chains with overdispersed starting points were used in the Gelman and Rubin (1992) method.This monitors convergence of the iterative simulation by estimating the factor by which the scale of the current distribution for a parameter under study, say τ , might be reduced if simulations were continued for an infinite amount of time.
The potential scale reduction is given by R = √ v ar(τ|y) W , which declines towards 1 as the number of iterations J goes to infinity.Here, v ar(τ |y) = J−1 J W + 1 J B, where W and B are estimates of the within and between-chain variances.Discarding early draws as burn-in, such that the starting value is "forgotten", samples are drawn as needed to attain a sufficiently small Monte Carlo error of estimation of features of the posterior distribution, such as the posterior mean.

Discrepancy statistic
The adequacy of the ZIP model fitted to the data was assessed by comparing the observed value of a statistic T k (y, τ ) with its predictive distribution under the ZIP model.As a measure of "discrepancy", the statistic , where y i is the i th component of the observed vector y, T k,l (y rep,l , τ ) = n −1 ∑ n i=1 I(y i;rep,l = k), where y i;rep,l is the i th component of the replicated vector y rep of size n in sample l = 1, ..., L, k = 0, 1, ..., and I(.) is an indicator function.A distribution of D k values was generated for each value of k; if the model holds, D k should be centered at 0. Computations were as follows: 1) For L = 100, vectors v = (v 1 , ..., v n ) of size n = number of individuals in the data set were drawn.Each element v i of v followed a ZIP density evaluated at the posterior mean of λ * i and p * i , respectively, as inputs for the Poisson and mixture parameters.2) For each realization of v at sample l, n −1 ∑ n i=1 I(y i;rep,l = k) was calculated at each k, leading to 100 "future" relative frequencies.3) D k was calculated for each k, with values far from zero being interpreted as evidence against the model.

Residual analysis
Another tool used for model adequacy was a residual analysis.A residual for the i th record was defined as r i = y i − E(y i |τ ).The posterior mean of the standardized residual was estimated as ri = J −1 ∑ J j=1 and J being the number of posterior samples.The expectation and variance used were respectively.An observation would be unusual if the posterior distribution of r i is concentrated away from zero.

Model comparison
ZIP models were contrasted against a Poisson model; all specifications included genetic and residual effects in the structure λ * .Models were contrasted using the pseudo-Bayes factor (Geiser and Eddy, 1979;Gelfand, 1996).The predictive log-likelihood of each observed datum was estimated as the harmonic mean of the likelihood evaluated at samples from the posterior distribution (Gelfand and Dey, 1994;Newton and Raftery, 1994).At each iteration, the likelihood was stored for each observation, and the harmonic mean of the likelihood across samples was calculated as Li = , where L (j) i is the likelihood of subject i, evaluated at draw j from the posterior distribution.The predictive likelihood of the model M was them estimated as p(y|M ) = ∏ n i=1 Li , and the pseudo-Bayes factor is the ratio between the predictive likelihoods of competing models.

An Animal Breeding Application
Quantitative genetic analysis of mastitis data has been carried out mainly with linear models (e.g., Carlén, Schneider and Strandberg, 2005) and with threshold models (Gianola and Foulley, 1983;Heringstad et al., 2001;Heringstad et al., 2004;Chang et al., 2004), with the latter ones accounting for the binary structure of the data, at least when mastitis is categorized as "absent" or "present".When cows have more than one case of mastitis during lactation, longitudinal binary response models have been used as well (Heringstad et al., 2003).However, the number of episodes of a disease is a random count, and a more appropriate sampling model would be the Poisson distribution.Further, the "zero count" (e.g., no disease) may have higher frequency than the expected under Poisson sampling, so a ZIP model might be suitable.If so, an extension for quantitative genetics analysis of counts with an excess of zeros is needed.The objective of this application was to investigate alternative specifications for modeling number of incidences of mastitis via a ZIP model, and to make inferences about genetic (co)variation between the Poisson and mixture parameters involved.
The hierarchical ZIP model was fitted to a data set consisting of number of mastitis cases in 36, 175 first-lactation Norwegian Red cows.The data is described in detail in Rodrigues-Motta et al. (2007).The ZIP model in terms of (2.2) included 4 "fixed" factors affecting both λ * and p * .Here, β λ = (β 1,λ , β 2,λ , β 3,λ , β 4,λ ) and β p = (β 1,p , β 2,p , β 3,p , β 4,p ) , respectively.The vector β 1,j included effects of 15 ages at first calving (< 20, ..., 32, > 32 months), the vector β 2,j included effects of 12 months at first calving, β 3,j consisted of effects of 3 years at first calving (1990, 1991 and 1992), and β 4,j was a regression on the logarithm of the number of days from first calving to the defined end of first lactation (culling, second calving, or 300 days after calving, whichever occurred first), with j = λ, p.To achieve a reasonably vague prior, each element of β λ and β p was sampled from an uniform distribution spanning from −999 to 999.Herd effects represented the non-genetic random factor contained in h λ and h p , respectively.The non-genetic herd effects, represented by h = (h λ , h p ) , were assumed to follow a priori a multivariate normal distribution with mean zero and covariance matrix and n h = 5, 286.A ZIP "sire" model was fitted, thus a λ and a p are vectors containing half of the breeding values affecting λ * i and p * i , respectively; each of these vectors was of order 437 × 1.The sire (genetic) effects, represented by a = (a λ , a p ) , were assumed to follow a multivariate normal distribution with mean zero and covariance matrix G ⊗ A, where , and A is a known matrix of additive genetic relationship of dimension 437.The A matrix was built from a sire pedigree file with a total of 437 males, where the pedigree of the 245 sires with daughters in the data set were traced back, through sires and maternal grandsires, as far back as possible.In quantitative genetics theory, variation between sires accounts only for 1/4 of the total additive genetic variation (Falconer, 1989); the rest of the variation (genetic and environmental) is captured by the residual terms ε λ and ε p .Therefore, the residual term in the model accounts for more overdispersion, and for environmental effects beyond those due to herds.The degrees of belief parameters of the scale inverse chi-square and inverse Wishart distributions assigned as priors for variance components and matrix G were Added to high dimension of the data set and the pedigree, the complexity of the hierarchical model demanded that a Fortran program was written to sample the unknowns, following the scheme proposed in Section 2.2.The Gelman and Rubin (1992) convergence criterion used 2 chains starting from overdispersed values, with 10 6 iterations and a burn-in period of 5 × 10 5 samples.The scale reduction factors for the residual, herd and sire variances affecting λ * were 1.05, 1 and 1, respectively; the scale reduction factors for the residual, herd and sire variances affecting p * were 1.04, 1.05 and 1, respectively.The scale reduction factor for the sire covariance was 1.These values suggest convergence to the equilibrium distribution.However, the trace plots (results not shown) indicated that additional iterations would produce more accurate posteriori estimates of the residual variance associated to λ * , and of all variance components associated to p * .A total of 500, 000 after-burn-in samples (without thinning) from one of the two chains were used to calculate Monte Carlo errors associated to the posterior mean of the variance components.The Monte Carlo error variances of residual, herd and sire variances affecting λ * were 2.1 × 10 −8 , 5.8 × 10 −9 and 8 × 10 −10 , respectively; the Monte Carlo error of variances of residual, herd and sire variances affecting p * were 1.7 × 10 −7 , 2.3 × 10 −7 and 1.7 × 10 −7 , respectively.The Monte Carlo error variance of the covariance between sire effects was 5.7 × 10 −9 .These small Monte Carlo errors indicated that posterior mean estimates were precise enough.The posterior distributions of the dispersion components affecting λ * and p * are given in Figures 1 and Figure 2, respectively, and the posterior means and standard deviation (SD) of the residual, herd and sire variances affecting λ * were 0.76 (0.04), 0.36 (0.02) and 0.07 (0.01), respectively, with the most important source of variation being that due to residual effects, followed by herds and then by sires.In Figure 1, the posterior distribution of the residual and herd variances were nearly symmetric, while the posterior distribution of the sire variance was slightly asymmetric with a longer tail to the right.The posterior means (SD) of the residual, herd and sire variances affecting p * were 0.4 (0.11), 0.45 (0.13) and 0.27 (0.11), respectively, with the largest source of variation being herd effects, followed by residuals and then sires.As shown in Figure 2, the posterior distributions of the residual, herd and sire variances affecting p * were all skewed, with long tails to the right.These results suggest that it is more difficult to infer components of variance precisely for p * than for λ * .Additionally, the posterior distribution of the covariance (correlation) between sire effects on λ * and p * is shown in Figure 1, with the mean (SD) of the sire covariance being 0.01 (0.02).The 90% credible interval is given by (−0.02; 0.05), suggesting that genetic effects affecting λ * and p * are uncorrelated.There was large uncertainty about the sire correlation (which is equal to the genetic correlation, under additive inheritance at the λ * and p * levels).The posterior distribution assigned high density to values of the correlation varying from −0.4 to 0.4.As shown in Figure 2, the distribution (over observations) of the average posterior mean of the perfect state probabilities p was skewed, and its mean (SD) was 0.07 (0.11); the 5% and 95% percentiles yielded the 90% interval (0.02; 0.22).The average of the posterior means of elements of p * lead to the inference that, on average, about 7% of firstlactation cows would not get mastitis, either due to being totally resistant to the disease or for never being exposed to mastitis.For model assessment, the posterior predictive density of D k = T k (y|.) − T k (y rep |.) displayed in Figure 3 (left panel) indicates an agreement between the observed (T k (y|.)) and replicate data (T k (y rep |.)), in special for number of mastitis cases greater than 1.The plot of posterior means of residuals displayed in Figure 3 (right panel) suggested that the model fitted the zero counts reasonably; but that it is less successful for fitting number of mastitis cases larger than 0 because the mean residual values were far from zero.In the analysis of residuals, standardized residuals were obtained by replacing λ * and p * by point estimates; however, in the complex models considered with a huge number of random effects such residuals are far from being standard normal.In particular, the lack of model adequacy may have been aggravated by a poor estimation of the parameters in the case of small number of observations for number of mastitis cases greater than 0: 15.8, 5.1, and 1.6% cows had 1, 2, and 3 cases, respectively; only 315 cows had more than 3 episodes of clinical mastitis during first lactation.
For comparative purposes, two other models were fitted to the data: (1) a Poisson model having the same exploratory structure for λ * and (2) a ZIP model having the same exploratory structure but uncorrelated λ * and p * .The predictive log-likelihood values for the Poisson, ZIP with correlated λ * and p * and ZIP with uncorrelated λ * and p * models were −27059.8,−27019.9 and −27017.9,respectively, favoring the ZIP with uncorrelated λ * and p * model in a log-scale basis.The pseudo-Bayes factor was 40 between the ZIP model with correlated λ * and p * and the Poisson model, and 41.9 between the ZIP model with uncorrelated λ * and p * and the Poisson model, both favoring the ZIP model.The pseudo-Bayes factor between ZIP models with uncorrelated and correlated λ * and p * , respectively, was 2, favoring weakly the ZIP with uncorrelated parameters.Estimates of the variance components affecting p * and λ * are summarized in Table 1.The posterior mean (SD) of the residual variance affecting λ * was larger in the Poisson (0.9 (0.03)) than in the ZIP models (posterior means (SD) were 0.72 (0.04) and 0.76 (0.04) in the ZIP model with uncorrelated and correlated λ * and p * , respectively), suggesting that the overdispersion due to zeros was absorbed by the residual term in the Poisson model.The posterior mean (SD) of the herd variance affecting λ * was similar in all models: 0.35(0.02) in the Poisson model and 0.36(0.02) in the two ZIP models.The ZIP models captured more genetic variation affecting λ * , since the variation between sires was larger (posterior means (SD) were 0.09 (0.01) and 0.07 (0.01) in the ZIP model with uncorrelated and correlated λ * and p * , respectively) than in the Poisson model (posterior mean (SD) was 0.05 (0.01)).The mean (SD) of the posterior distribution of p * in the ZIP model with uncorrelated and correlated parameters were 0.1 (0.11) and 0.07 (0.11), respectively.Estimates were similar in the two ZIP models, except for σ ap where the mean was 37% larger in the ZIP model with uncorrelated λ * and p * .Since the credible interval for the covariance between sire effects included zero, the principle of parsimony favors the ZIP model with uncorrelated λ * and p * .

Discussion
Count data models have been developed for animal breeding applications, which pose either a Poisson mixed effects model (Foulley et al., 1987) or accommodate "extra-Poisson" residual variation explicitly (Tempelman and Gianola, 1996).However, part of this extra variation may be due to extra zeros relative to a Poisson sampling.In this case, a ZIP model may provide a better fit to the data (Lambert, 1992).From an animal breeding perspective, quantities of main interest are the genetic values of candidates for selection and associated variance components.Here, the ZIP model was extended to accommodate genetic effects by introducing correlated random effects in the structure of the log-Poisson parameter (λ * ) and of the logit of the mixture probability (p * ); the correlated random effects accounted for correlation within and between half-sibs families.The model structure is analogous to that of a multiple-trait linear model described, for example, in Sorensen and Gianola (2002).Moreover, a correlation between these two genetic effects would account for pleiotropic genes affecting the Poisson and the mixture probability, as in models in which a correlation between direct and maternal effects is fitted (Willham, 1963).Additionally, to account for overdispersion over and above that caused by extra zeros, residual terms were included in the structure of λ * and p * .The hierarchical structures posed for λ * and p * would permit to discriminate between individuals being resistance to a certain disease and those that are mildly liable.
In an application of this model to number of mastitis cases in first-lactation Norwegian Red cows, it seemed that a Poisson regression model absorbs overdispersion due to zeros in the residual term reasonable well.If this is so, the Poisson mixed model would produce poor estimates of the variance components.In the ZIP model, the components of variance affecting p * were inferred less precisely than those affecting λ * .The small number of observations in each combination of levels of random effects may have been the cause that large residuals were produced in the residual analysis (in special, for counts greater than 0).However, in a predictive analysis random effects were integrate out and the model seems to fit the data.
Although the scale reduction factor value proposed by Gelman and Rubin (1992) as a convergence criterion was satisfied, trace plots (not shown here) suggested that, in the case of variance components affecting p * , additional iterations are needed for convergence.However, this can be computationally very intensive, and mixing might be improved by switching sampling order of the unknowns at each iteration of the Gibbs-Metropolis algorithm as this would reduce the serial correlation between successively sampled quantities.We found a high posterior correlation between the genetic variance affecting λ * and genetic covariance (0.93); between the genetic variance affecting p * and genetic covariance (0.86), and between the genetic variances affecting λ * and p * (0.63).Another form of imposing mixing would be via a blockwise Gibbs-Metropolis sampler, as proposed by Gelman et al. (2004).
The fully conditional distribution of λ * and p * are not recognizable, so a Metropolis-Hastings scheme was needed to sample the appropriate unknowns.We found that a normal proposal distribution had a better performance than a random-walk proposal.It would be of interest to examine the performance of the Metropolis-Hastings scheme under different proposal distributions.Besag, York and Mollie (1991), working in a different problem, suggested that although a proper flat prior leads to a proper joint posterior distribution, it may produce a singularity invalidating the Gibbs sampler.This problem was not detected here, but a zero-mean normal prior with a large variance may be a better option.

Figure 1 :
Figure 1: Posterior distributions of the residual, herd, and sire variance affecting λ * and of sire covariances and correlations between λ * and p * .

Figure 2 :
Figure 2: Posterior distribution of the residual, herd, and sire variance affecting p * and distribution of the posterior means of the probability of the perfect state (p) under the ZIP model with correlated λ * and p * parameters.

Figure 3 :
Figure 3: Posterior predictive density of D k = T k (y|.) − T k (y rep |.) and posterior means of the residuals under the ZIP model with correlated λ * and p * parameters.