Measuring the Attenuation in a Subject-speciﬁc Random Eﬀect with Paired Data

: This paper is motivated by an investigation into the growth of pigs, which studied among other things the eﬀect of short–term feed withdrawal on live weight. This treatment was thought to reduce the variability in the weights of the pigs. We represent this reduction as an attenuation in an animal–speciﬁc random eﬀect. Given data on each pig before and after treatment, we consider the problems of testing for a treatment eﬀect and measuring the strength of the eﬀect, if signiﬁcant. These problems are related to those of testing the homogeneity of correlated variances, and regression with errors in variables. We compare three diﬀerent estimates of the attenuation factor using data on the live weights of pigs, and by simulation.


Introduction
This paper is motivated by an investigation into factors influencing variability in live pig weights in the North Island, New Zealand. Part of this study included an investigation into the effect of short-term feed withdrawal on live weight, for a period of approximately twelve hours from the evening until the morning of the next day. It was conjectured that part of the variability in live weight between pigs was due to differences in gut fill, and that a short period of feed withdrawal would eliminate this factor, leading to a slightly less variable, and more useful, measure of pig live weight. (Gut fill is defined as the weight of contents of the stomach, small and large intestine, which represents 3% to 10% of live body weight. See Kyriazakis et al., 2006).
Testing for a reduction in variance with paired data can be regarded as testing the equality of the diagonal elements in a bivariate covariance matrix. This problem was solved for a bivariate normal distribution by Pitman (1939) and Morgan (1939). The Pitman-Morgan method involves testing the correlation between the sum and difference of the two responses, with zero correlation corresponding to equality of the two variances. Later authors (Bhoj, 1984;Harris, 1985;Shapiro and Cohen, 1990) have extended this work to include missing values, multiple repeated measures, or robustness to non-normality. A thorough review, and further references, are given by Piepho (1996).
To measure the size of the treatment effect, if any, we could consider the proportional reduction in variance. Pitman (1939) describes the calculation of fiducial limits for the ratio of the two variances, before and after treatment. However these variances will typically include a within-subject component, which if present will lead to a reduction in the apparent treatment effect, biasing any comparison of treatment effect between different groups with different error variances. Our proposed solution is to model the treatment effect directly as a proportional reduction in an individual-specific random effect. Section 2 presents the general modeling and estimation framework, with the application to pig live weight illustrated in Section 3. In Section 4 we investigate the performance of alternative estimators of the treatment effect using simulation. Finally, the results and methodology are discussed in Section 5.

Hierarchical Model
Denote the observed measurements on individual i before and after treatment as Y 1i and Y 2i respectively. We wish to model the reduction in variance due to treatment, allowing for correlation between Y 1i and Y 2i . To do this we write where θ i is assumed to be normally distributed with zero mean and variance σ 2 θ and e 1i , e 2i are independent and identically normally distributed with mean zero and variance σ 2 e . This decomposes the observed variation into a betweenindividuals component σ 2 θ and a within-individual component σ 2 e . The treatment effect k is thus represented as a proportional reduction in an individual-specific random effect, reducing the first component to k 2 σ 2 θ but not affecting the second component. This can be regarded as a definition of the treatment effect, in which the covariance matrix of (Y 1i , Y 2i ) is parameterized as so that the ratio of variances of the observed data is (k 2 σ 2 θ +σ 2 e )/(σ 2 θ +σ 2 e ). The use of this variance ratio as a measure of the treatment effect ignores the influence of the within-individual variation. If this variation is large it will tend to understate the actual effectiveness of the treatment in reducing between-individual variance.
We suggest using an estimate of k as the measure of treatment effect. We first note that a test for the significance of the treatment effect involves testing the null hypothesis k = 1, which is equivalent to testing σ 11 = σ 22 . For this the Pitman-Morgan test (Pitman,1939;Morgan, 1939) may be used. First we define the pairwise sum and difference noting that e 1i + e 2i and e 1i − e 2i are uncorrelated and therefore, by assumption of normality, independent. The covariance of (S i , D i ) is (2.6) so a test of k = 1, equivalently σ 11 = σ 22 , can be performed by testing the significance of the correlation between S and D, ρ SD = 0. The Pitman-Morgan test calculates the sample correlation coefficient r SD for n individuals and refers SD to a t-distribution with n − 2 degrees of freedom. An equivalent, and perhaps more convenient, way of carrying out this test is to regress D i on S i and to assess the significance of the slope. This can be regarded as an "errors in variables" regression, with e 1i + e 2i as the measurement error in the x variable µ 1 + µ 2 + (1 + k)θ i . In the absence of measurement error, the true slope of the regression would be (1 − k)/(1 + k), or zero if k = 1. It is well-known (see Fuller, 1987;Carroll, Rupert and Stefanski,1995) that the effect of measurement error in x is to bias the estimated slope towards zero. However, as Carroll et al. (1995) point out, the naive test that the slope is zero, using standard regression output, is valid provided there are no other covariates and the measurement error in x is uncorrelated with the error in y.
If the treatment effect is found to be significant, we can use an estimate of k to describe the strength of the effect. We now consider some alternative estimators, using the framework of measurement error models, and noting that in this case the measurement errors e 1i + e 2i and e 1i − e 2i in Equation 2.4 are uncorrelated with equal variances.

Naive regression estimator
A naive approach to the estimation of k would be to simply regress Y 2i on Y 1i . In the absence of measurement error e 1i the regression coefficient would be unbiased for k, but from Equation 2.3 we can see that the expected slope is σ 12 /σ 11 = kσ 2 θ /(σ 2 θ + σ 2 e ). Since 0 < σ 2 θ /(σ 2 θ + σ 2 e ) < 1 the regression slopek nr is biased towards zero. We could attempt to correct for this bias by multiplying by an estimate of (σ 2 θ + σ 2 e )/σ 2 θ , which is then equivalent to the method of moments estimator of the next section. Carroll et al. (1995, p33) point out that such bias-correction will also increase the variance and perhaps lead to a larger mean squared error. Since the variance ofk nr is not mathematically tractable, this issue is investigated using simulation in Section 4.

Method of moments
By equating the components of the covariance matrix in Equation 2.3 to their sample values s 11 , s 12 , s 22 we can solve for k and the variance parameters σ 2 e , σ 2 θ . Writing γ = (s 11 − s 22 )/s 12 we get and, assuming s 12 > 0, the variance components are then If we use these estimates of the variance components to bias-correct the naive estimatek nr as suggested above, we again get the method of moments estimator k mm . Although the sample moments are unbiased for the true values, the estimated correction factor will not be, so we expect this correction to reduce but not eliminate bias. We note too thatk mm is the slope of the first principal component of (Y 1i , Y 2i ) , also the orthogonal regression estimator (Carroll et al.,1995, p28).

Pitman-Morgan regression estimator
A third alternative is to use the regression of the difference D i on the sum S i , as given in Equation 2.4. Again this is an errors-in-variables regression. In the absence of measurement error in D i (e 1i − e 2i ≡ 0) the expected regression slope would be β = (1 − k)/(1 + k) but the covariance matrix of Equation 2.6 shows that so again the measurement error causes a bias towards zero. If we back-transform we get the estimatork (1 + k)σ 2 θ + σ 2 e so assuming k < 1 there is a positive bias ink pm which will be quite small if σ 2 e σ 2 θ or k is close to one. The next term in a higher-order approximation to E(k pm ) involves σ 2 β and will be much smaller than the previous term if the data set is reasonably large. We can expect then that the bias of this estimator will be small so that it may out-performk mm . This too is investigated by simulation in Section 4.

Bayes estimator
If we specify prior distributions for the parameters k, µ 1 , µ 2 , σ 2 θ and σ 2 e then the posterior distribution of k is easily obtained by Markov chain Monte Carlo estimation (Gilks, Richardson and Spiegelhalter, 1996) using, for example, the BUGS program (Spiegelhalter, Thomas, Best and Gilks, 1994). This has the advantage of providing a measure of the uncertainty of the estimate, for example the standard error of the estimated posterior distribution. For the other non-Bayesian estimators, analytical expressions for the standard errors are mathematically intractable, although approximate standard errors can be calculated using the bootstrap as illustrated in the next section. To investigate the effect of feed withdrawal on live weight, pigs in three different age groups ("weaners", "growers" and "finishers") were split randomly into control and treatment groups. They were weighed in the evening and again the following morning after a time lapse of 11 hours for the weaners and 17 hours for the other groups. Those in the control group were fed normally, but food was withheld from the treatment group. It is thought that "gut fill" is a component of the variability in live weight, and that feed withdrawal would reduce gut fill to give a less variable and more useful measure of live weight. The experiment was later repeated, with the same treatment and control groups, after a period of 20 days, giving four measures of live weight per pig. The full dataset is available at http://www-ist.massey.ac.nz/GJones.

Analysis of Pig Data
Here we analyze the results of the first experiment using, in the notation of Section 2, Y 1i = Live weight in the evening Y 2i = Live weight the following morning The data are plotted in Figure 1. There is a clear difference between control and treatment groups in the Growers, less so in the Finishers, and very little apparent difference in the Weaners. Table 1 gives the Pitman-Morgan p-value for each group for testing the null hypothesis of no difference in the variances of Y 1 and Y 2 . These are two-sided p-values allowing for the possibility that the between-pig variance has increased, which is possible since they are continuing to grow and, in the control group, to eat. For all three age groups there is a significant decrease in variance with treatment. The results for the variance decomposition (σ θ , σ e ) show that the within-pig variation is much smaller than the between-pig component. Both show a gradual increase with age. It is noticeable that the within-pig component is similar between control and treatment groups, whereas the betweenpig component is much larger in the control group for both Growers and Finishers. This does not seem to be a failing in the model, since the discrepancy is present in the initial standard deviations σ Y 1 . This perhaps calls into question the random allocation into treatment groups. The Bayesian analysis was performed by running the model (2.1), with vague conjugate prior distributions, in WinBugs for 20 000 iterations after a burn-in of 10 000. Table 1 gives the estimated posterior means for σ θ and σ e , which are seen to be broadly similar to those derived by the method of moments. The Bayesian "p-value" shown is really twice the posterior probability that k ≤ 0 (or k ≥ 0 as appropriate).
The estimated attenuation factors (k) are displayed in Table 2. Standard errors fork mm ,k nr andk pm were obtained by resampling from each group of pigs, whilstk b represents the standard deviation of the posterior distribution of k derived from the MCMC output. The downward bias in the naive regression estimatork nr is apparent. The Pitman-Morgan regression estimatork pm consistently gives results very similar to the method-of-momentsk mm , but is much easier to compute as it involves only a simple regression.
The results suggest that the effect of feed withdrawal is most pronounced in the middle age group, when the growth rate of the pigs is at its highest.  The analysis of the pig data suggests that for estimating the treatment effect k the Pitman-Morgan regression estimator may be competitive with the methodof-moments estimator. The analysis of Section 2 showed that the former is biased upwards, but this bias may be compensated for by a smaller variance. In this Section we investigate this issue by simulating from the model 2.1 for various values of k from 0.5 to 0.95, and for various values of the ratio σ e /σ θ from 0.05 to 0.5. In each simulation 10 000 simulated values ofβ were produced to estimate E(β) and Var(β), from which the root-mean-squared-error RMSE = √ [E(β − β)] 2 + Var(β) was calculated. Sample sizes of n = 30, 60 and 100 were considered.
Typical results, for n = 60, are shown in Figure 2. The bias term dominates the RMSE of the naive regression estimatork nr , which in all cases has the worst performance. There is negligible bias in the method-of-moments estimatork mm but as k increases its variance increases faster than that of the Pitman-Morgan regression estimatork pm , which has a small positive bias. The result is that for large values of k the use ofk pm gives the best result in terms of RMSE. When n = 60 this occurs at approximately k > 0.7, a little earlier (k > 0.65) when n = 30 and a little later (k > 0.75) when n = 100. The difference in RMSE between the two estimators is quite small when the measurement error (within-individual) variance σ 2 e is small, as in the pigs data, but increases as the measurement error increases.

Discussion
We have presented a model for paired data in which the change in variance between the first and second measurements is represented as a re-scaling of a subject-specific random effect. The purpose behind this is to measure the effect of a treatment on the variability of measurements in a way which allows for measurement error or other within-subject variation. The question arises as to whether the model is appropriate for a given data set. Since the model can be regarded as a re-parametrization of the bivariate normal distribution, testing its goodness of fit is equivalent to testing for bivariate normality. The parameter k measuring the attenuation in the subject-specific random effect can be understood as a definition of what we regard as the effect of the treatment. The withinindividual component σ 2 e could be thought of as measurement error, and in some situations it might be possible to estimate it from repeated measures on each subject. We could then test the appropriateness of the assumption that e 1i , e 2i have a common variance. However it has to be remembered that the overall variability in the measuring process is not always easy to reproduce as it may include more factors than just the repeatability of the measuring instrument.
We have investigated some alternative estimators of the size of the treatment effect. A Bayesian analysis is reasonably straightforward and would be particularly useful if prior information were available. If conjugate priors are used, the posterior distribution of the variance of (Y 1i , Y 2i ) is inverse gamma, and the model parameters are functions of its components which, though analytically intractable, could be derived by simulation. MCMC gives a simple alternative which converges quickly.
The naive regression estimator has been found to be significantly biased, but regression of the difference on the sum, which we have termed Pitman-Morgan regression, and then transforming the estimated slope, performs well if the treatment effect is not too large (k close to 1). This is the simplest alternative since it requires only standard regression software. Estimation based on moments is a little more complicated but also has good properties, particularly when the attenuation is large (k 1).