Edition and Imputation of Multiple Time Series Data Generated by Repetitive Surveys

This paper considers the statistical problems of editing and imputing data of multiple time series generated by repetitive surveys. The case under study is that of the Survey of Cattle Slaughter in Mexico’s Municipal Abattoirs. The proposed procedure consists of two phases: first, the data of each abattoir are edited to correct gross inconsistencies; second, the missing data are imputed by means of restricted forecasting. This method uses all the historical and current information available for the abattoir, as well as multiple time series models from which efficient estimates of the missing data are obtained. Some empirical examples are shown to illustrate the usefulness of the method in practice.


Introduction
The National Institute of Statistics and Geography (INEGI) carries out the Survey of Cattle Slaughter in Mexico's Municipal Abattoirs (ESGRM, for its name in Spanish). This repetitive survey captures monthly data for each abattoir in a questionnaire about the slaughter of animals for human consumption. Four species of animals are considered here: cattle, swine, sheep and goats. Even though INEGI puts a great deal of effort into collecting and publishing trustworthy data, the quality of some statistical figures published by this official statistical agency can still be greatly improved. Such is the case of the data generated by the ESGRM, since this survey presents two typical problems: (1) inconsistency of the collected data (the informant at the abattoir responded to the questionnaire, but the answers are not considered valid by some criteria used to verify the information), and (2) missing data (at least one of the variables lacks the value requested in the questionnaire). These problems create the need for statistical procedures for editing and imputing data. Such tasks should be done for each questionnaire (at the abattoir level) to avoid the accumulation of errors when aggregating data of two or more abattoirs. Therefore, it is desirable to use editing and imputing procedures with solid statistical foundations, which can also be computationally automated for massive and repetitive application (in all the municipal abattoirs, month after month).
In this work we propose a statistical methodology supported by multiple time series models. These models take into account historical information on the variables under study as well as their possible interrelations. The following three basic variables of the ESGRM were considered relevant: (1) number of heads (number of animals that are introduced alive to the abattoir), (2) weight on the hoof (weight of the live animal when entering the abattoir) and (3) weight of the beef carcasses (weight of the slaughtered animal after removing some of its parts, such as the skin, the head and the offal).
There is a large body of literature dealing with the problem of missing data in different types of surveys. We refer the reader to such authoritative works as Little and Rubin (1987) and Schafer (1997) for tools designed to perform statistical analysis of incomplete multivariate observations, and to Zhang (2003) for a review of multiple imputation methods in use nowadays. Here we just consider the issue of imputation without taking into account the subsequent analyses of the data, because INEGI is a national statistical agency in charge of collecting and publishing data for the general public and it does not necessarily analyze the data.
With regard to the missing data problem in a univariate time series setting, some influential works are those of Kohn and Ansley (1986) and Gómez et al. (1999), although some other works have appeared in the literature (e.g. Guerrero, 1994). All these works suggest building an Auto-Regressive Integrated Moving Average (ARIMA) model for the available data and then using all the data (observed before and after the missing ones) to get efficient estimates of the unobserved values. The problem in this context is also known as interpolation and it consists, in essence, of predicting the outcome of the unobserved variable by means of the expected value of the predictive distribution. A particular work that deals with edition and imputation, considered as tools for quality control of univariate time series data, is that of Caporello and Maravall (2002).
In the multiple time series case we found only a few proposals, even though the problem appeared in the literature as early as 1974. The solution proposed by Sargan and Drettakis (1974) was very thorough, but the authors accepted explicitly that their method was difficult to apply in practice. The proposal by Luceño (1997) is more general than the previous one, since it can be used with vector Auto-Regressive and Moving Average (ARMA) models, but it relies on Maximum Likelihood Estimation, so that it also becomes difficult for massive application. Pfefferman and Nathan (2002) suggested another method, based on a multivariate state-space representation of the time series, which also relies on Maximum Likelihood Estimation. The procedure suggested here differs from the already existing methods in that it is designed for a multiple time series that can be represented by a simple VAR (Vector Auto-Regressive) model, which can be efficiently estimated by Ordinary Least Squares, equation by equation, a fact that simplifies considerably its practical implementation. It should be noticed that all the aforementioned methods aim to produce minimum Mean Square Error (MSE) estimates of the missing values. This differs from the usual approach of imputing values by drawing simulated values from a predictive distribution. An exception is Pfefferman and Nathan (2002), who did obtain simulated values to represent uncertainty more appropriately. Thus, in our case we do not refer to the imputed values as simulated individual observations, but as estimated expected values. In the next section we describe the ESGRM and present some graphs that allow us to appreciate the typical dynamic behavior of the variables under study, along with some suggested transformations, which will serve mainly to edit the data. In the section after we present the VAR model and the restricted forecasting methodology that will be used to impute the missing data. Then, we present some aspects of the model building process and the estimation results, for the abattoir and animal species under consideration. Afterwards we present some results produced by the methodology, in the edition as well as in the imputation of data for the ESGRM. In that section we also show the results of some simulations carried out to verify the effectiveness of the method in practice. We conclude with some practical considerations.

Preliminary Data Analysis
The ESGRM covers the 31 States of Mexico where abattoirs are in operation. This study made use of data from the abattoirs of a State whose name is omitted for confidentiality reasons. Only the results of one abattoir will be used to illustrate the methodology, although the intention is to apply it to all the existing municipal abattoirs in the country (this number changes every year; there were 907 abattoirs in December 2004 and 890 in December 2007).
The sample period covers data from January 1998 to December 2003, since those were the historical data available in the most recent annual publication (see INEGI, 2004) at the beginning of the study. It should be stressed that we had data up to December 2003 and that we needed to estimate old data to get a complete record of multiple time series, in order to build a model and start the repetitive (monthly) application of our methodology. The joint dynamic behavior of the three variables for each abattoir of the State in consideration was represented by a VAR model. We searched for a generic model and found a specification that provides reasonably valid results, in statistical terms, for all the abattoirs of the State.

Definition of concepts
The following concepts are used in the ESGRM. A municipal abattoir is the basic unit of observation, defined as the building where the slaughter of animals for human consumption takes place. The animal species considered are: cattle (includes bulls, oxen, cows, heifers and yearling calves), swine (includes pigs), sheep (includes lambs), and goats.
Two constructed variables, average weight on the hoof (weight on the hoof divided by number of heads) and yield of meat (ratio of weight of the beef carcass to weight on the hoof), were used to validate the collected data. We observed that these variables tend to stay near a constant value for several consecutive months.
In Figure 1 we show an example of this phenomenon for an abattoir, called A for confidentiality of the data. The species slaughtered are cattle and swine.

The basic behavior observed in these graphs is repeated for all the animal species and abattoirs in consideration. This fact led us to think that the informants tend to report numbers close to the average values, or within certain bounds, of the constructed variables. The data are usually reported with evident errors and the INEGI personnel edit some of those errors when capturing the data every month, since this is a repetitive survey. Even with this first validation of the data, several inconsistencies remain unnoticed by the capture personnel. Besides, some new errors are introduced when capturing the data. Thus, in order to correct these errors in a systematic way, we propose to start the editing process by taking into account the permissible values of the constructed variables. The aggregated figures for all the abattoirs in each Mexican State that appear in INEGI (2004) are official and definitive; nevertheless, it was necessary to perform a preliminary analysis of the data pertaining to each and every abattoir under study to make sure that all the data were within the permissible limits. This was done for each and every variable. In fact, if a recorded observation of a variable fell outside those limits, it was replaced by a permissible value, while the observations of the remaining variables were not affected. This way we obtained a set of edited historical time series. Then, we applied some transformations that allowed us to see the natural variability of the data more clearly and proceeded to build multiple time series models, as described below. The transformation procedure must be applied month after month as new data arrive, in order to force them to be consistent with each other and with their historical records. Afterwards, we can use the imputation methodology to estimate missing data in the transformed scale by means of restricted forecasting. The restricted forecasts provide optimal estimates (in a statistical sense defined below) of the missing data. Finally, we retransform the estimates back to the original scale of the variables.

Transformation of variables
The transformations used are intended to complement the data edition in such a way that the transformed data satisfy the criteria that informants have been trying to apply routinely, although in an informal way. Therefore, transformation of the data is a fundamental part of the editing procedure. The variables are defined for month t as follows: NH_t denotes number of heads, WH_t weight on the hoof measured in kilograms, and WB_t weight of beef carcasses, also measured in kilograms. Since NH_t > 0 for t = 1, ..., N, the proposed transformation is TNH_t = ln(NH_t), and the variable in differences that will appear in the VAR model, DTNH_t = TNH_t − TNH_{t−1}, is (approximately) the relative growth rate of NH_t. The reason for using differences will be explained when considering the model building process.
For the average weight, it is well known that the restriction K1 ≤ WH_t/NH_t ≤ K2 must hold, where K1 and K2 are some known structural constants (e.g. for cattle, K1 = 250 and K2 = 550). Then we know that K1 − 0.1 < WH_t/NH_t and WH_t/NH_t < K2 + 0.1. The constant 0.1 was added to get strict inequalities without affecting the dynamics of the series involved, and its inclusion amounts to extending the original interval by 100 grams at each side.
The proposed transformation is now TWH_t = ln[(WH_t/NH_t − K1 + 0.1)/(K2 + 0.1 − WH_t/NH_t)], and the variable to use in the model turns out to be DTWH_t = TWH_t − TWH_{t−1}. It is also known that the yield of meat must satisfy the restriction K3 ≤ WB_t/WH_t ≤ K4 for all t, with K3 and K4 some known positive structural constants (for example, for cattle K3 = 0.5 and K4 = 0.55). We decided to use the ratio WB/NH, rather than WB/WH, because NH usually has complete data, in contrast with WH, which usually lacks some data values. To make this change, we know that K1·K3 ≤ WB_t/NH_t ≤ K2·K4. Therefore, as in the previous case, the transformation proposed is TWB_t = ln[(WB_t/NH_t − K1·K3 + 0.01)/(K2·K4 + 0.01 − WB_t/NH_t)]. The constant 0.01 was chosen as in the previous case and its use implies that the interval [K1·K3, K2·K4] is extended 10 grams at each side. Therefore, the variable to be used in the model is DTWB_t = TWB_t − TWB_{t−1}. The structural constants K1, K2, K3 and K4 used here for transforming the data are usually employed for descriptive purposes (see Ruiz et al., 2001). Figure 2 shows the original series (NH, WH and WB), the transformed series (TNH, TWH and TWB) and their first differences (DTNH, DTWH and DTWB) for the abattoir under consideration and the two species of animals. There we can see that the original series have trend and seasonality; we also see that the transformations not only extend the numerical scales to (−∞, ∞) but also help to stabilize the trend in the data. Differencing ensures stationarity of the time series and makes the data fluctuate around zero.
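The three transformations can be sketched in code as follows; this is a minimal illustration assuming the log and logit-type forms described above, and the function name and default constants (the cattle values quoted in the text) are ours.

```python
import math

def transform(nh, wh, wb, K1=250.0, K2=550.0, K3=0.5, K4=0.55):
    """Map (NH_t, WH_t, WB_t) to the unbounded transformed scale."""
    tnh = math.log(nh)                            # (0, inf) -> (-inf, inf)
    lo, hi = K1 - 0.1, K2 + 0.1                   # interval widened by 100 g
    r = wh / nh                                   # average weight on the hoof
    twh = math.log((r - lo) / (hi - r))
    lo2, hi2 = K1 * K3 - 0.01, K2 * K4 + 0.01     # interval widened by 10 g
    s = wb / nh                                   # carcass weight per head
    twb = math.log((s - lo2) / (hi2 - s))
    return tnh, twh, twb
```

For instance, an average weight of 400 kg sits at the midpoint of the widened cattle interval (249.9, 550.1), so its transformed value is zero.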

Statistical Methodology
The most important methodological aspect of this work is the use of restricted forecasting, supported by multiple time series models. Such models allow us to capture the dynamics of the time series, and here we will place special emphasis on their use for imputing missing data of the ESGRM.

VAR model
Let Z_t = (Z_1t, ..., Z_kt)' be a k-dimensional multiple time series observed for t = 1, ..., N. The VAR representation for this time series is

Π(B)Z_t = ΛD_t + a_t,    (3.1)

where Π(B) = I_k − Π_1B − ... − Π_pB^p is a matrix polynomial of order p < ∞ in the backshift operator B, such that BZ_t = Z_{t−1} for every Z and t. This model takes into account the deterministic vector of variables D_t = (D_1t, ..., D_kt)' that may include constant levels, seasonal dummies and intervention variables, whose effects on Z_t are captured by the parameter matrix Λ. {a_t} is a zero-mean Gaussian white noise process, so that the a_t's are independent and identically distributed as N(0_k, Σ_a), where 0_k is the zero vector and Σ_a is the contemporaneous error variance-covariance matrix. When the series {Z_t} is stationary, model (3.1) is well defined. Otherwise, for the model to be well defined, it is necessary to assume that the process started at a finite time point in the past, with fixed initial conditions. When the individual time series are all integrated of at least order one and they are not cointegrated, we can apply the difference operator ∇ = 1 − B to each series in order to get an appropriate VAR representation for the new time series {∇Z_t}. On the other hand, when a cointegration relationship exists among the variables (see Engle and Granger, 1987) we could work with the following VAR model in Error Correction (VEC) form

Π*(B)∇Z_t = −Π(1)Z_{t−1} + ΛD_t + a_t.    (3.2)

This model arises from the decomposition Π(B) = Π(1)B + Π*(B)∇. In this case, the matrix polynomial Π*(B) is of order p − 1, whereas the matrix Π(1) is defined in accordance with the cointegration relationships that exist among the variables.
Let us notice that Π*(B)∇Z_t captures the short-run relations in Z_t, while Π(1)Z_{t−1} represents the long-run relations, and all the elements of equation (3.2) are stationary. The VEC model is related to economic theory, since it allows interpreting the results and making inferences about both short-run and long-run economic relations. However, it should be noticed that the information captured by the VAR and the VEC models is exactly the same, even though their representations are formally different. In the present case, we only want to get one-step-ahead forecasts from the model and we are not interested in the long-run relations that may exist among the variables. Hence, in what follows we will use the VAR form (so that no cointegration analysis is required) and the restricted forecasting methodology will only be presented for that model.
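Given coefficient matrices, the one-step-ahead forecast implied by a VAR(p) applied to the differenced series can be sketched as follows; the function name and the numbers in the test are illustrative, not the paper's estimates.

```python
import numpy as np

def var_one_step(dz_hist, Pi, det_term):
    """One-step-ahead VAR(p) forecast.

    dz_hist:  T x k array of differenced observations, most recent last;
    Pi:       list [Pi_1, ..., Pi_p] of k x k coefficient matrices;
    det_term: the deterministic part Lambda @ D_{N+1}.
    """
    fc = np.asarray(det_term, dtype=float).copy()
    for i, Pi_i in enumerate(Pi, start=1):
        fc += Pi_i @ dz_hist[-i]       # Pi_i * dZ_{N+1-i}
    return fc
```

The loop simply accumulates the lag contributions Pi_1 dZ_N + ... + Pi_p dZ_{N−p+1} on top of the deterministic term.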

Restricted forecasts
Here we will assume that the model and its parameters are known, so that we do not consider at this time such issues as specification, estimation and validation of the model. In practice, however, those issues have to be faced, as in the illustrative application shown below. The vector Z = (Z_1', ..., Z_N')' has the observations of Z_t = (TNH_t, TWH_t, TWB_t)' for t = 1, ..., N, while Z_{N+1} is the vector of future values to be forecast, with origin at time N. Since the stationary series is {∇Z_t}, we work with ∇Z_{N+1}. The minimum Mean Square Error (MSE) linear forecast of ∇Z_{N+1} is its conditional expectation given all the historical information, that is, E(∇Z_{N+1}|Z), so that

∇Z_{N+1} = E(∇Z_{N+1}|Z) + a_{N+1},    (3.3)

where the corresponding forecast error is a_{N+1} and its MSE matrix is Σ_a. Now, suppose that we also know the vector of observations Y = (Y_1, ..., Y_M)' that imposes M ≥ 0 linearly independent restrictions on the future values of the vector Z. Such restrictions come from a source external to the time series model and can be expressed as

Y = C∇Z_{N+1} + u,    (3.4)

where u = (u_1, ..., u_M)' is a random vector distributed as N(0_M, Σ_u). The M × k matrix C is known and has rank M ≤ k. We will show below some particular forms of C for the imputing problem faced by the ESGRM. In those cases, the data Y_1, ..., Y_M are the observed values of the variables at time N + 1. Expression (3.4) can be deemed a set of stochastic linear restrictions to be imposed on ∇Z_{N+1}. We assume E(u|Z) = 0, in such a way that the restrictions are unbiased and their uncertainty is linked to the variance Var(u|Z) = Σ_u, which in the present application is Σ_u = 0. If E(a_{N+1}u'|Z) = 0, we get the following optimal restricted forecast of ∇Z_{N+1} given Z and Y:

∇Ẑ_{N+1} = E(∇Z_{N+1}|Z) + A[Y − CE(∇Z_{N+1}|Z)],    (3.5)

with

A = Σ_aC'(CΣ_aC' + Σ_u)^{−1}.    (3.6)

This result is shown in the Appendix for the general case Σ_u ≠ 0 (a different proof, based on signal extraction techniques, can be found in Pankratz, 1989).
It should be stressed that the linear combination (3.5) is optimal in the sense of having minimum MSE within the class of linear predictors for ∇Z_{N+1}. Further, the restricted forecast has lower MSE than the unrestricted forecast, because (with Σ_u = 0)

MSE(∇Ẑ_{N+1}) = Σ_a − ACΣ_a,    (3.7)

where ACΣ_a = Σ_aC'(CΣ_aC')^{−1}CΣ_a is a positive semidefinite matrix.
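The restricted forecast and its MSE matrix, in the exact-restriction case Σ_u = 0, can be computed in a few lines; this is a sketch and the function name is ours.

```python
import numpy as np

def restricted_forecast(z_hat, Sigma_a, C, Y):
    """Combine the unrestricted forecast z_hat (k-vector) with M exact
    linear restrictions Y = C * dZ_{N+1} (C is M x k, Sigma_u = 0)."""
    A = Sigma_a @ C.T @ np.linalg.inv(C @ Sigma_a @ C.T)   # gain matrix
    z_tilde = z_hat + A @ (Y - C @ z_hat)                  # restricted forecast
    mse = Sigma_a - A @ C @ Sigma_a                        # its MSE matrix
    return z_tilde, mse
```

Note that restricting one component to its observed value reproduces that value exactly and zeroes the corresponding diagonal entry of the MSE matrix, while the remaining components are adjusted through the covariances in Σ_a.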

Forecasts in the original scale
To obtain the forecasts of NH_{N+1}, WH_{N+1} and WB_{N+1} from the forecasts of ∇Z_{N+1} = (DTNH_{N+1}, DTWH_{N+1}, DTWB_{N+1})', we first obtain the forecasts in transformed levels by means of Ẑ_{N+1} = Z_N + ∇Ẑ_{N+1}. Then we go back to the units of the original variables as follows. For NH we know that NH_{N+1} = exp(TNH_{N+1}). For WH we have WH_{N+1} = NH_{N+1}[(K1 − 0.1) + (K2 + 0.1)exp(TWH_{N+1})]/[1 + exp(TWH_{N+1})]. Similarly, for WB we get WB_{N+1} = NH_{N+1}[(K1·K3 − 0.01) + (K2·K4 + 0.01)exp(TWB_{N+1})]/[1 + exp(TWB_{N+1})]. When the value of WH_{N+1} is known, we should use it to get the forecast of WB_{N+1} that satisfies the true restriction K3 ≤ WB_{N+1}/WH_{N+1} ≤ K4; to do that, instead of the previous expression, we should use the analogous back-transformation based on WH_{N+1} and the limits K3 and K4. It is worth noticing that the log transformation is nonlinear, so that back-transforming the forecasts produced by the model in the transformed scale induces bias on the forecasts in the original scale. That happens because the forecast represents a median value in the original scale, whereas the analyst usually expects the forecast to represent a mean value. A correction for bias may be applied, as indicated in Guerrero (1993) or, as we prefer in this case, the uncorrected forecasts in the original scale should be interpreted as median values.
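The back-transformation can be sketched as follows, assuming the logit-type transforms given earlier; the inverse of x = ln[(r − lo)/(hi − r)] is r = (lo + hi·e^x)/(1 + e^x), and the constants are the cattle values quoted in the text.

```python
import math

K1, K2, K3, K4 = 250.0, 550.0, 0.5, 0.55   # cattle structural constants

def inv_interval(x, lo, hi):
    """Inverse of log((r - lo)/(hi - r)): maps the real line back to (lo, hi)."""
    e = math.exp(x)
    return (lo + hi * e) / (1.0 + e)

def back_transform(tnh, twh, twb):
    """Forecasts in transformed levels -> forecasts in original units."""
    nh = math.exp(tnh)
    wh = nh * inv_interval(twh, K1 - 0.1, K2 + 0.1)
    wb = nh * inv_interval(twb, K1 * K3 - 0.01, K2 * K4 + 0.01)
    return nh, wh, wb
```

A round trip through the forward transformation and this inverse recovers the original values up to floating-point error.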

Compatibility test
Since the optimal forecast implies combining information from two different sources, we should be aware of the possibility of combining contradictory information. Thus, we propose to judge the validity of this combination empirically. We say that the extra-model information Y is compatible with the information provided by the model, CE(∇Z_{N+1}|Z), if the distance between those vectors is close to zero. We define the random vector of differences d = Y − CE(∇Z_{N+1}|Z), which is distributed as N(0_M, CΣ_aC'). Therefore, the distance that takes into account the variability of d yields the statistic K_calc = d'(CΣ_aC')^{−1}d, and Y is declared compatible with CE(∇Z_{N+1}|Z), at the 100α% significance level, if

K_calc ≤ χ²_M(α),    (3.8)

with χ²_M(α) the upper α percentage point of the Chi-square distribution with M degrees of freedom. Equivalently, Y is incompatible with CE(∇Z_{N+1}|Z), at the 100α% significance level, if (3.8) does not hold. This decision rule is based on the assumption that all the model parameters are known. Then, even when the parameters are consistently estimated, the rule is only asymptotically valid. In case of incompatibility we may conclude that the observed data at time N + 1 are atypical.
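The compatibility check reduces to a quadratic form and a table lookup; the sketch below hardcodes the upper 1% Chi-square points for M = 1, 2, 3 (standard table values), and the function name is ours.

```python
import numpy as np

# Upper 1% points of the Chi-square distribution, M = 1, 2, 3 (rounded).
CHI2_99 = {1: 6.63, 2: 9.21, 3: 11.34}

def compatibility(Y, C, z_hat, Sigma_a):
    """K_calc = d'(C Sigma_a C')^{-1} d with d = Y - C z_hat; returns the
    statistic and True when it does not exceed the 1% critical value."""
    d = Y - C @ z_hat
    S = C @ Sigma_a @ C.T
    k_calc = float(d @ np.linalg.solve(S, d))
    return k_calc, k_calc <= CHI2_99[C.shape[0]]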

Restricted forecasts for the ESGRM
In the case of the ESGRM, the extra-model information is the observed data at time t = N + 1, and the unrestricted VAR forecasts, together with the compatibility statistic, are obtained by specializing expressions (3.5)-(3.8) to each pattern of observed and missing data. We should notice in these expressions that the restricted forecasts for data actually observed yield exactly those observations, so that the restrictions imposed are satisfied exactly. Furthermore, the estimated MSEs of the restricted forecasts are smaller than those of the unrestricted forecasts.

Building the VAR Model
The VAR model was built from adjusted data (edited as indicated previously, with no allowance for outliers) following standard procedures. That is, we first decided the degree of differencing that renders each individual series stationary.
To that end we applied the variate difference method (see Anderson, 1976) to the transformed series. That is, among the series obtained with successive degrees of differencing, the one with minimum standard deviation indicates the degree to be used. Of course, we could have used a unit root test, but that would imply a case-by-case analysis for each species, which would have made the proposed method difficult to apply massively (for all the abattoirs in Mexico).

Degree of differencing
The results of applying the variate difference method are shown in Table 1. There we see that each transformed series requires at most one difference to become stationary. This pattern was observed for all the species at every abattoir under study. Thus, by applying one difference to the series we will achieve stationarity, although we may run into overdifferencing; we did not consider that as serious a problem as underdifferencing, because the model is only going to be used to produce one-step-ahead forecasts. We refer the reader to Sánchez and Peña (2001) for an argument that favors overdifferencing over underdifferencing a time series, when using an autoregressive model to produce forecasts. Let us recall that we were looking for a generic transformation to stationarity that could be applied to all the series, in order for the procedure to be computationally automated for massive and repetitive application (so that it could be used as a canned package by the capture personnel).
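The variate difference rule can be sketched in a few lines: difference repeatedly and keep the degree whose series has the smallest sample standard deviation (the function name and the cap of two differences are ours).

```python
import numpy as np

def choose_differencing(series, max_d=2):
    """Return the degree of differencing (0..max_d) whose differenced
    series has the smallest sample standard deviation."""
    x = np.asarray(series, dtype=float)
    best_d, best_sd = 0, x.std(ddof=1)
    for d in range(1, max_d + 1):
        x = np.diff(x)
        sd = x.std(ddof=1)
        if sd < best_sd:
            best_d, best_sd = d, sd
    return best_d
```

A linear trend is flagged as needing one difference, while a white-noise series is left undifferenced, since differencing white noise inflates its standard deviation by a factor of about √2.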

Likelihood ratio tests
Once stationarity was achieved, we had to decide the VAR order for each species at every abattoir. Thus, we applied a likelihood ratio testing scheme (see Lütkepohl, 2005, pp. 128-144) that allowed us to test sequentially the hypotheses H_0: order p vs. H_A: order p − 1, starting with the value p = 5. Some examples of the test results are shown in Table 2. We found a generic specification with p = 3, since that order yielded the first significant value of χ² in most cases when we reduced the number of lags. In all cases we used the same model specification, where the vector of deterministic variables D_t contains seasonal dummies.

Model estimation
The resulting VAR(3) model consists of three equations, one for each of DTNH_t, DTWH_t and DTWB_t, in which the regressors are the first three lags of the three differenced series plus the seasonal dummies, for t = 6, ..., N. Besides, the seasonal dummies are centered (so that the sum of the D_{i,t} values within each year is zero, with i = 1, ..., 12). Ordinary Least Squares was applied to each equation separately, because that method produces efficient joint estimates of all the model parameters, as shown in Lütkepohl (2005, pp. 71-72). Therefore, parameter estimation was carried out easily (we employed the statistical package RATS, version 5, available at http://www.estima.com). Some typical estimation results appear in Tables 3 and 4. There we see that seasonality has significant effects in explaining each variable for every species. The adjusted coefficient of determination reaches values around 75% for DTNH, but it is noticeably smaller for DTWH and DTWB (around 35%). Each variable is explained by itself, among others, as shown by the F statistics used to test the null hypothesis of no significant effects on the variable to be explained.
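One way to build centered seasonal dummies is sketched below; the paper only states the centering property (each dummy sums to zero within a year), so the exact form D_{i,t} = 1(month(t) = i) − 1/12 is our assumption. Equation-by-equation OLS then amounts to regressing each dependent variable on the lags plus these dummies, e.g. with np.linalg.lstsq.

```python
import numpy as np

def centered_seasonal_dummies(n, start_month=1):
    """n x 12 matrix with D[t, i] = 1 - 1/12 when month(t) = i + 1 and
    -1/12 otherwise, so every column sums to zero over a full year."""
    D = np.full((n, 12), -1.0 / 12.0)
    for t in range(n):
        D[t, (start_month - 1 + t) % 12] += 1.0
    return D
```

This centering removes the confounding between the dummies and a constant level, which is convenient when the differenced series already fluctuate around zero.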

Model validation
Empirical adequacy of the model was checked first by means of visual inspection of residual plots such as those shown in Figure 3. There we see that no outliers are present and no serious violation of the homoscedasticity assumption is evident. It should be mentioned that the extreme observations for DTWH (in 1998:07 and 2002:01), as well as those for DTWB (in 1998:07, 1999:02 and 2002:01), were adjusted during the edition of the historical record. We also calculated the Ljung-Box statistic to check for zero autocorrelation and the Jarque-Bera statistic for normality. The corresponding values for cattle appear in Table 5 and none of those calculated statistics shows evidence of inadequacy. Therefore, we concluded that the estimated model was reasonably supported by the data at hand.
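As a sketch of the autocorrelation diagnostic, the Ljung-Box Q statistic on a single residual series can be computed as follows (without the degrees-of-freedom correction for estimated parameters; the function name and lag choice are ours).

```python
import numpy as np

def ljung_box(resid, max_lag=12):
    """Ljung-Box Q = n(n+2) * sum_k r_k^2 / (n - k); compare against a
    Chi-square with max_lag degrees of freedom."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = float(np.dot(x, x))
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = float(np.dot(x[k:], x[:-k])) / denom   # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q
```

A strongly autocorrelated series yields a large Q, while for white-noise residuals Q stays near its expected value of max_lag.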

Application of the Method
This section shows some empirical results produced by the proposed methodology. These examples are presented only to illustrate the kind of results produced by the method in practice. The methodology is to be applied separately to the data from each abattoir on a monthly basis, once the corresponding VAR model has been built from the edited historical time series:
1. Locate any recorded values that are obviously incorrect, for example by checking that the average weight per animal before slaughtering lies in the range (K1, K2), where K1 and K2 are species-specific constants. Replace these with adjusted values that satisfy the corresponding edit rule.
2. Test that each month's vector of observations is compatible with the previous month's one-step-ahead forecast, by computing the compatibility statistic. If the test statistic exceeds the chosen percentage point of the appropriate Chi-square distribution, conclude that the data were recorded incorrectly and replace them with the model-based forecasts. Otherwise, keep the data as recorded.
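Step 1 above can be sketched with one simple replacement rule for an out-of-range average weight; the particular adjustment (pushing WH just inside the violated bound) is our illustration, and the paper's actual replacement rule may differ.

```python
def edit_average_weight(nh, wh, K1, K2):
    """If WH/NH falls outside (K1, K2), replace WH by the value that puts
    the average weight just inside the violated bound (0.1 kg margin)."""
    avg = wh / nh
    if avg <= K1:
        return nh * (K1 + 0.1)   # push just above the lower bound
    if avg >= K2:
        return nh * (K2 - 0.1)   # push just below the upper bound
    return wh
```

For cattle (K1 = 250, K2 = 550), a reported WH implying 100 kg per head would be raised to 250.1 kg per head, while a value already inside the range is left untouched.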

Edition of data
In Table 6 we show an example of an abattoir without missing data or gross inconsistencies of the observations, as compared with their one-step-ahead forecasts, during each of the months under consideration. The observed data are not significantly different from the unrestricted forecasts, since the largest calculated compatibility statistic was K_calc = 4.0, which is not significant at the 10% level when compared against a Chi-square distribution with 3 degrees of freedom. Thus, by applying the edition procedure in this case we changed the observations of WB, forcing them to satisfy the edit rule imposed by K2 and K4, without changing any observations of NH or WH. All the observations on WB were changed because they were too low (it is well known that some informants tend to lower the values for this variable, trying to hide the actual yield of meat). (Structural constants: K1 = 250, K2 = 550, K3 = 0.5, K4 = 0.55.)

Edition and imputation of missing or inconsistent data
In Table 7 we show two examples where both edition and imputation took place. In order to apply the imputation procedure we fixed 1% as the cutoff significance level for the compatibility statistic. The first example corresponds to a gross inconsistency in abattoir A1, found in the observed data of swine in August (compatibility statistic K_calc = 9.9). All the WB data were changed essentially by the edition rule, while the WH datum was estimated by restricted forecasting, since it was considered too high by the automatic procedure (as compared with its corresponding unrestricted forecast).

Simulation
A small simulation study was carried out to validate the usefulness of the method in approximating the true values of the missing or inconsistent data. The experiment was intended to reproduce the most frequent situations that occur in practice; these are shown in Table 8, where the notation employed is: O = observed datum, − = datum 25% lower than its actual value and + = datum 25% higher than its actual value. Even without a formal statistical analysis of the experimental results, in Table 9 we can clearly see that the estimated values produced by restricted forecasting, either to replace inconsistent data or to impute missing data, are very reasonable. The largest discrepancies are 36.8% for NH, 38.6% for WH and −52.8% for WB, indicating that WB is the most sensitive variable. Furthermore, the values of the compatibility statistics do not lead us to declare incompatibility between observed data (NH = 492, WH = 196800, WB = 88560) and unrestricted forecasts (NH = 463, WH = 166645, WB = 95247). When the data are set in accordance with the experimental design, incompatibility arises at the 5% significance level in most cases.

Final Considerations
The proposed procedure consists of two phases. Edition is carried out first, in order to produce valid data that can be used as input when estimating a VAR model for the multiple time series under study. Imputation is then applied to estimate missing or inconsistent data. Edition makes use of some known structural constants that define the permissible range for two constructed variables. These constants are employed to transform the data in such a way as to get valid data that can be used when building models for the imputation phase. A VAR model is used to get restricted forecasts, which provide statistically efficient estimates of the missing data. This model has to be re-estimated each month as new data arrive.
An added benefit of using explicit statistical models for editing and imputing data, rather than numerical algorithms, is that we can measure the uncertainty of the imputed values and also make statistical inferences, such as that provided by the compatibility statistic. Nevertheless, we should be aware that this procedure is applied at the abattoir level, for data of each animal species, so that a measure of uncertainty for the aggregated data at the State level remains to be developed (such a measure could be useful for performing statistical analysis of the ESGRM data). Finally, in order to generalize the use of the VAR model employed here to other Mexican States, we need to search for generic models that provide statistically valid results for the abattoirs in those States.

Figure 1: Average weight on the hoof and yield of meat (Cattle)

Figure 2: Original and transformed data (Cattle)

The pattern followed by the missing data gives rise to eight different cases: zero observations missing; one observation missing (NH, WH or WB); two observations missing (NH and WH, NH and WB, or WH and WB); or the three observations missing. These cases can be expressed in terms of the arrays C, Y and ∇Ẑ_{N+1} = (DTNH_{N+1}, DTWH_{N+1}, DTWB_{N+1})' that appear in expressions (3.5)-(3.7) and (3.8). For instance, when no data are missing at t = N + 1, we have C = I_3 and Y = ∇Z_{N+1}, so that the restricted forecasts are given by ∇Ẑ_{N+1} = Y, with MSE(∇Ẑ_{N+1}) = 0. Besides, the statistic K_calc = [Y − E(∇Z_{N+1}|Z)]'Σ_a^{−1}[Y − E(∇Z_{N+1}|Z)] allows us to validate the joint compatibility of the three newly arrived data with their unrestricted forecasts, by comparing its value against a Chi-square distribution with 3 degrees of freedom. Another example of how the arrays are specified to obtain the restricted forecasts is when DTWH_{N+1} and DTWB_{N+1} are missing; then C = (1, 0, 0), Y = DTNH_{N+1}, and the restricted forecasts and their MSE follow from (3.5)-(3.7).
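The eight cases can be generated mechanically: the rows of C are the rows of the 3 × 3 identity matrix that correspond to the components actually observed at t = N + 1. A sketch (the function name and mask convention are ours):

```python
import numpy as np

def restriction_arrays(observed_mask, observed_values):
    """Build C (M x 3) and Y (M-vector) from a boolean mask saying which
    of (DTNH, DTWH, DTWB) was observed at t = N + 1."""
    I3 = np.eye(3)
    rows = [I3[i] for i in range(3) if observed_mask[i]]
    C = np.vstack(rows) if rows else np.zeros((0, 3))
    Y = np.asarray(observed_values, dtype=float)
    return C, Y
```

With all three components observed this returns C = I_3, and with only the first observed it returns C = (1, 0, 0), matching the two cases discussed above.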

Figure 3: Standardized residuals of the VAR model (Cattle)

Appendix. Proof of the restricted forecasting formulas

Equations (3.3) and (3.4) can be written as the stacked system [E(∇Z_{N+1}|Z)', Y']' = [I_k, C']'∇Z_{N+1} + [−a_{N+1}', u']', and the restricted forecast (3.5), together with its MSE matrix, follows by Generalized Least Squares estimation of ∇Z_{N+1} in this system.

Note on the structural constants: since there is a great variety of animal species, classified by age and race, we decided to look for the most commonly used values in each Mexican State. We found some values on the following websites: Faculty of Veterinary Medicine at the National Autonomous University of Mexico (Veterinary, 2005, www.veterin.unam.mx), Secretariat of Agriculture (SAGARPA, 2005, www.sagarpa.gob.mx/ganaderito) and cattle dealer associations (Cattle dealers, 2005, www.mexicoganadero.com/limousin). The final decision of which values to use was made by asking for advice from the veterinary personnel who work at the municipal abattoirs in the State under consideration.

Table 1: Standard deviation for successive degrees of differencing

Table 2: Likelihood ratio test results

Table 3: Estimation results of the VAR(3) model: Cattle

Table 4: Estimation results of the VAR(3) model: Swine

Table 6: Application of the editing procedure in real time: Cattle

Table 7: Application of the imputation method in real time: Swine

Table 8: Experimental design for the simulation study