Statistical Functional Modeling of Quality Changes of Garlic under Different Storage Regimes

In this paper we analyze the weight loss behaviour of Mexican garlic under different storage conditions. Garlic is an important Mexican export product. Quality losses during storage are important to understand due to cost and sale opportunity implications. Weight losses profiles for each experimental conditions, represented as functions, are modeled by means of functional linear models and hypotheses tests are performed to compare treatments. Monte Carlo sampling version of permutation tests are used to obtain p-values. Using the functional approach clearly defined storage regimes that significantly decrease the speed of deterioration of the product relative to traditional Mexican agricultural practices.


Introduction
Garlic (Allium sativum L.) is one of the most important products from Mexican agriculture.Mexico has the second place in the Americas as garlic producer cultivating 9,850 hectares with 75,810 ton per year (Heredia, 1995;SAGAR, 1997); garlic in Mexico is important from an economic perspective because 26% from its total production goes to USA markets (Heredia, 2000).Its production cycle goes from February to August; during the period of low availability, harvested garlic is strored without any control.It is important to understand the effects of different storage conditions on its quality features.
The global objective of this research was to longitudinally study quality changes in garlic due to six different storage regimes (treatments), 0 • C , 20 • C, 30 • C, 5 • C, 0 • C / 70% of relative humidity, and no control, this last one being the traditional storage condition of garlic in Mexican agriculture.In this paper we concentrate in weight loss changes; for each storage regime a batch of garlic (with 360 bulbs) was screened during 190 days.From each batch, three 5-bulbs sets were used as three replicates, therefore 18, the total number of experimental units, were assigned completely at random to the different storage regimes.Weight loss of every experimental unit was repeatedly measured every 10 days (from 0 to 190 days).
Then the corresponding experimental design is a completely randomized one factor design with three replicates.The response is the weight loss profile along 190 days.
In the context of the described experiment, there exist multiple alternatives for the statistical analysis.In this paper we show an application of what is called a functional linear model to estimate contrasts of interest among storage conditions.We will make statistical inference by means of permutation tests (Good, 2000).In the next section basic ideas of functional data analysis and linear modelling are presented.In section 3 we comment on the alternatives to carry out hypothesis testing in the context of the functional linear model.Section 4 is devoted to the application and discussion of the results about the comparison of storage regimes.

Functional representation of discrete data
Functional Data Analysis (FDA) is a useful approach to study variation of responses such as the one described above.Let N be the number of experimental units, n the number of times that the weight loss was recorded and y ij the weight loss values for i = 1, . . ., N and j = 1, . . ., n.These values could be thought of as a discrete manifestation of a weight loss function y i (t) , such that t j , j = 1, . . ., n, the so-called knots.
Then the first step in FDA is to represent these values by a function y i (t) , t ∈ [0, 190(= T )].Assuming observational errors, the estimation process of y i (t) involves smoothing techniques.A popular criterion (Green and Silverman, 1994) to estimate y i (t) is minimizing where λ i ≥ 0 is a smoothing parameter which represent a trade-off between the goodness of fit y i (t) to the data, the first term in (2.1), and its smoothness measure by the integral of y i (t) , the second derivative of y i (t) used as a measure of its smoothness.
Given (2.1) an important point is where to search for y i ; a searching strategy having a sound theoretical background (see for instance Wahba, 1990) is to think of y i (t) as where {B m (t)} M 1 is a set of basis functions to represent y i , and {c im } the corresponding set of coefficients of y i .Under this representation, (2.1) can be written as where y i = (y i1 , . . ., y in ) T , c i = (c i1 , . . ., c i M ) T , B a matrix with columns B m (t j ) , m = 1, . . ., M, and R = {R mu }, The solution to this minimization problem corresponds to The estimated function resulting from the above describe process belongs to the so-called natural cubic smoothing splines (Green and Silverman, 1994).A set of basis functions that efficiently expands a natural cubic spline from the computational perspective are the so-called B-splines.The specification of a Bspline basis requires augmentation of knots t 1 < . . .< t n .Let t 0 and t n+1 denote boundary knots.The augmented knot sequence is defined as follows; The values of the additional knots are arbitrary, and it is customary to fix them all the same and equal to t 0 and t n+1 , respectively.Denote by B u,m (x) the B-spline function of order m for the sequence of knots τ, m ≤ M, recursively defined as: Thus with M = 4, B u,4 (t) , u = 1, . . ., n + 4 are the n + 4 cubic B-spline basis functions for the knot sequence t j s.So defined each B-spline function has a compact support; this implies computational efficiency; for further details see for instance Hastie, Tibshirani and Friedman (2001).
The smoothing parameter λ i is a key component in the estimation process because if λ i → 0, ŷi (t) will be just as rough as y ij , j = 1, . . ., n, while if λ i increases without limit, ŷi (t) will be forced to a linear function (ŷ i (t) = 0) .One way to choose λ i is by minimizing the cross -validation score where ŷ(−l) i (t j ; λ i ) represents the estimated function once y il has been omitted from the estimation process.

A functional linear model
Being y i (t) the functionally represented weight loss function for experimental unit i along storage time, the objective is to study changes in {y i (t)} N 1 due to the above mentioned six storage conditions.Ramsay and Silverman (1997) (see also Ramsay and Silverman, 2002), introduced the following linear model where β (t) represents a vector of parameter functions of interest, i (t) an experimental error function.In our case the purpose is to compare all storage treatments versus the no control condition; therefore model (2.2) is used in this application taking the following form where β 6 (t) = 0 and β j (t) is interpreted as the difference effect of treatment j and treatment 6 (no control), j = 1, . . ., 5; α (t) represents the expected functional response from treatment 6, and then it may be interpreted as a baseline against which the rest of the treatments are going to be compared.An estimation criterion of β (t) is to minimize in model (2.2) If Y (t) represents a N − vector with elements y i (t) , i = 1, . . ., N, the N profiles from N experimental units, and X an N × q design matrix of full rank (q = 6 in this application) with rows x i , i = 1, . . ., N, for actual computation of β (t) an unconstrained minimum of a sum of squares (2.4) should be obtained.Taking advantage of the basis representation of each y i (t) described in the previous section, where B represents a M -vector of basis functions and Y gives the coefficients of the observed vector Y of functions.Expand the estimated parameter vector β (t) in terms of the same basis, that is, for a q × M matrix B. Then B can be obtained from Evaluation of the resulting fit can be done by using the following functions: the functional versions of the sum of squared errors and the mean square error.

Hypotheses Testing in Functional Models
There are many alternatives to test hypothesis about β (t) in model (2.2) .One of these alternatives is to use analysis of variance of data from a repeated measures design, measuring the experimental unit over time where time is one of the factors in the treatment structure of the experiment.By measuring the experimental unit at several different times, the experimental unit is essentially split into parts (time intervals) and response is measured on each part, appearing a split plot structure along time.Time is not randomly assigned, of course, and then the usual analysis of variance may not be valid, because the errors corresponding to the respective experimental units may have a covariance matrix that does not conform to those for which the usual split plot analysis is valid.When usual assumptions do not hold there exist different approaches to adjust the analysis of variance; imposing a working assumption on the covariance matrix of errors (usually compound symmetry on the experimental unit errors or the Huynh -Feldt condition on errors within each experimental unit and for each experimental unit).See for further details Milliken and Johnson (1992).Nevertheless the emphasis in this approach is on means comparisons, therefore leaving aside the analysis of each complete experimental unit profile as a whole.
Another alternative is to carry out a two -step modelling process (also called a hierarchical modelling); in the first step it is proposed for the response a parsimonious parametric linear or nonlinear model on time and then, in the second step, modelling the estimated parameters as a function of the treatment structure of the experiment.There is a vast literature on the use of this modelling strategy, being representative Davidian and Giltinan (1995), Diggle, Liang and Zeger (1994), Verbeke and Molenberghs (2000).Most of the corresponding inferential procedures are based on an important number of assumptions and results are approximate.
Another form to contrast hypotheses of interest in the context of model (2.2) is to consider multivariate analysis based -methods, considering a grid of values along the domain of the functional observations; as it is explained by Faraway (1997), likelihood ratio statistics will become dominated by terms representing unimportant sources of variation as soon as the size grid becomes large, as it is the natural case with functional data.Faraway (1997) also proposed bootstrap testing methods from the computation of residual curves under the null hypothesis of interest.
An additional approach is testing hypotheses by a permutation based testing approach.Ramsay and Silverman (1997) proposed the usage of this approach in the context of functional linear modelling.Nichols and Holmes (2001) have reported applications of this kind of tests for functional neuroimaging.
Permutation tests were proposed in the early twentieth century, but now with powerful computers, are feasible (Good, 2000).In an experimental context in which treatments are compared, if treatment randomization was done, under a null hypothesis of no treatment differences, the observed data can be observed under any treatment labeling.Given a meaningful statistic T to test a null hypothesis, effectively permuting the treatment labels, and computing for each permutation T , gives us a way to calculate a p-value under a specific cdf F governing the actually observed data under H 0 , F being completely specified.If F is not specified under H 0 , the empirical cdf F is minimal sufficient for F. For instance in the case of the comparison of two means, H 0 : where µ 1 and µ 2 represent the means for the respective populations, H 0 does not specify F ; however if we believe that cdf´s F 1 and F 2 have the special forms for some unknown G, then H 0 implies a common cdf for the populations; under H 0 Ĝ as sufficient statistics, is comprised of the ordered set of the pooled sample, that is, the sufficient statistics S is the set of order statistics for the pooled sample .Therefore the p-value is calculated as Actual computation of (3.1) , when S = s, implies that the pooled sample must form a permutation of s.When H 0 is true all such permutations are equally likely and then p = # of permutations such that T ≥ t total # of permutations .
In a functional context, a nonparametric permutation test represents an important approach due to the difficulty to specify under H 0 a non -stationary stochastic process generating the observed functional data.
In the garlic experiment experimental units were assigned at random to the different regimes, and then a permutation test is applicable permuting labels of the functional observations.
The null hypotheses of interest are A permutation test to be done requires the following tasks (Good, 2000): 1. Analyze the problem.
2. Choose a test statistic.
3. Compute the test statistic from the original labeling of the observations.4. Rearrange (permute) the labels and recompute the test statistic for the rearranged labels.Repeat until you obtain the distribution of the test statistic for all possible permutations.
5. Accept or reject the hypothesis using this permutation distribution as a guide.
In step 2, following Ramsay and Silverman (1997), we proposed a permutation test using as the test statistic, avoiding the problem of multiplicity of testing (Nichols and Holmes, 2001), where and u j a q− vector with 1 in position j and zero elsewhere.We additionally used the statistic In step 4, among other approaches, we follow a Monte Carlo strategy to fix the number of permutations (500) in order to build the permutation distributions of above mentioned statistics.

Data Analysis
All computations were done using S-PLUS 6 for Windows and the subroutines developed by Ramsay and Silverman (1997); beginning with these subroutines, ad hoc modifications were done to set the specific functional model, building statistics and to get permutation tests results.
Using M equal to 20 cubic B-spline functions were use to represent functionally each weight loss profile, using cross validation to choose λ i , i = 1, . . ., 18.
In Figure 1 2. In Table 1 are shown the estimated p-values corresponding to the permutation distributions of statistics (3.3) and (3.4) to test hypotheses (3.2) .Comparing to the no control condition, both treatments under 0 • C induce a highly significant reduction in weight loss during the storage period, specially after 75 days.In comparison to the no control condition, 5 • C induces an important increment in weight loss after 150 days.Storage condition under 30 • C is different from no control condition inducing a greater loss weight until 150 days, and then inducing a gradual decrement in the weight loss.Finally with 20 • C the loss weight pattern is statistically equivalent to the no control storage condition.
In Figure 3 are shown the average derivatives of the weight loss functions showed in Figure 1.We can observe that the derivatives of weight loss functions at 5 • C and no control show the most important changes in speed especially after 100 days.The other treatments show relatively constant speeds along the observation period.
are shown the N = 18 weight loss profiles with n = 20 data each, of the 3 replicates from each storage regime.Descriptively, from this figure there are clear differences among treatment conditions.There are 3 groups: 0 • C and 0 • C / 70%RH with linear profiles and weight loss less than 10% until 190 days; 5 • C with the worst weight loss profile reaching around 30% of weight loss; and 30 • C, 20 • C and no -control with moderate weight losses at the of the period but with different trends to reach their 190-day state, linear trend for 30 • C, quadratic trends for 20 • C and no control.

Figure 1 :Figure 2 :
Figure 1: Weight loss profiles under different storage regimes

Figure 3 :Figure 4 :
Figure 3: Derivatives of weight loss functions Using the corresponding functional linear model (2.3) but with the velocities of weight loss as functional responses, in Figure 4 are shown the corresponding