Multivariate Regression Modeling for Functional Data

We propose functional multivariate regression modeling, using Gaussian basis functions along with the technique of regularization. In order to evaluate the model estimated by the regularization method, we derive model selection criteria from information-theoretic and Bayesian viewpoints. Monte Carlo simulations are conducted to investigate the efficiency of the proposed model. We also apply our modeling strategy to the analysis of spectrometric data.


Introduction
Functional data analysis (FDA) has received considerable attention in different fields of application, including bioscience, system engineering, and meteorology, and a number of applications have been reported (see e.g., Ramsay andSilverman, 2002, 2005;Ferraty and Vieu, 2006).The basic idea behind functional data analysis is to express discrete data as a smooth function and then draw information from the collection of functional data.We focus on the problem of constructing functional regression models, where observed values can be interpreted as a discretized realization of a function evaluated at possibly differing time points for each subject.
Functional regression models have been studied extensively.For a functional predictor and a scalar response, Cardot et al. (2003) proposed a principal components regression model.Goutis (1998) considered the functional regression model with the use of derivatives of functional predictors, and Rossi et al. (2005) described neural network models for functional data.Moreover, James (2002) and Müller and Stadtmüller (2005) extended the model to the functional version of the generalized linear model.In many studies, functional data have mainly been expressed by Fourier basis or splines, and cross-validation has been used to evaluate the model.On the other hand, Araki et al. (2007) proposed the use of Gaussian basis functions and AIC (Akaike, 1973(Akaike, , 1974) ) type criteria, constructing functional regression models with a scalar response and a functional predictor.
In practice there often occur cases in which measurements are taken on more than one response and one predictor.For example, near-infrared absorbance spectra of meat samples observed discretely at different wavelengths are associated with fat content, which is related to other contents, such as moisture and protein.
In the present paper, we propose functional multivariate regression modeling that directly models the relationship between multiple scalar responses and functional predictors using Gaussian basis functions along with the technique of regularization.The model considers all of the responses together, rather than a sequence of separate regression modeling of each response individually.Advantages of the proposed modeling strategy are that the proposed strategy provides a flexible instrument for transforming discrete observations into a functional form and can be applied to analyze a set of surface fitting data.
A crucial issue is the choice of tuning parameters involved in the modeling procedure, because regularization plays an essential role in constructing functional multivariate regression models.Although in previous studies cross-validation is mainly used for model selection, it may often be computationally expensive and produce unstable estimates.We derive model selection criteria based on an information-theoretic and Bayesian approach for evaluating models estimated by the method of regularization in the context of functional multivariate regression modeling.To investigate the effectiveness of the proposed modeling strategy we conducted Monte Carlo simulation, in comparison with the results obtained by ordinary regression methods.We observe that the proposed method performs well, especially in terms of flexibility and stable estimation.In addition, we apply the proposed modeling strategy to spectrometric data, in comparison with the method used in the chemometrics field (Frank and Friedman, 1993).
The remainder of the present paper is organized as follows.In Section 2 we introduce a multivariate regression model with multiple scalar responses and functional predictors.In Section 3, we estimate the proposed model using the regularized maximum likelihood method.For the selection of regularization parameters involved in the estimation, we use some model selection criteria derived from Bayesian inference, described in Section 4. Finally, we analyze meat sample spectra data, verifying the effectiveness of the proposed model in Section 5. Concluding remarks and future research are described in Section 6.
Functional predictors x αm (t) are expressed as where w αm = (w αm1 , . . ., w αmpm ) are vectors estimated by the method described in Appendix A. Note that φ mh (t) may differ among m = 1, . . ., M .Furthermore, we assume that β mk (t) are represented by linear combinations of q m basis functions {ψ m1 (t), ..., ψ mqm (t)}, that is, where ψ m (t) = (ψ m1 (t), ..., ψ mqm (t)) and b * mk = (b * mk1 , . . ., b * mkqm ) are vectors of coefficient parameters.Here, we assume that φ m (t) and ψ m (t) are Gaussian basis functions given by respectively, where c φ mj and c ψ ml are centers, s φ mj and s ψ ml are dispersion parameters of each basis functions and ν φm and ν ψm are hyperparameters (Ando et al., 2008).Numbers of basis functions q m and values of the hyperparameters ν ψm involved in ψ(t) are determined using the model selection criteria described in Section 4. Substituting (2.2) and (2.3) into (2.1),we have where dt are p m ×q m cross-product matrices.When using Gaussian basis functions given in (2.4) as basis functions, the (j, l)-elements of J (m) φψ are calculated directly as follows: Again, (2.5) can be rewritten as where Y is a n × K response matrix, Z and B are n × ( ∑ M m=1 q m + 1) and ( ∑ M m=1 q m + 1) × K matrices defined as respectively, and E is an n×K error matrix.Therefore, the problem of estimating β 0k and β mk (t) in (2.1) is replaced by the problem of estimating the parameter matrix B. An advantage with respect to the use of Gaussian bases is that the parameter matrix B can be easily estimated, as will be shown in the next section, because the integral of a product of any two Gaussian basis functions involved in equation (2.7) can be directly calculated.

Estimation
The probability density functions of y α = (y α1 , . . ., y αK ) are given by where x α (t) = (x α1 (t), . . ., x αM (t)) .Thus, the log-likelihood function is The maximum likelihood estimators (MLEs) of the parameter matrix B and the variance covariance matrix Σ are, respectively, given by Note that the maximum likelihood estimates often cause unstable or degenerate estimates.To overcome these problems, we estimate them by the regularized maximum likelihood method that maximizes the regularized log-likelihood function given by subtracting a penalty related to a fluctuation of B from the log-likelihood function, that is, where Λ is a ( Here λ mk (m = 1, . . ., M, k = 1, . . ., K) are regularization parameters that adjust the amount of the penalty.In addition, represents the Hadamard product, i.e., ( with Ω m (m = 1, . . ., M ) being q m × q m positive semi-definite matrices.A typical form for the Ω m is Ω m = D s D s where D s is an (q m + 1 − s) × (q m + 1) matrix that represents the difference operator, and we use this form in examples given in Section 5.
The regularized maximum likelihood estimators (RMLEs) of B and Σ are, respectively, given by vec where vec(•) is an operator that transforms the column-wise vectors of a matrix into a vector, i.e., vec(A) m(j−1)+i = (A) ij for an arbitrary matrix A consisting of m rows, and ⊗ represents the Kronecker product, that is, sub-matrices of semi-definite matrix including regularization parameters represented by where ∑ M m=1 q m + 1) matrices.From (3.2), the RMLEs of this model differ from those of the single-response model, if ε αk are correlated among variables.

Model Selection Criteria
(1) Generalized BIC Schwarz (1978) proposed the Bayesian information criterion (BIC), from the Bayesian viewpoint, based on the idea of maximizing the posterior probability of candidate models.However, the BIC only covers models estimated by the maximum likelihood method.Konishi et al. (2004) extended the BIC for evaluating models fitted by the regularized maximum likelihood method and introduced GBIC.We derive the GBIC type of criterion for evaluating the functional multivariate regression model fitted by the regularized maximum likelihood method.The result is given by where γ = η − rank(Ω 0 ), η = ∑ M m=1 q m + 1, r = K(K + 1)/2, |A| + represents the product of nonnegative eigenvalues of a square matrix A, θ is a (Kη + r)dimensional vector that consists of elements of B and Σ, and R Λ (θ) is given by We choose the regularization parameters that minimize GBIC F M and consider the corresponding model to be the best approximating model.The details of the derivation are given in Appendix B.
(2) Generalized Information Criterion Imoto and Konishi (2003) derived an information criterion evaluating the model estimated by the regularization method, using the result of Konishi and Kitagawa (1996).We derive an information criterion evaluating the functional multivariate regression model, which is given by (3) Modified AIC Akaike (1973Akaike ( , 1974) ) derived an information criterion called AIC for evaluating the model fitted by the maximum likelihood method.Hastie and Tibshirani (1990) used the trace of the hat matrix as an approximation of the number of parameters, because it is regarded as the complexity of the model estimated by the regularization method.
The estimated value Ŷ is expressed as where ) is a hat matrix.We used the trace of S Λ as the effective number of parameters, obtaining the following criterion.
(4) Generalized cross-validation Generalized cross-validation (Craven and Wahba, 1979) for the functional multivariate regression model is given in the form

Numerical Results and Practical Example
We conducted Monte Carlo simulations to examine the effectiveness of the proposed procedures.We also applied the proposed modeling strategy to the analysis of spectrometric data.
(1) Monte Carlo simulations We simulated n sets of a functional predictor and double responses {(x α (t), y α ); α = 1, . . ., n, t ∈ [0, 1]}, where y α = (y α1 , y α2 ) , , and functional predictor x αi is assumed to be obtained by here r xα = max i (x αi ) − min i (x αi ).We assumed that u α (t) and β k (t) are given by Examples of true functions for each case are shown in Figure 1.First, we transfer the discrete data {x αi ; α = 1, . . ., n, i = 1 . . ., N } to a functional data set {x α (t); α = 1, . . ., n}.The nonlinear function u α (t) expressed as a linear combination of Gaussian basis functions are estimated in two-stage procedure; position the centers and determine the dispersions first by k-means clustering algorithm, then estimate the weights using the regularization method.The tuning parameters contained in the model are selected by using the Bayesian model selection criterion GBIC, due to Konishi et al. (2004).In smoothing functional data, all individual data should be fitted by using the common basis functions.In other words, numbers of basis functions are fixed, even though the amount of smoothness imposed on a set of discrete data will differ from each other.The regularization method allows us to adjust individual differences by smoothing parameters.Next, we fitted the functional multivariate regression models to the obtained functional predictor and responses {(x α (t), y α ); α = 1, . . ., n, t ∈ [0, 1]}.
We used four criteria described in the previous section for evaluating the estimated models and examined their effectiveness.We fitted the functional univariate regression models separately to {(x α (t), y α1 ); α = 1, . . ., n, t ∈ [0, 1]} and {(x α (t), y α2 ); α = 1, . . ., n, t ∈ [0, 1]} by using GBIC to evaluate functional univariate regression models.Fourier series are also compared as basis functions.For simplicity, the basis functions of predictors φ m (t) and of coefficient functions ψ m (t) are restricted to be common in this simulation study.
In the tables, we use the notation AMSE k = ∑ α (g k (x α )− ŷαk ) 2 /n, mean(λ k ), and SD(λ k ) (k = 1, 2) to denote the average squared errors, the means, and the standard deviations of the regularization parameters, respectively, where ŷαk denote the fitted values of y αk .It may be seen from the tables that for almost all cases the GBIC F M gives the lowest AMSEs and the most stable models in four model selection criteria.Furthermore, in all cases, the functional multivariate regression model minimized AMSE k more than functional univariate regression models.Moreover, the AMSEs of the functional multivariate regression model based on Gaussian basis functions are lower than those based on Fourier series.(2) Tecator spectra benchmark We applied the proposed modeling strategy to the analysis of spectrometric data2 .Near-infrared absorbance spectra of a meat sample are observed at equal intervals with 100 channels from the wavelength range of from 850 nm to 1,050 nm (Figure 2(a)).The spectra are associated with contents of the meat sample, such as moisture, fat, and protein.In this example, we predict the contents of the meat sample using these spectra.Rossi et al. (2005) approached this problem using a B-spline approximation and modeling based on Neural-Networks to predict the fat content.Here, we predict all three contents simultaneously using the proposed model, treating the spectra as functional data.From n = 215 observed samples, 172 samples were used as a training set for model estimatinon ad the remaining 43 samples were used as a test set.In this experiment, we performed two preprocessings.First, the spectra were standardized, that is, for each α = 1, . . ., n we converted the observed data x αi into , where xα = ∑ i x αi /N and N = 100.The standardized data are shown in Figure 2(b).Second, functionalization based on Gaussian basis functions is performed for each spectrum α = 1, . . ., n to obtain a functional data set (Figure 2(c)).The number of basis functions and the value of the hyperparameter are selected as p = 30 and ν = 30 respectively, by using the GBIC.
In order to investigate the effectiveness of the proposed model, we prepared three different models and compared the results obtained using these models.The three models are as follows: (a) The functional multivariate regression model (proposed model) was fitted by the regularization method.The regularization parameters were selected by GBIC F M described in the previous section.Since there are 5 tuning parameters containing three regularization parameters, the number of basis functions and a hyperparameter, we employed the simulated annealing method to determine these values.
(b) The functional univariate regression model was fitted such that correlation among the contents was not taken into consideration.The regularization parameters are selected using GBIC (Konishi et al., 2004).
(c) The multivariate linear regression model was fitted by treating the spectra not as a functional but rather as conventional discrete data.That is, where X = (x (s) αi ) αi .A coefficient matrix B and a variance covariance matrix Σ are estimated by the regularization method.This modeling strategy corresponds to the ridge regression described by Frank and Friedman (1993).
The MSEs of each method are shown in Table 5.As a result of test set, the proposed model minimizes the MSE of fat content.In addition, rather than the multivariate regression model for the discrete data, the multivariate regression model for functional data minimized the MSE.
Figure 3 shows the coefficient functions of each content.This figure shows that a positive weight is placed on the moisture and protein contents, and a negative weight is placed on the fat content in 60 to 80 channels.This reveals that the spectra absorbance in this interval strongly contributes to the contents.

Conclusion
We constructed multivariate regression models with multiple scalar responses and functional predictors based on Gaussian basis functions, considering the relationship among responses.We estimated the model by the regularized maximum likelihood method.We also derived model selection criteria based on an information-theoretic and Bayesian approach to evaluate the model estimated by the regularization method.The proposed method is applied to a benchmark in spectroscopy, comparing the results of prediction with those of functional univariate regression models and ordinary multivariate linear regression models.Consequently, the proposed model minimized the prediction error of fat content the most.Based on the results of the proposed model, it is effective to take the relationship of three contents into consideration.Future research should consider the construction of functional regression modeling strategies in which the response and predictive variables are surfaces.functional form requires efficient smoothing techniques.We use Gaussian basis function expansion along with the technique of regularization.
The parameters c j and s 2 j are first determined by k-means clustering algorithms.The observational points {t α1 , . . ., t αNα } are divided into p clusters {C 1 , . . ., C p }, and then c j and s 2 j are determined by respectively, where N j = #{t αi ∈ C j }.Thus, the regression model (A.1) has a probability density function The parameters w α and σ 2 xα are then estimated using the regularization method, which maximizes the following regularized log-likelihood function: where ζ α are regularization parameters that adjust the smoothness of the estimated function, and Ω is a p × p positive semi-definite matrix.The regularized maximum likelihood estimators ŵα and σ2 xα are given by ŵα respectively, where Φ α = (φ(t α1 ), . . ., φ(t αNα )) and x α = (x α1 , . . ., x αNα ) .
The regularized maximum likelihood estimates based on the Gaussian basis functions depend on the regularization parameters ζ α , the number of basis functions p, and the hyperparameter ν in the Gaussian basis functions.For the choice of these parameters, Konishi and Kitagawa (2008) summarize the use of some model selection criteria.The result of these criteria gives appropriate values ζ α , p, and ν, and leads to appropriate estimates ûα (t).Therefore, we obtain functional data as follows: x α (t) = ûα (t) = ŵ α φ(t).

Figure 1 :
Figure 1: Thirty examples of true functions for Case a (left) and Case b (right)

Figure 3 :
Figure 3: Coefficient functions of each content.Right: Moisture, Center: Fat, and Left: Protein.

Table 1 :
Simulation results for model (a) when n = 50GBIC F M GIC F M mAIC F M GCV F M Univariate Fourier

Table 2 :
Simulation results for model (a) when n = 100GBIC F M GIC F M mAIC F M GCV F M Univariate Fourier

Table 3 :
Simulation results for model (b) when n = 50GBIC F M GIC F M mAIC F M GCV F M Univariate Fourier c = 0.05, AMSE 1 × 10 2

Table 4 :
Simulation results for model (b) when n = 100GBIC F M GIC F M mAIC F M GCV F M Univariate Fourier

Table 5 :
MSE on training set and test set