Partial Least Squares Analysis in Electrical Brain Activity

Abstract: The partial least squares (PLS) method was designed to handle two problems that are common in data from most applied sciences, including neuroimaging: 1) collinearity among the explanatory variables (X) or among the dependent variables (Y); 2) a small number of observations with a large number of explanatory variables. The idea behind the method is to explain as much as possible of the covariance between the two blocks of X and Y variables with a small number of uncorrelated variables. Apart from its use in other applied sciences, PLS has been used on imaging data to identify task-dependent changes in activity, to identify changes in the relations between brain and behavior, and to examine the functional connectivity of one or more brain regions. The aim of this paper is to give some information about PLS and to apply it to electroencephalography (EEG) data in order to identify stimulation-dependent changes in EEG activity.


Introduction
In applied studies, researchers often encounter two problems that complicate statistical modeling: 1) many variables but few observations; 2) collinearity among the explanatory or dependent variables. To use traditional statistical methods such as multiple linear regression on such data, data reduction techniques must be employed first. The partial least squares (PLS) method is one of these techniques. The idea behind PLS is to explain as much as possible of the covariance between two blocks of explanatory (X) and dependent (Y) variables with a small number of uncorrelated variables known as "components" or "latent (intuitive or hidden) vectors". The pioneering work on PLS was largely done by Herman Wold, who gives the end of 1977 as the birth date of PLS. The method has received a great deal of attention in chemometrics and in other scientific areas including education, psychology, management sciences, economics, environmental science and medicine. PLS was introduced to the neuroimaging community by McIntosh et al. (1996). Applications of PLS to neuroimaging data include the studies of Gurrera et al. (2001), Düzel et al. (2003), Martínez-Montes et al. (2004), Lehmann et al. (2006) and O'Toole et al. (2007), among others. Basar et al. (2006) analyzed stimulation-dependent changes in EEG data in the alpha, delta, theta, gamma and beta frequency bands by means of univariate methods. The aim of this study is to show that PLS, as a multivariate method, is a very useful tool for analyzing stimulation-dependent changes in EEG data that suffer from the collinearity problem, as illustrated in Figure 4.1 in Section 4. The results obtained in Section 4 for EEG data in the delta frequency band support the results obtained by Basar et al. (2006).
The paper is organized as follows: after a brief description of PLS in the following section, we describe two validation methods used in PLS analysis in Section 3, and finally we give the results for the EEG data in Section 4.
The $i$-th latent vectors for the X and Y blocks are $t_i = \sum_{j=1}^{N} x_j w_{ji}$ and $u_i = \sum_{s=1}^{M} y_s c_{si}$ for $i = 1, 2, \ldots, k$, respectively. PLS creates these orthogonal latent vectors using different algorithms, among which the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm is the most widely used. As mentioned by Wold et al. (2001), this is the algorithm that led to the acronym PLS, through the way it estimates the weight vectors $w_i$ and $c_i$. In NIPALS, the PLS weights are iteratively estimated as
$$w_i = \frac{X^T u_i}{u_i^T u_i} \quad \text{and} \quad c_i = \frac{Y^T t_i}{t_i^T t_i}$$

Partial Least Squares
where $w_i$ and $c_i$ are the least squares estimates of the slopes of the simple regressions of $X$ on $u_i$ and of $Y$ on $t_i$ at the $i$-th iteration for $i = 1, 2, \ldots, k$, respectively. Note that $X$ and $Y$ are deflated at the end of each iteration as $X = X - t_i p_i^T$ and $Y = Y - t_i c_i^T$, where $p_i = X^T t_i / t_i^T t_i$, and the next iteration continues with these deflated matrices. The "partial" in PLS indicates that this is a partial regression, since $u_i$ and $t_i$ are considered fixed in the estimation (Wold et al., 2001). Apart from being the least squares estimates of the slopes, the way $w_i$ and $c_i$ are calculated suggests that each $w_{ji}$ for $j = 1, 2, \ldots, N$ is the weight of the variable $x_j$ on the $i$-th latent vector representing the Y block, and each $c_{si}$ for $s = 1, 2, \ldots, M$ is the weight of the variable $y_s$ on the $i$-th latent vector representing the X block. Using $u_i$ instead of $t_i$ in the calculation of $w_i$, and $t_i$ instead of $u_i$ in the calculation of $c_i$, improves the inner relation between $X$ and $Y$.
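The iteration and deflation steps described above can be sketched in Python with NumPy. This is only an illustrative sketch, not the Matlab routine used in the paper: the function name `nipals_pls`, the starting vector for $u_i$, the normalization of $w_i$ to unit length, the update $u_i = Y c_i / c_i^T c_i$, and the convergence tolerance are assumptions borrowed from common NIPALS descriptions, and $X$ and $Y$ are assumed to be column-centered already.

```python
import numpy as np

def nipals_pls(X, Y, k, tol=1e-10, max_iter=500):
    """Illustrative NIPALS sketch: returns W (N x k), C (M x k), T and U (n x k)."""
    X = np.array(X, dtype=float)  # np.array copies, so the caller's data is not deflated
    Y = np.array(Y, dtype=float)
    W, C, T, U = [], [], [], []
    for _ in range(k):
        # Assumed starting point: the Y column with the largest sum of squares.
        u = Y[:, [np.argmax(np.sum(Y ** 2, axis=0))]]
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)      # slope of the regression of X on u
            w /= np.linalg.norm(w)       # assumed unit-length constraint on w
            t = X @ w                    # latent vector for the X block
            c = Y.T @ t / (t.T @ t)      # slope of the regression of Y on t
            u_new = Y @ c / (c.T @ c)    # assumed update of the Y-block latent vector
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)          # loading used for deflating X
        X = X - t @ p.T                  # deflation of X
        Y = Y - t @ c.T                  # deflation of Y
        W.append(w); C.append(c); T.append(t); U.append(u)
    return tuple(np.hstack(m) for m in (W, C, T, U))
```

Because $X$ is deflated with $t_i p_i^T$ after each component, the extracted $t_i$ come out mutually orthogonal, which is a quick sanity check on any implementation of this scheme.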
In this paper, we used the singular value decomposition (SVD) of cov(X, Y), $X^T Y$, which is another method used in PLS to extract the latent vectors. This method decomposes the covariance matrix $X^T Y$ into three parts as follows:
$$X^T Y = W S C^T$$
where $C$ and $W$ are the $M \times k$ and $N \times k$ column-wise orthonormal matrices ($w_i^T w_i = 1$ and $c_i^T c_i = 1$ for $i = 1, 2, \ldots, k$, and $w_i^T w_j = 0$ for $i \neq j$) containing the right and left singular vectors, respectively. $S$ is the diagonal matrix of the $k$ nonzero singular values $S_1, S_2, \ldots, S_k$, which are equal to the square roots of the eigenvalues of $X^T Y Y^T X$ or $Y^T X X^T Y$. The eigenvectors of $X^T Y Y^T X$ and $Y^T X X^T Y$ give the left and right singular vector matrices $W$ and $C$; that is, the $W$ and $C$ matrices obtained by Eq. (2.2) or Eq. (2.4) contain the weights $w_i$ and $c_i$ mentioned before, but this time they are obtained all at once instead of one per iteration. Höskuldsson (1988) was the first to reformulate PLS as an eigenvalue/eigenvector problem (Lindgren and Rännar, 1998). In PLS, the aim is to find matrices $T$ and $U$ such that these two latent vector matrices have maximal covariance, $T^T U$, among all those in the X and Y spaces, subject to the constraint that each $w_i$ and $c_i$ has length one, that is:
$$\max_{w_i, c_i} \; t_i^T u_i = \max_{w_i, c_i} \; w_i^T X^T Y c_i \quad \text{subject to} \quad w_i^T w_i = c_i^T c_i = 1$$
According to Höskuldsson (1988), the $W$ and $C$ matrices estimated with NIPALS or SVD satisfy this aim. Höskuldsson (1988) also suggested that $T$ and $U$ given by Eq. (2.1) are the matrices of eigenvectors of $X X^T Y Y^T$ and $Y Y^T X X^T$. As mentioned by Lindgren and Rännar (1998), the advantage of the matrices $X^T Y Y^T X$, $Y^T X X^T Y$, $X X^T Y Y^T$ and $Y Y^T X X^T$ is their size. Since $X^T Y Y^T X$ and $Y^T X X^T Y$ are $N \times N$ and $M \times M$ matrices, their sizes do not depend on how many observations there are in the original $X$ and $Y$ matrices, while the sizes of the $n \times n$ matrices $X X^T Y Y^T$ and $Y Y^T X X^T$ do not depend on the number of variables in the original $X$ and $Y$ matrices. Hence, matrices with either a large number of objects or a large number of variables can be summarized into small matrices, which makes computation easier while retaining all the information necessary for developing the PLS model.
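The SVD route described above can be illustrated with a short NumPy sketch. The helper name `pls_svd` is hypothetical, and `numpy.linalg.svd` stands in for whatever routine the authors used; with `full_matrices=False` it returns min(N, M) singular values, some of which may be zero if the cross-product matrix is rank deficient.

```python
import numpy as np

def pls_svd(X, Y):
    """Extract PLS weights and latent vectors from the SVD of X^T Y."""
    # X^T Y = W S C^T: W holds the left and C the right singular vectors.
    W, s, Ct = np.linalg.svd(X.T @ Y, full_matrices=False)
    C = Ct.T
    T = X @ W   # latent vectors of the X block
    U = Y @ C   # latent vectors of the Y block
    return W, s, C, T, U
```

With these definitions $T^T U = W^T X^T Y C = S$, so each pair of latent vectors has covariance equal to the corresponding singular value, and the squared singular values coincide with the eigenvalues of $Y^T X X^T Y$.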
In applications to imaging data, PLS has been used to identify task-dependent changes in activity, to identify changes in the relations between brain and behavior, and to examine the functional connectivity of one or more brain regions. In this study, we use PLS to identify changes in EEG activity for $q$ different stimulations. In the application, $X$ is the matrix of orthogonal linear contrasts. Each stimulation is applied to each of $n$ subjects; hence, $Y$ is an $nq \times M$ data matrix. Since the means and standard deviations of the Y variables differ, we center and scale these variables so that they have zero mean and unit standard deviation. Centering means subtracting the column averages, and it corresponds to moving the coordinate system to the centre of the data. Scaling corresponds geometrically to changing the length of the coordinate axes. It is customary to standardize the data matrix $Y$ so that each column has variance one. We apply the SVD to the covariance between the orthogonal contrasts $X$ and the standardized $Y$. The contrasts are made for each subject, so $X$ has $nq$ rows and $q - 1$ columns, and the number of extracted latent vectors, $k$, is equal to $q - 1$. However, not all the latent vectors have to be significant, and not all the variables need have a significant effect on these latent vectors. Brief information about the two methods used to determine the significant latent vectors and the significant variables is given in the following section.
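As a rough illustration of this setup, the sketch below builds one possible set of orthogonal contrasts and standardizes the columns of $Y$. The Helmert-type contrasts, the task-by-task row ordering, and the function names are assumptions made for the example; the text does not specify which orthogonal contrasts were actually used.

```python
import numpy as np

def helmert_contrasts(q):
    """q x (q-1) matrix of mutually orthogonal contrasts with unit-length columns."""
    H = np.zeros((q, q - 1))
    for j in range(q - 1):
        H[j, j] = q - 1 - j      # task j is compared against all later tasks
        H[j + 1:, j] = -1.0
    return H / np.linalg.norm(H, axis=0)

def design_matrix(n, q):
    """(n*q) x (q-1) contrast matrix; rows ordered task by task, n subjects per task."""
    return np.repeat(helmert_contrasts(q), n, axis=0)

def standardize(Y):
    """Center each column and scale it to unit standard deviation."""
    return (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
```

For $q = 3$ stimulations and $n = 20$ subjects this gives a $60 \times 2$ matrix $X$, so the SVD of $X^T Y$ yields $q - 1 = 2$ singular values, in line with the number of latent vectors stated above.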

Assessment of Significance
The number of significant latent vectors and the variables with statistically significant weights on these latent vectors are determined using permutation tests and bootstrap estimation of the standard errors of the weights of the original variables, respectively. The permutation test assesses whether the effect represented in a given latent vector is sufficiently strong, in a statistical sense, to be different from random noise (McIntosh and Lobaugh, 2004). In this test, we assess the magnitude of the singular values by asking the question: "With any other random set of data, how often is the value of $S_i$, $i = 1, 2, \ldots, k$, as large as the one obtained originally?" To answer this question, subjects are randomly reassigned (without replacement) to different tasks (stimulations in our case). This is accomplished by permuting tasks within each subject and then permuting subjects across tasks. To permute conditions within each subject, a random vector containing the values between one and the number of tasks is created for each subject, and the original values are reordered according to these vectors. For instance, assume there are two tasks and three subjects, with a data matrix $Y$ containing two variables $y_1$ and $y_2$. To permute tasks within each subject, we create a random vector containing the integers between one and two for each subject and then reorder the original values for that subject according to this vector. If we create the random vectors [1 2], [2 1] and [2 1] for the first, second and third subjects, respectively, we obtain the reordered matrix $Y_1$. Then, by creating a random vector of the integers between one and three, we can reorder subjects across tasks. For instance, if we create the random vector [3 2 1], the new reordered matrix is $Y_2$: the first and third rows are exchanged for both tasks, that is, the values of the first subject are swapped with the values of the third subject. After the data matrix has been reordered, PLS is recalculated for the new sample, a new SVD is computed, and the new singular values are compared to those obtained from the original sample. This process is repeated for the chosen number of permutations, and the number of times the permuted singular values exceed the observed singular values is recorded. This number is then divided by the total number of permuted samples to get the p-value from the permutation distribution; if this probability is small (the usual criterion is a p-value < 0.01 for a two-tailed distribution), the latent vector is said to be significant and should be retained. This method enables us to determine the significance of a particular latent vector without relying on distributional assumptions. The only requirement is consistency with the null hypothesis of no significant difference between stimulations, which is what allows us to exchange the data between rows. To identify task-dependent changes, 500 permutations are generally sufficient, although probability estimates are typically stable at about 100 permutations.
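The permutation procedure can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: the rows of $Y$ are assumed to be ordered task by task with the same subject order inside each task, exceedances are counted with "greater than or equal", and the function names are hypothetical.

```python
import numpy as np

def singular_values(X, Y):
    """Singular values of the cross-product matrix X^T Y."""
    return np.linalg.svd(X.T @ Y, compute_uv=False)

def permutation_pvalues(X, Y, n, q, n_perm=500, seed=0):
    """Permutation p-values for the singular values (n subjects, q tasks)."""
    rng = np.random.default_rng(seed)
    s_obs = singular_values(X, Y)
    M = Y.shape[1]
    Y3 = Y.reshape(q, n, M)                  # axes: task, subject, variable
    exceed = np.zeros_like(s_obs)
    for _ in range(n_perm):
        Yp = Y3.copy()
        for j in range(n):                   # permute tasks within each subject
            Yp[:, j, :] = Y3[rng.permutation(q), j, :]
        Yp = Yp[:, rng.permutation(n), :]    # reorder subjects across tasks
        s_perm = singular_values(X, Yp.reshape(n * q, M))
        exceed += s_perm >= s_obs            # count permuted values at least as large
    return s_obs, exceed / n_perm
```

In a single-group design where every subject shares the same contrast rows, the subject reordering leaves $X^T Y$ unchanged; it is kept here only to mirror the two-step description in the text.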
To determine the significance of the nonzero weights of the variables on the corresponding latent vector, bootstrap tests are used to calculate the standard errors of these weights. The name bootstrap originates from the expression "pulling yourself up by your own bootstraps" and refers to the basic idea of the bootstrap, sampling with replacement from the data (Wehrens and Van Der Linden, 1997). As mentioned by Boos (2003), the logic behind this method is to create data as close as possible to the original data and to replace the unknown aspects of the statistical model with bootstrap sample estimates. In the bootstrap sampling procedure, the unknown probability distribution $F$ of the independent and identically distributed variables $y_1, y_2, \ldots, y_M$ is replaced by the empirical distribution $\hat{F}$, i.e. by the probability function $1/n$. Therefore, as mentioned by Efron and Tibshirani (1986), a bootstrap sample turns out to be the same as a random sample of size $n$ drawn with replacement from the original sample. More detailed information about bootstrap sampling can be found in Efron (1981), Efron and Tibshirani (1986), Wehrens and Van Der Linden (1997), and Boos (2003). When identifying task-dependent changes, bootstrap sampling consists of sampling with replacement from the original sample while keeping the assignment of conditions fixed for all observations. From these bootstrap samples, the standard errors of the variable weights $c_{si}$ for $s = 1, 2, \ldots, M$; $i = 1, 2, \ldots, k$, are estimated as follows:
$$SE(c_{si}) = \sqrt{\frac{1}{B - 1} \sum_{b=1}^{B} \left( c_{sib} - \bar{c}_{si} \right)^2}$$
where $\bar{c}_{si} = \sum_{b=1}^{B} c_{sib} / B$ and $B$ is the number of bootstrap samples. The estimates of the standard errors are usually stable after 100 resamplings. As mentioned by McIntosh and Lobaugh (2004), a weight whose value depends greatly on which observations are in the sample is less precise than one that remains stable regardless of the sample chosen. The absolute value of the ratio of the weight to its standard error can be used to determine whether the variable's weight depends on the sample chosen, i.e. whether the variable is significant or not. If this ratio exceeds 2.57, which has an approximate two-tailed probability of 0.01 under the standard normal distribution, the variable is said to be significant for the corresponding latent vector.
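A minimal sketch of the bootstrap step is given below, again in NumPy and under explicit assumptions: subjects are resampled with replacement while their task assignments are kept fixed, the weights are scaled by the singular values, the standard deviation uses the $B - 1$ divisor, and the sign of each bootstrap weight vector is aligned with the original one because singular vectors are only defined up to sign. The sign alignment and the function names are choices made for this example and are not taken from the paper.

```python
import numpy as np

def weights(X, Y):
    """Y-block weights scaled by the singular values (C S)."""
    _, s, Ct = np.linalg.svd(X.T @ Y, full_matrices=False)
    return Ct.T * s                      # multiplies column i of C by s_i

def bootstrap_ratios(X, Y, n, q, B=100, seed=0):
    """Bootstrap standard errors and |weight / SE| ratios (n subjects, q tasks)."""
    rng = np.random.default_rng(seed)
    M = Y.shape[1]
    Y3 = Y.reshape(q, n, M)              # rows of Y assumed ordered task by task
    C_obs = weights(X, Y)
    boot = np.empty((B,) + C_obs.shape)
    for b in range(B):
        idx = rng.integers(0, n, size=n)             # subjects drawn with replacement
        Cb = weights(X, Y3[:, idx, :].reshape(n * q, M))
        signs = np.sign(np.sum(Cb * C_obs, axis=0))  # assumed sign alignment per column
        signs[signs == 0] = 1.0
        boot[b] = Cb * signs
    se = boot.std(axis=0, ddof=1)        # standard error over bootstrap samples
    return C_obs, se, np.abs(C_obs / se)
```

Variables whose ratio exceeds 2.57 would then be flagged as significant for the corresponding latent vector, following the criterion stated in the text.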

Application
In this study, PLS was applied to EEG data taken from 20 healthy subjects to determine whether three different stimulations differ significantly and which part of the brain is most related to this differentiation. For this analysis, a PLS algorithm written in Matlab (Mathworks Inc.) was used. The stimulations comprised two complex stimulations and one simple stimulation. The complex stimulations were a picture of an anonymous elderly lady and a picture of a known face, while the simple stimulation was light. The luminance and the other physical attributes were standardized across the stimuli (approximately 30 cd/m²), which were presented for a duration of 1000 ms (milliseconds). The interstimulus interval varied between 3.5-6 ms, and the analysis was based on the post-stimulus 0 to 500 ms period. The subjects were mostly medical students who volunteered to participate in the study after filling in the consent form and after ethical approval had been received. The recordings were performed in an isolated room located in the Biophysics Dept. of Dokuz Eylul University. EEG records were taken from the $F_z$, $F_3$, $F_4$, $C_z$, $C_3$, $C_4$, $T_3$, $T_4$, $T_5$, $T_6$, $P_3$, $P_4$, $O_1$ and $O_2$ locations using the EEG-CAP. The electrophysiological and psychophysiological discussion of the results is not a concern of this preliminary analysis.
where $X_{23} = (0, 0, \ldots, 0)$ consists of 20 zeros.
Each squared singular value divided by the sum of the squared singular values indicates the proportion of the total sum of squares of $X^T Y$ that is accounted for. The first latent vector explains 88.8% of $X^T Y$ while the second explains 11.2%, which suggests that the second latent vector may not be significant. However, more evidence is needed to conclude that the second latent vector is not significant. Hence, 500 permutations were performed to assess the statistical significance of the latent vectors. The probabilities obtained from the permutation test are 0.0000 and 0.35130, which show that only the first latent vector, representing the contrast between the light stimulation and the two complex stimulations, is significant at the 0.01 significance level. The standard errors of the weights of the dependent variables were calculated from 100 bootstrap samples to determine the significance of variable $y_s$, $s = 1, 2, \ldots, 14$, on the first latent vector. These values are given in Table 1. The variables with $|c_{s1}^{orig} / SE(c_{s1})| > 2.57$ for $s = 1, 2, \ldots, 14$ are given in bold. The results reveal that $P_3$, $P_4$, $O_1$, $O_2$, $T_5$ and $T_6$ are the electrodes that differ most between the simple stimulation and the two complex stimulations in the delta frequency band. These results support the result obtained by Basar et al. (2006).
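The proportion attributed to each latent vector follows directly from the singular values. In the sketch below the function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the singular values obtained in this study.

```python
import numpy as np

def proportion_explained(s):
    """Share of the total sum of squares of X^T Y carried by each singular value."""
    s2 = np.asarray(s, dtype=float) ** 2
    return s2 / s2.sum()

# Hypothetical singular values 3 and 1: squares are 9 and 1, so the shares are 9/10 and 1/10.
shares = proportion_explained([3.0, 1.0])
```

The shares always sum to one, so with two latent vectors a large first share, as reported above, leaves little for the second.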
In conclusion, this preliminary analysis provides a basic approach that can be suggested for application to EEG data. The results and this application can be extended further to other attributes of electrophysiological data, such as the other frequency bands, spatial relationships, and correlation with possible behavioral data.
$$T = XW \quad \text{and} \quad U = YC \tag{2.1}$$
where $T = [t_1, t_2, \ldots, t_k]$ and $U = [u_1, u_2, \ldots, u_k]$ are the $n \times k$ matrices of the $k$ extracted latent vectors such that $t_i \perp t_j$ for $i \neq j$ and $u_i \perp u_j$ for $j > i$. $T$ summarizes the X variables for every object while $U$ summarizes the Y variables for every object. The number of extracted latent vectors may be determined from the analysis or given in advance. The $N \times k$ matrix $W = [w_1, w_2, \ldots, w_k]$ and the $M \times k$ matrix $C = [c_1, c_2, \ldots, c_k]$ are the matrices of weights, with $N \times 1$ and $M \times 1$ column vectors $w_i = [w_{1i}, w_{2i}, \ldots, w_{Ni}]^T$ and $c_i = [c_{1i}, c_{2i}, \ldots, c_{Mi}]^T$ for $i = 1, 2, \ldots, k$, respectively.

The first three rows of $Y$ correspond to the observed values of the three subjects for the first task, and the remaining rows correspond to the second task. The first column represents the first variable, $y_1$, and the second column represents $y_2$.
The correlations in Figure 1 illustrate the collinearity problem among the Y variables.

Figure 1: Matrix plot for the correlations of the $F_3$, $F_4$, $P_3$, $P_4$, $T_3$, $T_4$, $C_z$, $O_1$, $O_2$, $C_3$, $C_4$, $T_5$, $T_6$ and $F_z$ electrodes
Hence, $C^{orig}$, containing the original weight vectors $c_1^{orig}$ and $c_2^{orig}$, is obtained as $C^{orig} = CS$, where $S$ is the diagonal matrix of singular values.

Table 1: Standard errors of the weights on the first latent vector