Application of Skew-normal in Classification of Satellite Image

The aim of this paper is to investigate the flexibility of the skew-normal distribution for classifying the pixels of a remotely sensed satellite image. Most remote sensing packages, for example ENVI and ERDAS, assume that the populations follow a multivariate normal distribution. The linear discriminant function (LDF) or the quadratic discriminant function (QDF) is then used to classify the pixels, depending on whether the covariance matrices of the populations are assumed equal or unequal, respectively. However, data obtained from satellite or airplane images are often non-normal. In this case, the skew-normal discriminant function (SDF) is one technique for obtaining a more accurate image. In this study, we compare the SDF with the LDF and QDF by simulation under different scenarios. The results show that ignoring the skewness of the data increases the misclassification probability and consequently yields an inaccurate image. An application is provided to illustrate the effect of wrong assumptions on image accuracy.


Introduction
Discriminant (or classification) analysis is one of the major multivariate methods and is widely used in remote sensing, biostatistics, econometrics, and many other areas (Abbey and Eckstein, 2002; Salovaara et al., 2005; Dixon and Brereton, 2008; De la Cruz, 2008).
In many applications, the populations are assumed to follow multivariate normal distributions, and either the LDF or the QDF is used, depending on whether the covariance matrices of the populations are assumed equal or unequal, respectively. However, in many situations the normality assumption does not agree with reality. Several authors have investigated the performance of the LDF and QDF under non-normality and found that both are greatly affected by non-normality of the populations (Lachenbruch and Sneeringer, 1973; Nakanishi and Sato, 1985; Koutras, 1987).
The most common approach to this problem is to transform the variables. However, choosing the transformation raises many problems, especially for multivariate data (Azzalini and Capitanio, 1999). An alternative is to adopt a more flexible multivariate distribution that is a logical extension of the multivariate normal distribution. Azzalini and Capitanio (1999) first introduced the skew-normal discriminant function (SDF), for two populations only, with equal skew parameters and variance-covariance matrices and unequal means.
Discriminant analysis is widely used in the classification of satellite images. In remote sensing, satellite or airplane images are taken in different channels (bands). The process is known as remote sensing because objects are recorded at a distance: the image is formed by gathering, focusing and recording reflected light from the sun, or reflected radio waves emitted by the spacecraft. A channel is a slice of wavelengths from the electromagnetic spectrum, measured by the instrument on board the satellite. Such data are usually skewed in some of the components. Some authors have investigated the effect of asymmetric distributions on pixel classification (Lachenbruch, 1973; Ince, 1987; Kershaw, 1987; Ripley, 1996). Nevertheless, the assumption of normality plays an important role in pixel classification in most remote sensing packages. These packages, for example ENVI and ERDAS, use only the LDF or QDF to classify the pixels. Azzalini and Dalla Valle (1996) introduced the multivariate skew-normal distribution, which is a logical extension of the multivariate normal class.
The SDF for two populations with equal and known skew parameters and covariance matrices but unequal means was introduced by Azzalini and Capitanio (1999), who noted that the LDF and SDF should be compared numerically. In this paper, we investigate the SDF for two populations with unequal skew parameters and covariance matrices. In addition, the misclassification rates under the multivariate normal and multivariate skew-normal models are compared in different cases. Finally, we use real data to compare the misclassification rates of the discriminant functions. Section 2 describes the multivariate skew-normal distribution and the SDF. Numerical results are presented in Section 3 and an application in Section 4; Section 5 offers concluding remarks.

Multivariate skew-normal distribution
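A $k$-dimensional random variable $Z$ is said to have the multivariate skew-normal distribution if it is continuous with density function

$$f(z) = 2\,\phi_k(z; \Omega_z)\,\Phi(\alpha^T z), \qquad z \in \mathbb{R}^k, \tag{2.1}$$

where $\phi_k(z; \Omega_z)$ is the $k$-dimensional normal density with zero mean and covariance matrix $\Omega_z$, $\Phi(\cdot)$ is the standard normal distribution function, and $\alpha$ is a $k$-dimensional vector. For simplicity, $\Omega_z$ is assumed to be of full rank. The notation $Z \sim SN_k(\Omega_z, \alpha)$ is used to denote this distribution, which reduces to the $N_k(0, \Omega_z)$ density when $\alpha = 0$. The mean vector and the covariance matrix are

$$E(Z) = \sqrt{\frac{2}{\pi}}\,\delta, \qquad \mathrm{Var}(Z) = \Omega_z - \frac{2}{\pi}\,\delta\delta^T, \qquad \delta = \frac{\Omega_z \alpha}{\sqrt{1 + \alpha^T \Omega_z \alpha}}.$$

Now, we introduce location and scale parameters, which were omitted from the expression (2.1) for the density of $Z$. Let

$$Y = \zeta + \omega Z,$$

where $\zeta = (\zeta_1, \ldots, \zeta_k)^T$ and $\omega = \mathrm{diag}(\omega_1, \ldots, \omega_k)$ are location and scale parameters, respectively; the components of $\omega$ are assumed to be positive. The density of $Y$ is

$$f(y) = 2\,\phi_k(y - \zeta; \Omega)\,\Phi\{\alpha^T \omega^{-1}(y - \zeta)\}, \qquad y \in \mathbb{R}^k,$$

where $\Omega = \omega \Omega_z \omega^T$ is a covariance matrix. This distribution is denoted by $SN_k(\zeta, \Omega, \alpha)$.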
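The density of the location-scale skew-normal distribution can be evaluated with a few lines of code. The following Python sketch is illustrative only and is not the authors' implementation (their computations were done in R). It assumes that $\Omega_z$ is a correlation matrix, so that the scale factors $\omega_j$ can be read off as the square roots of the diagonal of $\Omega$; the function names `cholesky`, `std_normal_cdf` and `sn_logpdf` are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def sn_logpdf(y, zeta, cov, alpha):
    """Log-density of SN_k(zeta, Omega, alpha) at y, with Omega passed as `cov`.

    Assumption: omega = diag(sqrt(Omega_jj)), i.e. Omega_z is a correlation matrix.
    """
    k = len(y)
    low = cholesky(cov)
    d = [yi - zi for yi, zi in zip(y, zeta)]
    # Forward substitution: solve L u = d, so that u.u = d^T Omega^{-1} d.
    u = []
    for i in range(k):
        u.append((d[i] - sum(low[i][m] * u[m] for m in range(i))) / low[i][i])
    quad = sum(v * v for v in u)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(k))
    log_phi = -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)
    # Argument of the skewing factor: alpha^T omega^{-1} (y - zeta).
    w = sum(alpha[j] * d[j] / math.sqrt(cov[j][j]) for j in range(k))
    # Floor the probability to avoid log(0) for extremely negative w.
    return math.log(2.0) + log_phi + math.log(max(std_normal_cdf(w), 1e-300))
```

With `alpha` set to a zero vector the skewing factor equals 1 and the function returns the ordinary multivariate normal log-density, which is a convenient sanity check.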

Skew-normal discrimination function
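Let $Y_i \sim SN_k(\zeta_i, \Omega, \alpha)$, $i = 1, 2$, denote the random variables associated with the two populations. The likelihood-based (Bayes) discrimination rule allocates a new unit with observed vector $y$ to population one if

$$(\zeta_1 - \zeta_2)^T \Omega^{-1}\left\{y - \tfrac{1}{2}(\zeta_1 + \zeta_2)\right\} + \xi_0(w_1) - \xi_0(w_2) > 0,$$

where $w_i = \alpha^T \omega^{-1}(y - \zeta_i)$, $i = 1, 2$, and $\xi_0(x) = \ln\{2\Phi(x)\}$. The nonlinearity of the left-hand side of the above inequality prevents an explicit solution. However, the likelihood-based discriminant function is a linear function of $y$ when either of the following conditions holds:

$$\alpha^T \omega^{-1}(\zeta_1 - \zeta_2) = 0, \qquad \alpha = c\,\omega\,\Omega^{-1}(\zeta_1 - \zeta_2),$$

where $c$ is a non-zero scalar constant (Azzalini and Capitanio, 1999).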
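The linearity claim can be checked by expanding the log-likelihood ratio directly. The following is a sketch under the assumption that the two populations are $SN_k(\zeta_i, \Omega, \alpha)$, $i = 1, 2$, with common $\Omega$ and $\alpha$ and densities $f_1$ and $f_2$, and with $w_i = \alpha^T \omega^{-1}(y - \zeta_i)$; the label $\ell(y)$ is introduced here only for illustration.

```latex
\begin{aligned}
\ell(y) &= \ln f_1(y) - \ln f_2(y) \\
        &= -\tfrac{1}{2}(y-\zeta_1)^T\Omega^{-1}(y-\zeta_1)
           + \tfrac{1}{2}(y-\zeta_2)^T\Omega^{-1}(y-\zeta_2)
           + \xi_0(w_1) - \xi_0(w_2) \\
        &= (\zeta_1-\zeta_2)^T\Omega^{-1}\left\{y - \tfrac{1}{2}(\zeta_1+\zeta_2)\right\}
           + \xi_0(w_1) - \xi_0(w_2), \\
w_1 - w_2 &= -\alpha^T\omega^{-1}(\zeta_1-\zeta_2).
\end{aligned}
```

Because $w_1 - w_2$ does not depend on $y$, the two $\xi_0$ terms cancel whenever $\alpha^T \omega^{-1}(\zeta_1 - \zeta_2) = 0$, and $\ell(y)$ then coincides with the normal-theory linear discriminant.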
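It can be shown that the discrimination function for two multivariate skew-normal populations with location parameters $\zeta_1$ and $\zeta_2$, covariance matrices $\Omega_1$ and $\Omega_2$, and skew vectors $\alpha_1$ and $\alpha_2$ is

$$D(y) = -\tfrac{1}{2}(y - \zeta_1)^T \Omega_1^{-1}(y - \zeta_1) + \tfrac{1}{2}(y - \zeta_2)^T \Omega_2^{-1}(y - \zeta_2) + \xi_0(w_1) - \xi_0(w_2),$$

where $w_i = \alpha_i^T \omega_i^{-1}(y - \zeta_i)$ and $\omega_i$ is the diagonal scale matrix of population $i$, $i = 1, 2$. Therefore, the observed vector $y$ is allocated to population one if

$$\tfrac{1}{2}\ln\frac{|\Omega_1|}{|\Omega_2|} < D(y).$$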
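The allocation rule for unequal parameters amounts to comparing the two skew-normal log-densities at $y$. The Python sketch below illustrates this for the bivariate case only; it is an assumption-laden illustration rather than the authors' R code. It assumes known parameters, equal prior probabilities, and scale factors taken as the square roots of the diagonal of each $\Omega_i$. The names `xi0`, `sdf_score` and `sdf_allocate` and all numerical values are hypothetical.

```python
import math

def xi0(x):
    """xi_0(x) = ln{2 Phi(x)}; note that 2 Phi(x) = erfc(-x / sqrt(2))."""
    return math.log(max(math.erfc(-x / math.sqrt(2.0)), 1e-300))

def sdf_score(y, zeta, cov, alpha):
    """Log skew-normal density of a bivariate population, up to the shared constant -ln(2 pi)."""
    d0, d1 = y[0] - zeta[0], y[1] - zeta[1]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    # Quadratic form d^T Omega^{-1} d via the closed-form inverse of a symmetric 2x2 matrix.
    quad = (cov[1][1] * d0 * d0 - 2.0 * cov[0][1] * d0 * d1 + cov[0][0] * d1 * d1) / det
    # Skewing argument w = alpha^T omega^{-1} (y - zeta), omega = diag(sqrt(Omega_jj)).
    w = alpha[0] * d0 / math.sqrt(cov[0][0]) + alpha[1] * d1 / math.sqrt(cov[1][1])
    return -0.5 * math.log(det) - 0.5 * quad + xi0(w)

def sdf_allocate(y, pop1, pop2):
    """Return 1 if y is allocated to population one, otherwise 2.

    Each population is a tuple (zeta, Omega, alpha). Comparing the two scores is
    equivalent to checking whether D(y) exceeds (1/2) ln(|Omega_1| / |Omega_2|).
    """
    return 1 if sdf_score(y, *pop1) > sdf_score(y, *pop2) else 2

# Hypothetical parameter values, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
population_one = ([0.0, 0.0], identity, [2.0, 0.0])
population_two = ([3.0, 3.0], identity, [0.0, -2.0])
label = sdf_allocate([0.2, 0.1], population_one, population_two)
```

When both skew vectors are zero and the covariance matrices are equal, the rule reduces to the LDF, which assigns $y$ to the population with the nearer location in Mahalanobis distance.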

Simulation Study
A simulation study was conducted to compare the LDF or QDF with the corresponding type of SDF. We considered 8 different situations in order to compare the misclassification probabilities of the classification methods under various scenarios. The simulation was carried out in R.
In comparing the LDF with the corresponding type of SDF, the cases differ in the relative positions of the skew parameters and covariance matrices. Similarly, in comparing the QDF with the corresponding type of SDF, we considered unequal covariance matrices and means, although the skew parameters of the two populations could be equal. In the unknown-parameter setting, the unknown parameters of the discriminant functions were replaced by their maximum likelihood estimators (MLEs).
To evaluate the discriminant functions in each case, we simulated two random training samples of size $n_1 = n_2 = 1000$ from the $k$-dimensional multivariate skew-normal distribution ($k = 2, 3, 4, 5$) and built the corresponding discriminant functions from these training samples. Two further random samples of size 500, with the same parameters as in the first step, were then generated as test samples; the individuals in the test samples were classified and the proportion misclassified was recorded. The procedure was repeated 100,000 times. Finally, the means of the misclassification probabilities for the LDF or QDF and for the corresponding type of SDF were calculated.
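The structure of this procedure can be sketched on a much smaller scale. The Python example below is a simplified illustration, not the authors' R code, and it departs from the study in several labelled ways: it covers only $k = 2$ with $\Omega_1 = \Omega_2 = I$, it uses the true (known) skew-normal parameters in the SDF instead of MLEs, it fits the LDF from sample means and the pooled covariance matrix, and it uses far fewer replications. Sampling relies on the standard sign-flip representation of the skew-normal distribution. All parameter values and function names are hypothetical.

```python
import math
import random

def xi0(x):
    """xi_0(x) = ln{2 Phi(x)}; note that 2 Phi(x) = erfc(-x / sqrt(2))."""
    return math.log(max(math.erfc(-x / math.sqrt(2.0)), 1e-300))

def draw_sn(zeta, alpha, rng):
    """One draw from SN_2(zeta, I, alpha): keep X ~ N_2(0, I) if U < alpha^T X, else use -X."""
    x0, x1, u = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    sign = 1.0 if u < alpha[0] * x0 + alpha[1] * x1 else -1.0
    return (zeta[0] + sign * x0, zeta[1] + sign * x1)

def sdf_score(y, zeta, alpha):
    """Log skew-normal density up to a constant shared by both populations (Omega = I)."""
    d0, d1 = y[0] - zeta[0], y[1] - zeta[1]
    return -0.5 * (d0 * d0 + d1 * d1) + xi0(alpha[0] * d0 + alpha[1] * d1)

def fit_ldf(s1, s2):
    """Normal-theory linear rule from the sample means and the pooled covariance matrix."""
    m1 = [sum(p[j] for p in s1) / len(s1) for j in (0, 1)]
    m2 = [sum(p[j] for p in s2) / len(s2) for j in (0, 1)]
    sxx = sxy = syy = 0.0
    for sample, m in ((s1, m1), (s2, m2)):
        for p in sample:
            dx, dy = p[0] - m[0], p[1] - m[1]
            sxx, sxy, syy = sxx + dx * dx, sxy + dx * dy, syy + dy * dy
    det = sxx * syy - sxy * sxy
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    # Proportional to S^{-1}(m1 - m2); the positive scale factor does not affect the rule's sign.
    coef = ((syy * diff[0] - sxy * diff[1]) / det, (sxx * diff[1] - sxy * diff[0]) / det)
    mid = (0.5 * (m1[0] + m2[0]), 0.5 * (m1[1] + m2[1]))
    return coef, mid

def ldf_allocate(y, coef, mid):
    """Allocate to population one when the fitted linear score is positive."""
    return 1 if coef[0] * (y[0] - mid[0]) + coef[1] * (y[1] - mid[1]) > 0 else 2

def simulate(reps=20, n_train=200, n_test=200, seed=1):
    """Mean misclassification rates (LDF, SDF) over `reps` replications."""
    rng = random.Random(seed)
    zeta1, alpha1 = (0.0, 0.0), (3.0, 3.0)    # hypothetical population one
    zeta2, alpha2 = (2.0, 2.0), (-3.0, -3.0)  # hypothetical population two
    ldf_total = sdf_total = 0.0
    for _ in range(reps):
        train1 = [draw_sn(zeta1, alpha1, rng) for _ in range(n_train)]
        train2 = [draw_sn(zeta2, alpha2, rng) for _ in range(n_train)]
        coef, mid = fit_ldf(train1, train2)
        wrong_ldf = wrong_sdf = 0
        for label, zeta, alpha in ((1, zeta1, alpha1), (2, zeta2, alpha2)):
            for _ in range(n_test):
                y = draw_sn(zeta, alpha, rng)
                wrong_ldf += ldf_allocate(y, coef, mid) != label
                sdf_label = 1 if sdf_score(y, zeta1, alpha1) > sdf_score(y, zeta2, alpha2) else 2
                wrong_sdf += sdf_label != label
        ldf_total += wrong_ldf / (2.0 * n_test)
        sdf_total += wrong_sdf / (2.0 * n_test)
    return ldf_total / reps, sdf_total / reps
```

Calling `simulate()` returns the pair of mean error rates for the LDF and the SDF. Because the SDF here uses the true densities with equal priors, its expected error cannot exceed that of the fitted linear rule; reproducing the unknown-parameter setting of the study would additionally require a skew-normal MLE routine.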
Table 1: Comparison of the LDF and the corresponding type of SDF ($e_1$ and $e_2$ are the mean misclassification probabilities for the LDF and the SDF, respectively). Columns: Dim.; Known Parameters; Unknown Parameters.

Tables 1 and 2 contain summary values of the numerical work, in particular the misclassification probabilities. The main conclusions from these tables are as follows:
1. The mean misclassification probability of the LDF or QDF is higher than that of the corresponding type of SDF.
2. The mean misclassification probability increases with the dimension of the multivariate skew-normal distribution in all situations.
3. The mean misclassification probability decreased when the parameters of the discriminant functions were unknown and were replaced by their MLEs.
In general, the QDF or LDF yields a higher misclassification probability than the corresponding type of SDF.
Table 2: Comparison of the QDF and the corresponding type of SDF ($e_1$ and $e_2$ are the mean misclassification probabilities for the QDF and the SDF, respectively). Columns: Known Parameters; Unknown Parameters.

Application
For numerical illustration, we applied the discrimination functions to classify the pixels of a three-channel satellite image of one area of the Shadegan wetland. The Shadegan wetland is in the south-west of Iran at the head of the Persian Gulf. It is the largest wetland in Iran, covering about 400,000 hectares. The wetland plays a significant hydrological and ecological role in the natural functioning of the northern Gulf. The image was acquired by Landsat ETM+. The data are used to compare pixel classification by two discrimination functions: the LDF and the corresponding type of SDF.
In image classification, the input image must be digital so that the remotely sensed data can be processed digitally. Digital data are obtained from calibrated instruments on board the satellite or airplane, which record the reflected or emitted radiation from individual patches of ground, known as pixels. Digital data are composed of these pixels, which are recorded as numeric values. These values are commonly known as digital numbers (DN) or brightness values; because of radiometric distortions, they do not represent the true radiometric values. The digital data can be expressed as $L = y(i, j)$, where $y(i, j) = (y_1(i, j), y_2(i, j), \ldots, y_d(i, j))$ is a vector representing the features of the pixel at location $(i, j)$. Here, $y_1(i, j), y_2(i, j), \ldots, y_d(i, j)$ are the features describing the object, which may be spectral reflectance or emittance values from optical or infrared imagery, radar backscatter values, secondary measurements derived from the image, or geographical features such as terrain elevation, slope and aspect. This set of gray-scale values for a single pixel is known as a pattern. Thus, a pattern is a set of measurements on the chosen features for the individual that is to be classified.
We identified four classes of homogeneous regions from a topographic map or other knowledge about the region, as described in Table 3. The digital number (DN) matrices for the three channels (red, green and blue) were obtained using ERDAS software. The data were classified by the LDF, using ERDAS, and by the corresponding type of SDF, using R.
The highest producer's and user's classification accuracies were obtained with the SDF (Tables 4 and 5). Here, producer's accuracy is the percentage of sampling units of a given class that are assigned to that class, and user's accuracy is the percentage of sampling units assigned to a particular class that actually belong to that class. The overall classification accuracy for the four-class classification was 78.06% for the LDF and 89.03% for the SDF.
The LDF results show low classification accuracy for class $C_2$: the producer's accuracy was 68% and the user's accuracy 69.39% (Table 4). Class $C_1$ showed the highest producer's and user's accuracies (81.67% and 85.96%, respectively). With the SDF, class $C_2$ again showed the lowest producer's and user's accuracies (both 80%), while class $C_1$ showed the highest producer's accuracy (93.33%) and class $C_3$ the highest user's accuracy (91.06%). There are no generally accepted limits on how accurate a classification should be in order to qualify as reliable, but an overall accuracy exceeding 85% is usually considered reasonable, often with the additional criterion that the accuracy should not be lower than 70% for any class (Salovaara et al., 2005). By this standard, the SDF achieved high accuracy for all classes. Thus, the assumption of the skew-normal distribution increases the accuracy of the pixel classification and consequently improves the overall accuracy of the image. The producer's misclassification probabilities for classes $C_1$, $C_2$, $C_3$ and $C_4$ are 0.067, 0.20, 0.1125 and 0.1167, respectively, for the SDF, compared with 0.1833, 0.32, 0.20 and 0.2333, respectively, for the LDF.
The effect of the classification method on the resulting image of the Shadegan wetland can be seen in Figure 1. Taking the skewness of the data into account by using the SDF yields a more accurate image.
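The three accuracy measures can be computed from a confusion matrix as in the following Python sketch. It assumes that rows index the reference (true) class and columns the assigned class; the orientation of Tables 4 and 5 may differ. The function name `accuracy_summary` and the 2 x 2 matrix are hypothetical and are not the data of this study.

```python
def accuracy_summary(confusion):
    """Producer's accuracies, user's accuracies and overall accuracy, in percent.

    Assumption: confusion[i][j] is the number of sampling units of reference
    class i that were assigned to class j.
    """
    k = len(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    correct = [confusion[i][i] for i in range(k)]
    # Producer's accuracy: correctly classified units as a share of each reference class.
    producers = [100.0 * correct[i] / row_totals[i] for i in range(k)]
    # User's accuracy: correctly classified units as a share of each assigned class.
    users = [100.0 * correct[j] / col_totals[j] for j in range(k)]
    overall = 100.0 * sum(correct) / sum(row_totals)
    return producers, users, overall

# Made-up two-class example, for illustration only.
example = [[45, 5],
           [10, 40]]
producers, users, overall = accuracy_summary(example)
```

The producer's misclassification probability of a class is one minus its producer's accuracy expressed as a proportion.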

Conclusion
In this paper, a skew-normal approach has been proposed for classifying the pixels of a satellite image (see Figure 1). The approach is appropriate for classifying satellite image pixels when the data are non-normal. We demonstrated the approach on a real data set and showed that the skew-normal models perform better than the normal models. The results indicate that the skew-normal models proposed here are flexible and accurate.

Figure 1: Images produced by the LDF (above) and the corresponding type of SDF (below).

Table 3: Characteristics of the samples.

Table 4: Number of individuals classified by the LDF (Fisher discriminant function). The diagonal shows the number of correctly classified sample units for each class.

Table 5: Number of individuals classified by the SDF. The diagonal shows the number of correctly classified sample units for each class.