SPA: Signflip Parallel Analysis to Optimize the Number of Principal Components in Two-dimensional PCA

Li, Zhaoyuan; Kuang, Yiling

doi:10.6339/24-JDS1158

Journal of Data Science

https://doi.org/10.6339/24-JDS1158

Published online: 22 November 2024
Type: Statistical Data Science

Open Access

Received: 23 July 2024
Accepted: 20 October 2024
Published: 22 November 2024

Abstract

Yang et al. (2004) developed two-dimensional principal component analysis (2DPCA) for image representation and recognition. It is widely used in fields including face recognition, biometric recognition, cancer diagnosis, and tumor classification. 2DPCA has been shown to perform better and to be computationally more efficient than traditional principal component analysis (PCA). However, some theoretical properties of 2DPCA remain unknown, including how to determine the number of principal components (PCs) in the training set, which is the critical step in applying 2DPCA. The lack of rigorous criteria for determining the number of PCs hampers the wider application of 2DPCA. To address this issue, we propose a new method based on parallel analysis to determine the number of PCs in 2DPCA with statistical justification. Several image classification experiments demonstrate that the proposed method compares favourably to other state-of-the-art approaches in recognition accuracy and storage requirement, at a low computational cost.
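
To make the idea concrete, the following is a minimal sketch of parallel analysis with random signflips applied to the 2DPCA image covariance matrix. It is an illustration under stated assumptions, not the authors' SPA algorithm, whose details are not given in the abstract. The function names (`scatter`, `signflip_num_components`), the number of trials, the 95% quantile threshold, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def scatter(centered):
    """2DPCA image covariance G = (1/n) * sum_i C_i^T C_i for centered images of shape (n, h, w)."""
    return np.einsum("nij,nik->jk", centered, centered) / centered.shape[0]

def signflip_num_components(images, n_trials=20, quantile=0.95, seed=0):
    """Count leading eigenvalues of G that exceed a signflip null threshold.

    Assumption: the null distribution is built by multiplying every pixel of the
    centered images by an independent random sign, which breaks correlations
    between columns while keeping each pixel's magnitude.
    """
    rng = np.random.default_rng(seed)
    centered = images - images.mean(axis=0)
    # eigvalsh returns ascending eigenvalues; reverse to descending order.
    observed = np.linalg.eigvalsh(scatter(centered))[::-1]

    null = np.empty((n_trials, centered.shape[2]))
    for t in range(n_trials):
        signs = rng.choice([-1.0, 1.0], size=centered.shape)
        null[t] = np.linalg.eigvalsh(scatter(centered * signs))[::-1]
    thresholds = np.quantile(null, quantile, axis=0)

    # Keep components sequentially until one fails to beat its threshold.
    k = 0
    while k < observed.size and observed[k] > thresholds[k]:
        k += 1
    return k

# Synthetic example (hypothetical data): two strong column directions plus noise.
rng = np.random.default_rng(1)
n, h, w = 200, 12, 16
V = np.stack([np.ones(w), np.tile([1.0, -1.0], w // 2)], axis=1) / np.sqrt(w)
B = 5.0 * rng.standard_normal((n, h, 2))             # per-image row coefficients
images = B @ V.T + rng.standard_normal((n, h, w))    # signal plus unit-variance noise
k = signflip_num_components(images)
```

The selected `k` would then give the number of leading eigenvectors of G onto which each image is projected. The signflip construction here follows the general idea of Hong et al. (2020) for standard PCA; how the paper adapts it to 2DPCA may differ.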

Supplementary material


The supplementary material is a zipped folder containing the code and three data sets needed to reproduce all results. It is available at https://figshare.com/s/824176b60a12b8ee0535.

References

Ahn SC, Horenstein AR (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3): 1203–1227. https://doi.org/10.3982/ECTA8968

Bai Z, Silverstein JW (2010). Spectral Analysis of Large Dimensional Random Matrices. Springer, New York.

Buja A, Eyuboglu N (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27(4): 509–540. https://doi.org/10.1207/s15327906mbr2704_2

Cattell RB, Vogelmann S (1977). A comprehensive trial of the scree and KG criteria for determining the number of factors. Multivariate Behavioral Research, 12(3): 289–325. https://doi.org/10.1207/s15327906mbr1203_2

Dhahri H, Al Maghayreh E, Mahmood A, Elkilani W, Faisal Nagi M (2019). Automated breast cancer diagnosis based on machine learning algorithms. Journal of Healthcare Engineering, 2019(1): 4253641.

Ejaz MS, Islam MR, Sifatullah M, Sarker A (2019). Implementation of principal component analysis on masked and non-masked face recognition. In: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), 1–5. IEEE.

Georghiades AS, Belhumeur PN, Kriegman DJ (2001). From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6): 643–660. https://doi.org/10.1109/34.927464

Gumaei A, Hassan MM, Hassan MR, Alelaiwi A, Fortino G (2019). A hybrid feature extraction method with regularized extreme learning machine for brain tumor classification. IEEE Access, 7: 36266–36273. https://doi.org/10.1109/ACCESS.2019.2904145

Hair JF Jr, Anderson RE, Tatham RL (1986). Multivariate Data Analysis with Readings. Macmillan Publishing Co., Inc.

Hong D, Sheng Y, Dobriban E (2020). Selecting the number of components in PCA via random signflips. arXiv preprint arXiv:2012.02985.

Horn JL (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30: 179–185. https://doi.org/10.1007/BF02289447

Lam C, Yao Q (2012). Factor modeling for high-dimensional time series: inference for the number of factors. The Annals of Statistics, 694–726.

Onatski A (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics, 92(4): 1004–1016. https://doi.org/10.1162/REST_a_00043

Owen AB, Wang J (2016). Bi-cross-validation for factor analysis. Statistical Science, 31(1): 119–139. https://doi.org/10.1214/15-STS539

Steven Eyobu O, Han DS (2018). Feature representation and data augmentation for human activity classification based on wearable IMU sensor data using a deep LSTM neural network. Sensors, 18(9): 2892. https://doi.org/10.3390/s18092892

Turk MA, Pentland AP (1991). Face recognition using eigenfaces. In: Proceedings of 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 586–587. IEEE Computer Society.

Uddin MP, Mamun MA, Hossain MA (2021). PCA-based feature reduction for hyperspectral remote sensing image classification. IETE Technical Review, 38(4): 377–396. https://doi.org/10.1080/02564602.2020.1740615

Wan S, Xia Y, Qi L, Yang YH, Atiquzzaman M (2020). Automated colorization of a grayscale image with seed points propagation. IEEE Transactions on Multimedia, 22(7): 1756–1768. https://doi.org/10.1109/TMM.2020.2976573

Wang H (2012). Factor profiled sure independence screening. Biometrika, 99(1): 15–28. https://doi.org/10.1093/biomet/asr074

Wang P, Li Z, Wei Z, Wu T, Luo C, Jiang W, et al. (2024). Space-time-coding digital metasurface element design based on state recognition and mapping methods with CNN-LSTM-DNN. IEEE Transactions on Antennas and Propagation, 72(6): 4962–4975. https://doi.org/10.1109/TAP.2024.3349778

Wang Q, Gao Q, Gao X, Nie F (2017). Optimal mean two-dimensional principal component analysis with F-norm minimization. Pattern Recognition, 68: 286–294. https://doi.org/10.1016/j.patcog.2017.03.026

Yang J, Zhang D, Frangi AF, Yang JY (2004). Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1): 131–137. https://doi.org/10.1109/TPAMI.2004.1261097

Yang W, Wang S, Hu J, Tao X, Li Y (2024). Feature extraction and learning approaches for cancellable biometrics: a survey. CAAI Transactions on Intelligence Technology, 9(1): 4–25. https://doi.org/10.1049/cit2.12283

Yilmaz A, Gokmen M (2000). Eigenhill vs. eigenface and eigenedge. In: Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, volume 2, 827–830. IEEE.

Zabalza J, Ren J, Yang M, Zhang Y, Wang J, Marshall S, et al. (2014). Novel folded-PCA for improved feature extraction and data reduction with hyperspectral imaging and SAR in remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 93: 112–122. https://doi.org/10.1016/j.isprsjprs.2014.04.006

Zeng X, Wang X, Xie Y (2024). Multiple pseudo-siamese network with supervised contrast learning for medical multi-modal retrieval. ACM Transactions on Multimedia Computing Communications and Applications, 20(5): 1–23.

© 2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

2DPCA; feature extraction; image analysis

Funding

Zhaoyuan Li’s research is partially supported by the National Natural Science Foundation of China (No. 11901492) and the Shenzhen Science and Technology Program (ZDSYS 20211021111415025).
