Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation

Zeng, Liyun; Zhang, Hao Helen

doi:10.6339/22-JDS1069

Journal of Data Science

Volume 21, Issue 4 (2023), pp. 658–680

Pub. online: 3 November 2022 Type: Statistical Data Science

Open Access

Received: 3 June 2022

Accepted: 25 September 2022

Published: 3 November 2022

Abstract

Multiclass probability estimation is the problem of estimating the conditional probability that a data point belongs to each class given its covariate information. It has broad applications in statistical analysis and data science. Recently, a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for K-class problems (Wu et al., 2010; Wang et al., 2019), where K is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in K. In this paper, we propose two new learning schemes, baseline learning and One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, baseline learning has optimal computational complexity in the sense that it is linear in K. Though not the most computationally efficient, OVA is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite-sample performance.
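To make the scaling in K concrete, the following minimal Python sketch counts the binary subproblems each scheme might fit. It is an illustration under stated assumptions, not the authors' implementation: pairwise coupling is taken to fit one binary problem per unordered pair of classes, baseline learning is assumed to pair every other class with a single baseline class, and OVA is assumed to fit one class-versus-rest problem per class. The function names are hypothetical, and the counts ignore the per-problem sample size and the number of weights used in each wSVM, so they do not capture total running time.

```python
from itertools import combinations


def pairwise_problems(K):
    """All unordered class pairs (j, k) with j < k: K*(K-1)/2 binary problems."""
    return list(combinations(range(K), 2))


def baseline_problems(K, baseline=0):
    """Assumed scheme: pair every other class with one baseline class (K-1 problems)."""
    return [(baseline, k) for k in range(K) if k != baseline]


def ova_problems(K):
    """Assumed scheme: one class versus all remaining classes (K problems)."""
    return [(k, "rest") for k in range(K)]


def problem_counts(K):
    """Number of binary subproblems per scheme for a K-class problem."""
    return {
        "pairwise": len(pairwise_problems(K)),
        "baseline": len(baseline_problems(K)),
        "ova": len(ova_problems(K)),
    }


if __name__ == "__main__":
    # Quadratic growth for pairwise coupling, linear growth for the other two.
    for K in (3, 5, 10, 50):
        print(K, problem_counts(K))
```

Under these assumptions, OVA and the baseline scheme both grow linearly in the number of subproblems. Each OVA problem uses the full data set rather than two classes, which is one plausible reading of why the abstract describes OVA as not the most efficient in computation.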

References

Alimoglu F, Alpaydin E (1997). Combining multiple representations and classifiers for pen-based handwritten digit recognition. In: Proceedings of the Fourth International Conference on Document Analysis and Recognition (J Schürmann, ed.), volume 2, 637–640. Ulm, Germany.

Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403(6769): 503–511.

Breiman L, Friedman JH, Olshen RA, Stone CJ (1984). Classification and Regression Trees. Wadsworth Publishing Company, Belmont, California, USA.

Burges C (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2: 121–167.

Cairano SD, Brand M, Bortoff SA (2013). Projection-free parallel quadratic programming for linear model predictive control. International Journal of Control, 86(8): 1367–1385.

Chamasemani FF, Singh YP (2011). Multi-class support vector machine (SVM) classifiers – an application in hypothyroid detection and classification. In: Proceedings of the Sixth International Conference on Bio-Inspired Computing: Theories and Applications (R Abdullah, ed.), 351–356. Penang, Malaysia.

Chen T, Guestrin C (2016). XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (A Smola, C Aggarwal, D Shen, R Rastogi, eds.), KDD ’16, 785–794. ACM, New York, New York, USA.

Crammer K, Singer Y (2001). On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2: 265–292.

Cristianini N, Shawe-Taylor J (2000). An Introduction to Support Vector Machines and other Kernel-based Learning Methods. Cambridge University Press, Cambridge, UK.

Ding S, Zhao X, Zhang J, Zhang X, Xue Y (2019). A review on multi-class TWSVM. Artificial Intelligence Review, 52(2): 775–801.

Dua D, Graff C (2019). UCI machine learning repository. http://archive.ics.uci.edu/ml.

Dudoit S, Fridlyand J, Speed TP (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457): 77–87.

Guo C, Pleiss G, Sun Y, Weinberger KQ (2017). On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning (D Precup, YW Teh, eds.), volume 70, 1321–1330. Sydney, Australia.

Hastie T, Tibshirani R, Friedman J (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, New York, USA, 2nd edition.

Herbei R, Wegkamp MH (2006). Classification with reject option. Canadian Journal of Statistics, 34(4): 709–721.

Ho TK (1995). Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition (CY Suen, ed.), volume 1, 278–282. Montreal, Canada.

Horton P, Nakai K (1996). A probabilistic classification system for predicting the cellular localization sites of proteins. In: Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology (DJ States, P Agarwal, T Gaasterland, L Hunter, RF Smith, eds.), 109–115. St. Louis, Missouri, USA.

Huang H, Liu Y, Du Y, Perou CM, Hayes DN, Todd MJ, et al. (2013). Multiclass distance-weighted discrimination. Journal of Computational and Graphical Statistics, 22(4): 953–969.

Islam R, Khan SA, Kim JM (2016). Discriminant feature distribution analysis-based hybrid feature selection for online bearing fault diagnosis in induction motors. Journal of Sensors, 2016: 1–16.

Kallas M, Francis C, Kanaan L, Merheb D, Honeine P, Amoud H (2012). Multi-class SVM classification combined with kernel PCA feature extraction of ECG signals. In: Proceedings of the 19th International Conference on Telecommunications (H Abumarshoud, A Shojaeifard, H Aghvami, F Marvasti, eds.), 1–5. Jounieh, Lebanon.

Kimeldorf G, Wahba G (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33: 82–95.

Krawczyk B, Woźniak M, Cyganek B (2014). Clustering-based ensembles for one-class classification. Information Sciences, 264: 182–195.

Lee Y, Lin Y, Wahba G (2004). Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99: 67–81.

Lei Y, Dogan U, Binder A, Kloft M (2015). Multi-class SVMs: from tighter data-dependent generalization bounds to novel algorithms. In: Proceedings of the 28th International Conference on Neural Information Processing Systems (C Cortes, DD Lee, M Sugiyama, R Garnett, eds.), volume 2, 2035–2043. Montreal, Canada.

Lin Y (2002). Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery, 6: 259–275.

Liu Y (2007). Fisher consistency of multicategory support vector machines. In: Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (M Meila, X Shen, eds.), 291–298. San Juan, Puerto Rico.

Liu Y, Yuan M (2011). Reinforced multicategory support vector machine. Journal of Computational and Graphical Statistics, 20: 901–919.

McCullagh P, Nelder J (1989). Generalized Linear Models. Chapman and Hall, London, UK.

Mezzoudj F, Benyettou A (2012). On the optimization of multiclass support vector machines dedicated to speech recognition. In: Proceedings of the 19th International Conference on Neural Information Processing (T Huang, Z Zeng, C Li, CS Leung, eds.), volume 2, 1–8. Berlin, Germany.

Minderer M, Djolonga J, Romijnders R, Hubis F, Zhai X, Houlsby N, et al. (2021). Revisiting the calibration of modern neural networks. In: Proceedings of the 35th Advances in Neural Information Processing Systems (M Ranzato, A Beygelzimer, Y Dauphin, PS Liang, JW Vaughan, eds.), volume 34, 15682–15694.

Rifkin R, Klautau A (2004). In defense of one-vs-all classification. Journal of Machine Learning Research, 5: 101–141.

Saigal P, Khanna V (2020). Multi-category news classification using support vector machine based classifiers. SN Applied Sciences, 2(3): 458.

Tomar D, Agarwal S (2015). A comparison on multi-class classification methods based on least squares twin support vector machine. Knowledge-Based Systems, 81: 131–147.

Vapnik V (1998). Statistical Learning Theory. Wiley, New York, New York, USA.

Wahba G (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, Pennsylvania, USA.

Wang J, Shen X (2006). Estimation of generalization error: random and fixed inputs. Statistica Sinica, 16(2): 569–588.

Wang J, Shen X, Liu Y (2008). Probability estimation for large margin classifiers. Biometrika, 95: 149–167.

Wang L, Shen X (2007). On $L_1$-norm multiclass support vector machines. Journal of the American Statistical Association, 102: 583–594.

Wang X, Zhang HH, Wu Y (2019). Multiclass probability estimation with support vector machines. Journal of Computational and Graphical Statistics, 28(3): 586–595.

Weston J, Watkins C (1999). Support vector machines for multi-class pattern recognition. In: Proceedings of the Seventh European Symposium on Artificial Neural Networks (M Gori, ed.), 21–23. Bruges, Belgium.

Wu Y, Zhang HH, Liu Y (2010). Robust model-free multiclass probability estimation. Journal of the American Statistical Association, 105: 424–436.

Ye Y, Tse E (1989). An extension of Karmarkar’s projective algorithm for convex quadratic programming. Mathematical Programming, 44(1–3): 157–179.

Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, et al. (2002). Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1(2): 133–143.

Zhang C, Liu Y (2013). Multicategory large-margin unified machines. Journal of Machine Learning Research, 14: 1349–1386.

Zhu J, Hastie T (2005). Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 14: 185–205.

Zhu J, Rosset S, Hastie T, Tibshirani R (2003). 1-norm support vector machines. In: Proceedings of the 16th International Conference on Neural Information Processing Systems (S Thrun, LK Saul, B Schölkopf, eds.), 49–56. Whistler, Canada.

© 2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

linear time algorithm, multiclass classification, non-parametric, probability estimation, scalability, support vector machines
