Journal of Data Science

Sparse Learning with Non-convex Penalty in Multi-classification
Volume 19, Issue 1 (2021), pp. 56–74
Nan Li   Hao Helen Zhang  

https://doi.org/10.6339/20-JDS1000
Pub. online: 10 February 2021 · Type: Statistical Data Science

Received
1 November 2020
Accepted
1 December 2020
Published
10 February 2021

Abstract

Multi-classification is commonly encountered in data science practice, and it has broad applications in many areas such as biology, medicine, and engineering. Variable selection in multiclass problems is much more challenging than in binary classification or regression problems. In addition to estimating multiple discriminant functions for separating different classes, we need to decide which variables are important for each individual discriminant function as well as for the whole set of functions. In this paper, we address the multi-classification variable selection problem by proposing a new form of penalty, supSCAD, which first groups all the coefficients of the same variable associated with all the discriminant functions together and then imposes the SCAD penalty on the supnorm of each group. We apply the new penalty to both soft and hard classification and develop two new procedures: the supSCAD multinomial logistic regression and the supSCAD multi-category support vector machine. Our theoretical results show that, with a proper choice of the tuning parameter, the supSCAD multinomial logistic regression can identify the underlying sparse model consistently and enjoys oracle properties even when the dimension of predictors goes to infinity. Based on the local linear and quadratic approximation to the non-concave SCAD and nonlinear multinomial log-likelihood function, we show that the new procedures can be implemented efficiently by solving a series of linear or quadratic programming problems. Performance of the new methods is illustrated by simulation studies and real data analysis of the Small Round Blue Cell Tumors and the Semeion Handwritten Digit data sets.
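The penalty construction described in the abstract can be sketched in a few lines: for each variable, collect its coefficients across all discriminant functions into one group, take the sup-norm (maximum absolute value) of that group, and apply the SCAD penalty of Fan and Li (2001) to the result. The sketch below is an illustration under our own conventions (the function names and the p × K coefficient-matrix layout are ours), not the authors' implementation:

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise to |t|.

    Piecewise: linear (lasso-like) for |t| <= lam, quadratic transition
    for lam < |t| <= a*lam, and constant (no further shrinkage) beyond.
    """
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

def supscad_penalty(B, lam, a=3.7):
    """supSCAD penalty for a p x K coefficient matrix B.

    Row j holds the coefficients of variable j across the K discriminant
    functions; SCAD is applied to the sup-norm of each row, then summed.
    A zero sup-norm means the variable is excluded from every function.
    """
    sup_norms = np.max(np.abs(B), axis=1)  # sup-norm of each group
    return float(np.sum(scad(sup_norms, lam, a)))
```

Because the SCAD penalty flattens out beyond a·λ, large coefficient groups are not shrunk further, which is what yields the near-unbiasedness behind the oracle properties claimed in the abstract.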

Supplementary material

Supplementary Material
A zip file containing all the computation code and data for the numerical experiments is available.



Copyright
© 2021 The Author(s).
This is a free-to-read article.

Keywords
logistic regression, SCAD, supnorm, SVM, variable selection



  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
