<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn>
<issn pub-type="ppub">1680-743X</issn>
<issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1000</article-id>
<article-id pub-id-type="doi">10.6339/20-JDS1000</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Sparse Learning with Non-convex Penalty in Multi-classification</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Li</surname><given-names>Nan</given-names></name><xref ref-type="aff" rid="j_jds1000_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname><given-names>Hao Helen</given-names></name><email xlink:href="mailto:hzhang@math.arizona.edu">hzhang@math.arizona.edu</email><xref ref-type="aff" rid="j_jds1000_aff_002">2</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1000_aff_001"><label>1</label>Department of Epidemiology and Cancer Control, <institution>St. Jude Children’s Research Hospital</institution>, Memphis, Tennessee, <country>U.S.A.</country></aff>
<aff id="j_jds1000_aff_002"><label>2</label>Department of Mathematics, <institution>University of Arizona</institution>, Tucson, Arizona, <country>U.S.A.</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:hzhang@math.arizona.edu">hzhang@math.arizona.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>10</day><month>2</month><year>2021</year></pub-date>
<volume>19</volume><issue>1</issue><fpage>56</fpage><lpage>74</lpage>
<supplementary-material id="S1" content-type="archive" xlink:href="jds1000_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>A zip file containing all the computation code and data for the numerical experiments is available.</p>
</caption>
</supplementary-material>
<history>
<date date-type="received"><month>11</month><year>2020</year></date>
<date date-type="accepted"><month>12</month><year>2020</year></date>
</history>
<permissions><copyright-statement>2021 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Multi-classification is commonly encountered in data science practice and has broad applications in areas such as biology, medicine, and engineering. Variable selection in multiclass problems is much more challenging than in binary classification or regression problems. In addition to estimating multiple discriminant functions for separating the classes, we need to decide which variables are important for each individual discriminant function as well as for the whole set of functions. In this paper, we address the multi-classification variable selection problem by proposing a new form of penalty, supSCAD, which first groups together all the coefficients of the same variable across the discriminant functions and then imposes the SCAD penalty on the supnorm of each group. We apply the new penalty to both soft and hard classification and develop two new procedures: the supSCAD multinomial logistic regression and the supSCAD multi-category support vector machine. Our theoretical results show that, with a proper choice of the tuning parameter, the supSCAD multinomial logistic regression can identify the underlying sparse model consistently and enjoys oracle properties even when the dimension of predictors goes to infinity. Based on local linear and quadratic approximations to the non-convex SCAD penalty and the nonlinear multinomial log-likelihood function, we show that the new procedures can be implemented efficiently by solving a series of linear or quadratic programming problems. The performance of the new methods is illustrated by simulation studies and real data analysis of the Small Round Blue Cell Tumors and the Semeion Handwritten Digit data sets.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>logistic regression</kwd>
<kwd>SCAD</kwd>
<kwd>supnorm</kwd>
<kwd>SVM</kwd>
<kwd>variable selection</kwd>
</kwd-group>
</article-meta>
</front>
<body/>
<back>
<ref-list id="j_jds1000_reflist_001">
<title>References</title>
<ref id="j_jds1000_ref_001">
<mixed-citation publication-type="chapter"> <string-name><surname>Bradley</surname> <given-names>PS</given-names></string-name>, <string-name><surname>Mangasarian</surname> <given-names>OL</given-names></string-name> (<year>1998</year>). <chapter-title>Feature selection via concave minimization and support vector machines</chapter-title>. In: <source>ICML</source>, volume <volume>98</volume>, <fpage>82</fpage>–<lpage>90</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Breheny</surname> <given-names>P</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>J</given-names></string-name> (<year>2009</year>). <article-title>Penalized methods for bi-level variable selection</article-title>. <source>Statistics and Its Interface</source>, <volume>2</volume>(<issue>3</issue>): <fpage>369</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Crammer</surname> <given-names>K</given-names></string-name>, <string-name><surname>Singer</surname> <given-names>Y</given-names></string-name> (<year>2001</year>). <article-title>On the algorithmic implementation of multiclass kernel-based vector machines</article-title>. <source>Journal of Machine Learning Research</source>, <volume>2</volume>: <fpage>265</fpage>–<lpage>292</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Dudoit</surname> <given-names>S</given-names></string-name>, <string-name><surname>Fridlyand</surname> <given-names>J</given-names></string-name>, <string-name><surname>Speed</surname> <given-names>TP</given-names></string-name> (<year>2002</year>). <article-title>Comparison of discrimination methods for the classification of tumors using gene expression data</article-title>. <source>Journal of the American Statistical Association</source>, <volume>97</volume>(<issue>457</issue>): <fpage>77</fpage>–<lpage>87</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Fan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Li</surname> <given-names>R</given-names></string-name> (<year>2001</year>). <article-title>Variable selection via nonconcave penalized likelihood and its oracle properties</article-title>. <source>Journal of the American Statistical Association</source>, <volume>96</volume>(<issue>456</issue>): <fpage>1348</fpage>–<lpage>1360</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Fan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>H</given-names></string-name> (<year>2004</year>). <article-title>Nonconcave penalized likelihood with a diverging number of parameters</article-title>. <source>The Annals of Statistics</source>, <volume>32</volume>(<issue>3</issue>): <fpage>928</fpage>–<lpage>961</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_007">
<mixed-citation publication-type="book"> <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name> (<year>2009</year>). <source>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</source>. <publisher-name>Springer Science &amp; Business Media</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_008">
<mixed-citation publication-type="other"> <string-name><surname>Holmström</surname> <given-names>K</given-names></string-name>, <string-name><surname>Göran</surname> <given-names>AO</given-names></string-name>, <string-name><surname>Edvall</surname> <given-names>MM</given-names></string-name> (<year>2010</year>). <source>User’s Guide for Tomlab 7</source>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Huang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Breheny</surname> <given-names>P</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>S</given-names></string-name> (<year>2012</year>). <article-title>A selective review of group selection in high-dimensional models</article-title>. <source>Statistical Science</source>, <volume>27</volume>(<issue>4</issue>): <fpage>481</fpage>–<lpage>499</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Khan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wei</surname> <given-names>JS</given-names></string-name>, <string-name><surname>Ringner</surname> <given-names>M</given-names></string-name>, <string-name><surname>Saal</surname> <given-names>LH</given-names></string-name>, <string-name><surname>Ladanyi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Westermann</surname> <given-names>F</given-names></string-name>, <etal>et al.</etal> (<year>2001</year>). <article-title>Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks</article-title>. <source>Nature Medicine</source>, <volume>7</volume>(<issue>6</issue>): <fpage>673</fpage>–<lpage>679</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Lange</surname> <given-names>K</given-names></string-name>, <string-name><surname>Hunter</surname> <given-names>DR</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>I</given-names></string-name> (<year>2000</year>). <article-title>Optimization transfer using surrogate objective functions</article-title>. <source>Journal of Computational and Graphical Statistics</source>, <volume>9</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>20</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Le Thi Hoai</surname> <given-names>A</given-names></string-name>, <string-name><surname>Tao</surname> <given-names>PD</given-names></string-name> (<year>1997</year>). <article-title>Solving a class of linearly constrained indefinite quadratic problems by DC algorithms</article-title>. <source>Journal of Global Optimization</source>, <volume>11</volume>(<issue>3</issue>): <fpage>253</fpage>–<lpage>285</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Lee</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wahba</surname> <given-names>G</given-names></string-name> (<year>2004</year>). <article-title>Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data</article-title>. <source>Journal of the American Statistical Association</source>, <volume>99</volume>(<issue>465</issue>): <fpage>67</fpage>–<lpage>81</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>X</given-names></string-name> (<year>2006</year>). <article-title>Multicategory <italic>ψ</italic>-learning</article-title>. <source>Journal of the American Statistical Association</source>, <volume>101</volume>(<issue>474</issue>): <fpage>500</fpage>–<lpage>509</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yuan</surname> <given-names>M</given-names></string-name> (<year>2011</year>). <article-title>Reinforced multicategory support vector machines</article-title>. <source>Journal of Computational and Graphical Statistics</source>, <volume>20</volume>(<issue>4</issue>): <fpage>901</fpage>–<lpage>919</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_016">
<mixed-citation publication-type="book"> <string-name><surname>Mangasarian</surname> <given-names>O</given-names></string-name>, <string-name><surname>Wild</surname> <given-names>E</given-names></string-name> (<year>2001</year>). <article-title>Proximal support vector machine classifiers</article-title>. <source>Proceedings KDD-2001: Knowledge Discovery and Data Mining</source>. <publisher-name>Citeseer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_017">
<mixed-citation publication-type="other"> MATLAB (2014). <italic>version 8.3 (R2014a)</italic>. The MathWorks Inc., Natick, Massachusetts.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_018">
<mixed-citation publication-type="book"> <string-name><surname>McCullagh</surname> <given-names>P</given-names></string-name>, <string-name><surname>Nelder</surname> <given-names>JA</given-names></string-name> (<year>1989</year>). <source>Generalized Linear Models</source>, <edition>2</edition>nd edition. <publisher-name>Chapman and Hall</publisher-name>, <publisher-loc>London, UK</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Suykens</surname> <given-names>JA</given-names></string-name>, <string-name><surname>Vandewalle</surname> <given-names>J</given-names></string-name> (<year>1999</year>). <article-title>Least squares support vector machine classifiers</article-title>. <source>Neural Processing Letters</source>, <volume>9</volume>(<issue>3</issue>): <fpage>293</fpage>–<lpage>300</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Tang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>HH</given-names></string-name> (<year>2006</year>). <article-title>Multiclass proximal support vector machines</article-title>. <source>Journal of Computational and Graphical Statistics</source>, <volume>15</volume>(<issue>2</issue>): <fpage>339</fpage>–<lpage>355</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name> (<year>1996</year>). <article-title>Regression shrinkage and selection via the lasso</article-title>. <source>Journal of the Royal Statistical Society, Series B, Methodological</source>, <volume>58</volume>(<issue>1</issue>): <fpage>267</fpage>–<lpage>288</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Tutz</surname> <given-names>G</given-names></string-name>, <string-name><surname>Pößnecker</surname> <given-names>W</given-names></string-name>, <string-name><surname>Uhlmann</surname> <given-names>L</given-names></string-name> (<year>2015</year>). <article-title>Variable selection in general multinomial logit models</article-title>. <source>Computational Statistics &amp; Data Analysis</source>, <volume>82</volume>: <fpage>207</fpage>–<lpage>222</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_023">
<mixed-citation publication-type="book"> <string-name><surname>Vapnik</surname> <given-names>V</given-names></string-name> (<year>1998</year>). <source>Statistical Learning Theory</source>. <publisher-name>Wiley-Interscience</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_024">
<mixed-citation publication-type="book"> <string-name><surname>Vapnik</surname> <given-names>VN</given-names></string-name> (<year>1995</year>). <source>The Nature of Statistical Learning Theory</source>. <publisher-name>Springer-Verlag</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>X</given-names></string-name> (<year>2007</year>). <article-title>On <italic>L</italic><sub>1</sub>-norm multiclass support vector machines: Methodology and theory</article-title>. <source>Journal of the American Statistical Association</source>, <volume>102</volume>(<issue>478</issue>): <fpage>583</fpage>–<lpage>594</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_026">
<mixed-citation publication-type="chapter"> <string-name><surname>Weston</surname> <given-names>J</given-names></string-name>, <string-name><surname>Watkins</surname> <given-names>C</given-names></string-name> (<year>1999</year>). <chapter-title>Support vector machines for multi-class pattern recognition</chapter-title>. In: <source>ESANN</source>, volume <volume>99</volume>, <fpage>219</fpage>–<lpage>224</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_027">
<mixed-citation publication-type="journal"> <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2009</year>). <article-title>Variable selection in quantile regression</article-title>. <source>Statistica Sinica</source>, <volume>19</volume>(<issue>2</issue>): <fpage>801</fpage>–<lpage>817</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_028">
<mixed-citation publication-type="journal"> <string-name><surname>Yuan</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>Y</given-names></string-name> (<year>2006</year>). <article-title>Model selection and estimation in regression with grouped variables</article-title>. <source>Journal of the Royal Statistical Society, Series B, Statistical Methodology</source>, <volume>68</volume>(<issue>1</issue>): <fpage>49</fpage>–<lpage>67</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_029">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2014</year>). <article-title>Multicategory angle-based large-margin classification</article-title>. <source>Biometrika</source>, <volume>101</volume>(<issue>3</issue>): <fpage>625</fpage>–<lpage>640</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_030">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>CH</given-names></string-name> (<year>2010</year>). <article-title>Nearly unbiased variable selection under minimax concave penalty</article-title>. <source>The Annals of Statistics</source>, <volume>38</volume>(<issue>2</issue>): <fpage>894</fpage>–<lpage>942</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_031">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>HH</given-names></string-name>, <string-name><surname>Ahn</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>X</given-names></string-name>, <string-name><surname>Park</surname> <given-names>C</given-names></string-name> (<year>2006</year>). <article-title>Gene selection using support vector machines with non-convex penalty</article-title>. <source>Bioinformatics</source>, <volume>22</volume>(<issue>1</issue>): <fpage>88</fpage>–<lpage>95</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_032">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>HH</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name> (<year>2008</year>). <article-title>Variable selection for the multicategory SVM via adaptive sup-norm regularization</article-title>. <source>Electronic Journal of Statistics</source>, <volume>2</volume>: <fpage>149</fpage>–<lpage>167</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_033">
<mixed-citation publication-type="journal"> <string-name><surname>Zhao</surname> <given-names>P</given-names></string-name>, <string-name><surname>Rocha</surname> <given-names>G</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>B</given-names></string-name> (<year>2009</year>). <article-title>The composite absolute penalties family for grouped and hierarchical variable selection</article-title>. <source>The Annals of Statistics</source>, <volume>37</volume>(<issue>6A</issue>): <fpage>3468</fpage>–<lpage>3497</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_034">
<mixed-citation publication-type="journal"> <string-name><surname>Zou</surname> <given-names>H</given-names></string-name> (<year>2006</year>). <article-title>The adaptive lasso and its oracle properties</article-title>. <source>Journal of the American Statistical Association</source>, <volume>101</volume>(<issue>476</issue>): <fpage>1418</fpage>–<lpage>1429</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_035">
<mixed-citation publication-type="journal"> <string-name><surname>Zou</surname> <given-names>H</given-names></string-name>, <string-name><surname>Li</surname> <given-names>R</given-names></string-name> (<year>2008</year>). <article-title>One-step sparse estimates in nonconcave penalized likelihood models</article-title>. <source>The Annals of Statistics</source>, <volume>36</volume>(<issue>4</issue>): <fpage>1509</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1000_ref_036">
<mixed-citation publication-type="journal"> <string-name><surname>Zou</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yuan</surname> <given-names>M</given-names></string-name> (<year>2008</year>). <article-title>The <inline-formula id="j_jds1000_ineq_001"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mi>∞</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${F_{\infty }}$]]></tex-math></alternatives></inline-formula>-norm support vector machine</article-title>. <source>Statistica Sinica</source>, <volume>18</volume>(<issue>1</issue>): <fpage>379</fpage>–<lpage>398</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
