<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1069</article-id>
<article-id pub-id-type="doi">10.6339/22-JDS1069</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zeng</surname><given-names>Liyun</given-names></name><email xlink:href="mailto:hzhang@math.arizona.edu">hzhang@math.arizona.edu</email><xref ref-type="aff" rid="j_jds1069_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname><given-names>Hao Helen</given-names></name><xref ref-type="aff" rid="j_jds1069_aff_001">1</xref><xref ref-type="aff" rid="j_jds1069_aff_002">2</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1069_aff_001"><label>1</label>Statistics and Data Science GIDP, <institution>University of Arizona</institution>, Tucson, Arizona, <country>USA</country></aff>
<aff id="j_jds1069_aff_002"><label>2</label>Department of Mathematics, <institution>University of Arizona</institution>, Tucson, Arizona, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:hzhang@math.arizona.edu">hzhang@math.arizona.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>3</day><month>11</month><year>2022</year></pub-date><volume>21</volume><issue>4</issue><fpage>658</fpage><lpage>680</lpage><history><date date-type="received"><day>3</day><month>6</month><year>2022</year></date><date date-type="accepted"><day>25</day><month>9</month><year>2022</year></date></history>
<permissions><copyright-statement>2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently, a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for <italic>K</italic>-class problems (<xref ref-type="bibr" rid="j_jds1069_ref_041">Wu et al.</xref>, <xref ref-type="bibr" rid="j_jds1069_ref_041">2010</xref>; <xref ref-type="bibr" rid="j_jds1069_ref_039">Wang et al.</xref>, <xref ref-type="bibr" rid="j_jds1069_ref_039">2019</xref>), where <italic>K</italic> is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in <italic>K</italic>. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in <italic>K</italic>. Though not the most efficient in computation, the OVA is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and are shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite-sample performance.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>linear time algorithm</kwd>
<kwd>multiclass classification</kwd>
<kwd>non-parametric</kwd>
<kwd>probability estimation</kwd>
<kwd>scalability</kwd>
<kwd>support vector machines</kwd>
</kwd-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1069_reflist_001">
<title>References</title>
<ref id="j_jds1069_ref_001">
<mixed-citation publication-type="chapter"> <string-name><surname>Alimoglu</surname> <given-names>F</given-names></string-name>, <string-name><surname>Alpaydin</surname> <given-names>E</given-names></string-name> (<year>1997</year>). <chapter-title>Combining multiple representations and classifiers for pen-based handwritten digit recognition</chapter-title>. In: <source><italic>Proceedings of the Fourth International Conference on Document Analysis and Recognition</italic></source> (<string-name><given-names>J</given-names> <surname>Schürmann</surname></string-name>, ed.), volume <volume>2</volume>, <fpage>637</fpage>–<lpage>640</lpage>. <publisher-loc>Ulm, Germany</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Alizadeh</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Eisen</surname> <given-names>MB</given-names></string-name>, <string-name><surname>Davis</surname> <given-names>RE</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>C</given-names></string-name>, <string-name><surname>Lossos</surname> <given-names>IS</given-names></string-name>, <string-name><surname>Rosenwald</surname> <given-names>A</given-names></string-name>, <etal>et al.</etal> (<year>2000</year>). <article-title>Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling</article-title>. <source><italic>Nature</italic></source>, <volume>403</volume>(<issue>6769</issue>): <fpage>503</fpage>–<lpage>511</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_003">
<mixed-citation publication-type="book"> <string-name><surname>Breiman</surname> <given-names>L</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Olshen</surname> <given-names>RA</given-names></string-name>, <string-name><surname>Stone</surname> <given-names>CJ</given-names></string-name> (<year>1984</year>). <source><italic>Classification and Regression Trees</italic></source>. <publisher-name>Wadsworth Publishing Company</publisher-name>, <publisher-loc>Belmont, California, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Burges</surname> <given-names>C</given-names></string-name> (<year>1998</year>). <article-title>A tutorial on support vector machines for pattern recognition</article-title>. <source><italic>Data Mining and Knowledge Discovery</italic></source>, <volume>2</volume>: <fpage>121</fpage>–<lpage>167</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Cairano</surname> <given-names>SD</given-names></string-name>, <string-name><surname>Brand</surname> <given-names>M</given-names></string-name>, <string-name><surname>Bortoff</surname> <given-names>SA</given-names></string-name> (<year>2013</year>). <article-title>Projection-free parallel quadratic programming for linear model predictive control</article-title>. <source><italic>International Journal of Control</italic></source>, <volume>86</volume>(<issue>8</issue>): <fpage>1367</fpage>–<lpage>1385</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_006">
<mixed-citation publication-type="chapter"> <string-name><surname>Chamasemani</surname> <given-names>FF</given-names></string-name>, <string-name><surname>Singh</surname> <given-names>YP</given-names></string-name> (<year>2011</year>). <chapter-title>Multi-class support vector machine (SVM) classifiers – an application in hypothyroid detection and classification</chapter-title>. In: <source><italic>Proceedings of the Sixth International Conference on Bio-Inspired Computing: Theories and Applications</italic></source> (<string-name><given-names>R</given-names> <surname>Abdullah</surname></string-name>, ed.), <fpage>351</fpage>–<lpage>356</lpage>. <publisher-loc>Penang, Malaysia</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_007">
<mixed-citation publication-type="chapter"> <string-name><surname>Chen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Guestrin</surname> <given-names>C</given-names></string-name> (<year>2016</year>). <chapter-title>XGBoost: A scalable tree boosting system</chapter-title>. In: <source><italic>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</italic></source> (<string-name><given-names>A</given-names> <surname>Smola</surname></string-name>, <string-name><given-names>C</given-names> <surname>Aggarwal</surname></string-name>, <string-name><given-names>D</given-names> <surname>Shen</surname></string-name>, <string-name><given-names>R</given-names> <surname>Rastogi</surname></string-name>, eds.), In: <series><italic>KDD ’16</italic></series>, <fpage>785</fpage>–<lpage>794</lpage>. <publisher-name>ACM</publisher-name>, <publisher-loc>New York, New York, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Crammer</surname> <given-names>K</given-names></string-name>, <string-name><surname>Singer</surname> <given-names>Y</given-names></string-name> (<year>2001</year>). <article-title>On the algorithmic implementation of multiclass kernel-based vector machines</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>2</volume>: <fpage>265</fpage>–<lpage>292</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_009">
<mixed-citation publication-type="book"> <string-name><surname>Cristianini</surname> <given-names>N</given-names></string-name>, <string-name><surname>Shawe-Taylor</surname> <given-names>J</given-names></string-name> (<year>2000</year>). <source><italic>An Introduction to Support Vector Machines and other Kernel-based Learning Methods</italic></source>. <publisher-name>Cambridge University Press</publisher-name>, <publisher-loc>Cambridge, UK</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Ding</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>Y</given-names></string-name> (<year>2019</year>). <article-title>A review on multi-class TWSVM</article-title>. <source><italic>Artificial Intelligence Review</italic></source>, <volume>52</volume>(<issue>2</issue>): <fpage>775</fpage>–<lpage>801</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_011">
<mixed-citation publication-type="other"> <string-name><surname>Dua</surname> <given-names>D</given-names></string-name>, <string-name><surname>Graff</surname> <given-names>C</given-names></string-name> (<year>2019</year>). UCI Machine Learning Repository. <uri>http://archive.ics.uci.edu/ml</uri>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Dudoit</surname> <given-names>S</given-names></string-name>, <string-name><surname>Fridlyand</surname> <given-names>J</given-names></string-name>, <string-name><surname>Speed</surname> <given-names>TP</given-names></string-name> (<year>2002</year>). <article-title>Comparison of discrimination methods for the classification of tumors using gene expression data</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>97</volume>(<issue>457</issue>): <fpage>77</fpage>–<lpage>87</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_013">
<mixed-citation publication-type="chapter"> <string-name><surname>Guo</surname> <given-names>C</given-names></string-name>, <string-name><surname>Pleiss</surname> <given-names>G</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Weinberger</surname> <given-names>KQ</given-names></string-name> (<year>2017</year>). <chapter-title>On calibration of modern neural networks</chapter-title>. In: <source><italic>Proceedings of the 34th International Conference on Machine Learning</italic></source> (<string-name><given-names>D</given-names> <surname>Precup</surname></string-name>, <string-name><given-names>YW</given-names> <surname>Teh</surname></string-name>, eds.), volume <volume>70</volume>, <fpage>1321</fpage>–<lpage>1330</lpage>. <publisher-loc>Sydney, Australia</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_014">
<mixed-citation publication-type="book"> <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name> (<year>2009</year>). <source><italic>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</italic></source>. <publisher-name>Springer</publisher-name>, <publisher-loc>New York, New York, USA</publisher-loc>. <comment>2nd edition</comment>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Herbei</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wegkamp</surname> <given-names>MH</given-names></string-name> (<year>2006</year>). <article-title>Classification with reject option</article-title>. <source><italic>Canadian Journal of Statistics</italic></source>, <volume>34</volume>(<issue>4</issue>): <fpage>709</fpage>–<lpage>721</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_016">
<mixed-citation publication-type="chapter"> <string-name><surname>Ho</surname> <given-names>TK</given-names></string-name> (<year>1995</year>). <chapter-title>Random decision forests</chapter-title>. In: <source><italic>Proceedings of the Third International Conference on Document Analysis and Recognition</italic></source> (<string-name><given-names>CY</given-names> <surname>Suen</surname></string-name>, ed.), volume <volume>1</volume>, <fpage>278</fpage>–<lpage>282</lpage>. <publisher-loc>Montreal, Canada</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_017">
<mixed-citation publication-type="chapter"> <string-name><surname>Horton</surname> <given-names>P</given-names></string-name>, <string-name><surname>Nakai</surname> <given-names>K</given-names></string-name> (<year>1996</year>). <chapter-title>A probabilistic classification system for predicting the cellular localization sites of proteins</chapter-title>. In: <source><italic>Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology</italic></source> (<string-name><given-names>DJ</given-names> <surname>States</surname></string-name>, <string-name><given-names>P</given-names> <surname>Agarwal</surname></string-name>, <string-name><given-names>T</given-names> <surname>Gaasterland</surname></string-name>, <string-name><given-names>L</given-names> <surname>Hunter</surname></string-name>, <string-name><given-names>RF</given-names> <surname>Smith</surname></string-name>, eds.), <fpage>109</fpage>–<lpage>115</lpage>. <publisher-loc>St. Louis, Missouri, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Huang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Du</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Perou</surname> <given-names>CM</given-names></string-name>, <string-name><surname>Hayes</surname> <given-names>DN</given-names></string-name>, <string-name><surname>Todd</surname> <given-names>MJ</given-names></string-name>, <etal>et al.</etal> (<year>2013</year>). <article-title>Multiclass distance-weighted discrimination</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>22</volume>(<issue>4</issue>): <fpage>953</fpage>–<lpage>969</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Islam</surname> <given-names>R</given-names></string-name>, <string-name><surname>Khan</surname> <given-names>SA</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>JM</given-names></string-name> (<year>2016</year>). <article-title>Discriminant feature distribution analysis-based hybrid feature selection for online bearing fault diagnosis in induction motors</article-title>. <source><italic>Journal of Sensors</italic></source>, <volume>2016</volume>: <fpage>1</fpage>–<lpage>16</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_020">
<mixed-citation publication-type="chapter"> <string-name><surname>Kallas</surname> <given-names>M</given-names></string-name>, <string-name><surname>Francis</surname> <given-names>C</given-names></string-name>, <string-name><surname>Kanaan</surname> <given-names>L</given-names></string-name>, <string-name><surname>Merheb</surname> <given-names>D</given-names></string-name>, <string-name><surname>Honeine</surname> <given-names>P</given-names></string-name>, <string-name><surname>Amoud</surname> <given-names>H</given-names></string-name> (<year>2012</year>). <chapter-title>Multi-class SVM classification combined with kernel PCA feature extraction of ECG signals</chapter-title>. In: <source><italic>Proceedings of the 19th International Conference on Telecommunications</italic></source> (<string-name><given-names>H</given-names> <surname>Abumarshoud</surname></string-name>, <string-name><given-names>A</given-names> <surname>Shojaeifard</surname></string-name>, <string-name><given-names>H</given-names> <surname>Aghvami</surname></string-name>, <string-name><given-names>F</given-names> <surname>Marvasti</surname></string-name>, eds.), <fpage>1</fpage>–<lpage>5</lpage>. <publisher-loc>Jounieh, Lebanon</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Kimeldorf</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wahba</surname> <given-names>G</given-names></string-name> (<year>1971</year>). <article-title>Some results on Tchebycheffian spline functions</article-title>. <source><italic>Journal of Mathematical Analysis and Applications</italic></source>, <volume>33</volume>: <fpage>82</fpage>–<lpage>95</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Krawczyk</surname> <given-names>B</given-names></string-name>, <string-name><surname>Woźniak</surname> <given-names>M</given-names></string-name>, <string-name><surname>Cyganek</surname> <given-names>B</given-names></string-name> (<year>2014</year>). <article-title>Clustering-based ensembles for one-class classification</article-title>. <source><italic>Information Sciences</italic></source>, <volume>264</volume>: <fpage>182</fpage>–<lpage>195</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_023">
<mixed-citation publication-type="journal"> <string-name><surname>Lee</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wahba</surname> <given-names>G</given-names></string-name> (<year>2004</year>). <article-title>Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>99</volume>: <fpage>67</fpage>–<lpage>81</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_024">
<mixed-citation publication-type="chapter"> <string-name><surname>Lei</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Dogan</surname> <given-names>U</given-names></string-name>, <string-name><surname>Binder</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kloft</surname> <given-names>M</given-names></string-name> (<year>2015</year>). <chapter-title>Multi-class SVMs: from tighter data-dependent generalization bounds to novel algorithms</chapter-title>. In: <source><italic>Proceedings of the 28th International Conference on Neural Information Processing Systems</italic></source> (<string-name><given-names>C</given-names> <surname>Cortes</surname></string-name>, <string-name><given-names>DD</given-names> <surname>Lee</surname></string-name>, <string-name><given-names>M</given-names> <surname>Sugiyama</surname></string-name>, <string-name><given-names>R</given-names> <surname>Garnett</surname></string-name>, eds.), volume <volume>2</volume>, <fpage>2035</fpage>–<lpage>2043</lpage>. <publisher-loc>Montreal, Canada</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Lin</surname> <given-names>Y</given-names></string-name> (<year>2002</year>). <article-title>Support vector machines and the Bayes rule in classification</article-title>. <source><italic>Data Mining and Knowledge Discovery</italic></source>, <volume>6</volume>: <fpage>259</fpage>–<lpage>275</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_026">
<mixed-citation publication-type="chapter"> <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2007</year>). <chapter-title>Fisher consistency of multicategory support vector machines</chapter-title>. In: <source><italic>Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics</italic></source> (<string-name><given-names>M</given-names> <surname>Meila</surname></string-name>, <string-name><given-names>X</given-names> <surname>Shen</surname></string-name>, eds.), <fpage>291</fpage>–<lpage>298</lpage>. <publisher-loc>San Juan, Puerto Rico</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_027">
<mixed-citation publication-type="journal"> <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yuan</surname> <given-names>M</given-names></string-name> (<year>2011</year>). <article-title>Reinforced multicategory support vector machine</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>20</volume>: <fpage>901</fpage>–<lpage>919</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_028">
<mixed-citation publication-type="book"> <string-name><surname>McCullagh</surname> <given-names>P</given-names></string-name>, <string-name><surname>Nelder</surname> <given-names>J</given-names></string-name> (<year>1989</year>). <source><italic>Generalized Linear Models</italic></source>. <publisher-name>Chapman and Hall</publisher-name>, <publisher-loc>London, UK</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_029">
<mixed-citation publication-type="chapter"> <string-name><surname>Mezzoudj</surname> <given-names>F</given-names></string-name>, <string-name><surname>Benyettou</surname> <given-names>A</given-names></string-name> (<year>2012</year>). <chapter-title>On the optimization of multiclass support vector machines dedicated to speech recognition</chapter-title>. In: <source><italic>Proceedings of the 19th International Conference on Neural Information Processing</italic></source> (<string-name><given-names>T</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>Z</given-names> <surname>Zeng</surname></string-name>, <string-name><given-names>C</given-names> <surname>Li</surname></string-name>, <string-name><given-names>CS</given-names> <surname>Leung</surname></string-name>, eds.), volume <volume>2</volume>, <fpage>1</fpage>–<lpage>8</lpage>. <publisher-loc>Berlin, Germany</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_030">
<mixed-citation publication-type="chapter"> <string-name><surname>Minderer</surname> <given-names>M</given-names></string-name>, <string-name><surname>Djolonga</surname> <given-names>J</given-names></string-name>, <string-name><surname>Romijnders</surname> <given-names>R</given-names></string-name>, <string-name><surname>Hubis</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zhai</surname> <given-names>X</given-names></string-name>, <string-name><surname>Houlsby</surname> <given-names>N</given-names></string-name>, <etal>et al.</etal> (<year>2021</year>). <chapter-title>Revisiting the calibration of modern neural networks</chapter-title>. In: <source><italic>Proceedings of the 35th Advances in Neural Information Processing Systems</italic></source> (<string-name><given-names>M</given-names> <surname>Ranzato</surname></string-name>, <string-name><given-names>A</given-names> <surname>Beygelzimer</surname></string-name>, <string-name><given-names>Y</given-names> <surname>Dauphin</surname></string-name>, <string-name><given-names>PS</given-names> <surname>Liang</surname></string-name>, <string-name><given-names>JW</given-names> <surname>Vaughan</surname></string-name>, eds.), volume <volume>34</volume>, <fpage>15682</fpage>–<lpage>15694</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_031">
<mixed-citation publication-type="journal"> <string-name><surname>Rifkin</surname> <given-names>R</given-names></string-name>, <string-name><surname>Klautau</surname> <given-names>A</given-names></string-name> (<year>2004</year>). <article-title>In defense of one-vs-all classification</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>5</volume>: <fpage>101</fpage>–<lpage>141</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_032">
<mixed-citation publication-type="journal"> <string-name><surname>Saigal</surname> <given-names>P</given-names></string-name>, <string-name><surname>Khanna</surname> <given-names>V</given-names></string-name> (<year>2020</year>). <article-title>Multi-category news classification using support vector machine based classifiers</article-title>. <source><italic>SN Applied Sciences</italic></source>, <volume>2</volume>(<issue>3</issue>): <fpage>458</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_033">
<mixed-citation publication-type="journal"> <string-name><surname>Tomar</surname> <given-names>D</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>S</given-names></string-name> (<year>2015</year>). <article-title>A comparison on multi-class classification methods based on least squares twin support vector machine</article-title>. <source><italic>Knowledge-Based Systems</italic></source>, <volume>81</volume>: <fpage>131</fpage>–<lpage>147</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_034">
<mixed-citation publication-type="book"> <string-name><surname>Vapnik</surname> <given-names>V</given-names></string-name> (<year>1998</year>). <source><italic>Statistical Learning Theory</italic></source>. <publisher-name>Wiley</publisher-name>, <publisher-loc>New York, New York, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_035">
<mixed-citation publication-type="book"> <string-name><surname>Wahba</surname> <given-names>G</given-names></string-name> (<year>1990</year>). <source><italic>Spline Models for Observational Data</italic></source>. <series><italic>CBMS-NSF Regional Conference Series in Applied Mathematics</italic></series>. <publisher-name>SIAM</publisher-name>, <publisher-loc>Philadelphia, Pennsylvania, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_036">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>X</given-names></string-name> (<year>2006</year>). <article-title>Estimation of generalization error: random and fixed inputs</article-title>. <source><italic>Statistica Sinica</italic></source>, <volume>16</volume>(<issue>2</issue>): <fpage>569</fpage>–<lpage>588</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_037">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>X</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2008</year>). <article-title>Probability estimation for large margin classifiers</article-title>. <source><italic>Biometrika</italic></source>, <volume>95</volume>: <fpage>149</fpage>–<lpage>167</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_038">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>X</given-names></string-name> (<year>2007</year>). <article-title>On <inline-formula id="j_jds1069_ineq_001"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${L_{1}}$]]></tex-math></alternatives></inline-formula>-norm multiclass support vector machines</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>102</volume>: <fpage>583</fpage>–<lpage>594</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_039">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>HH</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name> (<year>2019</year>). <article-title>Multiclass probability estimation with support vector machines</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>28</volume>(<issue>3</issue>): <fpage>586</fpage>–<lpage>595</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_040">
<mixed-citation publication-type="chapter"> <string-name><surname>Weston</surname> <given-names>J</given-names></string-name>, <string-name><surname>Watkins</surname> <given-names>C</given-names></string-name> (<year>1999</year>). <chapter-title>Support vector machines for multi-class pattern recognition</chapter-title>. In: <source><italic>Proceedings of the Seventh European Symposium on Artificial Neural Networks</italic></source> (<string-name><given-names>M</given-names> <surname>Gori</surname></string-name>, ed.), <fpage>21</fpage>–<lpage>23</lpage>. <publisher-loc>Bruges, Belgium</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_041">
<mixed-citation publication-type="journal"> <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>HH</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2010</year>). <article-title>Robust model-free multiclass probability estimation</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>105</volume>: <fpage>424</fpage>–<lpage>436</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_042">
<mixed-citation publication-type="journal"> <string-name><surname>Ye</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Tse</surname> <given-names>E</given-names></string-name> (<year>1989</year>). <article-title>An extension of Karmarkar’s projective algorithm for convex quadratic programming</article-title>. <source><italic>Mathematical Programming</italic></source>, <volume>44</volume>(<issue>1–3</issue>): <fpage>157</fpage>–<lpage>179</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_043">
<mixed-citation publication-type="journal"> <string-name><surname>Yeoh</surname> <given-names>EJ</given-names></string-name>, <string-name><surname>Ross</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Shurtleff</surname> <given-names>SA</given-names></string-name>, <string-name><surname>Williams</surname> <given-names>WK</given-names></string-name>, <string-name><surname>Patel</surname> <given-names>D</given-names></string-name>, <string-name><surname>Mahfouz</surname> <given-names>R</given-names></string-name>, <etal>et al.</etal> (<year>2002</year>). <article-title>Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling</article-title>. <source><italic>Cancer Cell</italic></source>, <volume>1</volume>(<issue>2</issue>): <fpage>133</fpage>–<lpage>143</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_044">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2013</year>). <article-title>Multicategory large-margin unified machines</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>14</volume>: <fpage>1349</fpage>–<lpage>1386</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_045">
<mixed-citation publication-type="journal"> <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name> (<year>2005</year>). <article-title>Kernel logistic regression and the import vector machine</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>14</volume>: <fpage>185</fpage>–<lpage>205</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1069_ref_046">
<mixed-citation publication-type="chapter"> <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Rosset</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name> (<year>2003</year>). <chapter-title>1-norm support vector machines</chapter-title>. In: <source><italic>Proceedings of the 16th International Conference on Neural Information Processing Systems</italic></source> (<string-name><given-names>S</given-names> <surname>Thrun</surname></string-name>, <string-name><given-names>LK</given-names> <surname>Saul</surname></string-name>, <string-name><given-names>B</given-names> <surname>Schölkopf</surname></string-name>, eds.), <fpage>49</fpage>–<lpage>56</lpage>. <publisher-loc>Whistler, Canada</publisher-loc>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
