<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1138</article-id>
<article-id pub-id-type="doi">10.6339/24-JDS1138</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Computing in Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Unified Robust Boosting</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-0773-0052</contrib-id>
<name><surname>Wang</surname><given-names>Zhu</given-names></name><email xlink:href="mailto:zwang145@uthsc.edu">zwang145@uthsc.edu</email><xref ref-type="aff" rid="j_jds1138_aff_001">1</xref><xref ref-type="fn" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1138_aff_001"><label>1</label><institution>Department of Preventive Medicine, The University of Tennessee Health Science Center</institution>, Memphis, TN, <country>United States</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Email: <ext-link ext-link-type="uri" xlink:href="mailto:zwang145@uthsc.edu">zwang145@uthsc.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>28</day><month>6</month><year>2024</year></pub-date><volume>23</volume><issue>1</issue><fpage>90</fpage><lpage>108</lpage><supplementary-material id="S1" content-type="document" xlink:href="jds1138_s001.pdf" mimetype="application" mime-subtype="pdf">
<caption>
<title>Supplementary Material</title>
<p>The R code necessary to reproduce the analysis presented in the manuscript is provided.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>18</day><month>3</month><year>2024</year></date><date date-type="accepted"><day>22</day><month>4</month><year>2024</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Boosting is a popular supervised machine learning algorithm with wide applications in regression and classification. It combines weak learners, such as regression trees, to obtain accurate predictions. However, in the presence of outliers, traditional boosting may yield inferior results because the algorithm optimizes a convex loss function. Recent literature has proposed boosting algorithms that optimize robust nonconvex loss functions. Nevertheless, these methods lack a weighted estimation that indicates the outlier status of observations. This article introduces the iteratively reweighted boosting (IRBoost) algorithm, which combines robust loss optimization and weighted estimation. It can be conveniently constructed with existing software. The output includes observation weights that serve as valuable diagnostics for the outlier status of observations. For practitioners interested in the boosting algorithm, the new method can be interpreted as a way to tune robust observation weights. IRBoost is implemented in the <sans-serif>R</sans-serif> package <italic>irboost</italic> and is demonstrated using publicly available data in generalized linear models, classification, and survival data analysis.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>boosting</kwd>
<kwd>CC-family</kwd>
<kwd>IRBoost</kwd>
<kwd>IRCO</kwd>
<kwd>machine learning</kwd>
<kwd>robust method</kwd>
</kwd-group>
<funding-group><funding-statement>This work was partially supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under Award Number R21DK130006.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1138_reflist_001">
<title>References</title>
<ref id="j_jds1138_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Barnwal</surname> <given-names>A</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>H</given-names></string-name>, <string-name><surname>Hocking</surname> <given-names>T</given-names></string-name> (<year>2022</year>). <article-title>Survival regression with accelerated failure time model in XGBoost</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>31</volume>(<issue>4</issue>): <fpage>1292</fpage>–<lpage>1302</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/10618600.2022.2067548" xlink:type="simple">https://doi.org/10.1080/10618600.2022.2067548</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Bühlmann</surname> <given-names>P</given-names></string-name>, <string-name><surname>Hothorn</surname> <given-names>T</given-names></string-name> (<year>2007</year>). <article-title>Boosting algorithms: Regularization, prediction and model fitting (with discussion)</article-title>. <source><italic>Statistical Science</italic></source>, <volume>22</volume>(<issue>4</issue>): <fpage>477</fpage>–<lpage>505</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_003">
<mixed-citation publication-type="chapter"> <string-name><surname>Chen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Guestrin</surname> <given-names>C</given-names></string-name> (<year>2016</year>). <chapter-title>XGBoost: A scalable tree boosting system</chapter-title>. In: <source><italic>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</italic></source>, <fpage>785</fpage>–<lpage>794</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_004">
<mixed-citation publication-type="other"> <string-name><surname>Chen</surname> <given-names>T</given-names></string-name>, <string-name><surname>He</surname> <given-names>T</given-names></string-name>, <string-name><surname>Benesty</surname> <given-names>M</given-names></string-name>, <string-name><surname>Khotilovich</surname> <given-names>V</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>H</given-names></string-name>, et al. (<year>2024</year>). <italic>xgboost</italic>: Extreme Gradient Boosting. <sans-serif>R</sans-serif> package version 1.7.7.1.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name> (<year>2001</year>). <article-title>Greedy function approximation: A gradient boosting machine</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>29</volume>(<issue>5</issue>): <fpage>1189</fpage>–<lpage>1232</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1214/aos/1013203451" xlink:type="simple">https://doi.org/10.1214/aos/1013203451</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name> (<year>2000</year>). <article-title>Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors)</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>28</volume>(<issue>2</issue>): <fpage>337</fpage>–<lpage>407</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1214/aos/1016218223" xlink:type="simple">https://doi.org/10.1214/aos/1016218223</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_007">
<mixed-citation publication-type="book"> <string-name><surname>Heritier</surname> <given-names>S</given-names></string-name>, <string-name><surname>Cantoni</surname> <given-names>E</given-names></string-name>, <string-name><surname>Copt</surname> <given-names>S</given-names></string-name>, <string-name><surname>Victoria-Feser</surname> <given-names>MP</given-names></string-name> (<year>2009</year>). <source><italic>Robust Methods in Biostatistics</italic></source>, volume <volume>825</volume>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_008">
<mixed-citation publication-type="other"> <string-name><surname>Hothorn</surname> <given-names>T</given-names></string-name>, <string-name><surname>Bühlmann</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kneib</surname> <given-names>T</given-names></string-name>, <string-name><surname>Schmid</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hofner</surname> <given-names>B</given-names></string-name>, <string-name><surname>Otto-Sobotka</surname> <given-names>F</given-names></string-name>, et al. (<year>2023</year>). <italic>mboost</italic>: Model-Based Boosting. <sans-serif>R</sans-serif> package version 2.9-9.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Li</surname> <given-names>AH</given-names></string-name>, <string-name><surname>Bradic</surname> <given-names>J</given-names></string-name> (<year>2018</year>). <article-title>Boosting in the presence of outliers: Adaptive classification with nonconvex loss functions</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>113</volume>(<issue>522</issue>): <fpage>660</fpage>–<lpage>674</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Long</surname> <given-names>PM</given-names></string-name>, <string-name><surname>Servedio</surname> <given-names>RA</given-names></string-name> (<year>2010</year>). <article-title>Random classification noise defeats all convex potential boosters</article-title>. <source><italic>Machine Learning</italic></source>, <volume>78</volume>(<issue>3</issue>): <fpage>287</fpage>–<lpage>304</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10994-009-5165-z" xlink:type="simple">https://doi.org/10.1007/s10994-009-5165-z</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_011">
<mixed-citation publication-type="chapter"> <string-name><surname>Mairal</surname> <given-names>J</given-names></string-name> (<year>2013</year>). <chapter-title>Stochastic majorization-minimization algorithms for large-scale optimization</chapter-title>. In: <source><italic>Advances in Neural Information Processing Systems 26 (NIPS 2013)</italic></source>, <fpage>2283</fpage>–<lpage>2291</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_012">
<mixed-citation publication-type="book"> <string-name><surname>Maronna</surname> <given-names>RA</given-names></string-name>, <string-name><surname>Martin</surname> <given-names>RD</given-names></string-name>, <string-name><surname>Yohai</surname> <given-names>VJ</given-names></string-name>, <string-name><surname>Salibián-Barrera</surname> <given-names>M</given-names></string-name> (<year>2019</year>). <source><italic>Robust Statistics: Theory and Methods (with R)</italic></source>. <publisher-name>John Wiley &amp; Sons</publisher-name>, <publisher-loc>Hoboken, NJ</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Park</surname> <given-names>SY</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2011</year>). <article-title>Robust penalized logistic regression with truncated loss functions</article-title>. <source><italic>Canadian Journal of Statistics</italic></source>, <volume>39</volume>(<issue>2</issue>): <fpage>300</fpage>–<lpage>323</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/cjs.10105" xlink:type="simple">https://doi.org/10.1002/cjs.10105</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Sigrist</surname> <given-names>F</given-names></string-name> (<year>2021</year>). <article-title>Gradient and Newton boosting for classification and regression</article-title>. <source><italic>Expert Systems with Applications</italic></source>, <volume>167</volume>: <fpage>114080</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.eswa.2020.114080" xlink:type="simple">https://doi.org/10.1016/j.eswa.2020.114080</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name> (<year>2018</year>a). <article-title>Quadratic majorization for nonconvex loss with applications to the boosting algorithm</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>27</volume>(<issue>3</issue>): <fpage>491</fpage>–<lpage>502</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/10618600.2018.1424635" xlink:type="simple">https://doi.org/10.1080/10618600.2018.1424635</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name> (<year>2018</year>b). <article-title>Robust boosting with truncated loss functions</article-title>. <source><italic>Electronic Journal of Statistics</italic></source>, <volume>12</volume>(<issue>1</issue>): <fpage>599</fpage>–<lpage>650</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1214/18-EJS1434" xlink:type="simple">https://doi.org/10.1214/18-EJS1434</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_017">
<mixed-citation publication-type="other"> <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name> (<year>2024</year>a). <italic>irboost</italic>: Iteratively Reweighted Boosting for Robust Analysis. <sans-serif>R</sans-serif> package version 0.1-15.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name> (<year>2024</year>b). <article-title>Unified robust estimation</article-title>. <source><italic>Australian &amp; New Zealand Journal of Statistics</italic></source>, <volume>66</volume>(<issue>1</issue>): <fpage>77</fpage>–<lpage>102</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/anzs.12409" xlink:type="simple">https://doi.org/10.1111/anzs.12409</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_019">
<mixed-citation publication-type="other"> <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Hothorn</surname> <given-names>T</given-names></string-name> (<year>2023</year>). <italic>bst</italic>: Gradient Boosting. <sans-serif>R</sans-serif> package version 0.3-24.</mixed-citation>
</ref>
<ref id="j_jds1138_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name> (<year>2007</year>). <article-title>Robust truncated hinge loss support vector machines</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>102</volume>(<issue>479</issue>): <fpage>974</fpage>–<lpage>983</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1198/016214507000000617" xlink:type="simple">https://doi.org/10.1198/016214507000000617</ext-link></mixed-citation>
</ref>
<ref id="j_jds1138_ref_021">
<mixed-citation publication-type="chapter"> <string-name><surname>Zhao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Mammadov</surname> <given-names>M</given-names></string-name>, <string-name><surname>Yearwood</surname> <given-names>J</given-names></string-name> (<year>2010</year>). <chapter-title>From convex to nonconvex: A loss function analysis for binary classification</chapter-title>. In: <source><italic>2010 IEEE International Conference on Data Mining Workshops</italic></source>, <fpage>1281</fpage>–<lpage>1288</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
