Journal of Data Science

Unified Robust Boosting
Volume 23, Issue 1 (2025), pp. 90–108
Zhu Wang
https://doi.org/10.6339/24-JDS1138
Pub. online: 28 June 2024 · Type: Computing in Data Science · Open Access

Received: 18 March 2024
Accepted: 22 April 2024
Published: 28 June 2024

Abstract

Boosting is a popular algorithm in supervised machine learning with wide applications in regression and classification problems. It combines weak learners, such as regression trees, to obtain accurate predictions. However, in the presence of outliers, traditional boosting may yield inferior results since the algorithm optimizes a convex loss function. Recent literature has proposed boosting algorithms that optimize robust nonconvex loss functions. Nevertheless, these methods lack the weighted estimation needed to indicate the outlier status of individual observations. This article introduces the iteratively reweighted boosting (IRBoost) algorithm, which combines robust loss optimization and weighted estimation. It can be conveniently constructed with existing software. The output includes weights that serve as valuable diagnostics of the outlier status of observations. For practitioners interested in the boosting algorithm, the new method can be interpreted as a way to tune robust observation weights. IRBoost is implemented in the R package irboost and is demonstrated using publicly available data in generalized linear models, classification, and survival data analysis.
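The abstract describes IRBoost as a reweighting scheme that can be constructed with existing boosting software. The R sketch below illustrates that idea only in broad strokes: it uses the xgboost package rather than the irboost package's own interface, and the concave component g, the tuning value sigma, and the number of reweighting passes are illustrative assumptions, not choices taken from the paper. Each pass fits a weighted boosting model, evaluates the convex loss per observation, and recomputes the weights from the derivative of g so that observations with large losses are downweighted.

# Minimal sketch of iteratively reweighted boosting on simulated data.
# Assumptions (not from the paper): exponential-type concave component
# g(z) = sigma * (1 - exp(-z / sigma)), sigma = 2, five reweighting passes.
library(xgboost)

set.seed(1)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + rnorm(n)
y[1:10] <- y[1:10] + 8                    # contaminate a few observations

sigma  <- 2                               # robustness tuning parameter (assumed)
gprime <- function(z) exp(-z / sigma)     # derivative of the concave component g

w <- rep(1, n)                            # start from unit observation weights
for (k in 1:5) {
  dtrain <- xgb.DMatrix(data = X, label = y)
  setinfo(dtrain, "weight", w)            # weighted estimation step
  fit <- xgb.train(params = list(objective = "reg:squarederror",
                                 max_depth = 2, eta = 0.1),
                   data = dtrain, nrounds = 100, verbose = 0)
  loss <- 0.5 * (y - predict(fit, X))^2   # convex loss per observation
  w <- gprime(loss)                       # downweight large-loss observations
}

head(order(w), 10)                        # smallest weights flag likely outliers

The final weights play the diagnostic role described in the abstract: observations that end up with the smallest weights are the ones the procedure treats as potential outliers.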

Supplementary material

The R code necessary to reproduce the analysis presented in the manuscript is provided.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
boosting, CC-family, IRBoost, IRCO, machine learning, robust method

Funding
This work was partially supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under Award Number R21DK130006.

