Journal of Data Science


Tuning Support Vector Machines and Boosted Trees Using Optimization Algorithms
Volume 22, Issue 4 (2024), pp. 575–590
Jill F. Lundell
https://doi.org/10.6339/23-JDS1106
Pub. online: 5 July 2023    Type: Computing in Data Science    Open Access

Received: 17 March 2023
Accepted: 29 May 2023
Published: 5 July 2023

Abstract

Statistical learning methods have been growing in popularity in recent years. Many of these procedures have parameters that must be tuned for models to perform well. Tuning behavior has been studied extensively for neural networks, but much less so for many other learning methods. We examined the behavior of the tuning parameters for support vector machines, gradient boosting machines, and AdaBoost in both classification and regression settings. We used grid search to identify ranges of tuning parameters where good models can be found across many different datasets. We then explored different optimization algorithms for selecting a model across the tuning parameter space. Models selected by each optimization algorithm were compared with the best models obtained through grid search to identify well-performing algorithms. This information was used to create an R package, EZtune, that automatically tunes support vector machines and boosted trees.
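
To make the grid-search step concrete, the sketch below evaluates a support vector machine over a coarse grid of cost and gamma values with the e1071 package, which uses 10-fold cross-validation by default in tune.svm(). This is not the grid code used in the article; the iris data and the parameter ranges are illustrative choices.

    # Minimal grid-search sketch (illustrative ranges, not the article's grids).
    library(e1071)

    set.seed(1)
    grid <- tune.svm(
      Species ~ ., data = iris,        # illustrative dataset
      gamma = 2^seq(-8, 0, by = 2),    # 5 candidate gamma values
      cost  = 2^seq(-2, 10, by = 2)    # 7 candidate cost values
    )

    grid$best.parameters    # cost/gamma pair with the lowest cross-validation error
    grid$best.performance   # the corresponding error estimate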

Supplementary material

The following supplementary materials are available:

  • Appendix A: Description of the optimization algorithms.
  • Appendix B: Performance tables.
  • R package EZtune: implements autotuning of SVMs, GBMs, and AdaBoost using the Hooke-Jeeves algorithm and a genetic algorithm, and contains the Lichen and Mullein datasets used in the examples in the article. The package is available on CRAN, and updates are available at https://github.com/jillbo1000/EZtune (GNU zipped tar file). A brief usage sketch follows this list.
  • Code and data for creating grids and performing optimization tests: the code and data used to create the error and time response surfaces, and the code for testing the optimization algorithms, are available at https://github.com/jillbo1000/autotune.
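
As a concrete illustration of the package described above, the following is a minimal usage sketch. It assumes the eztune() and eztune_cv() interface documented on CRAN (arguments method, optimizer, and fast, with optimizer values "hjn" for Hooke-Jeeves and "ga" for the genetic algorithm); the mtcars regression example and all specific argument values are illustrative choices, not taken from the article.

    # Minimal sketch of automated tuning with EZtune; verify arguments
    # against ?eztune for the installed package version.
    library(EZtune)

    x <- mtcars[, -1]   # predictors (illustrative regression problem)
    y <- mtcars$mpg     # continuous response

    # Tune a gradient boosting machine with the Hooke-Jeeves algorithm.
    # fast = TRUE tunes on a subsample of the data to cut computation time.
    gbm_fit <- eztune(x, y, method = "gbm", optimizer = "hjn", fast = TRUE)

    # Tune a support vector machine with the genetic algorithm instead.
    svm_fit <- eztune(x, y, method = "svm", optimizer = "ga", fast = TRUE)

    # Cross-validated performance estimates for the tuned models.
    eztune_cv(x, y, gbm_fit, cross = 10)
    eztune_cv(x, y, svm_fit, cross = 10)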



Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
machine learning, optimization, R programming

