Journal of Data Science

Submit your article Information
  • Article info
  • More
    Article info

The Double Descent Behavior in Two Layer Neural Network for Binary Classification
Volume 23, Issue 2 (2025): Special Issue: the 2024 Symposium on Data Science and Statistics (SDSS), pp. 370–388
Chathurika S. Abeykoon, Aleksandr Beknazaryan, Hailin Sang

https://doi.org/10.6339/25-JDS1175
Pub. online: 1 April 2025    Type: Statistical Data Science    Open Access

Received: 1 October 2024
Accepted: 6 March 2025
Published: 1 April 2025

Abstract

Recent studies have observed a surprising phenomenon in model test error, called double descent, in which increasing model complexity first decreases the test error, after which the error increases and then decreases again. To observe this, we work with a two-layer neural network with a ReLU activation function designed for binary classification under supervised learning. Our aim is to observe and investigate the mathematical theory behind the double descent behavior of the model test error for varying model sizes. We quantify the model size by the ratio of the number of training samples to the dimension of the model. Because of the complexity of the empirical risk minimization procedure, we use the Convex Gaussian Min-max Theorem to find a suitable candidate for the global training loss.
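The abstract describes the setup only in words; as a rough illustration of the kind of test-error curve it refers to, below is a minimal R sketch (R is chosen only because the supplement provides R/RStudio code; this does not reproduce the paper's analysis). It trains a simplified two-layer ReLU network in the random-features regime on synthetic binary labels: the first-layer weights are drawn at random and held fixed, and only the second layer is fit by a minimum-norm least-squares solution via MASS::ginv. The data-generating rule, the label-noise level, the grid of hidden widths, and this training procedure are all illustrative assumptions, not the authors' empirical risk minimization analyzed through the Convex Gaussian Min-max Theorem.

```r
# Minimal sketch (not the paper's model): double descent of test error for a
# two-layer ReLU network in the random-features regime, binary classification.
# First-layer weights are random and fixed; the second layer is fit by a
# minimum-norm least-squares solution (MASS ships with base R).

set.seed(1)

n_train <- 200                                   # training sample size
n_test  <- 2000                                  # test sample size
d       <- 20                                    # input dimension
widths  <- c(10, 50, 100, 150, 180, 200, 220, 300, 500, 1000)  # hidden widths

relu <- function(z) pmax(z, 0)

# Synthetic binary labels from a noisy linear rule, with a fraction flipped
make_data <- function(n, d, beta, flip = 0.1) {
  X <- matrix(rnorm(n * d), n, d)
  y <- sign(X %*% beta + rnorm(n))
  flip_idx <- runif(n) < flip
  y[flip_idx] <- -y[flip_idx]
  list(X = X, y = as.numeric(y))
}

beta  <- rnorm(d) / sqrt(d)
train <- make_data(n_train, d, beta)
test  <- make_data(n_test,  d, beta)

test_error <- sapply(widths, function(p) {
  W <- matrix(rnorm(p * d) / sqrt(d), p, d)      # fixed random first layer
  H_train <- relu(train$X %*% t(W))              # hidden representations
  H_test  <- relu(test$X  %*% t(W))
  # Pseudoinverse gives the minimum-norm fit in both the under- and
  # over-parameterized regimes
  a <- MASS::ginv(H_train) %*% train$y
  pred <- sign(H_test %*% a)
  mean(pred != test$y)                           # classification test error
})

plot(n_train / widths, test_error, type = "b", log = "x",
     xlab = "ratio of training samples to hidden width (n / p)",
     ylab = "test classification error",
     main = "Double descent sketch (random-features two-layer ReLU network)")
```

Fixing the first layer and solving for the second layer in closed form keeps the sketch short and makes the interpolation threshold explicit: plotted against the ratio of training samples to hidden width, the test error peaks near a ratio of one and falls again as the network becomes strongly over-parameterized.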

Supplementary material

We include two supplementary files: Supplementary Material 1 contains detailed calculations, theorems, and proofs, and Supplementary Material 2 contains the R/RStudio code used to draw the curves presented in the paper.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
generalization error, model complexity, over and under parameterization, ReLU activation, testing error

Funding
The research of Hailin Sang is partially supported by the Simons Foundation Grant 586789, USA.

