The Double Descent Behavior in Two Layer Neural Network for Binary Classification
Pub. online: 1 April 2025
Type: Miscellaneous
Open Access
Received: 1 October 2024
Accepted: 6 March 2025
Published: 1 April 2025
Abstract
Recent studies have observed a surprising phenomenon in model test error called double descent, in which increasing model complexity first decreases the test error, then increases it, and then decreases it once more. To study this, we work with a two-layer neural network with a ReLU activation function designed for binary classification under supervised learning. Our aim is to observe and investigate the mathematical theory behind the double descent behavior of the model test error for varying model sizes. We quantify the model size by the ratio of the number of training samples to the dimension of the model. Because of the complexity of the empirical risk minimization procedure, we use the Convex Gaussian Min-Max Theorem (CGMT) to find a suitable candidate for the global training loss.
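As an illustrative aside (not the authors' setup, whose curves come from the R code in Supplementary Material 2), the double descent shape described above can be reproduced with a small simulation: a two-layer ReLU network whose first layer is random and whose second layer is fit by minimum-norm least squares, with the test classification error tracked as the hidden width varies. All names and parameter values below (w_star, the grid of widths, the noise level) are assumptions chosen for illustration:

# Minimal double-descent sketch: two-layer ReLU network, random first
# layer, second layer fit by minimum-norm least squares via MASS::ginv
# (MASS ships with R). Illustrative only; not the paper's experiment.
set.seed(1)

n_train <- 100   # number of training samples
n_test  <- 2000  # number of test samples
d       <- 20    # input dimension

# Binary labels in {-1, +1} from a noisy linear teacher (an assumption)
w_star <- rnorm(d)
make_data <- function(n) {
  X <- matrix(rnorm(n * d), n, d)
  y <- sign(X %*% w_star + 0.5 * rnorm(n))
  list(X = X, y = y)
}
train <- make_data(n_train)
test  <- make_data(n_test)

relu <- function(z) pmax(z, 0)

# Test classification error as a function of hidden width m; the
# pseudoinverse gives the minimum-norm interpolating second layer,
# the regime where double descent is visible.
test_error <- function(m) {
  W <- matrix(rnorm(m * d) / sqrt(d), m, d)  # random first layer
  H_tr <- relu(train$X %*% t(W))
  H_te <- relu(test$X %*% t(W))
  a <- MASS::ginv(H_tr) %*% train$y          # min-norm least squares
  mean(sign(H_te %*% a) != test$y)
}

widths <- c(2, 5, 10, 25, 50, 75, 100, 150, 300, 1000)
errs <- sapply(widths, test_error)

plot(widths, errs, type = "b", log = "x",
     xlab = "hidden width m", ylab = "test error")

With these settings the error typically peaks near m ≈ n_train (the interpolation threshold) and falls again as the width grows past it; a single random draw of W gives a noisy curve, so averaging test_error over several seeds smooths it.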
Supplementary material
We have included two supplementary files: Supplementary Material 1 contains detailed calculations, theorems, and proofs, and Supplementary Material 2 contains the R/RStudio code used to draw the curves presented in the paper.