<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1175</article-id>
<article-id pub-id-type="doi">10.6339/25-JDS1175</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>The Double Descent Behavior in Two Layer Neural Network for Binary Classification</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0008-3914-5107</contrib-id>
<name><surname>Abeykoon</surname><given-names>Chathurika S.</given-names></name><email xlink:href="mailto:abeykoonc@rhodes.edu">abeykoonc@rhodes.edu</email><xref ref-type="aff" rid="j_jds1175_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-3110-4799</contrib-id>
<name><surname>Beknazaryan</surname><given-names>Aleksandr</given-names></name><xref ref-type="aff" rid="j_jds1175_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-9155-4636</contrib-id>
<name><surname>Sang</surname><given-names>Hailin</given-names></name><xref ref-type="aff" rid="j_jds1175_aff_003">3</xref>
</contrib>
<aff id="j_jds1175_aff_001"><label>1</label>Department of Mathematics, 2000 North Pkwy, <institution>Rhodes College</institution>, Memphis, TN, 38112, <country>United States</country></aff>
<aff id="j_jds1175_aff_002"><label>2</label>Department of Mathematical Sciences, <institution>University of Cincinnati</institution>, Cincinnati, OH 45221, <country>United States</country></aff>
<aff id="j_jds1175_aff_003"><label>3</label>Department of Mathematics, <institution>University of Mississippi</institution>, University, MS, 38677, <country>United States</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:abeykoonc@rhodes.edu">abeykoonc@rhodes.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>1</day><month>4</month><year>2025</year></pub-date><volume>23</volume><issue>2</issue><fpage>370</fpage><lpage>388</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1175_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>We include two supplementary files: Supplementary Material 1 contains detailed calculations, theorems, and proofs, and Supplementary Material 2 contains the R/RStudio code used to draw the curves presented in the paper.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>1</day><month>10</month><year>2024</year></date><date date-type="accepted"><day>6</day><month>3</month><year>2025</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Recent studies have observed a surprising phenomenon in model test error called double descent, where increasing model complexity first decreases the test error, after which the error increases and then decreases again. To observe this, we work with a two-layer neural network model with a ReLU activation function designed for binary classification under supervised learning. Our aim is to observe and investigate the mathematical theory behind the double descent behavior of the model test error for varying model sizes. We quantify the model size by the ratio of the number of training samples to the dimension of the model. Due to the complexity of the empirical risk minimization procedure, we use the Convex Gaussian Min-Max Theorem to find a suitable candidate for the global training loss.</p>
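<p>As a qualitative illustration only (a minimal sketch, not the authors’ experiment), the following R code trains a two-layer ReLU network by full-batch gradient descent on a simple Gaussian-mixture binary classification task and traces the empirical test error against the ratio of training samples to input dimension. The Gaussian class-mean data model, the squared loss, and all hyperparameters below are illustrative assumptions; the exact setup and the code behind the curves in the paper are given in Supplementary Material 2.</p>
<preformat>
## Minimal sketch (illustrative assumptions throughout): test error of a
## two-layer ReLU network for binary classification, traced over n/d.
relu = function(z) pmax(z, 0)

run_once = function(n, d, k = 20, steps = 400, lr = 0.1, n_test = 2000) {
  mu = rep(1, d) / sqrt(d)                       # assumed class-mean direction
  y  = matrix(sample(c(-1, 1), n, replace = TRUE), n, 1)
  X  = y %*% t(mu) + matrix(rnorm(n * d), n, d)  # n x d Gaussian-mixture design
  W  = matrix(rnorm(d * k), d, k) / sqrt(d)      # hidden-layer weights
  v  = matrix(rnorm(k), k, 1) / sqrt(k)          # output-layer weights
  for (s in seq_len(steps)) {                    # full-batch GD on squared loss
    Z  = X %*% W
    H  = relu(Z)
    r  = (H %*% v - y) / n                       # scaled residual
    GW = t(X) %*% ((r %*% t(v)) * (Z > 0))       # gradient wrt W (ReLU mask)
    Gv = t(H) %*% r                              # gradient wrt v
    W  = W - lr * GW
    v  = v - lr * Gv
  }
  yt = matrix(sample(c(-1, 1), n_test, replace = TRUE), n_test, 1)
  Xt = yt %*% t(mu) + matrix(rnorm(n_test * d), n_test, d)
  mean(sign(relu(Xt %*% W) %*% v) != yt)         # empirical test 0-1 error
}

set.seed(1)
ratios = c(0.25, 0.5, 1, 2, 4, 8)                # n/d values to scan
errs   = sapply(ratios, function(r) run_once(n = round(r * 100), d = 100))
plot(ratios, errs, type = "b", log = "x", xlab = "n / d", ylab = "test error")
</preformat>
<p>Whether and where a double descent peak is visible in such a simulation depends on the network width, the dimension, and the training details; the paper characterizes this behavior theoretically.</p>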
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>generalization error</kwd>
<kwd>model complexity</kwd>
<kwd>over- and under-parameterization</kwd>
<kwd>ReLU activation</kwd>
<kwd>testing error</kwd>
</kwd-group>
<funding-group><funding-statement>The research of Hailin Sang is partially supported by the Simons Foundation Grant 586789, USA.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1175_reflist_001">
<title>References</title>
<ref id="j_jds1175_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Advani</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Saxe</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Sompolinsky</surname> <given-names>H</given-names></string-name> (<year>2020</year>). <article-title>High-dimensional dynamics of generalization error in neural networks</article-title>. <source><italic>Neural Networks</italic></source>, <volume>132</volume>: <fpage>428</fpage>–<lpage>446</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.neunet.2020.08.022" xlink:type="simple">https://doi.org/10.1016/j.neunet.2020.08.022</ext-link></mixed-citation>
</ref>
<ref id="j_jds1175_ref_002">
<mixed-citation publication-type="chapter"> <string-name><surname>Amir</surname> <given-names>I</given-names></string-name>, <string-name><surname>Koren</surname> <given-names>T</given-names></string-name>, <string-name><surname>Livni</surname> <given-names>R</given-names></string-name> (<year>2021</year>). <chapter-title>Sgd generalizes better than gd (and regularization doesn’t help)</chapter-title>. In: <source><italic>Conference on Learning Theory</italic></source> (<string-name><given-names>M</given-names> <surname>Belkin</surname></string-name>, <string-name><given-names>S</given-names> <surname>Kpotufe</surname></string-name>, eds.), <fpage>63</fpage>–<lpage>92</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Belkin</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hsu</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mandal</surname> <given-names>S</given-names></string-name> (<year>2019</year>). <article-title>Reconciling modern machine-learning practice and the classical bias–variance trade-off</article-title>. <source><italic>Proceedings of the National Academy of Sciences</italic></source>, <volume>116</volume>(<issue>32</issue>): <fpage>15849</fpage>–<lpage>15854</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Bhavsar</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ganatra</surname> <given-names>A</given-names></string-name> (<year>2012</year>). <article-title>A comparative study of training algorithms for supervised machine learning</article-title>. <source><italic>International Journal of Soft Computing and Engineering</italic></source>, <volume>2</volume>(<issue>4</issue>): <fpage>2231</fpage>–<lpage>2307</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_005">
<mixed-citation publication-type="book"> <string-name><surname>Bonaccorso</surname> <given-names>G</given-names></string-name> (<year>2018</year>). <source><italic>Machine Learning Algorithms: Popular Algorithms for Data Science and Machine Learning</italic></source>. <publisher-name>Packt Publishing Ltd.</publisher-name></mixed-citation>
</ref>
<ref id="j_jds1175_ref_006">
<mixed-citation publication-type="chapter"> <string-name><surname>D’Ascoli</surname> <given-names>S</given-names></string-name>, <string-name><surname>Refinetti</surname> <given-names>M</given-names></string-name>, <string-name><surname>Biroli</surname> <given-names>G</given-names></string-name>, <string-name><surname>Krzakala</surname> <given-names>F</given-names></string-name> (<year>2020</year>). <chapter-title>Double trouble in double descent: Bias and variance(s) in the lazy regime</chapter-title>. In: <source><italic>Proceedings of the 37th International Conference on Machine Learning</italic></source> (<string-name><given-names>HD</given-names> <surname>III</surname></string-name>, <string-name><given-names>A</given-names> <surname>Singh</surname></string-name>, eds.), volume <volume>119</volume> of <series>Proceedings of Machine Learning Research</series>, <fpage>2280</fpage>–<lpage>2290</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Deng</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Kammoun</surname> <given-names>A</given-names></string-name>, <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name> (<year>2022</year>). <article-title>A model of double descent for high-dimensional binary linear classification</article-title>. <source><italic>Information and Inference</italic></source>, <volume>11</volume>(<issue>2</issue>): <fpage>435</fpage>–<lpage>495</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Geiger</surname> <given-names>M</given-names></string-name>, <string-name><surname>Jacot</surname> <given-names>A</given-names></string-name>, <string-name><surname>Spigler</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gabriel</surname> <given-names>F</given-names></string-name>, <string-name><surname>Sagun</surname> <given-names>L</given-names></string-name>, <string-name><surname>d’Ascoli</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Scaling description of generalization with number of parameters in deep learning</article-title>. <source><italic>Journal of Statistical Mechanics: Theory and Experiment</italic></source>, <volume>2020</volume>(<issue>2</issue>): <fpage>023401</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_009">
<mixed-citation publication-type="book"> <string-name><surname>Hutter</surname> <given-names>F</given-names></string-name>, <string-name><surname>Kotthoff</surname> <given-names>L</given-names></string-name>, <string-name><surname>Vanschoren</surname> <given-names>J</given-names></string-name> (<year>2019</year>). <source><italic>Automated Machine Learning: Methods, Systems, Challenges</italic></source>. <publisher-name>Springer Nature</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_010">
<mixed-citation publication-type="chapter"> <string-name><surname>Kini</surname> <given-names>GR</given-names></string-name>, <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name> (<year>2020</year>). <chapter-title>Analytic study of double descent in binary classification: The impact of loss</chapter-title>. In: <source><italic>2020 IEEE International Symposium on Information Theory (ISIT)</italic></source>, <fpage>2527</fpage>–<lpage>2532</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Lee</surname> <given-names>EH</given-names></string-name>, <string-name><surname>Cherkassky</surname> <given-names>V</given-names></string-name> (<year>2024</year>). <article-title>Understanding double descent using vc-theoretical framework</article-title>. <source><italic>IEEE Transactions on Neural Networks and Learning Systems</italic></source>, <volume>169</volume>: <fpage>242</fpage>–<lpage>256</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Mahesh</surname> <given-names>B</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Machine learning algorithms-a review</article-title>. <source><italic>International Journal of Science and Research</italic></source>, <volume>9</volume>(<issue>1</issue>): <fpage>381</fpage>–<lpage>386</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_013">
<mixed-citation publication-type="chapter"> <string-name><surname>Mignacco</surname> <given-names>F</given-names></string-name>, <string-name><surname>Krzakala</surname> <given-names>F</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Urbani</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zdeborova</surname> <given-names>L</given-names></string-name> (<year>2020</year>). <chapter-title>The role of regularization in classification of high-dimensional noisy Gaussian mixture</chapter-title>. In: <source><italic>Proceedings of the 37th International Conference on Machine Learning</italic></source> (<string-name><given-names>HD</given-names> <surname>III</surname></string-name>, <string-name><given-names>A</given-names> <surname>Singh</surname></string-name>, eds.), volume <volume>119</volume> of <series>Proceedings of Machine Learning Research</series>, <fpage>6874</fpage>–<lpage>6883</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_014">
<mixed-citation publication-type="other"> <string-name><surname>Nakkiran</surname> <given-names>P</given-names></string-name> (<year>2019</year>). More data can hurt for linear regression: Sample-wise double descent. arXiv preprint: <uri>https://arxiv.org/abs/1912.07242</uri>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Nakkiran</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kaplun</surname> <given-names>G</given-names></string-name>, <string-name><surname>Bansal</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Barak</surname> <given-names>B</given-names></string-name>, <string-name><surname>Sutskever</surname> <given-names>I</given-names></string-name> (<year>2021</year>). <article-title>Deep double descent: Where bigger models and more data hurt</article-title>. <source><italic>Journal of Statistical Mechanics: Theory and Experiment</italic></source>, <volume>2021</volume>(<issue>12</issue>): <fpage>124003</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_016">
<mixed-citation publication-type="other"> <string-name><surname>Nakkiran</surname> <given-names>P</given-names></string-name>, <string-name><surname>Venkat</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kakade</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>T</given-names></string-name> (<year>2020</year>). Optimal regularization can mitigate double descent. arXiv preprint: <uri>https://arxiv.org/abs/2003.01897</uri>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_017">
<mixed-citation publication-type="book"> <string-name><surname>Simon</surname> <given-names>CP</given-names></string-name>, <string-name><surname>Blume</surname> <given-names>L</given-names></string-name>, <etal>et al.</etal> (<year>1994</year>). <source><italic>Mathematics for Economists</italic></source>, volume <volume>7</volume>. <publisher-name>Norton</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Spigler</surname> <given-names>S</given-names></string-name>, <string-name><surname>Geiger</surname> <given-names>M</given-names></string-name>, <string-name><surname>d’Ascoli</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sagun</surname> <given-names>L</given-names></string-name>, <string-name><surname>Biroli</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wyart</surname> <given-names>M</given-names></string-name> (<year>2019</year>). <article-title>A jamming transition from under-to over-parametrization affects generalization in deep learning</article-title>. <source><italic>Journal of Physics A: Mathematical and Theoretical</italic></source>, <volume>52</volume>(<issue>47</issue>): <fpage>474001</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_019">
<mixed-citation publication-type="other"> <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name>, <string-name><surname>Oymak</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hassibi</surname> <given-names>B</given-names></string-name> (<year>2014</year>). The gaussian min-max theorem in the presence of convexity. arXiv preprint: <uri>https://arxiv.org/abs/1408.4837</uri>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_020">
<mixed-citation publication-type="chapter"> <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name>, <string-name><surname>Oymak</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hassibi</surname> <given-names>B</given-names></string-name> (<year>2015</year>). <chapter-title>Regularized linear regression: A precise analysis of the estimation error</chapter-title>. In: <source><italic>Conference on Learning Theory</italic></source> (<string-name><given-names>P</given-names> <surname>Grünwald</surname></string-name>, <string-name><given-names>E</given-names> <surname>Hazan</surname></string-name>, <string-name><given-names>S</given-names> <surname>Kale</surname></string-name>, eds.), volume <volume>40</volume>, <fpage>1683</fpage>–<lpage>1709</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
