<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1175</article-id>
<article-id pub-id-type="doi">10.6339/25-JDS1175</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>The Double Descent Behavior in Two Layer Neural Network for Binary Classification</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0008-3914-5107</contrib-id>
<name><surname>Abeykoon</surname><given-names>Chathurika S.</given-names></name><email xlink:href="mailto:abeykoonc@rhodes.edu">abeykoonc@rhodes.edu</email><xref ref-type="aff" rid="j_jds1175_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-3110-4799</contrib-id>
<name><surname>Beknazaryan</surname><given-names>Aleksandr</given-names></name><xref ref-type="aff" rid="j_jds1175_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-9155-4636</contrib-id>
<name><surname>Sang</surname><given-names>Hailin</given-names></name><xref ref-type="aff" rid="j_jds1175_aff_003">3</xref>
</contrib>
<aff id="j_jds1175_aff_001"><label>1</label>Department of Mathematics, 2000 North Pkwy, <institution>Rhodes College</institution>, Memphis, TN, 38112, <country>United States</country></aff>
<aff id="j_jds1175_aff_002"><label>2</label>Department of Mathematical Sciences, <institution>University of Cincinnati</institution>, Cincinnati, OH 45221, <country>United States</country></aff>
<aff id="j_jds1175_aff_003"><label>3</label>Department of Mathematics, <institution>University of Mississippi</institution>, University, MS, 38677, <country>United States</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:abeykoonc@rhodes.edu">abeykoonc@rhodes.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>1</day><month>4</month><year>2025</year></pub-date><volume>23</volume><issue>2</issue><fpage>370</fpage><lpage>388</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1175_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>We include two supplementary files: Supplementary Material 1 contains detailed calculations, theorems, and proofs, and Supplementary Material 2 contains the R/RStudio code used to draw the curves presented in the paper.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>1</day><month>10</month><year>2024</year></date><date date-type="accepted"><day>6</day><month>3</month><year>2025</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Recent studies have observed a surprising phenomenon in model test error called double descent, where increasing model complexity first decreases the test error, after which the error increases and then decreases again. To observe this, we work with a two-layer neural network model with a ReLU activation function designed for binary classification under supervised learning. Our aim is to observe and investigate the mathematical theory behind the double descent behavior of the model test error for varying model sizes. We quantify the model size by the ratio of the number of training samples to the dimension of the model. Due to the complexity of the empirical risk minimization procedure, we use the Convex Gaussian Min-Max Theorem to find a suitable candidate for the global training loss.</p>
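<p>As a qualitative illustration only (a minimal sketch, not the authors’ experiment), the following R code trains a two-layer ReLU network by full-batch gradient descent on a simple Gaussian-mixture binary classification task and traces the empirical test error against the ratio of training samples to input dimension. The Gaussian class-mean data model, the squared loss, and all hyperparameters below are illustrative assumptions; the exact setup and the code behind the curves in the paper are given in Supplementary Material 2.</p>
<preformat>
## Minimal sketch (illustrative assumptions throughout): test error of a
## two-layer ReLU network for binary classification, traced over n/d.
relu = function(z) pmax(z, 0)

run_once = function(n, d, k = 20, steps = 400, lr = 0.1, n_test = 2000) {
  mu = rep(1, d) / sqrt(d)                       # assumed class-mean direction
  y  = matrix(sample(c(-1, 1), n, replace = TRUE), n, 1)
  X  = y %*% t(mu) + matrix(rnorm(n * d), n, d)  # n x d Gaussian-mixture design
  W  = matrix(rnorm(d * k), d, k) / sqrt(d)      # hidden-layer weights
  v  = matrix(rnorm(k), k, 1) / sqrt(k)          # output-layer weights
  for (s in seq_len(steps)) {                    # full-batch GD on squared loss
    Z  = X %*% W
    H  = relu(Z)
    r  = (H %*% v - y) / n                       # scaled residual
    GW = t(X) %*% ((r %*% t(v)) * (Z > 0))       # gradient wrt W (ReLU mask)
    Gv = t(H) %*% r                              # gradient wrt v
    W  = W - lr * GW
    v  = v - lr * Gv
  }
  yt = matrix(sample(c(-1, 1), n_test, replace = TRUE), n_test, 1)
  Xt = yt %*% t(mu) + matrix(rnorm(n_test * d), n_test, d)
  mean(sign(relu(Xt %*% W) %*% v) != yt)         # empirical test 0-1 error
}

set.seed(1)
ratios = c(0.25, 0.5, 1, 2, 4, 8)                # n/d values to scan
errs   = sapply(ratios, function(r) run_once(n = round(r * 100), d = 100))
plot(ratios, errs, type = "b", log = "x", xlab = "n / d", ylab = "test error")
</preformat>
<p>Whether and where a double descent peak is visible in such a simulation depends on the network width, the dimension, and the training details; the paper characterizes this behavior theoretically.</p>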
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>generalization error</kwd>
<kwd>model complexity</kwd>
<kwd>over- and under-parameterization</kwd>
<kwd>ReLU activation</kwd>
<kwd>testing error</kwd>
</kwd-group>
<funding-group><funding-statement>The research of Hailin Sang is partially supported by the Simons Foundation Grant 586789, USA.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1175_reflist_001">
<title>References</title>
<ref id="j_jds1175_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Advani</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Saxe</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Sompolinsky</surname> <given-names>H</given-names></string-name> (<year>2020</year>). <article-title>High-dimensional dynamics of generalization error in neural networks</article-title>. <source><italic>Neural Networks</italic></source>, <volume>132</volume>: <fpage>428</fpage>–<lpage>446</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.neunet.2020.08.022" xlink:type="simple">https://doi.org/10.1016/j.neunet.2020.08.022</ext-link></mixed-citation>
</ref>
<ref id="j_jds1175_ref_002">
<mixed-citation publication-type="chapter"> <string-name><surname>Amir</surname> <given-names>I</given-names></string-name>, <string-name><surname>Koren</surname> <given-names>T</given-names></string-name>, <string-name><surname>Livni</surname> <given-names>R</given-names></string-name> (<year>2021</year>). <chapter-title>Sgd generalizes better than gd (and regularization doesn’t help)</chapter-title>. In: <source><italic>Conference on Learning Theory</italic></source> (<string-name><given-names>M</given-names> <surname>Belkin</surname></string-name>, <string-name><given-names>S</given-names> <surname>Kpotufe</surname></string-name>, eds.), <fpage>63</fpage>–<lpage>92</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Belkin</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hsu</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mandal</surname> <given-names>S</given-names></string-name> (<year>2019</year>). <article-title>Reconciling modern machine-learning practice and the classical bias–variance trade-off</article-title>. <source><italic>Proceedings of the National Academy of Sciences</italic></source>, <volume>116</volume>(<issue>32</issue>): <fpage>15849</fpage>–<lpage>15854</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Bhavsar</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ganatra</surname> <given-names>A</given-names></string-name> (<year>2012</year>). <article-title>A comparative study of training algorithms for supervised machine learning</article-title>. <source><italic>International Journal of Soft Computing and Engineering</italic></source>, <volume>2</volume>(<issue>4</issue>): <fpage>2231</fpage>–<lpage>2307</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_005">
<mixed-citation publication-type="book"> <string-name><surname>Bonaccorso</surname> <given-names>G</given-names></string-name> (<year>2018</year>). <source><italic>Machine Learning Algorithms: Popular Algorithms for Data Science and Machine Learning</italic></source>. <publisher-name>Packt Publishing Ltd.</publisher-name></mixed-citation>
</ref>
<ref id="j_jds1175_ref_006">
<mixed-citation publication-type="chapter"> <string-name><surname>D’Ascoli</surname> <given-names>S</given-names></string-name>, <string-name><surname>Refinetti</surname> <given-names>M</given-names></string-name>, <string-name><surname>Biroli</surname> <given-names>G</given-names></string-name>, <string-name><surname>Krzakala</surname> <given-names>F</given-names></string-name> (<year>2020</year>). <chapter-title>Double trouble in double descent: Bias and variance(s) in the lazy regime</chapter-title>. In: <source><italic>Proceedings of the 37th International Conference on Machine Learning</italic></source> (<string-name><given-names>HD</given-names> <surname>III</surname></string-name>, <string-name><given-names>A</given-names> <surname>Singh</surname></string-name>, eds.), volume <volume>119</volume> of <series>Proceedings of Machine Learning Research</series>, <fpage>2280</fpage>–<lpage>2290</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Deng</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Kammoun</surname> <given-names>A</given-names></string-name>, <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name> (<year>2022</year>). <article-title>A model of double descent for high-dimensional binary linear classification</article-title>. <source><italic>Information and Inference</italic></source>, <volume>11</volume>(<issue>2</issue>): <fpage>435</fpage>–<lpage>495</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Geiger</surname> <given-names>M</given-names></string-name>, <string-name><surname>Jacot</surname> <given-names>A</given-names></string-name>, <string-name><surname>Spigler</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gabriel</surname> <given-names>F</given-names></string-name>, <string-name><surname>Sagun</surname> <given-names>L</given-names></string-name>, <string-name><surname>d’Ascoli</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Scaling description of generalization with number of parameters in deep learning</article-title>. <source><italic>Journal of Statistical Mechanics: Theory and Experiment</italic></source>, <volume>2020</volume>(<issue>2</issue>): <fpage>023401</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_009">
<mixed-citation publication-type="book"> <string-name><surname>Hutter</surname> <given-names>F</given-names></string-name>, <string-name><surname>Kotthoff</surname> <given-names>L</given-names></string-name>, <string-name><surname>Vanschoren</surname> <given-names>J</given-names></string-name> (<year>2019</year>). <source><italic>Automated Machine Learning: Methods, Systems, Challenges</italic></source>. <publisher-name>Springer Nature</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_010">
<mixed-citation publication-type="chapter"> <string-name><surname>Kini</surname> <given-names>GR</given-names></string-name>, <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name> (<year>2020</year>). <chapter-title>Analytic study of double descent in binary classification: The impact of loss</chapter-title>. In: <source><italic>2020 IEEE International Symposium on Information Theory (ISIT)</italic></source>, <fpage>2527</fpage>–<lpage>2532</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Lee</surname> <given-names>EH</given-names></string-name>, <string-name><surname>Cherkassky</surname> <given-names>V</given-names></string-name> (<year>2024</year>). <article-title>Understanding double descent using vc-theoretical framework</article-title>. <source><italic>IEEE Transactions on Neural Networks and Learning Systems</italic></source>, <volume>169</volume>: <fpage>242</fpage>–<lpage>256</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Mahesh</surname> <given-names>B</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Machine learning algorithms-a review</article-title>. <source><italic>International Journal of Science and Research</italic></source>, <volume>9</volume>(<issue>1</issue>): <fpage>381</fpage>–<lpage>386</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_013">
<mixed-citation publication-type="chapter"> <string-name><surname>Mignacco</surname> <given-names>F</given-names></string-name>, <string-name><surname>Krzakala</surname> <given-names>F</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Urbani</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zdeborova</surname> <given-names>L</given-names></string-name> (<year>2020</year>). <chapter-title>The role of regularization in classification of high-dimensional noisy Gaussian mixture</chapter-title>. In: <source><italic>Proceedings of the 37th International Conference on Machine Learning</italic></source> (<string-name><given-names>HD</given-names> <surname>III</surname></string-name>, <string-name><given-names>A</given-names> <surname>Singh</surname></string-name>, eds.), volume <volume>119</volume> of <series>Proceedings of Machine Learning Research</series>, <fpage>6874</fpage>–<lpage>6883</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_014">
<mixed-citation publication-type="other"> <string-name><surname>Nakkiran</surname> <given-names>P</given-names></string-name> (<year>2019</year>). More data can hurt for linear regression: Sample-wise double descent. arXiv preprint: <uri>https://arxiv.org/abs/1912.07242</uri>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Nakkiran</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kaplun</surname> <given-names>G</given-names></string-name>, <string-name><surname>Bansal</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Barak</surname> <given-names>B</given-names></string-name>, <string-name><surname>Sutskever</surname> <given-names>I</given-names></string-name> (<year>2021</year>). <article-title>Deep double descent: Where bigger models and more data hurt</article-title>. <source><italic>Journal of Statistical Mechanics: Theory and Experiment</italic></source>, <volume>2021</volume>(<issue>12</issue>): <fpage>124003</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_016">
<mixed-citation publication-type="other"> <string-name><surname>Nakkiran</surname> <given-names>P</given-names></string-name>, <string-name><surname>Venkat</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kakade</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>T</given-names></string-name> (<year>2020</year>). Optimal regularization can mitigate double descent. arXiv preprint: <uri>https://arxiv.org/abs/2003.01897</uri>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_017">
<mixed-citation publication-type="book"> <string-name><surname>Simon</surname> <given-names>CP</given-names></string-name>, <string-name><surname>Blume</surname> <given-names>L</given-names></string-name>, <etal>et al.</etal> (<year>1994</year>). <source><italic>Mathematics for Economists</italic></source>, volume <volume>7</volume>. <publisher-name>Norton</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Spigler</surname> <given-names>S</given-names></string-name>, <string-name><surname>Geiger</surname> <given-names>M</given-names></string-name>, <string-name><surname>d’Ascoli</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sagun</surname> <given-names>L</given-names></string-name>, <string-name><surname>Biroli</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wyart</surname> <given-names>M</given-names></string-name> (<year>2019</year>). <article-title>A jamming transition from under-to over-parametrization affects generalization in deep learning</article-title>. <source><italic>Journal of Physics A: Mathematical and Theoretical</italic></source>, <volume>52</volume>(<issue>47</issue>): <fpage>474001</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_019">
<mixed-citation publication-type="other"> <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name>, <string-name><surname>Oymak</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hassibi</surname> <given-names>B</given-names></string-name> (<year>2014</year>). The gaussian min-max theorem in the presence of convexity. arXiv preprint: <uri>https://arxiv.org/abs/1408.4837</uri>.</mixed-citation>
</ref>
<ref id="j_jds1175_ref_020">
<mixed-citation publication-type="chapter"> <string-name><surname>Thrampoulidis</surname> <given-names>C</given-names></string-name>, <string-name><surname>Oymak</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hassibi</surname> <given-names>B</given-names></string-name> (<year>2015</year>). <chapter-title>Regularized linear regression: A precise analysis of the estimation error</chapter-title>. In: <source><italic>Conference on Learning Theory</italic></source> (<string-name><given-names>P</given-names> <surname>Grünwald</surname></string-name>, <string-name><given-names>E</given-names> <surname>Hazan</surname></string-name>, <string-name><given-names>S</given-names> <surname>Kale</surname></string-name>, eds.), volume <volume>40</volume>, <fpage>1683</fpage>–<lpage>1709</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
