Predicting Stunted Growth in Two Year Old Bangladeshi Children via the Super Learner
Pub. online: 4 August 2025
Type: Data Science In Action
Open Access
Received
14 November 2024
Accepted
2 July 2025
Published
4 August 2025
Abstract
Stunted growth in children is a worldwide problem that can have long-term consequences for individuals stunted as early as two years of age. Predicting stunted growth accurately is complex, and machine learning offers a distinct advantage in this regard. While several techniques are available for predictive modeling, the Super Learner stands out as an ensemble method that integrates multiple algorithms into a single predictive model with enhanced performance. In this study, the Super Learner model, comprising a generalized linear model, bagged trees, random forests, conditional random forests, stochastic gradient boosting, Bayesian additive regression trees, neural networks, and model-averaged neural networks, performed well as measured by the area under the receiver operating characteristic curve, the Brier score, and the minimum of precision and recall. However, after analyzing the cross-validation results, the final model selected was Bayesian additive regression trees. Within the final model, the most important variables for predicting stunted growth at two years of age were the height-for-age z-score at one year, income, expenditure, anti-lipopolysaccharide antibody at week 6 and at week 18, plasma retinol binding protein at week 6, plasma soluble cluster designation 14 at week 18, fecal Reg 1B at week 12, vitamin D at week 18, mother’s weight and height at enrollment, fecal calprotectin at week 12, fecal myeloperoxidase at week 12, number of days of diarrhea through the first year of life, and number of days of exclusive breastfeeding through the first year of life.
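The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (the supplementary material states that their analyses are R scripts); the function names, the toy data, and the 0.5 classification threshold are assumptions made purely for illustration. Note that a lower Brier score indicates better-calibrated predictions, while higher values are better for the other two metrics.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def min_precision_recall(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the study
) -> float:
    """Smaller of precision and recall after thresholding the probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return min(precision, recall)


if __name__ == "__main__":
    # Toy labels (1 = stunted) and predicted probabilities; invented values.
    y_true = [1, 0, 1, 0, 1, 0]
    y_prob = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7]
    print("AUC:", auc_roc(y_true, y_prob))
    print("Brier score:", brier_score(y_true, y_prob))
    print("min(precision, recall):", min_precision_recall(y_true, y_prob))
```

Reporting the minimum of precision and recall penalizes a model that does well on only one of the two, which may matter when stunted and non-stunted children are unevenly represented.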
Supplementary material
The supplementary material consists of the Supplementary Tables and Analyses PDF, a README file, R scripts to run all analyses, an RData file with the data appropriately formatted for analyses, and RData files with the corresponding models. The Supplementary Tables and Analyses PDF contains data descriptions, a summary of methods, and results for the NNLS optimization for the continuous outcome. The README file briefly explains each file.