Journal of Data Science

Decision Tree-Based Predictive Models for Academic Achievement Using College Students’ Support Networks
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 557–577
Anthony Frazier, Joethi Silva, Rachel Meilak, and three others (6 authors in total)

https://doi.org/10.6339/21-JDS1033
Pub. online: 30 December 2021
Type: Data Science In Action
Open Access

Received: 15 September 2021
Accepted: 27 November 2021
Published: 30 December 2021

Abstract

In this study, we examine a set of primary data collected from 484 students enrolled in a large public university in the Mid-Atlantic region of the United States during the early stages of the COVID-19 pandemic. The data, called the Ties data, include students’ demographic and support network information. The support network data record the type of support provided (i.e., emotional or educational; routine or intense). Using this data set, we built models for predicting students’ academic achievement, quantified by their self-reported GPA, with Chi-Square Automatic Interaction Detection (CHAID), a decision tree algorithm, and cforest, a random forest algorithm that uses conditional inference trees. We compare the methods’ accuracy and the variation in the set of important variables suggested by each algorithm. The algorithms identified different variables as important for different student demographics, with some overlap. For White students, different types of educational support were important in predicting academic achievement, while for non-White students, different types of emotional support were important. The presence of differing types of routine support was important in predicting academic achievement for cisgender women, while differing types of intense support were important for cisgender men.

Supplementary material

Supplemental material linked to the online version of the paper includes R code implementing the CHAID and cforest algorithms and an example data set used to demonstrate the code.
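For readers unfamiliar with CHAID, the following is a minimal Python sketch of its core split-selection idea: at each node, test every categorical predictor against the outcome with a chi-square test of independence and split on the predictor with the smallest p-value. It is not the authors' R code. The function names, variable names, and the eight-row toy data are hypothetical, and the sketch assumes binary predictors and a binary outcome (so each test has one degree of freedom). It also omits the category merging and Bonferroni adjustment that the full CHAID procedure applies.

```python
import math
from collections import Counter


def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic and p-value for two binary variables (df = 1)."""
    n = len(xs)
    cell = Counter(zip(xs, ys))  # observed counts per (predictor, outcome) cell
    row = Counter(xs)            # marginal counts of the predictor
    col = Counter(ys)            # marginal counts of the outcome
    if len(row) != 2 or len(col) != 2:
        raise ValueError("this sketch only handles 2x2 tables")
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    # For one degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def best_split(predictors, outcome):
    """Return (name, statistic, p-value) of the predictor with the smallest p-value."""
    results = {name: chi_square_2x2(values, outcome)
               for name, values in predictors.items()}
    name = min(results, key=lambda k: results[k][1])
    return name, results[name][0], results[name][1]


# Hypothetical toy data: 1 = support reported / high GPA, 0 = otherwise.
high_gpa = [1, 1, 1, 1, 0, 0, 0, 0]
toy_predictors = {
    "educational_support": [1, 1, 1, 0, 0, 0, 0, 1],
    "emotional_support":   [1, 0, 1, 0, 1, 0, 1, 0],
}

name, stat, p = best_split(toy_predictors, high_gpa)
print(name, round(stat, 3), round(p, 4))
```

In this toy example the hypothetical `educational_support` column is chosen because it is the only predictor associated with the outcome. A real analysis would apply the same comparison recursively within each resulting subgroup and stop when no predictor reaches the chosen significance threshold.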



Copyright
2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
cforest; CHAID; conditional inference trees; egocentric network; perceived social support; support network

Journal of Data Science
Online ISSN: 1683-8602
Print ISSN: 1680-743X
