Identifying Prerequisite Courses in Undergraduate Biology Using Machine Learning
Volume 21, Issue 4 (2023), pp. 745–760
Pub. online: 20 October 2022
Type: Data Science In Action
Open Access
Received
6 May 2022
Accepted
16 September 2022
Published
20 October 2022
Abstract
Many undergraduate students who matriculated in Science, Technology, Engineering and Mathematics (STEM) degree programs drop out or switch majors. Previous studies indicate that student performance in prerequisite courses is an important factor in STEM attrition. This study analyzed demographic information, ACT/SAT scores, and performance in freshman-year courses to develop machine learning models that predict whether students earn a bachelor’s degree in biology. The predictive models based on Random Forest (RF) and Extreme Gradient Boosting (XGBoost) achieved a higher AUC (Area Under the Curve), with more balanced sensitivity and specificity, than the Logistic Regression (LR), K-Nearest Neighbor (KNN), and Neural Network (NN) models. An explainable machine learning approach called break-down was used to identify the freshman-year courses that could have the largest impact on student success, both at the level of the biology degree program and at the level of individual students. Courses identified as important at the program level can help program coordinators prioritize their efforts to address student attrition, while courses identified as important at the student level can help academic advisors provide more personalized, data-driven guidance to students.
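The abstract compares classifiers by AUC, sensitivity, and specificity, and uses the break-down method to attribute one student's prediction to individual inputs. The minimal pure-Python sketch below illustrates both ideas; it is not the study's R code. The scoring function `toy_model`, the feature names `bio_gpa`, `chem_gpa`, and `act`, and the four-student data set are hypothetical and chosen only to make the mechanics visible.

```python
"""Hypothetical sketch: AUC, sensitivity/specificity, and break-down
attribution for a binary "earned a biology degree" classifier.

The scoring function, feature names, and data are invented for
illustration; they are not the study's models or data.
"""
import math
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive scores higher than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative
    rate) when predicting 1 for scores >= threshold. Assumes both classes
    are present."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def break_down(predict: Callable[[Row], float], data: List[Row], obs: Row,
               order: List[str]) -> Tuple[float, Dict[str, float]]:
    """Break-down attribution for one observation. Starting from the mean
    prediction over `data`, fix one feature at a time (in `order`) to the
    observation's value in every row, and record the change in the mean
    prediction as that feature's contribution."""
    rows = [dict(r) for r in data]  # copy so `data` is not mutated
    baseline = sum(predict(r) for r in rows) / len(rows)
    contributions: Dict[str, float] = {}
    prev = baseline
    for feat in order:
        for r in rows:
            r[feat] = obs[feat]
        mean_pred = sum(predict(r) for r in rows) / len(rows)
        contributions[feat] = mean_pred - prev
        prev = mean_pred
    return baseline, contributions


def toy_model(r: Row) -> float:
    """Hypothetical logistic model; the weights are made up."""
    z = -6.0 + 1.0 * r["bio_gpa"] + 0.8 * r["chem_gpa"] + 0.1 * r["act"]
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical students: freshman biology GPA, chemistry GPA, ACT score.
students = [
    {"bio_gpa": 3.5, "chem_gpa": 3.0, "act": 26.0},
    {"bio_gpa": 2.0, "chem_gpa": 1.7, "act": 21.0},
    {"bio_gpa": 3.0, "chem_gpa": 2.3, "act": 24.0},
    {"bio_gpa": 1.5, "chem_gpa": 2.0, "act": 19.0},
]
earned_degree = [1, 0, 1, 0]

if __name__ == "__main__":
    scores = [toy_model(s) for s in students]
    print("AUC:", auc(scores, earned_degree))
    print("Sensitivity, specificity:", sens_spec(scores, earned_degree))
    base, contrib = break_down(toy_model, students, students[1],
                               ["bio_gpa", "chem_gpa", "act"])
    print("Mean prediction:", round(base, 3))
    for feat, c in contrib.items():
        print(f"  {feat}: {c:+.3f}")
```

Because every feature is eventually fixed, the contributions telescope: the baseline plus all contributions equals the model's prediction for that student. With a nonlinear model, individual contributions can depend on the order in which features are fixed; break-down implementations typically choose that order heuristically, whereas this sketch simply takes `order` as given.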
Supplementary material
This includes the data file containing the training and test sets analyzed in the study, and all R code used in the analysis, along with an explanatory README.txt file.