Journal of Data Science

Identifying Prerequisite Courses in Undergraduate Biology Using Machine Learning
Volume 21, Issue 4 (2023), pp. 745–760
Youngjin Lee  

https://doi.org/10.6339/22-JDS1068
Pub. online: 20 October 2022
Type: Data Science In Action
Open Access

Received: 6 May 2022
Accepted: 16 September 2022
Published: 20 October 2022

Abstract

Many undergraduate students who matriculate in Science, Technology, Engineering and Mathematics (STEM) degree programs drop out or switch majors. Previous studies indicate that students' performance in prerequisite courses is an important factor in STEM attrition. This study analyzed demographic information, ACT/SAT scores, and students' performance in freshman-year courses to develop machine learning models that predict their success in earning a bachelor's degree in biology. Predictive models based on Random Forest (RF) and Extreme Gradient Boosting (XGBoost) performed better in terms of AUC (Area Under the Curve), with more balanced sensitivity and specificity, than Logistic Regression (LR), K-Nearest Neighbor (KNN), and Neural Network (NN) models. An explainable machine learning approach called break-down was employed to identify the freshman-year courses that could have a larger impact on student success, both at the level of the biology degree program and at the level of individual students. Courses identified as more important at the program level can help program coordinators prioritize their efforts to address student attrition, while courses identified as more important at the student level can help academic advisors provide more personalized, data-driven guidance to students.
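The abstract compares models by AUC together with sensitivity and specificity. The following is a minimal sketch in Python of how these three quantities are computed from predicted probabilities (the study's own analysis was done in R). It assumes that label 1 marks a student who earned the degree and uses a 0.5 decision threshold; the labels and scores are hypothetical toy values, not the study's data. AUC is computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
"""Sensitivity, specificity, and AUC for a binary classifier (illustrative sketch)."""

from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_score: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, specificity) after thresholding the scores.

    Label 1 is the positive class (assumed here: the student earned the degree).
    A score >= threshold is predicted positive. Both classes must be present.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities for six students.
    y = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
    sens, spec = sensitivity_specificity(y, scores)
    print(round(sens, 3), round(spec, 3))   # 0.667 0.667
    print(round(auc(y, scores), 3))         # 0.889
    # Predicting everyone positive: perfect sensitivity, zero specificity.
    print(sensitivity_specificity(y, scores, threshold=0.0))  # (1.0, 0.0)
```

AUC summarizes how well a model ranks students across all thresholds, whereas sensitivity and specificity describe one chosen threshold. The last call shows why reporting their balance matters: a threshold that labels every student as a completer reaches perfect sensitivity with zero specificity.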
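The abstract uses break-down to attribute a single student's prediction to individual courses. The core idea is to start from the model's mean prediction over a reference dataset, fix the student's variables to their observed values one at a time, and record how much the mean prediction moves at each step. The sketch below shows that procedure in plain Python under stated assumptions: the model is any function that maps one student record to a probability, the course names (`BIO_I`, `CHEM_I`, `CALC_I`), logistic coefficients, grades, and the variable-ordering heuristic are all hypothetical placeholders, and none of this reproduces the study's R implementation or settings.

```python
"""Break-down attribution for one prediction (illustrative sketch)."""

import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


def mean_prediction(model: Callable[[Row], float], data: Sequence[Row],
                    fixed: Row) -> float:
    """Average model output over `data` with the `fixed` variables overwritten."""
    return sum(model({**row, **fixed}) for row in data) / len(data)


def break_down(model: Callable[[Row], float], data: Sequence[Row],
               observation: Row, order: Optional[List[str]] = None
               ) -> Tuple[float, List[Tuple[str, float]]]:
    """Return (baseline, [(variable, contribution), ...]) for `observation`.

    Variables are fixed to the observation's values one at a time, in `order`.
    Each contribution is the resulting change in mean prediction, so
    baseline + sum(contributions) equals model(observation) when every
    variable is included.
    """
    baseline = mean_prediction(model, data, {})
    if order is None:
        # Ordering heuristic (an assumption of this sketch): rank variables by
        # the absolute change caused by fixing each one on its own.
        solo = {v: mean_prediction(model, data, {v: observation[v]}) - baseline
                for v in observation}
        order = sorted(observation, key=lambda v: abs(solo[v]), reverse=True)
    fixed: Row = {}
    previous = baseline
    contributions: List[Tuple[str, float]] = []
    for v in order:
        fixed[v] = observation[v]
        current = mean_prediction(model, data, fixed)
        contributions.append((v, current - previous))
        previous = current
    return baseline, contributions


def p_degree(x: Row) -> float:
    """Hypothetical logistic model: P(biology degree) from 0-4 course grades."""
    z = -6.0 + 1.2 * x["BIO_I"] + 0.8 * x["CHEM_I"] + 0.4 * x["CALC_I"]
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical reference cohort and one hypothetical student to explain.
students: List[Row] = [
    {"BIO_I": 4.0, "CHEM_I": 3.0, "CALC_I": 3.0},
    {"BIO_I": 3.0, "CHEM_I": 2.0, "CALC_I": 4.0},
    {"BIO_I": 2.0, "CHEM_I": 2.0, "CALC_I": 1.0},
    {"BIO_I": 1.0, "CHEM_I": 3.0, "CALC_I": 2.0},
]
student: Row = {"BIO_I": 2.0, "CHEM_I": 1.0, "CALC_I": 4.0}

if __name__ == "__main__":
    base, parts = break_down(p_degree, students, student)
    print(f"baseline (mean prediction): {base:.3f}")
    for course, delta in parts:
        print(f"{course:>7}: {delta:+.3f}")
    print(f"prediction for student:     {p_degree(student):.3f}")
```

Because the contributions telescope, they always add up from the baseline to the student's own prediction. For non-additive models such as RF and XGBoost, however, the individual contributions can change with the order in which variables are fixed, which is why the ordering step deserves attention.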

Supplementary material

This includes the data file containing the training and test sets analyzed in the study, and all R code used in the analysis along with an explanatory README.txt file.



Copyright 2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
attrition; Educational Data Mining; Learning Analytics; STEM education; student success


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
