Pub. online: 18 Jul 2022
Type: Data Science In Action
Open Access
Journal: Journal of Data Science
Volume 20, Issue 3 (2022): Special Issue: Data Science Meets Social Sciences, pp. 359–379
Abstract
This paper introduces tree-based methods, specifically classification and regression trees (CRT), for studying student achievement. CRT lets the data’s internal structure drive the analysis. As a result, CRT can model complex nonlinear relationships and supplement traditional hypothesis-testing approaches, providing a fuller picture of the topic under study. Using Early Childhood Longitudinal Study-Kindergarten 2011 data as a case study, we investigated predictors drawn from students’ demographic backgrounds and their relationships to students’ academic performance and achievement gains in reading and math. In our study, CRT revealed complex patterns between predictors and outcomes; specifically, the patterns shown by the regression trees differed across subject areas (reading and math) and between performance levels and achievement gains. Using real-world assessment data, this article demonstrates the strengths, limitations, and challenges of applying CRT to student achievement data. When achievement data, such as the achievement gains in our case study, are not strongly linearly related to any continuous predictor, regression trees may predict more accurately than general linear models and produce results that are easier to interpret. Our study illustrates the scenarios in which applying CRT to achievement data is most appropriate and beneficial.
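The claim that a regression tree can outpredict a linear model when the outcome is not linearly related to a predictor can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (not the Early Childhood Longitudinal Study data), the helper names `fit_tree`, `predict_tree`, and `fit_line` are hypothetical, and the greedy splitting rule is a simplified stand-in for a full CRT implementation, not the authors' procedure.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(xs, ys, depth):
    """Greedy regression tree on one predictor.

    Returns a leaf value (the mean outcome) or a tuple
    (threshold, left_subtree, right_subtree).
    """
    if depth == 0 or len(set(ys)) == 1:
        return mean(ys)
    best = None
    # Try every observed value (except the smallest) as a threshold.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    if best is None:  # only one distinct x value, cannot split
        return mean(ys)
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return (
        t,
        fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        fit_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    )

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x < t else right
    return node

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Synthetic, non-monotone outcome: high only in the middle of the x range.
xs = list(range(40))
ys = [5.0 if 10 <= x < 30 else 0.0 for x in xs]

tree = fit_tree(xs, ys, depth=2)
slope, intercept = fit_line(xs, ys)

# In-sample mean squared error for each model.
tree_mse = mean((predict_tree(tree, x) - y) ** 2 for x, y in zip(xs, ys))
line_mse = mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
print(f"tree MSE = {tree_mse:.2f}, linear MSE = {line_mse:.2f}")
```

On this constructed data the two-level tree recovers both cut points and has an in-sample error of 0, while the fitted line is flat and has an error of 6.25. The comparison is in-sample only and says nothing about how either model generalizes to real achievement data.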