Tree-Based Methods: A Tool for Modeling Nonlinear Complex Relationships and Generating New Insights from Data
Volume 20, Issue 3 (2022): Special Issue: Data Science Meets Social Sciences, pp. 359–379
Pub. online: 18 July 2022
Type: Data Science In Action
Open Access
Received
1 January 2022
Accepted
21 June 2022
Published
18 July 2022
Abstract
Our paper introduces tree-based methods, specifically classification and regression trees (CRT), for studying student achievement. Because CRT lets the data's internal structure drive the analysis, it can model complex nonlinear relationships and supplement traditional hypothesis-testing approaches to provide a fuller picture of the topic being studied. Using Early Childhood Longitudinal Study-Kindergarten 2011 data as a case study, we investigated how predictors drawn from students' demographic backgrounds relate to their academic performance and achievement gains in reading and math. In our study, CRT reveals complex patterns between predictors and outcomes; in particular, the patterns illuminated by the regression trees differ across subject areas (reading and math) and between performance levels and achievement gains. Using real-world assessment datasets, this article demonstrates the strengths, limitations, and challenges of CRT for analyzing student achievement data. When achievement data, such as the achievement gains in our case study, are not strongly linearly related to any continuous predictor, regression trees may make more accurate predictions than general linear models and produce results that are easier to interpret. Our study illustrates the scenarios in which applying CRT to achievement data is most appropriate and beneficial.
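The claim that a regression tree can outperform a linear model when an outcome has little linear association with a continuous predictor can be illustrated with a minimal sketch. This is not the authors' code (their code files are in the supplementary material); it is a hypothetical pure-Python illustration on invented synthetic data. It assumes a greedy, CART-style split rule that picks the threshold minimizing the children's total sum of squared errors, and the depth limit of 2 and minimum leaf size of 5 are arbitrary illustrative choices. The helper names (`best_split`, `build_tree`, `predict_tree`, `fit_ols`) are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the node mean (regression-tree impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys, min_leaf):
    """Return (cost, threshold) of the split minimizing total child SSE, or None."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    # i = number of observations sent to the left child; both children need >= min_leaf
    for i in range(min_leaf, n - min_leaf + 1):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:  # tied predictor values cannot be separated
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, (lo + hi) / 2)
    return best

def build_tree(xs, ys, max_depth=2, min_leaf=5):
    """Grow a tree by recursive binary partitioning; each leaf predicts its node mean."""
    if max_depth == 0 or len(ys) < 2 * min_leaf:
        return {"value": mean(ys)}
    split = best_split(xs, ys, min_leaf)
    if split is None:
        return {"value": mean(ys)}
    _, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in left], [y for _, y in left], max_depth - 1, min_leaf),
        "right": build_tree([x for x, _ in right], [y for _, y in right], max_depth - 1, min_leaf),
    }

def predict_tree(node, x):
    # Follow the splits down to a leaf and return its mean outcome
    while "threshold" in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["value"]

def fit_ols(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def mse(ys, preds):
    return mean([(y - p) ** 2 for y, p in zip(ys, preds)])

# Hypothetical synthetic data: the outcome is high only for a middle band of the
# predictor, so it has almost no *linear* association with x.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(300)]
ys = [(8.0 if 3 <= x <= 7 else 2.0) + random.gauss(0, 0.5) for x in xs]

tree = build_tree(xs, ys, max_depth=2, min_leaf=5)
intercept, slope = fit_ols(xs, ys)

tree_mse = mse(ys, [predict_tree(tree, x) for x in xs])
ols_mse = mse(ys, [intercept + slope * x for x in xs])
print(f"regression tree in-sample MSE: {tree_mse:.3f}")
print(f"linear model in-sample MSE:    {ols_mse:.3f}")
```

Because the tree's two splits can isolate the middle band, its error stays near the noise level, while the fitted line is nearly flat and misses the pattern. The comparison here is in-sample only; a fair evaluation on real achievement data would use held-out data or cross-validation, and in practice one would use an established implementation rather than this toy tree.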
Supplementary material
The supplementary material includes the following files: (1) README: a brief explanation of all the files in the supplementary material; (2) synthetic data files; (3) code files; and (4) supplemental files for the manuscript: (a) a supplemental tree file, which gives an expanded overview of the CRT method, and (b) a supplemental tables and figures file, which contains additional ANCOVA result tables and regression tree figures for the outcome variables.