Journal of Data Science


Tree-Based Methods: A Tool for Modeling Nonlinear Complex Relationships and Generating New Insights from Data
Volume 20, Issue 3 (2022): Special Issue: Data Science Meets Social Sciences, pp. 359–379
Ya Mo, Brian Habing, Nell Sedransk

https://doi.org/10.6339/22-JDS1056
Pub. online: 18 July 2022      Type: Data Science In Action      Open Access

Received: 1 January 2022
Accepted: 21 June 2022
Published: 18 July 2022

Abstract

Our paper introduces tree-based methods, specifically classification and regression trees (CRT), for studying student achievement. CRT lets the internal structure of the data drive the analysis. It can therefore model complex nonlinear relationships and supplement traditional hypothesis-testing approaches, giving a fuller picture of the topic under study. Using Early Childhood Longitudinal Study-Kindergarten 2011 data as a case study, we investigated how predictors drawn from students’ demographic backgrounds relate to their academic performance and achievement gains in reading and math. In our study, CRT reveals complex patterns between predictors and outcomes. Specifically, the patterns shown by the regression trees differ across subject areas (i.e., reading and math) and between performance levels and achievement gains. Using real-world assessment datasets, this article demonstrates the strengths, limitations, and challenges of applying CRT to student achievement data. When achievement data, such as the achievement gains in our case study, are not strongly linearly related to any continuous predictor, regression trees may predict more accurately than general linear models and produce results that are easier to interpret. Our study illustrates the scenarios in which CRT is most appropriate and beneficial for achievement data.
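
As a purely illustrative sketch of the abstract's point about outcomes that are not linearly related to a continuous predictor, the Python below (standard library only) fits a small one-predictor regression tree and an ordinary least-squares line to synthetic data. The data, the function names (`fit_tree`, `fit_line`, `predict`, `mse`), and the depth limit are all assumptions made for this example. It is not the authors' code, it does not use ECLS-K data, and it omits the pruning and validation steps a real CRT analysis would need.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total SSE) for the best binary split on x.

    The threshold is None when no split reduces the SSE of the node.
    """
    best = (None, sse(ys))
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


def fit_tree(xs, ys, depth=2):
    """Grow a regression tree; a leaf is a float, a node is a tuple."""
    threshold, _ = best_split(xs, ys)
    if depth == 0 or threshold is None:
        return mean(ys)
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return (
        threshold,
        fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        fit_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    )


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean outcome."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def mse(predictions, ys):
    """Mean squared error between predictions and observed outcomes."""
    return mean((p - y) ** 2 for p, y in zip(predictions, ys))


# Synthetic, hypothetical data: the outcome rises and then falls in steps,
# so it has no strong linear relationship with the predictor.
xs = [float(x) for x in range(40)]
levels = [10.0 if x < 10 else 25.0 if x < 30 else 12.0 for x in xs]
ys = [lvl + ((int(x) * 7) % 5 - 2) * 0.5 for x, lvl in zip(xs, levels)]

tree = fit_tree(xs, ys, depth=2)
intercept, slope = fit_line(xs, ys)

tree_mse = mse([predict(tree, x) for x in xs], ys)
line_mse = mse([intercept + slope * x for x in xs], ys)
print(f"tree MSE: {tree_mse:.2f}, linear MSE: {line_mse:.2f}")
```

Both errors here are in-sample, so the comparison only shows fit to this toy data, not predictive accuracy on new observations. The fitted tree can also be read directly as a set of threshold rules, which is the interpretability advantage the abstract refers to.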

Supplementary material

The supplementary material includes the following files: (1) README: a brief explanation of all the files in the supplementary material; (2) synthetic data files; (3) code files; (4) supplemental files for the manuscript: (a) a supplemental tree file giving an expanded overview of the CRT method, and (b) a supplemental tables and figures file with additional ANCOVA result tables and regression tree figures for the outcome variables.



Copyright
2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
achievement; early childhood education; tree-based methods
