<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1056</article-id>
<article-id pub-id-type="doi">10.6339/22-JDS1056</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Data Science in Action</subject></subj-group></article-categories>
<title-group>
<article-title>Tree-Based Methods: A Tool for Modeling Nonlinear Complex Relationships and Generating New Insights from Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Mo</surname><given-names>Ya</given-names></name><email xlink:href="mailto:yamo@boisestate.edu">yamo@boisestate.edu</email><xref ref-type="aff" rid="j_jds1056_aff_001">1</xref><xref ref-type="aff" rid="j_jds1056_aff_002">2</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Habing</surname><given-names>Brian</given-names></name><xref ref-type="aff" rid="j_jds1056_aff_002">2</xref><xref ref-type="aff" rid="j_jds1056_aff_003">3</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Sedransk</surname><given-names>Nell</given-names></name><xref ref-type="aff" rid="j_jds1056_aff_002">2</xref>
</contrib>
<aff id="j_jds1056_aff_001"><label>1</label>Department of Curriculum, Instruction, and Foundational Studies, <institution>College of Education, Boise State University</institution>, 1910 University Drive, Boise, ID 83725-1745, <country>U.S.A.</country></aff>
<aff id="j_jds1056_aff_002"><label>2</label><institution>National Institute of Statistical Sciences</institution>, Washington D.C., <country>U.S.A.</country></aff>
<aff id="j_jds1056_aff_003"><label>3</label>Department of Statistics, <institution>University of South Carolina</institution>, Columbia, <country>U.S.A.</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:yamo@boisestate.edu">yamo@boisestate.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2022</year></pub-date><pub-date pub-type="epub"><day>18</day><month>7</month><year>2022</year></pub-date><volume>20</volume><issue>3</issue><fpage>359</fpage><lpage>379</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1056_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The supplementary material includes the following files: (1) README: a brief explanation of all the files in the supplementary material; (2) synthetic data files; (3) code files; (4) supplemental files for the manuscript: (a) a supplemental tree file, giving an expanded overview of the CRT method, and (b) a supplemental tables and figures file, giving additional ANCOVA result tables and regression tree figures for the outcome variables.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>1</day><month>1</month><year>2022</year></date><date date-type="accepted"><day>21</day><month>6</month><year>2022</year></date></history>
<permissions><copyright-statement>2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2022</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>This paper introduces tree-based methods, specifically classification and regression trees (CRT), as a tool for studying student achievement. CRT lets the internal structure of the data drive the analysis. It can therefore model complex nonlinear relationships and supplement traditional hypothesis-testing approaches, providing a fuller picture of the topic being studied. Using data from the Early Childhood Longitudinal Study, Kindergarten Class of 2010–11 (ECLS-K: 2011) as a case study, we investigated how predictors drawn from students’ demographic backgrounds relate to their academic performance and achievement gains in reading and math. In our study, CRT reveals complex patterns between predictors and outcomes: the patterns shown by the regression trees differ across subject areas (reading and math) and between performance levels and achievement gains. Using real-world assessment datasets, this article demonstrates the strengths, limitations, and challenges of applying CRT to student achievement data. When achievement data, such as the achievement gains in our case study, are not strongly linearly related to any continuous predictor, regression trees may predict more accurately than general linear models and produce results that are easier to interpret. Our study illustrates the scenarios in which applying CRT to achievement data is most appropriate and beneficial.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>achievement</kwd>
<kwd>early childhood education</kwd>
<kwd>tree-based methods</kwd>
</kwd-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1056_reflist_001">
<title>References</title>
<ref id="j_jds1056_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Baker</surname> <given-names>B</given-names></string-name> (<year>2001</year>). <article-title>Can flexible non-linear modeling tell us anything new about educational productivity?</article-title> <source>Economics of Education Review</source>, <volume>20</volume>(<issue>1</issue>): <fpage>81</fpage>–<lpage>92</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_002">
<mixed-citation publication-type="book"> <string-name><surname>Breiman</surname> <given-names>L</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Stone</surname> <given-names>CJ</given-names></string-name>, <string-name><surname>Olshen</surname> <given-names>RA</given-names></string-name> (<year>1984</year>). <source>Classification and Regression Trees</source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Cheadle</surname> <given-names>J</given-names></string-name> (<year>2008</year>). <article-title>Educational investment, family context, and children’s math and reading growth from kindergarten through the third grade</article-title>. <source>Sociology of Education</source>, <volume>81</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>31</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Cooper</surname> <given-names>C</given-names></string-name>, <string-name><surname>Crosnoe</surname> <given-names>R</given-names></string-name>, <string-name><surname>Suizzo</surname> <given-names>M</given-names></string-name>, <string-name><surname>Pituch</surname> <given-names>K</given-names></string-name> (<year>2010</year>). <article-title>Poverty, race, and parental involvement during the transition to elementary school</article-title>. <source>Journal of Family Issues</source>, <volume>31</volume>(<issue>7</issue>): <fpage>859</fpage>–<lpage>883</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_005">
<mixed-citation publication-type="book"> <string-name><surname>Field</surname> <given-names>A</given-names></string-name> (<year>2013</year>). <source>Discovering Statistics Using IBM SPSS Statistics</source>. <publisher-name>Sage</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_006">
<mixed-citation publication-type="book"> <collab>IBM Corp</collab> (<year>2021</year>a). <source>IBM SPSS Modeler, Version 18.3</source>. <publisher-name>IBM Corp.</publisher-name>, <publisher-loc>Armonk, NY</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_007">
<mixed-citation publication-type="book"> <collab>IBM Corp</collab> (<year>2021</year>b). <source>IBM SPSS Statistics for Windows, Version 28.0</source>. <publisher-name>IBM Corp.</publisher-name>, <publisher-loc>Armonk, NY</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_008">
<mixed-citation publication-type="book"> <string-name><surname>James</surname> <given-names>G</given-names></string-name>, <string-name><surname>Witten</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name> (<year>2017</year>). <source>An Introduction to Statistical Learning: With Applications in R</source>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Jeon</surname> <given-names>M</given-names></string-name>, <string-name><surname>De Boeck</surname> <given-names>P</given-names></string-name> (<year>2016</year>). <article-title>A generalized item response tree model for psychological assessments</article-title>. <source>Behavior Research Methods</source>, <volume>48</volume>(<issue>3</issue>): <fpage>1070</fpage>–<lpage>1085</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Jeon</surname> <given-names>M</given-names></string-name>, <string-name><surname>De Boeck</surname> <given-names>P</given-names></string-name>, <string-name><surname>van der Linden</surname> <given-names>W</given-names></string-name> (<year>2017</year>). <article-title>Modeling answer change behavior: An application of a generalized item response tree model</article-title>. <source>Journal of Educational and Behavioral Statistics</source>, <volume>42</volume>(<issue>4</issue>): <fpage>467</fpage>–<lpage>490</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Kass</surname> <given-names>GV</given-names></string-name> (<year>1980</year>). <article-title>An exploratory technique for investigating large quantities of categorical data</article-title>. <source>Journal of the Royal Statistical Society: Series C (Applied Statistics)</source>, <volume>29</volume>(<issue>2</issue>): <fpage>119</fpage>–<lpage>127</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_012">
<mixed-citation publication-type="book"> <string-name><surname>Kuhn</surname> <given-names>M</given-names></string-name>, <string-name><surname>Johnson</surname> <given-names>K</given-names></string-name> (<year>2013</year>). <source>Applied Predictive Modeling</source>, volume <volume>26</volume>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_013">
<mixed-citation publication-type="book"> <string-name><surname>Ledolter</surname> <given-names>J</given-names></string-name> (<year>2013</year>). <source>Data Mining and Business Analytics with R</source>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Loh</surname> <given-names>W-Y</given-names></string-name> (<year>2014</year>). <article-title>Fifty years of classification and regression trees</article-title>. <source>International Statistical Review</source>, <volume>82</volume>(<issue>3</issue>): <fpage>329</fpage>–<lpage>348</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Ma</surname> <given-names>X</given-names></string-name> (<year>2005</year>). <article-title>Growth in mathematics achievement during middle and high school: Analysis with classification and regression trees</article-title>. <source>Journal of Educational Research</source>, <volume>99</volume>(<issue>2</issue>): <fpage>78</fpage>–<lpage>86</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_016">
<mixed-citation publication-type="book"> <string-name><surname>Ma</surname> <given-names>X</given-names></string-name> (<year>2018</year>). <source>Using Classification and Regression Trees: A Practical Primer</source>. <publisher-name>Information Age Publishing, Inc.</publisher-name></mixed-citation>
</ref>
<ref id="j_jds1056_ref_017">
<mixed-citation publication-type="book"> <string-name><surname>Mulligan</surname> <given-names>GM</given-names></string-name>, <string-name><surname>Hastedt</surname> <given-names>S</given-names></string-name>, <string-name><surname>McCarroll</surname> <given-names>JC</given-names></string-name> (<year>2012</year>). <source>First-Time Kindergartners in 2010–11: First Findings from the Kindergarten Rounds of the Early Childhood Longitudinal Study, Kindergarten Class of 2010–11 (ECLS-K: 2011) (NCES 2012-049)</source>. <comment>U.S. Department of Education</comment>. <publisher-name>National Center for Education Statistics</publisher-name>, <publisher-loc>Washington, DC</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_018">
<mixed-citation publication-type="book"> <string-name><surname>O’Dwyer</surname> <given-names>LM</given-names></string-name>, <string-name><surname>Bernauer</surname> <given-names>JA</given-names></string-name> (<year>2013</year>). <source>Quantitative Research for the Qualitative Researcher</source>. <publisher-name>SAGE Publications</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Rupp</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Garcia</surname> <given-names>P</given-names></string-name>, <string-name><surname>Jamieson</surname> <given-names>J</given-names></string-name> (<year>2001</year>). <article-title>Combining multiple regression and CART to understand difficulty in second language reading and listening comprehension test items</article-title>. <source>International Journal of Testing</source>, <volume>1</volume>(<issue>3–4</issue>): <fpage>185</fpage>–<lpage>216</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Strobl</surname> <given-names>C</given-names></string-name>, <string-name><surname>Malley</surname> <given-names>J</given-names></string-name>, <string-name><surname>Tutz</surname> <given-names>G</given-names></string-name> (<year>2009</year>). <article-title>An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests</article-title>. <source>Psychological Methods</source>, <volume>14</volume>(<issue>4</issue>): <fpage>323</fpage>–<lpage>348</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_021">
<mixed-citation publication-type="book"> <string-name><surname>Tourangeau</surname> <given-names>K</given-names></string-name>, <string-name><surname>Nord</surname> <given-names>C</given-names></string-name>, <string-name><surname>Lê</surname> <given-names>T</given-names></string-name>, <string-name><surname>Sorongon</surname> <given-names>AG</given-names></string-name>, <string-name><surname>Hagedorn</surname> <given-names>MC</given-names></string-name>, <string-name><surname>Daly</surname> <given-names>P</given-names></string-name>, <etal>et al.</etal> (<year>2015</year>). <source>Early Childhood Longitudinal Study, Kindergarten Class of 2010–11 (ECLS-K: 2011). User’s Manual for the ECLS-K: 2011 Kindergarten Data File and Electronic Codebook, Public Version (NCES 2015-074)</source>. <comment>U.S. Department of Education</comment>. <publisher-name>National Center for Education Statistics</publisher-name>, <publisher-loc>Washington, DC</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1056_ref_022">
<mixed-citation publication-type="book"> <string-name><surname>Yan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Su</surname> <given-names>X</given-names></string-name> (<year>2009</year>). <source>Linear Regression Analysis: Theory and Computing</source>. <publisher-name>World Scientific</publisher-name>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
