<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1023</article-id>
<article-id pub-id-type="doi">10.6339/21-JDS1023</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Variable Importance Scores</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-6983-2495</contrib-id>
<name><surname>Loh</surname><given-names>Wei-Yin</given-names></name><email xlink:href="mailto:loh@stat.wisc.edu">loh@stat.wisc.edu</email><xref ref-type="aff" rid="j_jds1023_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhou</surname><given-names>Peigen</given-names></name><xref ref-type="aff" rid="j_jds1023_aff_001">1</xref>
</contrib>
<aff id="j_jds1023_aff_001"><label>1</label>Department of Statistics, <institution>University of Wisconsin</institution>, 1300 University Avenue, Madison, WI 53706, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:loh@stat.wisc.edu">loh@stat.wisc.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>16</day><month>9</month><year>2021</year></pub-date><volume>19</volume><issue>4</issue><fpage>569</fpage><lpage>592</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1023_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>Data files and simulation programs used in the article may be found in a supplementary file.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>6</day><month>7</month><year>2021</year></date><date date-type="accepted"><day>26</day><month>8</month><year>2021</year></date></history>
<permissions><copyright-statement>2021 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>There are many methods of scoring the importance of variables in the prediction of a response, but not much is known about their accuracy. This paper partially fills the gap by introducing a new method based on the GUIDE algorithm and comparing it with 11 existing methods. For data without missing values, eight methods are shown to give biased scores that are too high or too low, depending on the type of variable (ordinal, binary, or nominal) and on whether or not it is dependent on other variables, even when all variables are independent of the response. Among the remaining four methods, only GUIDE continues to give unbiased scores if there are missing data values. It does this with a self-calibrating bias-correction step that is applicable to data with and without missing values. GUIDE also provides threshold scores for differentiating important from unimportant variables with 95 and 99 percent confidence. Correlations of the scores with the predictive power of the methods are studied in three real data sets. For many methods, correlations with marginal predictive power are much higher than those with conditional predictive power.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>bias correction</kwd>
<kwd>classification and regression tree</kwd>
<kwd>missing values</kwd>
<kwd>prediction</kwd>
</kwd-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1023_reflist_001">
<title>References</title>
<ref id="j_jds1023_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Bi</surname> <given-names>J</given-names></string-name> (<year>2012</year>). <article-title>A review of statistical methods for determination of relative importance of correlated predictors and identification of drivers of consumer liking</article-title>. <source>Journal of Sensory Studies</source>, <volume>27</volume>: <fpage>87</fpage>–<lpage>101</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Bleich</surname> <given-names>J</given-names></string-name>, <string-name><surname>Kapelner</surname> <given-names>A</given-names></string-name>, <string-name><surname>George</surname> <given-names>EI</given-names></string-name>, <string-name><surname>Jensen</surname> <given-names>ST</given-names></string-name> (<year>2014</year>). <article-title>Variable selection for BART: An application to gene regulation</article-title>. <source>Annals of Applied Statistics</source>, <volume>8</volume>: <fpage>1750</fpage>–<lpage>1781</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Breiman</surname> <given-names>L</given-names></string-name> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Machine Learning</source>, <volume>45</volume>: <fpage>5</fpage>–<lpage>32</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_004">
<mixed-citation publication-type="book"> <string-name><surname>Breiman</surname> <given-names>L</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Olshen</surname> <given-names>RA</given-names></string-name>, <string-name><surname>Stone</surname> <given-names>CJ</given-names></string-name> (<year>1984</year>). <source>Classification and Regression Trees</source>. <publisher-name>Chapman &amp; Hall/CRC</publisher-name>, <publisher-loc>Boca Raton</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Bring</surname> <given-names>J</given-names></string-name> (<year>1994</year>). <article-title>How to standardize regression coefficients</article-title>. <source>American Statistician</source>, <volume>48</volume>: <fpage>209</fpage>–<lpage>213</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Bureau</surname> <given-names>A</given-names></string-name>, <string-name><surname>Dupuis</surname> <given-names>J</given-names></string-name>, <string-name><surname>Falls</surname> <given-names>K</given-names></string-name>, <string-name><surname>Lunetta</surname> <given-names>KL</given-names></string-name>, <string-name><surname>Hayward</surname> <given-names>B</given-names></string-name>, <string-name><surname>Keith</surname> <given-names>TP</given-names></string-name>, <etal>et al.</etal> (<year>2005</year>). <article-title>Identifying SNPs predictive of phenotype using random forests</article-title>. <source>Genetic Epidemiology</source>, <volume>28</volume>: <fpage>171</fpage>–<lpage>182</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_007">
<mixed-citation publication-type="chapter"> <string-name><surname>Chambers</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>TJ</given-names></string-name> (<year>1992</year>). <chapter-title>An appetizer</chapter-title>. In: <source>Statistical Models in S</source> (<string-name><given-names>JM</given-names> <surname>Chambers</surname></string-name>, <string-name><given-names>TJ</given-names> <surname>Hastie</surname></string-name>, eds.), <fpage>1</fpage>–<lpage>12</lpage>. <publisher-name>Wadsworth &amp; Brooks/Cole</publisher-name>, <publisher-loc>Pacific Grove</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Chaudhuri</surname> <given-names>P</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>MC</given-names></string-name>, <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name>, <string-name><surname>Yao</surname> <given-names>R</given-names></string-name> (<year>1994</year>). <article-title>Piecewise-polynomial regression trees</article-title>. <source>Statistica Sinica</source>, <volume>4</volume>: <fpage>143</fpage>–<lpage>167</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Chipman</surname> <given-names>HA</given-names></string-name>, <string-name><surname>George</surname> <given-names>EI</given-names></string-name>, <string-name><surname>McCulloch</surname> <given-names>RE</given-names></string-name> (<year>2010</year>). <article-title>BART: Bayesian additive regression trees</article-title>. <source>Annals of Applied Statistics</source>, <volume>4</volume>: <fpage>266</fpage>–<lpage>298</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_010">
<mixed-citation publication-type="other"> <string-name><surname>Denby</surname> <given-names>L</given-names></string-name> (1986). Major league baseball salary and performance data. <uri>http://lib.stat.cmu.edu/datasets/baseball.data</uri>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Díaz-Uriarte</surname> <given-names>R</given-names></string-name>, <string-name><surname>Alvarez de Andrés</surname> <given-names>S</given-names></string-name> (<year>2006</year>). <article-title>Gene selection and classification of microarray data using random forest</article-title>. <source>BMC Bioinformatics</source>, <volume>7</volume>(<issue>3</issue>): <elocation-id>3</elocation-id>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name> (<year>2001</year>). <article-title>Greedy function approximation: A gradient boosting machine</article-title>. <source>The Annals of Statistics</source>, <volume>29</volume>: <fpage>1189</fpage>–<lpage>1232</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name> (<year>2010</year>). <article-title>Regularization paths for generalized linear models via coordinate descent</article-title>. <source>Journal of Statistical Software</source>, <volume>33</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>22</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_014">
<mixed-citation publication-type="other"> <string-name><surname>Greenwell</surname> <given-names>B</given-names></string-name>, <string-name><surname>Boehmke</surname> <given-names>B</given-names></string-name>, <string-name><surname>Cunningham</surname> <given-names>J</given-names></string-name>, <collab>GBM Developers</collab> (2019). <italic>gbm: Generalized Boosted Regression Models</italic>. R package version 2.1.5.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Harrison</surname> <given-names>SL</given-names></string-name>, <string-name><surname>Fazio-Eynullayeva</surname> <given-names>E</given-names></string-name>, <string-name><surname>Lane</surname> <given-names>DA</given-names></string-name>, <string-name><surname>Underhill</surname> <given-names>P</given-names></string-name>, <string-name><surname>Lip</surname> <given-names>GYH</given-names></string-name> (<year>2020</year>). <article-title>Comorbidities associated with mortality in 31,461 adults with COVID-19 in the United States: A federated electronic medical record analysis</article-title>. <source>PLoS Medicine</source>, <volume>17</volume>(<issue>9</issue>): <fpage>1</fpage>–<lpage>11</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Hoaglin</surname> <given-names>DC</given-names></string-name>, <string-name><surname>Velleman</surname> <given-names>PF</given-names></string-name> (<year>1995</year>). <article-title>A critical look at some analyses of Major League Baseball salaries</article-title>. <source>American Statistician</source>, <volume>49</volume>: <fpage>277</fpage>–<lpage>285</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Hothorn</surname> <given-names>T</given-names></string-name>, <string-name><surname>Hornik</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zeileis</surname> <given-names>A</given-names></string-name> (<year>2006</year>). <article-title>Unbiased recursive partitioning: A conditional inference framework</article-title>. <source>Journal of Computational and Graphical Statistics</source>, <volume>15</volume>: <fpage>651</fpage>–<lpage>674</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Ishwaran</surname> <given-names>H</given-names></string-name> (<year>2007</year>). <article-title>Variable importance in binary regression trees and forests</article-title>. <source>Electronic Journal of Statistics</source>, <volume>1</volume>: <fpage>519</fpage>–<lpage>537</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Ishwaran</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kogalur</surname> <given-names>U</given-names></string-name> (<year>2007</year>). <article-title>Random survival forests for R</article-title>. <source>R News</source>, <volume>7</volume>(<issue>2</issue>): <fpage>25</fpage>–<lpage>31</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Ishwaran</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kogalur</surname> <given-names>U</given-names></string-name>, <string-name><surname>Blackstone</surname> <given-names>E</given-names></string-name>, <string-name><surname>Lauer</surname> <given-names>M</given-names></string-name> (<year>2008</year>). <article-title>Random survival forests</article-title>. <source>Annals of Applied Statistics</source>, <volume>2</volume>(<issue>3</issue>): <fpage>841</fpage>–<lpage>860</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_021">
<mixed-citation publication-type="other"> <string-name><surname>Johnson</surname> <given-names>RW</given-names></string-name> (2004). 2004 new car and truck data. <uri>http://jse.amstat.org/datasets/04cars.txt</uri>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Kim</surname> <given-names>H</given-names></string-name>, <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name> (<year>2001</year>). <article-title>Classification trees with unbiased multiway splits</article-title>. <source>Journal of the American Statistical Association</source>, <volume>96</volume>: <fpage>589</fpage>–<lpage>604</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_023">
<mixed-citation publication-type="other"> <string-name><surname>Kuhn</surname> <given-names>M</given-names></string-name> (2020). <italic>caret: Classification and Regression Training</italic>. R package version 6.0-86.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_024">
<mixed-citation publication-type="journal"> <string-name><surname>Liaw</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wiener</surname> <given-names>M</given-names></string-name> (<year>2002</year>). <article-title>Classification and regression by randomForest</article-title>. <source>R News</source>, <volume>2</volume>(<issue>3</issue>): <fpage>18</fpage>–<lpage>22</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name> (<year>2002</year>). <article-title>Regression trees with unbiased variable selection and interaction detection</article-title>. <source>Statistica Sinica</source>, <volume>12</volume>: <fpage>361</fpage>–<lpage>386</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_026">
<mixed-citation publication-type="journal"> <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name> (<year>2009</year>). <article-title>Improving the precision of classification trees</article-title>. <source>Annals of Applied Statistics</source>, <volume>3</volume>: <fpage>1710</fpage>–<lpage>1737</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_027">
<mixed-citation publication-type="chapter"> <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name> (<year>2012</year>). <chapter-title>Variable selection for classification and regression in large <italic>p</italic>, small <italic>n</italic> problems</chapter-title>. In: <source>Probability Approximations and Beyond</source> (<string-name><given-names>A</given-names> <surname>Barbour</surname></string-name>, <string-name><given-names>HP</given-names> <surname>Chan</surname></string-name>, <string-name><given-names>D</given-names> <surname>Siegmund</surname></string-name>, eds.), volume <volume>205</volume> of <series><italic>Lecture Notes in Statistics—Proceedings</italic></series>, <fpage>133</fpage>–<lpage>157</lpage>. <publisher-name>Springer</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_028">
<mixed-citation publication-type="journal"> <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name>, <string-name><surname>Eltinge</surname> <given-names>J</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>MJ</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name> (<year>2019</year>). <article-title>Classification and regression trees and forests for incomplete data from sample surveys</article-title>. <source>Statistica Sinica</source>, <volume>29</volume>: <fpage>431</fpage>–<lpage>453</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_029">
<mixed-citation publication-type="journal"> <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name>, <string-name><surname>Shih</surname> <given-names>YS</given-names></string-name> (<year>1997</year>). <article-title>Split selection methods for classification trees</article-title>. <source>Statistica Sinica</source>, <volume>7</volume>: <fpage>815</fpage>–<lpage>840</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_030">
<mixed-citation publication-type="journal"> <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name>, <string-name><surname>Vanichsetakul</surname> <given-names>N</given-names></string-name> (<year>1988</year>). <article-title>Tree-structured classification via generalized discriminant analysis (with discussion)</article-title>. <source>Journal of the American Statistical Association</source>, <volume>83</volume>: <fpage>715</fpage>–<lpage>728</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_031">
<mixed-citation publication-type="journal"> <string-name><surname>Loh</surname> <given-names>WY</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>P</given-names></string-name> (<year>2020</year>). <article-title>Missing data, imputation and regression trees</article-title>. <source>Statistica Sinica</source>, <volume>30</volume>: <fpage>1697</fpage>–<lpage>1722</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_032">
<mixed-citation publication-type="chapter"> <string-name><surname>Lundberg</surname> <given-names>SM</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>SI</given-names></string-name> (<year>2017</year>). <chapter-title>A unified approach to interpreting model predictions</chapter-title>. In: <source>NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems</source> (<string-name><given-names>U.</given-names> <surname>von Luxburg</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Guyon</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bengio</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Wallach</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Fergus</surname></string-name>, eds.), <fpage>4768</fpage>–<lpage>4777</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_033">
<mixed-citation publication-type="journal"> <string-name><surname>Nembrini</surname> <given-names>S</given-names></string-name>, <string-name><surname>König</surname> <given-names>IR</given-names></string-name>, <string-name><surname>Wright</surname> <given-names>MN</given-names></string-name> (<year>2018</year>). <article-title>The revival of the Gini importance?</article-title> <source>Bioinformatics</source>, <volume>34</volume>(<issue>21</issue>): <fpage>3711</fpage>–<lpage>3718</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_034">
<mixed-citation publication-type="chapter"> <string-name><surname>Ribeiro</surname> <given-names>MT</given-names></string-name>, <string-name><surname>Singh</surname> <given-names>S</given-names></string-name>, <string-name><surname>Guestrin</surname> <given-names>C</given-names></string-name> (<year>2016</year>). <chapter-title>“Why should I trust you?”: Explaining the predictions of any classifier</chapter-title>. In: <source>KDD’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, <fpage>1135</fpage>–<lpage>1144</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_035">
<mixed-citation publication-type="journal"> <string-name><surname>Sandri</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zuccolotto</surname> <given-names>P</given-names></string-name> (<year>2008</year>). <article-title>A bias correction algorithm for the Gini variable importance measure in classification trees</article-title>. <source>Journal of Computational and Graphical Statistics</source>, <volume>17</volume>: <fpage>611</fpage>–<lpage>628</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_036">
<mixed-citation publication-type="journal"> <string-name><surname>Strobl</surname> <given-names>C</given-names></string-name>, <string-name><surname>Boulesteix</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kneib</surname> <given-names>T</given-names></string-name>, <string-name><surname>Augustin</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zeileis</surname> <given-names>A</given-names></string-name> (<year>2008</year>). <article-title>Conditional variable importance for random forests</article-title>. <source>BMC Bioinformatics</source>, <volume>9</volume>: <elocation-id>307</elocation-id>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_037">
<mixed-citation publication-type="journal"> <string-name><surname>Strobl</surname> <given-names>C</given-names></string-name>, <string-name><surname>Boulesteix</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zeileis</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hothorn</surname> <given-names>T</given-names></string-name> (<year>2007</year>). <article-title>Bias in random forest variable importance measures: Illustrations, sources and a solution</article-title>. <source>BMC Bioinformatics</source>, <volume>8</volume>: <elocation-id>25</elocation-id>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_038">
<mixed-citation publication-type="other"> <string-name><surname>Therneau</surname> <given-names>TM</given-names></string-name>, <string-name><surname>Atkinson</surname> <given-names>EJ</given-names></string-name> (2019a). An introduction to recursive partitioning using the RPART routines. R vignette. <uri>https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf</uri>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_039">
<mixed-citation publication-type="other"> <string-name><surname>Therneau</surname> <given-names>TM</given-names></string-name>, <string-name><surname>Atkinson</surname> <given-names>EJ</given-names></string-name> (2019b). <italic>rpart: Recursive Partitioning and Regression Trees</italic>. R package version 4.1-15.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_040">
<mixed-citation publication-type="journal"> <string-name><surname>Wei</surname> <given-names>P</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Song</surname> <given-names>J</given-names></string-name> (<year>2015</year>). <article-title>Variable importance analysis: A comprehensive review</article-title>. <source>Reliability Engineering &amp; System Safety</source>, <volume>142</volume>: <fpage>399</fpage>–<lpage>432</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_041">
<mixed-citation publication-type="journal"> <string-name><surname>White</surname> <given-names>AP</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>WZ</given-names></string-name> (<year>1994</year>). <article-title>Bias in information-based measures in decision tree induction</article-title>. <source>Machine Learning</source>, <volume>15</volume>: <fpage>321</fpage>–<lpage>329</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_042">
<mixed-citation publication-type="journal"> <string-name><surname>Wright</surname> <given-names>MN</given-names></string-name>, <string-name><surname>Ziegler</surname> <given-names>A</given-names></string-name> (<year>2017</year>). <article-title>ranger: A fast implementation of random forests for high dimensional data in C++ and R</article-title>. <source>Journal of Statistical Software</source>, <volume>77</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>17</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_043">
<mixed-citation publication-type="journal"> <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Boos</surname> <given-names>DD</given-names></string-name>, <string-name><surname>Stefanski</surname> <given-names>LA</given-names></string-name> (<year>2007</year>). <article-title>Variable selection by the addition of pseudovariables</article-title>. <source>Journal of the American Statistical Association</source>, <volume>102</volume>: <fpage>235</fpage>–<lpage>243</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_044">
<mixed-citation publication-type="other"> <string-name><surname>Zhu</surname> <given-names>R</given-names></string-name> (2018). <italic>Reinforcement Learning Trees</italic>. R package version 3.2.2.</mixed-citation>
</ref>
<ref id="j_jds1023_ref_045">
<mixed-citation publication-type="journal"> <string-name><surname>Zhu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>D</given-names></string-name>, <string-name><surname>Kosorok</surname> <given-names>MR</given-names></string-name> (<year>2015</year>). <article-title>Reinforcement learning trees</article-title>. <source>Journal of the American Statistical Association</source>, <volume>110</volume>: <fpage>1770</fpage>–<lpage>1784</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
