<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1123</article-id>
<article-id pub-id-type="doi">10.6339/24-JDS1123</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Producing Fast and Convenient Machine Learning Benchmarks in R with the stressor Package</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Haycock</surname><given-names>Sam</given-names></name><email xlink:href="mailto:haycock.sam@outlook.com">haycock.sam@outlook.com</email><xref ref-type="aff" rid="j_jds1123_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Bean</surname><given-names>Brennan</given-names></name><xref ref-type="aff" rid="j_jds1123_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Burchfield</surname><given-names>Emily</given-names></name><xref ref-type="aff" rid="j_jds1123_aff_002">2</xref>
</contrib>
<aff id="j_jds1123_aff_001"><label>1</label>Department of Mathematics and Statistics, <institution>Utah State University</institution>, Logan, UT, <country>USA</country></aff>
<aff id="j_jds1123_aff_002"><label>2</label>Department of Environmental Sciences, <institution>Emory University</institution>, Atlanta, GA, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:haycock.sam@outlook.com">haycock.sam@outlook.com</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2024</year></pub-date><pub-date pub-type="epub"><day>4</day><month>6</month><year>2024</year></pub-date><volume>22</volume><issue>2</issue><fpage>239</fpage><lpage>258</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1123_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The supplementary materials associated with this paper contain all the data and code necessary to reproduce the figures and tables shown in this paper. Dataset descriptions have been provided in the text, but additional information about the files can be found in the README file contained in the supplementary materials folder.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>28</day><month>7</month><year>2023</year></date><date date-type="accepted"><day>21</day><month>2</month><year>2024</year></date></history>
<permissions><copyright-statement>2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2024</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>The programming overhead required to implement machine learning workflows creates a barrier for many discipline-specific researchers with limited programming experience. The <monospace>stressor</monospace> package provides an <monospace>R</monospace> interface to <monospace>Python</monospace>’s <monospace>PyCaret</monospace> package, which automatically tunes and trains 14–18 machine learning (ML) models for use in accuracy comparisons. Beyond this interface, <monospace>stressor</monospace> provides functions for synthetic data generation and variants of cross-validation, which make it easy to benchmark how well ML models extrapolate or compete with simpler models on simpler data forms. We show the utility of <monospace>stressor</monospace> on two agricultural datasets, one using classification models to predict crop suitability and another using regression models to predict crop yields. Full ML benchmarking workflows can be completed in only a few lines of code with relatively small computational cost. The results, and more importantly the workflow, provide a template for how applied researchers can quickly generate accuracy comparisons of many ML models with very little programming.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>agriculture</kwd>
<kwd>benchmarking</kwd>
<kwd>cross-validation</kwd>
<kwd>machine learning</kwd>
<kwd>Python</kwd>
<kwd>R</kwd>
</kwd-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1123_reflist_001">
<title>References</title>
<ref id="j_jds1123_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Aguilar</surname> <given-names>J</given-names></string-name>, <string-name><surname>Gramig</surname> <given-names>GG</given-names></string-name>, <string-name><surname>Hendrickson</surname> <given-names>JR</given-names></string-name>, <string-name><surname>Archer</surname> <given-names>DW</given-names></string-name>, <string-name><surname>Forcella</surname> <given-names>F</given-names></string-name>, <string-name><surname>Liebig</surname> <given-names>MA</given-names></string-name> (<year>2015</year>). <article-title>Crop species diversity changes in the United States: 1978–2012</article-title>. <source><italic>PLoS ONE</italic></source>, <volume>10</volume>(<issue>8</issue>): <fpage>1</fpage>–<lpage>4</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1371/journal.pone.0136580" xlink:type="simple">https://doi.org/10.1371/journal.pone.0136580</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_002">
<mixed-citation publication-type="other"> <string-name><surname>Ali</surname> <given-names>M</given-names></string-name> (<year>2020</year>). <italic>PyCaret: An open source, low-code machine learning library in Python</italic>. PyCaret version 1.0.0. <uri>https://www.pycaret.org</uri> (Accessed May 17, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_003">
<mixed-citation publication-type="book"> <string-name><surname>Bowles</surname> <given-names>M</given-names></string-name> (<year>2015</year>). <source><italic>Machine Learning in Python: Essential Techniques for Predictive Analysis</italic></source>. <publisher-name>John Wiley &amp; Sons</publisher-name>, <publisher-loc>Hoboken, NJ, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_004">
<mixed-citation publication-type="chapter"> <string-name><surname>Brenning</surname> <given-names>A</given-names></string-name> (<year>2012</year>). <chapter-title>Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest</chapter-title>. In: <source><italic>2012 IEEE International Geoscience and Remote Sensing Symposium</italic></source>, <fpage>5372</fpage>–<lpage>5375</lpage>. <publisher-name>IEEE</publisher-name>. <uri>https://doi.org/10.1109/IGARSS.2012.6352393</uri> (Accessed Dec 29, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Brenning</surname> <given-names>A</given-names></string-name>, <string-name><surname>Long</surname> <given-names>S</given-names></string-name>, <string-name><surname>Fieguth</surname> <given-names>P</given-names></string-name> (<year>2012</year>). <article-title>Detecting rock glacier flow structures using Gabor filters and IKONOS imagery</article-title>. <source><italic>Remote Sensing of Environment</italic></source>, <volume>125</volume>: <fpage>227</fpage>–<lpage>237</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.rse.2012.07.005" xlink:type="simple">https://doi.org/10.1016/j.rse.2012.07.005</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Burchfield</surname> <given-names>EK</given-names></string-name> (<year>2022</year>). <article-title>Shifting cultivation geographies in the central and eastern US</article-title>. <source><italic>Environmental Research Letters</italic></source>, <volume>17</volume>(<issue>5</issue>): <fpage>1</fpage>–<lpage>11</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1088/1748-9326/ac6c3d" xlink:type="simple">https://doi.org/10.1088/1748-9326/ac6c3d</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Burchfield</surname> <given-names>EK</given-names></string-name>, <string-name><surname>Nelson</surname> <given-names>KS</given-names></string-name> (<year>2021</year>). <article-title>Agricultural yield geographies in the United States</article-title>. <source><italic>Environmental Research Letters</italic></source>, <volume>16</volume>(<issue>5</issue>): <fpage>1</fpage>–<lpage>12</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1088/1748-9326/abe88d" xlink:type="simple">https://doi.org/10.1088/1748-9326/abe88d</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_008">
<mixed-citation publication-type="other"> <string-name><surname>Chen</surname> <given-names>T</given-names></string-name>, <string-name><surname>He</surname> <given-names>T</given-names></string-name>, <string-name><surname>Benesty</surname> <given-names>M</given-names></string-name>, <string-name><surname>Khotilovich</surname> <given-names>V</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>H</given-names></string-name>, et al. (<year>2023</year>). <italic>xgboost: Extreme Gradient Boosting</italic>. R package version 1.7.6.1. <uri>https://CRAN.R-project.org/package=xgboost</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Crane-Droesch</surname> <given-names>A</given-names></string-name> (<year>2018</year>). <article-title>Machine learning methods for crop yield prediction and climate change impact assessment in agriculture</article-title>. <source><italic>Environmental Research Letters</italic></source>, <volume>13</volume>(<issue>11</issue>): <fpage>114003</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1088/1748-9326/aae159" xlink:type="simple">https://doi.org/10.1088/1748-9326/aae159</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_010">
<mixed-citation publication-type="other"> <string-name><surname>Culp</surname> <given-names>M</given-names></string-name>, <string-name><surname>Johnson</surname> <given-names>K</given-names></string-name>, <string-name><surname>Michailidis</surname> <given-names>G</given-names></string-name> (<year>2016</year>). <italic>ada: The R Package Ada for Stochastic Boosting</italic>. R package version 2.0-5. <uri>https://CRAN.R-project.org/package=ada</uri> (Accessed May 17, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name> (<year>2010</year>). <article-title>Regularization paths for generalized linear models via coordinate descent</article-title>. <source><italic>Journal of Statistical Software</italic></source>, <volume>33</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>22</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18637/jss.v033.i01" xlink:type="simple">https://doi.org/10.18637/jss.v033.i01</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Goslee</surname> <given-names>SC</given-names></string-name> (<year>2020</year>). <article-title>Drivers of agricultural diversity in the contiguous United States</article-title>. <source><italic>Frontiers in Sustainable Food Systems</italic></source>, <volume>4</volume>: <fpage>75</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3389/fsufs.2020.00075" xlink:type="simple">https://doi.org/10.3389/fsufs.2020.00075</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Gramacy</surname> <given-names>RB</given-names></string-name> (<year>2007</year>). <article-title>tgp: An R package for Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian process models</article-title>. <source><italic>Journal of Statistical Software</italic></source>, <volume>19</volume>: <fpage>1</fpage>–<lpage>46</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18637/jss.v019.i09" xlink:type="simple">https://doi.org/10.18637/jss.v019.i09</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_014">
<mixed-citation publication-type="other"> <string-name><surname>Greenwell</surname> <given-names>B</given-names></string-name>, <string-name><surname>Boehmke</surname> <given-names>B</given-names></string-name>, <string-name><surname>Cunningham</surname> <given-names>J</given-names></string-name>, <string-name><surname>GBM Developers</surname></string-name> (<year>2022</year>). <italic>gbm: Generalized Boosted Regression Models</italic>. R package version 2.1.8.1. <uri>https://CRAN.R-project.org/package=gbm</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Harrison</surname> <given-names>D</given-names></string-name>, <string-name><surname>Rubinfeld</surname> <given-names>DL</given-names></string-name> (<year>1978</year>). <article-title>Hedonic housing prices and the demand for clean air</article-title>. <source><italic>Journal of Environmental Economics and Management</italic></source>, <volume>5</volume>(<issue>1</issue>): <fpage>81</fpage>–<lpage>102</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/0095-0696(78)90006-2" xlink:type="simple">https://doi.org/10.1016/0095-0696(78)90006-2</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_016">
<mixed-citation publication-type="other"> <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Efron</surname> <given-names>B</given-names></string-name> (<year>2022</year>). <italic>lars: Least Angle Regression, Lasso and Forward Stagewise</italic>. R package version 1.3. <uri>https://CRAN.R-project.org/package=lars</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_017">
<mixed-citation publication-type="book"> <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>JH</given-names></string-name> (<year>2009</year>). <source><italic>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</italic></source>, volume 2. <publisher-name>Springer</publisher-name>, <publisher-loc>New York, NY, USA</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Haycock</surname> <given-names>S</given-names></string-name> (<year>2023</year>). <article-title>stressor: An R package for benchmarking machine learning models</article-title>. <source><italic>Utah State University Digital Commons</italic></source>. <fpage>1</fpage>–<lpage>75</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.26076/2am5-9f67" xlink:type="simple">https://doi.org/10.26076/2am5-9f67</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Hengl</surname> <given-names>T</given-names></string-name>, <string-name><surname>Miller</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Križan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shepherd</surname> <given-names>KD</given-names></string-name>, <string-name><surname>Sila</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kilibarda</surname> <given-names>M</given-names></string-name>, <etal>et al.</etal> (<year>2021</year>). <article-title>African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning</article-title>. <source><italic>Scientific Reports</italic></source>, <volume>11</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>18</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1038/s41598-021-85639-y" xlink:type="simple">https://doi.org/10.1038/s41598-021-85639-y</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_020">
<mixed-citation publication-type="other"> <string-name><surname>Hothorn</surname> <given-names>T</given-names></string-name> (<year>2023</year>). CRAN task view: Machine learning &amp; statistical learning. Version 2023-07-20. <uri>https://CRAN.R-project.org/view=MachineLearning</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_021">
<mixed-citation publication-type="chapter"> <string-name><surname>Ke</surname> <given-names>G</given-names></string-name>, <string-name><surname>Meng</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Finley</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>W</given-names></string-name>, <etal>et al.</etal> (<year>2017</year>). <chapter-title>LightGBM: A highly efficient gradient boosting decision tree</chapter-title>. In: <source><italic>Advances in Neural Information Processing Systems</italic></source>, volume <volume>30</volume>. <publisher-name>Curran Associates, Inc.</publisher-name> <uri>https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf</uri> (Accessed May 17, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_022">
<mixed-citation publication-type="other"> <string-name><surname>Krueger</surname> <given-names>T</given-names></string-name>, <string-name><surname>Braun</surname> <given-names>M</given-names></string-name> (<year>2022</year>). <italic>CVST: Fast Cross-Validation via Sequential Testing</italic>. R package version 0.2-3. <uri>https://CRAN.R-project.org/package=CVST</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_023">
<mixed-citation publication-type="other"> <string-name><surname>Kuhn</surname> <given-names>M</given-names></string-name> (<year>2022</year>). <italic>caret: Classification and Regression Training</italic>. R package version 6.0-93. <uri>https://CRAN.R-project.org/package=caret</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_024">
<mixed-citation publication-type="other"> <string-name><surname>Kuhn</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wickham</surname> <given-names>H</given-names></string-name> (<year>2020</year>). <italic>Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles</italic>. <uri>https://www.tidymodels.org</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_025">
<mixed-citation publication-type="journal"> <string-name><surname>Lang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Binder</surname> <given-names>M</given-names></string-name>, <string-name><surname>Richter</surname> <given-names>J</given-names></string-name>, <string-name><surname>Schratz</surname> <given-names>P</given-names></string-name>, <string-name><surname>Pfisterer</surname> <given-names>F</given-names></string-name>, <string-name><surname>Coors</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal> (<year>2019</year>). <article-title>mlr3: A modern object-oriented machine learning framework in R</article-title>. <source><italic>Journal of Open Source Software</italic></source>. <fpage>1903</fpage>. <uri>https://doi.org/10.21105/joss.01903</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_026">
<mixed-citation publication-type="chapter"> <string-name><surname>Le</surname> <given-names>HM</given-names></string-name>, <string-name><surname>Eriksson</surname> <given-names>A</given-names></string-name>, <string-name><surname>Do</surname> <given-names>TT</given-names></string-name>, <string-name><surname>Milford</surname> <given-names>M</given-names></string-name> (<year>2019</year>). <chapter-title>A binary optimization approach for constrained k-means clustering</chapter-title>. In: <source><italic>Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Revised Selected Papers, Part IV</italic></source>. <conf-loc>Perth, Australia</conf-loc>. <conf-date>December 2–6, 2018</conf-date>, <fpage>383</fpage>–<lpage>398</lpage>. <publisher-name>Springer</publisher-name>. <uri>https://doi.org/10.1007/978-3-030-20870-7_24</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_027">
<mixed-citation publication-type="other"> <string-name><surname>Leisch</surname> <given-names>F</given-names></string-name>, <string-name><surname>Dimitriadou</surname> <given-names>E</given-names></string-name> (<year>2021</year>). <italic>mlbench: Machine Learning Benchmark Problems</italic>. R package version 2.1-3.1. <uri>https://cran.r-project.org/package=mlbench</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_028">
<mixed-citation publication-type="journal"> <string-name><surname>Liang</surname> <given-names>XZ</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chambers</surname> <given-names>RG</given-names></string-name>, <string-name><surname>Schmoldt</surname> <given-names>DL</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>W</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>C</given-names></string-name>, <etal>et al.</etal> (<year>2017</year>). <article-title>Determining climate effects on US total agricultural productivity</article-title>. <source><italic>Proceedings of the National Academy of Sciences</italic></source>, <volume>114</volume>(<issue>12</issue>): <fpage>E2285</fpage>–<lpage>E2292</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1073/pnas.1615922114" xlink:type="simple">https://doi.org/10.1073/pnas.1615922114</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_029">
<mixed-citation publication-type="book"> <string-name><surname>Lovelace</surname> <given-names>R</given-names></string-name>, <string-name><surname>Nowosad</surname> <given-names>J</given-names></string-name>, <string-name><surname>Muenchow</surname> <given-names>J</given-names></string-name> (<year>2019</year>). <source><italic>Geocomputation with R</italic></source>. <publisher-name>CRC Press</publisher-name>. <uri>https://r.geocompx.org/spatial-cv.html</uri> (Accessed Dec 29, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_030">
<mixed-citation publication-type="chapter"> <string-name><surname>Lundell</surname> <given-names>JF</given-names></string-name> (<year>2017</year>). <chapter-title>There has to be an easier way: A simple alternative for parameter tuning of supervised learning methods</chapter-title>. In: <source><italic>JSM Proceedings, Statistical Computing Section</italic></source>, <fpage>3028</fpage>–<lpage>3036</lpage>. <publisher-name>American Statistical Association</publisher-name>, <publisher-loc>Alexandria, VA</publisher-loc>. <uri>https://cran.r-project.org/package=EZtune</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_031">
<mixed-citation publication-type="other"> <string-name><surname>Majka</surname> <given-names>M</given-names></string-name> (<year>2019</year>). <italic>naivebayes: High Performance Implementation of the Naive Bayes Algorithm in R</italic>. R package version 0.9.7. <uri>https://CRAN.R-project.org/package=naivebayes</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_032">
<mixed-citation publication-type="book"> <string-name><surname>Meyer</surname> <given-names>D</given-names></string-name>, <string-name><surname>Dimitriadou</surname> <given-names>E</given-names></string-name>, <string-name><surname>Hornik</surname> <given-names>K</given-names></string-name>, <string-name><surname>Weingessel</surname> <given-names>A</given-names></string-name>, <string-name><surname>Leisch</surname> <given-names>F</given-names></string-name> (<year>2022</year>). <italic>e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien</italic>. R package version 1.7-12. <uri>https://CRAN.R-project.org/package=e1071</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_033">
<mixed-citation publication-type="journal"> <string-name><surname>Neunhoeffer</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sternberg</surname> <given-names>S</given-names></string-name> (<year>2019</year>). <article-title>How cross-validation can go wrong and what to do about it</article-title>. <source><italic>Political Analysis</italic></source>, <volume>27</volume>(<issue>1</issue>): <fpage>101</fpage>–<lpage>106</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1017/pan.2018.39" xlink:type="simple">https://doi.org/10.1017/pan.2018.39</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_034">
<mixed-citation publication-type="other"> <string-name><surname>Papadakis</surname> <given-names>M</given-names></string-name>, <string-name><surname>Tsagris</surname> <given-names>M</given-names></string-name>, <string-name><surname>Fafalios</surname> <given-names>S</given-names></string-name> (<year>2023</year>). <italic>Rfast: A Collection of Efficient and Extremely Fast R Functions</italic>. R package version 2.1.0. <uri>https://CRAN.R-project.org/package=Rfast</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_035">
<mixed-citation publication-type="journal"> <string-name><surname>Pedregosa</surname> <given-names>F</given-names></string-name>, <string-name><surname>Varoquaux</surname> <given-names>G</given-names></string-name>, <string-name><surname>Gramfort</surname> <given-names>A</given-names></string-name>, <string-name><surname>Michel</surname> <given-names>V</given-names></string-name>, <string-name><surname>Thirion</surname> <given-names>B</given-names></string-name>, <string-name><surname>Grisel</surname> <given-names>O</given-names></string-name>, <etal>et al.</etal> (<year>2011</year>). <article-title>Scikit-learn: Machine learning in Python</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>12</volume>: <fpage>2825</fpage>–<lpage>2830</lpage>. <uri>https://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf</uri> (Accessed Dec 29, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_036">
<mixed-citation publication-type="journal"> <string-name><surname>Pedregosa</surname> <given-names>F</given-names></string-name>, <string-name><surname>Varoquaux</surname> <given-names>G</given-names></string-name>, <string-name><surname>Gramfort</surname> <given-names>A</given-names></string-name>, <string-name><surname>Michel</surname> <given-names>V</given-names></string-name>, <string-name><surname>Thirion</surname> <given-names>B</given-names></string-name>, <string-name><surname>Grisel</surname> <given-names>O</given-names></string-name>, <etal>et al.</etal> (<year>2013</year>). <article-title>sklearn.model_selection.randomizedsearchcv</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>12</volume>: <fpage>2825</fpage>–<lpage>2830</lpage>. <uri>https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html</uri> (Accessed May 17, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_037">
<mixed-citation publication-type="journal"> <string-name><surname>Ploton</surname> <given-names>P</given-names></string-name>, <string-name><surname>Mortier</surname> <given-names>F</given-names></string-name>, <string-name><surname>Réjou-Méchain</surname> <given-names>M</given-names></string-name>, <string-name><surname>Barbier</surname> <given-names>N</given-names></string-name>, <string-name><surname>Picard</surname> <given-names>N</given-names></string-name>, <string-name><surname>Rossi</surname> <given-names>V</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Spatial validation reveals poor predictive performance of large-scale ecological mapping models</article-title>. <source><italic>Nature Communications</italic></source>, <volume>11</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>11</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1038/s41467-020-18321-y" xlink:type="simple">https://doi.org/10.1038/s41467-020-18321-y</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_038">
<mixed-citation publication-type="other"> <string-name><surname>R Core Team</surname></string-name> (<year>2022</year>). <italic>R: A Language and Environment for Statistical Computing</italic>. R Foundation for Statistical Computing, Vienna, Austria.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_039">
<mixed-citation publication-type="journal"> <string-name><surname>Raschka</surname> <given-names>S</given-names></string-name>, <string-name><surname>Patterson</surname> <given-names>J</given-names></string-name>, <string-name><surname>Nolet</surname> <given-names>C</given-names></string-name> (<year>2020</year>). <article-title>Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence</article-title>. <source><italic>Information</italic></source>, <volume>11</volume>(<issue>4</issue>): <fpage>1</fpage>–<lpage>33</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3390/info11040193" xlink:type="simple">https://doi.org/10.3390/info11040193</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_040">
<mixed-citation publication-type="journal"> <string-name><surname>Ray</surname> <given-names>DK</given-names></string-name>, <string-name><surname>Gerber</surname> <given-names>JS</given-names></string-name>, <string-name><surname>MacDonald</surname> <given-names>GK</given-names></string-name>, <string-name><surname>West</surname> <given-names>PC</given-names></string-name> (<year>2015</year>). <article-title>Climate variation explains a third of global crop yield variability</article-title>. <source><italic>Nature Communications</italic></source>, <volume>6</volume>(<issue>1</issue>). <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1038/ncomms6989" xlink:type="simple">https://doi.org/10.1038/ncomms6989</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_041">
<mixed-citation publication-type="journal"> <string-name><surname>Roberts</surname> <given-names>DR</given-names></string-name>, <string-name><surname>Bahn</surname> <given-names>V</given-names></string-name>, <string-name><surname>Ciuti</surname> <given-names>S</given-names></string-name>, <string-name><surname>Boyce</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Elith</surname> <given-names>J</given-names></string-name>, <string-name><surname>Guillera-Arroita</surname> <given-names>G</given-names></string-name>, <etal>et al.</etal> (<year>2017</year>a). <article-title>Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure</article-title>. <source><italic>Ecography</italic></source>, <volume>40</volume>(<issue>8</issue>): <fpage>913</fpage>–<lpage>929</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/ecog.02881" xlink:type="simple">https://doi.org/10.1111/ecog.02881</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_042">
<mixed-citation publication-type="journal"> <string-name><surname>Roberts</surname> <given-names>MJ</given-names></string-name>, <string-name><surname>Braun</surname> <given-names>NO</given-names></string-name>, <string-name><surname>Sinclair</surname> <given-names>TR</given-names></string-name>, <string-name><surname>Lobell</surname> <given-names>DB</given-names></string-name>, <string-name><surname>Schlenker</surname> <given-names>W</given-names></string-name> (<year>2017</year>b). <article-title>Comparing and combining process-based crop models and statistical models with some implications for climate change</article-title>. <source><italic>Environmental Research Letters</italic></source>, <volume>12</volume>(<issue>9</issue>): <fpage>095010</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1088/1748-9326/aa7f33" xlink:type="simple">https://doi.org/10.1088/1748-9326/aa7f33</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_043">
<mixed-citation publication-type="journal"> <string-name><surname>Roberts</surname> <given-names>MJ</given-names></string-name>, <string-name><surname>Schlenker</surname> <given-names>W</given-names></string-name>, <string-name><surname>Eyer</surname> <given-names>J</given-names></string-name> (<year>2013</year>). <article-title>Agronomic weather measures in econometric models of crop yield with implications for climate change</article-title>. <source><italic>American Journal of Agricultural Economics</italic></source>, <volume>95</volume>(<issue>2</issue>): <fpage>236</fpage>–<lpage>243</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/ajae/aas047" xlink:type="simple">https://doi.org/10.1093/ajae/aas047</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_044">
<mixed-citation publication-type="other"> <string-name><surname>Rong</surname> <given-names>X</given-names></string-name> (<year>2022</year>). <italic>deepnet: Deep Learning Toolkit in R</italic>. R package version 0.2.1. <uri>https://CRAN.R-project.org/package=deepnet</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_045">
<mixed-citation publication-type="journal"> <string-name><surname>Rosenzweig</surname> <given-names>C</given-names></string-name>, <string-name><surname>Jones</surname> <given-names>JW</given-names></string-name>, <string-name><surname>Hatfield</surname> <given-names>JL</given-names></string-name>, <string-name><surname>Ruane</surname> <given-names>AC</given-names></string-name>, <string-name><surname>Boote</surname> <given-names>KJ</given-names></string-name>, <string-name><surname>Thorburn</surname> <given-names>P</given-names></string-name>, <etal>et al.</etal> (<year>2013</year>). <article-title>The agricultural model intercomparison and improvement project (AgMIP): Protocols and pilot studies</article-title>. <source><italic>Agricultural and Forest Meteorology</italic></source>, <volume>170</volume>: <fpage>166</fpage>–<lpage>182</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.agrformet.2012.09.011" xlink:type="simple">https://doi.org/10.1016/j.agrformet.2012.09.011</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_046">
<mixed-citation publication-type="other"> <string-name><surname>Schratz</surname> <given-names>P</given-names></string-name>, <string-name><surname>Becker</surname> <given-names>M</given-names></string-name> (<year>2023</year>). <italic>mlr3spatiotempcv: Spatiotemporal Resampling Methods for ‘mlr3’</italic>. <uri>https://mlr3spatiotempcv.mlr-org.com/</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_047">
<mixed-citation publication-type="journal"> <string-name><surname>Schumacher</surname> <given-names>BL</given-names></string-name>, <string-name><surname>Burchfield</surname> <given-names>EK</given-names></string-name>, <string-name><surname>Bean</surname> <given-names>B</given-names></string-name>, <string-name><surname>Yost</surname> <given-names>MA</given-names></string-name> (<year>2023</year>). <article-title>Leveraging important covariate groups for corn yield prediction</article-title>. <source><italic>Agriculture</italic></source>, <volume>13</volume>(<issue>3</issue>). <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3390/agriculture13030618" xlink:type="simple">https://doi.org/10.3390/agriculture13030618</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_048">
<mixed-citation publication-type="other"> <string-name><surname>Soltani</surname> <given-names>A</given-names></string-name> (<year>2012</year>). <italic>Modeling Physiology of Crop Development, Growth and Yield</italic>. CABI. <uri>https://www.cabidigitallibrary.org/doi/book/10.1079/9781845939700.0000</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_049">
<mixed-citation publication-type="journal"> <string-name><surname>Spangler</surname> <given-names>K</given-names></string-name>, <string-name><surname>Schumacher</surname> <given-names>BL</given-names></string-name>, <string-name><surname>Bean</surname> <given-names>B</given-names></string-name>, <string-name><surname>Burchfield</surname> <given-names>EK</given-names></string-name> (<year>2022</year>). <article-title>Path dependencies in US agriculture: Regional factors of diversification</article-title>. <source><italic>Agriculture, Ecosystems &amp; Environment</italic></source>, <volume>333</volume>: <fpage>107957</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.agee.2022.107957" xlink:type="simple">https://doi.org/10.1016/j.agee.2022.107957</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_050">
<mixed-citation publication-type="journal"> <string-name><surname>Srivastava</surname> <given-names>N</given-names></string-name>, <string-name><surname>Hinton</surname> <given-names>G</given-names></string-name>, <string-name><surname>Krizhevsky</surname> <given-names>A</given-names></string-name>, <string-name><surname>Sutskever</surname> <given-names>I</given-names></string-name>, <string-name><surname>Salakhutdinov</surname> <given-names>R</given-names></string-name> (<year>2014</year>). <article-title>Dropout: A simple way to prevent neural networks from overfitting</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>15</volume>(<issue>1</issue>): <fpage>1929</fpage>–<lpage>1958</lpage>. <uri>https://dl.acm.org/doi/abs/10.5555/2627435.2670313</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_051">
<mixed-citation publication-type="other"> <string-name><surname>Therneau</surname> <given-names>T</given-names></string-name> (<year>2018</year>). <italic>deming: Deming, Theil-Sen, Passing-Bablock and Total Least Squares Regression</italic>. R package version 1.4. <uri>https://CRAN.R-project.org/package=deming</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_052">
<mixed-citation publication-type="other"> <string-name><surname>Therneau</surname> <given-names>T</given-names></string-name>, <string-name><surname>Atkinson</surname> <given-names>B</given-names></string-name> (<year>2022</year>). <italic>rpart: Recursive Partitioning and Regression Trees</italic>. R package version 4.1.19. <uri>https://CRAN.R-project.org/package=rpart</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_053">
<mixed-citation publication-type="journal"> <string-name><surname>Urban</surname> <given-names>DW</given-names></string-name>, <string-name><surname>Sheffield</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lobell</surname> <given-names>DB</given-names></string-name> (<year>2015</year>). <article-title>The impacts of future climate and carbon dioxide changes on the average and variability of US maize yields under two emission scenarios</article-title>. <source><italic>Environmental Research Letters</italic></source>, <volume>10</volume>(<issue>4</issue>): <fpage>045003</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1088/1748-9326/10/4/045003" xlink:type="simple">https://doi.org/10.1088/1748-9326/10/4/045003</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_054">
<mixed-citation publication-type="other"> USDA (<year>2019</year>). <italic>2017 Census of Agriculture</italic>. <uri>https://www.nass.usda.gov/AgCensus</uri> (Accessed Dec 29, 2023).</mixed-citation>
</ref>
<ref id="j_jds1123_ref_055">
<mixed-citation publication-type="other"> <string-name><surname>Ushey</surname> <given-names>K</given-names></string-name>, <string-name><surname>Allaire</surname> <given-names>J</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>Y</given-names></string-name> (<year>2022</year>). <italic>reticulate: Interface to Python</italic>. R package version 1.25. <uri>https://CRAN.R-project.org/package=reticulate</uri>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_056">
<mixed-citation publication-type="journal"> <string-name><surname>van Klompenburg</surname> <given-names>T</given-names></string-name>, <string-name><surname>Kassahun</surname> <given-names>A</given-names></string-name>, <string-name><surname>Catal</surname> <given-names>C</given-names></string-name> (<year>2020</year>). <article-title>Crop yield prediction using machine learning: A systematic literature review</article-title>. <source><italic>Computers and Electronics in Agriculture</italic></source>, <volume>177</volume>: <fpage>105709</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.compag.2020.105709" xlink:type="simple">https://doi.org/10.1016/j.compag.2020.105709</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_057">
<mixed-citation publication-type="book"> <string-name><surname>Venables</surname> <given-names>WN</given-names></string-name>, <string-name><surname>Ripley</surname> <given-names>BD</given-names></string-name> (<year>2002</year>). <source><italic>Modern Applied Statistics with S</italic></source>. <publisher-name>Springer</publisher-name>, <publisher-loc>New York</publisher-loc>, <edition>fourth</edition> edition. ISBN 0-387-95457-0.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_058">
<mixed-citation publication-type="journal"> <string-name><surname>Wadoux</surname> <given-names>AMC</given-names></string-name>, <string-name><surname>Heuvelink</surname> <given-names>GB</given-names></string-name>, <string-name><surname>De Bruin</surname> <given-names>S</given-names></string-name>, <string-name><surname>Brus</surname> <given-names>DJ</given-names></string-name> (<year>2021</year>). <article-title>Spatial cross-validation is not the right way to evaluate map accuracy</article-title>. <source><italic>Ecological Modelling</italic></source>, <volume>457</volume>: <fpage>109692</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.ecolmodel.2021.109692" xlink:type="simple">https://doi.org/10.1016/j.ecolmodel.2021.109692</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_059">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>XD</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>RC</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>ZQ</given-names></string-name>, <string-name><surname>Hong</surname> <given-names>CQ</given-names></string-name> (<year>2019</year>). <article-title>Fast adaptive k-means subspace clustering for high-dimensional data</article-title>. <source><italic>IEEE Access</italic></source>, <volume>7</volume>: <fpage>42639</fpage>–<lpage>42651</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ACCESS.2019.2907043" xlink:type="simple">https://doi.org/10.1109/ACCESS.2019.2907043</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1123_ref_060">
<mixed-citation publication-type="journal"> <string-name><surname>Wright</surname> <given-names>MN</given-names></string-name>, <string-name><surname>Ziegler</surname> <given-names>A</given-names></string-name> (<year>2017</year>). <article-title>Ranger: A fast implementation of random forests for high dimensional data in C++ and R</article-title>. <source><italic>Journal of Statistical Software</italic></source>, <volume>77</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>17</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18637/jss.v077.i01" xlink:type="simple">https://doi.org/10.18637/jss.v077.i01</ext-link>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
