<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1017</article-id>
<article-id pub-id-type="doi">10.6339/21-JDS1017</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Do Predictor Envelopes Really Reduce Dimension?</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Jacobson</surname><given-names>Tate</given-names></name><xref ref-type="aff" rid="j_jds1017_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zou</surname><given-names>Hui</given-names></name><email xlink:href="mailto:zouxx019@umn.edu">zouxx019@umn.edu</email><xref ref-type="aff" rid="j_jds1017_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1017_aff_001"><label>1</label>School of Statistics, <institution>University of Minnesota</institution></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:zouxx019@umn.edu">zouxx019@umn.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>11</day><month>11</month><year>2021</year></pub-date><volume>19</volume><issue>4</issue><fpage>528</fpage><lpage>541</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1017_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>Code and data for reproducing our results can be found at <uri>https://github.com/TateJacobson/Envelope-EDF</uri>. This repository contains the following folders: 
<list>
<list-item id="j_jds1017_li_001">
<label>•</label>
<p><bold>Cleaning Output:</bold> Contains an R script for cleaning saved simulation output and generating plots from it.</p>
</list-item>
<list-item id="j_jds1017_li_002">
<label>•</label>
<p><bold>edf:</bold> An R package for computing the effective degrees of freedom.</p>
</list-item>
<list-item id="j_jds1017_li_003">
<label>•</label>
<p><bold>Simulations:</bold> Contains R scripts for the simulations run in “Do Predictor Envelopes Really Reduce Dimension?”</p>
</list-item>
</list>
</p>
</caption>
</supplementary-material><history><date date-type="received"><day>9</day><month>3</month><year>2021</year></date><date date-type="accepted"><day>8</day><month>6</month><year>2021</year></date></history>
<permissions><copyright-statement>2021 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Predictor envelopes model the response variable by using a subspace of dimension <italic>d</italic> extracted from the full space of all <italic>p</italic> input variables. Predictor envelopes have a close connection to partial least squares and enjoy improved estimation efficiency in theory. As such, predictor envelopes have become increasingly popular in chemometrics. Often, <italic>d</italic> is much smaller than <italic>p</italic>, which seemingly enhances the interpretability of the envelope model. However, the process of estimating the envelope subspace adds complexity to the final fitted model. To better understand the complexity of predictor envelopes, we study their effective degrees of freedom (EDF) in a variety of settings. We find that in many cases a <italic>d</italic>-dimensional predictor envelope model can have far more than <inline-formula id="j_jds1017_ineq_001"><alternatives><mml:math>
<mml:mi mathvariant="italic">d</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$d+1$]]></tex-math></alternatives></inline-formula> EDF and often has close to <inline-formula id="j_jds1017_ineq_002"><alternatives><mml:math>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$p+1$]]></tex-math></alternatives></inline-formula>. Nevertheless, the EDF of a predictor envelope depend heavily on the structure of the underlying data-generating model, and there are settings in which predictor envelopes can have substantially reduced model complexity.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>dimension reduction</kwd>
<kwd>effective degrees of freedom</kwd>
<kwd>envelopes</kwd>
<kwd>Monte Carlo</kwd>
</kwd-group>
<funding-group><award-group><funding-source xlink:href="https://doi.org/10.13039/501100008982">NSF</funding-source><award-id>1915842</award-id><award-id>2015120</award-id></award-group><funding-statement>This work is supported in part by NSF grants 1915842 and 2015120.</funding-statement></funding-group>
</article-meta>
</front>
<body/>
<back>
</back>
</article>
