Journal of Data Science (JDS), ISSN 1683-8602, 1680-743X
Publisher: School of Statistics, Renmin University of China
Article ID: JDS1017. DOI: 10.6339/21-JDS1017
Section: Statistical Data Science

Do Predictor Envelopes Really Reduce Dimension?

Tate Jacobson, Hui Zou (corresponding author; email: zouxx019@umn.edu)
School of Statistics, University of Minnesota

Published 11 November 2021. Volume 19, issue 4, pages 528-541.

Supplementary Material

Code and data for reproducing our results can be found at https://github.com/TateJacobson/Envelope-EDF. This repository contains the following folders:

Cleaning Output: Contains an R script for cleaning saved simulation output and generating plots from it.

edf: Contains an R package for computing the effective degrees of freedom.

Simulations: Contains R scripts for the simulations run in “Do Predictor Envelopes Really Reduce Dimension?”

Article history: 9 March 2021; 8 June 2021. © 2021 The Author(s). This is a free to read article.

Predictor envelopes model the response variable by using a subspace of dimension d extracted from the full space of all p input variables. Predictor envelopes have a close connection to partial least squares and enjoy improved estimation efficiency in theory. As such, predictor envelopes have become increasingly popular in chemometrics. Often, d is much smaller than p, which seemingly enhances the interpretability of the envelope model. However, the process of estimating the envelope subspace adds complexity to the final fitted model. To better understand the complexity of predictor envelopes, we study their effective degrees of freedom (EDF) in a variety of settings. We find that in many cases a d-dimensional predictor envelope model can have far more than d + 1 EDF and often has close to p + 1. That said, the EDF of a predictor envelope depend heavily on the structure of the underlying data-generating model, and there are settings under which predictor envelopes can have substantially reduced model complexity.
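To make the notion of EDF concrete, the sketch below uses the standard covariance definition of degrees of freedom, EDF = (1/sigma^2) * sum_i Cov(yhat_i, y_i), and approximates it by Monte Carlo simulation. It is an illustration only and not the authors' procedure: the function names (fit_simple_linear, monte_carlo_edf), the toy design, and the choice of a simple linear regression in place of an envelope estimator are all assumptions made for this example.

```python
import random


def fit_simple_linear(x, y):
    """Least-squares fit of y = a + b*x; returns the fitted values.

    Stand-in estimator for illustration (hypothetical, not an envelope fit).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]


def monte_carlo_edf(fit, x, mean, sigma, n_reps=2000, seed=0):
    """Approximate EDF = (1/sigma^2) * sum_i Cov(yhat_i, y_i) by simulation.

    `fit(x, y)` must return one fitted value per observation.
    `mean` is the assumed true mean of y at each x (known in a simulation).
    """
    rng = random.Random(seed)
    n = len(x)
    sum_y = [0.0] * n
    sum_fit = [0.0] * n
    sum_prod = [0.0] * n
    for _ in range(n_reps):
        # Draw a fresh response vector with Gaussian noise around the mean.
        y = [m + rng.gauss(0.0, sigma) for m in mean]
        y_hat = fit(x, y)
        for i in range(n):
            sum_y[i] += y[i]
            sum_fit[i] += y_hat[i]
            sum_prod[i] += y[i] * y_hat[i]
    # Sum the per-observation sample covariances between y_i and yhat_i.
    cov_total = 0.0
    for i in range(n):
        cov_total += (sum_prod[i] - sum_y[i] * sum_fit[i] / n_reps) / (n_reps - 1)
    return cov_total / sigma ** 2


# Toy design (assumed): 20 equally spaced points, linear mean, unit noise.
x = [i / 10 for i in range(20)]
mean = [1.0 + 2.0 * xi for xi in x]
edf_hat = monte_carlo_edf(fit_simple_linear, x, mean, sigma=1.0)
print(round(edf_hat, 2))
```

For a linear fit such as this one, the covariance definition reduces to the trace of the hat matrix, so the estimate should land near 2 (intercept plus slope). An estimator whose subspace is itself chosen from the data is not linear in y, which is why its EDF need not match its nominal parameter count.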

Keywords: dimension reduction; effective degrees of freedom; envelopes; Monte Carlo

Funding: This work is supported in part by NSF 1915842 and 2015120.