
Journal of Data Science (JDS)

ISSN: 1683-8602, 1680-743X

Publisher: School of Statistics, Renmin University of China

Article ID: JDS1017

DOI: 10.6339/21-JDS1017

Statistical Data Science

Do Predictor Envelopes Really Reduce Dimension?

Tate Jacobson¹ and Hui Zou¹,∗

¹School of Statistics, University of Minnesota

∗Corresponding author. Email: zouxx019@umn.edu.

2021, Vol. 19, No. 4, 528–541. Published 11 November 2021.

Supplementary Material

Code and data for reproducing our results can be found at https://github.com/TateJacobson/Envelope-EDF. This repository contains the following folders:

• Cleaning Output: Contains an R script for cleaning saved simulation output and generating plots from it.

• edf: An R package for computing the effective degrees of freedom.

• Simulations: Contains R scripts for the simulations run in “Do Predictor Envelopes Really Reduce Dimension?”

Article history: 9 March 2021; 8 June 2021.

This is a free to read article.

Predictor envelopes model the response variable using a subspace of dimension d extracted from the full space of all p input variables. Predictor envelopes have a close connection to partial least squares and enjoy improved estimation efficiency in theory. As such, predictor envelopes have become increasingly popular in chemometrics. Often, d is much smaller than p, which seemingly enhances the interpretability of the envelope model. However, the process of estimating the envelope subspace adds complexity to the final fitted model. To better understand the complexity of predictor envelopes, we study their effective degrees of freedom (EDF) in a variety of settings. We find that in many cases a d-dimensional predictor envelope model can have far more than d + 1 EDF and often has close to p + 1 EDF. However, the EDF of a predictor envelope depend heavily on the structure of the underlying data-generating model, and there are settings under which predictor envelopes can have substantially reduced model complexity.
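To make the notion of EDF concrete, the following is a minimal Monte Carlo sketch in Python. It assumes the standard covariance definition of degrees of freedom, EDF = (1/σ²) Σᵢ Cov(ŷᵢ, yᵢ), with the design matrix held fixed and the response resimulated. The function names (`monte_carlo_edf`, `ols_fit_predict`, `pls1_fit_predict`) and the simulation settings are hypothetical and are not taken from the authors' edf package. One-component partial least squares is used here only as a simple stand-in for a one-dimensional, data-dependent reduction; it is not the envelope estimator studied in the article.

```python
import numpy as np

def monte_carlo_edf(fit_predict, X, mu, sigma, n_reps=2000, seed=0):
    """Estimate EDF = (1/sigma^2) * sum_i Cov(yhat_i, y_i) by simulation.

    X is held fixed; responses are redrawn as y = mu + sigma * noise.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row of Y is one simulated response vector.
    Y = mu + sigma * rng.standard_normal((n_reps, n))
    Yhat = np.vstack([fit_predict(X, y) for y in Y])
    # Sample covariance of (y_i, yhat_i) across replications, summed over i.
    cov_sum = np.sum((Y - Y.mean(axis=0)) * (Yhat - Yhat.mean(axis=0))) / (n_reps - 1)
    return cov_sum / sigma**2

def ols_fit_predict(X, y):
    """Ordinary least squares with an intercept (exact EDF is p + 1)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return Xd @ beta

def pls1_fit_predict(X, y):
    """One-component PLS: a single direction that is itself estimated from y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                 # data-dependent direction
    t = Xc @ w                    # one-dimensional score
    return y.mean() + t * (t @ yc) / (t @ t)

# Illustrative settings (assumptions, not the article's simulation design).
rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.standard_normal((n, p))
mu = X @ np.ones(p)

edf_ols = monte_carlo_edf(ols_fit_predict, X, mu, sigma=1.0)
edf_pls1 = monte_carlo_edf(pls1_fit_predict, X, mu, sigma=1.0)
print(f"OLS EDF estimate:  {edf_ols:.2f} (exact value is p + 1 = {p + 1})")
print(f"PLS1 EDF estimate: {edf_pls1:.2f} (naive count would be 2)")
```

The OLS case serves as a sanity check because its EDF is known exactly. For the one-component fit, the estimate need not equal the naive count of 2, since the projection direction is recomputed from each simulated response; how far it departs from 2 depends on the data-generating model.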

Keywords: dimension reduction; effective degrees of freedom; envelopes; Monte Carlo


This work is supported in part by NSF 1915842 and 2015120.
