Journal of Data Science

A Simple Aggregation Rule for Penalized Regression Coefficients after Multiple Imputation
Volume 19, Issue 1 (2021), pp. 1–14
Ryan A. Peterson  

https://doi.org/10.6339/21-JDS995
Pub. online: 28 January 2021
Type: Computing In Data Science

Received: 1 July 2020
Accepted: 1 October 2020
Published: 28 January 2021

Abstract

Early in the course of the COVID-19 pandemic in Colorado, researchers wished to fit a sparse predictive model to intubation status for newly admitted patients. Unfortunately, the training data had considerable missingness, which complicated the modeling process. I developed a quick solution to this problem: Median Aggregation of penaLized Coefficients after Multiple imputation (MALCoM). This fast, simple solution proved successful on a prospective validation set. In this manuscript, I show that MALCoM performs comparably to a popular alternative (MI-lasso) and can be implemented in more general penalized regression settings. A simulation study and an application to local COVID-19 data are included.
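
The abstract names the procedure but does not spell out its steps, so the following is only a minimal sketch of the idea the name suggests, not the author's implementation: fit a penalized regression on each imputed dataset, then take the coefficient-wise median across imputations. Everything specific here is an assumption made for illustration: the random-draw imputation (a crude stand-in for a proper multiple-imputation routine), the linear rather than logistic lasso, the fixed penalty `lam`, and the function names `impute_random_draw`, `lasso_cd`, `median_aggregate`, and `malcom_sketch`.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (assumes centered X and y).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            resid += X[:, j] * (beta[j] - new)  # keep residual in sync
            beta[j] = new
    return beta

def impute_random_draw(X, rng):
    """Hypothetical stand-in for multiple imputation: fill each missing
    entry with a random draw from the observed values in its column."""
    X_imp = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X_imp[:, j])
        observed = X_imp[~miss, j]
        X_imp[miss, j] = rng.choice(observed, size=int(miss.sum()))
    return X_imp

def median_aggregate(coefs):
    """Coefficient-wise median over imputations (rows of `coefs`)."""
    return np.median(coefs, axis=0)

def malcom_sketch(X, y, lam, n_imputations=5, seed=0):
    """Impute, fit a lasso per imputed dataset, aggregate by the median."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(n_imputations):
        X_imp = impute_random_draw(X, rng)
        X_c = X_imp - X_imp.mean(axis=0)
        fits.append(lasso_cd(X_c, y - y.mean(), lam))
    coefs = np.vstack(fits)
    return median_aggregate(coefs), coefs

# Toy usage on simulated data with about 10% of entries set to missing.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
X[rng.random(X.shape) < 0.1] = np.nan
beta_hat, per_imputation = malcom_sketch(X, y, lam=0.1)
```

One property of the median worth noting in this sketch: a coefficient that is exactly zero in more than half of the imputed fits has a median of exactly zero, so the aggregated model can remain sparse without a separate selection step.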

Supplementary material

A script to reproduce simulations under varied parameters has been provided as supplemental material online, along with an appendix containing additional tables and figures pertaining to the simulations described herein.

Copyright
© 2021 The Author(s).
This is a free to read article.

Keywords
elastic net; LASSO; minimax concave penalty; missing data; regularization

Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
