Journal of Data Science

Sign-based Shrinkage Based on an Asymmetric LASSO Penalty
Volume 19, Issue 3 (2021), pp. 429–449
Eric S. Kawaguchi, Burcu F. Darst, Kan Wang, and David V. Conti

https://doi.org/10.6339/21-JDS1015
Pub. online: 2 June 2021      Type: Statistical Data Science     

Received: 16 March 2021
Accepted: 16 May 2021
Published: 2 June 2021

Abstract

Penalized regression provides an automated approach to perform simultaneous variable selection and parameter estimation and is a popular method for analyzing high-dimensional data. Since the conception of the LASSO in the mid-to-late 1990s, extensive research has been done to improve penalized regression. The LASSO and several of its variations perform penalization symmetrically around zero. Thus, variables with the same magnitude are shrunk by the same amount regardless of the direction of effect. To the best of our knowledge, sign-based shrinkage, preferential shrinkage based on the sign of the coefficients, has yet to be explored under the LASSO framework. We propose a generalization of the LASSO, the asymmetric LASSO, that performs sign-based shrinkage. Our method is motivated by placing an asymmetric Laplace prior, rather than a symmetric Laplace prior, on the regression coefficients. This corresponds to an asymmetric ${\ell _{1}}$ penalty under the penalized regression framework. In doing so, preferential shrinkage can be performed through an auxiliary tuning parameter that controls the degree of asymmetry. Our numerical studies indicate that the asymmetric LASSO performs better than the LASSO when effect sizes are sign skewed. Furthermore, in the presence of positively skewed effects, the asymmetric LASSO is comparable to the non-negative LASSO without the need to place an a priori constraint on the effect estimates, and it outperforms the non-negative LASSO when negative effects are also present in the model. A real data example using the breast cancer gene expression data from The Cancer Genome Atlas is also provided, where the asymmetric LASSO identifies two potentially novel gene expressions that are associated with BRCA1, with a minor improvement in prediction performance over the LASSO and non-negative LASSO.
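To make the abstract's description concrete, the penalty can be written out explicitly. The parameterization below is an illustrative sketch rather than the paper's exact formulation: an asymmetric ${\ell _{1}}$ penalty motivated by the asymmetric Laplace prior is commonly expressed as
$$\hat{\beta}=\arg\min_{\beta}\Big\{-\ell(\beta)+\lambda\sum_{j=1}^{p}\big[\tau\,\beta_{j}^{+}+(1-\tau)\,\beta_{j}^{-}\big]\Big\},\qquad \beta_{j}^{+}=\max(\beta_{j},0),\quad \beta_{j}^{-}=\max(-\beta_{j},0),$$
where $\ell(\beta)$ is the log-likelihood, $\lambda>0$ is the usual shrinkage parameter, and $\tau\in(0,1)$ is the auxiliary tuning parameter that controls the degree of asymmetry: $\tau=1/2$ recovers the symmetric ${\ell _{1}}$ penalty up to a rescaling of $\lambda$, while moving $\tau$ toward 0 or 1 shrinks coefficients of one sign more aggressively than coefficients of the other.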

Supplementary material

The following supplementary materials are provided: the R files necessary to reproduce the simulation results reported in this manuscript, and a PDF containing supplemental tables and figures and the proof of Lemma 2.1.
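
The supplementary R files themselves are not reproduced on this page. Purely as an illustrative sketch of the sign-based shrinkage idea described in the abstract, the toy R code below implements cyclic coordinate descent with asymmetric soft-thresholding for a Gaussian linear model; the function names (asym_soft, asym_lasso) and the 2τ / 2(1 − τ) weighting of the positive and negative parts of each coefficient are assumptions made for this example, not the authors' implementation or the supplementary code.

asym_soft <- function(z, d, la, lb) {
  ## Minimizer of 0.5 * d * t^2 - z * t + la * max(t, 0) + lb * max(-t, 0)
  if (z >  la) return((z - la) / d)
  if (z < -lb) return((z + lb) / d)
  0
}

asym_lasso <- function(X, y, lambda, tau = 0.5, n_iter = 200) {
  ## Assumed objective (Gaussian case):
  ##   (1 / (2n)) * ||y - X beta||^2
  ##     + sum_j [ 2*tau*lambda * max(beta_j, 0) + 2*(1 - tau)*lambda * max(-beta_j, 0) ]
  ## tau = 0.5 reduces to the ordinary LASSO with penalty lambda * ||beta||_1.
  n <- nrow(X); p <- ncol(X)
  la <- 2 * tau * lambda        # penalty weight on positive coefficients
  lb <- 2 * (1 - tau) * lambda  # penalty weight on negative coefficients
  beta <- rep(0, p)
  r <- as.vector(y)             # residual y - X beta, starting from beta = 0
  d <- colSums(X^2) / n
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      zj <- sum(X[, j] * r) / n + d[j] * beta[j]  # univariate least-squares target for beta_j
      bj <- asym_soft(zj, d[j], la, lb)
      r  <- r + X[, j] * (beta[j] - bj)           # keep the residual in sync
      beta[j] <- bj
    }
  }
  beta
}

## Toy example: three positive effects, seven nulls.
set.seed(1)
X <- scale(matrix(rnorm(200 * 10), 200, 10))
y <- drop(X %*% c(rep(1, 3), rep(0, 7)) + rnorm(200))
round(asym_lasso(X, y, lambda = 0.2, tau = 0.3), 2)  # tau < 0.5 shrinks positive effects less here

With sign-skewed effects such as these, tuning tau alongside lambda (for example, by cross-validation) lets the penalty shrink the dominant sign less, which is the behavior the abstract contrasts with the symmetric LASSO and the non-negative LASSO.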



Copyright
© 2021 The Author(s)
This is a free-to-read article.

Keywords
asymmetric Laplace distribution, high-dimensional statistics, penalized regression, quantile, regularization, variable selection

Funding
Eric S. Kawaguchi's work is partially supported by the National Institutes of Health (NIH) grant T32ES013678. Burcu F. Darst's work is partially supported by the National Cancer Institute (NCI) grant K99CA246063 and the Achievement Rewards for College Scientists Foundation Los Angeles Founder Chapter. The research of David V. Conti is partially supported by NIH grants P01CA196569, R01HG010297, R01CA241410, R01CA257328, and P30CA014089.
