Journal of Data Science

Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies
Volume 20, Issue 1 (2022), pp. 34–50
Eric S. Kawaguchi †, Sisi Li †, Garrett M. Weaver, et al. (4 authors)

https://doi.org/10.6339/21-JDS1030
Pub. online: 13 December 2021; Type: Statistical Data Science; Open Access

† Joint First Author.

Received: 20 July 2021
Accepted: 5 November 2021
Published: 13 December 2021

Abstract

There is a great deal of prior knowledge about gene function and regulation, in the form of annotations or prior results, that could improve predictive performance if directly integrated into individual prognostic or diagnostic studies. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically used only post-analysis to aid in the interpretation of findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of “meta features” to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation, we show that the proposed method outperforms both standard ridge regression and competing methods that integrate prior information in terms of prediction performance when the meta features are informative on the mean of the features, and that it incurs no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and of breast cancer mortality based on gene expression features.
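The recasting step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the objective, so the form used here is an assumption: a squared-error loss with a ridge penalty that shrinks the feature effects b toward Z a (Z being the meta-feature matrix) and a second ridge penalty on a. Under that assumption, substituting g = b - Z a gives an ordinary ridge problem with design [X, XZ] and two penalty levels, which cyclic coordinate descent can solve. The function name `fit_hierarchical_ridge`, its parameters, and the omission of an intercept are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def fit_hierarchical_ridge(X, y, Z, lam1, lam2, n_sweeps=5000, tol=1e-10):
    """Cyclic coordinate descent on the recast single-level problem.

    Assumed objective (not given in the abstract):
        0.5*||y - X b||^2 + 0.5*lam1*||b - Z a||^2 + 0.5*lam2*||a||^2
    With g = b - Z a this is a ridge problem in (g, a) with design
    [X, XZ] and per-coefficient penalties (lam1 for g, lam2 for a).
    Assumes lam1 > 0, lam2 > 0, and centered data (no intercept).
    """
    n, p = X.shape
    q = Z.shape[1]
    W = np.hstack([X, X @ Z])                     # augmented design, n x (p + q)
    pen = np.concatenate([np.full(p, lam1), np.full(q, lam2)])
    col_sq = (W ** 2).sum(axis=0)                 # squared column norms
    theta = np.zeros(p + q)                       # stacked (g, a)
    resid = np.asarray(y, dtype=float).copy()     # residual y - W @ theta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p + q):
            old = theta[j]
            # Inner product of column j with the partial residual
            rho = W[:, j] @ resid + col_sq[j] * old
            new = rho / (col_sq[j] + pen[j])      # exact one-dimensional minimizer
            if new != old:
                resid -= W[:, j] * (new - old)    # keep the residual in sync
                theta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:                      # stop when a full sweep is stable
            break
    gamma, alpha = theta[:p], theta[p:]
    beta = gamma + Z @ alpha                      # map back to feature effects
    return beta, alpha
```

Calling `fit_hierarchical_ridge(X, y, Z, lam1, lam2)` returns the feature-level effects and the meta-feature effects. Because the recast problem is a quadratic, the same solution could also be obtained in closed form by solving the corresponding linear system; coordinate descent is used here only to mirror the fitting strategy the abstract describes.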

Supplementary material

Supplementary Materials (.zip), containing the following files and directories:
  • simulations/: Directory with the code and files necessary to reproduce the numerical results presented in this paper.
  • supplementary.pdf: Online supplementary material.



Copyright
2022 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
high-dimensional regression; meta-features; penalization; prediction; regularization

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Institutes of Health awards 1P01CA196569 and T32ES013678. These awards had no influence over the experimental design, data analysis or interpretation, or writing the manuscript.

Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
