Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies
Volume 20, Issue 1 (2022), pp. 34–50
Pub. online: 13 December 2021
Type: Statistical Data Science
Open Access
† Joint First Author.
Received: 20 July 2021
Accepted: 5 November 2021
Published: 13 December 2021
Abstract
There is a great deal of prior knowledge about gene function and regulation in the form of annotations or prior results that, if directly integrated into individual prognostic or diagnostic studies, could improve predictive performance. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically used only post-analysis to aid in the interpretation of findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of “meta features” to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation, we show that the proposed method outperforms standard ridge regression and competing methods that integrate prior information in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and breast cancer mortality based on gene expression features.
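The recasting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the objective, so the form used here is an assumption: minimize (1/2)||y - Xβ||² + (λ1/2)||β - Zα||² + (λ2/2)||α||², where X is the n x p feature matrix, Z is the p x q meta-feature matrix, and α holds the meta-feature effects. Substituting γ = β - Zα gives an ordinary ridge problem on the augmented design [X | XZ] with penalty λ1 on γ and λ2 on α, which cyclic coordinate descent can solve one coefficient at a time. The function names, the absence of an intercept and of standardization, and the stopping rule are all hypothetical choices for illustration, not the authors' implementation.

```python
def augmented_design(X, Z):
    """Build W = [X | XZ] from X (n x p) and Z (p x q), given as lists of rows."""
    p, q = len(Z), len(Z[0])
    return [
        list(row) + [sum(row[j] * Z[j][k] for j in range(p)) for k in range(q)]
        for row in X
    ]


def fit_hierarchical_ridge(X, Z, y, lam1, lam2, n_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent on the single-level form (assumed objective).

    Minimizes 0.5*||y - W theta||^2 + 0.5*sum_j pen_j * theta_j^2 with
    theta = (gamma, alpha), pen = lam1 for gamma and lam2 for alpha.
    Assumes lam1 > 0 and lam2 > 0, no intercept, no standardization.
    Returns (beta, alpha) with beta = gamma + Z alpha.
    """
    p, q = len(Z), len(Z[0])
    W = augmented_design(X, Z)
    n = len(W)
    penalties = [lam1] * p + [lam2] * q
    theta = [0.0] * (p + q)
    resid = list(y)  # residual y - W theta; equals y at theta = 0
    col_sq = [sum(W[i][j] ** 2 for i in range(n)) for j in range(p + q)]

    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p + q):
            # Inner product of column j with the partial residual
            # (the residual with coefficient j's contribution added back).
            rho = sum(W[i][j] * resid[i] for i in range(n)) + col_sq[j] * theta[j]
            new_value = rho / (col_sq[j] + penalties[j])  # exact 1-D ridge update
            delta = new_value - theta[j]
            for i in range(n):
                resid[i] -= W[i][j] * delta
            theta[j] = new_value
            max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep barely moves any coefficient
            break

    gamma, alpha = theta[:p], theta[p:]
    beta = [gamma[j] + sum(Z[j][k] * alpha[k] for k in range(q)) for j in range(p)]
    return beta, alpha


# Tiny example: one feature, one meta feature, two observations.
beta, alpha = fit_hierarchical_ridge(
    [[1.0], [1.0]], [[1.0]], [1.0, 3.0], lam1=1.0, lam2=1.0
)
print(beta, alpha)  # approximately [1.6] and [0.8]
```

In this sketch, a very large λ2 shrinks α toward zero, so the fit approaches standard ridge regression on X alone, which is one way to see why uninformative meta features need not hurt performance.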
Supplementary material
Supplementary Materials.zip contains the following files and/or directories:
• simulations/ : Directory that includes code and files necessary to reproduce the numerical results presented in this paper.
• supplementary.pdf : Online supplementary material.