There is a great deal of prior knowledge about gene function and regulation, in the form of annotations or prior results, that could improve predictive performance if it were directly integrated into individual prognostic or diagnostic studies. For example, in a study developing a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes into pathways constitute such prior knowledge. However, this external information is typically used only after the analysis, to aid in interpreting the findings. We propose a new hierarchical two-level ridge regression model that integrates external information in the form of “meta features” to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation, the proposed method achieves better prediction performance than standard ridge regression and competing methods that integrate prior information when the meta features are informative about the mean of the feature effects, and it performs no worse when the meta features are uninformative. We demonstrate our approach with applications to predicting chronological age from methylation features and breast cancer mortality from gene expression features.
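A minimal sketch of the recasting step follows. It assumes a second-level model β = Zα + γ, where Z holds the meta features, with separate ridge penalties on γ and α. The abstract does not spell out the exact formulation, so treat this as an assumption. Substituting γ = β − Zα turns the two-level model into one ridge regression on the augmented design [X, XZ], which cyclic coordinate descent solves one coefficient at a time. The function name `hierarchical_ridge_cd` and the parameters `lam_gamma` and `lam_alpha` are hypothetical.

```python
import numpy as np


def hierarchical_ridge_cd(X, y, Z, lam_gamma, lam_alpha, n_iter=10000, tol=1e-12):
    """Cyclic coordinate descent for an ASSUMED two-level ridge model.

    Assumed model: y = X beta + eps, beta = Z alpha + gamma.
    With gamma = beta - Z alpha this becomes the single-level problem
        min (1/2n)||y - X gamma - (X Z) alpha||^2
            + (lam_gamma/2)||gamma||^2 + (lam_alpha/2)||alpha||^2.
    """
    n, p = X.shape
    q = Z.shape[1]
    W = np.hstack([X, X @ Z])  # augmented single-level design [X, XZ]
    lam = np.concatenate([np.full(p, lam_gamma), np.full(q, lam_alpha)])
    theta = np.zeros(p + q)  # stacked coefficients (gamma, alpha)
    col_sq = (W ** 2).sum(axis=0) / n  # w_j^T w_j / n for each column
    resid = np.asarray(y, dtype=float).copy()  # y - W theta, with theta = 0

    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p + q):
            old = theta[j]
            # w_j^T (partial residual) / n, where partial residual = resid + w_j * old
            rho = W[:, j] @ resid / n + col_sq[j] * old
            # Exact minimizer of the objective along coordinate j
            theta[j] = rho / (col_sq[j] + lam[j])
            delta = theta[j] - old
            if delta != 0.0:
                resid -= W[:, j] * delta  # keep the running residual current
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break

    gamma, alpha = theta[:p], theta[p:]
    beta = gamma + Z @ alpha  # recover first-level feature effects
    return beta, alpha, gamma


# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Z = rng.standard_normal((8, 2))
y = rng.standard_normal(50)
beta, alpha, gamma = hierarchical_ridge_cd(X, y, Z, lam_gamma=1.0, lam_alpha=0.1)
```

The columns of XZ are linear combinations of the columns of X, so the augmented design is collinear. With both penalties positive, the quadratic objective is still strictly convex, and coordinate descent converges to its unique minimizer.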
Early in the course of the pandemic in Colorado, researchers wished to fit a sparse model predicting intubation status for newly admitted patients. Unfortunately, the training data had considerable missingness, which complicated the modeling process. I developed a quick solution to this problem: Median Aggregation of penaLized Coefficients after Multiple imputation (MALCoM). This fast, simple solution proved successful on a prospective validation set. In this manuscript, I show that MALCoM performs comparably to a popular alternative (MI-lasso) and that it can be implemented in more general penalized regression settings. A simulation study and an application to local COVID-19 data are included.
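The MALCoM recipe, as its name describes it, can be sketched as follows: impute the data several times, fit a penalized model to each completed dataset, and take the feature-wise median of the coefficients. This is an illustration under stated assumptions, not the author's implementation. The imputation step `naive_impute` is a hypothetical stand-in (a normal draw per column) for a proper multiple-imputation procedure such as MICE. The penalized fit is a squared-error lasso solved by coordinate descent, whereas a binary outcome such as intubation status would call for a penalized logistic model. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso shrinkage operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Squared-error lasso by cyclic coordinate descent (centered data, no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def naive_impute(X, rng):
    """HYPOTHETICAL stand-in for a real MI step: fill each missing entry with a
    draw from a normal fitted to that column's observed values."""
    Xi = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        obs = X[~miss, j]
        Xi[miss, j] = rng.normal(obs.mean(), obs.std(), miss.sum())
    return Xi


def malcom(X, y, lam, n_imputations=11, seed=0):
    """Median aggregation of lasso coefficients across imputed datasets."""
    rng = np.random.default_rng(seed)
    yc = y - y.mean()
    coefs = []
    for _ in range(n_imputations):
        Xi = naive_impute(X, rng)
        Xc = Xi - Xi.mean(axis=0)  # center so no intercept is needed
        coefs.append(lasso_cd(Xc, yc, lam))
    return np.median(np.vstack(coefs), axis=0)  # feature-wise median


# Usage on synthetic data with about 10% of X entries missing (illustrative only)
rng = np.random.default_rng(1)
X_full = rng.standard_normal((200, 6))
y = 3 * X_full[:, 0] - 2 * X_full[:, 1] + 0.5 * rng.standard_normal(200)
X = X_full.copy()
X[rng.random(X.shape) < 0.1] = np.nan
b = malcom(X, y, lam=0.5)
```

With an odd number of imputations, the median is nonzero only when a strict majority of the fits give that coefficient the same nonzero sign. The aggregated model therefore stays sparse even though each imputed fit selects its own set of features.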