Efficient Bayesian High-Dimensional Classification via Random Projection with Application to Gene Expression Data
Volume 22, Issue 1 (2024), pp. 152–172
Pub. online: 12 June 2023
Type: Computing In Data Science
Open Access
Received: 10 January 2023
Accepted: 26 April 2023
Published: 12 June 2023
Abstract
Inspired by the impressive successes of compressed sensing-based machine learning algorithms, data augmentation-based efficient Gibbs samplers for Bayesian high-dimensional classification models are developed by compressing the design matrix to a much lower dimension. Particular care is exercised in the choice of the projection mechanism, and an adaptive voting rule is employed to reduce sensitivity to the random projection matrix. Focusing on the high-dimensional Probit regression model, we note that the naive implementation of the data augmentation-based Gibbs sampler is not robust to the presence of collinearity in the design matrix – a setup ubiquitous in $n\lt p$ problems. We demonstrate that a simple fix based on joint updates of parameters in the latent space circumvents this issue. With a computationally efficient MCMC scheme in place, we introduce an ensemble classifier by creating R ($\sim 25$–50) projected copies of the design matrix, and subsequently running R classification models on the R projected design matrices in parallel. We combine the output from the R replications via an adaptive voting scheme. Our scheme is inherently parallelizable and capable of taking advantage of modern computing environments often equipped with multiple cores. The empirical success of our methodology is illustrated in elaborate simulations and gene expression data applications. We also extend our methodology to a high-dimensional logistic regression model and carry out numerical studies to showcase its efficacy.
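To make the workflow concrete, the following is a minimal illustrative R sketch of the two core ingredients: an Achlioptas-style sparse random projection that compresses the p-dimensional design matrix to m ≪ p columns, and an Albert & Chib (1993) data-augmentation Gibbs sampler for Probit regression fit on the compressed design. This is not the authors' implementation; the function names rp_matrix() and probit_ac_gibbs(), the prior variance tau2, and the toy data are assumptions made for illustration only.

```r
## Minimal sketch (assumed names rp_matrix(), probit_ac_gibbs(); not the authors' code):
## compress a wide design matrix with a sparse random projection, then fit Bayesian
## Probit regression on the compressed features via Albert & Chib data augmentation
## with a N(0, tau2 * I) prior on beta.
set.seed(1)

# p x m sparse projection with entries +/- sqrt(3/m) (prob. 1/6 each) and 0 (prob. 2/3),
# so that E[Phi %*% t(Phi)] = I_p (Achlioptas, 2003).
rp_matrix <- function(p, m) {
  vals <- sample(c(-1, 0, 1), p * m, replace = TRUE, prob = c(1/6, 2/3, 1/6))
  matrix(sqrt(3 / m) * vals, nrow = p, ncol = m)
}

probit_ac_gibbs <- function(X, y, n_iter = 2000, tau2 = 10) {
  n <- nrow(X); m <- ncol(X)
  V <- solve(crossprod(X) + diag(1 / tau2, m))   # posterior covariance of beta given z
  L <- t(chol(V))                                # for drawing from N(mean, V)
  beta  <- rep(0, m)
  draws <- matrix(NA_real_, n_iter, m)
  for (t in seq_len(n_iter)) {
    # 1. Latent z_i | beta, y_i: normal truncated to (0, Inf) if y_i = 1, to (-Inf, 0] otherwise.
    mu <- drop(X %*% beta)
    p0 <- pnorm(0, mean = mu)
    u  <- ifelse(y == 1, runif(n, p0, 1), runif(n, 0, p0))
    z  <- qnorm(pmin(pmax(u, 1e-12), 1 - 1e-12), mean = mu)
    # 2. beta | z ~ N(V X'z, V).
    beta <- drop(V %*% crossprod(X, z)) + drop(L %*% rnorm(m))
    draws[t, ] <- beta
  }
  draws
}

# Toy n << p example: project p = 2000 features down to m = 20 and fit.
n <- 100; p <- 2000; m_proj <- 20
X <- matrix(rnorm(n * p), n, p)
y <- as.integer(drop(X %*% c(rep(2, 10), rep(0, p - 10))) + rnorm(n) > 0)
Phi   <- rp_matrix(p, m_proj)
Xc    <- X %*% Phi                          # compressed n x m design
draws <- probit_ac_gibbs(Xc, y)
phat  <- pnorm(drop(Xc %*% colMeans(draws[-(1:500), ])))
mean((phat > 0.5) == y)                     # training accuracy of one weak learner
```

Sampling beta jointly given the latent vector z (rather than coordinate-wise) is what keeps the chain well behaved when the compressed design still carries correlated columns; the sketch above draws the full vector in one Gaussian update per iteration.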
Supplementary material
Software implementing the methodologies developed in the article is available at zovialpapai/Bayesian-classification-with-random-projection. A short description of the directories in the repository follows; a sketch of how the pieces fit together is given after this list.
(1) functions: This directory contains utility functions, in two R scripts, that are used in the repeated simulations and real data analyses conducted in the paper. (a) BCC_Functions.R contains functions for compression-matrix generation; Probit regression via the Albert & Chib and Holmes & Held data augmentation schemes; Logit regression via the Polya-Gamma data augmentation scheme; hyper-parameter tuning; and associated helper functions. (b) Probit_HH_cpp.R contains Probit regression via the Holmes & Held data augmentation scheme, written in Rcpp.
(2) repeated simulations: This directory contains three R scripts, named BCC_sims.R, Weakleaners.R, and time_comparison.R. (a) BCC_sims.R can be used to carry out the simulations presented in Section 3 on high-dimensional Probit regression and in Section 5 on high-dimensional Logit regression, along with hyper-parameter tuning. (b) Weakleaners.R can be used to study the effect of the number of replications of the compression matrix (i.e., the number of weak classifiers) on the accuracy of the classifiers AC, AC+, HH, and HH+. The results are presented in Section 3. (c) time_comparison.R can be used to study the comparative computational time of our classifiers. The results are presented in Section 3.
(3) data: The micro-array gene expression cancer data sets used in the article are freely available at data.mendeley.com. Copies of the data sets are provided in the data directory of our repository.
(4) real data analysis: This directory contains an R script named BCC_data.R that can be used to carry out the analysis of the micro-array gene expression cancer data sets (leukemia, lung cancer, prostate cancer) presented in Section 4 of the paper.
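As a complement to the repository description, here is a hedged sketch of the ensemble step: fit R weak Probit classifiers on independently projected copies of the design matrix in parallel, then combine their predictions by a weighted vote. The function name ensemble_probit_rp() and the in-sample accuracy weights are illustrative assumptions standing in for the paper's adaptive voting rule, not the repository's API; the sketch reuses rp_matrix() and probit_ac_gibbs() from the example following the abstract.

```r
## Illustrative ensemble sketch (assumed name ensemble_probit_rp(); accuracy-weighted
## voting below is a stand-in for the adaptive voting rule described in the paper).
## Requires rp_matrix() and probit_ac_gibbs() from the earlier sketch.
library(parallel)

ensemble_probit_rp <- function(X, y, R = 25, m_proj = 20, n_iter = 2000,
                               burn = 500, cores = 1) {   # cores > 1 needs forking (not Windows)
  fits <- mclapply(seq_len(R), function(r) {
    Phi   <- rp_matrix(ncol(X), m_proj)                   # fresh projection per weak learner
    Xc    <- X %*% Phi
    draws <- probit_ac_gibbs(Xc, y, n_iter = n_iter)
    beta  <- colMeans(draws[-(1:burn), , drop = FALSE])
    acc   <- mean((pnorm(drop(Xc %*% beta)) > 0.5) == y)  # in-sample accuracy as the weight
    list(Phi = Phi, beta = beta, weight = acc)
  }, mc.cores = cores)

  # Weighted majority vote across the R weak classifiers.
  predict_ensemble <- function(Xnew) {
    votes <- sapply(fits, function(f)
      as.integer(pnorm(drop(Xnew %*% f$Phi %*% f$beta)) > 0.5))
    w <- vapply(fits, `[[`, numeric(1), "weight")
    as.integer(drop(votes %*% w) / sum(w) > 0.5)
  }
  list(fits = fits, predict = predict_ensemble)
}

# Usage with the toy X, y from the earlier sketch:
# ens  <- ensemble_probit_rp(X, y, R = 25, cores = 2)
# yhat <- ens$predict(X)
```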
References
Achlioptas D (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4): 671–687. Special Issue on PODS 2001. https://doi.org/10.1016/S0022-0000(03)00025-4
Albert JH, Chib S (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422): 669–679. https://doi.org/10.1080/01621459.1993.10476321
Bhadra A, Datta J, Polson NG, Willard B (2017). The horseshoe+ estimator of ultra-sparse signals. Bayesian Analysis, 12(4): 1105–1131. https://doi.org/10.1214/16-BA1028
Bhattacharya A, Chakraborty A, Mallick BK (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, 103(4): 985–991. https://doi.org/10.1093/biomet/asw042
Bhattacharya A, Pati D, Pillai NS, Dunson DB (2015). Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110(512): 1479–1490. PMID: 27019543. https://doi.org/10.1080/01621459.2014.960967
Brown PJ, Griffin JE (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1): 171–188. https://doi.org/10.1214/10-BA507
Candes EJ, Romberg JK, Tao T (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8): 1207–1223. https://doi.org/10.1002/cpa.20124
Cannings TI, Samworth RJ (2017). Random-projection ensemble classification. Journal of the Royal Statistical Society Series B, 79(4): 959–1035. https://doi.org/10.1111/rssb.12228
Cao J, Durante D, Genton MG (2022). Scalable computation of predictive probabilities in probit models with Gaussian process priors. Journal of Computational and Graphical Statistics, 31(3): 709–720. https://doi.org/10.1080/10618600.2022.2036614
Carvalho CM, Polson NG, Scott JG (2009). Handling sparsity via the horseshoe. In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (D van Dyk, M Welling, eds.), volume 5 of Proceedings of Machine Learning Research, 73–80. PMLR, Hilton Clearwater Beach Resort, Clearwater Beach, Florida, USA.
Carvalho CM, Polson NG, Scott JG (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2): 465–480. https://doi.org/10.1093/biomet/asq017
Chipman HA, George EI, McCulloch RE (1998). Bayesian CART model search. Journal of the American Statistical Association, 93(443): 935–948. https://doi.org/10.1080/01621459.1998.10473750
Dasgupta S (2013). Experiments with random projection. arXiv preprint: https://arxiv.org/abs/1301.3849.
Donoho D (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4): 1289–1306. https://doi.org/10.1109/TIT.2006.871582
Faes C, Ormerod JT, Wand MP (2011). Variational Bayesian inference for parametric and nonparametric regression with missing data. Journal of the American Statistical Association, 106(495): 959–971. https://doi.org/10.1198/jasa.2011.tm10301
George EI, McCulloch RE (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423): 881–889. https://doi.org/10.1080/01621459.1993.10476353
Girolami M, Rogers S (2006). Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18(8): 1790–1817. https://doi.org/10.1162/neco.2006.18.8.1790
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286: 531–537. https://doi.org/10.1126/science.286.5439.531
Gordon M, Beiser J, Brandt J, et al. (2002). The ocular hypertension treatment study: Baseline factors that predict the onset of primary open-angle glaucoma. Archives of Ophthalmology, 120: 714–34. https://doi.org/10.1001/archopht.120.6.714
Guhaniyogi R, Dunson DB (2015). Bayesian compressed regression. Journal of the American Statistical Association, 110(512): 1500–1514. https://doi.org/10.1080/01621459.2014.969425
Hans C (2009). Bayesian lasso regression. Biometrika, 96(4): 835–845. https://doi.org/10.1093/biomet/asp047
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4): 382–401. https://doi.org/10.1214/ss/1009212519
Holmes CC, Held L (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1): 145–168. https://doi.org/10.1214/06-BA105
Hotelling H (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6): 417–441. https://doi.org/10.1037/h0071325
Johnson WB, Lindenstrauss J (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26: 189–206. https://doi.org/10.1090/conm/026/737400
Lee HKH, Taddy M, Gray GA (2010). Selection of a representative sample. Journal of Classification, 27: 41–53. https://doi.org/10.1007/s00357-010-9044-x
Loaiza-Maya R, Nibbering D (2022). Fast variational Bayes methods for multinomial probit models. Journal of Business & Economic Statistics, https://doi.org/10.1080/07350015.2022.2139267.
Lorbert A, Blei DM, Schapire RE, Ramadge PJ (2012). A Bayesian boosting model. arXiv preprint: https://arxiv.org/abs/1209.1996.
Madigan D (2004). Likelihood-based data squashing: A modeling approach to instance construction. Data Mining and Knowledge Discovery, 6: 173–190. https://doi.org/10.1023/A:1014095614948
Mitchell TJ, Beauchamp JJ (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404): 1023–1032. https://doi.org/10.1080/01621459.1988.10478694
Mukherjee S, Sen S (2021). Variational inference in high-dimensional linear regression. arXiv preprint: https://arxiv.org/abs/2104.12232.
Owen A (2003). Data squashing by empirical likelihood. Data Mining and Knowledge Discovery, 7: 101–113. https://doi.org/10.1023/A:1021568920107
Park T, Casella G (2008). The Bayesian lasso. Journal of the American Statistical Association, 103(482): 681–686. https://doi.org/10.1198/016214508000000337
Piironen J, Vehtari A (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2): 5018–5051. https://doi.org/10.1214/17-EJS1337SI
Polson NG, Scott JG, Windle J (2013). Bayesian inference for logistic models using Pólya–Gamma latent variables. Journal of the American Statistical Association, 108(504): 1339–1349. https://doi.org/10.1080/01621459.2013.829001
Roweis ST, Saul LK (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290: 2323–2326. https://doi.org/10.1126/science.290.5500.2323
Tanner MA, Wong WH (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398): 528–540. https://doi.org/10.1080/01621459.1987.10478458
Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, Methodological, 58(1): 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
Titsias M, Lawrence ND (2010). Bayesian Gaussian process latent variable model. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (YW Teh, M Titterington, eds.), volume 9 of Proceedings of Machine Learning Research, 844–851. PMLR, Chia Laguna Resort, Sardinia, Italy.
Xie H, Huang J (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics, 37(2): 673–696. https://doi.org/10.1214/07-AOS580
Zhang CH (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2): 894–942. https://doi.org/10.1214/09-AOS729
Zou H (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476): 1418–1429. https://doi.org/10.1198/016214506000000735
Zou H, Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 67(2): 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x