Journal of Data Science

Efficient Bayesian High-Dimensional Classification via Random Projection with Application to Gene Expression Data
Volume 22, Issue 1 (2024), pp. 152–172
Abhisek Chakraborty  

https://doi.org/10.6339/23-JDS1102
Pub. online: 12 June 2023    Type: Computing In Data Science    Open Access

Received: 10 January 2023
Accepted: 26 April 2023
Published: 12 June 2023

Abstract

Inspired by the impressive successes of compressed sensing-based machine learning algorithms, data augmentation-based efficient Gibbs samplers for Bayesian high-dimensional classification models are developed by compressing the design matrix to a much lower dimension. Particular care is exercised in the choice of the projection mechanism, and an adaptive voting rule is employed to reduce sensitivity to the random projection matrix. Focusing on the high-dimensional Probit regression model, we note that the naive implementation of the data augmentation-based Gibbs sampler is not robust to the presence of co-linearity in the design matrix, a setup ubiquitous in $n\lt p$ problems. We demonstrate that a simple fix based on joint updates of the parameters in the latent space circumvents this issue. With a computationally efficient MCMC scheme in place, we introduce an ensemble classifier by creating R ($\sim 25$–50) projected copies of the design matrix and running R classification models on the projected copies in parallel. We combine the output of the R replications via an adaptive voting scheme. Our scheme is inherently parallelizable and capable of taking advantage of modern computing environments, which are often equipped with multiple cores. The empirical success of our methodology is illustrated in elaborate simulations and gene expression data applications. We also extend our methodology to a high-dimensional logistic regression model and carry out numerical studies to showcase its efficacy.
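
As a concrete illustration of the scheme sketched in the abstract, the following R code compresses the design matrix with a sparse Achlioptas-style random projection, runs an Albert and Chib (1993) style data augmentation Gibbs sampler for Probit regression on each compressed copy, and combines the copies by a simple majority vote. It is a minimal sketch under our own naming conventions (make_projection, probit_gibbs, compress_probit_ensemble, tau2, and R_rep are not from the paper), and it deliberately uses the naive update and a plain majority vote rather than the paper's joint latent-space updates and adaptive voting rule.

  # Sketch only (not the authors' code): random-projection ensemble for
  # Bayesian Probit classification via Albert-Chib data augmentation.

  # Sparse Achlioptas-style projection mapping p features down to m
  # (scaling conventions vary; sqrt(3/m) is one common choice).
  make_projection <- function(p, m) {
    vals <- sample(c(-1, 0, 1), p * m, replace = TRUE, prob = c(1/6, 2/3, 1/6))
    matrix(sqrt(3 / m) * vals, nrow = p, ncol = m)
  }

  # Albert-Chib Gibbs sampler on a compressed design Xg (n x m),
  # with a N(0, tau2 * I) prior on the low-dimensional coefficients.
  probit_gibbs <- function(Xg, y, n_iter = 2000, tau2 = 10) {
    n <- nrow(Xg); m <- ncol(Xg)
    beta  <- rep(0, m)
    draws <- matrix(NA_real_, n_iter, m)
    V  <- solve(crossprod(Xg) + diag(m) / tau2)  # posterior covariance, fixed across sweeps
    Rv <- chol(V)
    for (t in seq_len(n_iter)) {
      mu <- as.vector(Xg %*% beta)
      # z_i ~ N(mu_i, 1) truncated to (0, Inf) if y_i = 1 and (-Inf, 0) if y_i = 0
      p0 <- pnorm(-mu)
      u  <- runif(n, ifelse(y == 1, p0, 0), ifelse(y == 1, 1, p0))
      z  <- mu + qnorm(pmin(pmax(u, 1e-10), 1 - 1e-10))
      # beta | z ~ N(V X'z, V)
      beta <- as.vector(V %*% crossprod(Xg, z) + t(Rv) %*% rnorm(m))
      draws[t, ] <- beta
    }
    draws[-(1:(n_iter / 2)), , drop = FALSE]  # discard burn-in
  }

  # Ensemble: R_rep independently projected copies (trivially parallelizable),
  # combined here by majority vote on the posterior mean class probabilities.
  compress_probit_ensemble <- function(X, y, X_new, m, R_rep = 25) {
    votes <- sapply(seq_len(R_rep), function(r) {
      G <- make_projection(ncol(X), m)
      draws <- probit_gibbs(X %*% G, y)
      p_hat <- rowMeans(pnorm((X_new %*% G) %*% t(draws)))
      as.integer(p_hat > 0.5)
    })
    as.integer(rowMeans(votes) > 0.5)
  }

A call such as compress_probit_ensemble(X, y, X_test, m = 100, R_rep = 25) mimics the workflow above, with each replication easily farmed out to a separate core.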

Supplementary material

Software implementing the methodologies developed in the article is available at zovialpapai/Bayesian-classification-with-random-projection. A short description of the directories in the repository follows; a sketch of the Logit update mentioned under (1)(a) is given after the list.

  1. functions: Utility functions in two R scripts, used in the repeated simulations and real data analyses conducted in the paper. (a) "BCC_Functions.R" contains functions for compression matrix generation; Probit regression via the Albert & Chib and Holmes & Held data augmentation schemes; Logit regression via the Pólya-Gamma data augmentation scheme; hyper-parameter tuning; and associated helper functions. (b) "Probit_HH_cpp.R" contains Probit regression via the Holmes & Held data augmentation scheme, written in Rcpp.
  2. repeated simulations: Three R scripts named BCC_sims.R, Weakleaners.R, and time_comparison.R. (a) BCC_sims.R can be used to carry out the simulations presented in Section 3 on high-dimensional Probit regression and Section 5 on high-dimensional Logit regression, along with hyper-parameter tuning. (b) Weakleaners.R can be used to study the effect of the number of replications of the compression matrix (i.e., the number of weak classifiers) on the accuracy of the classifiers AC, AC+, HH, and HH+; the results are presented in Section 3. (c) time_comparison.R can be used to compare the computational time of our classifiers; the results are presented in Section 3.
  3. data: The micro-array gene expression cancer data sets used in the article are freely available at data.mendeley.com; copies are included in the data directory of our repository.
  4. real data analysis: An R script named BCC_data.R that carries out the analysis of the micro-array gene expression cancer data sets (Leukemia, Lung Cancer, Prostate Cancer) presented in Section 4 of the paper.
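
For the Logit counterpart implemented in BCC_Functions.R, the Pólya-Gamma data augmentation scheme of Polson, Scott and Windle (2013) replaces the truncated-normal latent draw of the Probit sampler with a Pólya-Gamma draw. The minimal one-sweep sketch below is ours, not the repository code: it assumes the BayesLogit package's rpg() sampler is available, uses a N(0, tau2 I) prior on the compressed coefficients, and the name pg_logit_sweep is hypothetical.

  # Sketch of one Pólya-Gamma Gibbs sweep for logistic regression on a
  # compressed design Xg (n x m) with binary responses y in {0, 1}.
  library(BayesLogit)  # provides rpg() for Pólya-Gamma draws

  pg_logit_sweep <- function(Xg, y, beta, tau2 = 10) {
    n <- nrow(Xg); m <- ncol(Xg)
    # omega_i | beta ~ PG(1, x_i' beta)
    omega <- rpg(n, 1, as.vector(Xg %*% beta))
    # beta | omega, y ~ N(V X' kappa, V), kappa_i = y_i - 1/2,
    # V = (X' diag(omega) X + I / tau2)^(-1)
    V <- solve(crossprod(Xg * omega, Xg) + diag(m) / tau2)
    as.vector(V %*% crossprod(Xg, y - 0.5) + t(chol(V)) %*% rnorm(m))
  }

Iterating this sweep, and combining projected replications exactly as in the Probit case, gives the logistic extension studied in Section 5.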

References

 
Achlioptas D (2003). Database-friendly random projections: Johnson–Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4): 671–687. Special Issue on PODS 2001. https://doi.org/10.1016/S0022-0000(03)00025-4
 
Adragni KP, Cook RD (2014). Sufficient dimension reduction and prediction in regression. Philosophical Transactions of the Royal Society A, 367: 1–21.
 
Albert JH, Chib S (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422): 669–679. https://doi.org/10.1080/01621459.1993.10476321
 
Armagan A, Dunson D, Lee J (2013). Generalized double Pareto shrinkage. Statistica Sinica, 23(1): 119–143.
 
Banerjee S, Roy A (2014). Linear Algebra and Matrix Analysis for Statistics. Chapman and Hall/CRC.
 
Bhadra A, Datta J, Polson NG, Willard B (2017). The horseshoe+ estimator of ultra-sparse signals. Bayesian Analysis, 12(4): 1105–1131. https://doi.org/10.1214/16-BA1028
 
Bhattacharya A, Chakraborty A, Mallick BK (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, 103(4): 985–991. https://doi.org/10.1093/biomet/asw042
 
Bhattacharya A, Pati D, Pillai NS, Dunson DB (2015). Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110(512): 1479–1490. PMID: 27019543. https://doi.org/10.1080/01621459.2014.960967
 
Biswas N, Mackey L, Meng XL (2022). Scalable spike-and-slab. In: Proceedings of the 39th International Conference on Machine Learning (K Chaudhuri, S Jegelka, L Song, C Szepesvari, G Niu, S Sabato, eds.), volume 162 of Proceedings of Machine Learning Research, 2021–2040. PMLR.
 
Brown PJ, Griffin JE (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1): 171–188. https://doi.org/10.1214/10-BA507
 
Candes EJ, Romberg JK, Tao T (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8): 1207–1223. https://doi.org/10.1002/cpa.20124
 
Cannings TI, Samworth RJ (2017). Random-projection ensemble classification. Journal of the Royal Statistical Society Series B, 79(4): 959–1035. https://doi.org/10.1111/rssb.12228
 
Cao J, Durante D, Genton MG (2022). Scalable computation of predictive probabilities in probit models with Gaussian process priors. Journal of Computational and Graphical Statistics, 31(3): 709–720. https://doi.org/10.1080/10618600.2022.2036614
 
Carvalho CM, Polson NG, Scott JG (2009). Handling sparsity via the horseshoe. In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (D van Dyk, M Welling, eds.), volume 5 of Proceedings of Machine Learning Research, 73–80. PMLR, Hilton Clearwater Beach Resort, Clearwater Beach, Florida, USA.
 
Carvalho CM, Polson NG, Scott JG (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2): 465–480. https://doi.org/10.1093/biomet/asq017
 
Chipman H, George E, Mcculloch R (2006). Bayesian ensemble learning. In: Advances in Neural Information Processing Systems (B Schölkopf, J Platt, T Hoffman, eds.), volume 19, 1–8. MIT Press.
 
Chipman HA, George EI, McCulloch RE (1998). Bayesian CART model search. Journal of the American Statistical Association, 93(443): 935–948. https://doi.org/10.1080/01621459.1998.10473750
 
Clyde M, Lee H (2001). Bagging and the Bayesian bootstrap. In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics (TS Richardson, TS Jaakkola, eds.), volume R3 of Proceedings of Machine Learning Research, 57–62. PMLR. Reissued by PMLR on 31 March 2021.
 
Corrêa RF, Ludermir TB (2007). Dimensionality reduction of very large document collections by semantic mapping. In: Proceedings of the 6th International Workshop on Self-Organizing Maps. volume 6. 1–6.
 
Cox T, Cox M (2001). Multidimensional Scaling. Chapman and Hall/CRC.
 
Dasgupta S (2013). Experiments with random projection. arXiv preprint: https://arxiv.org/abs/1301.3849.
 
Donoho D (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4): 1289–1306. https://doi.org/10.1109/TIT.2006.871582
 
DuMouchel W (2002). Data Squashing: Constructing Summary Data Sets. 579–591. Springer US, Boston, MA.
 
Faes C, Ormerod JT, Wand MP (2011). Variational Bayesian inference for parametric and nonparametric regression with missing data. Journal of the American Statistical Association, 106(495): 959–971. https://doi.org/10.1198/jasa.2011.tm10301
 
George EI, McCulloch RE (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423): 881–889. https://doi.org/10.1080/01621459.1993.10476353
 
Girolami M, Rogers S (2006). Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18(8): 1790–1817. https://doi.org/10.1162/neco.2006.18.8.1790
 
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286: 531–537. https://doi.org/10.1126/science.286.5439.531
 
Gordon M, Beiser J, Brandt J, et al. (2002). The ocular hypertension treatment study: Baseline factors that predict the onset of primary open-angle glaucoma. Archives of Ophthalmology, 120: 714–34. https://doi.org/10.1001/archopht.120.6.714
 
Guhaniyogi R, Dunson DB (2015). Bayesian compressed regression. Journal of the American Statistical Association, 110(512): 1500–1514. https://doi.org/10.1080/01621459.2014.969425
 
Hans C (2009). Bayesian lasso regression. Biometrika, 96(4): 835–845. https://doi.org/10.1093/biomet/asp047
 
Held L, Holmes CC (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1): 145–168. https://doi.org/10.1214/06-BA105
 
Hinton GE, Roweis S (2002). Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems (S Becker, S Thrun, K Obermayer, eds.), volume 15. MIT Press.
 
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4): 382–401. https://doi.org/10.1214/ss/1009212519
 
Hotelling H (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 6: 417–441. https://doi.org/10.1037/h0071325
 
Johnson WB, Lindenstrauss J (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26: 189–206. https://doi.org/10.1090/conm/026/737400
 
Jolliffe I, Cadima J (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A, 374: 1–16.
 
Kim HC, Ghahramani Z (2012). Bayesian classifier combination. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (ND Lawrence, M Girolami, eds.), volume 22 of Proceedings of Machine Learning Research, 619–627. PMLR, La Palma, Canary Islands.
 
Lee HKH, Taddy M, Gray GA (2010). Selection of a representative sample. Journal of Classification, 27: 41–53. https://doi.org/10.1007/s00357-010-9044-x
 
Li G, Japkowicz N, Stocki TJ, Ungar RK (2010). Cascading Customized Naïve Bayes Couple. 147–160. Springer, Berlin Heidelberg, Berlin, Heidelberg.
 
Li P, Hastie T, Church K (2006a). Improving random projections using marginal information. In: Conference on Learning Theory, 635–649.
 
Li P, Hastie T, Church K (2006b). Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 287–296.
 
Loaiza-Maya R, Nibbering D (2022). Fast variational Bayes methods for multinomial probit models. Journal of Business & Economic Statistics, https://doi.org/10.1080/07350015.2022.2139267.
 
Lorbert A, Blei DM, Schapire RE, Ramadge PJ (2012). A Bayesian boosting model. arXiv preprint: https://arxiv.org/abs/1209.1996.
 
Madigan (2004). Likelihood-based data squashing: A modeling approach to instance construction. Data Mining and Knowledge Discovery, 6: 173–190. https://doi.org/10.1023/A:1014095614948
 
Mika S, Schölkopf B, Smola A, Müller KR, Scholz M, Rätsch G (1998). Kernel PCA and de-noising in feature spaces. In: Advances in Neural Information Processing Systems (M Kearns, S Solla, D Cohn, eds.), volume 11. MIT Press.
 
Mitchell TJ, Beauchamp JJ (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404): 1023–1032. https://doi.org/10.1080/01621459.1988.10478694
 
Mukherjee S, Sen S (2021). Variational inference in high-dimensional linear regression. arXiv preprint: https://arxiv.org/abs/2104.12232.
 
Owen A (2003). Data squashing empirical likelihood. Data Mining and Knowledge Discovery, 7: 101–113. https://doi.org/10.1023/A:1021568920107
 
Park T, Casella G (2008). The Bayesian lasso. Journal of the American Statistical Association, 103(482): 681–686. https://doi.org/10.1198/016214508000000337
 
Piironen J, Vehtari A (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2): 5018–5051. https://doi.org/10.1214/17-EJS1337SI
 
Polson NG, Scott JG (2011). Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction. Oxford University Press.
 
Polson NG, Scott JG, Windle J (2013). Bayesian inference for logistic models using Pólya–Gamma latent variables. Journal of the American Statistical Association, 108(504): 1339–1349. https://doi.org/10.1080/01621459.2013.829001
 
Roweis ST, Saul LK (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290: 2323–2326. https://doi.org/10.1126/science.290.5500.2323
 
Shin M, Bhattacharya A, Johnson VE (2015). Scalable Bayesian variable selection using nonlocal prior densities in ultrahigh-dimensional settings. Statistica Sinica, 28: 1053–1078.
 
Singh D, Febbo P, Ross K, et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Genome Biology, 1: 203–212.
 
Sra S, Dhillon I (2005). Generalized nonnegative matrix approximations with Bregman divergences. In: Advances in Neural Information Processing Systems (Y Weiss, B Schölkopf, J Platt, eds.), volume 18. MIT Press.
 
Tanner MA, Wong WH (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398): 528–540. https://doi.org/10.1080/01621459.1987.10478458
 
Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, Methodological, 58(1): 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
 
Titsias M, Lawrence ND (2010). Bayesian Gaussian process latent variable model. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (YW Teh, M Titterington, eds.), volume 9 of Proceedings of Machine Learning Research, 844–851. PMLR, Chia Laguna Resort, Sardinia, Italy.
 
van der Maaten L, Hinton G (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9: 1–27.
 
Xie H, Huang J (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics, 37(2): 673–696. https://doi.org/10.1214/07-AOS580
 
Zhang CH (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2): 894–942. https://doi.org/10.1214/09-AOS729
 
Zou H (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476): 1418–1429. https://doi.org/10.1198/016214506000000735
 
Zou H, Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 67(2): 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x


Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
collapsed Gibbs sampler, data augmentation, dimensionality reduction, ensemble learning, parallel processing
