Differentially Private Bayesian Envelope Regression via Sufficient Statistic Perturbation

Yu, Peng; Jiang, Yangdi; Su, Zhihua; Wu, Jiamei; Kong, Lingchen; Jiang, Bei

doi:10.6339/25-JDS1194

Journal of Data Science

Volume 24, Issue 1 (2026): Special Issue: Statistical aspects of Trustworthy Machine Learning, pp. 187–202

Article type: Statistical Data Science

Open Access

Peng Yu and Yangdi Jiang contributed equally to this paper.

Received: 28 December 2024
Accepted: 24 June 2025
Published: 3 October 2025

Abstract

We propose a differentially private Bayesian framework for envelope regression, a technique that improves estimation efficiency by modelling the response as a function of a low-dimensional subspace of the predictors. Our method applies the analytic Gaussian mechanism to privatize sufficient statistics from the data, ensuring formal $(\epsilon, \delta)$-differential privacy. We develop a tailored Gibbs sampling algorithm that performs valid Bayesian inference using only the noisy sufficient statistics. This approach leverages the envelope structure to isolate the variation in predictors that is relevant to the response, reducing estimation error compared to standard regression under the same privacy constraints. Through simulation studies, we demonstrate improved estimation accuracy and tighter credible intervals relative to a differentially private Bayesian linear regression baseline.
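The abstract names two ingredients: sufficient statistic perturbation and the analytic Gaussian mechanism. The Python sketch below illustrates only those two steps; it does not implement the envelope Gibbs sampler, and every specific choice in it is an assumption rather than the paper's setup. It assumes rows are clipped so that $\|x\|_2 \le 1$ and $|y| \le 1$, the released statistics are $X^\top X$, $X^\top y$ and $y^\top y$, and neighbouring datasets differ by replacing one record. The helper names (`analytic_gaussian_sigma`, `private_sufficient_statistics`, `clip_row`, `gaussian_mechanism_delta`) are hypothetical. The noise scale $\sigma$ is taken as the smallest value satisfying the exact condition for Gaussian noise on a query with $L_2$ sensitivity $\Delta$, $\Phi\left(\frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta}\right) - e^{\epsilon}\,\Phi\left(-\frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta}\right) \le \delta$, and is found by bisection because the left-hand side decreases in $\sigma$.

```python
import math
import random


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_mechanism_delta(sigma: float, eps: float, sensitivity: float) -> float:
    """Tight delta at level eps for N(0, sigma^2) noise on a query with the
    given L2 sensitivity (exact condition of the analytic Gaussian mechanism)."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(eps) * std_normal_cdf(-a - b)


def analytic_gaussian_sigma(eps: float, delta: float, sensitivity: float,
                            iters: int = 200) -> float:
    """Smallest sigma meeting the exact condition, found by bisection
    (the tight delta is strictly decreasing in sigma)."""
    if eps <= 0 or not 0 < delta < 1 or sensitivity <= 0:
        raise ValueError("need eps > 0, 0 < delta < 1, sensitivity > 0")
    hi = sensitivity
    while gaussian_mechanism_delta(hi, eps, sensitivity) > delta:
        hi *= 2.0  # grow until the condition holds
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gaussian_mechanism_delta(mid, eps, sensitivity) > delta:
            lo = mid
        else:
            hi = mid  # hi always satisfies the condition
    return hi


def clip_row(x, y):
    """Assumed preprocessing: project x onto the unit L2 ball, clip y to [-1, 1]."""
    norm = math.sqrt(sum(v * v for v in x))
    scale = 1.0 / norm if norm > 1.0 else 1.0
    return [v * scale for v in x], max(-1.0, min(1.0, y))


def private_sufficient_statistics(X, y, eps, delta, rng):
    """Hypothetical helper: release noisy X^T X, X^T y and y^T y."""
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    p = len(X[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    yty = 0.0
    for row, target in zip(X, y):
        x, t = clip_row(row, target)
        for j in range(p):
            xty[j] += x[j] * t
            for k in range(p):
                xtx[j][k] += x[j] * x[k]
        yty += t * t
    # Per-record statistic (vec(x x^T), x y, y^2) has squared norm
    # ||x||^4 + ||x||^2 y^2 + y^4 <= 3, so replacing one record moves
    # the summed statistics by at most 2 * sqrt(3) in L2 norm.
    sigma = analytic_gaussian_sigma(eps, delta, 2.0 * math.sqrt(3.0))
    noisy = [[xtx[j][k] + rng.gauss(0.0, sigma) for k in range(p)] for j in range(p)]
    # Averaging with the transpose is post-processing (no extra privacy cost);
    # off-diagonal noise variance becomes sigma^2 / 2, which a sampler must model.
    noisy_xtx = [[0.5 * (noisy[j][k] + noisy[k][j]) for k in range(p)] for j in range(p)]
    noisy_xty = [v + rng.gauss(0.0, sigma) for v in xty]
    noisy_yty = yty + rng.gauss(0.0, sigma)
    return noisy_xtx, noisy_xty, noisy_yty, sigma
```

For example, `private_sufficient_statistics(X, y, 1.0, 1e-5, random.Random(0))` returns the three noisy statistics together with the $\sigma$ used; a sampler that conditions only on these outputs needs $\sigma$ (and the $\sigma^2/2$ off-diagonal variance after symmetrization) to model the noise. Because it solves the exact condition, this calibration never requires more noise than the classical rule $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$ wherever that rule applies ($\epsilon < 1$).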

Supplementary material

A compressed folder containing the code used to generate the results in Section 4 and to implement our proposed methods is available online.

© 2026 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

credible interval; dimension reduction; MCMC; statistical inference

Funding

The research received funding from the Canada CIFAR AI Chairs program, the Alberta Machine Intelligence Institute, the Natural Sciences and Engineering Research Council of Canada, and the Canadian Statistical Sciences Institute.
