Privacy-Preserving Inference on the Ratio of Two Gaussians Using Sums

Miao, Jingang; Li, Yiming Paul

doi:10.6339/22-JDS1050

Journal of Data Science

Volume 21, Issue 1 (2023), pp. 27–42

Pub. online: 16 June 2022
Type: Data Science In Action

Open Access

Received: 14 April 2022
Accepted: 21 May 2022
Published: 16 June 2022

Abstract

The ratio of two Gaussians is useful in many contexts of statistical inference. We discuss statistically valid inference on the ratio under Differential Privacy (DP). We use the delta method to derive the asymptotic distribution of the ratio estimator and use the Gaussian mechanism to provide (epsilon, delta)-DP guarantees. Like many statistics, the quantities involved in inference on a ratio can be rewritten as functions of sums, which are convenient to work with; in particular, under DP the sensitivity of a sum is easy to calculate. We focus on achieving the correct coverage probability for 95% confidence intervals (CIs) of the DP ratio estimator. Our simulations show that the no-correction method, which ignores the DP noise, gives CIs that are too narrow to provide proper coverage for small samples. In our specific simulation scenario, the coverage of 95% CIs can fall below 10%. We propose two methods to mitigate the under-coverage issue, one based on Monte Carlo simulation and the other based on analytical correction. We show that the CIs from our methods have much better coverage with reasonable privacy budgets. In addition, our methods can handle weighted data, provided the weights are fixed and bounded.
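As a rough illustration of the idea in the abstract, the sketch below releases five noisy sums with the Gaussian mechanism, forms the ratio estimator, and compares a delta-method CI that ignores the DP noise with one that adds the noise variance back in. It is not the authors' code. Everything here is an assumption made for illustration: the function names `gaussian_sigma` and `dp_ratio_ci`, clipping the data to [0, 1], treating the sample size as public, replace-one neighbouring datasets, the classical noise calibration (valid only for epsilon < 1), and the particular form of the variance inflation.

```python
import numpy as np


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (assumes 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical calibration requires 0 < epsilon < 1")
    return l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def dp_ratio_ci(x, y, epsilon, delta, rng, z=1.96):
    """Return (ratio, naive_ci, corrected_ci) computed from noisy sums only.

    Illustrative assumptions: paired data clipped to [0, 1], public n,
    and replace-one neighbouring datasets.
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    y = np.clip(np.asarray(y, dtype=float), 0.0, 1.0)
    n = len(x)  # treated as public in this sketch

    # Every quantity needed by the delta method is a function of these sums.
    sums = np.array([x.sum(), y.sum(), (x * x).sum(), (y * y).sum(), (x * y).sum()])

    # Replacing one record moves each sum by at most 1, so the L2
    # sensitivity of the five-dimensional vector is at most sqrt(5).
    sigma = gaussian_sigma(np.sqrt(5.0), epsilon, delta)
    sx, sy, sxx, syy, sxy = sums + rng.normal(0.0, sigma, size=5)

    ratio = sx / sy

    # Plug-in moments built from the noisy sums.
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    cov_xy = sxy / n - (sx / n) * (sy / n)

    # Delta-method variance of sum(x) - ratio * sum(y) from sampling alone;
    # the noisy plug-in can be slightly negative, so floor it at zero.
    sampling = max(n * (var_x - 2.0 * ratio * cov_xy + ratio ** 2 * var_y), 0.0)

    # Extra variance from the independent noise on the two sums in the ratio.
    noise = sigma ** 2 * (1.0 + ratio ** 2)

    se_naive = np.sqrt(sampling) / abs(sy)
    se_corrected = np.sqrt(sampling + noise) / abs(sy)

    naive_ci = (ratio - z * se_naive, ratio + z * se_naive)
    corrected_ci = (ratio - z * se_corrected, ratio + z * se_corrected)
    return ratio, naive_ci, corrected_ci
```

Calling `dp_ratio_ci(x, y, epsilon=0.5, delta=1e-5, rng=np.random.default_rng(0))` on two equal-length arrays returns the noisy ratio and both intervals. The corrected interval is always the wider of the two, and the gap grows as the sample shrinks or the privacy budget tightens, which is the regime where the abstract reports under-coverage for the no-correction method.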

Supplementary material

Our simulation code is hosted on GitHub in a Jupyter Notebook file. Replication instructions can be found in the Supplementary Material.

© 2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

calibration ratio; Gaussian mechanism; Laplace mechanism
