Privacy-Preserving Inference on the Ratio of Two Gaussians Using Sums
Volume 21, Issue 1 (2023), pp. 27–42
Pub. online: 16 June 2022. Type: Data Science In Action. Open Access.
14 April 2022
21 May 2022
16 June 2022
The ratio of two Gaussians is useful in many contexts of statistical inference. We discuss statistically valid inference for the ratio under differential privacy (DP). We use the delta method to derive the asymptotic distribution of the ratio estimator and the Gaussian mechanism to provide (ε, δ)-DP guarantees. As with many statistics, the quantities involved in inference on a ratio can be rewritten as functions of sums, which are convenient to work with; in particular, under DP the sensitivity of a sum is easy to calculate. We focus on achieving the correct coverage probability for 95% confidence intervals (CIs) of the DP ratio estimator. Our simulations show that the no-correction method, which ignores the DP noise, gives CIs that are too narrow to provide proper coverage for small samples. In our specific simulation scenario, the coverage of 95% CIs can fall below 10%. We propose two methods to mitigate this under-coverage, one based on Monte Carlo simulation and the other on an analytical correction. We show that the CIs from our methods have much better coverage under reasonable privacy budgets. In addition, our methods can handle weighted data, provided the weights are fixed and bounded.
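To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It forms a ratio estimate from noisy sums released with the Gaussian mechanism, builds a delta-method 95% CI that ignores the DP noise (the "no-correction" interval), and then widens it with a simple first-order noise term. Everything specific here is an assumption made for illustration: the function names `gaussian_sigma` and `dp_ratio_ci`, paired data with a public sample size, clipping to `[0, bx]` and `[0, by]`, an even budget split across five sums by basic composition, the classical noise calibration (valid for a per-release ε below 1), and the particular form of the widening term, which is a simplified stand-in rather than the paper's analytical correction.

```python
import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (valid for 0 < epsilon < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def dp_ratio_ci(x, y, bx, by, epsilon, delta, rng, z=1.96):
    """Ratio estimate from noisy sums, with naive and noise-aware CIs.

    Illustrative assumptions: paired data, public n, x clipped to [0, bx],
    y clipped to [0, by], replace-one neighbours, and the (epsilon, delta)
    budget split evenly over five released sums.
    """
    x = np.clip(x, 0.0, bx)
    y = np.clip(y, 0.0, by)
    n = len(x)

    # Every quantity needed below is a function of these five sums.
    sums = np.array([x.sum(), y.sum(), (x * x).sum(), (y * y).sum(), (x * y).sum()])
    # With values bounded as above, each sum changes by at most this much
    # when one record is replaced.
    sens = np.array([bx, by, bx**2, by**2, bx * by])
    sig = gaussian_sigma(sens, epsilon / 5.0, delta / 5.0)
    sx, sy, sxx, syy, sxy = sums + rng.normal(0.0, sig)

    # Plug-in moments computed from the noisy sums.
    mx, my = sx / n, sy / n
    vx = max(sxx / n - mx**2, 0.0)
    vy = max(syy / n - my**2, 0.0)
    cxy = sxy / n - mx * my
    r = sx / sy

    # Delta-method sampling variance of the ratio of means.
    var_sampling = max((vx - 2.0 * r * cxy + r**2 * vy) / (n * my**2), 0.0)
    # First-order variance contributed by the noise on sum(x) and sum(y).
    var_noise = (sig[0] ** 2 + r**2 * sig[1] ** 2) / sy**2

    half_naive = z * np.sqrt(var_sampling)
    half_corr = z * np.sqrt(var_sampling + var_noise)
    return r, (r - half_naive, r + half_naive), (r - half_corr, r + half_corr)


# Usage on synthetic Gaussian data (parameters chosen arbitrarily).
rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=5000)
y = rng.normal(5.0, 1.0, size=5000)
r, naive_ci, corrected_ci = dp_ratio_ci(x, y, bx=20.0, by=10.0,
                                        epsilon=0.5, delta=1e-5, rng=rng)
print(r, naive_ci, corrected_ci)
```

The corrected interval is never narrower than the naive one, because the noise term only adds variance. The sketch still ignores the noise that enters the plug-in variance estimates, which is one reason a Monte Carlo approach may be preferable in practice.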
Supplementary Material
Our simulation code is hosted on GitHub in a Jupyter Notebook file. Replication instructions can be found in the Supplementary Material.