For large observational studies lacking a control group (unlike randomized controlled trials, RCT), propensity scores (PS) are often the method of choice to account for pre-treatment confounding in baseline characteristics, and thereby avoid substantial bias in treatment estimation. A vast majority of PS techniques focus on average treatment effect estimation, without any clear consensus on how to account for confounders, especially in a multiple treatment setting. Furthermore, for time-to event outcomes, the analytical framework is further complicated in presence of high censoring rates (sometimes, due to non-susceptibility of study units to a disease), imbalance between treatment groups, and clustered nature of the data (where, survival outcomes appear in groups). Motivated by a right-censored kidney transplantation dataset derived from the United Network of Organ Sharing (UNOS), we investigate and compare two recent promising PS procedures, (a) the generalized boosted model (GBM), and (b) the covariate-balancing propensity score (CBPS), in an attempt to decouple the causal effects of treatments (here, study subgroups, such as hepatitis C virus (HCV) positive/negative donors, and positive/negative recipients) on time to death of kidney recipients due to kidney failure, post transplantation. For estimation, we employ a 2-step procedure which addresses various complexities observed in the UNOS database within a unified paradigm. First, to adjust for the large number of confounders on the multiple sub-groups, we fit multinomial PS models via procedures (a) and (b). In the next stage, the estimated PS is incorporated into the likelihood of a semi-parametric cure rate Cox proportional hazard frailty model via inverse probability of treatment weighting, adjusted for multi-center clustering and excess censoring, Our data analysis reveals a more informative and superior performance of the full model in terms of treatment effect estimation, over sub-models that relaxes the various features of the event time dataset.

Monitoring COVID-19-Induced Gender Differences in Teleworking Rates Using Mobile Network Data

Sara Grubanov-Boskovic Spyridon Spyratos Stefano Maria Iacus All authors (5)

https://doi.org/10.6339/22-JDS1043

Pub. online: 20 Apr 2022 Type: Data Science In Action

Open Access

Journal: Journal of Data Science Volume 20, Issue 2 (2022), pp. 209–227

Abstract

An Effective Tensor Regression with Latent Sparse Regularization

Ko-shin Chen Tingyang Xu Guannan Liang All authors (6)

https://doi.org/10.6339/22-JDS1048

Pub. online: 9 May 2022 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 20, Issue 2 (2022), pp. 228–252

Abstract

Inference for Optimal Differential Privacy Procedures for Frequency Tables

Chengcheng Li Naisyin Wang Gongjun Xu

https://doi.org/10.6339/22-JDS1044

Pub. online: 20 Apr 2022 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 20, Issue 2 (2022), pp. 253–276

Abstract

When releasing data to the public, a vital concern is the risk of exposing personal information of the individuals who have contributed to the data set. Many mechanisms have been proposed to protect individual privacy, though less attention has been dedicated to practically conducting valid inferences on the altered privacy-protected data sets. For frequency tables, the privacy-protection-oriented perturbations often lead to negative cell counts. Releasing such tables can undermine users’ confidence in the usefulness of such data sets. This paper focuses on releasing one-way frequency tables. We recommend an optimal mechanism that satisfies ϵ-differential privacy (DP) without suffering from having negative cell counts. The procedure is optimal in the sense that the expected utility is maximized under a given privacy constraint. Valid inference procedures for testing goodness-of-fit are also developed for the DP privacy-protected data. In particular, we propose a de-biased test statistic for the optimal procedure and derive its asymptotic distribution. In addition, we also introduce testing procedures for the commonly used Laplace and Gaussian mechanisms, which provide a good finite sample approximation for the null distributions. Moreover, the decaying rate requirements for the privacy regime are provided for the inference procedures to be valid. We further consider common users’ practices such as merging related or neighboring cells or integrating statistical information obtained across different data sources and derive valid testing procedures when these operations occur. Simulation studies show that our inference results hold well even when the sample size is relatively small. Comparisons with the current field standards, including the Laplace, the Gaussian (both with/without post-processing of replacing negative cell counts with zeros), and the Binomial-Beta McClure-Reiter mechanisms, are carried out. In the end, we apply our method to the National Center for Early Development and Learning’s (NCEDL) multi-state studies data to demonstrate its practical applicability.

RSS

Volume 20, Issue 2 (2022), April 2022

Export citation

Copy and paste formatted citation

Download citation in file

Authors