Network A/B Testing: Nonparametric Statistical Significance Test Based on Cluster-Level Permutation
Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 523–537
Pub. online: 25 July 2023
Type: Statistical Data Science
Open Access
Received: 14 July 2023
Accepted: 14 July 2023
Published: 25 July 2023
Abstract
A/B testing is widely used to compare two versions of a product and to evaluate proposed product features. It plays a central role in decision-making and is regarded as a gold standard in the IT industry. In essence, it is a form of two-sample statistical hypothesis testing: under certain assumptions, the average treatment effect (ATE) and its corresponding p-value can be obtained. One key assumption in traditional A/B testing is the stable unit treatment value assumption (SUTVA), which states that there is no interference among units; that is, the observation on one unit is unaffected by the treatments assigned to other units. Interference is nonetheless very common in social network settings, where people communicate and spread information to their neighbors, so SUTVA is violated and an analysis that ignores this network effect yields a biased estimate of the ATE. Most existing work focuses on experimental design and data analysis aimed at producing estimators with good bias and variance properties, while little attention has been paid to the calculation of the p-value. We address the calculation of the p-value for the ATE estimator in network A/B tests. After a brief review of existing experimental designs based on graph cluster randomization and of different ATE estimation methods, we propose calculating the p-value with a permutation test at the cluster level. A simulation study mimicking realistic settings validates the effectiveness of this method relative to individual-level permutation.
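As an illustration of the idea described in the abstract, the following is a minimal sketch of a cluster-level permutation test. It is not the authors' supplementary code. It assumes a simple difference-in-means ATE estimator, a two-sided test, and treatment assigned to whole clusters; the function names (`diff_in_means`, `cluster_permutation_pvalue`) and the toy data are hypothetical and introduced only for this example. Treatment labels are shuffled across clusters rather than across individuals, so every permuted assignment respects the cluster structure of the original design.

```python
import numpy as np


def diff_in_means(y, z):
    """Difference-in-means ATE estimate: mean(treated) - mean(control)."""
    return y[z == 1].mean() - y[z == 0].mean()


def cluster_permutation_pvalue(y, cluster_ids, cluster_treatment,
                               n_perm=2000, seed=0):
    """Two-sided permutation p-value with labels shuffled at the cluster level.

    y                 : 1-D array of unit outcomes
    cluster_ids       : 1-D array giving the cluster of each unit
    cluster_treatment : 0/1 array with one entry per cluster, ordered by
                        sorted unique cluster id (an assumption of this sketch)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    cluster_treatment = np.asarray(cluster_treatment)

    # Index of each unit's cluster within the sorted unique cluster ids.
    _, unit_cluster = np.unique(cluster_ids, return_inverse=True)

    observed = diff_in_means(y, cluster_treatment[unit_cluster])

    extreme = 0
    for _ in range(n_perm):
        # Shuffle treatment labels across clusters, not across units.
        permuted = rng.permutation(cluster_treatment)
        stat = diff_in_means(y, permuted[unit_cluster])
        if abs(stat) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data (hypothetical): 20 clusters of 30 units, half the clusters treated.
rng = np.random.default_rng(42)
n_clusters, cluster_size, true_effect = 20, 30, 2.0
cluster_ids = np.repeat(np.arange(n_clusters), cluster_size)
cluster_treatment = rng.permutation(np.repeat([0, 1], n_clusters // 2))
cluster_effect = rng.normal(0.0, 0.5, size=n_clusters)  # shared within cluster
y = (true_effect * cluster_treatment[cluster_ids]
     + cluster_effect[cluster_ids]
     + rng.normal(0.0, 1.0, size=cluster_ids.size))

ate_hat, p_value = cluster_permutation_pvalue(y, cluster_ids, cluster_treatment)
print(f"estimated ATE = {ate_hat:.3f}, permutation p-value = {p_value:.4f}")
```

An individual-level variant would instead shuffle the unit-level treatment vector directly. When outcomes are correlated within clusters, that variant ignores the dependence among units in the same cluster, which is the comparison the abstract refers to.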
Supplementary material
The supplementary zip file contains the Python scripts for generating graph data, computing ATE estimators, estimating p-values via permutation tests, and generating the figures in this paper.