Journal of Data Science

Building a Foundation for More Flexible A/B Testing: Applications of Interim Monitoring to Large Scale Data
Volume 21, Issue 2 (2023): Special Issue: Symposium Data Science and Statistics 2022, pp. 412–427
Wenru Zhou, Miranda Kroehl, Maxene Meier, et al. (4 authors)

https://doi.org/10.6339/23-JDS1099
Pub. online: 21 April 2023
Type: Statistical Data Science
Open Access

Received
15 December 2022
Accepted
17 April 2023
Published
21 April 2023

Abstract

Error spending functions and stopping rules have become powerful tools for conducting interim analyses. Interim analyses are broadly desired not only in traditional clinical trials but also in A/B tests. Although many papers have summarized error spending approaches, limited work has addressed large-scale data in a way that assists in finding the “optimal” boundary. In this paper, we summarize fifteen boundaries, built from five error spending functions that each allow early termination for futility, difference, or both, as well as a fixed sample size design without interim monitoring. The simulation is based on a practical A/B testing problem comparing two independent proportions. We examine sample sizes ranging from 500 to 250,000 per arm to reflect different settings where A/B testing may be utilized. The choice of optimal boundary is summarized using a proposed loss function that assigns different weights to three quantities: the expected sample size under a null experiment with no difference between variants, the expected sample size under an experiment with a difference between the variants, and the maximum sample size needed if the A/B test does not stop early at an interim analysis. Results are presented for simulation settings based on adequately powered, under-powered, and over-powered designs, with recommendations for selecting the “optimal” design in each setting.
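To make the two main ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not name the five error spending functions or give the form of the loss function, so both are assumptions here: the sketch uses the two widely known Lan-DeMets spending functions (O'Brien-Fleming-type and Pocock-type) as examples, and a simple weighted sum as a stand-in for the proposed loss. The function names, the default two-sided alpha of 0.05, and the equal default weights are all hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def obf_spending(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at information fraction t
    under the Lan-DeMets O'Brien-Fleming-type spending function."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def pocock_spending(t: float, alpha: float = 0.05) -> float:
    """Cumulative alpha spent at information fraction t
    under the Lan-DeMets Pocock-type spending function."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must be in (0, 1]")
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def weighted_loss(en_null: float, en_alt: float, n_max: float,
                  weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical loss: weighted sum of the expected sample size under
    the null, the expected sample size under the alternative, and the
    maximum sample size. The paper's actual loss may differ."""
    w_null, w_alt, w_max = weights
    return w_null * en_null + w_alt * en_alt + w_max * n_max


if __name__ == "__main__":
    # Alpha spent at four equally spaced interim looks.
    for t in (0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  OBF={obf_spending(t):.5f}  "
              f"Pocock={pocock_spending(t):.5f}")
```

Both functions spend the full alpha at t = 1, but the O'Brien-Fleming-type function spends very little at early looks while the Pocock-type function spends it more evenly. Under a loss of this kind, a design that stops early more often lowers the expected sample size terms but typically raises the maximum sample size, which is the trade-off the weights are meant to express.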

Supplementary material

All tables and figures are uploaded as Supplementary Materials.



Copyright
2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
A/B testing; error spending function; interim monitoring; stopping rule

Funding
AMK and WZ were supported by NHLBI K01 HL151754.

Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
