<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1099</article-id>
<article-id pub-id-type="doi">10.6339/23-JDS1099</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Building a Foundation for More Flexible A/B Testing: Applications of Interim Monitoring to Large Scale Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-0381-6489</contrib-id>
<name><surname>Zhou</surname><given-names>Wenru</given-names></name><email xlink:href="mailto:wenru.zhou@cuanschutz.edu">wenru.zhou@cuanschutz.edu</email><xref ref-type="aff" rid="j_jds1099_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Kroehl</surname><given-names>Miranda</given-names></name><xref ref-type="aff" rid="j_jds1099_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Meier</surname><given-names>Maxene</given-names></name><xref ref-type="aff" rid="j_jds1099_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Kaizer</surname><given-names>Alexander</given-names></name><xref ref-type="aff" rid="j_jds1099_aff_001">1</xref>
</contrib>
<aff id="j_jds1099_aff_001"><label>1</label>13001 E 17th Pl, Aurora, CO 80045, <institution>Department of Biostatistics and Informatics, University of Colorado</institution>, <country>USA</country></aff>
<aff id="j_jds1099_aff_002"><label>2</label>6380 S Fiddlers Green Cir, Greenwood Village, CO 80111, <institution>Charter Communications</institution>, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:wenru.zhou@cuanschutz.edu">wenru.zhou@cuanschutz.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>21</day><month>4</month><year>2023</year></pub-date><volume>21</volume><issue>2</issue><fpage>412</fpage><lpage>427</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1099_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>All tables and figures are uploaded as Supplementary Materials.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>15</day><month>12</month><year>2022</year></date><date date-type="accepted"><day>17</day><month>4</month><year>2023</year></date></history>
<permissions><copyright-statement>2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Error spending functions and stopping rules have become powerful tools for conducting interim analyses. Interim analysis is broadly desired not only in traditional clinical trials but also in A/B tests. Although many papers have summarized error spending approaches, limited work has addressed how to find the “optimal” boundary in the context of large-scale data. In this paper, we summarize fifteen boundaries built from five error spending functions, each allowing early termination for futility, for difference, or for both, alongside a fixed sample size design without interim monitoring. The simulation is based on a practical A/B testing problem comparing two independent proportions. We examine sample sizes ranging from 500 to 250,000 per arm to reflect the different settings in which A/B testing may be used. The choice of optimal boundary is summarized using a proposed loss function that assigns different weights to three quantities: the expected sample size under a null experiment with no difference between variants, the expected sample size under an experiment with a difference between the variants, and the maximum sample size needed if the A/B test does not stop early at an interim analysis. Results are presented for simulation settings based on adequately powered, under-powered, and over-powered designs, with recommendations for selecting the “optimal” design in each setting.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>A/B testing</kwd>
<kwd>error spending function</kwd>
<kwd>interim monitoring</kwd>
<kwd>stopping rule</kwd>
</kwd-group>
<funding-group><award-group><funding-source xlink:href="https://doi.org/10.13039/100000050">NHLBI</funding-source><award-id>K01 HL151754</award-id></award-group><funding-statement>AMK and WZ were supported by NHLBI K01 HL151754.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1099_reflist_001">
<title>References</title>
<ref id="j_jds1099_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Armitage</surname> <given-names>P</given-names></string-name>, <string-name><surname>McPherson</surname> <given-names>C</given-names></string-name>, <string-name><surname>Rowe</surname> <given-names>B</given-names></string-name> (<year>1969</year>). <article-title>Repeated significance tests on accumulating data</article-title>. <source><italic>Journal of the Royal Statistical Society. Series A. General</italic></source>, <volume>132</volume>(<issue>2</issue>): <fpage>235</fpage>–<lpage>244</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2307/2343787" xlink:type="simple">https://doi.org/10.2307/2343787</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Azevedo</surname> <given-names>EM</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>A</given-names></string-name>, <string-name><surname>Montiel Olea</surname> <given-names>JL</given-names></string-name>, <string-name><surname>Rao</surname> <given-names>J</given-names></string-name>, <string-name><surname>Weyl</surname> <given-names>EG</given-names></string-name> (<year>2020</year>). <article-title>A/B testing with fat tails</article-title>. <source><italic>Journal of Political Economy</italic></source>, <volume>128</volume>(<issue>12</issue>): <fpage>4614</fpage>–<lpage>000</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1086/710607" xlink:type="simple">https://doi.org/10.1086/710607</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_003">
<mixed-citation publication-type="other"> <string-name><surname>Balsubramani</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ramdas</surname> <given-names>A</given-names></string-name> (<year>2015</year>). Sequential nonparametric testing with the law of the iterated logarithm. arXiv preprint: <uri>https://arxiv.org/abs/1506.03486</uri>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>D’Agostino</surname> <given-names>RB</given-names></string-name>, <string-name><surname>Chase</surname> <given-names>W</given-names></string-name>, <string-name><surname>Belanger</surname> <given-names>A</given-names></string-name> (<year>1988</year>). <article-title>The appropriateness of some common procedures for testing the equality of two independent binomial populations</article-title>. <source><italic>American Statistician</italic></source>, <volume>42</volume>(<issue>3</issue>): <fpage>198</fpage>–<lpage>202</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/00031305.1988.10475563" xlink:type="simple">https://doi.org/10.1080/00031305.1988.10475563</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>DeMets</surname> <given-names>DL</given-names></string-name>, <string-name><surname>Lan</surname> <given-names>KG</given-names></string-name> (<year>1994</year>). <article-title>Interim analysis: The alpha spending function approach</article-title>. <source><italic>Statistics in Medicine</italic></source>, <volume>13</volume>(<issue>13–14</issue>): <fpage>1341</fpage>–<lpage>1352</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/sim.4780131308" xlink:type="simple">https://doi.org/10.1002/sim.4780131308</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_006">
<mixed-citation publication-type="book"> <string-name><surname>Friedman</surname> <given-names>LM</given-names></string-name>, <string-name><surname>Furberg</surname> <given-names>CD</given-names></string-name>, <string-name><surname>DeMets</surname> <given-names>DL</given-names></string-name>, <string-name><surname>Reboussin</surname> <given-names>DM</given-names></string-name>, <string-name><surname>Granger</surname> <given-names>CB</given-names></string-name> (<year>2015</year>). <source><italic>Fundamentals of Clinical Trials</italic></source>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Gao</surname> <given-names>P</given-names></string-name>, <string-name><surname>Ware</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Mehta</surname> <given-names>C</given-names></string-name> (<year>2008</year>). <article-title>Sample size re-estimation for adaptive sequential design in clinical trials</article-title>. <source><italic>Journal of Biopharmaceutical Statistics</italic></source>, <volume>18</volume>(<issue>6</issue>): <fpage>1184</fpage>–<lpage>1196</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/10543400802369053" xlink:type="simple">https://doi.org/10.1080/10543400802369053</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Gordon Lan</surname> <given-names>K</given-names></string-name>, <string-name><surname>Reboussin</surname> <given-names>DM</given-names></string-name>, <string-name><surname>DeMets</surname> <given-names>DL</given-names></string-name> (<year>1994</year>). <article-title>Information and information fractions for design and sequential monitoring of clinical trials</article-title>. <source><italic>Communications in Statistics. Theory and Methods</italic></source>, <volume>23</volume>(<issue>2</issue>): <fpage>403</fpage>–<lpage>420</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/03610929408831263" xlink:type="simple">https://doi.org/10.1080/03610929408831263</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Haybittle</surname> <given-names>J</given-names></string-name> (<year>1971</year>). <article-title>Repeated assessment of results in clinical trials of cancer treatment</article-title>. <source><italic>British Journal of Radiology</italic></source>, <volume>44</volume>(<issue>526</issue>): <fpage>793</fpage>–<lpage>797</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1259/0007-1285-44-526-793" xlink:type="simple">https://doi.org/10.1259/0007-1285-44-526-793</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_010">
<mixed-citation publication-type="book"> <string-name><surname>Jennison</surname> <given-names>C</given-names></string-name>, <string-name><surname>Turnbull</surname> <given-names>BW</given-names></string-name> (<year>1999</year>). <source><italic>Group Sequential Methods with Applications to Clinical Trials</italic></source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_011">
<mixed-citation publication-type="chapter"> <string-name><surname>Johari</surname> <given-names>R</given-names></string-name>, <string-name><surname>Koomen</surname> <given-names>P</given-names></string-name>, <string-name><surname>Pekelis</surname> <given-names>L</given-names></string-name>, <string-name><surname>Walsh</surname> <given-names>D</given-names></string-name> (<year>2017</year>). <chapter-title>Peeking at A/B tests: Why it matters, and what to do about it</chapter-title>. In: <source><italic>Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</italic></source>, <fpage>1517</fpage>–<lpage>1525</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Johari</surname> <given-names>R</given-names></string-name>, <string-name><surname>Koomen</surname> <given-names>P</given-names></string-name>, <string-name><surname>Pekelis</surname> <given-names>L</given-names></string-name>, <string-name><surname>Walsh</surname> <given-names>D</given-names></string-name> (<year>2022</year>). <article-title>Always valid inference: Continuous monitoring of A/B tests</article-title>. <source><italic>Operations Research</italic></source>, <volume>70</volume>(<issue>3</issue>): <fpage>1806</fpage>–<lpage>1821</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1287/opre.2021.2135" xlink:type="simple">https://doi.org/10.1287/opre.2021.2135</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_013">
<mixed-citation publication-type="other"> <string-name><surname>Johari</surname> <given-names>R</given-names></string-name>, <string-name><surname>Pekelis</surname> <given-names>L</given-names></string-name>, <string-name><surname>Walsh</surname> <given-names>DJ</given-names></string-name> (<year>2015</year>). Always valid inference: Bringing sequential analysis to A/B testing. arXiv preprint: <uri>https://arxiv.org/abs/1512.04922</uri>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_014">
<mixed-citation publication-type="chapter"> <string-name><surname>Kohavi</surname> <given-names>R</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>A</given-names></string-name>, <string-name><surname>Frasca</surname> <given-names>B</given-names></string-name>, <string-name><surname>Walker</surname> <given-names>T</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Pohlmann</surname> <given-names>N</given-names></string-name> (<year>2013</year>). <chapter-title>Online controlled experiments at large scale</chapter-title>. In: <source><italic>Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</italic></source>, <fpage>1168</fpage>–<lpage>1176</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_015">
<mixed-citation publication-type="other"> <string-name><surname>Koning</surname> <given-names>R</given-names></string-name>, <string-name><surname>Hasan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Chatterji</surname> <given-names>A</given-names></string-name> (<year>2022</year>). Experimentation and start-up performance: Evidence from A/B testing. <italic>Management Science</italic>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_016">
<mixed-citation publication-type="book"> <string-name><surname>Miller</surname> <given-names>E</given-names></string-name> (<year>2010</year>). <source><italic>How Not to Run an A/B Test</italic></source>. <comment>URL: <ext-link ext-link-type="uri" xlink:href="http://www.evanmiller.org/how-not-to-run-an-ab-test.html">http://www.evanmiller.org/how-not-to-run-an-ab-test.html</ext-link></comment>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_017">
<mixed-citation publication-type="book"> <string-name><surname>Miller</surname> <given-names>E</given-names></string-name> (<year>2015</year>). <source><italic>Simple Sequential A/B Testing</italic></source>. <comment>URL: <ext-link ext-link-type="uri" xlink:href="http://www.evanmiller.org/sequential-abtesting.html">http://www.evanmiller.org/sequential-abtesting.html</ext-link>, blog post</comment>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>O’Brien</surname> <given-names>PC</given-names></string-name>, <string-name><surname>Fleming</surname> <given-names>TR</given-names></string-name> (<year>1979</year>). <article-title>A multiple testing procedure for clinical trials</article-title>. <source><italic>Biometrics</italic></source>, <volume>35</volume>(<issue>3</issue>): <fpage>549</fpage>–<lpage>556</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2307/2530245" xlink:type="simple">https://doi.org/10.2307/2530245</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Pocock</surname> <given-names>SJ</given-names></string-name> (<year>1977</year>). <article-title>Group sequential methods in the design and analysis of clinical trials</article-title>. <source><italic>Biometrika</italic></source>, <volume>64</volume>(<issue>2</issue>): <fpage>191</fpage>–<lpage>199</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/64.2.191" xlink:type="simple">https://doi.org/10.1093/biomet/64.2.191</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_020">
<mixed-citation publication-type="chapter"> <string-name><surname>Tamburrelli</surname> <given-names>G</given-names></string-name>, <string-name><surname>Margara</surname> <given-names>A</given-names></string-name> (<year>2014</year>). <chapter-title>Towards automated A/B testing</chapter-title>. In: <source><italic>International Symposium on Search Based Software Engineering</italic></source>, <fpage>184</fpage>–<lpage>198</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1099_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>SK</given-names></string-name>, <string-name><surname>Tsiatis</surname> <given-names>AA</given-names></string-name> (<year>1987</year>). <article-title>Approximately optimal one-parameter boundaries for group sequential trials</article-title>. <source><italic>Biometrics</italic></source>, <volume>43</volume>(<issue>1</issue>): <fpage>193</fpage>–<lpage>199</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2307/2531959" xlink:type="simple">https://doi.org/10.2307/2531959</ext-link></mixed-citation>
</ref>
<ref id="j_jds1099_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Zhou</surname> <given-names>W</given-names></string-name>, <string-name><surname>Kroehl</surname> <given-names>M</given-names></string-name>, <string-name><surname>Meier</surname> <given-names>M</given-names></string-name>, <string-name><surname>Kaizer</surname> <given-names>A</given-names></string-name> (<year>2023</year>). <article-title>Approaches to analyzing binary data for large-scale A/B testing</article-title>. <source><italic>Contemporary Clinical Trials Communications</italic></source>, <fpage>101091</fpage>–<lpage>101091</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.conctc.2023.101091" xlink:type="simple">https://doi.org/10.1016/j.conctc.2023.101091</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
