<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1011</article-id>
<article-id pub-id-type="doi">10.6339/21-JDS1011</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Fast and Efficient Data Science Techniques for COVID-19 Group Testing</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-9697-6686</contrib-id>
<name><surname>Kutateladze</surname><given-names>Varlam</given-names></name><email xlink:href="mailto:varlam.kutateladze@email.ucr.edu">varlam.kutateladze@email.ucr.edu</email><xref ref-type="aff" rid="j_jds1011_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4591-4239</contrib-id>
<name><surname>Seregina</surname><given-names>Ekaterina</given-names></name><xref ref-type="aff" rid="j_jds1011_aff_001">1</xref>
</contrib>
<aff id="j_jds1011_aff_001"><label>1</label>Department of Economics, <institution>University of California</institution>, Riverside, CA 92521, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:varlam.kutateladze@email.ucr.edu">varlam.kutateladze@email.ucr.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>26</day><month>3</month><year>2021</year></pub-date>
<volume>19</volume><issue>3</issue><fpage>390</fpage><lpage>408</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1011_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The code supplement (<xref ref-type="bibr" rid="j_jds1011_ref_015">Kutateladze and Seregina</xref>, <xref ref-type="bibr" rid="j_jds1011_ref_015">2020</xref>) is available as a Google Colab notebook. It is written in Python and makes it straightforward to replicate all the graphs provided, as well as to run additional exercises.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>17</day><month>12</month><year>2020</year></date><date date-type="accepted"><day>5</day><month>3</month><year>2021</year></date></history>
<permissions><copyright-statement>2021 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Researchers and public officials tend to agree that until a vaccine is readily available, stopping SARS-CoV-2 transmission is the name of the game. Testing is the key to preventing the spread, especially by asymptomatic individuals. With testing capacity restricted, group testing is an appealing alternative for comprehensive screening and has recently received FDA emergency authorization. This technique tests pools of individual samples, thereby often requiring fewer testing resources while potentially providing a severalfold speedup. We approach group testing from a data science perspective and offer two contributions. First, we provide an extensive empirical comparison of modern group testing techniques based on simulated data. Second, we propose a simple one-round method based on <inline-formula id="j_jds1011_ineq_001"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi>ℓ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\ell _{1}}$]]></tex-math></alternatives></inline-formula>-norm sparse recovery, which outperforms current state-of-the-art approaches at certain disease prevalence rates.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>compressed sensing</kwd>
<kwd>coronavirus</kwd>
<kwd>lasso</kwd>
<kwd>pooled testing</kwd>
<kwd>SARS-CoV-2</kwd>
<kwd>sensing matrix</kwd>
<kwd>sparse recovery</kwd>
</kwd-group>
<funding-group><funding-statement>This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Declarations of interest: none.</funding-statement></funding-group>
</article-meta>
</front>
<body/>
<back>
<ref-list id="j_jds1011_reflist_001">
<title>References</title>
<ref id="j_jds1011_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Abdalhamid</surname> <given-names>B</given-names></string-name>, <string-name><surname>Bilder</surname> <given-names>CR</given-names></string-name>, <string-name><surname>McCutchen</surname> <given-names>EL</given-names></string-name>, <string-name><surname>Hinrichs</surname> <given-names>SH</given-names></string-name>, <string-name><surname>Koepsell</surname> <given-names>SA</given-names></string-name>, <string-name><surname>Iwen</surname> <given-names>PC</given-names></string-name> (<year>2020</year>). <article-title>Assessment of specimen pooling to conserve SARS CoV-2 testing resources</article-title>. <source>American Journal of Clinical Pathology</source>, <volume>153</volume>(<issue>6</issue>): <fpage>715</fpage>–<lpage>718</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_002">
<mixed-citation publication-type="chapter"> <string-name><surname>Aldridge</surname> <given-names>M</given-names></string-name>, <string-name><surname>Johnson</surname> <given-names>O</given-names></string-name>, <string-name><surname>Scarlett</surname> <given-names>J</given-names></string-name> (<year>2016</year>). <chapter-title>Improved group testing rates with constant column weight designs</chapter-title>. In: <source>2016 IEEE International Symposium on Information Theory (ISIT)</source>, <fpage>1381</fpage>–<lpage>1385</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Bandeira</surname> <given-names>AS</given-names></string-name>, <string-name><surname>Dobriban</surname> <given-names>E</given-names></string-name>, <string-name><surname>Mixon</surname> <given-names>DG</given-names></string-name>, <string-name><surname>Sawin</surname> <given-names>WF</given-names></string-name> (<year>2013</year>). <article-title>Certifying the restricted isometry property is hard</article-title>. <source>IEEE Transactions on Information Theory</source>, <volume>59</volume>(<issue>6</issue>): <fpage>3448</fpage>–<lpage>3450</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Baraniuk</surname> <given-names>R</given-names></string-name>, <string-name><surname>Davenport</surname> <given-names>M</given-names></string-name>, <string-name><surname>DeVore</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wakin</surname> <given-names>M</given-names></string-name> (<year>2008</year>). <article-title>A simple proof of the restricted isometry property for random matrices</article-title>. <source>Constructive Approximation</source>, <volume>28</volume>(<issue>3</issue>): <fpage>253</fpage>–<lpage>263</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Candes</surname> <given-names>EJ</given-names></string-name>, <string-name><surname>Romberg</surname> <given-names>J</given-names></string-name>, <string-name><surname>Tao</surname> <given-names>T</given-names></string-name> (<year>2006</year>). <article-title>Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information</article-title>. <source>IEEE Transactions on Information Theory</source>, <volume>52</volume>(<issue>2</issue>): <fpage>489</fpage>–<lpage>509</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_006">
<mixed-citation publication-type="chapter"> <string-name><surname>Chan</surname> <given-names>CL</given-names></string-name>, <string-name><surname>Che</surname> <given-names>PH</given-names></string-name>, <string-name><surname>Jaggi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Saligrama</surname> <given-names>V</given-names></string-name> (<year>2011</year>). <chapter-title>Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms</chapter-title>. In: <source>2011 49th Annual Allerton Conference on Communication, Control, and Computing</source> (<conf-loc>Allerton</conf-loc>), <fpage>1832</fpage>–<lpage>1839</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Donoho</surname> <given-names>DL</given-names></string-name> (<year>2006</year>). <article-title>Compressed sensing</article-title>. <source>IEEE Transactions on Information Theory</source>, <volume>52</volume>(<issue>4</issue>): <fpage>1289</fpage>–<lpage>1306</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Dorfman</surname> <given-names>R</given-names></string-name> (<year>1943</year>). <article-title>The detection of defective members of large populations</article-title>. <source>The Annals of Mathematical Statistics</source>, <volume>14</volume>(<issue>4</issue>): <fpage>436</fpage>–<lpage>440</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Emmanuel</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Bassett</surname> <given-names>MT</given-names></string-name>, <string-name><surname>Smith</surname> <given-names>HJ</given-names></string-name>, <string-name><surname>Jacobs</surname> <given-names>JA</given-names></string-name> (<year>1988</year>). <article-title>Pooling of sera for human immunodeficiency virus (HIV) testing: An economical method for use in developing countries</article-title>. <source>Journal of Clinical Pathology</source>, <volume>41</volume>(<issue>5</issue>): <fpage>582</fpage>–<lpage>585</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_010">
<mixed-citation publication-type="other"> FDA (2020). Emergency Authorization for Sample Pooling. <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/news-events/press-announcements/coronavirus-covid-19-update-fda-issues-first-emergency-authorization-sample-pooling-diagnostic">https://www.fda.gov/news-events/press-announcements/coronavirus-covid-19-update-fda-issues-first-emergency-authorization-sample-pooling-diagnostic</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_011">
<mixed-citation publication-type="other"> <string-name><surname>Ghosh</surname> <given-names>S</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>R</given-names></string-name>, <string-name><surname>Rehan</surname> <given-names>M</given-names></string-name>, <string-name><surname>Pathak</surname> <given-names>S</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>P</given-names></string-name>, <string-name><surname>Gupta</surname> <given-names>Y</given-names></string-name>, et al. (2020). A compressed sensing approach to group-testing for COVID-19 detection.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Hogan</surname> <given-names>CA</given-names></string-name>, <string-name><surname>Sahoo</surname> <given-names>MK</given-names></string-name>, <string-name><surname>Pinsky</surname> <given-names>BA</given-names></string-name> (<year>2020</year>). <article-title>Sample pooling as a strategy to detect community transmission of SARS-CoV-2</article-title>. <source>JAMA</source>, <volume>323</volume>(<issue>19</issue>): <fpage>1967</fpage>–<lpage>1969</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_013">
<mixed-citation publication-type="book"> <string-name><surname>Hughes-Oliver</surname> <given-names>JM</given-names></string-name> (<year>2006</year>). <source>Pooling Experiments for Blood Screening and Drug Discovery</source>. <fpage>48</fpage>–<lpage>68</lpage>. <publisher-name>Springer</publisher-name>, <publisher-loc>New York, NY</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Johnson</surname> <given-names>O</given-names></string-name>, <string-name><surname>Aldridge</surname> <given-names>M</given-names></string-name>, <string-name><surname>Scarlett</surname> <given-names>J</given-names></string-name> (<year>2019</year>). <article-title>Performance of group testing algorithms with near-constant tests per item</article-title>. <source>IEEE Transactions on Information Theory</source>, <volume>65</volume>(<issue>2</issue>): <fpage>707</fpage>–<lpage>723</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_015">
<mixed-citation publication-type="other"> <string-name><surname>Kutateladze</surname> <given-names>V</given-names></string-name>, <string-name><surname>Seregina</surname> <given-names>E</given-names></string-name> (2020). Code supplement to “Fast and Efficient Data Science Techniques for COVID-19 Group Testing”. <uri>https://tinyurl.com/y4vo86sb</uri>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_016">
<mixed-citation publication-type="other"> <string-name><surname>Litvak</surname> <given-names>E</given-names></string-name>, <string-name><surname>Tu</surname> <given-names>XM</given-names></string-name>, <string-name><surname>Pagano</surname> <given-names>M</given-names></string-name> (1994). Screening for the presence of a disease by pooling sera samples.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_017">
<mixed-citation publication-type="other"> <string-name><surname>Mutesa</surname> <given-names>L</given-names></string-name>, <string-name><surname>Ndishimye</surname> <given-names>P</given-names></string-name>, <string-name><surname>Butera</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Souopgui</surname> <given-names>J</given-names></string-name>, <string-name><surname>Uwineza</surname> <given-names>A</given-names></string-name>, <string-name><surname>Rutayisire</surname> <given-names>R</given-names></string-name>, et al. (2020). A strategy for finding people infected with SARS-CoV-2: optimizing pooled testing at low prevalence. medRxiv preprint: <uri>https://doi.org/10.1101/2020.05.02.20087924</uri>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_018">
<mixed-citation publication-type="chapter"> <string-name><surname>Chen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Donoho</surname> <given-names>D</given-names></string-name> (<year>1994</year>). <chapter-title>Basis pursuit</chapter-title>. In: <source>Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers</source>, volume <volume>1</volume>, <fpage>41</fpage>–<lpage>44</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Sobel</surname> <given-names>M</given-names></string-name>, <string-name><surname>Groll</surname> <given-names>PA</given-names></string-name> (<year>1959</year>). <article-title>Group testing to eliminate efficiently all defectives in a binomial sample</article-title>. <source>Bell System Technical Journal</source>, <volume>38</volume>(<issue>5</issue>): <fpage>1179</fpage>–<lpage>1252</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Sterrett</surname> <given-names>A</given-names></string-name> (<year>1957</year>). <article-title>On the detection of defective members of large populations</article-title>. <source>The Annals of Mathematical Statistics</source>, <volume>28</volume>(<issue>4</issue>): <fpage>1033</fpage>–<lpage>1036</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Taylor</surname> <given-names>SM</given-names></string-name>, <string-name><surname>Juliano</surname> <given-names>JJ</given-names></string-name>, <string-name><surname>Trottman</surname> <given-names>PA</given-names></string-name>, <string-name><surname>Griffin</surname> <given-names>JB</given-names></string-name>, <string-name><surname>Landis</surname> <given-names>SH</given-names></string-name>, <string-name><surname>Kitsa</surname> <given-names>P</given-names></string-name>, <etal>et al.</etal> (<year>2010</year>). <article-title>High-throughput pooling and real-time PCR-based strategy for malaria detection</article-title>. <source>Journal of Clinical Microbiology</source>, <volume>48</volume>(<issue>2</issue>): <fpage>512</fpage>–<lpage>519</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Van</surname> <given-names>TT</given-names></string-name>, <string-name><surname>Miller</surname> <given-names>J</given-names></string-name>, <string-name><surname>Warshauer</surname> <given-names>DM</given-names></string-name>, <string-name><surname>Reisdorf</surname> <given-names>E</given-names></string-name>, <string-name><surname>Jernigan</surname> <given-names>D</given-names></string-name>, <string-name><surname>Humes</surname> <given-names>R</given-names></string-name>, <etal>et al.</etal> (<year>2012</year>). <article-title>Pooling nasopharyngeal/throat swab specimens to increase testing capacity for influenza viruses by PCR</article-title>. <source>Journal of Clinical Microbiology</source>, <volume>50</volume>(<issue>3</issue>): <fpage>891</fpage>–<lpage>896</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_023">
<mixed-citation publication-type="other"> Worldometer (2020). US SARS-CoV-2 cases. <uri>https://www.worldometers.info/coronavirus/country/us/</uri>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_024">
<mixed-citation publication-type="other"> <string-name><surname>Yelin</surname> <given-names>I</given-names></string-name>, <string-name><surname>Aharony</surname> <given-names>N</given-names></string-name>, <string-name><surname>Shaer-Tamar</surname> <given-names>E</given-names></string-name>, <string-name><surname>Argoetti</surname> <given-names>A</given-names></string-name>, <string-name><surname>Messer</surname> <given-names>E</given-names></string-name>, <string-name><surname>Berenbaum</surname> <given-names>D</given-names></string-name>, et al. (2020). Evaluation of COVID-19 RT-qPCR test in multi-sample pools. medRxiv preprint: <uri>https://doi.org/10.1101/2020.03.26.20039438</uri>.</mixed-citation>
</ref>
<ref id="j_jds1011_ref_025">
<mixed-citation publication-type="other"> <string-name><surname>Yi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Mudumbai</surname> <given-names>R</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>W</given-names></string-name> (2020). Low-cost and high-throughput testing of COVID-19 viruses and antibodies via compressed sensing: System concepts and computational experiments.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
