<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1199</article-id>
<article-id pub-id-type="doi">10.6339/25-JDS1199</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Pseudo Partial Likelihood Method for Proportional Hazards Models when Time Origin Is Missing for Control Group with Applications to SARS-CoV-2 Seroprevalence Study</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-9125-9277</contrib-id>
<name><surname>Chung</surname><given-names>Yunro</given-names></name><xref ref-type="aff" rid="j_jds1199_aff_001">1</xref><xref ref-type="aff" rid="j_jds1199_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Murugan</surname><given-names>Vel</given-names></name><xref ref-type="aff" rid="j_jds1199_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Beyene</surname><given-names>Kassu Mehari</given-names></name><xref ref-type="aff" rid="j_jds1199_aff_003">3</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname><given-names>Ding-Geng</given-names></name><email xlink:href="mailto:Ding-Geng.Chen@asu.edu">Ding-Geng.Chen@asu.edu</email><xref ref-type="aff" rid="j_jds1199_aff_001">1</xref><xref ref-type="aff" rid="j_jds1199_aff_004">4</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1199_aff_001"><label>1</label>College of Health Solutions, <institution>Arizona State University</institution>, Phoenix, AZ, <country>U.S.A.</country></aff>
<aff id="j_jds1199_aff_002"><label>2</label>Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, <institution>Arizona State University</institution>, Tempe, AZ, <country>U.S.A.</country></aff>
<aff id="j_jds1199_aff_003"><label>3</label>Department of Neurology, <institution>Barrow Neurological Institute</institution>, Phoenix, AZ, <country>U.S.A.</country></aff>
<aff id="j_jds1199_aff_004"><label>4</label>Department of Statistics, <institution>University of Pretoria</institution>, Pretoria, Gauteng, <country>South Africa</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:Ding-Geng.Chen@asu.edu">Ding-Geng.Chen@asu.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>7</day><month>10</month><year>2025</year></pub-date><volume content-type="ahead-of-print">0</volume><issue>0</issue><fpage>1</fpage><lpage>14</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1199_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>Sections A and B of the Supplementary Material provide the proofs of Theorems 1–2 and additional simulation results, respectively. The SARS-CoV-2 serological prevalence data and corresponding R code used for analysis are also included in the Supplementary Material. The <italic>coxphm</italic> package (<xref ref-type="bibr" rid="j_jds1199_ref_004">Chung</xref>, <xref ref-type="bibr" rid="j_jds1199_ref_004">2025</xref>), which implements the methods developed in this article, is publicly available on CRAN.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>7</day><month>12</month><year>2024</year></date><date date-type="accepted"><day>18</day><month>9</month><year>2025</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Time-to-event data analysis without a well-defined time origin commonly occurs in observational studies that retrospectively collect survival endpoints. For instance, after enrolling participants who have or have not received a specific treatment, an event status can be observed for all participants; however, the start date of treatment is observable only for the treatment group. The corresponding time origin does not exist for the control group, resulting in missing survival time data. Complete-case analysis is often considered the standard approach, but it disregards information from all participants in the control group and does not allow the survival distributions of the two groups to be compared. To address this challenge, we propose a novel semiparametric proportional hazards model that treats these missing time origins as nuisance parameters. We approximate the risk sets by cumulative normal distributions to handle these nuisance parameters and develop estimation and inference procedures for the proposed estimator. We study the asymptotic properties of this model and conduct simulation studies to evaluate its finite-sample properties. Analysis of data from a recent SARS-CoV-2 seroprevalence study illustrates the applicability of our methods. The proposed methods are implemented in the R package <italic>coxphm</italic>.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>COVID-19</kwd>
<kwd>missing data</kwd>
<kwd>observational study</kwd>
<kwd>right censoring</kwd>
<kwd>semiparametric regression</kwd>
<kwd>vaccine efficacy</kwd>
</kwd-group>
<funding-group><funding-statement>The work was supported by funding from Arizona State University Knowledge Enterprise.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1199_reflist_001">
<title>References</title>
<ref id="j_jds1199_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Anand</surname> <given-names>S</given-names></string-name>, <string-name><surname>Montez-Rath</surname> <given-names>M</given-names></string-name>, <string-name><surname>Han</surname> <given-names>J</given-names></string-name>, <string-name><surname>Bozeman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Kerschmann</surname> <given-names>R</given-names></string-name>, <string-name><surname>Beyer</surname> <given-names>P</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: a cross-sectional study</article-title>. <source><italic>The Lancet</italic></source>, <volume>396</volume>(<issue>10259</issue>): <fpage>1335</fpage>–<lpage>1344</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/S0140-6736(20)32009-2" xlink:type="simple">https://doi.org/10.1016/S0140-6736(20)32009-2</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Baden</surname> <given-names>LR</given-names></string-name>, <string-name><surname>El Sahly</surname> <given-names>HM</given-names></string-name>, <string-name><surname>Essink</surname> <given-names>B</given-names></string-name>, <string-name><surname>Kotloff</surname> <given-names>K</given-names></string-name>, <string-name><surname>Frey</surname> <given-names>S</given-names></string-name>, <string-name><surname>Novak</surname> <given-names>R</given-names></string-name>, <etal>et al.</etal> (<year>2021</year>). <article-title>Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine</article-title>. <source><italic>New England Journal of Medicine</italic></source>, <volume>384</volume>(<issue>5</issue>): <fpage>403</fpage>–<lpage>416</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1056/NEJMoa2035389" xlink:type="simple">https://doi.org/10.1056/NEJMoa2035389</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Chen</surname> <given-names>DG</given-names></string-name>, <string-name><surname>Chung</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Beyene</surname> <given-names>KM</given-names></string-name> (<year>2024</year>). <article-title>Estimate time-to-infection (TTI) vaccination effect when TTI for unvaccinated group is unknown</article-title>. <source><italic>Statistics in Biosciences</italic></source>, <volume>16</volume>(<issue>3</issue>): <fpage>723</fpage>–<lpage>741</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s12561-024-09417-w" xlink:type="simple">https://doi.org/10.1007/s12561-024-09417-w</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_004">
<mixed-citation publication-type="other"> <string-name><surname>Chung</surname> <given-names>Y</given-names></string-name> (<year>2025</year>). coxphm: Time-to-Event Data Analysis with Missing Survival Times. R package version 0.2.1.</mixed-citation>
</ref>
<ref id="j_jds1199_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Cox</surname> <given-names>DR</given-names></string-name> (<year>1972</year>). <article-title>Regression models and life-tables (with discussion)</article-title>. <source><italic>Journal of the Royal Statistical Society. Series B</italic></source>, <volume>34</volume>(<issue>2</issue>): <fpage>187</fpage>–<lpage>220</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.2517-6161.1972.tb00899.x" xlink:type="simple">https://doi.org/10.1111/j.2517-6161.1972.tb00899.x</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Efron</surname> <given-names>B</given-names></string-name> (<year>1988</year>). <article-title>Logistic regression, survival analysis, and the Kaplan-Meier curve</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>83</volume>(<issue>402</issue>): <fpage>414</fpage>–<lpage>425</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/01621459.1988.10478612" xlink:type="simple">https://doi.org/10.1080/01621459.1988.10478612</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_007">
<mixed-citation publication-type="book"> <string-name><surname>Fleming</surname> <given-names>TR</given-names></string-name>, <string-name><surname>Harrington</surname> <given-names>D</given-names></string-name> (<year>2013</year>). <source><italic>Counting Processes and Survival Analysis</italic></source>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1199_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Havers</surname> <given-names>FP</given-names></string-name>, <string-name><surname>Reed</surname> <given-names>C</given-names></string-name>, <string-name><surname>Lim</surname> <given-names>T</given-names></string-name>, <string-name><surname>Montgomery</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Klena</surname> <given-names>JD</given-names></string-name>, <string-name><surname>Hall</surname> <given-names>AJ</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020</article-title>. <source><italic>JAMA Internal Medicine</italic></source>, <volume>180</volume>(<issue>12</issue>): <fpage>1576</fpage>–<lpage>1586</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1001/jamainternmed.2020.4130" xlink:type="simple">https://doi.org/10.1001/jamainternmed.2020.4130</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Hou</surname> <given-names>CW</given-names></string-name>, <string-name><surname>Williams</surname> <given-names>S</given-names></string-name>, <string-name><surname>Taylor</surname> <given-names>K</given-names></string-name>, <string-name><surname>Boyle</surname> <given-names>V</given-names></string-name>, <string-name><surname>Bobbett</surname> <given-names>B</given-names></string-name>, <string-name><surname>Kouvetakis</surname> <given-names>J</given-names></string-name>, <etal>et al.</etal> (<year>2023</year>). <article-title>Serological survey to estimate SARS-CoV-2 infection and antibody seroprevalence at a large public university: a cross-sectional study</article-title>. <source><italic>BMJ Open</italic></source>, <volume>13</volume>(<issue>8</issue>): <fpage>e072627</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1136/bmjopen-2023-072627" xlink:type="simple">https://doi.org/10.1136/bmjopen-2023-072627</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Lombardi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mangioni</surname> <given-names>D</given-names></string-name>, <string-name><surname>Consonni</surname> <given-names>D</given-names></string-name>, <string-name><surname>Cariani</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bono</surname> <given-names>P</given-names></string-name>, <string-name><surname>Cantù</surname> <given-names>AP</given-names></string-name>, <etal>et al.</etal> (<year>2021</year>). <article-title>Seroprevalence of anti-SARS-CoV-2 IgG among healthcare workers of a large university hospital in Milan, Lombardy, Italy: a cross-sectional study</article-title>. <source><italic>BMJ Open</italic></source>, <volume>11</volume>(<issue>2</issue>): <fpage>e047216</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1136/bmjopen-2020-047216" xlink:type="simple">https://doi.org/10.1136/bmjopen-2020-047216</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Mercado-Reyes</surname> <given-names>M</given-names></string-name>, <string-name><surname>Malagón-Rojas</surname> <given-names>J</given-names></string-name>, <string-name><surname>Rodríguez-Barraquer</surname> <given-names>I</given-names></string-name>, <string-name><surname>Zapata-Bedoya</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wiesner</surname> <given-names>M</given-names></string-name>, <string-name><surname>Cucunubá</surname> <given-names>Z</given-names></string-name>, <etal>et al.</etal> (<year>2022</year>). <article-title>Seroprevalence of anti-SARS-CoV-2 antibodies in Colombia, 2020: a population-based study</article-title>. <source><italic>The Lancet Regional Health–Americas</italic></source>, <volume>9</volume>: <elocation-id>100195</elocation-id>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.lana.2022.100195" xlink:type="simple">https://doi.org/10.1016/j.lana.2022.100195</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Moreira-Soto</surname> <given-names>A</given-names></string-name>, <string-name><surname>Pachamora Diaz</surname> <given-names>JM</given-names></string-name>, <string-name><surname>González-Auza</surname> <given-names>L</given-names></string-name>, <string-name><surname>Merino Merino</surname> <given-names>XJ</given-names></string-name>, <string-name><surname>Schwalb</surname> <given-names>A</given-names></string-name>, <string-name><surname>Drosten</surname> <given-names>C</given-names></string-name>, <etal>et al.</etal> (<year>2021</year>). <article-title>High SARS-CoV-2 seroprevalence in rural Peru, 2021: a cross-sectional population-based study</article-title>. <source><italic>mSphere</italic></source>, <volume>6</volume>(<issue>6</issue>): <elocation-id>e00685-21</elocation-id>.</mixed-citation>
</ref>
<ref id="j_jds1199_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Nah</surname> <given-names>EH</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>S</given-names></string-name>, <string-name><surname>Park</surname> <given-names>H</given-names></string-name>, <string-name><surname>Hwang</surname> <given-names>I</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>HI</given-names></string-name> (<year>2021</year>). <article-title>Nationwide seroprevalence of antibodies to SARS-CoV-2 in asymptomatic population in South Korea: a cross-sectional study</article-title>. <source><italic>BMJ Open</italic></source>, <volume>11</volume>(<issue>4</issue>): <elocation-id>e049837</elocation-id>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1136/bmjopen-2021-049837" xlink:type="simple">https://doi.org/10.1136/bmjopen-2021-049837</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Polack</surname> <given-names>FP</given-names></string-name>, <string-name><surname>Thomas</surname> <given-names>SJ</given-names></string-name>, <string-name><surname>Kitchin</surname> <given-names>N</given-names></string-name>, <string-name><surname>Absalon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Gurtman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lockhart</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Safety and efficacy of the BNT162b2 mRNA COVID-19 vaccine</article-title>. <source><italic>New England Journal of Medicine</italic></source>, <volume>383</volume>(<issue>27</issue>): <fpage>2603</fpage>–<lpage>2615</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1056/NEJMoa2034577" xlink:type="simple">https://doi.org/10.1056/NEJMoa2034577</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_015">
<mixed-citation publication-type="book"> <collab>R Core Team</collab> (<year>2025</year>). <source><italic>R: A Language and Environment for Statistical Computing</italic></source>. <publisher-name>R Foundation for Statistical Computing</publisher-name>, <publisher-loc>Vienna, Austria</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1199_ref_016">
<mixed-citation publication-type="book"> <string-name><surname>Rosenbaum</surname> <given-names>PR</given-names></string-name> (<year>2010</year>). <source><italic>Design of Observational Studies</italic></source>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1199_ref_017">
<mixed-citation publication-type="other"> <string-name><surname>Therneau</surname> <given-names>TM</given-names></string-name> (<year>2024</year>). survival: Survival Analysis. R package version 3.8-3.</mixed-citation>
</ref>
<ref id="j_jds1199_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Venugopal</surname> <given-names>U</given-names></string-name>, <string-name><surname>Jilani</surname> <given-names>N</given-names></string-name>, <string-name><surname>Rabah</surname> <given-names>S</given-names></string-name>, <string-name><surname>Shariff</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Jawed</surname> <given-names>M</given-names></string-name>, <string-name><surname>Batres</surname> <given-names>AM</given-names></string-name>, <etal>et al.</etal> (<year>2021</year>). <article-title>SARS-CoV-2 seroprevalence among health care workers in a New York City hospital: a cross-sectional analysis during the COVID-19 pandemic</article-title>. <source><italic>International Journal of Infectious Diseases</italic></source>, <volume>102</volume>: <fpage>63</fpage>–<lpage>69</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.ijid.2020.10.036" xlink:type="simple">https://doi.org/10.1016/j.ijid.2020.10.036</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Vusirikala</surname> <given-names>A</given-names></string-name>, <string-name><surname>Whitaker</surname> <given-names>H</given-names></string-name>, <string-name><surname>Jones</surname> <given-names>S</given-names></string-name>, <string-name><surname>Tessier</surname> <given-names>E</given-names></string-name>, <string-name><surname>Borrow</surname> <given-names>R</given-names></string-name>, <string-name><surname>Linley</surname> <given-names>E</given-names></string-name>, <etal>et al.</etal> (<year>2021</year>). <article-title>Seroprevalence of SARS-CoV-2 antibodies in university students: cross-sectional study, December 2020, England</article-title>. <source><italic>Journal of Infection</italic></source>, <volume>83</volume>(<issue>1</issue>): <fpage>104</fpage>–<lpage>111</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jinf.2021.04.028" xlink:type="simple">https://doi.org/10.1016/j.jinf.2021.04.028</ext-link></mixed-citation>
</ref>
<ref id="j_jds1199_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Xiong</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Braun</surname> <given-names>WJ</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>XJ</given-names></string-name> (<year>2021</year>). <article-title>Estimating duration distribution aided by auxiliary longitudinal measures in presence of missing time origin</article-title>. <source><italic>Lifetime Data Analysis</italic></source>, <volume>27</volume>: <fpage>388</fpage>–<lpage>412</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10985-021-09520-w" xlink:type="simple">https://doi.org/10.1007/s10985-021-09520-w</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
