Journal of Data Science


Pseudo Partial Likelihood Method for Proportional Hazards Models when Time Origin Is Missing for Control Group with Applications to SARS-CoV-2 Seroprevalence Study
Yunro Chung, Vel Murugan, Kassu Mehari Beyene, et al. (4 authors)

https://doi.org/10.6339/25-JDS1199
Type: Statistical Data Science • Open Access

Received: 7 December 2024
Accepted: 18 September 2025
Published: 7 October 2025

Abstract

Time-to-event data without a well-defined time origin commonly arise in observational studies that collect survival endpoints retrospectively. For instance, when a study enrolls participants who have or have not received a specific treatment, the event status is observed for all participants, but the start date of treatment is observable only for the treatment group. The control group has no corresponding time origin, so its survival times are missing. Complete-case analysis is often considered the standard approach, but it discards every participant in the control group and therefore cannot compare the survival distributions of the two groups. To address this challenge, we propose a novel semiparametric proportional hazards model that treats the missing time origins as nuisance parameters. We handle these nuisance parameters by approximating the risk sets with cumulative normal distribution functions, and we develop estimation and inference procedures for the proposed estimator. We study the asymptotic properties of this model and conduct simulation studies to assess its finite-sample performance. An analysis of data from a recent SARS-CoV-2 seroprevalence study illustrates the applicability of our methods. The proposed methods are implemented in the R package coxphm.
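To make the risk-set idea above concrete, here is a minimal Python sketch. It is not the authors' estimator or the coxphm implementation. It assumes, purely for illustration, that each control participant's unobserved survival time follows a normal distribution with hypothetical parameters `mu` and `sigma`. Under that assumption, the 0/1 indicator that the participant is still at risk at time t is replaced by the probability 1 − Φ((t − mu)/sigma). Participants with a known time origin keep the usual indicator, and only events with observed times contribute numerator terms. The function names `at_risk_prob` and `pseudo_partial_loglik` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def at_risk_prob(t, mu, sigma):
    """Hypothetical assumption: the unobserved time T ~ N(mu, sigma^2).
    Returns P(T >= t), used in place of the 0/1 risk-set indicator."""
    return 1.0 - normal_cdf((t - mu) / sigma)


def pseudo_partial_loglik(beta, times, events, x, observed, mu, sigma):
    """Illustrative pseudo partial log-likelihood for one scalar covariate.

    times    : survival times (value ignored where observed[j] is False)
    events   : 1 for an observed event, 0 for censoring
    x        : scalar covariate values
    observed : True if the participant's time origin is known
    mu, sigma: hypothetical normal parameters for the missing times
    """
    n = len(times)
    loglik = 0.0
    for i in range(n):
        # Only events with a known event time contribute a term in this sketch.
        if not (observed[i] and events[i] == 1):
            continue
        t = times[i]
        denom = 0.0
        for j in range(n):
            if observed[j]:
                weight = 1.0 if times[j] >= t else 0.0  # usual risk-set indicator
            else:
                weight = at_risk_prob(t, mu, sigma)  # smoothed risk-set membership
            denom += weight * math.exp(beta * x[j])
        loglik += beta * x[i] - math.log(denom)
    return loglik


# Usage: the third participant (a control) has no time origin, so its time is a placeholder.
times = [1.0, 2.0, 0.0]
events = [1, 1, 1]
x = [1.0, 1.0, 0.0]
observed = [True, True, False]
value = pseudo_partial_loglik(0.5, times, events, x, observed, mu=1.5, sigma=1.0)
```

When every participant has a known time origin, the function reduces to the ordinary Cox (1972) partial log-likelihood for a single covariate, with tied times kept in the risk set. In practice one would maximize it over `beta`, for example with a one-dimensional optimizer.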

Supplementary material

Sections A and B of the Supplementary Material provide the proofs of Theorems 1–2 and additional simulation results, respectively. The SARS-CoV-2 serological prevalence data and corresponding R code used for analysis are also included in the Supplementary Material. The coxphm package (Chung, 2025), which implements the methods developed in this article, is publicly available on CRAN.

References

 
Anand S, Montez-Rath M, Han J, Bozeman J, Kerschmann R, Beyer P, et al. (2020). Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: a cross-sectional study. The Lancet, 396(10259): 1335–1344. https://doi.org/10.1016/S0140-6736(20)32009-2
 
Baden LR, El Sahly HM, Essink B, Kotloff K, Frey S, Novak R, et al. (2021). Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. New England Journal of Medicine, 384(5): 403–416. https://doi.org/10.1056/NEJMoa2035389
 
Chen DG, Chung Y, Beyene KM (2024). Estimate time-to-infection (TTI) vaccination effect when TTI for unvaccinated group is unknown. Statistics in Biosciences, 16(3): 723–741. https://doi.org/10.1007/s12561-024-09417-w
 
Chung Y (2025). coxphm: Time-to-Event Data Analysis with Missing Survival Times. R package version 0.2.1.
 
Cox DR (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society. Series B, 34(2): 187–220. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x
 
Efron B (1988). Logistic regression, survival analysis, and the Kaplan-Meier curve. Journal of the American Statistical Association, 83(402): 414–425. https://doi.org/10.1080/01621459.1988.10478612
 
Fleming TR, Harrington DP (2013). Counting Processes and Survival Analysis. John Wiley & Sons.
 
Havers FP, Reed C, Lim T, Montgomery JM, Klena JD, Hall AJ, et al. (2020). Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020. JAMA Internal Medicine, 180(12): 1576–1586. https://doi.org/10.1001/jamainternmed.2020.4130
 
Hou CW, Williams S, Taylor K, Boyle V, Bobbett B, Kouvetakis J, et al. (2023). Serological survey to estimate SARS-CoV-2 infection and antibody seroprevalence at a large public university: a cross-sectional study. BMJ Open, 13(8): e072627. https://doi.org/10.1136/bmjopen-2023-072627
 
Lombardi A, Mangioni D, Consonni D, Cariani L, Bono P, Cantù AP, et al. (2021). Seroprevalence of anti-SARS-CoV-2 IgG among healthcare workers of a large university hospital in Milan, Lombardy, Italy: a cross-sectional study. BMJ Open, 11(2): e047216. https://doi.org/10.1136/bmjopen-2020-047216
 
Mercado-Reyes M, Malagón-Rojas J, Rodríguez-Barraquer I, Zapata-Bedoya S, Wiesner M, Cucunubá Z, et al. (2022). Seroprevalence of anti-SARS-CoV-2 antibodies in Colombia, 2020: a population-based study. The Lancet Regional Health–Americas, 9: 100195. https://doi.org/10.1016/j.lana.2022.100195
 
Moreira-Soto A, Pachamora Diaz JM, González-Auza L, Merino Merino XJ, Schwalb A, Drosten C, et al. (2021). High SARS-CoV-2 seroprevalence in rural Peru, 2021: a cross-sectional population-based study. mSphere, 6(6): e00685-21.
 
Nah EH, Cho S, Park H, Hwang I, Cho HI (2021). Nationwide seroprevalence of antibodies to SARS-CoV-2 in asymptomatic population in South Korea: a cross-sectional study. BMJ Open, 11(4): e049837. https://doi.org/10.1136/bmjopen-2021-049837
 
Polack FP, Thomas SJ, Kitchin N, Absalon J, Gurtman A, Lockhart S, et al. (2020). Safety and efficacy of the BNT162b2 mRNA COVID-19 vaccine. New England Journal of Medicine, 383(27): 2603–2615. https://doi.org/10.1056/NEJMoa2034577
 
R Core Team (2025). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
 
Rosenbaum PR (2010). Design of Observational Studies. Springer.
 
Therneau TM (2024). survival: Survival Analysis. R package version 3.8-3.
 
Venugopal U, Jilani N, Rabah S, Shariff MA, Jawed M, Batres AM, et al. (2021). SARS-CoV-2 seroprevalence among health care workers in a New York City hospital: a cross-sectional analysis during the COVID-19 pandemic. International Journal of Infectious Diseases, 102: 63–69. https://doi.org/10.1016/j.ijid.2020.10.036
 
Vusirikala A, Whitaker H, Jones S, Tessier E, Borrow R, Linley E, et al. (2021). Seroprevalence of SARS-CoV-2 antibodies in university students: cross-sectional study, December 2020, England. Journal of Infection, 83(1): 104–111. https://doi.org/10.1016/j.jinf.2021.04.028
 
Xiong Y, Braun WJ, Hu XJ (2021). Estimating duration distribution aided by auxiliary longitudinal measures in presence of missing time origin. Lifetime Data Analysis, 27: 388–412. https://doi.org/10.1007/s10985-021-09520-w

Related articles PDF XML
Related articles PDF XML

Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
COVID-19; missing data; observational study; right censoring; semiparametric regression; vaccine efficacy

Funding
The work was supported by funding from Arizona State University Knowledge Enterprise.


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
