Comparing Estimators of Discriminative Performance of Time-to-Event Models

Jin, Ying; Leroux, Andrew

doi:10.6339/25-JDS1163

Journal of Data Science

Volume 23, Issue 3 (2025): Special Issue: 2024 WNAR/IMS/Graybill Annual Meeting, pp. 470–490

Pub. online: 18 February 2025

Type: Statistical Data Science

Open Access

Received: 5 June 2024

Accepted: 1 January 2025

Published: 18 February 2025

Abstract

Predicting the timing and occurrence of events is a major focus of data science applications, especially in biomedical research. The performance of models for these outcomes, often referred to as time-to-event or survival outcomes, is frequently summarized using measures of discrimination, in particular time-dependent AUC and concordance. Many estimators of these quantities have been proposed, and they can be broadly categorized as either semi-parametric or non-parametric. In this paper, we review the mathematical construction of the two classes of estimators and compare their behavior. Importantly, we identify a previously unknown feature of the class of semi-parametric estimators that can result in vastly overoptimistic out-of-sample estimates of discriminative performance in common applied tasks. Although these semi-parametric estimators are popular in practice, the phenomenon we identify suggests that this class of estimators may be inappropriate for model assessment and selection based on out-of-sample evaluation criteria (e.g., cross-validation), because the semi-parametric estimators are biased in favor of overfit models. Non-parametric estimators do not exhibit this behavior, but they are highly variable for local discrimination. We propose to address this high variability through penalized regression spline smoothing. The behavior of various estimators of time-dependent AUC and concordance is illustrated via a simulation study using two different mechanisms that produce overoptimistic out-of-sample estimates from semi-parametric estimators. The estimators are further compared in a case study using data from the National Health and Nutrition Examination Survey (NHANES) 2011–2014.
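To make the notion of a non-parametric concordance estimator concrete, the following is a minimal sketch of a Harrell-type C-index, one widely used estimator in this class. It is an illustration only, not the authors' code (their scripts are written in R), and the function name `harrell_c` and the toy data are hypothetical. It assumes that pairs with tied observed times are skipped and that tied risk scores count as half-concordant.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell-type concordance for right-censored data (illustrative sketch).

    A pair is comparable when the subject with the shorter observed time
    had an observed event. The estimate is the fraction of comparable
    pairs in which that subject also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: skip tied times; skip pairs whose shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: observed times, event indicators (1 = event,
# 0 = censored), and model risk scores (higher = earlier expected event).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(harrell_c(times, events, risk))  # 4 of 5 comparable pairs agree -> 0.8
```

Because the estimate depends only on the ordering of observed times and risk scores, it makes no use of a fitted survival model beyond the scores themselves. That is the sense in which estimators of this kind are non-parametric.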

Supplementary material

The supplementary material includes additional information that is relevant but not included in the manuscript, including figures, mathematical derivations, and the data file used for the data application section. It also includes a zipped file containing code scripts to reproduce the results presented in the paper. A brief summary of its contents:

• outlier_exp.R: generates data and produces Figure 1 in the Introduction.
• Simulation: code scripts used to implement the simulation study.
  – Sim_overfit.R: the first scenario, model overfit, in Section 3.2.1.
  – Sim_contamination.R: the second scenario, covariate misalignment, in Section 3.2.2.
  – helpers.R: functions to calculate the discussed estimators.
  – trueAUC.R: calculates the true values of incident/dynamic AUC.
  – SimFigs.R: produces Figures 2 and 3.
• DataAppl: scripts to reproduce the data application section.
  – data_appl.R: scripts to reproduce the data application results.
  – helpers_appl.R: functions to calculate the discussed estimators.
  – DataApplFigs.R: produces Figure 4.
• SuppFigs.R: produces the figures included in the supplement.

2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

Open access article under the CC BY license.

Keywords

C-index; concordance; proportional hazard model; survival prediction; time-dependent AUC
