Predicting the timing and occurrence of events is a major focus of data science applications, especially in the context of biomedical research. Performance for models estimating these outcomes, often referred to as time-to-event or survival outcomes, is frequently summarized using measures of discrimination, in particular time-dependent AUC and concordance. Many estimators for these quantities have been proposed which can be broadly categorized as either semi-parametric estimators or non-parametric estimators. In this paper, we review the mathematical construction of the two classes of estimators and compare their behavior. Importantly, we identify a previously unknown feature of the class of semi-parametric estimators that can result in vastly overoptimistic out-of-sample estimation of discriminative performance in common applied tasks. Although these semi-parametric estimators are popular in practice, the phenomenon we identify here suggests that this class of estimators may be inappropriate for use in model assessment and selection based on out-of-sample evaluation criteria. This is due to the semi-parametric estimators’ bias in favor of models that are overfit when using out-of-sample prediction criteria (e.g. cross-validation). Non-parametric estimators, which do not exhibit this behavior, are highly variable for local discrimination. We propose to address the high variability problem through penalized regression splines smoothing. The behavior of various estimators of time-dependent AUC and concordance are illustrated via a simulation study using two different mechanisms that produce overoptimistic out-of-sample estimates using semi-parametric estimators. Estimators are further compared using a case study using data from the National Health and Nutrition Examination Survey (NHANES) 2011–2014.
Abstract: This paper provides an introduction to multivariate non-parametric hazard model for the occurrence of earthquakes since the hazard function defines the statistical distribution of inter-event times. The method is ap plied to the Turkish seismicity since a significant portion of Turkey is subject to frequent earthquakes and presents several advantages compared to other more traditional approaches. Destructive earthquakes from 1903 to 2009 between the longitudes of (39-42)N◦ and the latitudes of (26-45)E◦ are used. The paper demonstrates how seismicity and tectonics/physics parameters that can potentially influence the spatio-temporal variability of earthquakes and presents several advantages compared to more traditional approaches.