VAROC: Value Added Receiver Operating Characteristics Curve
Pub. online: 30 January 2026
Type: Statistical Data Science
Open Access
Received
6 January 2025
Accepted
15 January 2026
Published
30 January 2026
Abstract
The receiver operating characteristics (ROC) curve has been widely used to evaluate the discrimination performance of biomarkers, but it has been criticized for overlooking their underlying distributions. In this paper, we propose a continuous version of the ROC curve that can assess not only the discrimination performance of biomarkers but also their continuity performance. Our method summarizes the biomarker values as conditional tail expectations at varying thresholds and compares them with the true positive and false positive rates. The proposed method is particularly useful in the early phase of a biomarker study that enrolls heterogeneous disease populations. Analysis of data from an ovarian cancer biomarker study illustrates the practical utility of the proposed method over the standard ROC curve analysis. The proposed methods are implemented in the R package varoc.
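To make the ingredients named in the abstract concrete, the following is a minimal Python sketch that computes, for each threshold, the empirical true positive rate, the empirical false positive rate, and the conditional tail expectation (the mean of the biomarker values above the threshold) in each group. It is an illustration under stated assumptions, not the paper's estimator and not the API of the varoc package: the function name `tail_summaries`, the dictionary keys, the strict "greater than" convention, and the toy data are all hypothetical.

```python
from math import nan
from statistics import fmean


def tail_summaries(cases, controls, thresholds):
    """Empirical TPR, FPR, and conditional tail expectations per threshold.

    Hypothetical helper for illustration only. A value is counted as
    "positive" when it is strictly greater than the threshold (an assumption).
    """
    rows = []
    for c in thresholds:
        # Biomarker values exceeding the threshold in each group.
        case_tail = [x for x in cases if x > c]
        ctrl_tail = [x for x in controls if x > c]
        rows.append({
            "threshold": c,
            # Fraction of cases / controls above the threshold.
            "tpr": len(case_tail) / len(cases),
            "fpr": len(ctrl_tail) / len(controls),
            # Mean of the values above the threshold; undefined if none exceed it.
            "cte_cases": fmean(case_tail) if case_tail else nan,
            "cte_controls": fmean(ctrl_tail) if ctrl_tail else nan,
        })
    return rows


# Toy data (made up for this sketch, not from the ovarian cancer study).
cases = [1.0, 2.0, 3.0, 4.0]
controls = [0.0, 1.0, 2.0, 3.0]
for row in tail_summaries(cases, controls, [0.5, 1.5, 2.5]):
    print(row)
```

The standard ROC curve uses only the `tpr` and `fpr` columns, so two biomarkers with the same rates at every threshold are indistinguishable even if their values above the threshold differ greatly in magnitude. The tail expectations carry that additional information about the underlying distributions.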
Supplementary material
The R code and data used in Section 5 are available in the Supplementary Material of this paper.