Journal of Data Science


Statistical Learning in Medical Research with Decision Threshold and Accuracy Evaluation
Volume 19, Issue 4 (2021), pp. 634–657
Sumaiya Z. Sande, Loraine Seng, Jialiang Li, et al. (4 authors)

https://doi.org/10.6339/21-JDS1022
Pub. online: 23 September 2021 · Type: Data Science Reviews

Received: 10 May 2021
Accepted: 18 August 2021
Published: 23 September 2021

Abstract

Machine learning methods are increasingly applied in medical data analysis to reduce human effort and improve our understanding of disease propagation. When the data are complex and unstructured, shallow learning methods may not be suitable or feasible. Deep neural networks, such as the multilayer perceptron (MLP) and the convolutional neural network (CNN), have been incorporated into medical diagnosis and prognosis to improve health care practice. For a binary outcome, these learning methods directly output predicted probabilities for a patient's health condition. Investigators must still choose an appropriate decision threshold to split the predicted probabilities into positive and negative regions. We review methods for selecting the cut-off value, including relatively automatic methods based on optimizing ROC curve criteria as well as utility-based methods built on a net benefit curve. In particular, decision curve analysis (DCA) is now acknowledged in medical studies as a good complement to ROC analysis for the purpose of decision making. In this paper, we provide R code illustrating how to fit the statistical learning methods, select a decision threshold to yield a binary prediction, and evaluate the accuracy of the resulting classification. This article will help medical decision makers understand the different classification methods and use them in real-world scenarios.
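The two threshold-selection ideas reviewed above can be made concrete with a small sketch. The paper's own supplementary code is in R; the following is an illustrative, self-contained Python translation (function names are ours, not the paper's): the first function picks the cut-off maximizing Youden's J = sensitivity + specificity − 1 over the observed predicted probabilities, and the second computes the net benefit at a threshold probability pt, NB = TP/n − (FP/n) · pt/(1 − pt), the quantity plotted in a DCA net benefit curve.

```python
def youden_threshold(y_true, p_hat):
    """Return (cut-off, J) maximizing Youden's J over observed probabilities."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(p_hat)):
        # Confusion-matrix counts when classifying "positive" as p_hat >= t.
        tp = sum(1 for y, p in zip(y_true, p_hat) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, p_hat) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, p_hat) if p < t and y == 1)
        tn = sum(1 for y, p in zip(y_true, p_hat) if p < t and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

def net_benefit(y_true, p_hat, pt):
    """Net benefit at threshold probability pt, as used in decision curve analysis."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_hat) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_hat) if p >= pt and y == 0)
    # True positives are credited fully; false positives are penalized by the
    # odds pt/(1 - pt), reflecting the harm-to-benefit trade-off at pt.
    return tp / n - fp / n * pt / (1 - pt)
```

In a DCA, `net_benefit` would be evaluated over a grid of clinically plausible `pt` values and compared against the "treat all" and "treat none" strategies.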

Supplementary material

The supplementary material available online includes: a review of the different smoothers used in generalized additive models, installation details for the R interfaces to Keras and TensorFlow, and the data and R code needed to reproduce the results.



Copyright
© 2021 The Author(s)
This is a free-to-read article.

Keywords
deep learning, machine learning, net benefit, ROC, threshold

Funding
The work was partly supported by Academic Research Funds R-155-000-205-114 and R-155-000-195-114, and by Tier 2 MOE funds in Singapore, MOE2017-T2-2-082: R-155-000-197-112 (direct cost) and R-155-000-197-113 (IRC).

Metrics (since February 2021)
  • Article info views: 5063
  • PDF downloads: 1149
Journal of Data Science
  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X