Active Data Science for Improving Clinical Risk Prediction
Volume 21, Issue 2 (2023): Special Issue: Symposium Data Science and Statistics 2022, pp. 177–192
Pub. online: 23 November 2022
Type: Data Science In Action
Open Access
Received
11 July 2022
Accepted
8 November 2022
Published
23 November 2022
Abstract
Clinical risk prediction models are commonly developed in a post-hoc and passive fashion, capitalizing on convenient data from completed clinical trials or retrospective cohorts. The impact of these models often ends at their publication rather than reaching patients. The field of clinical risk prediction is rapidly improving in a progressively more transparent data science era. Based on the collective experience of the Prostate Biopsy Collaborative Group (PBCG) over the past decade, this paper proposes four data science-driven strategies for improving clinical risk prediction to the benefit of clinical practice and research. The first strategy is to actively design prospective data collection, monitoring, analysis, and validation of risk tools following the same standards as for clinical trials, in order to elevate the quality of training data. The second is to make risk tools and model formulas available online. User-friendly risk tools bring quantitative information to patients and their clinicians for improved knowledge-based decision-making. As past experience testifies, online tools expedite independent validation, providing helpful information as to whether the tools generalize to new populations. The third is to dynamically update and localize risk tools to adapt to changing demographic and clinical landscapes. The fourth is to accommodate systematic missing data patterns across cohorts in order to maximize statistical power in model training, and to accommodate missing information on the end-user side in order to maximize utility for the public.
Supplementary material
R code for producing figures is provided along with the TRIPOD checklist for prediction model development.