Journal of Data Science

Active Data Science for Improving Clinical Risk Prediction
Volume 21, Issue 2 (2023): Special Issue: Symposium Data Science and Statistics 2022, pp. 177–192
Donna P. Ankerst, Matthias Neumair

https://doi.org/10.6339/22-JDS1078
Pub. online: 23 November 2022. Type: Data Science In Action. Open Access.

Received: 11 July 2022
Accepted: 8 November 2022
Published: 23 November 2022

Abstract

Clinical risk prediction models are commonly developed in a post-hoc and passive fashion, capitalizing on convenient data from completed clinical trials or retrospective cohorts. The impact of these models often ends at their publication rather than reaching patients. The field of clinical risk prediction is rapidly improving in a progressively more transparent data science era. Based on the collective experience of the Prostate Biopsy Collaborative Group (PBCG) over the past decade, this paper proposes four data science-driven strategies for improving clinical risk prediction to the benefit of clinical practice and research. The first strategy is to actively design prospective data collection, monitoring, analysis and validation of risk tools following the same standards as for clinical trials, in order to elevate the quality of training data. The second is to make risk tools and model formulas available online. User-friendly risk tools bring quantitative information to patients and their clinicians for improved knowledge-based decision-making. As past experience testifies, online tools also expedite independent validation, which indicates whether the tools generalize to new populations. The third is to dynamically update and localize risk tools so that they adapt to changing demographic and clinical landscapes. The fourth is to accommodate systematic missing data patterns across cohorts in order to maximize statistical power in model training, and to accommodate missing information on the end-user side in order to maximize utility for the public.
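As a rough illustration of the end-user side of the fourth strategy, the sketch below selects, from several pre-fitted logistic regression formulas, the largest one whose predictors the user has actually supplied. This is only one possible way to handle missing inputs and is not presented as the PBCG implementation. The predictor names, the `MODELS` table, the `predict_risk` helper and all coefficients are hypothetical values invented for the example.

```python
import math

# Hypothetical coefficients for illustration only; NOT the PBCG model.
# Each key is the set of predictors a formula needs; each value is
# (intercept, {predictor: weight}) on the log-odds scale.
MODELS = {
    frozenset({"log2_psa", "age"}): (-4.0, {"log2_psa": 0.6, "age": 0.03}),
    frozenset({"log2_psa", "age", "family_history"}): (
        -4.2,
        {"log2_psa": 0.6, "age": 0.03, "family_history": 0.5},
    ),
}


def predict_risk(patient):
    """Return (risk, predictors_used) for the largest model fully covered by the input."""
    # Predictors the user actually provided (None marks a missing value).
    observed = {name for name, value in patient.items() if value is not None}
    # Keep only formulas whose required predictors are all observed.
    candidates = [required for required in MODELS if required <= observed]
    if not candidates:
        raise ValueError("no model fits the observed predictors")
    best = max(candidates, key=len)
    intercept, weights = MODELS[best]
    linear = intercept + sum(w * patient[name] for name, w in weights.items())
    # The inverse logit maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-linear)), best
```

Calling `predict_risk({"log2_psa": 2.0, "age": 60, "family_history": None})` falls back to the two-predictor formula, whereas supplying `family_history` selects the three-predictor one. A user who omits a predictor therefore still receives a risk estimate instead of an error.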

Supplementary material

R code for producing figures is provided along with the TRIPOD checklist for prediction model development.



Copyright
2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
logistic regression; missing data; prostate cancer; risk calculator

Funding
Funding for the PBCG was provided by the US National Institutes of Health R01 grant CA179115.
