Clinical Prediction Models in Epidemiological Studies: Lessons from the Application of QRISK3 to UK Biobank Data
Volume 20, Issue 1 (2022), pp. 1–13
Pub. online: 8 February 2022
Type: Philosophy of Data Science
Open Access
5 January 2022
27 January 2022
8 February 2022
Statistical models for clinical risk prediction are often derived using data from primary care databases; however, they are frequently used outside of clinical settings. Using prediction models in epidemiological studies without external validation may lead to inaccurate results. We use the example of applying the QRISK3 model to data from the United Kingdom (UK) Biobank study to illustrate the challenges and provide suggestions for future authors. The QRISK3 model is recommended by the National Institute for Health and Care Excellence (NICE) as a tool to aid cardiovascular risk prediction in English and Welsh primary care patients aged between 40 and 74. QRISK3 has not been externally validated for use in studies where data are collected for more general scientific purposes, including the UK Biobank study. This lack of external validation matters because the QRISK3 scores of UK Biobank participants have been used and reported in several publications. This paper outlines: (i) how various publications have used QRISK3 on UK Biobank data and (ii) the ways that the lack of external validation may affect the conclusions from these publications. We then propose potential solutions, such as model recalibration and the consideration of alternative models, for applying traditional statistical models such as QRISK3 in cohorts where they have not been externally validated.
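To make the idea of model recalibration concrete, the following is a minimal sketch of one common approach, logistic recalibration, in which an intercept and slope are re-estimated on the logit of the original predicted risks. It is an illustration under stated assumptions and not the procedure used in the paper: the function names are hypothetical, the data are simulated, and the outcome is treated as a fully observed binary event, which ignores the censoring that a survival model such as QRISK3 would need to handle.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability, clipped away from 0 and 1 for stability."""
    p = np.clip(np.asarray(p, dtype=float), 1e-8, 1 - 1e-8)
    return np.log(p / (1 - p))


def expit(z):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_recalibration(pred_risk, outcome, n_iter=50, tol=1e-10):
    """Fit outcome ~ intercept + slope * logit(pred_risk) by Newton-Raphson.

    Hypothetical helper: assumes a binary outcome with no censoring.
    Returns (intercept, slope); (0, 1) would mean no recalibration is needed.
    """
    x = logit(pred_risk)
    y = np.asarray(outcome, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([0.0, 1.0])  # start from the original predictions
    for _ in range(n_iter):
        mu = expit(X @ beta)
        weights = mu * (1 - mu)
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * weights[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta[0], beta[1]


def recalibrate(pred_risk, intercept, slope):
    """Apply the fitted intercept and slope to the original predicted risks."""
    return expit(intercept + slope * logit(pred_risk))


# Simulated example (not real data): the original model overestimates risk.
rng = np.random.default_rng(0)
original_risk = rng.uniform(0.02, 0.4, size=5000)
true_risk = expit(-0.5 + 1.0 * logit(original_risk))
events = rng.binomial(1, true_risk)

intercept, slope = fit_logistic_recalibration(original_risk, events)
updated_risk = recalibrate(original_risk, intercept, slope)
print("mean original risk:    ", original_risk.mean())
print("mean recalibrated risk:", updated_risk.mean())
print("observed event rate:   ", events.mean())
```

After fitting, the mean recalibrated risk matches the observed event rate in the data used for recalibration. In practice the fit would usually be assessed on held-out data, and a time-to-event outcome would call for a survival-based update instead of this binary simplification.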