Statistical models for clinical risk prediction are often derived using data from primary care databases; however, they are frequently used outside of clinical settings. The use of prediction models in epidemiological studies without external validation may lead to inaccurate results. We use the example of applying the QRISK3 model to data from the United Kingdom (UK) Biobank study to illustrate the challenges and provide suggestions for future authors. The QRISK3 model is recommended by the National Institute for Health and Care Excellence (NICE) as a tool to aid cardiovascular risk prediction in English and Welsh primary care patients aged between 40 and 74. QRISK3 has not been externally validated for use in studies where data is collected for more general scientific purposes, including the UK Biobank study. This lack of external validation is important as the QRISK3 scores of participants in UK Biobank have been used and reported in several publications. This paper outlines: (i) how various publications have used QRISK3 on UK Biobank data and (ii) the ways that the lack of external validation may affect the conclusions from these publications. We then propose potential solutions for addressing these challenges; for example, model recalibration and considering alternative models, for the application of traditional statistical models such as QRISK3, in cohorts without external validation.
Abstract: Graphical procedures can be useful for illustrating and evaluating the process of inverse regression. We first review some simple and well-known graphical approaches for univariate linear and nonlinear models. We then propose a new graphical tool applicable to situations where the response is bivariate and repeated measures data are available. The proposed method is illustrated with an example of the age determination of tern chicks using measurements on body weight and wing length.
Abstract: An analysis of air quality data is provided for the municipal area of Taranto characterized by high environmental risks, due to the massive presence of industrial sites with elevated environmental impact activities. The present study is focused on particulate matter as measured by PM10 concentrations. Preliminary analysis involved addressing several data problems, mainly: (i) an imputation techniques were considered to cope with the large number of missing data, due to both different working periods for groups of monitoring stations and occasional malfunction of PM10 sensors; (ii) due to the use of different validation techniques for each of the three monitoring networks, a calibration procedure was devised to allow for data comparability. Missing data imputation and calibration were addressed by three alternative procedures sharing a leave-one-out type mechanism and based on ad hoc exploratory tools and on the recursive Bayesian estimation and prediction of spatial linear mixed effects models. The three procedures are introduced by motivating issues and compared in terms of performance.