Spatial data exhibit correlation between observations collected at nearby locations. Machine and deep learning methods generally either ignore this correlation or account for it only indirectly through correlated features. To account for spatial correlation, we propose preprocessing the data with a spatial decorrelation transform motivated by properties of the multivariate Gaussian distribution and Vecchia approximations. The transformed data can then be passed to a machine or deep learning tool. After the model is fit on the transformed data, its output can be spatially re-correlated via the corresponding inverse transformation. We show that including this spatial adjustment results in higher predictive accuracy on simulated and real spatial datasets.
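As a rough illustration of the decorrelate, fit, re-correlate idea, the sketch below whitens a response vector with the Cholesky factor of an assumed covariance matrix and then inverts the transform. It is only one plausible reading of the transform, not the authors' implementation: the exponential kernel, its parameters, the toy locations and values, and all function names are assumptions, and a dense Cholesky factor stands in for the sparse factor that a Vecchia approximation would build from small sets of neighboring locations.

```python
import math

def exp_cov(coords, sigma2=1.0, range_=0.5, nugget=1e-6):
    """Exponential covariance for 1-D locations (assumed kernel and parameters)."""
    n = len(coords)
    return [[sigma2 * math.exp(-abs(coords[i] - coords[j]) / range_)
             + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def decorrelate(L, y):
    """Forward substitution: solve L z = y, so z = L^{-1} y."""
    z = []
    for i in range(len(y)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((y[i] - s) / L[i][i])
    return z

def recorrelate(L, z):
    """Inverse transform: y = L z."""
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]

# Toy locations and responses (illustrative values, not data from the paper).
coords = [0.0, 0.1, 0.25, 0.6, 1.0]
y = [1.2, 1.1, 0.9, 0.2, -0.4]

cov = exp_cov(coords)
L = cholesky(cov)
z = decorrelate(L, y)        # decorrelated response, ready for a learner
y_back = recorrelate(L, z)   # inverse transform recovers the original scale
```

In a full pipeline the features would be transformed with the same factor, a learner would be fit on the transformed pairs, and its predictions would be mapped back with the inverse transform. The sketch treats the covariance parameters as known, whereas in practice they would have to be estimated.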
Abstract: Simple parametric functional forms, if appropriate, are preferred over more complicated functional forms in clinical prediction models. In this paper, we illustrate our practical approach to obtaining the appropriate functional forms for continuous variables in developing a clinical prediction model for risk of Clostridium difficile infection. First, we used a nonparametric regression smoother to establish the reference curve. Then, we used a regression spline function, the restricted cubic spline (RCS), and simple parametric forms to approximate the reference curve. Based on the shape of the reference curve, the model fit information (AIC), and a formal statistical test (Vuong test), we selected the simple parametric forms to replace the more elaborate RCS functions. Finally, we refined the simple parametric forms in the multiple variable regression model using the Wald test and the likelihood-ratio test. In addition, we compared calibration and discrimination between the model with appropriate functional forms and the model with simple linear terms. The calibration χ² (8.4 versus 10) and calibration plot, the area under the ROC curve (0.88 versus 0.84, p < 0.05), and the integrated discrimination improvement (0.0072, p < 0.001) indicated that the model with appropriate forms was better calibrated and had higher discrimination ability.
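The two discrimination measures named above can be sketched in a few lines. The example below computes the area under the ROC curve through the Mann-Whitney statistic and the integrated discrimination improvement as the gain in mean predicted risk among events minus the gain among non-events. The outcome vector and predicted risks are made-up toy values, not results from the study, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def auc(y, p):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def idi(y, p_new, p_old):
    """Integrated discrimination improvement of p_new over p_old."""
    gain_events = (mean([p for yi, p in zip(y, p_new) if yi == 1])
                   - mean([p for yi, p in zip(y, p_old) if yi == 1]))
    gain_nonevents = (mean([p for yi, p in zip(y, p_new) if yi == 0])
                      - mean([p for yi, p in zip(y, p_old) if yi == 0]))
    return gain_events - gain_nonevents

# Toy outcomes and predicted risks (illustrative only).
y = [1, 1, 1, 0, 0, 0, 0, 0]
p_linear = [0.6, 0.4, 0.3, 0.5, 0.3, 0.2, 0.2, 0.1]   # model with linear terms
p_forms = [0.7, 0.5, 0.4, 0.4, 0.3, 0.2, 0.1, 0.1]    # model with refined forms

auc_linear = auc(y, p_linear)
auc_forms = auc(y, p_forms)
idi_value = idi(y, p_forms, p_linear)
```

A positive IDI means the second model assigns, on average, higher risk to patients who had the event and lower risk to those who did not. The significance tests reported in the abstract are outside the scope of this sketch.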