Approximately 15% of adults in the United States (U.S.) are afflicted with chronic kidney disease (CKD). For CKD patients, the progressive decline of kidney function is intricately related to hospitalizations due to cardiovascular disease and eventual “terminal” events, such as kidney failure and mortality. To unravel the mechanisms underlying the disease dynamics of these interdependent processes, including identifying influential risk factors, as well as tailoring decision-making to individual patient needs, we develop a novel Bayesian multivariate joint model for the intercorrelated outcomes of kidney function (as measured by longitudinal estimated glomerular filtration rate), recurrent cardiovascular events, and competing-risk terminal events of kidney failure and death. The proposed joint modeling approach not only facilitates the exploration of risk factors associated with each outcome, but also allows dynamic updates of cumulative incidence probabilities for each competing risk for future subjects based on their basic characteristics and a combined history of longitudinal measurements and recurrent events. We propose efficient and flexible estimation and prediction procedures within a Bayesian framework employing Markov Chain Monte Carlo methods. The predictive performance of our model is assessed through dynamic area under the receiver operating characteristic curves and the expected Brier score. We demonstrate the efficacy of the proposed methodology through extensive simulations. Proposed methodology is applied to data from the Chronic Renal Insufficiency Cohort study established by the National Institute of Diabetes and Digestive and Kidney Diseases to address the rising epidemic of CKD in the U.S.
Pub. online:3 Nov 2022Type:Statistical Data ScienceOpen Access
Journal:Journal of Data Science
Volume 20, Issue 4 (2022): Special Issue: Large-Scale Spatial Data Science, pp. 533–544
Abstract
Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov Chain Monte Carlo sampling. Alternate approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cummulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that only involves sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.