Abstract: The application of linear mixed models or generalized linear mixed models to large databases in which the level 2 units (hospitals) have a wide variety of characteristics is a problem frequently encountered in studies of medical quality. Accurate estimation of model parameters and standard errors requires accounting for the grouping of outcomes within hospitals. Including the hospitals as a random effect in the model is a common method of doing so. However, in a large, diverse population, the required assumptions are not satisfied, which can lead to inconsistent and biased parameter estimates. One solution is to use cluster analysis with clustering variables distinct from the model covariates to group the hospitals into smaller, more homogeneous groups. The analysis can then be carried out within these groups. We illustrate this analysis using an example of a study of hemoglobin A1c control among diabetic patients in a national database of United States Department of Veterans Affairs (VA) hospitals.
Abstract: A multilevel model (allowing for individual risk factors and geographic context) is developed for jointly modelling cross-sectional differences in diabetes prevalence and trends in prevalence, and then adapted to provide geographically disaggregated diabetes prevalence forecasts. This involves a weighted binomial regression applied to US data from the Behavioral Risk Factor Surveillance System (BRFSS) survey, specifically totals of diagnosed diabetes cases, and populations at risk. Both cases and populations are disaggregated according to survey year (2000 to 2010), individual risk factors (e.g., age, education), and contextual risk factors, namely US census division and the poverty level of the county of residence. The model includes a linear growth path in decadal time units, and forecasts are obtained by extending the growth path to future years. The trend component of the model controls for interacting influences (individual and contextual) on changing prevalence. Prevalence growth is found to be highest among younger adults, among males, and among those with a high school education. There are also regional shifts, with a widening of the US “diabetes belt”.
Abstract: Response variables that are scored as counts, for example, number of mastitis cases in dairy cattle, often arise in quantitative genetic analysis. When the number of zeros exceeds the number expected under, for example, the Poisson density, the zero-inflated Poisson (ZIP) model is more appropriate. In using the ZIP model in animal breeding studies, it is necessary to accommodate genetic and environmental covariances. To that end, this study proposes to model the mixture and Poisson parameters hierarchically, each as a function of two random effects, representing the genetic and environmental sources of variability, respectively. The genetic random effects are allowed to be correlated, leading to a correlation within and between clusters. The environmental effects are introduced by independent residual terms, accounting for overdispersion above that caused by extra zeros. In addition, an intercorrelation structure between random genetic effects affecting mixture and Poisson parameters is used to infer pleiotropy, an expression of the extent to which these parameters are influenced by common genes. The methods described here are illustrated with data on the number of mastitis cases from Norwegian Red cows. Bayesian analysis yields posterior distributions useful for studying environmental and genetic variability, as well as genetic correlation.
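For reference, the standard ZIP probability mass function that the abstract above builds on can be written as follows. The symbols are assumptions introduced here for illustration and are not taken from the paper: \pi is the mixture (extra-zero) probability and \lambda is the Poisson mean. The second display is only a sketch of one way the two parameters might each be linked to a genetic effect (a) and an environmental residual (e); the logit and log links are assumed, not confirmed by the abstract.

```latex
% Standard zero-inflated Poisson pmf (notation assumed for illustration)
\[
P(Y = y \mid \pi, \lambda) =
\begin{cases}
\pi + (1 - \pi)\, e^{-\lambda}, & y = 0, \\[4pt]
(1 - \pi)\, \dfrac{e^{-\lambda} \lambda^{y}}{y!}, & y = 1, 2, \ldots
\end{cases}
\]

% Hypothetical hierarchical specification for animal i:
% each parameter depends on a genetic effect a and an environmental residual e.
\[
\operatorname{logit}(\pi_i) = \mu_{\pi} + a_{\pi,i} + e_{\pi,i},
\qquad
\log(\lambda_i) = \mu_{\lambda} + a_{\lambda,i} + e_{\lambda,i}
\]
```

Under this sketch, a correlation between a_{\pi,i} and a_{\lambda,i} would play the role of the genetic intercorrelation that the abstract uses to infer pleiotropy.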
Abstract: In the United States, diabetes is common and costly. Programs to prevent new cases of diabetes are often carried out at the level of the county, a unit of local government. Thus, efficient targeting of such programs requires county-level estimates of diabetes incidence (the fraction of the nondiabetic population who received their diagnosis of diabetes during the past 12 months). Previously, only estimates of prevalence (the overall fraction of the population who have the disease) have been available at the county level. Counties with high prevalence might or might not be the same as counties with high incidence, due to spatial variation in mortality and relocation of persons with incident diabetes to another county. Existing methods cannot be used to estimate county-level diabetes incidence, because the fraction of the population who receive a diabetes diagnosis in any year is too small. Here, we extend previously developed methods of Bayesian small-area estimation of prevalence, using diffuse priors, to estimate diabetes incidence for all U.S. counties based on data from a survey designed to yield state-level estimates. We found high incidence in the southeastern United States, the Appalachian region, and in scattered counties throughout the western U.S. Our methods might be applicable in other circumstances in which all cases of a rare condition also must be cases of a more common condition (in this analysis, “newly diagnosed cases of diabetes” and “cases of diabetes”). If appropriate data are available, our methods can be used to estimate the proportion of the population with the rare condition at greater geographic specificity than the data source was designed to provide.
Abstract: We propose two simple, easy-to-implement methods for obtaining simultaneous credible bands in hierarchical models from standard Markov chain Monte Carlo output. The methods generalize Scheffé’s (1953) approach to this problem, but in a Bayesian context. A small simulation study is followed by an application of the methods to a seasonal model for Ache honey gathering.
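To make the idea of a simultaneous band from MCMC output concrete, the following is a minimal sketch of one generic construction: scale the pointwise posterior standard deviation by a quantile of the maximum standardized deviation across the grid. It is not claimed to be either of the two methods proposed in the abstract above. The function name `simultaneous_band`, the simulated draws, and the grid are all hypothetical.

```python
import numpy as np


def simultaneous_band(draws, alpha=0.05):
    """Generic simultaneous credible band from posterior draws.

    draws: array of shape (n_draws, n_points); each row is one MCMC draw
    of a curve evaluated on a fixed grid. Illustrative sketch only, not
    the method of the paper.
    """
    draws = np.asarray(draws, dtype=float)
    center = draws.mean(axis=0)          # pointwise posterior mean
    spread = draws.std(axis=0, ddof=1)   # pointwise posterior sd
    # For each draw, the largest standardized deviation over the grid.
    max_dev = np.max(np.abs(draws - center) / spread, axis=1)
    # Critical value: (1 - alpha) quantile of the max deviation.
    crit = np.quantile(max_dev, 1.0 - alpha)
    return center - crit * spread, center + crit * spread, crit


# Simulated stand-in for MCMC output (assumption: a noisy seasonal curve).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 25)
fake_draws = np.sin(2 * np.pi * grid) + rng.normal(scale=0.2, size=(4000, grid.size))

lower, upper, crit = simultaneous_band(fake_draws, alpha=0.05)

# Fraction of draws lying entirely inside the band (close to 1 - alpha).
inside = np.all((fake_draws >= lower) & (fake_draws <= upper), axis=1)
print(f"critical value: {crit:.2f}, joint coverage of draws: {inside.mean():.3f}")
```

Because the critical value is taken over the maximum deviation, the band is wider than a set of pointwise intervals, which is what allows it to be read as a statement about the whole curve at once.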