Pub. online: 3 Nov 2022 | Type: Statistical Data Science | Open Access
Journal: Journal of Data Science
Volume 20, Issue 4 (2022): Special Issue: Large-Scale Spatial Data Science, pp. 533–544
Abstract
Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for the analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov chain Monte Carlo sampling. Alternative approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the multivariate normal cdf characterizing the marginal distribution of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that involves only sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.
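To make the prediction idea concrete, the Python sketch below shows how, under a probit link, the predictive probability at a new location reduces to a ratio of two low-dimensional multivariate normal cdfs when the covariance is restricted to the m nearest observed neighbors, in the spirit of an NNGP approximation. This is a minimal illustration rather than the authors' algorithm: it assumes a zero-mean latent process with an exponential covariance and known parameters, and the names exp_cov and predict_prob are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import multivariate_normal


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * ||s - s'||)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def predict_prob(s0, coords, y, m=10, sigma2=1.0, phi=3.0):
    """Approximate P(y(s0) = 1 | y) using only the m nearest observed neighbors.

    Under a probit link, the latent variables (spatial GP value plus unit-variance
    noise) are jointly Gaussian with covariance C + I, so joint probabilities of
    binary outcomes are multivariate normal orthant probabilities (cdfs). Keeping
    only m neighbors caps every cdf evaluation at dimension m + 1.
    """
    _, idx = cKDTree(coords).query(s0, k=m)        # m nearest observed sites
    pts = np.vstack([s0[None, :], coords[idx]])    # prediction site listed first
    Sigma = exp_cov(pts, pts, sigma2, phi) + np.eye(m + 1)

    # Flip signs so each observed outcome (and the event y(s0) = 1) corresponds
    # to a positive latent variable; a zero-mean orthant probability is then a
    # cdf evaluated at the origin.
    signs = np.concatenate(([1.0], 2.0 * y[idx] - 1.0))
    Sigma_flip = Sigma * np.outer(signs, signs)
    origin = np.zeros(m + 1)

    joint = multivariate_normal(mean=origin, cov=Sigma_flip).cdf(origin)
    marg = multivariate_normal(mean=origin[1:], cov=Sigma_flip[1:, 1:]).cdf(origin[1:])
    return joint / marg                            # P(y(s0) = 1 | neighbor outcomes)


# Toy usage with synthetic locations and outcomes
rng = np.random.default_rng(0)
coords = rng.uniform(size=(500, 2))
y = (rng.uniform(size=500) < 0.5).astype(float)
print(predict_prob(np.array([0.5, 0.5]), coords, y, m=8))
```

Because each prediction touches only an (m + 1) x (m + 1) covariance and a pair of small cdf evaluations, the loop over prediction locations parallelizes trivially, which is the source of the scalability described above.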
Abstract: Different models are used in practice for describing binary longitudinal data. In this paper we consider joint probability models, marginal models, and combined models, and ask which describes such data best. The combined model consists of a joint probability model and a marginal model at two different levels. We present some striking empirical observations on the closeness of the estimates and their standard errors for some parameters of the models considered in describing data from Fitzmaurice and Laird (1993), thereby giving new insight into these data. We present the data in a complete factorial arrangement with 4 factors at 2 levels. We introduce the concept of “data representing a model completely” and explain “data balance” as well as “chance balance”. We also consider the problem of selecting the best model for describing these data, using the Search Linear Model concepts known in Fractional Factorial Design research (Srivastava, 1975).
Abstract: Fisher’s exact test (FET) is a conditional method that is frequently used to analyze data in a 2 × 2 table for small samples. This test is conservative, and attempts have been made to make it less so. For example, Crans and Shuster (2008) proposed adding more points to the rejection region to make the test more powerful. We provide another way to modify the test, making it less conservative by using two independent binomial distributions as the reference distribution for the test statistic. We compare our new test with several existing methods and show that it has advantages in terms of control of type I and type II errors. We reanalyze results from an oncology trial using our proposed method and our software, which is freely available to the reader.
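For intuition, here is a small Python sketch of an unconditional exact test for a 2 × 2 table that uses two independent binomial distributions as the reference distribution, in the spirit described above. It is a generic Barnard-type construction with a pooled Z statistic and a grid search over the nuisance proportion, not necessarily the authors' specific modification of FET; the function name unconditional_p_value and the grid size are arbitrary choices.

```python
import numpy as np
from scipy.stats import binom


def unconditional_p_value(x1, n1, x2, n2, grid_size=200):
    """Barnard-type two-sided p-value for H0: p1 = p2 in a 2 x 2 table."""
    a = np.arange(n1 + 1)[:, None]   # possible successes in sample 1
    b = np.arange(n2 + 1)[None, :]   # possible successes in sample 2

    # Pooled-variance Z statistic over the whole sample space
    p_hat = (a + b) / (n1 + n2)
    denom = np.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(denom > 0, (a / n1 - b / n2) / denom, 0.0)

    # Tables at least as extreme as the observed one
    extreme = np.abs(z) >= np.abs(z[x1, x2]) - 1e-12

    # Reference distribution: two independent binomials with a common p;
    # maximize the tail probability over a grid of the nuisance parameter
    p_max = 0.0
    for p in np.linspace(1e-4, 1 - 1e-4, grid_size):
        table_prob = binom.pmf(a, n1, p) * binom.pmf(b, n2, p)
        p_max = max(p_max, table_prob[extreme].sum())
    return p_max


# Example: 7/12 responders in one arm versus 2/12 in the other
print(unconditional_p_value(7, 12, 2, 12))
```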
Abstract: Interval estimation for the proportion parameter in one-sample misclassified binary data has attracted much interest in the literature. Recently, an approximate Bayesian approach has been proposed. This approach is simpler to implement and performs better than existing frequentist approaches. However, because a normal approximation to the marginal posterior density was used in this Bayesian approach, some efficiency may be lost. We develop a closed-form, fully Bayesian algorithm that draws a posterior sample of the proportion parameter from the exact marginal posterior distribution. We conduct simulations to show that our fully Bayesian algorithm is easier to implement and has better coverage than the approximate Bayesian approach.