SENSITIVITY ANALYSIS OF BAYES FACTOR FOR CATEGORICAL DATA WITH EMPHASIS ON SPARSE MULTINOMIAL DATA

This article considers hypothesis testing using the Bayes factor for categorical data models represented in two-dimensional contingency tables. The study covers the multinomial model for general I × J tables. Other data characteristics, such as low or polarized cell counts and table size, are also considered. The objective is to investigate the sensitivity of the Bayes factor to these features in order to understand the performance of non-informative priors. Consistency is studied for different types of data using Dirichlet priors with eight different parameter choices for the multinomial model, followed by a bootstrap simulation. The study emphasizes a reasonable choice of parameter values that represents the underlying physical phenomenon, even if only partially and vaguely.


Introduction
Categorical data are generally presented in a contingency table of size I × J, with I rows and J columns. In most survey sampling, the cell counts of an I × J table follow a multinomial distribution, and the chi-square test of association is widely applied to study the association between the variables. In practice, however, it is impossible to agree on the appropriate model unless there is a well-established theoretical framework or mechanism underlying the problem. It is therefore imperative to account for uncertainty in the model-building process; to start with, a set of competing models has to be considered, with each model viewed as a different state of a random variable. Marden (2000) emphasized the need to expand the scope of hypothesis testing beyond p values and noted that Bayesian inference with automatic computing methods is a promising approach. Bayesian parameter estimation is conceptually straightforward in most problems: once a prior distribution and a reasonable likelihood function are specified, inference about the parameters of interest is obtained from their marginal posterior distributions, with the nuisance parameters integrated out. The role of probability in measuring uncertainty at each stage of Bayesian estimation is well established. To arrive at the posterior distribution over the possible models, the prior probability of each model is updated using the information contained in the data, and inference is then drawn from the entire posterior distribution of the most plausible model. Johnson (2005) proposed methods based on Bayes factors for modeling the sampling distributions of standard test statistics and indicated possible extensions to test statistics associated with categorical data. However, the influence of priors and the subjectivity they introduce can significantly affect the results. Vanpaemel (2010) provided an exhaustive list of studies that address prior sensitivity and emphasized the importance of the prior in sensitivity analysis. In particular, Hashemi (1997) and Nandram and Choi (2007), separated by a decade but similar in approach, motivate a deeper investigation of the Bayes factor for categorical data under the multinomial design, which is ubiquitous in social science survey sampling designs and problems.
The above studies gave varying recommendations on the use of Bayes factors for contingency tables, depending on the study design and the size of the table (Upton, 1982, 1992). More recently, Hitchcock (2009) emphasized that hypothesis-testing problems for I × J tables require more elaborate study to understand the elegance of the Yates correction. Mirkin (2001) listed different ways of looking at chi-square tests and observed that the chi-square statistic is an appealing measure for studying association. Study design and analyses based on 2 × 2 tables also need more careful attention (Upton, 1982; Campbell, 2007).
In this paper, the algebraic form of the Bayes factor for categorical data represented in two-dimensional contingency tables is presented for the underlying sampling design. The multinomial model for a general I × J table is considered, with two competing models of 'no association' and 'association' between the categorical variables. The important aspects of this study, for multinomial models of categorical data, are (i) to understand the sensitivity of priors and (ii) to incorporate the sparseness of data in higher-order I × J tables, including zero and/or low positive counts, polarized cell counts, and table size. Prior sensitivity has been actively discussed in Bayes factor applications under different sampling designs, and sparseness of data has been well incorporated for 2 × 2 tables (Subbiah and Srinivasan, 2008).
The next section gives a brief overview of the Bayes factor in general and the statistical details for categorical data. Section 3 presents a comparative analysis of data sets from the literature that exemplify the features listed above, followed by a bootstrap simulation study based on these data, and Section 4 gives concluding remarks.

Bayes Factor
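Suppose there are several competing hypotheses or models about a system, and that the set of models is mutually exclusive and exhaustive. A prior probability p(Hi) (i = 1, 2, …, N) can be assigned to each hypothesis such that Σ p(Hi) = 1, with N denoting the number of hypotheses. After observing data y, the posterior probability of hypothesis Hi is

$$ p(H_i \mid y) = \frac{p(y \mid H_i)\, p(H_i)}{\sum_{k=1}^{N} p(y \mid H_k)\, p(H_k)} , $$

where p(y|Hi) is the marginal density of y under Hi, that is, the likelihood averaged over the prior distribution πi of the parameters θi of Hi:

$$ p(y \mid H_i) = \int p(y \mid \theta_i, H_i)\, \pi_i(\theta_i)\, d\theta_i . $$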
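The posterior probabilities above can be computed directly once the marginal densities are available. The following is a minimal Python sketch, not the R code referred to in the Appendix; the function name `posterior_model_probs` and the numerical inputs are hypothetical, and working with log marginal densities plus a log-sum-exp normalization is a design choice for numerical stability rather than something stated in the text.

```python
import math

def posterior_model_probs(log_marginals, priors):
    """Return p(H_i | y) for each hypothesis, given log p(y | H_i) and p(H_i).

    The priors must be strictly positive and sum to 1 (mutually exclusive and
    exhaustive hypotheses)."""
    if abs(sum(priors) - 1.0) > 1e-9 or min(priors) <= 0:
        raise ValueError("priors must be positive and sum to 1")
    # Numerator of Bayes' theorem on the log scale: log p(y|H_i) + log p(H_i)
    log_joint = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    # Log-sum-exp gives the log of the denominator sum_k p(y|H_k) p(H_k) stably
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]

# Hypothetical marginal densities 0.2 and 0.05 with equal prior probabilities
print(posterior_model_probs([math.log(0.2), math.log(0.05)], [0.5, 0.5]))
```

With these hypothetical inputs the posterior probabilities are approximately 0.8 and 0.2.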
The posterior odds of hypothesis Hi relative to Hj are then

$$ \frac{p(H_i \mid y)}{p(H_j \mid y)} = \frac{p(y \mid H_i)}{p(y \mid H_j)} \times \frac{p(H_i)}{p(H_j)} , $$

so the posterior odds are the product of the prior odds and the ratio of the marginal probabilities under the two hypotheses. The Bayes factor Bij is defined as

$$ B_{ij} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{p(y \mid H_i)}{p(y \mid H_j)} . $$

Bij does not depend on the prior probabilities p(Hi) and p(Hj), and Bij > 1 can be interpreted as Hi being more plausible than Hj in the light of y. However, this interpretation is free of prior specification only when both Hi and Hj are simple hypotheses; when either is composite, the marginal densities, and hence Bij, depend on the prior distributions of the parameters within the models. Berger and Delampady (1987), Kass (1993), Bernardo and Smith (1994), Kass and Raftery (1995), Goodman (1999), Delampady and Berger (1990), Lavine and Schervish (1999), and Ghosh et al. (2006) provide further insight into the concept of the Bayes factor. The explicit form of the Bayes factor for the multinomial model is derived in the following subsection.
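For two simple hypotheses the Bayes factor involves no parameter priors, so the ratio of posterior odds to prior odds is the same whatever prior probabilities are used. The sketch below checks this numerically with a hypothetical binomial example (7 successes in 10 trials, Hi: p = 0.5 versus Hj: p = 0.7); the data and the function names are illustrative assumptions only.

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials with success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def bayes_factor_simple(y, n, p_i, p_j):
    """B_ij = p(y | H_i) / p(y | H_j) for simple hypotheses H_i: p = p_i and H_j: p = p_j."""
    return binom_pmf(y, n, p_i) / binom_pmf(y, n, p_j)

def posterior_odds(y, n, p_i, p_j, prior_i):
    """p(H_i | y) / p(H_j | y) when H_i and H_j are the only two hypotheses."""
    prior_j = 1.0 - prior_i
    return (binom_pmf(y, n, p_i) * prior_i) / (binom_pmf(y, n, p_j) * prior_j)

y, n = 7, 10                      # hypothetical data: 7 successes in 10 trials
b = bayes_factor_simple(y, n, 0.5, 0.7)
for prior_i in (0.2, 0.5, 0.8):   # three different prior probabilities for H_i
    ratio = posterior_odds(y, n, 0.5, 0.7, prior_i) / (prior_i / (1 - prior_i))
    print(f"p(H_i) = {prior_i}: posterior odds / prior odds = {ratio:.4f}, B_ij = {b:.4f}")
```

For these data Bij is about 0.44, so the data mildly favor p = 0.7, and the printed ratio equals Bij for every choice of p(Hi).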

Bayes Factors for I × J multinomial model
For a general I × J table, let xij (i = 1, 2, …, I; j = 1, 2, …, J) denote the observed cell counts, with row totals ri = Σj xij, column totals cj = Σi xij, and grand total n = Σi Σj xij. The multinomial likelihood is then

$$ p(x \mid \theta) = \frac{n!}{\prod_{i}\prod_{j} x_{ij}!} \prod_{i=1}^{I}\prod_{j=1}^{J} \theta_{ij}^{x_{ij}}, \qquad \theta_{ij} > 0, \quad \sum_i\sum_j \theta_{ij} = 1 . $$

The conjugate prior (Gelman et al., 2002) for the vector of cell probabilities θ = (θij) is the multivariate generalization of the Beta distribution known as the Dirichlet(αij) distribution, with αij > 0 and density

$$ \pi(\theta) = \frac{\Gamma\left(\sum_i\sum_j \alpha_{ij}\right)}{\prod_i\prod_j \Gamma(\alpha_{ij})} \prod_{i=1}^{I}\prod_{j=1}^{J} \theta_{ij}^{\alpha_{ij}-1} . $$

A pervasive inferential problem for categorical data summarized in contingency tables is testing the statistical independence of the two classification variables. Model H0 corresponds to the null hypothesis that there is no association between the two variables, whereas model H1 asserts that there is an association between the variables constituting the I × J contingency table.
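The two densities above can be evaluated on the log scale with `math.lgamma`. This is a minimal Python sketch with hypothetical function names; it assumes the I × J table and the cell probabilities are flattened row by row into lists of equal length, and that every θij is strictly positive wherever the Dirichlet density is evaluated.

```python
import math

def log_multinomial_pmf(counts, theta):
    """log p(x | theta) for cell counts x_ij and cell probabilities theta_ij,
    both given as flat lists in the same (row-major) cell order."""
    n = sum(counts)
    # log of the multinomial coefficient n! / prod x_ij!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # sum of x_ij * log(theta_ij); cells with x_ij = 0 contribute nothing
    return log_coef + sum(x * math.log(t) for x, t in zip(counts, theta) if x > 0)

def log_dirichlet_pdf(theta, alpha):
    """log density of Dirichlet(alpha) at a point theta with all theta_ij > 0 summing to 1."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))

# A 2 x 2 table flattened row by row: x11, x12, x21, x22 (hypothetical counts)
counts = [12, 5, 3, 10]
theta = [0.4, 0.15, 0.1, 0.35]
print(log_multinomial_pmf(counts, theta))
print(log_dirichlet_pdf(theta, [1.0] * 4))  # uniform prior: density Gamma(4) = 6 everywhere
```

With αij = 1 for all cells the Dirichlet density is constant, which is why the uniform prior is the natural reference point for the other parameter choices.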
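Under H0, the prior distribution π0(θ) for θ = (θij) is based on the law of independence,

$$ \theta_{ij} = \theta_{i\cdot}\,\theta_{\cdot j}, \qquad (\theta_{1\cdot},\dots,\theta_{I\cdot}) \sim \text{Dirichlet}(\gamma_1,\dots,\gamma_I), \qquad (\theta_{\cdot 1},\dots,\theta_{\cdot J}) \sim \text{Dirichlet}(\delta_1,\dots,\delta_J), $$

with the row and column probability vectors independent a priori. Under H1, the prior π1(θ) is θ = (θij) ~ Dirichlet(αij). The marginal likelihood under model Ht (t = 0, 1) is

$$ p(x \mid H_t) = \int p(x \mid \theta)\, \pi_t(\theta)\, d\theta , $$

which gives, writing α = Σi Σj αij, γ = Σi γi and δ = Σj δj,

$$ p(x \mid H_1) = \frac{n!}{\prod_i\prod_j x_{ij}!}\,\frac{\Gamma(\alpha)}{\Gamma(n+\alpha)} \prod_{i}\prod_{j} \frac{\Gamma(x_{ij}+\alpha_{ij})}{\Gamma(\alpha_{ij})} , $$

$$ p(x \mid H_0) = \frac{n!}{\prod_i\prod_j x_{ij}!}\,\frac{\Gamma(\gamma)}{\Gamma(n+\gamma)} \prod_{i} \frac{\Gamma(r_i+\gamma_i)}{\Gamma(\gamma_i)}\;\frac{\Gamma(\delta)}{\Gamma(n+\delta)} \prod_{j} \frac{\Gamma(c_j+\delta_j)}{\Gamma(\delta_j)} . $$

Hence, the Bayes factor for comparing the two models is

$$ B_{01} = \frac{p(x \mid H_0)}{p(x \mid H_1)} = \frac{\Gamma(\gamma)\,\Gamma(\delta)\,\Gamma(n+\alpha)}{\Gamma(n+\gamma)\,\Gamma(n+\delta)\,\Gamma(\alpha)} \cdot \frac{\prod_i \frac{\Gamma(r_i+\gamma_i)}{\Gamma(\gamma_i)}\ \prod_j \frac{\Gamma(c_j+\delta_j)}{\Gamma(\delta_j)}}{\prod_i\prod_j \frac{\Gamma(x_{ij}+\alpha_{ij})}{\Gamma(\alpha_{ij})}} . $$

Because the gamma functions grow very quickly, B01 is computed on the log scale (through log-gamma functions), which avoids the overflow that may occur if it is computed directly. Kass and Raftery (1995) provide guidelines for interpreting B01 and log B01 (natural log) as the degree of evidence for H0:

- 1 < B01 < 3 (0 < log B01 < 1.1): evidence for H0 is not worth more than a bare mention
- 3 < B01 < 20 (1.1 < log B01 < 3.0): positive evidence for H0
- 20 < B01 < 150 (3.0 < log B01 < 5.0): strong evidence for H0
- B01 > 150 (log B01 > 5.0): very strong evidence for H0

Values of B01 below 1 (negative log B01) favor H1 and are graded on the same scale using 1/B01.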
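The closed-form B01 above can be evaluated on the log scale with log-gamma functions. The sketch below is a Python illustration, not the R code referred to in the Appendix; the names `log_bf01` and `kass_raftery`, and the restriction to a single common value for each set of Dirichlet parameters (matching the equal-value choices used in the comparative analysis), are assumptions of this sketch.

```python
import math

def log_bf01(table, alpha=1.0, gamma=1.0, delta=1.0):
    """Natural-log Bayes factor B01 (H0: independence vs H1: association) for an
    I x J table of counts under multinomial sampling with Dirichlet priors.
    Sketch assumption: every alpha_ij equals `alpha`, every gamma_i equals
    `gamma` and every delta_j equals `delta` (a common value per model)."""
    lg = math.lgamma
    I, J = len(table), len(table[0])
    rows = [sum(row) for row in table]                             # r_i
    cols = [sum(table[i][j] for i in range(I)) for j in range(J)]  # c_j
    cells = [x for row in table for x in row]                      # x_ij
    n = sum(cells)

    def log_dm(counts, a):
        # log[Gamma(A) / Gamma(n + A) * prod Gamma(count + a) / Gamma(a)], A = len(counts) * a;
        # the multinomial coefficient is omitted because it cancels in B01
        A = len(counts) * a
        return lg(A) - lg(n + A) + sum(lg(x + a) - lg(a) for x in counts)

    log_m0 = log_dm(rows, gamma) + log_dm(cols, delta)  # H0: row and column margins
    log_m1 = log_dm(cells, alpha)                        # H1: all I*J cells
    return log_m0 - log_m1

def kass_raftery(log_b01):
    """Grade evidence with the Kass and Raftery (1995) cut-offs 3, 20 and 150 on the
    Bayes factor; a negative log B01 is graded on 1/B01 as evidence for H1."""
    favoured = "H0" if log_b01 >= 0 else "H1"
    size = abs(log_b01)  # compare on the log scale to avoid overflow in exp()
    if size < math.log(3):
        label = "not worth more than a bare mention"
    elif size < math.log(20):
        label = "positive"
    elif size < math.log(150):
        label = "strong"
    else:
        label = "very strong"
    return favoured, label

table = [[25, 25], [25, 25]]  # hypothetical table with perfectly independent counts
lb = log_bf01(table)
print(lb, kass_raftery(lb))
```

Comparing the absolute log Bayes factor with log 3, log 20 and log 150 applies the same cut-offs as the list above without ever exponentiating a large number.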

Comparative Data Analysis
The statistical literature on the theory and applications of inferential procedures for two-dimensional contingency tables points to a few characteristics that are essential for comparative studies. These include the order of the table (k = IJ, the number of cells), the sample size (n), zero counts, notably polarized cell counts, and positive low counts (cell counts not exceeding 6).
All I × J data sets are taken from Agresti (2002), a classic text that illustrates most of the issues in categorical data analysis for the social sciences. In line with the features above, the tables range in size from 3 × 3 to 6 × 4; 5% to 43% of the cells have low counts; the grand total ranges from 96 to 3600; and cell counts vary from 14 to 711. To quantify the polarization of cell counts, the metric v = range/k is used to indicate how the values are distributed. A summary of the selected characteristics of the data sets is presented in Table 1. Three sets of Dirichlet parameters are needed as prior distributions for the multinomial model: the αij for model H1, and the γi and δj for model H0. In all cases these parameters are set to a common value, with eight choices between 0.5 and 2.5 that lie on either side of the uniform distribution (αij = 1 in π(θ)). Table 2 presents the log Bayes factor in favor of H0.
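The table features can be tabulated with a few lines of code. In this hypothetical Python helper, range is taken as the largest minus the smallest cell count, and "low counts" are the positive counts not exceeding 6, counted separately from zeros; the text does not state whether zero cells are included in the reported percentages, so that split is an assumption.

```python
def table_characteristics(table, low_cutoff=6):
    """Summarize the features used to compare I x J tables (hypothetical helper)."""
    cells = [x for row in table for x in row]
    k = len(cells)                                      # order of the table, k = IJ
    n = sum(cells)                                      # sample size (grand total)
    zeros = sum(1 for x in cells if x == 0)             # number of zero cells
    low = sum(1 for x in cells if 0 < x <= low_cutoff)  # positive low counts
    v = (max(cells) - min(cells)) / k                   # polarization metric v = range/k
    return {"k": k, "n": n, "zeros": zeros, "pct_low": 100.0 * low / k, "v": v}

# Hypothetical 3 x 3 table with one zero cell and two positive low counts
print(table_characteristics([[0, 3, 41], [10, 20, 7], [2, 55, 14]]))
```

A larger v means the cell counts are spread over a wider range relative to the number of cells, i.e., the table is more polarized.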
The Bayes factors show a consistent pattern within the two groups of Dirichlet parameter choices (values below 1 and values above 1), except in one case, data set VI. This data set is of reasonable size, with no zeros, a moderately large total (1660), and reasonably non-polarized counts. This suggests that the choice of the Dirichlet parameters (the only prior parameters) is more critical for a well-behaved data set. Sparse or polarized counts, on the other hand, make it necessary to choose all the Dirichlet parameters either below 1 or above 1, so that conclusions drawn from the Bayes factor under the usual guidelines are not overly sensitive.

A simulation study was also carried out to supplement these findings. Since the study focuses on the identified characteristics of I × J tables, bootstrap simulation is used to compute the Bayes factor and to test its sensitivity to the prior parameters. For each data set, 1000 bootstrap samples are generated, so that the noted features are expected to be consistently present in the samples. Estimates comprising the mean, its standard error (SE), and 95% confidence limits given by the 2.5th and 97.5th percentiles are presented in Table 3. Figures 1 and 2 show box plots of the estimated Bayes factors from the bootstrap samples for data sets I-IV and V-VIII, respectively. The box plots show that the log Bayes factors lie entirely on one side of zero, with the exception of data sets II and VI. Although these could be considered less polarized data sets, low and/or zero counts appear to cause these changes in the estimates. Interestingly, such changes are evident when the prior values are near 1, which suggests a basis for choosing prior parameters according to whether the study deals with rare or non-rare phenomena. Further, among the data sets with moderately polarized and notably low counts, such as data set VI, such directional changes in the estimated values are visible in Table 3 but may not be fully captured by the box plots. All other tables, with sufficiently large cell counts, show no such feature, irrespective of polarized cell counts.
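The bootstrap step can be sketched as a multinomial resampling of n observations with cell probabilities xij/n, recomputing log B01 on each resampled table. This Python sketch is an assumption about the resampling scheme, which the text does not spell out; the SE is taken as the standard deviation of the replicates (the usual bootstrap SE), and the 95% limits as the 2.5th and 97.5th percentiles from `statistics.quantiles`. All Dirichlet parameters share one hypothetical common value `a`, and the function names are illustrative. A cell with a zero count stays zero in every resample, so zero cells are preserved, in line with the aim of keeping the noted features in the samples.

```python
import math
import random
import statistics

def log_bf01(table, a=1.0):
    """log B01 with every Dirichlet parameter (alpha_ij, gamma_i, delta_j) set to a."""
    lg = math.lgamma
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    cells = [x for r in table for x in r]
    n = sum(cells)

    def log_dm(counts):
        # Dirichlet-multinomial marginal without the multinomial coefficient
        A = len(counts) * a
        return lg(A) - lg(n + A) + sum(lg(x + a) - lg(a) for x in counts)

    return log_dm(rows) + log_dm(cols) - log_dm(cells)

def bootstrap_log_bf01(table, a=1.0, n_boot=1000, seed=1):
    """Resample n observations from the observed cell proportions (multinomial
    bootstrap); return mean, SE, 2.5 and 97.5 percentiles, and the replicates."""
    rng = random.Random(seed)
    I, J = len(table), len(table[0])
    cells = [(i, j) for i in range(I) for j in range(J)]
    weights = [table[i][j] for i, j in cells]
    n = sum(weights)
    draws = []
    for _ in range(n_boot):
        sample = [[0] * J for _ in range(I)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            sample[i][j] += 1
        draws.append(log_bf01(sample, a))
    q = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    se = statistics.stdev(draws)           # bootstrap SE: SD of the replicates
    return statistics.fmean(draws), se, q[0], q[-1], draws

table = [[12, 5, 3], [4, 9, 2], [1, 6, 15]]  # hypothetical sparse 3 x 3 table
for a in (0.5, 1.0, 1.5, 2.5):               # illustrative values, not the study's eight
    mean, se, lo, hi, _ = bootstrap_log_bf01(table, a=a, n_boot=200)
    print(f"a={a}: mean={mean:.2f} SE={se:.2f} 95% limits=({lo:.2f}, {hi:.2f})")
```

The loop at the end shows how the estimates could be tabulated across several common prior values; the four values used are for illustration only.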
However, if a point estimate such as the mean is compared across prior choices for a given data set, no directional changes are observed. This pattern holds for all the data sets, despite their distinct characteristics. The corresponding standard errors are consistent with this, so reporting a point estimate alone may not be sufficient to study the sensitivity of the estimates over various parameter choices. The comparison as a whole demonstrates the sensitivity of Bayes factors to priors as a function of data characteristics, especially when the data are sparse. Hence, researchers could anticipate sparse data a priori, based on the problem under consideration and on how respondents are likely to respond to the categories of the variables. This partially elicited information helps to set the range of the prior parameters, both for estimation and for the sensitivity analysis.
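The point about means can be made concrete by reporting, alongside the mean of the bootstrap replicates of log B01, the share of replicates on each side of zero. The helper `sign_summary` and the replicate values below are hypothetical; they only illustrate how a positive mean can coexist with a substantial share of replicates favoring H1.

```python
import statistics

def sign_summary(draws):
    """Mean of bootstrap log B01 replicates together with the share of replicates
    on each side of zero (log B01 > 0 favors H0, log B01 < 0 favors H1)."""
    m = len(draws)
    return {
        "mean": statistics.fmean(draws),
        "share_favoring_H0": sum(1 for d in draws if d > 0) / m,
        "share_favoring_H1": sum(1 for d in draws if d < 0) / m,
    }

# Hypothetical replicates: the mean is positive, yet a third of them favor H1
print(sign_summary([1.2, 0.8, 1.5, -0.4, -0.9, 2.0]))
```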

Conclusion
The contrasting recommendations regarding Bayes factors for I × J tables and for 2 × 2 tables leave ample scope to investigate the data model, sparseness, and pattern of cell counts, since these can affect the estimates and thereby the conclusions. The sensitivity of the Bayes factor to priors is accepted in principle; yet it is important to understand the nature of the data representing the physical phenomenon. In such cases, even a controlled vague prior reflects partial information, which tends to reduce the extreme sensitivity exhibited in Nandram and Choi (2007). The three examples considered in Nandram and Choi (2007) ignore the distinct features of I × J tables listed in the present study, and their results could well be influenced by the specific choice of Dirichlet parameters.
Subsequently, Nandram et al. (2013) provided a test of independence for data from a two-stage cluster sampling design; their simulation study showed that the Bayes factor is not sensitive to small changes in the uniform prior. This finding has been attributed to the nature of the cell counts, which are expected to be greater than zero, although one or two cells can have zero counts.
The main objective of this paper is to carry out the necessary sensitivity analysis for the Bayes factor, but with reasonable parameter choices related to the problem of interest, as noted by Vanpaemel (2010). The present work considers sparse multinomial tables, not limited to zero counts but also including low and/or polarized counts, since polarized counts tend to produce aberrations in the estimation of multinomial probabilities (May and Johnson, 2000).
To substantiate the findings, bootstrap samples were used to study the sensitivity, mainly to preserve the expected features of the data and the underlying phenomenon. A computational tool is provided for a variety of Dirichlet parameters beyond the usual uniform distribution. The procedure offers a more flexible way of handling priors under the null hypothesis, and it can be replicated using the R code presented in the Appendix. Kruschke (2010) pointed out the need to use mildly informed or consensually informed prior distributions rather than objective priors. The present study attempts to partition the parameter space appropriately when choosing prior distributions for the Bayes factor test of independence in an I × J contingency table. Such recommendations support a more realistic sensitivity analysis of the Bayes factor computed for various choices of prior values. Hence, researchers are encouraged to adopt the Bayes factor with plausible priors that may be partially informative, based on the theoretical background of the problem. Future work could develop a more concrete measure of the distance between cell counts, study its effect on Bayes factors, and refine the recommendations for the analysis of contingency tables.

Figure 1: Box plots for estimated Bayes factors from the bootstrap samples based on data sets I-IV. Each plot corresponds to eight different choices of prior parameters.
Figure 2: Box plots for estimated Bayes factors from the bootstrap samples based on data sets V-VIII.
numerical summaries which indicate the changes in terms of positive and negative estimated values.

Table 1: Details of the eight data sets considered for the comparative study of Bayes factors associated with two-dimensional contingency tables.

Table 2: Bayes factors (natural log scale) for the evidence of the null hypothesis (no association) in I × J data sets under the multinomial sampling model.

Table 3: Bootstrap estimates of the log Bayes factor in favor of the null hypothesis of independence using eight distinct prior choices for the Dirichlet parameters.