Bayes Factors for Comparing Two Restricted Means: An Example Involving Hypertensive Individuals

We are interested in comparing the mean diastolic blood pressure of individuals submitted to a stress stimulus with that of individuals under normal conditions, given the prior knowledge that the subjects in both groups are hypertensive. Essentially, this may be formulated as a two-sample problem for Gaussian populations with bounded means. For such purposes, we consider two different approaches to obtaining Bayes factors: the first is based on predictive distributions and the second on Markov chain Monte Carlo methods. The sensitivity of the Bayes factors with respect to the choice of prior distributions is also investigated.

1. Introduction

Giampaoli and Singer (2004) consider the problem of comparing the mean diastolic blood pressure of individuals submitted to a stress stimulus (such as the death of a close relative or discharge from employment) with that of individuals under normal (no stress) conditions, based on blood pressure measurements obtained from two independent samples of sizes n1 and n2, respectively. The data are reproduced in Table 1; the entries correspond to averages of series of 30 measurements taken over periods of one hour to eliminate short-term fluctuations.
Assuming normality and homoscedasticity, this is the typical situation for which Student's t test is an appropriate statistical tool. In fact, a simple two-sample t-test with df = 20 yields a p-value of 0.0595. For the sake of comparison, the p-value for the corresponding Wilcoxon two-sample test is 0.2929. Hence the data do not provide sufficient evidence for rejection of the null hypothesis that the mean diastolic blood pressures of subjects under normal and stress conditions are equal. Here, however, we are interested in comparing the corresponding means with the additional information that the subjects in both groups are hypertensive, i.e., have mean diastolic blood pressures known to be at least 90 mmHg. (The control-group entries of Table 1 are 81.7, 89.9, 81.5, 89.4, 94.6, 93.5, 85.5, 95.5, 88.9, 95.4 and 97.0 mmHg.) This occurs, for example, in studies where patients are being screened for the detection of possible causes of hypertension.
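The classical unrestricted tests quoted above can be sketched with SciPy. The control-group values are taken from Table 1; the stress-group column is not reproduced in this text, so the values below are illustrative only, chosen to match the reported sample mean of 94.1636 mmHg, and the resulting p-values will therefore not reproduce the paper's 0.0595 and 0.2929. The two-sample Wilcoxon test is computed here in its Mann-Whitney form.

```python
import numpy as np
from scipy import stats

# Control-group values reproduced from Table 1 (in mmHg).
control = np.array([81.7, 89.9, 81.5, 89.4, 94.6, 93.5, 85.5, 95.5,
                    88.9, 95.4, 97.0])

# Illustrative stress-group values (hypothetical; only the sample mean
# 94.1636 mmHg matches the value reported in the paper).
stress = np.array([99.6, 91.2, 95.0, 90.3, 97.1, 92.8, 96.4, 90.9,
                   98.2, 93.3, 91.0])

t_stat, t_p = stats.ttest_ind(stress, control, equal_var=True)  # df = 11 + 11 - 2 = 20
u_stat, u_p = stats.mannwhitneyu(stress, control, alternative="two-sided")

print(f"t = {t_stat:.3f}, p = {t_p:.4f}; Mann-Whitney p = {u_p:.4f}")
```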
More specifically, let y1 = (y11, ..., y1n1) and y2 = (y21, ..., y2n2) denote independent random samples from N(μ1, σ²) and N(μ2, σ²) distributions, respectively. Under this framework, we consider tests of the hypothesis H0: μ1 = μ2 against H1: μ1 ≠ μ2, with or without the assumption that σ² is known, under the restriction that μ1 ≥ c and μ2 ≥ c, where c is some constant. We may take c = 0 without loss of generality.
Classical statistical inference under constrained parameter spaces has been addressed by many authors, among whom we mention Bartholomew (1959a, 1959b, 1961), Kudô (1963), Nüesch (1966), Perlman (1969) and Barlow et al. (1972). In particular, Giampaoli and Singer (2004) proposed tests for the comparison of Gaussian distributions with restricted means (with or without known variance), which may be considered as alternatives to the usual Z or t tests. Using simulation studies, the authors showed that the proposed tests are more powerful than the usual Z or t tests in most circumstances. However, they are only approximate level α tests and depend on estimators of the unknown parameters. We attack the problem from a Bayesian perspective, since the restricted parameter space can apparently be handled with less effort than under a classical point of view.
We formulate the comparison as a model selection problem: M is an integer-valued parameter indexing the models, such that M = 1 corresponds to H0 and M = 2 corresponds to H1.
In Section 2 we present details of the computation of the Bayes factors under two different approaches, assuming that the common variance is known. In Section 3 we illustrate the procedures with the data displayed in Table 1. Section 4 is devoted to an extension to the unknown variance case. We conclude with a discussion of the results in Section 5.

2. Bayes Factors
The ratio of the observed marginal densities for the two models, namely

B21 = p(y|M = 2)/p(y|M = 1) = [p(M = 2|y)/p(M = 1|y)] × (π1/π2),   (2.1)

where y = (y1, y2) denotes the vector of observations and πj = p(M = j), j = 1, 2, with π1 + π2 = 1, represent the prior model probabilities, is known as the Bayes factor and may be used for deciding in favor of or against each model. Kass and Raftery (1995) discuss the use of these factors for hypothesis testing and suggest the rule displayed in Table 2 for deciding between one model and the other. Although Bayes factors constitute important tools for statistical analysis, their use is not free of controversy. First, they may be severely affected by variation of the prior distributions. Also, they lack interpretation when improper prior distributions are used, as pointed out by O'Hagan (1995). Variations commonly employed for model comparison include pseudo-Bayes factors, as advocated by Geisser and Eddy (1979), Gelfand et al. (1992) or Gelfand and Dey (1998); posterior Bayes factors, as suggested by Aitkin (1991); and criteria that minimize some posterior loss function, as proposed by Gelfand and Ghosh (1998). We may also mention intrinsic Bayes factors, suggested by Berger and Pericchi (1996), and fractional Bayes factors, discussed in O'Hagan (1995) or De Santis and Spezzaferri (1997). These techniques, however, use training samples and are unstable when the sample size is small, as in the example described above.
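The definitions above can be made concrete in a few lines of code. The sketch below computes posterior model probabilities and the Bayes factor from the marginal densities; since Table 2 is not reproduced here, the decision labels follow the cut-points published in Kass and Raftery (1995), with paraphrased wording. The marginal density values are illustrative placeholders.

```python
def posterior_model_probs(marginals, priors):
    """p(M = j | y) = pi_j p(y | M = j) / sum_i pi_i p(y | M = i)."""
    weights = [p * m for p, m in zip(priors, marginals)]
    total = sum(weights)
    return [w / total for w in weights]

def bayes_factor_21(marginals):
    """B21 = p(y | M = 2) / p(y | M = 1)."""
    return marginals[1] / marginals[0]

def kass_raftery_label(b21):
    # Cut-points from Kass and Raftery (1995); labels paraphrased.
    if b21 < 1:
        return "evidence favours M1"
    if b21 < 3:
        return "not worth more than a bare mention"
    if b21 < 20:
        return "positive evidence for M2"
    if b21 < 150:
        return "strong evidence for M2"
    return "very strong evidence for M2"

# With equal prior probabilities, the Bayes factor equals the posterior odds.
marg = [0.004, 0.009]                         # illustrative marginal densities
post = posterior_model_probs(marg, [0.5, 0.5])
b21 = bayes_factor_21(marg)
print(post, b21, kass_raftery_label(b21))
```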
To compute Bayes factors, we need to obtain the posterior probability associated with each model, namely p(M = j|y), j = 1, 2. We consider two different techniques for this purpose. The first, due to Irony and Pereira (1995), provides an analytic expression for the Bayes factor obtained by a direct computation of the posterior distribution; it accommodates any proper prior distribution and does not require Gibbs sampling, since usual methods of numerical integration may be employed. The second, proposed by Carlin and Chib (1995), is based on the Gibbs sampler and on Markov chain Monte Carlo methods for the computations. It requires either the use of conjugate prior distributions or the existence of complete conditional conjugacy, i.e., tractable full conditional posterior densities. Unfortunately, a direct comparison of the results obtained by the two techniques is quite complicated, since the first relies on a prior distribution specified over the entire parameter space while the second requires different prior distributions under the null and the alternative hypotheses.

2.1. Computation of Bayes factors via predictive distributions
First note that for j = 1, 2, a direct application of Bayes' theorem yields

p(M = j|y) = πj p(y|M = j) / [π1 p(y|M = 1) + π2 p(y|M = 2)].   (2.2)

Then let μ = (μ1, μ2), Ω1 = {(μ1, μ2): μ1 = μ2 ≥ 0} and Ω2 = {(μ1, μ2): μ1 ≥ 0, μ2 ≥ 0}, and note that the null (j = 1) and the alternative (j = 2) hypotheses may be written as

μ ∈ Ωj, j = 1, 2.   (2.3)

The prior density of μ under hypothesis j (j = 1, 2) is defined as

p(μ|M = j) = gj(μ) / ∫Ωj gj dΩj,   (2.4)

where g is a convenient prior density function and gj denotes the function g restricted to the set Ωj, i.e., the function with domain Ωj such that gj(μ) = g(μ) if μ ∈ Ωj. Note that the integral ∫Ωj gj dΩj represents the volume under the prior density function g(μ) over Ωj. When this integral is null, as in the case where the dimension of Ωj is less than the dimension of the parameter space, we consider the corresponding line integral. Naturally, the prior density (2.4) is well defined if 0 < ∫Ωj g dΩj < ∞. Under the hypotheses (2.3), j = 1, 2, the required predictive densities are

p(y|M = j) = ∫Ωj gj(μ) l(y|μ) dμ / ∫Ωj gj dΩj,   (2.5)

where l(y|μ) is the likelihood function; then p(M = j|y) may be obtained from (2.2).
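A numerical sketch of (2.5) under the restriction μ1, μ2 ≥ 0 follows, taking a known precision τ and an unnormalized bivariate Gaussian kernel as g. The data and hyperparameter values are illustrative, not those of Table 1; the upper integration limit is a finite stand-in for infinity, and a common constant is factored out of the likelihood for numerical stability (it cancels in the Bayes factor).

```python
import numpy as np
from scipy import integrate, stats

tau = 0.03                       # known precision, tau = 1 / sigma^2
sigma = tau ** -0.5

# Illustrative centered samples (measurements minus the 90 mmHg bound).
y1 = np.array([4.2, 0.5, 2.9, 6.1, 3.0])
y2 = np.array([0.3, 1.1, -0.2, 2.4])

# Prior kernel g: independent N(a_j, 1/gamma_j) factors (illustrative values).
a1, gam1, a2, gam2 = 4.0, 0.35, 0.5, 0.35
def g(m1, m2):
    return np.exp(-0.5 * gam1 * (m1 - a1) ** 2 - 0.5 * gam2 * (m2 - a2) ** 2)

def log_lik(m1, m2):
    return (stats.norm.logpdf(y1, m1, sigma).sum()
            + stats.norm.logpdf(y2, m2, sigma).sum())

C = log_lik(y1.mean(), y2.mean())    # stability constant; cancels in B21
UB = 60.0                            # finite stand-in for the upper limit

# p(y | M = 1): mu1 = mu2 = mu >= 0 -- numerator and denominator of (2.5)
num1, _ = integrate.quad(lambda m: g(m, m) * np.exp(log_lik(m, m) - C), 0, UB)
den1, _ = integrate.quad(lambda m: g(m, m), 0, UB)

# p(y | M = 2): (mu1, mu2) ranges over the positive quadrant
num2, _ = integrate.dblquad(lambda b, a: g(a, b) * np.exp(log_lik(a, b) - C),
                            0, UB, 0, UB)
den2, _ = integrate.dblquad(lambda b, a: g(a, b), 0, UB, 0, UB)

B21 = (num2 / den2) / (num1 / den1)
print(f"B21 = {B21:.3f}")
```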

2.2. Computation of Bayes factors via MCMC methods
We now describe an alternative way of computing p(M = j|y), j = 1, 2, via an MCMC algorithm proposed by Carlin and Chib (1995). Essentially, they consider M as a component of the random vector υ = (μ, (μ1, μ2), M), so that it can be sampled via Gibbs methods. After convergence, the estimates of p(M = j|y), j = 1, 2, may be obtained as the ratios between the number of iterations for which M = j and the total number of iterations. Direct sampling from the marginal distributions themselves or from the joint distribution p(υ, y) is complicated, but sampling from the full conditional posterior distributions p(μ|(μ1, μ2), M, y), p((μ1, μ2)|μ, M, y) and p(M|μ, (μ1, μ2), y) is straightforward using the Gibbs sampler. To satisfy the MCMC conditions for convergence, we need to specify a full probability model. For such purposes we assume that μ and (μ1, μ2) are independent given the model indicator M, so that the prior distributions factor as

p(μ, (μ1, μ2)|M = j) = p(μ|M = j) p((μ1, μ2)|M = j), j = 1, 2.   (2.6)

We also assume that y is independent of (μ1, μ2) given M = 1, and of μ given M = 2. We complete the Bayesian model specification by choosing proper "pseudoprior" distributions p(μ|M = 2) and p((μ1, μ2)|M = 1). Because of the conditional independence assumptions, these "pseudoprior" distributions do not interfere with the expression of the marginal densities, and so their form is irrelevant.
The joint distribution of y and (μ, (μ1, μ2), M) follows from (2.6) and the conditional independence assumptions. To implement the Gibbs sampler we need the full conditional distributions of μ and (μ1, μ2), given in (2.7) and (2.8), as well as that of M. Thus, when M = 1, the required distribution for μ in (2.7) is generated from model 1; otherwise, when M = 2, it is generated from the corresponding "pseudoprior" distribution. Similarly, when M = 1, the required distribution for (μ1, μ2) in (2.8) is generated from the corresponding "pseudoprior" distribution; otherwise, it is generated from model 2.
For M, the full conditional probabilities p(M = j|μ, (μ1, μ2), y), given in (2.9) and (2.10), are proportional to the product of πj, the likelihood under model j, and the corresponding prior and pseudoprior densities. Thus, samples from the posterior distribution p(μ, (μ1, μ2), M|y) may be obtained by sampling from the full conditional distributions (2.7)-(2.8) and (2.9)-(2.10) via the Gibbs algorithm. In each iteration l = 1, ..., N, a sample of size one, (μ(l), (μ1, μ2)(l), M(l)), is obtained. The ratio between the number of iterations for which M(l) = j and the total number of iterations N provides a simple estimate of p(M = j|y) that may be used to compute the Bayes factor from (2.1). Although the form of the "pseudoprior" distributions is theoretically arbitrary, it is convenient to have them close to the conditional densities p(μ|(μ1, μ2), M, y) and p((μ1, μ2)|μ, M, y), so that plausible values are generated even when the assumed model is false. Carlin and Chib (1995) recommend separate runs for each model: first, setting π1 = 1 and π2 = 0, an approximation of the resulting posterior distribution may be taken as the pseudoprior distribution for the parameter μ under model M = 2; subsequently, setting π1 = 0 and π2 = 1, an approximation of the resulting posterior distribution may be taken as the pseudoprior distribution for the parameters (μ1, μ2) under model M = 1.
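The scheme above can be sketched as follows for the known-variance case with truncated-normal priors. The control data are the Table 1 values centered at 90 mmHg; the stress values are illustrative (matching only the reported sample mean 4.1636), and the pseudoprior parameters are rough guesses rather than the output of the preliminary single-model runs the authors recommend.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
tau, sigma = 0.03, 0.03 ** -0.5          # known precision and standard deviation

# Centered data (y - 90): y2 is the control group from Table 1; the stress
# values y1 are illustrative, matching only the reported mean 4.1636.
y2 = np.array([-8.3, -0.1, -8.5, -0.6, 4.6, 3.5, -4.5, 5.5, -1.1, 5.4, 7.0])
y1 = np.array([9.6, 1.2, 5.0, 0.3, 7.1, 2.8, 6.4, 0.9, 8.2, 3.3, 1.0])
y_all = np.concatenate([y1, y2])
pi = np.array([0.5, 0.5])                # prior model probabilities

def tn_rvs(mean, prec):                  # draw from N(mean, 1/prec) truncated to [0, inf)
    sd = prec ** -0.5
    return stats.truncnorm.rvs(-mean / sd, np.inf, loc=mean, scale=sd,
                               random_state=rng)

def tn_logpdf(x, mean, prec):
    sd = prec ** -0.5
    return stats.truncnorm.logpdf(x, -mean / sd, np.inf, loc=mean, scale=sd)

def log_lik(m1, m2):
    return (stats.norm.logpdf(y1, m1, sigma).sum()
            + stats.norm.logpdf(y2, m2, sigma).sum())

def post_draw(ysum, n, a, gam):          # truncated-normal full conditional, as in (2.7)/(2.8)
    prec = gam + n * tau
    return tn_rvs((gam * a + tau * ysum) / prec, prec)

# (mean, precision) pairs; the pseudopriors would normally come from the
# separate single-model runs described above -- here they are rough guesses.
prior = {"mu": (2.0, 0.35), "mu1": (4.0, 0.35), "mu2": (0.5, 0.35)}
pseudo = {"mu": (2.0, 0.5), "mu1": (4.0, 0.5), "mu2": (0.5, 0.5)}

M, counts = 1, np.zeros(2)
mu = mu1 = mu2 = 1.0
for it in range(3000):
    # mu | rest: model-1 posterior when M = 1, else its pseudoprior
    mu = (post_draw(y_all.sum(), y_all.size, *prior["mu"]) if M == 1
          else tn_rvs(*pseudo["mu"]))
    # (mu1, mu2) | rest: model-2 posterior when M = 2, else pseudopriors
    if M == 2:
        mu1 = post_draw(y1.sum(), y1.size, *prior["mu1"])
        mu2 = post_draw(y2.sum(), y2.size, *prior["mu2"])
    else:
        mu1, mu2 = tn_rvs(*pseudo["mu1"]), tn_rvs(*pseudo["mu2"])
    # M | rest: compare the two full-model log weights, as in (2.9)-(2.10)
    l1 = (np.log(pi[0]) + log_lik(mu, mu) + tn_logpdf(mu, *prior["mu"])
          + tn_logpdf(mu1, *pseudo["mu1"]) + tn_logpdf(mu2, *pseudo["mu2"]))
    l2 = (np.log(pi[1]) + log_lik(mu1, mu2) + tn_logpdf(mu, *pseudo["mu"])
          + tn_logpdf(mu1, *prior["mu1"]) + tn_logpdf(mu2, *prior["mu2"]))
    p1 = 1.0 / (1.0 + np.exp(np.clip(l2 - l1, -700, 700)))
    M = 1 if rng.random() < p1 else 2
    if it >= 500:                        # discard burn-in
        counts[M - 1] += 1

p_hat = counts / counts.sum()
B21 = (p_hat[1] / max(p_hat[0], 1e-12)) * (pi[0] / pi[1])
print(f"posterior model probabilities {p_hat}, B21 ~ {B21:.2f}")
```

With poorly chosen pseudopriors the chain can linger in one model; this is precisely why the authors recommend calibrating them with preliminary single-model runs.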

3. Analysis of the Diastolic Blood Pressure Data
In this section we consider the analysis of the data presented in Table 1, assuming that the variance is known and using either truncated Gaussian distributions or exponential distributions as prior distributions for μ1 and μ2. These distributions do not belong to conjugate families. We start with the technique proposed by Irony and Pereira (1995).

3.1. Analysis via predictive distributions
The likelihood function is

l(y|μ1, μ2) = K1 exp{-(τ/2)[Σi (y1i - μ1)² + Σi (y2i - μ2)²]},   (3.1)

where τ = σ⁻² and K1 = (τ/2π)^((n1+n2)/2). We assume that μ1 and μ2 have Gaussian prior distributions truncated at zero, with parameters (a1, γ1) and (a2, γ2) respectively, i.e., with density functions

g(μj) = (γj/2π)^(1/2) exp{-(γj/2)(μj - aj)²} / [1 - Φ(-aj γj^(1/2))], μj ≥ 0,   (3.2)

with Φ denoting the standard Gaussian distribution function, j = 1, 2, so that, writing αj = -aj γj^(1/2) and λ(α) = φ(α)/[1 - Φ(α)], with φ the standard Gaussian density,

E(μj) = aj + γj^(-1/2) λ(αj)   (3.3)

and

Var(μj) = γj⁻¹ [1 + αj λ(αj) - λ(αj)²].   (3.4)

Using the independence assumption on μ1 and μ2, it follows that the corresponding joint prior density (3.5) is the product g(μ1, μ2) = g(μ1)g(μ2). From (3.1) and (3.5) we may write the joint density of the data and the parameters. Let g1(μ) = g(μ, μ) and Ω1 = {μ: μ ≥ 0}. Then the denominator of (2.5) is given by (3.7) and the corresponding numerator by (3.8); from (3.7) and (3.8) it follows that the predictive density (2.5) under the null hypothesis is (3.9). Similarly, we may compute (3.10) and (3.11) for j = 1, 2 and conclude that the predictive density (2.5) under the alternative hypothesis is (3.12). Then, choosing, for example, equal prior probabilities for both models, i.e., π1 = π2 = 1/2, the Bayes factor (2.1) is the ratio between (3.12) and (3.9).

Using these results, we computed Bayes factors for the data in Table 1 under different choices of the hyperparameters. We assumed σ² to be known, with value equal to its estimate based on the residual sum of squares; thus τ = σ⁻² = 0.03. To obtain the hyperparameters for the distributions of μ1 and μ2, we fixed E(μj) and τj = [Var(μj)]⁻¹, j = 1, 2, and solved the system of equations defined by (3.3) and (3.4) for a1, γ1, a2 and γ2. The corresponding Bayes factors, obtained by fixing E(μ1) = ȳ1 - 90 = 4.1636 and E(μ2) = ȳ2 - 90 = 0.2636 for different values of τ1 = τ2, are presented in Table 3. The Bayes factor decreases as the precision τ1 = τ2 decreases, and it is more stable in the case of less informative prior distributions. Also, fixing γ1 = γ2 = 0.349, we varied a1 and a2 in the interval [0, 5] and obtained Bayes factors in the interval [1.65, 2.97]. These results suggest that the Bayes factor is sensitive to the choice of the hyperparameters. However, within the limits considered here, there is some weak evidence in favour of the alternative hypothesis in all cases. This evidence decreases as the prior distributions become less informative.
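The step of solving (3.3) and (3.4) for the hyperparameters can be sketched numerically: `scipy.stats.truncnorm` supplies the truncated moments and a root finder recovers (a, γ) from the targeted mean and precision. Note that a normal distribution truncated at zero has coefficient of variation below one, so not every (mean, precision) pair is attainable; the target below, E(μ1) = 4.1636 with τ1 = 0.35, is.

```python
import numpy as np
from scipy import optimize, stats

def trunc_moments(a, gamma):
    """Mean and variance of N(a, 1/gamma) truncated to [0, inf)."""
    sd = gamma ** -0.5
    d = stats.truncnorm(-a / sd, np.inf, loc=a, scale=sd)
    return d.mean(), d.var()

def solve_hyperparameters(target_mean, target_prec, x0=None):
    """Solve the system (3.3)-(3.4) for (a, gamma)."""
    def equations(x):
        a, gamma = x
        m, v = trunc_moments(a, abs(gamma))
        return [m - target_mean, 1.0 / v - target_prec]
    a, gamma = optimize.fsolve(equations, x0 or [target_mean, target_prec])
    return a, abs(gamma)

a1, gamma1 = solve_hyperparameters(4.1636, 0.35)   # E(mu1) = ybar1 - 90
m1, v1 = trunc_moments(a1, gamma1)
print(f"a1 = {a1:.4f}, gamma1 = {gamma1:.4f}; check: mean {m1:.4f}, prec {1/v1:.4f}")
```

Here the truncation barely matters (the target mean sits well above zero), so (a1, γ1) land close to the target mean and precision themselves.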
For the sake of comparison, we conducted a similar analysis, computing Bayes factors under unrestricted parameter spaces, i.e., with Ω1 = {(μ1, μ2): μ1 = μ2} and Ω2 = R². We assumed Gaussian prior distributions with parameters (a1, γ1) and (a2, γ2) for μ1 and μ2, respectively. The predictive densities may be computed as in the previous case, from (3.7), (3.8), (3.10) and (3.11), with the obvious modifications.
Here we obtained the values of the hyperparameters by setting aj = E(μj) and γj = τj = τ, j = 1, 2. In Table 4 we present the corresponding Bayes factors for different values of τ. The results also suggest less evidence against the null hypothesis as the prior distributions become less informative. Comparing the results in Tables 3 and 4 for equal values of the dispersion parameters, τ1 = τ2 = τ = 0.35, we conclude that the Bayes factor obtained under the restricted parameter space (B21 = 2.81) is greater than its unrestricted counterpart (B21 = 2.16), which suggests that the restriction increases the evidence against the null hypothesis. For less informative prior distributions, however, the Bayes factors associated with the unrestricted parameter space are smaller than those obtained under the restricted parameter space.
To explore the flexibility of the methodology proposed by Irony and Pereira (1995), we maintained the same normal density for the data y and considered an exponential prior distribution with parameter λj for μj, j = 1, 2, i.e.,

g(μj) = λj exp(-λj μj) I[0,∞)(μj),   (3.15)

so that E(μj) = λj⁻¹ and Var(μj) = λj⁻², j = 1, 2. The joint prior density of (μ1, μ2) and the joint density of the data y and the parameter μ = (μ1, μ2) follow as before. The denominator of the ratio that defines p(y|M = 1) is obtained with g1(μ) = g(μ, μ); the corresponding numerator, as well as the numerator and denominator of the ratio that defines p(y|M = 2), are computed analogously. Taking the hyperparameters as the inverses of the sample means, i.e., λ1 = 0.24 ≈ 1/4.1636 and λ2 = 3.79 ≈ 1/0.2636, it follows that the Bayes factor (2.1) is equal to 5.22, leading to the rejection of the null hypothesis.
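Swapping the prior is a one-line change in the numerical sketch. The code below reproduces the hyperparameter choice λ1 ≈ 0.24 and λ2 ≈ 3.79 from the sample means and computes the exponential-prior Bayes factor by quadrature; since the stress values y1 are illustrative (matching only the reported mean), the resulting B21 will not reproduce the reported 5.22.

```python
import numpy as np
from scipy import integrate, stats

tau, sigma = 0.03, 0.03 ** -0.5
# Centered data (y - 90): y2 from Table 1's control row; y1 illustrative,
# matching only the reported mean 4.1636.
y2 = np.array([-8.3, -0.1, -8.5, -0.6, 4.6, 3.5, -4.5, 5.5, -1.1, 5.4, 7.0])
y1 = np.array([9.6, 1.2, 5.0, 0.3, 7.1, 2.8, 6.4, 0.9, 8.2, 3.3, 1.0])

lam1, lam2 = 1 / y1.mean(), 1 / y2.mean()      # inverses of the sample means

def log_lik(m1, m2):
    return (stats.norm.logpdf(y1, m1, sigma).sum()
            + stats.norm.logpdf(y2, m2, sigma).sum())

def prior(m1, m2):                             # joint exponential prior from (3.15)
    return lam1 * lam2 * np.exp(-lam1 * m1 - lam2 * m2)

C = log_lik(y1.mean(), y2.mean())              # stability constant; cancels in B21
UB = 60.0                                      # finite stand-in for the upper limit

num1, _ = integrate.quad(lambda m: prior(m, m) * np.exp(log_lik(m, m) - C), 0, UB)
den1, _ = integrate.quad(lambda m: prior(m, m), 0, UB)
num2, _ = integrate.dblquad(lambda b, a: prior(a, b) * np.exp(log_lik(a, b) - C),
                            0, UB, 0, UB)
den2, _ = integrate.dblquad(lambda b, a: prior(a, b), 0, UB, 0, UB)

B21 = (num2 / den2) / (num1 / den1)
print(f"lam1 = {lam1:.2f}, lam2 = {lam2:.2f}, B21 = {B21:.2f}")
```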

3.2. Analysis via MCMC methods
Now we analyse the same data via the proposal of Carlin and Chib (1995). We considered Gaussian distributions for the data and truncated Gaussian prior distributions for the parameters μ, μ1 and μ2. We did not consider a pseudoprior distribution for σ², since the data are informative about that parameter regardless of the value of M, as suggested by Carlin and Chib (1995). In all prior distributions, the hyperparameters (mean and standard deviation) were set equal to the least squares estimates of the corresponding parameters in each model.
These values are presented in Table 5. Although both the likelihood and the chosen prior distributions are standard in Bayesian analyses, it is not simple to obtain expressions for the posterior distributions required to compute the Bayes factor. This difficulty is related to the restriction on the parameter space, so we must rely on numerical procedures for the computations. We used the BUGS software (version 0.6; Bayesian inference Using Gibbs Sampling), developed by Thomas et al. (1992), for this purpose.
Running each model separately, we obtained estimates of the posterior means and standard deviations, which we used as values for the hyperparameters of the pseudoprior distributions. We used the methods proposed by Geweke (1992) as convergence diagnostics. They are available in CODA (Convergence Diagnosis and Output Analysis software for Gibbs sampling), a menu-driven set of S-Plus functions, originally developed by Cowles (1994), that serves as an output processor for BUGS. To compute the Bayes factors, we considered a BUGS run of 500 burn-in iterations and 10000 updating iterations. The Bayes factors for the comparison of models 1 and 2 under different prior probabilities πj, j = 1, 2, are presented in Table 6.
As with the previous approach, the Bayes factors provide (weak) evidence against the null hypothesis (see Table 2), suggesting that the data are more compatible with model 2. To evaluate the effect of the choice of the hyperparameters of the prior distributions, we assumed π1 = π2 = 0.5 and varied the corresponding precision (inverse variance) parameters, τμ, τμ1 and τμ2, obtaining the results displayed in Table 6.
Along the lines of the previous analyses, we observe that the Bayes factor increases as the precision increases, indicating that it is sensitive to the choice of the hyperparameters of the prior distributions.If we consider, for example, a noninformative prior distribution with precision hyperparameter equal to 0.00001, the Bayes factor is B 21 = 0.02, suggesting that the null hypothesis should be accepted, i.e. that model 1 is more compatible with the data.
We also computed Bayes factors for models with unrestricted means, considering π1 = π2 = 0.5 and obtaining B21 = 2.52, which is greater than its counterpart under the restricted parameter space (B21 = 1.65). This suggests that the inclusion of the restriction in the parameter space decreases the evidence against the null hypothesis, an unexpected conclusion for which we have no explanation. Finally, still working under an unrestricted parameter space, we considered a non-informative prior distribution with precision parameter equal to 0.00001, obtaining B21 = 0.03, which is in line with the value obtained under the restricted parameter space (B21 = 0.02).

4. Extension to the Unknown Variance Case
In this section we first consider the unknown variance case under the Irony and Pereira (1995) approach. In addition to the truncated normal distributions (3.2) adopted for μ1 and μ2, we chose a gamma prior distribution with parameters ν1 and ν2 for τ = σ⁻², so that the joint prior density is the product of (3.5) and the gamma density, with τ > 0. Here, the relevant (restricted) parameter space sets are those of Section 2, augmented by the constraint τ > 0. The expressions (3.7) and (3.10) reduce to (4.1) and (4.2), and similarly one may show that (3.8) and (3.11) reduce to (4.3) and (4.4). The required predictive densities may then be obtained from (2.5) using either (4.1) and (4.3) or (4.2) and (4.4). Specifying the prior probabilities π1 and π2, we may compute the Bayes factor from (2.1). Actual computations require numerical integration methods, such as generalizations of Romberg integration, for example the routine MIDEXP (see Press et al., 1994).
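As a rough numerical sketch of the additional integration over τ, the code below evaluates the (scaled) H1 predictive with SciPy's adaptive quadrature standing in for the Romberg/MIDEXP routines cited above. Several assumptions are made explicit: the gamma prior is parameterized by shape ν1 and rate ν2 (the paper's convention is not shown), the truncated-normal kernel parameters for the common mean are illustrative, the stress values y1 are illustrative, τ is integrated over a finite range assumed to hold essentially all of its posterior mass, and the predictive is reported only up to the constant exp(C).

```python
import numpy as np
from scipy import integrate, stats

# Centered data as before: y2 from Table 1's control row, y1 illustrative.
y2 = np.array([-8.3, -0.1, -8.5, -0.6, 4.6, 3.5, -4.5, 5.5, -1.1, 5.4, 7.0])
y1 = np.array([9.6, 1.2, 5.0, 0.3, 7.1, 2.8, 6.4, 0.9, 8.2, 3.3, 1.0])
y_all = np.concatenate([y1, y2])

nu1, nu2 = 1.0, 33.3        # gamma shape and rate (assumed convention): E(tau) = 0.03
a, gam = 2.0, 0.35          # truncated-normal kernel for the common mean (illustrative)

def log_joint(m, t):        # log of gamma(t) * g1(m) * likelihood, before scaling
    return (stats.gamma.logpdf(t, nu1, scale=1.0 / nu2)
            - 0.5 * gam * (m - a) ** 2
            + stats.norm.logpdf(y_all, m, t ** -0.5).sum())

C = log_joint(y_all.mean(), 1.0 / y_all.var())   # stability constant

# Numerator of (2.5) under H1; tau integrated over a range holding its mass.
num, _ = integrate.dblquad(lambda t, m: np.exp(log_joint(m, t) - C),
                           0.0, 40.0, 1e-4, 1.0)
den, _ = integrate.quad(lambda m: np.exp(-0.5 * gam * (m - a) ** 2), 0.0, 40.0)
pred1_scaled = num / den    # p(y | M = 1) up to the factor exp(C)
print(f"scaled H1 predictive: {pred1_scaled:.3e}")
```

The H2 predictive would require a three-dimensional integral over (μ1, μ2, τ); the same scaling constant would then cancel in the Bayes factor.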
For numerical illustration, we first considered an informative gamma distribution for τ, taking ν1 = ν2 = 0.03 (the estimate of τ based on the residual sum of squares). We also considered a non-informative gamma distribution for τ by choosing ν1 = 1 and ν2 = 33.3 ≈ 1/0.03. The results are displayed in Table 7. Here too, the Bayes factor unexpectedly increases when the precision τ1 = τ2 decreases, as opposed to the known variance case (see Table 3). The behaviour of the Bayes factor is similar for informative and non-informative prior distributions for the precision parameter. Also, the Bayes factors computed under the unknown variance situation are smaller than those obtained under the known variance case (see Tables 3 and 7).
We also repeated the computations under an unrestricted parameter space with the same choice of hyperparameter values. The corresponding results are indicated within parentheses in Table 7. It is clear that the Bayes factors increase when τ1 = τ2 decreases, suggesting that the more informative the prior distribution, the smaller the evidence against the null hypothesis, again in opposition to the results obtained under known variance (see Table 4). The corresponding Bayes factors are smaller than those obtained under the restricted parameter space for informative prior distributions (τ1 = τ2 = 0.35 or τ1 = τ2 = 0.20), but they are comparable for non-informative prior distributions.
We now perform a similar analysis under the methodology proposed by Carlin and Chib (1995). The prior distributions for μ, μ1 and μ2 are the same as those considered in the known variance case. We also considered the same informative and non-informative prior distributions for τ used in the analysis under the Irony and Pereira (1995) methodology. We performed a BUGS run of 500 (1000) burn-in iterations and 10000 updating iterations for the informative (non-informative) prior distribution. The Bayes factors associated with the different prior distributions and different choices of πj, j = 1, 2, are presented in Table 8. In all cases the Bayes factors provide weak evidence against the null hypothesis.
Finally, we computed the corresponding Bayes factors under an unrestricted parameter space, obtaining B21 = 2.72 and B21 = 2.65 for the informative and non-informative prior distributions, respectively; these values are greater than those computed under the restricted parameter space, namely B21 = 1.72 and B21 = 1.69. As in the known variance case, the incorporation of restrictions on the parameters decreases the evidence against the null hypothesis.

5. Discussion
In this work we study two different Bayesian approaches for comparing two bounded means of Gaussian distributions, a problem of interest in situations where the phenomenon under investigation induces restrictions, as in the example described in the Introduction. The incorporation of prior knowledge about the restriction generates some technical difficulties under a frequentist approach, as seen in Giampaoli and Singer (2004).
Although a direct comparison of the methodologies proposed by Irony and Pereira (1995) and Carlin and Chib (1995) is complicated in view of the differences in formulation, the Bayes factors computed under the latter are smaller, which might be an indication that the former is more efficient at detecting small differences and does not present contradictory results.
The two approaches differ in the computation of the posterior probabilities p(M = j|y), j = 1, 2. Under the first approach, they are computed directly from the predictive density (2.5), while under the second they are obtained by Gibbs sampling from the full conditional distributions. The prior opinion on the full parameter space is expressed, in a certain way, via the construction of pseudoprior distributions when working under the Carlin and Chib (1995) approach. That approach may not be implementable with exponential prior distributions, for example, because the full conditional distributions are then not of a tractable form. The methodology proposed by Irony and Pereira (1995), on the other hand, could be inappropriate for the comparison of non-nested models. However, not only does it require a smaller number of hyperparameters, but it is also less sensitive to their choice.
We conclude by recognizing that the use of Bayes factors for hypothesis testing is a controversial subject and that other alternatives should be considered for such purposes. In particular, we mention the Bayesian reference criterion (BRC) suggested by Bernardo and Rueda (2002) and the proposal of Bayesian p-values by Pereira and Stern (1999, 2001). In a future paper we will treat this problem considering the intrinsic statistic and the corresponding Bayes decision rule, the Bayesian reference criterion (BRC). This approach may require the study, and consequently the generation, of suitable probability densities.

Table 2 :
Kass and Raftery's decision rule based on Bayes factors.

Table 3 :
Bayes factors for different values of the hyperparameters

Table 4 :
Bayes factors for unrestricted parameter spaces

Table 5 :
Least squares estimates of the parameters in models (1.3) and (1.4)

Table 6 :
Bayes factors computed under the Carlin and Chib proposal

Table 7 :
Bayes factors computed under the Irony and Pereira proposal -unknown variance case

Table 8 :
Bayes factors computed under the methodology of Carlin and Chib - unknown variance case