Tests of Independence with Incomplete Contingency Tables Using Likelihood Functions

Kang (2006) used the log-likelihood function with Lagrangian multipliers for estimation of cell probabilities in two-way incomplete contingency tables. The constraints on cell probabilities can be incorporated through Lagrangian multipliers for the likelihood function. The method can be readily extended to multidimensional tables. Variances of the MLEs are derived from the matrix of second derivatives of the log likelihood with respect to cell probabilities and the Lagrange multiplier. Wald and likelihood ratio tests of independence are derived using the estimates and estimated variances. Simulation results, when data are missing at random, reveal that maximum likelihood estimation (MLE) produces more efficient estimates of population proportions than either multiple imputation (MI) based on data augmentation or complete case (CC) analysis. Neither MLE nor MI, however, leads to an improvement over CC analysis with respect to power of tests for independence in 2×2 tables. Thus, the partially classified marginal information increases precision about proportions, but is not helpful for judging independence.


Introduction
It may happen that some observations are not fully cross-classified when forming contingency tables from two or more categorical variables.Complete-case (CC) analysis discards cases with missing data, thereby restricting analysis to only counts with fully observed variables used for cross classification into a contingency table.
An alternative approach involves constructing a complete table, in which all cases are completely classified, by imputing information for the missing classification dimensions.Multiple imputation, proposed by Rubin (1978; see also Rubin 1987Rubin , 1996)), provides a way to take advantage of commonly used tests of independence for completely classified tables.Li et al. (1991) proposed a Wald test statistic and Meng and Rubin (1992) proposed likelihood ratio tests with F reference distributions.In addition to such tests of independence, one can estimate joint probabilities and their standard errors using sets of, say, five imputed contingency tables obtained from data augmentation.Finch (2010) has studied some methods of imputation for categorical data.In this paper, we consider multiple imputation of categorical values using a Jeffrey's prior distribution on the unknown cell probabilities.
Instead of imputing information for missing classifications, maximum likelihood estimates of population proportions can be obtained from the observed information, including both the completely and partially classified cases.Little (1982) used a simple EM algorithm to estimate cell probabilities.Lipsitz et al. (1998) show how to use generalized linear model software to evaluate maximum likelihood estimates of cell probabilities using the connection between the multinomial and Poisson likelihoods.Molenberghs and Goetghebeur (1997) present estimation methods using the observe data log likelihood directly.In their approach, they use Fisher scoring, Newton-Raphson, and other algorithms.They cite McCullagh and Nelder (1989) for their algorithms, and this source does not utilize Lagrangian multipliers.Instead, they incorporate the constraints on probabilities directly into the likelihood function.
Maximum likelihood estimates of population proportions can be obtained from the partial log-likelihood function proposed by Chen and Fienberg (1974) for the cell probabilities of two way incomplete contingency tables.Kang (2006) proposed the partial log-likelihood function with a Lagrangian multiplier incorporating the constraint on cell probabilities.
Both the MLE method and the MI method should be appropriate when data are missing completely at random (MCAR) or missing at random (MAR) in the sense of Rubin (1976).The complete case method can cause bias for estimating cell probabilities when data are in fact MAR.
Less computation is required to get the matrix of second derivatives of the loglikelihood function with respect to cell probabilities when a Lagrangian multiplier is used than when it is not used.In this paper, variances of MLE estimators of population proportions are derived from the matrix of second derivatives.Wald test of independence are derived using the variances of MLEs.Likelihood ratio tests of independence are derived based on the likelihood function evaluated at the MLEs.
Section 2 presents notation and the likelihood function.Section 3 reviews MLEs and their variances proposed by Kang (2006).Section 4 derives tests of independence for incomplete contingency tables.The performance of tests of independence provided by the complete-case analysis, multiple imputation, and maximum likelihood approaches are examined through Monte Carlo simulation studies in Section 5 considering both type I error level and power.Section 6 contains a summary.

Notation and Likelihood Function
Consider an I × J contingency table where the row factor X 1 has I categories and the column factor X 2 has J categories.Assume simple random sampling with replacement.In a complete table, where the row and column categories are observed for every case in the sample, the counts have a multinomial distribution with sample size N and probability vector θ.Let θ ij , an element of θ, denote the population proportion for the (i, j) cell.
When information on either the row or column classification is missing, we can construct a table of counts for the completely classified cases where x ij denotes the number of cases observed in the (i, j) cell.We can also construct one-way tables of counts for partially classified cases.Let x i m denote the number of cases in the i th row category, i = 1, 2, • • • , I, where the column category is unknown, and let x mj denote the number of cases in the j th column category, j = 1, 2, • • • , J, where the row category is unknown.Then, x im and x mj are marginally observed counts on a single variable.Let x mm denote the number of cases where both the row and column categories are missing.The total sample size is Discarding the x mm cases for which both variables are missing does not affect any results in this paper except that it necessitates changing N to n = N − x mm .Those cases do not contain any information about the joint distribution or marginal distributions of X 1 and X 2 .
The log-likelihood function for the cell probabilities θ presented by Chen and Fienberg (1974) is the following; Note that the log-likelihood function in (2.1) does not include x mm .In the general case, there are r variables, r be the proportion and count, respectively, in cell (a, b, • • • , r).The complete-data log likelihood for a multinomial model is When some variables are missing, partially classified counts appear in the observeddata log likelihood multiplying the log of aggregated probabilities for the corresponding cells, as in equation 2.1.In the case that r = 3, the log likelihood of this sort with the largest number of terms, ignoring the x mmm cases with all three variables missing, is x abc log θ abc + a b x abm log θ ab•

Maximum Likelihood Estimation of Cell Probability and Variances
The EM algorithm (Dempster, Laird and Rubin, 1977) can be used to get maximum likelihood estimates (MLEs) of proportions in an incomplete contingency table.In the case of an I × J table, let θ (0) ij be an initial estimate of θ ij , such as x ij /x cc .The estimate of θ ij at the t th iteration of the algorithm is The algorithm converges to the MLEs of θ.Little (1982) presented examples of the algorithm for 2×2 tables.In the general case of multidimensional tables, the EM algorithm was described by Fuchs (1982).See also Little and Rubin (2002) and Schafer (1997).
The methodology to produce variances of MLEs (Kang, 2006) is reviewed in this section.Since the proportions are constrained to sum to one ( ij θ ij = 1), the likelihood function (2.1) incorporating the constraint can be expressed with a Lagrangian multiplier as where θ * = (θ , γ) .
The first derivative of l(θ * ) in (3.2) with respective to θ ij and to γ are The second partial derivatives are Let I(θ * ) be the matrix of second derivatives of the log likelihood with respect to θ * .An estimate of the covariance matrix of θ * , the MLE of θ * , is −I −1 ( θ * ).Let ΣM be a matrix omitting the last row and column of −I −1 ( θ * ).The matrix ΣM gives an estimate of the covariance matrix of θM , where θM is the MLE of θ.For a 2 × 2 table, θ * = (θ 11 , θ 12 , θ 21 , θ 22 , γ) and I(θ * ) is , .
The likelihood function with a Lagrangian multiplier and corresponding derivatives for a three-way table are given in the Appendix.

Tests of Independence
A Wald test and a likelihood ratio test for independence in an incomplete two-way contingency table are described in this section.Extensions for threeway tables also are described.

A Wald Test Using MLE in an Incomplete Contingency Table
For a complete two-dimensional contingency table with sample size N , the null hypothesis of statistical independence is for all i and j. (4.1) Defining the null hypothesis of statistical independence can be expressed as Then (4.2) can be expressed as H 0 : g(θ) = 0 ¯.For a complete three-way table with sample size N , the null hypothesis is , for all a, b, and c. (4.3) Defining the null hypothesis can be expressed as H 0 : g abc = 0 for all a, b, and c, ).The estimator, θ, has an approximate multivariate normal distribution with variance Σ = V θ /N by the Central Limit Theorem, where V θ = (∆ θ − θθ ) and ∆ θ is a diagonal matrix with the elements of θ on the main diagonal.Under H 0 , for a two-dimensional table, g( θ) has an approximate p = I × J dimensional normal distribution with variance GΣG , where G p×p is the matrix of first partial derivatives of g(θ) with respect to θ ij .The elements of G p×p are A Wald statistic for testing where T = ( Ĝ Σ Ĝ ) is obtained by substituting θ for θ and − denotes generalized matrix inverse.For a complete table, Q has a distribution converging to a central chi-squared distribution with df = k = (I − 1)(J − 1) when H 0 is true.
Results for a three-dimensional table are analogous.Under H 0 , g( θ) has dimension p = A × B × C. The elements of G p ×p depend on the overlap in dimensions between g abc and θ ijk .Examples are given below with overlap in three, two, one, or zero dimensions: For an incomplete contingency table, g( θ) and Ĝ are obtained by substituting θM for θ.Then T = ( Ĝ Σ Ĝ ) is obtained by substituting ΣM for Σ in (4.4).Thus a Wald statistic for testing where TM = ( Ĝ ΣM Ĝ ) and Ĝ is obtained by substituting θM for θ.Then for a incomplete two-way table, QM has an approximate central chi-square distribution with df = k when H 0 is true.For an incomplete three-way table the analogous statistic has df = k when H 0 is true.

Likelihood Ratio Test
Under the model of independence in (4.1), the MLE of θ ij is Let L 0 be the maximized value of the log-likelihood function under the null hypothesis of (4.1) and L 1 be the maximized value under H 0 H a .L 0 can be calculated directly by substituting θ0 ij for θ ij in (3.2) and L 1 can be obtained by using θM .Then 2L 1 − 2L 0 has a limiting null chi-squared distribution as n goes to infinity when H 0 is true.A size α test is implemented by rejecting k,α .For a three-way table, under independence, the MLE of θ abc is , where k = (A − 1)(B − 1)(C − 1) and L 0 and L 1 are defined analogously to how they were defined for two-way tables.

Simulation Comparing Methods
The performance of tests of independence on incomplete two-way tables using maximum likelihood estimation (MLE), multiple imputation (MI), and complete case analysis (CC) are compared through Monte Carlo simulations.Two missing data mechanisms corresponding to the missing at random (MAR) assumption are used in simulations.Type I error levels are estimated from 1,000 tables simulated tables under the independence assumption.Power levels are examined by simulating 1,000 tables under an alternative to independence.
For multiple imputation, the Wald statistic proposed by Li, Raghunathan, and Rubin (1991) and the likelihood ratio test statistic proposed by Meng and Rubin (1992) based on five imputed data sets were applied to test independence.The algorithms for MI were programmed through S-PLUS 6.1 (2001) functions for missing values.

Type I Error Levels
The 2 × 2 incomplete contingency tables generated for this study to check type I error level were generated with equal cell probabilities and data missing at random (MAR).X 1 and X 2 were independently generated as Bernouli(0.5)random variables with sample size 500.
There are two cases with different missing at random mechanisms; 1,000 tables were generated for each case.The missing mechanism for each case is as follows: For the first case m 1 = 0.1, m 2 = 0.3, m 3 = 0.2 in (5.1).In the second case m 1 = 0.2, m 2 = 0.4, m 3 = 0.3.The percentages of cases with missing information on at least one variable are expected to be 36% and 51% for case 1 and 2 respectively.Table 1 shows the numbers of tables for which the independence null hypothesis was falsely rejected out of 1000 tables for three nominal Type I error levels.The results using MLE and CC seem to have appropriate Type I error levels on both tests, but Type I error levels tend to be inflated for the MI method.

Power Study
An alternative to independence for 2 × 2 tables with equal probability margins was used to compare the power of the various procedures.The generated multinomial variables have the cell probabilities (θ 11 , θ 12 , θ 21 , θ 22 ) = (0.2, 0.3, 0.3, 0.2), (5.2) with sample size 500.The missing data mechanisms (5.1) are same as for the two cases used previously.Before we study power of independence test, let's compare three methods with checking point estimations of θ 11 and θ 1+ θ +1 − θ 11 .Table 2 shows means and standard deviations of 1,000 values for the estimates of θ 11 from the generated 1,000 tables in this subsection.The true value of θ 11 is 0.2.MLE and MI methods provide essentially unbiased estimates for the cell probabilities but CC does not.The standard deviations of the estimates differ across methods.Complete-case analysis provides the estimate of θ 11 with the largest variance.For all methods, variation increases as the proportion of missing values increases.MLE tends to provide smaller standard deviations of cell proportion than MI.Table 3 shows means and standard deviations of 1,000 simulated values for the estimates of θ 1+ θ +1 − θ 11 , a measure of association between the two variables.The true value of θ 1+ θ +1 −θ 11 is 0.05.The averages of the estimates are similar for all methods.The complete-case and MLE have similar standard deviations and they exhibit smaller standard deviations than MI.Results on point estimation are helpful in interpreting power simulation results.The numbers in Table 4 indicate the number of tables out of 1,000 for which the independence null hypothesis was rejected under the given α levels among 1,000 tables in each type.
Table 4 shows MLE and CC have more power than MI.MLE does not show much improvement on the power levels of the tests of independence over completecase analysis.Although MLE is often more conservative than MI with respect to Type I error levels, the test using MLE exhibited more power than MI.

Summary
It has been an issue to estimate the variance of MLE for the cell probabilities of an incomplete contingency table because it is very complicated to get the second derivatives of the likelihood.The likelihood including a Lagrangian multiplier related to a constraint can solve this problem.
Complete-case (CC) analysis produces biased estimates of joint probabilities under MAR and is less efficient than either MLE or MI.MLE and MI provides consistent results under either MAR situation used in simulations.
When data are missing at random, simulation results reveal that MLE provides more efficient estimates of population proportions than either multiple imputation (MI) based on data augmentation or complete case analysis, but neither MLE nor MI provides an improvement over complete-case (CC) analysis with respect to power of tests for independence.
If the missing mechanism does satisfy missing completely at random (MCAR) criterion, CC analysis can produce unbiased estimates of joint probabilities and moderate type I error level and power of tests for independence.

Appendix: Derivatives for 3-Way Tables
The log likelihood for a three-way table was given in (2.2).In the case of a three-way table, the log likelihood function incorporating a Lagrangian multiplier is l(θ * ) = l(θ) + γ(1 − The second partial derivatives for the proportions are determined by the overlap in the three-tuples identifying the parameters.Below are illustrations for overlap of three, two, one, and zero dimensions.
and b = j, I i=1 θ ib , for a = i, and b = j, J j=1 θ aj , for a = i, and b = j, 0, for a = i, and b = j.

Table 1 :
Comparison of Type I error levels