Linear Information Models: An Introduction

Relative entropy identities yield basic decompositions of the log-likelihood of categorical data, which naturally lead to linear information models, in contrast to the hierarchical log-linear models. A recent study by the authors clarified the principal difference between the two classes of linear models in data likelihood analysis. The proposed scheme of log-likelihood decomposition introduces a prototype of linear information models, from which a basic scheme of model selection can be formulated. Empirical studies with high-way contingency tables illustrate the natural selection of information models in contrast to the hierarchical log-linear models.


Introduction
Analysis of contingency tables with multi-way classifications has been a foundational area of research in the history of statistics. From testing the hypothesis of independence in a 2 × 2 table (Pearson, 1904; Yule, 1911; Fisher, 1934; Yates, 1934) to testing interaction across strata of 2 × 2 tables (Bartlett, 1935), many discussions emerged in the literature to build up the foundation of statistical inference in categorical data analysis. In this vital field of applied statistics, three closely related topics were gradually developed and yet have remained theoretically incomplete for over half a century.
The first and primary topic is born out of the initial hypothesis testing for independence in a 2 × 2 table. From the 1960s through the 1980s, the exact test repeatedly received criticism for being conservative by its discrete nature (Berkson, 1978; Yates, 1984). Although the arguments in favor of the unconditional tests essentially turned to using the unconditional exact tests in the recent two decades, the reasons for the preferred p values and the sensitivity of the unconditional tests have not been assured in theory. In this respect, a recent approach via information theory proved that power analysis of the unconditional tests is not for testing independence, but simply for equal versus unequal binomial rates. Consequently, the long-term ambiguous criticism of the exact test is finally proved to be logically vacuous (Cheng et al., 2005a). In retrospect, it is seen that extended hypergeometric distributions cannot be applied to power evaluations at arbitrary alternative 2 × 2 tables (Fisher, 1962).
Extending to several 2 × 2 tables, Bartlett (1935) addressed the topic of testing interaction and derived an estimate of the common odds ratio. Norton (1945), Simpson (1951) and Roy and Kastenbaum (1956) discussed interpretations of interactions and showed that Bartlett's test is a conditional maximum likelihood estimation (MLE) given the table margins. For the same data, the celebrated CMH test for association (Cochran, 1954; Mantel and Haenszel, 1959) has been applied with numerous citations in the fields of biology, education, engineering, medicine, sociology and psychology. However, it was implicit that an inferential flaw lies in the estimating-equation design of the CMH score test (Woolf, 1955; Goodman, 1969; Mellenberg, 1982). In addition, the probability at alternative interactions given the observed data, that is, the power analysis at alternatives to the null, has never been discussed in the literature. A remedy to such statistical inference was recently provided by an analysis of invariant information identities (Cheng et al., 2005b). The solution, as an extension of the power analysis for a single 2 × 2 table, will also be useful for testing hypotheses with high-way contingency tables, which is the topic of this study to be discussed below.
Analysis of variance (ANOVA, Fisher, 1925) inspired discussions of partitioning chi-squares within contingency tables (Mood, 1950; Lancaster, 1951; Snedecor, 1958; Claringbold, 1961). It inspired in turn the development of log-linear models (Kullback, 1959; Darroch, 1962; Lewis, 1962; Birch, 1964; and Goodman, 1964). Hierarchical log-linear models were thereby formulated to analyze general aspects of contingency tables (cf. Goodman, 1970; Bishop et al., 1975), and since then have been widely used in the literature (Hagenaars, 1993; Christensen, 1997; Agresti, 2002). A drawback of inference with the test statistics by Lancaster, Kullback and Claringbold was remarked by Plackett (1962). It was recently found that a flaw of inference also exists with a likelihood ratio test for association (Roy and Kastenbaum, 1956) and with another for testing interaction (Darroch, 1962); again, the data likelihood identities provide appropriate explanations (Cheng et al., 2006). These data information identities also indicate that analysis of variance need not be designed and used to measure deviations from uniform association, or varied interactions between the categorical variables, which are simply defined by likelihood factorizations. A prototype of the linear information models will be formulated below and compared to the hierarchical log-linear models.
The basic log-linear models in three variables are reviewed in Section 2, where the notations and parameters defined by ANOVA decompositions may require careful interpretation. Next, information identities of three-way tables are fully discussed to characterize the corresponding linear information models, which may differ from the log-linear models only in some representations. In Section 3, the prototypes of information models with four-way and high-way tables begin to indicate the essential difference from the log-linear models; in particular, an elementary scheme of model selection can be formulated with easily justified tests of significance. Section 4 provides empirical studies of four- and five-way data, which have been analyzed with log-linear models in the literature. Comparisons between the log-linear models and the proposed information models are discussed, and obvious advantages over log-linear modeling are easily shown through information model selection. In conclusion, remarks on criteria of information model selection are noted for further useful research.

Elementary Log-likelihood Equations
The representations of data log-likelihood can be formulated in various ways, depending on the method of likelihood decomposition. There are numerous ways of decomposing the data likelihood with high-way tables. It is elementary and instructive to discuss the case of three-way tables, which allow only a few different expressions of log-likelihood equations. The three-way log-linear model is first reviewed.

Basic Log-linear Models
Suppose that individuals of a sample are classified according to three categorical variables X, Y, Z with levels i = 1, ..., I; j = 1, ..., J; k = 1, ..., K, respectively. Denote the joint probability density by

f_{ijk} = f(i, j, k) = P(X = i, Y = j, Z = k),   (2.1)

where \sum_{i,j,k} f_{ijk} = 1. The full (saturated) log-linear model (Goodman, 1970; Bishop et al., 1975) is defined as

\log f_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}.   (2.2)

The full model can be reduced to a submodel of no three-way interaction, or zero conditional association, which is denoted by (XY, YZ, XZ). This model permits all three pairs to be conditionally dependent, that is, no pair is conditionally independent (Agresti, 2002); the corresponding log-linear model is formulated as

\log f_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}.   (2.3)

In case exactly one pair of factors is conditionally independent given the third factor, say {X, Z} given Y, the model is expressed as

\log f_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{jk}^{YZ}.   (2.4)

With two pairs of conditionally independent factors, model (2.4) reduces to independence between {X, Y} and Z, denoted (XY, Z), and written as

\log f_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY}.   (2.5)

The final reduction to three pairs of conditional independence is denoted by (X, Y, Z), and expressed as

\log f_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z.   (2.6)

Indeed, equation (2.6) defines mutual independence between the three factors, that is, zero mutual information (Cheng et al., 2006). Equation (2.3) obtains from (2.2) by checking the magnitude of the three-way interaction using iterative proportional fitting (Deming and Stephan, 1940). Equation (2.4) is, however, not derived from (2.3), but directly computed from (2.2) by deleting the conditional association between {X, Z} given Y, which is the sum of the unique three-way interaction and the conditional uniform association between {X, Z} given Y. This will be further explained using equation (2.12) below. The parameters in the log-linear models, besides the normalizing constant \lambda, are scaled log-likelihood ratios, or logarithmic odds ratios.
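Since model (2.3) is fitted by iterative proportional fitting, a minimal numerical sketch may be helpful. The function names below are illustrative (not from the paper), and a numpy array of strictly positive cell counts is assumed:

```python
import numpy as np

def ipf_no_three_way(n, iters=200, tol=1e-10):
    """Fit expected counts under the no-three-way-interaction model
    (XY, YZ, XZ) of (2.3) by iterative proportional fitting: cycle through
    the three two-way margins until the fitted table stabilizes."""
    n = np.asarray(n, dtype=float)
    m = np.full(n.shape, n.sum() / n.size)  # start from a flat table
    for _ in range(iters):
        m_old = m.copy()
        m *= n.sum(axis=2, keepdims=True) / m.sum(axis=2, keepdims=True)  # XY margin
        m *= n.sum(axis=0, keepdims=True) / m.sum(axis=0, keepdims=True)  # YZ margin
        m *= n.sum(axis=1, keepdims=True) / m.sum(axis=1, keepdims=True)  # XZ margin
        if np.max(np.abs(m - m_old)) < tol:
            break
    return m

def g2(n, m):
    """Deviance (likelihood-ratio) statistic 2 * sum n log(n/m)."""
    n, m = np.asarray(n, dtype=float), np.asarray(m, dtype=float)
    mask = n > 0
    return 2.0 * np.sum(n[mask] * np.log(n[mask] / m[mask]))
```

The deviance `g2(n, ipf_no_three_way(n))` then measures the magnitude of the three-way interaction removed in passing from (2.2) to (2.3).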
It is obvious that the one-way and two-way parameters in each of the models (2.4) to (2.6) are statistically independent, using disjoint data information. Such a desirable property is lost between the two-way and three-way terms in models (2.2) and (2.3), which indicates an intrinsic difference in the interpretation of these log-linear models.

Basic Information Models
It may be useful to take a different look at the formulation of hierarchical log-linear models, in order to understand the intrinsic difference between dependent and independent likelihood decompositions. Basic identities of data likelihood are examined, and a prototype of linear information models in three variables will be considered. Define the marginal probabilities as

f_{i\cdot\cdot} = \sum_{j,k} f_{ijk},   (2.7)

and analogously f_{\cdot j\cdot}, f_{\cdot\cdot k}, f_{\cdot jk}, f_{i\cdot k}. Also, let n_{ijk} denote the number of individuals classified to the cell {X = i, Y = j, Z = k}; similar notations n = n_{\cdot\cdot\cdot}, n_{i\cdot\cdot}, and n_{ij\cdot} denote the total cell frequency and the marginal totals, respectively. Some convenient terminology will be borrowed from our previous study (Cheng et al., 2006, Section 2). The entropy identity in three variables is expressed as

H(X) + H(Y) + H(Z) = H(X, Y, Z) + I(X; Y; Z),   (2.8)

where the marginal entropy, the joint entropy, and the mutual information are defined respectively to be

H(X) = -\sum_i f_{i\cdot\cdot} \log f_{i\cdot\cdot},  H(X, Y, Z) = -\sum_{i,j,k} f_{ijk} \log f_{ijk},  I(X; Y; Z) = \sum_{i,j,k} f_{ijk} \log \frac{f_{ijk}}{f_{i\cdot\cdot} f_{\cdot j\cdot} f_{\cdot\cdot k}}.   (2.9)

The last entry of (2.9), the mutual information, is the Kullback-Leibler divergence between the joint density and the product marginal density of (X, Y, Z). The sample analogs of the entries in (2.9), the sample entropy and the sample mutual information, are defined in terms of the observed cell frequencies, which are the natural maximum likelihood estimates under the general model of the multinomial distribution. For example, the sample analog of the mutual information I(X; Y; Z) yields the likelihood ratio test statistic

\sum_{i,j,k} n_{ijk} \log \frac{n_{ijk} \, n^2}{n_{i\cdot\cdot} n_{\cdot j\cdot} n_{\cdot\cdot k}},   (2.10)

such that twice (2.10) is asymptotically chi-squared distributed with d.f. IJK - (I + J + K) + 2 under the null hypothesis of mutual independence. It is easily seen that the data log-likelihood admits three equivalent information identities, which are the three possible saturated information models for a three-way table:

I(X; Y; Z) = I(X; Y) + I(Y; Z) + I(X; Z|Y)
           = I(X; Y) + I(X; Z) + I(Y; Z|X)
           = I(X; Z) + I(Y; Z) + I(X; Y|Z).   (2.11)

Being saturated models, the equations of (2.11) are different expressions of the log-linear full model (2.2).
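The sample statistic (2.10) and its degrees of freedom are straightforward to compute; a short sketch follows, with an illustrative function name and a numpy array of counts assumed:

```python
import numpy as np

def mutual_information_g2(n):
    """Twice the sample analog (2.10): the likelihood-ratio statistic
    2 * sum n_ijk log(n_ijk n^2 / (n_i.. n_.j. n_..k)) for testing mutual
    independence in a three-way table, with its degrees of freedom
    IJK - (I + J + K) + 2."""
    n = np.asarray(n, dtype=float)
    I, J, K = n.shape
    total = n.sum()
    ni = n.sum(axis=(1, 2))   # n_{i..}
    nj = n.sum(axis=(0, 2))   # n_{.j.}
    nk = n.sum(axis=(0, 1))   # n_{..k}
    expected = ni[:, None, None] * nj[None, :, None] * nk[None, None, :] / total**2
    mask = n > 0              # 0 * log 0 is taken as 0
    stat = 2.0 * np.sum(n[mask] * np.log(n[mask] / expected[mask]))
    df = I * J * K - (I + J + K) + 2
    return stat, df
```

A table whose counts factor exactly into its one-way margins gives a statistic of zero, as the null hypothesis of mutual independence requires.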
Each conditional mutual information in (2.11) further splits into an interaction term and a uniform association term; for instance,

I(X; Z|Y) = Int(X; Y; Z) + I(X; Z||Y).   (2.12)

All three equations are reduced to model (2.3) if the common interaction term Int(X; Y; Z) in the equations is removed (set equal to zero). The remaining terms are not the same; for example, I(X; Z||Y) characterizes the uniform association between X and Z within the levels of Y. The unique three-way interaction, Int(X; Y; Z) of (2.12), characterizes exactly the three-way term \lambda_{ijk}^{XYZ} of (2.2), and the latter expression is unique up to the normalizing constant \lambda.
The special case of the 2 × 2 × K table, the simplest three-way table, has been a much discussed topic in the literature. It was recently proposed that the two summands of (2.12) form a natural two-step likelihood ratio (LR) test, where the first-step LR statistic, Int(X; Y; Z), tests for no interaction between X and Z across the K strata of Y, and the second, I(X; Z||Y), tests for no association between X and Z within strata of Y (Cheng et al., 2005b). Logically, the second step is used only when the first step is insignificant. The two-step LR test for no association across strata of Y is shown to be asymptotically unbiased and more powerful than the one-step omnibus LR test, the conditional mutual information (CMI) I(X; Z|Y) of (2.12), or Pearson's chi-square test. Likewise, the two-step LR test improves over the combination of the score tests, the Breslow-Day test (1980) and the CMH test (1954, 1959).
If the omnibus CMI I(X; Z|Y) is insignificantly small, then the conditional odds ratios between X and Z, across the levels of Y, are close to 1, and the first equation of (2.11) may be reduced to the sum I(X; Y) + I(Y; Z), which is model (2.4). Otherwise, this CMI is significantly large, in which case the two independent summands, Int(X; Y; Z) and I(X; Z||Y), can be individually significantly large or small, with four possible combined cases. If the test for no interaction, Int(X; Y; Z) = 0, is rejected, it is logically consistent with the significant omnibus test, and there is no need to perform the second-step test; clearly, it is insufficient to use only the second-step test for testing no association between X and Z across Y. If the interaction is insignificantly small, then the second-step test statistic I(X; Z||Y), also called the generalized CMH test (Agresti, 2002, p. 325), is usually expected to be significantly large, which is also consistent with the significant omnibus test. With rare chance it may happen that both Int(X; Y; Z) and I(X; Z||Y) are insignificantly small whereas the omnibus CMI is marginally significant; it is theoretically rare that the two-step test is not sufficiently sensitive. It is understood that such significant or insignificant tests are defined with a common fixed level of significance for each approximating chi-square distribution; and it follows from (2.12) that four combinations of significance and insignificance patterns between the three LR tests are meaningful in practice. These inferential distinctions are not directly revealed between models (2.2), (2.3) and (2.4) as mentioned above. The useful fact that equations (2.11) disallow the inclusion of all three two-way terms is not clarified in the discussion of hierarchical log-linear models.
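The omnibus statistic 2I(X; Z|Y) for a 2 × 2 × K layout equals the sum of the within-stratum independence deviances, a standard identity for the conditional mutual information; a brief sketch with illustrative function names:

```python
import numpy as np

def stratum_g2(t):
    """Deviance G^2 for independence in a single two-way table t."""
    t = np.asarray(t, dtype=float)
    e = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()  # expected counts
    mask = t > 0
    return 2.0 * np.sum(t[mask] * np.log(t[mask] / e[mask]))

def cmi_g2(strata):
    """Sample analog of 2*I(X;Z|Y) for a 2 x 2 x K layout: the sum of
    the within-stratum independence deviances; d.f. = K for 2x2 strata."""
    strata = np.asarray(strata, dtype=float)
    return sum(stratum_g2(t) for t in strata), len(strata)
```

When every stratum factors into its margins, the omnibus statistic is zero; strongly associated strata drive it up, whether through interaction or uniform association.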
It is perceivable that extensions of identity (2.11) to high-way contingency tables will enhance interpretations of high-way effects and manifest further differences from the conventional log-linear models. The goal of this study is to make a natural and systematic extension of equation (2.11) to high-way contingency tables, yielding what are coined linear information models. The difference between the proposed information models and the log-linear models will be illustrated using four-way and five-way data tables that have been analyzed in the literature. In the sequel, a prototype of the linear information models, defined by simple criteria of model selection, will be introduced to offer advantages of statistical inference beyond the conventional log-linear models.

Information Structures of High-Way Tables
The natural extensions of equation (2.11) consist of two major formulae for each saturated model of a K-way contingency table. The primary formula is that the log-likelihood decomposition is the sum of K main effects, (K - 1) two-way (MI) effects, (K - 2) three-way (CMI) effects, ..., two (K - 1)-way CMI, and one K-way CMI. The secondary formula is that each t-way CMI (t ≤ K) is a mutual information between a pair of variables, conditioned upon the other t - 2 variables. A t-way CMI is always a sum of the t-way interaction and the uniform association between the pair of variables, conditioned on the remaining t - 2 variables. It is plain that both formulae of data log-likelihood decompositions are built upon the notion of mutual information, not in terms of ANOVA as defined with the log-linear models.

A Four-Way Information Structure
It is observed from equation (2.11) that a common variable appears in each term of the decomposed three-way mutual information, and it is used as the conditioning variable (termed CV). This is not necessarily required of a four-way or a high-way table; however, the use of such a CV is always applicable and particularly useful. When any variable is of primary interest, like the response variable in a linear regression model, it is used as a CV for finding its relations to the remaining variables. In the case of four variables, the analog of (2.8) is the entropy identity

H(W) + H(X) + H(Y) + H(Z) = H(W, X, Y, Z) + I(W; X; Y; Z).   (3.1)

And an analog of (2.11), among many equivalent identities, may be expressed as

I(W; X; Y; Z) = I((W, X, Y); Z) + I(W; X; Y)
             = I(Y; Z) + I(X; Z|Y) + I(W; Z|X, Y) + I(W; X; Y)
             = I(Y; Z) + I(X; Z|Y) + I(W; Z|X, Y) + I(W; Y) + I(X; Y) + I(W; X|Y).   (3.2)

The last equation of (3.2) follows by using (2.11) with the variable Y as the CV. If there is no special CV of interest, then either a different identity of (2.11) may be used to express I(W; X; Y), or another identity such as I((X, Y); Z) = I(X; Z) + I(Y; Z|X) can be used in the second equation of (3.2). For example, the last statement leads to different non-CV saturated models such as

I(W; X; Y; Z) = I(W; X) + I(Y; Z) + I(W; Y) + I(X; Y|W) + I(W; Z|Y) + I(X; Z|W, Y).   (3.3)

It is worth noting with these saturated models that all the variables must appear at least once in the main effects, and also among the two-way MI terms, the three-way CMI terms, and a specified four-way CMI, respectively. For a three-way contingency table, exactly one of three CV models may be used according to equation (2.4); and for a four-way table, there are exactly six distinct saturated models for each fixed CV, which yields 24 distinct saturated CV models. If both CV and non-CV models are included, the total number of saturated information models in four variables would be 72 = (4!) · 3.
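The saturated chain decomposition with Y as the CV, namely I(W;X;Y;Z) = I(W;Y) + I(X;Y) + I(W;X|Y) + I(Y;Z) + I(X;Z|Y) + I(W;Z|X,Y), can be checked numerically on an arbitrary joint distribution; a small sketch (all names illustrative), working directly from marginal entropies:

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 3, 2, 3))
p /= p.sum()  # a random four-way joint pmf for (W, X, Y, Z)

def H(*keep):
    """Shannon entropy of the marginal over the kept axes (W=0, X=1, Y=2, Z=3)."""
    drop = tuple(a for a in range(4) if a not in keep)
    m = p.sum(axis=drop) if drop else p
    m = m[m > 0]
    return -np.sum(m * np.log(m))

def I(a, b, given=()):
    """Mutual information I(a;b), or conditional I(a;b|given), via entropies."""
    g = tuple(given)
    return H(a, *g) + H(b, *g) - H(a, b, *g) - (H(*g) if g else 0.0)

W, X, Y, Z = 0, 1, 2, 3
# Left side: the four-way multi-information of identity (3.1).
multi = H(W) + H(X) + H(Y) + H(Z) - H(W, X, Y, Z)
# Right side: the CV chain with Y as the conditioning variable.
chain = (I(W, Y) + I(X, Y) + I(W, X, given=(Y,)) +
         I(Y, Z) + I(X, Z, given=(Y,)) + I(W, Z, given=(X, Y)))
```

The two quantities agree to machine precision for any joint distribution, which is what makes the decomposition an identity rather than an approximation.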

Elementary Model Selection Schemes
Without loss of generality, the information model selection scheme is illustrated with a four-way contingency table. The question is how to select a meaningful and parsimonious model among those seventy-two candidate linear information models. A selection scheme based on equation (3.2) using a CV is first outlined below, organized as a four-step procedure for ease of exposition.
Step 1: Select the CV, say Y, either because it is a variable of focus, or because, among the four variables {W, X, Y, Z}, Y yields the maximal significant (in p value, for a fixed nominal level) sum of three two-way effects, say, twice the sample analog of I(W; Y) + I(X; Y) + I(Y; Z).
Step 2: Find the maximal insignificant (in p value) among the insignificant four-way CMI (between certain two variables, conditioned upon the chosen CV Y and the remaining variable). Otherwise, choose the minimal significant one when the three available four-way CMI are all significantly large. Suppose the chosen four-way CMI is I(W; Z|X, Y); then the two candidate three-way CMI, I(W; X|Y) and I(Z; X|Y), are directly obtained by reading the variables appearing in the chosen four-way CMI. Since the three two-way (MI) terms have already been selected in Step 1, a saturated CV information model selection in the four-variable case is essentially complete. In other words, each variable must appear at least once among the two-way terms of a saturated model before parsimonious selection, whether a common CV is in use or not.
Step 3: To confirm the selected MI and CMI terms, each one of the ten selected terms (including the four main-effect terms) must be individually and separately tested against the same nominal level, say 0.05 in common practice. The sum of all the insignificant (small) terms, among the ten, is taken as the tentative remainder, which is then tested against the total (chi-square) d.f. at the same nominal level. This step is asymptotically correct by the orthogonal likelihood decomposition. If this test is insignificant, then the tentative remainder is insignificantly small and is deleted as an insignificant residual, so that the tentative model is accepted. Otherwise, the remainder is significantly large (near or over the 95th percentile of the associated chi-square distribution) and the tentative model may lack sufficient information. In this case, a simple remedy is recommended. A maximal insignificant (or minimal significant) high-way CMI term can be replaced by its next term, the second-maximal insignificant (or second-minimal significant) CMI; then the same selection procedure is continued with the model renewed in Step 2. This remedy as a supplementary scheme is easily used in practice, because it can always choose the next insignificant (significant) term whenever needed, from high-way to low-way subtables, while modifying the information decomposition.
Step 4: To conclude a parsimonious model selection after performing the above three steps, it is often of extra interest, though not necessary, to test the summands of each selected CMI term, an interaction term and a conditional uniform association term (cf. (2.12)), under the same nominal level. Finally, the new remainder is likewise tested to be a possible residual, as in Step 3, to yield perhaps a most parsimonious model.
It is understood that a general scheme may select either a CV model (3.2) or a non-CV model (3.3). If there is no need to fix a CV of particular interest, Step 1 is bypassed, and Step 2 is generalized without requiring a fixed CV throughout the selection scheme; Steps 3 and 4 are kept unchanged. Thus, it is expected that such a general scheme often results in selecting a non-CV model, more balanced in selecting the variables among the CMI terms.
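The pooling logic of Step 3 can be sketched in code. The chi-square tail probability below uses the Wilson-Hilferty normal approximation (an assumption made here only to keep the sketch self-contained), and the term list is a hypothetical input of (name, statistic, d.f.) triples:

```python
import math

def chi2_sf(x, k):
    """Upper-tail chi-square probability via the Wilson-Hilferty
    normal approximation (adequate for model screening)."""
    if x <= 0:
        return 1.0
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def screen_terms(terms, alpha=0.05):
    """Step 3 of the selection scheme: test each decomposition term at the
    common level alpha, pool the insignificant ones into a tentative
    remainder, and test the pooled remainder against its total d.f."""
    keep, rem_stat, rem_df = [], 0.0, 0
    for name, stat, df in terms:
        if chi2_sf(stat, df) < alpha:
            keep.append(name)       # individually significant: retain in the model
        else:
            rem_stat += stat        # pool into the tentative remainder
            rem_df += df
    # remainder_ok False signals the Step 3 remedy: swap the chosen
    # high-way CMI for the next candidate and repeat from Step 2.
    remainder_ok = rem_df == 0 or chi2_sf(rem_stat, rem_df) >= alpha
    return keep, remainder_ok
```

With deviances such as 2I(C; H) = 28.74 and 2I(E; H) = 24.18 from Example 4.1 below, both terms are retained at the 0.05 level while a small pooled remainder against a large d.f. is deleted as residual.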
On the other hand, there are alternate ways of selecting a model like (3.2) or (3.3). As an alternative choice in Step 2, selecting a minimal insignificant high-way CMI term may be preferred, if any; otherwise, select the minimal (or maximal) significant high-way CMI term when all such CMI terms are significantly large. These alternatives to Step 2 may sometimes yield fewer high-way CMI terms compared to the original Step 2, particularly with high-way tables. However, they usually lead to selecting more terms in the end, and sacrifice model parsimony. It is remarkable that a principal idea behind formulating models (3.2) and (3.3) is to delete more high-way CMI terms effectively for model parsimony, compared to the deletion of high-way interaction terms as a common practice in the selection of a log-linear model. This will be illustrated below in Section 4, in particular Example 4.1. The above three-step or four-step model selection scheme can be easily extended to high-way contingency tables. An application to a five-way data table will also be illustrated in Section 4.

Applications to High-Way Tables
Four-way and five-way contingency data tables in the literature will be examined. The proposed information modeling and the four-step model selection scheme of Section 3 will be applied to both four-way and five-way data below. It is noteworthy that the proposed method yields easily interpretable and more parsimonious models compared to those obtained from hierarchical log-linear modeling.

Example 4.1. A Four-Way Contingency Table
A 3 × 3 × 3 × 3 four-way data frequency table (Agresti, 2002, Table 8.19) is used for a comparison study. The data consist of three-level individual opinions on each of four variables of government spending. These 81 cell frequency counts are listed according to the three levels, "too little", "about right" and "too much", defined for the variables environment (E), health (H), big cities (C), and law enforcement (L). The basic analysis of these data by log-linear modeling leads to the accepted model: deletion of the four-way interaction and all the three-way effects, that is, fitting the data to the sum of the four main effects and all six two-way effects (Agresti, 2002, Table 8.20). It is so fitted with the log-linear modeling because the deviance between this fitted model and the full model is evaluated to be 31.67, which approximately equals the 35th percentile of the chi-square distribution with 48 d.f. Readers may refer to the subsequent estimation of odds-ratio parameters for the log-linear model fit with all two-way effects.
The information modeling begins with finding the most relevant variable to be the CV, according to Step 1 of the selection scheme as illustrated in Section 3.2. It is found that the variable health (H) yields the greatest significant sum of two-way effects, which is 61.87 against the chi-square distribution with 12 d.f. It then easily follows by Steps 2 and 3 that the only significant terms in the information equation (3.2), using Y = H and X = E, are the two-way effects, with deviances 2I(C; H) = 28.74 and 2I(E; H) = 24.18. By Step 4, it can be easily checked that the deviance given by the sum of the residual insignificant CMIs is 71.59, which is close to the 76th percentile of the chi-square distribution with 64 d.f. A complete selection scheme in accordance with equation (3.2) for the four-way table is summarized in Table 1 below. Thus, a prototype of information model selection by equation (3.2) is tentatively concluded with the information model "four main effects plus the two-way effects {CH, EH}", with total fitted d.f. = 16, which is only half of 32, the fitted d.f. of the log-linear model with all two-way effects. This exhibits a basic advantage of information modeling: it usually selects a more parsimonious model with a more concise and simpler explanation compared to classical log-linear modeling. In case another variable is of interest, say environment (E), which may be closely related to budget spending and worth an investigation, the variable E can be taken as the CV. It also follows by equation (3.2) and the same selection scheme that a similar model is selected, "the main effects, plus the two-way effects {CE, EH}", for which the remainder deviance is 81.98, close to the 93.5th percentile of the chi-square distribution with 64 d.f. This provides a similar CV model selection and interpretation.
It can be easily checked from this four-way data that the other two categories, big cities (C) and law enforcement (L), may not be considered useful CVs, because the selected models would include at least one three-way CMI term, in addition to the two-way terms and main effects.
In case no particular variable is fixed as the CV, applying equation (3.3) and the same selection scheme (omitting Step 1) leads to a non-CV model, "{CE, HL}, plus the four main effects", for which the remainder deviance is 77.41, close to the 88th percentile of the chi-square (64 d.f.) distribution. This provides another equally parsimonious information model in which all four variables share the two-way effects in a pair of two-way terms, instead of all six two-way terms as used in the selected log-linear model (Agresti, 2002).
The conditional odds-ratio parameters, interval estimates, and deviances of the above three selected information models can be computed along with the selection schemes. For brevity, these calculations are not discussed here for each selected information model, and readers may contact the authors for details.

Example 4.2. A Five-Way Contingency Table
A dataset cross-classifying individuals according to five dichotomized factors had been studied in six publications prior to Goodman (1978, p. 112, Table 1). The purpose was to understand the association between the knowledge of cancer (good or poor, denoted by K) and the presence or absence of four other qualitative attributes: L (lecture), R (radio), N (newspaper), and S (solid reading). The factor of primary interest is the knowledge K, and hypotheses about the logits of K, plus estimates of the hypothesized effects, were examined by Goodman (1978, Tables 5 to 7) using hierarchical log-linear models. However, it is so far unknown in the literature whether there are specific criteria or schemes for selecting a definite, or tentatively entertained, parsimonious log-linear model for the present five-way data that had been much discussed prior to Goodman (1978). It is thus worth investigating the proposed selection scheme of linear information models, in contrast to hierarchical log-linear modeling, for the current five-way data.
According to Step 1 of Section 3.2, it is found that factor K has the maximal two-way effect with the other variables, which evidences that the study design was useful. Let factor K be the CV; a saturated information model, equation (4.1), can then be derived according to Step 2, and the selection scheme yields the selected model (4.2). For the five-way data, it is notable that the selected model (4.2) treats the variable K (knowledge) as a response variable. To summarize the data analysis, Table 2 exhibits the component MI and CMI values of the overall mutual information, most of which are significantly large, except two high-way CMI terms; the selection scheme confirms only a slight reduction of two CMI terms by equations (4.1) and (4.2). This presents a case in which the variables defined in the study are highly associated, and very little information reduction is possible, although the information decomposition in Table 2 yields model (4.2).
As a supplementary note, the possible choices of five-way information models may be enumerated like the previous discussion of four-way models based on equations (3.2) and (3.3). Equation (4.1) allows five ways of separating one variable from the other four, each of which includes a four-variable model of the form (3.2) as part of the whole model. Thus, the number of saturated five-way CV models is 720, and the number of all saturated five-way models is 25920. Extensions of equations (3.2), (3.3) and (4.1) to multi-way contingency tables appear to be straightforward.

Concluding Remarks
A short summary of the proposed linear information models and the selection scheme of Section 3 can be organized in a few remarks. The primary purpose of developing the linear information models is to recommend the natural factorization of the raw data likelihood, without additional operations on the data. The basic information identities of Section 2 are used to illustrate the advantages of using orthogonal information decomposition directly with the observed data likelihood, not through the adapted ANOVA. A basic drawback of the latter, the inevitably crossed and overlapped data information in the summands of the log-linear models, can be especially intricate with high-way tables; a three-way case was illustrated in the recent literature (Cheng et al., 2006). Essentially, this classical approach undermines the development of useful selection schemes among the hierarchical log-linear models.
The important advantages of linear information models are based on direct use of the data likelihood identities, as illustrated in Section 3 and exemplified in Section 4. The proposed model selection schemes, whether using a CV or not, are naturally born out of the data likelihood. It is the simplest method based on comparing the data deviances of any possible remainder terms against appropriate chi-square distributions. Thus, the proposed model identification and selection schemes offer a fundamental likelihood analysis with observed data. While useful information model selections still depend on interpreting the data through the choice of a certain CV (or no CV) by the experimenter, it is understood that no best or uniquely optimal model can be defined, whichever selection criterion may be designed. Given a natural selection criterion, such as the current proposal, further study may be needed to define optimal model parsimony and selection, together with selection criteria for other useful statistical applications.