Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women’s Liberation Data

Abstract: This article deals with the latent class analysis of models with error of measurement. For the case in which the latent variable is ordinal and the manifest variables are nominal, an approach to handling the restrictions in the latent class analysis of models with error of measurement is given using log-linear models. In this way, the ordinal nature of the latent variable is included in the analysis. Therefore, overall uncertainty is decreased and inferences become more precise. The new approach is applied to a women's liberation data set.


Introduction
Latent class analysis is frequently used in the social sciences and education. The main aim of the analysis is to explain the association structure between manifest variables by using unobserved variables, namely latent variables. Latent class analysis is the categorical analogue of factor analysis, used when the latent and manifest variables are categorical. Log-linear models are widely used for the analysis of contingency tables. It is possible to represent a latent class model as a log-linear model using conditional response probabilities. This representation is called the log-linear parametrization. It is a special case of Formann's linear logistic latent class analysis (Formann, 1992).
Error of measurement models are probabilistic versions of the Guttman scale (Guttman, 1950) and are considered restricted latent class models. In the log-linear parametrization of latent class models, the types of the manifest and latent variables are important because latent class models specialize according to the typology of the variables. For example, if the latent variable is metric and the manifest variables are nominal, the appropriate analysis is latent class analysis with linear restrictions or the use of nominal response models (Heinen, 1996). In the models with error of measurement considered here, the manifest variables can be nominal or ordinal, but the latent variable is ordinal. If the manifest variables are also ordinal, latent class analysis with ordinal classes is convenient. However, if they are nominal while the latent variable is ordinal, no latent class analysis has been proposed, and the latent variable is treated as a metric variable in this case (Heinen, 1996). This is an inappropriate approach for models with error of measurement. The type of log-linear model used is another issue. Heinen (1996) uses independence log-linear models for the latent class analysis of models with error of measurement, so the ordinal nature of the latent variable is ignored. Heinen (1996) also mentions the use of the logit form of column or row association models when the column or row variable of a two-way classification is metric, and the use of the logit form of uniform association models when both the latent and manifest variables are metric in a two-way classification. However, these models should be used for ordinal rather than metric variables, and their use for models with error of measurement is not noted. In this article, we propose using column or row effects log-linear models in the analysis of models with error of measurement over multi-way tables when the latent variable is ordinal and the manifest variables are nominal.
We revisit the women's liberation data set of Felling et al. (1987). It was also analyzed by Heinen (1996) using models with error of measurement. He treats the ordinal latent variable as if it were metric and uses an independence log-linear model. Our approach is better suited to this data set because it takes the ordinal nature of the latent variable into consideration through an appropriate log-linear model. We illustrate our approach on the data set, compare our results with those obtained by Heinen (1996), and evaluate the effects of both approaches for the models with error of measurement. With this application, we aim to promote the use of column or row effects models in the analysis of latent class models with error of measurement for data sets of this kind.
Section 2 explains the log-linear model and the necessary notation. Section 3 gives the basic latent class model. Section 4 explains models with error of measurement. The log-linear approach to latent class models is given in Section 5 for nominal and ordinal latent variables. An expectation-maximization (EM) algorithm for the estimation procedure is also presented in Section 5. In Section 6, the women's liberation data are analyzed by applying the given approach to the models with error of measurement.

The Log-linear Model
Log-linear models vary according to the type of the variables that form the contingency table. The classical independence model is used when all variables are nominal; if they are all ordinal, interaction models are used. If some variables are nominal and some are ordinal, row or column effect models are used. The notation given in this section is valid in all cases. Furthermore, it is assumed that effect coding is used. An alternative to effect coding is dummy coding, which sometimes makes the representations easier (Heinen, 1996).
Let $S_1$, $S_2$ and $S_3$ be nominal categorical random variables (r.v.'s) constituting an $R \times C \times K$ contingency table. The log-linear model is represented as follows:
\[ \log n_{ijk} = \beta + \beta^{1}_{i} + \beta^{2}_{j} + \beta^{3}_{k} + \beta^{12}_{ij} + \beta^{13}_{ik} + \beta^{23}_{jk}, \qquad (2.1) \]
where $i = 1, \ldots, R$, $j = 1, \ldots, C$, $k = 1, \ldots, K$; $\log n_{ijk}$ is the natural logarithm of the expected cell count corresponding to levels $i$, $j$ and $k$ of the first, second and third variables; $\beta$ is the normalizing constant; $\beta^{1}_{i}$, $\beta^{2}_{j}$ and $\beta^{3}_{k}$ are the main effect parameters of levels $i$, $j$ and $k$ of the first, second and third variables, respectively; and $\beta^{12}_{ij}$, $\beta^{13}_{ik}$ and $\beta^{23}_{jk}$ are the interaction effects of the corresponding levels of the first and second, first and third, and second and third variables, respectively.
Let $S_1$ and $S_2$ be nominal and $S_3$ be an ordinal categorical r.v. constituting an $R \times C \times K$ contingency table. Then the log-linear model as a row (column) effect model is represented as follows:
\[ \log n_{ijk} = \beta + \beta^{1}_{i} + \beta^{2}_{j} + \beta^{3}_{k} + \beta^{12}_{ij} + \tau^{13}_{i} \nu_{k} + \tau^{23}_{j} \nu_{k}, \qquad (2.2) \]
where $\tau^{13}_{i}$ and $\tau^{23}_{j}$ are the row (column) effect parameters of the nominal-ordinal interactions of the first and third, and the second and third variables, respectively; $\nu_{k}$ is the score value corresponding to level $k$ of the ordinal variable; and the definitions of the remaining elements of eq. (2.2) are the same as in eq. (2.1). Similar models are constructed according to the number of ordinal variables. See Agresti (2002) for more details.
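As a rough illustration of a row effects model of the kind described by eq. (2.2), the following Python sketch evaluates the log expected counts of a small table in which the third variable is ordinal. All parameter values, the table size and the helper names `log_expected_count` and `local_log_odds_ratio` are hypothetical and chosen only for illustration. The sketch checks a well-known property of such models: the log odds ratio formed from two rows and two adjacent ordinal levels depends only on the difference of the row effects and the distance between the scores.

```python
# Hypothetical parameters for a 2 x 2 x 3 table; S1 and S2 nominal, S3 ordinal.
beta0 = 1.0                        # normalizing constant
beta1 = [0.2, -0.2]                # main effects of S1 (effect coding, sum to zero)
beta2 = [0.1, -0.1]                # main effects of S2
beta3 = [0.3, 0.0, -0.3]           # main effects of S3
beta12 = [[0.05, -0.05],
          [-0.05, 0.05]]           # S1-S2 interaction
tau13 = [0.4, -0.4]                # row effects of S1 on the ordinal S3
tau23 = [-0.2, 0.2]                # row effects of S2 on the ordinal S3
nu = [1, 2, 3]                     # integer scores for the levels of S3


def log_expected_count(i, j, k):
    """Log expected count of cell (i, j, k) under the row effects model."""
    return (beta0 + beta1[i] + beta2[j] + beta3[k] + beta12[i][j]
            + tau13[i] * nu[k] + tau23[j] * nu[k])


def local_log_odds_ratio(j, k):
    """Log odds ratio for rows 0 and 1 of S1 and adjacent levels k, k+1 of S3."""
    return (log_expected_count(0, j, k + 1) - log_expected_count(0, j, k)
            - log_expected_count(1, j, k + 1) + log_expected_count(1, j, k))


# Under the model this equals (tau13[0] - tau13[1]) * (nu[k+1] - nu[k]).
ratio = local_log_odds_ratio(j=0, k=0)
expected = (tau13[0] - tau13[1]) * (nu[1] - nu[0])
```

With equally spaced scores the local log odds ratio is the same for every adjacent pair of ordinal levels, which is how the model carries the ordering information.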

Latent Class Models
There are two main variable types in the latent class model: manifest and latent variables. Manifest variables are observed directly and contain information about the latent variable, whereas latent variables are theoretical and are not observed directly. Let $X$ be the latent variable; using the notation of Section 2, $|X|$ is the number of unobserved latent classes. When the manifest variables come from $N$ r.v.'s, there are $|X| \prod_{\lambda=1}^{N} |S_\lambda|$ cells in the table that cross-classifies the manifest variables with $X$. Because the contingency table of interest is incomplete, the aim of latent class analysis can be regarded as completing the table.
The main assumption of latent class analysis is local independence. Although the manifest variables constituting the complete contingency table are associated with each other, if they are independent within each level of the latent variable $X$, this association structure is defined as local independence by Lazarsfeld and Henry (1968). For $N = 3$ and $\ell = 1, \ldots, |X|$, the basic unrestricted latent class model is as follows:
\[ \pi^{S_1 S_2 S_3 X}_{ijk\ell} = \pi^{X}_{\ell} \prod_{\lambda=1}^{3} \pi^{S_\lambda | X}_{g\ell}, \qquad (3.1) \]
where $g$ equals $i$, $j$ or $k$ if $\lambda$ equals 1, 2 or 3, respectively; $\pi^{S_1 S_2 S_3 X}_{ijk\ell}$ is the probability that a randomly selected individual is in cell $(i, j, k, \ell)$ of the incomplete table; $\pi^{X}_{\ell}$ is the probability that a randomly selected individual is in level $\ell$ of the latent variable, and these are called latent class probabilities; and $\pi^{S_\lambda | X}_{g\ell}$, which must sum to one over $g$, is the conditional probability of being in level $g$ of $S_\lambda$ given that the individual is in level $\ell$ of the latent variable (McCutcheon, 1987; Hagenaars, 1993; Heinen, 1996). Latent class probabilities, which must sum to one, describe the distribution of the latent variable within the observed measures. The number of latent classes determines the number of latent characteristics. Conditional probabilities reflect the degree to which a subject in a given latent class has the latent characteristic.
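The local independence structure can be sketched in a few lines of Python: the probability of a cell is the latent class probability multiplied by one conditional response probability per manifest variable. The numbers below are hypothetical (three binary manifest variables, two latent classes), and the names `joint_prob` and `manifest_prob` are introduced only for this illustration.

```python
from itertools import product

# Hypothetical latent class probabilities (must sum to one).
latent_probs = [0.6, 0.4]

# cond_probs[item][cls][g] = P(S_item = g | X = cls); each inner list sums to one.
cond_probs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.7, 0.3], [0.1, 0.9]],
]


def joint_prob(levels, cls):
    """Probability of manifest response `levels` together with latent class `cls`."""
    p = latent_probs[cls]
    for item, g in enumerate(levels):
        p *= cond_probs[item][cls][g]
    return p


def manifest_prob(levels):
    """Probability of the manifest response alone, summing over latent classes."""
    return sum(joint_prob(levels, cls) for cls in range(len(latent_probs)))


# The manifest probabilities over all 2^3 response patterns sum to one.
total = sum(manifest_prob(levels) for levels in product(range(2), repeat=3))
```

Summing `joint_prob` over the latent classes gives the probabilities of the observable table, which is the table that the latent class analysis tries to reproduce.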

Models with Error of Measurement
These models are variants of Guttman scaling. Items are ordered from the least difficult to the most difficult, so there is one correct ordering. In a Guttman scale, once an individual responds negatively to an easier item, she will respond negatively to all more difficult items. Therefore, Guttman scales are deterministic. This deterministic nature can be relaxed by allowing for measurement error. The resulting models are restricted latent class models. There are $|X| = t + 1$ response patterns, or latent classes, corresponding to $t$ binary items. In this case, some observations respond according to the $|X| = t + 1$ scale-type response patterns and the remainder respond according to the other $2^{t} - (t + 1)$ response patterns (McCutcheon, 1987; Heinen, 1996). Because the items are ordered, the corresponding response patterns form an ordinal latent variable. There are four main models with error of measurement.
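The counting argument above can be checked with a short Python sketch. It assumes, purely for illustration, that items are listed from easiest to hardest and that 1 codes a positive and 0 a negative response; the function name `scale_types` is hypothetical.

```python
from itertools import product


def scale_types(t):
    """Guttman scale types for t binary items ordered from easiest to hardest.

    Scale type c is positive (1) on the c easiest items and negative (0) on
    the remaining ones, so there are t + 1 scale types.
    """
    return [tuple([1] * c + [0] * (t - c)) for c in range(t + 1)]


t = 5
types = scale_types(t)
all_patterns = list(product((0, 1), repeat=t))

# Response patterns that can only arise through measurement error.
non_scale = [p for p in all_patterns if p not in types]
```

For the five items analyzed later in the article this gives 6 scale types and 26 remaining response patterns.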
Proctor's model: Proctor's model is the first probabilistic variation of the Guttman model and was proposed by Proctor (1970). In the model, the error rate is assumed to be the same over all items and scale types. So, there are equality restrictions on the conditional probabilities of the scale items for each latent class.
Item-specific error rate model: This model relaxes the assumption that all scale items have the same error rate (Clogg and Sawyer, 1981). Instead, it assumes a different error rate for each of the k items. In this model, there are equality restrictions on the conditional probabilities associated with each item and latent class over the response patterns.
True-type-specific error rate model: This model relaxes the assumption of Proctor's model that all scale types have the same error rate (Clogg and Sawyer, 1981). However, it assumes that the incorrect response probabilities of the items are the same within each latent class. To construct the model, equality restrictions are put on the conditional probabilities of the scale items for each latent class.
Lazarsfeld's latent distance model: The latent distance model was proposed by Lazarsfeld (1950a, 1950b). The main assumption of the model is that error rates are specific to the items rather than to the scale types. The error rates for incorrect and correct responses to an item are assumed to differ from each other, and this assumption holds for all scale types except the least and most difficult ones. To set up the model, the same equality constraints as in the item-specific error rate model are imposed on the conditional probabilities of the first and last scale types; for the rest, there are equality restrictions on the conditional probabilities for each level of the manifest variables and the corresponding latent classes.
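To make the role of the error rates concrete, the following Python sketch computes the probability of an observed response pattern given a true scale type when each item deviates from its true response with some error probability. It covers only Proctor's model (one common rate) and the item-specific model (one rate per item); the function name `response_prob` and all numerical values are hypothetical.

```python
from itertools import product


def response_prob(pattern, true_type, error_rates):
    """P(observed pattern | true scale type) under local independence.

    error_rates[m] is the probability that the response to item m differs
    from the response implied by the true scale type.
    """
    p = 1.0
    for observed, true, e in zip(pattern, true_type, error_rates):
        p *= e if observed != true else 1.0 - e
    return p


true_type = (1, 1, 0)       # positive on the two easiest of three items
pattern = (1, 0, 0)         # one deviation, on the second item

# Proctor's model: a single error rate shared by all items and scale types.
proctor = response_prob(pattern, true_type, [0.1, 0.1, 0.1])

# Item-specific error rate model: one error rate per item.
item_specific = response_prob(pattern, true_type, [0.1, 0.2, 0.3])

# For any true type the probabilities of all 2^3 patterns sum to one.
total = sum(response_prob(p, true_type, [0.1, 0.2, 0.3])
            for p in product((0, 1), repeat=3))
```

The true-type-specific model would instead index the error rate by `true_type`, and the latent distance model would allow different rates for the two directions of error on the middle items.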

Log-linear Approach to the Latent Class Models
The log-linear approach linearizes the latent class model. In the log-linear representation of (2.1), the log-linear model for the incomplete table is as follows for $N = 3$ and $\ell = 1, \ldots, |X|$:
\[ \log n_{ijk\ell} = \beta + \beta^{1}_{i} + \beta^{2}_{j} + \beta^{3}_{k} + \beta^{X}_{\ell} + \beta^{1X}_{i\ell} + \beta^{2X}_{j\ell} + \beta^{3X}_{k\ell}. \qquad (5.1) \]
The definitions of the log-linear parameters follow those of Section 2. For a nominal latent variable, the relation between the conditional probabilities and the log-linear parameters is obtained from eq. (5.1) as
\[ \pi^{S_\lambda | X}_{g\ell} = \frac{\exp\left(\beta^{\lambda}_{g} + \beta^{\lambda X}_{g\ell}\right)}{\sum_{g'} \exp\left(\beta^{\lambda}_{g'} + \beta^{\lambda X}_{g'\ell}\right)}, \qquad (5.2) \]
where the definition of $g$ is the same as in eq. (3.1). Hence, the conditional probabilities are expressed in terms of the log-linear parameters (Haberman, 1979; Heinen, 1996). In this way, restrictions on the conditional probabilities are imposed on the log-linear parameters or the design matrix, and the estimation process is easier than maximum likelihood (ML) estimation for the restricted latent class analysis.
In the log-linear representation of (2.2), the log-linear model for the incomplete table is as follows for $N = 3$ and $\ell = 1, \ldots, |X|$:
\[ \log n_{ijk\ell} = \beta + \beta^{1}_{i} + \beta^{2}_{j} + \beta^{3}_{k} + \beta^{X}_{\ell} + \tau^{1X}_{i} \nu_{\ell} + \tau^{2X}_{j} \nu_{\ell} + \tau^{3X}_{k} \nu_{\ell}, \qquad (5.3) \]
where $\nu_{\ell}$ is the score value of level $\ell$ of the latent variable, playing the role of $\nu_{k}$ in eq. (2.2). The definitions of the remaining elements follow those of Section 2. Heinen (1996) notes that, when the latent variable is ordinal and the manifest variables are nominal, no estimation method had been proposed for latent class models up to 1996. In addition, we have not come across any work on the subject in the literature. In this section, a log-linear approach is introduced for this case. The log-linear model for contingency tables including an ordinal variable is given in eq. (2.2). It is the same for the ordinal latent variable case, but row (column) effect models are used instead of the log-linear independence models. Interaction effects between nominal and ordinal variables are treated as row (column) effect parameters. Under these definitions, eq. (5.2) is modified for the case in which the incomplete table includes an ordinal variable corresponding to the latent variable. From eq. (5.3), the relation between the log-linear parameters and the conditional probabilities is found as follows:
\[ \pi^{S_\lambda | X}_{g\ell} = \frac{\exp\left(\beta^{\lambda}_{g} + \tau^{\lambda X}_{g} \nu_{\ell}\right)}{\sum_{g'} \exp\left(\beta^{\lambda}_{g'} + \tau^{\lambda X}_{g'} \nu_{\ell}\right)}. \qquad (5.4) \]
The restrictions imposed on the conditional probabilities are handled using (5.4) in the ordinal latent variable case through a row (column) effects log-linear model.
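A minimal Python sketch of the relation in eq. (5.4) is given below, assuming it has the usual multinomial-logit form in which the conditional probability of level g is proportional to the exponential of a main effect plus a row effect multiplied by the score of the latent class. The parameter values, the integer scores and the function name `conditional_probs` are hypothetical.

```python
import math


def conditional_probs(beta, tau, score):
    """Conditional probabilities of the levels of one manifest variable.

    beta[g] is the main effect and tau[g] the row effect of level g; `score`
    is the score of the latent class. The result is proportional to
    exp(beta[g] + tau[g] * score) and sums to one over g.
    """
    logits = [b + t * score for b, t in zip(beta, tau)]
    largest = max(logits)                     # subtract the maximum for stability
    weights = [math.exp(x - largest) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


beta = [0.5, -0.5]                # hypothetical main effects (effect coding)
tau = [-0.3, 0.3]                 # hypothetical row effects (effect coding)
scores = [1, 2, 3, 4, 5, 6]       # integer scores for |X| = 6 latent classes

# One row of conditional probabilities per latent class.
table = [conditional_probs(beta, tau, v) for v in scores]
```

Because a single row effect per level is multiplied by the class score, the conditional probabilities change monotonically across the ordered latent classes, which is how the ordering enters the model.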
Models including error of measurement contain an ordinal latent variable. Therefore, the use of log-linear models in the analysis of these models is possible through eq. (5.4). An appropriate analysis of the models with error of measurement can be made by converting the assumptions given in Section 4 into restrictions on the conditional probabilities and expressing them as restrictions on the log-linear parameters, namely restrictions on the elements of the design matrix. Heinen (1996) suggests using the independence model and (5.2) and imposing the restrictions on the log-linear parameters; however, in this case the ordinal structure of the latent variable cannot be reflected. In fact, omitting the ordinal structure of the latent variable means omitting the difficulty levels of the items.
Here the choice of score values is important, because the score values should reflect the ordinal structure correctly. An inappropriate choice can prevent the algorithm used to obtain the ML estimates from converging. Scores can be chosen as the proportion of observed individuals in each latent class, as the integers from 1 to $|X|$, or as $\nu_{k} - \bar{\nu}$, where $\bar{\nu}$ is the average of the relevant score values.
An EM algorithm is used to obtain the parameter estimates. Initial values, $\hat{n}^{0}_{ijk\ell}$, are determined in the E-step. Then the estimated cell counts are obtained using
\[ \hat{n}_{ijk\ell} = n^{\mathrm{obs}}_{ijk} \, \frac{\hat{n}^{0}_{ijk\ell}}{\sum_{\ell'} \hat{n}^{0}_{ijk\ell'}}, \]
where $n^{\mathrm{obs}}_{ijk}$ denotes the observed count of cell $(i, j, k)$. When summed over the levels of the latent variable, the estimated counts are equal to the observed counts. In the M-step, the estimated counts are treated as directly observed counts and $\hat{n}_{ijk\ell}$ is updated. In this step, the Newton-Raphson method is used to obtain the updated estimates. Then the E-step is revisited. This loop continues until the algorithm converges (Hagenaars, 1993).
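The E-step and M-step loop can be sketched in Python as follows. This is not the restricted procedure of the article: for brevity the sketch fits an unrestricted latent class model to binary items, so the M-step has a closed form and replaces the Newton-Raphson step needed for the restricted log-linear models. The data, the starting values and the function name `em_latent_class` are hypothetical.

```python
def em_latent_class(counts, n_items, n_classes, init_class, init_cond, n_iter=200):
    """EM for an unrestricted latent class model with binary items.

    counts: dict mapping a response pattern (tuple of 0/1) to its observed count.
    init_class[l]: starting latent class probability of class l.
    init_cond[l][m]: starting P(item m = 1 | class l).
    """
    pi = list(init_class)
    rho = [list(row) for row in init_cond]
    total = sum(counts.values())
    exp_counts = {}
    for _ in range(n_iter):
        # E-step: split each observed count over the latent classes in
        # proportion to the current fitted joint probabilities.
        for pattern, n in counts.items():
            joint = []
            for l in range(n_classes):
                p = pi[l]
                for m, y in enumerate(pattern):
                    p *= rho[l][m] if y == 1 else 1.0 - rho[l][m]
                joint.append(p)
            denom = sum(joint)
            for l in range(n_classes):
                exp_counts[(pattern, l)] = n * joint[l] / denom
        # M-step: treat the completed counts as observed and re-estimate.
        for l in range(n_classes):
            n_l = sum(exp_counts[(pattern, l)] for pattern in counts)
            pi[l] = n_l / total
            for m in range(n_items):
                positive = sum(exp_counts[(pattern, l)]
                               for pattern in counts if pattern[m] == 1)
                rho[l][m] = positive / n_l
    return pi, rho, exp_counts


# Hypothetical observed counts for three binary items.
counts = {(0, 0, 0): 40, (1, 0, 0): 10, (0, 1, 0): 8, (0, 0, 1): 9,
          (1, 1, 0): 7, (1, 0, 1): 6, (0, 1, 1): 5, (1, 1, 1): 35}

pi, rho, exp_counts = em_latent_class(
    counts, n_items=3, n_classes=2,
    init_class=[0.5, 0.5],
    init_cond=[[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]],
)
```

A fixed number of iterations is used here instead of a convergence check; a tolerance on the change in the parameter estimates could be added in the same loop.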

Analysis of women's liberation data
The data set considered is taken from Heinen (1996, p. 46). The data come from a Dutch study of sociocultural developments in the Netherlands. Felling et al. (1987) give detailed information about the study. There are five binary items:
1. Women's liberation sets women against men ($S_3$).
2. It's better for a wife not to have a job because that always poses problems in the household, especially if there are children ($S_2$).
3. The most natural situation occurs when the man is the breadwinner and the woman runs the household and takes care of the children ($S_4$).
4. It isn't really as important for a girl to get a good education as it is for a boy ($S_1$).
5. A woman is better suited to raise small children than a man ($S_5$).
The women's liberation data set is also analyzed by Heinen (1996) by means of scaling models with error of measurement, assuming that the latent variable is on a metric scale and using an independence log-linear model. We reanalyze the data using our approach, that is, models with error of measurement with row effects log-linear models. The EM algorithm mentioned in Section 5 was run until the absolute mean difference between successive estimated values of the parameters was less than $10^{-13}$.
Proctor's model: The restrictions on the conditional probabilities that give each scale item the same error rate over all items and scale types are those of eq. (6.1). Heinen (1996) notes that $G^2 = 108.62$ and $P = 0.00$; thus the model is not statistically significant. Here $G^2$ is the likelihood ratio statistic (Agresti, 2002, p. 24). Each conditional probability is written in terms of log-linear parameters using (5.4), and the restrictions given by (6.1) are applied to express the restrictions of Proctor's model in log-linear parameters instead of conditional probabilities. The same procedure is followed for the other models of interest. Here, $|X| = 6$ and $c$ is a constant score value. Hence, the design matrix consists of one column for the main effect of good education, five columns for the latent classes and one column for the row effect parameter. When $\nu_1 = 0.055$ and $c = 0.04762$, the error rate is obtained as 0.23, which is reported as 0.13 by Heinen (1996). The obtained latent proportions are 0.054, 0.155, 0.242, 0.118, 0.145 and 0.286, which are very close to Heinen's (1996) results. In addition, our $G^2$ and the corresponding P value are very close to Heinen's results.
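For reference, the likelihood ratio statistic used to judge the fit of the models is twice the sum, over the cells of the observed table, of the observed count multiplied by the logarithm of the ratio of observed to fitted count. A small Python sketch with hypothetical counts follows; the function name `likelihood_ratio_statistic` is an assumption of this illustration.

```python
import math


def likelihood_ratio_statistic(observed, fitted):
    """G^2 = 2 * sum(obs * log(obs / fit)); empty observed cells contribute zero."""
    return 2.0 * sum(o * math.log(o / f)
                     for o, f in zip(observed, fitted) if o > 0)


# Hypothetical observed and fitted counts for a table with four cells.
observed = [10, 20, 30, 40]
fitted = [25.0, 25.0, 25.0, 25.0]

g2 = likelihood_ratio_statistic(observed, fitted)

# A model that reproduces the observed counts exactly gives G^2 = 0.
g2_perfect = likelihood_ratio_statistic(observed, [10.0, 20.0, 30.0, 40.0])
```

Larger values indicate a worse fit; the P value is then taken from the chi-squared distribution with the residual degrees of freedom of the fitted model.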

Item-specific error rate model:
The restrictions on the conditional probabilities that assign a different error rate to each of the k items are those of eq. (6.2). For this model, $G^2 = 27.37$ and $P = 0.159$ when the latent variable is treated as metric (Heinen, 1996). The restrictions given by (6.2) are expressed in terms of log-linear parameters using (5.4). In this case, the design matrix has five columns for the main effects of the manifest variables, five columns for the latent classes and a column for each row effect parameter. It is taken as $m$ and $\tau^{5X}_{r}$, respectively. The constants $c_1$, $c_2$, $c_3$ and $c_5$ and the implementation conditions are determined in the same manner as in Proctor's model. Under our approach, $G^2 = 21.88$ with $P = 0.147$, which are close to those noted by Heinen (1996, p. 78). The five error rates are obtained as 0.0246, 0.1427, 0.4858, 0.4357 and 0.5766, and the latent proportions are found as 0.355, 0.091, 0.181, 0.044, 0.099 and 0.230. These error rates and latent proportions are very different from those reported by Heinen (1996, p. 81).

True-type-specific error rate model:
The restrictions on the conditional probabilities that make the incorrect response probabilities of the items the same in each latent class are those of eq. (6.3). For the true-type-specific error rate model, Heinen (1996) reports $G^2 = 92.70$ and $P = 0.00$. The restrictions given by (6.3) are expressed in terms of log-linear parameters using (5.4). For this model, the design matrix of the log-linear model consists of one column for the main effect of $S_5$, five columns for the latent classes and one column for the row effect parameter. We obtain $G^2 = 397.1248$ with $P = 0.00$, and the error rates are found as 0.126, 0.217, 0.357, 0.646, 0.408 and 0.448, respectively. The latent class probabilities are obtained as 0.077, 0.162, 0.227, 0.068, 0.179 and 0.287. Our $G^2$ is much greater than that reported by Heinen (1996, p. 78). This implies that we reject the significance of the true-type-specific error rate model more confidently with our approach.
Lazarsfeld's latent distance model: The design matrix contains five columns each for the main effects of the manifest variables and for the latent classes, and a column for each restriction. The model is concluded to be significant, with $G^2 = 20.11$ and $P = 0.0925$. The error rates are 0.416, 0.027, 0.033, 0.093, 0.158, 0.008, 0.154 and 0.0193. These error rates are very different from those reported by Heinen (1996, p. 80). The latent class probabilities are 0.278, 0.106, 0.156, 0.044, 0.118 and 0.297. The third and last latent class probabilities are close to those noted by Heinen (1996, p. 80). The type of log-linear model used affects the results and inferences.
In conclusion, the fits of Proctor's model and the true-type-specific error rate model to the women's liberation data are poor, while those of the item-specific error rate model and Lazarsfeld's latent distance model are statistically significant. Moreover, the latent class probabilities obtained by these two models are very close. The first response pattern has the greatest latent class probability according to both models. However, the highest error rates are seen for the last and first response patterns for the item-specific error rate and Lazarsfeld's latent distance models, respectively.
Proctor's model does not seem to be affected by the type of log-linear model. A possible cause is the small number of parameters in both kinds of log-linear model due to the equality restrictions of the model. For all models, the $G^2$ values and corresponding p-values are close in our analysis and in Heinen's. However, the error rates and latent class probabilities differ considerably between the two analyses for all models except Proctor's model. This shows the effect of including ordinality in the analysis. The differences are due to the loss of information when an independence log-linear model with a metric latent variable is used.
Throughout, $|\cdot|$ represents the number of levels of the enclosed categorical r.v. Hence $|S_1| = R$, $|S_2| = C$ and $|S_3| = K$.