Mixed-effect Models for Truncated Longitudinal Outcomes with Nonignorable Missing Data

Mixed effects models are often used to estimate fixed effects and variance components for continuous longitudinal outcomes. An EM-based estimation approach for mixed effects models with truncated outcomes was proposed by Hughes (1999). We consider the situation in which the longitudinal outcomes are subject to non-ignorable missing data in addition to truncation. A shared random effect parameter model is presented in which the missing data mechanism depends on the random effects used to model the longitudinal outcomes. Data from the Indianapolis-Ibadan Dementia Project are used to illustrate the proposed approach.


Introduction
This paper is concerned with estimating the effects of putative risk factors on cognitive decline in the elderly, which is the focus of many longitudinal studies, both epidemiological studies and clinical trials. Many cognitive assessment instruments currently used in dementia studies have an upper ceiling, owing to the limited time available for testing and to the fact that the instruments also function as screening or diagnostic tools for dementia, with much greater emphasis on sensitivity at the lower end of the scale. A well-known example of such an instrument is the Mini-Mental State Examination (MMSE), which has a ceiling of 30. More extensive and lengthy neuropsychological tests, however, have been shown to be normally distributed in a large cohort of subjects. When a shorter version of a lengthy test is given, many subjects who would have scored in the tails of the longer version score at either endpoint. Inference on cognitive function or decline is concerned with the "true" cognitive ability or change that would have been measured, rather than the observed score on the shorter version with a ceiling effect. The ceiling or truncation effect in neuropsychological tests was also observed by van Belle and Arnold (2000), although their focus was on measuring reliability.
Analysis of cognitive decline data is further complicated by the fact that selected groups of subjects may be missing some measurements, either by study design or by happenstance. For example, in large cohort studies of dementia, subjects previously diagnosed with cognitive impairment no dementia (CIND), an intermediate status between normal and dementia, are usually not re-screened with the cognitive test and may proceed directly to clinical evaluation. Missing data due to death or nursing home placement also raise the possibility of non-ignorable missing data in this situation.
When measurements with ceilings are used in regression analysis, the ordinary least squares (OLS) estimator that ignores the ceiling has been shown to be biased and inconsistent (Goldberger, 1981). There have been efforts to correct for the OLS bias in the regression model setting (Tsui, Jewell, and Wu, 1988). Hasselblad, Stead and Galke (1980) considered a univariate regression model with multiple truncation points using the EM algorithm. In the case of a single truncation point with normally distributed data, the model is sometimes referred to as the Tobit model in the econometric literature (Amemiya, 1984), after an earlier econometric application (Tobin, 1958). Little and Rubin (2002) called this type of truncation "non-ignorable missing data with known mechanism", since the truncation point is known in this situation.
When such outcomes are measured repeatedly and factors associated with change in the outcome over time are of interest, special methods are also needed to ensure unbiased inference. Hughes (1999) proposed an EM-based maximum likelihood approach for longitudinal data with truncated outcomes. Publications adopting the Hughes model are mostly concerned with modeling viral load in HIV and related laboratory data with various special mixed-effect models (Wu, 2002). Lyles (2000) considered the mixed-effect model in Hughes (1999) with the additional problem that the outcomes are also subject to informative drop-out. The authors adopted the approach of Schluchter (1992), jointly modeling the outcome variable and a log-normal survival model for the time each subject remained in the study. Thiebaut, Jacqmin-Gadda, Babiker and Commenges (2005) and Pantazis, Touloumi, Walker, and Babiker (2005) also considered the joint modeling of bivariate longitudinal data with a log-normal survival model for time to dropout. In this paper, we propose to use a binary survival model proposed by Wu and Carroll (1988) and previously adopted by Pulkstenis, Ten Have, and Landis (1998) and Ten Have, Kunselman, Pulkstenis, and Landis (1998) to model the incidence of dropping out in a shared random effect model approach. We present results from a simulation study and illustrate the proposed method using data from a community-based dementia study.

A Longitudinal Dementia Study
The Indianapolis Study of Health and Aging is one of two longitudinal cohorts in the Indianapolis-Ibadan Dementia Project, which is aimed at identifying risk factors for dementia, Alzheimer's disease and cognitive decline. The study population consists of 2212 African Americans aged 65 and older living in Indianapolis, USA, at study baseline. Study participants were evaluated at baseline and again at 2, 5, 8 and 11 years after baseline, with a two-phase design at each evaluation wave. At the first phase (screening phase), study subjects were interviewed at their homes with a questionnaire designed to evaluate their cognitive function. In addition, demographic information, family history of illness, medical history of the subject, consumption of alcohol and tobacco, and blood samples were collected during the screening interview. At the second phase (clinical phase), subjects selected by stratified sampling based on screening results received full physical and neurological examinations to determine disease status. Subjects who received a full clinical evaluation were classified as demented, CIND, or normal. Demented subjects were then followed using a separate protocol. The CIND subjects were allowed to skip the screening phase and proceeded directly to clinical evaluation at the next follow-up evaluation.
At each screening phase, study subjects were interviewed using the Community Screening Instrument for Dementia (CSID), a questionnaire designed for dementia screening in diverse cultural and educational backgrounds. The CSID consists of two parts: an interview with the study participant and an interview with an informant. The interview with the study participant assesses cognitive functioning, medical history, social involvement, and other putative risk factors. The interview with the informant assesses the study participant's cognitive functioning, activities of daily living (ADL) and functioning at work and in social relationships.
The cognitive test of the CSID includes a number of test items measuring multiple cognitive domains, including language, memory, orientation, judgment, comprehension and constructional praxis. Several neuropsychological tests, including the animal fluency test and the East Boston story, were also included. In this paper, we consider a total cognitive score created by summing correct answers from 40 questions, in which lower scores indicate more cognitive impairment. These 40 questions were administered at baseline and at each of the four follow-up waves. There is therefore an interest in investigating the patterns of cognitive decline and the factors associated with it. One particular factor of interest is the influence of education on cognitive decline. Many cross-sectional studies have reported that low education is a risk factor for poorer cognitive function. However, longitudinal studies have been inconsistent, with some finding no effect, suggesting that the effect seen in cross-sectional studies was due to biases in cognitive assessment favoring highly educated individuals.
The investigation of the effect of education on cognitive decline is complicated in the Indianapolis data by two facts. The first is the truncation of the CSID scores at 40. In Table 1, we show a significant difference in mean CSID cognitive scores at baseline between two groups defined by education level, namely those who had 6 or fewer years of education (low education group) and those who had 7 or more years of education (high education group). The cut-off of 6 years of education was chosen for this cohort in a previous report (Hall, Gao, Unverzagt, and Hendrie, 2000). The percentages of subjects who scored 40 also differed significantly between the two groups. The second complication is the substantial proportion of missing data at each of the follow-up evaluations. Only 425 (21%) of the 2028 subjects eligible at baseline had complete information for all evaluations. The numbers of subjects dropping out after baseline and after the 2-, 5- and 8-year evaluations were 370, 469, 497 and 267, respectively. In Table 2, we show significant differences between the education groups in the percentages of subjects with missing data at each follow-up evaluation. Subjects with low education were more likely to drop out of the study than those in the high education group. Statistical inference ignoring these two facts may lead to a distorted conclusion regarding the effect of education on cognitive decline. In the following section, we adopt a shared parameter model approach to model the longitudinal outcomes while accounting for both truncation and missing data. We note that our setting differs somewhat from the HIV laboratory test setting, where a biomarker value below the detectable range is considered censored. In our setting, we consider test responses at 40 to be truncated owing to the design of the questionnaire. We also assume that the test scores are measured without error.

Shared Random Effect Models
Let $y$ be the true cognitive outcome that would be measured if the instrument used for assessing cognitive ability did not have an upper ceiling, and let $y_{ij}$ denote the $j$th measurement from the $i$th subject. Instead of $y_{ij}$, we observe a pair of random variables $(Q_{ij}, T_{ij})$, where $Q_{ij}$ is the truncated response and $T_{ij}$ is the truncation indicator. Therefore, we have

$$y_{ij} = Q_{ij} \ \text{if } T_{ij} = 0, \qquad y_{ij} \ge Q_{ij} \ \text{if } T_{ij} = 1.$$

Hughes (1999) considered truncation at both the floor and ceiling by having three levels of $T_{ij}$. In our example data setting, we have a uniform truncation point, i.e. $T_{ij} = T$.
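The observation scheme can be illustrated with a minimal Python sketch. The function name `observe_with_ceiling` and the example scores are hypothetical; the ceiling of 40 matches the total cognitive score described in Section 2, and a score recorded at the ceiling is flagged as truncated.

```python
def observe_with_ceiling(y_true, ceiling=40.0):
    """Map a true score to the observed pair (Q, T).

    Q is the recorded score and T is 1 when the score sits at the ceiling,
    in which case the true score is only known to be at least Q.
    """
    truncated = y_true >= ceiling
    q = ceiling if truncated else y_true
    return q, int(truncated)


# Hypothetical true scores for one subject over three waves.
true_scores = [33.2, 40.0, 44.7]
observed = [observe_with_ceiling(y) for y in true_scores]
```

The last two waves are recorded identically even though the underlying scores differ, which is the information loss the likelihood has to account for.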

The longitudinal model
We assume the following mixed-effect model for the true cognitive outcome $y_{ij}$:

$$y_{ij} = X_{ij}\beta + Z_{ij}\gamma_i + e_{ij}, \tag{3.1}$$

where $X_{ij}$ is a $1 \times p$ vector of fixed effect covariates, $\beta$ is a $p \times 1$ vector of fixed effects, $Z_{ij}$ is a $1 \times q$ vector of random effect covariates, $\gamma_i$ is a $q \times 1$ vector of random effects with $\gamma_i \sim N(0, D)$, and the errors $e_{ij}$ are independent $N(0, \sigma^2)$.
Note that $y$ has a multivariate normal distribution with mean $X\beta$ and variance-covariance matrix $V = ZDZ' + \sigma^2 I$, where $X$ is the design matrix for the fixed effects, with the $X_{ij}$ as its rows, and $Z$ is the design matrix for the random effects, with the $Z_{ij}$ as its rows. Our interest is in estimating the fixed effect parameter $\beta$, the variance-component matrix $D$ and the random error parameter $\sigma^2$ using the observed data $Q_{ij}$ and $T_{ij}$. An EM algorithm can be used to derive parameter estimates from (3.1) when there are no missing data. We refer the reader to Hughes (1999) for details of the EM algorithm and concentrate instead on describing methods for dealing with nonignorable missing data.
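As a small worked example of the marginal covariance of a subject's outcomes, random-effect design times $D$ times its transpose plus $\sigma^2 I$, the sketch below builds that matrix for one subject with a random intercept only. The helper `marginal_covariance` and the variance values (4.0 and 1.0) are illustrative assumptions, not estimates from the study.

```python
def marginal_covariance(Z, D, sigma2):
    """Return V = Z D Z' + sigma2 * I as a list of rows.

    Z is an n x q list of rows (random-effect covariates for one subject),
    D is the q x q covariance matrix of the random effects.
    """
    n, q = len(Z), len(D)
    V = []
    for a in range(n):
        row = []
        for b in range(n):
            zdz = sum(Z[a][k] * D[k][l] * Z[b][l]
                      for k in range(q) for l in range(q))
            row.append(zdz + (sigma2 if a == b else 0.0))
        V.append(row)
    return V


# Random intercept only: three measurements, hypothetical variances.
Z = [[1.0], [1.0], [1.0]]
D = [[4.0]]
V = marginal_covariance(Z, D, sigma2=1.0)
```

With a random intercept, every pair of measurements from the same subject shares the covariance contributed by $D$, which is what ties a subject's repeated scores together.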

The drop out model
Let $R_{ij}$ be an indicator variable for a missing observation for the $i$th individual at the $j$th follow-up wave: $R_{ij} = 0$ if an outcome is observed, and $R_{ij} = 1$ if an outcome is missing. Using the framework of the shared random effect model, we assume the following model for the probability of a subject having missing data at evaluation wave $j$, conditional on this individual being followed up at evaluation wave $j - 1$:

$$\eta\{P(R_{ij} = 1 \mid R_{i,j-1} = 0, \gamma_i)\} = W_{ij}\alpha + \delta U_{ij}\gamma_i, \tag{3.2}$$

where $\eta$ is a link function for a generalized linear model, $W_{ij}$ is a $1 \times r$ vector of covariates, $\alpha$ is an $r \times 1$ vector of fixed effects, $\delta$ is a parameter for the random effect, $U_{ij}$ is a $1 \times q$ vector of covariates for the random effect, and $\gamma_i$ is the same subject-specific random effect defined in the longitudinal outcome model (3.1).
The shared random effect model appeals to biomedical researchers, who generally believe that there may be some latent, yet-to-be-measured quantity underlying a person's susceptibility to both cognitive decline and missing data due to adverse outcomes (nursing home entrance or death). In addition, the shared random effect model does not explicitly assume that missingness depends on the unobserved outcome; rather, missingness depends on a latent variable that is inherent in all outcomes from the same subject.
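To make the role of the shared random effect concrete, here is a minimal sketch of the conditional drop-out probability. It assumes a logit link and a linear predictor of the form $W_{ij}\alpha + \delta U_{ij}\gamma_i$; the function name `dropout_hazard` and all numeric values are hypothetical.

```python
import math


def dropout_hazard(w, alpha, delta, u, gamma):
    """Probability of being missing at a wave, given follow-up at the
    previous wave and the subject's random effect (logit link assumed)."""
    linear = sum(wi * ai for wi, ai in zip(w, alpha))
    linear += delta * sum(ui * gi for ui, gi in zip(u, gamma))
    return 1.0 / (1.0 + math.exp(-linear))


# Intercept-only covariates and a scalar random effect, for illustration.
w, alpha, u = [1.0], [-2.0], [1.0]
average_subject = dropout_hazard(w, alpha, delta=-1.0, u=u, gamma=[0.0])
high_scorer = dropout_hazard(w, alpha, delta=-1.0, u=u, gamma=[1.5])
low_scorer = dropout_hazard(w, alpha, delta=-1.0, u=u, gamma=[-1.5])
```

With a negative $\delta$, a subject whose random effect places them above the average cognitive score has a lower drop-out probability than an average subject, and a subject below average has a higher one.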

The Joint Likelihood
Without loss of generality, we assume that for each completely followed-up subject, the first $m_i$ of the total $n_i$ observed outcomes are not truncated. We also assume that for subjects who were lost to follow-up, the first $m_i$ observations were not truncated ($m_i \le n_i$), the next $n_i - m_i$ observations were truncated, and the last observation was unobserved. This can easily be achieved by rearranging the observations for each subject along with all relevant covariates.
Let $p_{ij} = \mathrm{Prob}(R_{ij} = 1)$. Assuming that $y_{ij}$ and $R_{ij}$ are independent given $\gamma_i$, the joint likelihood function of the observed $y_{ij}$ and $R_{ij}$ can be written as the product of three components, referred to as (3.3). The first product in (3.3) is the likelihood function for the regular mixed-effect model for measures with observed outcome and no truncation; the second product involves those subjects with observed, but truncated, outcomes; the third product is the probability of missing data conditional on the random effects. Various link functions may be considered in the missing data model. As an example, we take $\eta$ to be the logit link function.
Since the missing data model is built on the incidence function of the missing data variable, the marginal probability of $R_{ij}$ is defined through a recursive relationship across evaluation waves. Assuming the shared random effect parameter follows a normal distribution, the maximum likelihood estimates can be obtained using the numerical integration techniques offered in the NLMIXED procedure of the SAS software package. Program code for the implementation is available upon request. In the following section, we investigate the empirical properties of the shared random effect models and compare the proposed approach with other methods in simulation studies.
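The three-part structure of the joint likelihood can be sketched for a single subject. This is an illustration under stated assumptions rather than the authors' implementation: a scalar random intercept, a logit link, a drop-out linear predictor of the form $W_{ij}\alpha + \delta\gamma_i$, a discrete-time factorization of the missing data part, and a plain midpoint rule in place of a dedicated quadrature routine. All function and argument names are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def subject_likelihood(xb, q, t, wa_stay, wa_drop, sigma_e, sigma_g, delta,
                       ceiling=40.0, n_grid=400):
    """Marginal likelihood contribution of one subject.

    xb      : fixed-effect means X_ij * beta at the observed waves
    q, t    : recorded scores and 0/1 truncation flags at those waves
    wa_stay : W_ij * alpha at follow-up waves where the subject was observed
    wa_drop : W_ij * alpha at the wave of drop-out, or None if none occurred
    """
    lo, hi = -8.0 * sigma_g, 8.0 * sigma_g
    step = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        g = lo + (k + 0.5) * step                 # midpoint of the k-th cell
        value = norm_pdf(g / sigma_g) / sigma_g   # density of the random effect
        for mu, qj, tj in zip(xb, q, t):
            if tj:   # part 2: truncated score, true value at or above ceiling
                value *= 1.0 - norm_cdf((ceiling - mu - g) / sigma_e)
            else:    # part 1: untruncated score, ordinary normal density
                value *= norm_pdf((qj - mu - g) / sigma_e) / sigma_e
        for wa in wa_stay:   # part 3: remained in the study at these waves
            value *= 1.0 - expit(wa + delta * g)
        if wa_drop is not None:   # part 3: dropped out at this wave
            value *= expit(wa_drop + delta * g)
        total += value * step
    return total


# One untruncated score at its mean, no drop-out information (hypothetical).
example = subject_likelihood([30.0], [30.0], [0], [], None,
                             sigma_e=1.0, sigma_g=2.0, delta=0.0)
```

In this degenerate example the integral reduces to a normal density with variance $\sigma_g^2 + \sigma_e^2$ evaluated at its mean, which gives a simple check on the numerical integration.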
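One plausible form of such a recursion, assuming monotone drop-out (a subject missing at one wave remains missing afterwards), accumulates the conditional drop-out probabilities wave by wave. The function name `marginal_missing_probs` and the example hazards are hypothetical, and the exact recursion used in the paper may differ.

```python
def marginal_missing_probs(hazards):
    """Convert conditional drop-out probabilities h_j = P(R_j = 1 | R_{j-1} = 0)
    into marginal probabilities P(R_j = 1), assuming monotone drop-out.

    Recursion: P(R_j = 0) = (1 - h_j) * P(R_{j-1} = 0), starting from 1.
    """
    probs = []
    still_in_study = 1.0
    for h in hazards:
        still_in_study *= 1.0 - h
        probs.append(1.0 - still_in_study)
    return probs


# Hypothetical conditional drop-out probabilities at three follow-up waves.
marginal = marginal_missing_probs([0.25, 0.2, 0.5])
```

Here the marginal probabilities of being missing are approximately 0.25, 0.40 and 0.70.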

Simulation Results
We conducted a simulation to compare three different approaches to data with truncation and missing data. The first method is a so-called naive approach, in which a regular mixed-effect model is fitted to the longitudinal data ignoring both the truncation and the missing data. The second method accounts for the truncation process but ignores the missing data, and the third method uses the shared random-effect models to obtain maximum likelihood estimates from the joint likelihood function. We designed the simulation studies using the data structure of the Indianapolis Study of Health and Aging. The number of eligible subjects at baseline and their covariates were fixed to be those in the Indianapolis data set. We considered three covariates in the simulation. The first is an indicator variable for age: 0 for those 75 and younger and 1 for those over 75 years at each evaluation. This is a time-dependent variable. The second covariate is the dichotomized education variable discussed in Section 2. The third covariate is follow-up time in years since baseline.
We simulated the true longitudinal outcomes according to a mixed-effect model of the form (3.1), referred to below as model (1), where $\gamma_i \sim N(0, \sigma_g^2)$ and $e_{ij} \sim N(0, \sigma_e^2)$. All generated scores above 40 were then truncated to 40.
We also simulated missing data using a shared random effect missing data model, referred to below as model (2), where $\gamma_i \sim N(0, \sigma_g^2)$ and $\delta$ is the parameter controlling the degree and the direction of the "closeness" between the longitudinal model and the missing data process. We consider three scenarios in this paper. $\delta = 0$ indicates that missing data are not linked to the outcomes, hence the method considering truncation only is adequate. We also consider the scenario $\delta = -1$, in which subjects with higher cognitive scores are less likely to drop out, consistent with our dementia data. The last scenario we consider is $\delta = 1$, in which those with higher cognitive scores are more likely to drop out. Other parameter values used for generating the simulations are fixed to be: 0 and $\sigma_e^2 = 1.0$ for the longitudinal model (1), and $\alpha_0 = -2.0$, $\alpha_1 = 1.0$, $\alpha_2 = 1.0$, $\alpha_3 = 0.15$ for the missing data model (2). These parameters were chosen to make the truncation and missing data patterns similar to those in our dementia example data. Approximately 35% of subjects have truncated scores at baseline, and the percentages decline with time. Drop-out ranged from about 25% at the first follow-up to about 50% at the last follow-up.
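The data-generating steps can be sketched as follows. Several ingredients are assumptions made for illustration: the `BETA` values and `SIGMA_G` are hypothetical, since those values are not recoverable from the text; the assignment of $\alpha_1$, $\alpha_2$, $\alpha_3$ to age, education and time follows the order in which the covariates are listed; education is coded 1 for the low education group; and the baseline ages are invented. The $\alpha$ values, $\sigma_e^2 = 1.0$, the ceiling of 40 and the evaluation times come from the text.

```python
import math
import random

BETA = (38.0, -1.0, -2.0, -0.3, -0.2)   # hypothetical: intercept, age, edu, time, edu*time
ALPHA = (-2.0, 1.0, 1.0, 0.15)          # from the text: alpha_0 .. alpha_3
SIGMA_G = 2.0                           # hypothetical SD of the shared random effect
SIGMA_E = 1.0                           # from the text: sigma_e^2 = 1.0
TIMES = (0, 2, 5, 8, 11)                # evaluation waves, years since baseline


def simulate_subject(rng, baseline_age, low_edu, delta,
                     beta=BETA, alpha=ALPHA, ceiling=40.0):
    """Return the observed visits [(time, Q, T), ...] for one subject."""
    gamma = rng.gauss(0.0, SIGMA_G)     # shared by both models
    visits = []
    for t in TIMES:
        over_75 = 1 if baseline_age + t > 75 else 0   # time-dependent covariate
        if t > 0:
            # Drop-out model: conditional on being followed at the previous wave.
            lin = (alpha[0] + alpha[1] * over_75 + alpha[2] * low_edu
                   + alpha[3] * t + delta * gamma)
            if rng.random() < 1.0 / (1.0 + math.exp(-lin)):
                break                    # monotone drop-out assumed
        # Longitudinal model for the true score, then truncation at the ceiling.
        y = (beta[0] + beta[1] * over_75 + beta[2] * low_edu + beta[3] * t
             + beta[4] * low_edu * t + gamma + rng.gauss(0.0, SIGMA_E))
        visits.append((t, min(y, ceiling), int(y >= ceiling)))
    return visits


rng = random.Random(0)
cohort = [simulate_subject(rng, age, edu, delta=-1.0)
          for age, edu in [(68, 0), (72, 1), (80, 1)]]
```

Setting `delta=0.0` removes the link between the two models, which corresponds to the scenario in which accounting for truncation alone is adequate.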
We present simulation results in Tables 3, 4 and 5. To save space, we omit results for the $\delta = -1$ scenario, which are similar to those for $\delta = 1$. In Table 3, we present parameter estimates and percent bias, defined as $|\text{estimate} - \text{true parameter value}| / (\text{true parameter value})$. The naive estimates were consistently biased from the true parameters. The truncation only approach was adequate when $\delta = 0$ (missing at random), with minimal bias. When $\delta \neq 0$, the approach considering truncation and ignoring the missing data also showed some bias. If we focus on the parameter for the interaction, $\beta_4$, the truncation only method showed about 11% relative bias in estimating $\beta_4$, whereas the shared random effect approach had bias smaller than 1%. It should also be noted that the shared random effect model approach showed considerable bias in estimating the parameters of the missing data model (2), although the $\beta$ estimates for model (1) do not seem to be affected as a consequence. In particular, the shared random-effect model approach underestimated the magnitude of the $\delta$ parameter.
In Table 4, we present standard error estimates obtained from the information matrices of each likelihood function for the three methods, compared with the empirical standard errors of the estimates across the simulations. Our results demonstrate that these standard error estimates are fairly consistent with the empirical standard errors for all three methods. We then constructed 95% confidence intervals from the parameter estimates and standard error estimates, based on an asymptotic normality assumption, for each of the parameters in the two models. In Table 5, we present estimated coverage probabilities for the parameters in the two models using the three approaches. The naive method provided poor coverage for almost all parameters under all three scenarios, while the approach considering truncation but ignoring missing data provided adequate coverage when $\delta = 0$, as expected. However, when $\delta \neq 0$, the truncation only approach had poor coverage for some parameters, especially those involving time-dependent covariates, such as time and the time by education interaction. In addition, the truncation only approach also had poor coverage of the within-person variance parameter, $\sigma_g^2$.
The shared random-effect model approach provided adequate coverage for all parameters in the longitudinal model with the exception of the time covariate, for which the coverage was below the nominal 95% level. Coverage for the two variance component parameters was also excellent. However, the shared random-effect approach gave very poor coverage for the missing data model parameters, indicating that inferences on these parameters are not reliable and that better estimation approaches may be needed.
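For reference, the percent bias measure can be computed as in the sketch below. The function name `percent_bias` and the example estimates are made up for illustration and are not results from Table 3; the absolute value in the denominator is an added convenience so that the measure stays positive for negative parameters.

```python
def percent_bias(estimates, true_value):
    """Absolute difference between the mean estimate and the true value,
    expressed as a percentage of the true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * abs(mean_estimate - true_value) / abs(true_value)


# Hypothetical estimates of a slope whose true value is -0.2.
bias = percent_bias([-0.18, -0.17, -0.19], true_value=-0.2)
```

In this made-up example the mean estimate is about -0.18, so the percent bias is roughly 10%.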
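The two summaries used here, the empirical standard error and the coverage of normal-theory 95% intervals, can be sketched as follows. The function names and the four example replicates are hypothetical.

```python
import statistics


def empirical_se(estimates):
    """Standard deviation of the estimates across simulation replicates."""
    return statistics.stdev(estimates)


def coverage(estimates, std_errors, true_value, z=1.959964):
    """Share of replicates whose interval estimate +/- z * SE contains the
    true value; z = 1.959964 corresponds to a 95% normal-theory interval."""
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if est - z * se <= true_value <= est + z * se)
    return hits / len(estimates)


# Four hypothetical replicates of an estimate whose true value is 1.0.
estimates = [0.9, 1.1, 1.0, 1.6]
std_errors = [0.1, 0.1, 0.1, 0.1]
cov = coverage(estimates, std_errors, true_value=1.0)
se_emp = empirical_se(estimates)
```

Comparing `se_emp` with the average model-based standard error indicates whether the information-matrix standard errors are trustworthy, while `cov` shows whether the resulting intervals reach the nominal level.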

The Dementia Data
We also applied the three estimation approaches to the dementia data described in Section 2. We present parameter estimates, standard error estimates and p-values from the three methods in Table 6. Note that the shared random-effect model estimated $\delta = -0.3416$. Based on the simulation results, we believe that this may underestimate the magnitude of the true parameter value, and the estimate seems to confirm our observation that high cognitive functioning subjects are less likely to have missing outcomes. We illustrate the estimated trend of cognitive decline for the two groups defined by education level using the three methods in Figure 1. The truncation only method gave results very similar to those of the shared random-effect model approach, differing slightly toward the end of the follow-up period by estimating smaller declines for both groups. This difference is expected, because the truncation approach assumes missing at random for subjects who dropped out during follow-up, while the shared random-effect model approach assumes that the drop-outs are more likely to have worse cognitive functioning. The naive approach, however, differed from the other two approaches by estimating a much smaller decline, especially in the high education group, since it neglected the truncation in the scores. For our dementia data, all three approaches found the interaction between education level and time since baseline to be significant, providing evidence supporting an education effect on cognitive decline in this cohort. Therefore, application of the shared random-effect model did not alter the conclusions reached by the naive method or the truncation only method. However, it is conceivable that in other data sets with a weaker interaction, the shared random-effect model approach may lead to a conclusion different from that reached using the other two methods.

Conclusion
We propose to use the shared random-effect model for longitudinal data with both truncation and missing data. We showed that the maximum likelihood approach under correctly specified models can provide unbiased estimates of the parameters in the longitudinal outcome model. More research is needed to provide adequate estimation and inference procedures for the missing data model parameters.

and P30 AG10133. We thank Dr. Hélène Jacqmin-Gadda for her review and comments on an earlier version of the manuscript.

Table 1: Comparisons of baseline characteristics between the two groups defined by education levels

Table 2: Comparisons of percentages with missing data in the two groups defined by education levels. Data from the Indianapolis cohort of the Indianapolis-Ibadan Dementia Project

Table 3: Parameter estimates and percent bias (second entry) of model parameters assuming various missing data models in simulations

Table 4: Standard error estimates and empirical standard errors from simulations assuming various missing data models

Table 6: Results from the dementia data using the three different approaches