Markov Chain Monte Carlo Methods for Inference in Frailty Models with Doubly-censored Data

Frailty models have become popular in survival analysis for dealing with situations where groups of observations are correlated. If the data comprise only exact or right-censored failure times, inference can be done by either integrating out the frailties directly or by using the EM algorithm. If there is both leftand right-censoring this is no longer the case. However the MCMC method of Clayton (1991, Biometrics 47, 467-485) can be easily extended by imputation of the left-censored times. Several schemes for doing this are suggested and compared. Application of the methods is illustrated using data on the joint failures of patients with fibrodysplasia ossificans progressiva.


Introduction
The usual aim of survival analysis is to examine the effect of covariate information, such as the application of a medical treatment, on time-to-event data.Sometimes the actual distribution of the event time, as characterized by the baseline survival function or the hazard rate, is of interest too.It is common for some observed times to be right-censored, for example when a patient drops out of a medical trial, so that the true event time T is known only to be greater than an observed time t.Occasionally left-censoring occurs, when we only know that T < t.Data are said to be double-censored when both kinds of censoring are present.An additional complication arises when the observations are not independent, for example when repeated observations are made on a single subject or on subjects in the same family.A frailty model attempts a parametric model of this within-group dependence by postulating a group-specific parameter, a "frailty", to account for each group's particular susceptibility to new events (Clayton, 1978;Vaupel, Manton and Stallard, 1979;Oakes, 1982;Clayton and Cuzick, 1985;Aalen, 1988).The group here may be a set of repeated failure times of a single subject, or single failure times of individuals who are related in some way.It is usual to assume a proportional hazards model, taking the hazard rate for the jth failure in group i to be of the form where h 0j (t) is a baseline hazard whose dependence on j might be expressible in terms of measured covariates (h 0j (t) = h 0 (t)e β x j ), or may be non-parametric (as in the case where each j repesents a different type of event), or a combination of both.See the next section for an example.The role of the frailty z i , or "log-frailty" u i = log z i , is thus to modify the hazard rate for the ith group, so that groups with higher frailties have a proportionally higher risk of failure.The frailty is regarded as a random effect.More complicated hierarchical structures can be modelled, but these are not discussed here.
The frailties are usually assumed to come from a "frailty distribution" f (z) belonging to some parametric family of distributions, for example the gamma or lognormal.For identifiability the mean frailty is fixed at 1, or the log-frailty at 0, so that the baseline hazard represents the average subject.The variance of this distribution represents how much frailty varies within the population, and may be of practical interest.A common choice for the frailty distribution is the gamma distribution with mean 1 (Clayton and Cuzick, 1985a) with density: where ν determines the frailty variance.This gamma family is convenient for estimation when all failure times t ij are either observed or right-censored, because of the form of the likelihood conditional on the z i : where H 0j is the cumulative hazard and δ ij the usual censoring indicator (0 for right-censored and 1 for uncensored).Each of the frailties z i can be explicitly integrated out of this expression with respect to the above gamma densities to give the observable log-likelihod.If a parametric form is assumed for the baseline survival function(s), inference concerning these parameters and the frailty parameter ν can be based on direct maximization of this observed log-likelihood.
Alternatively the z i can be regarded as missing or unobserved data and we can use the EM algorithm based on the "complete data" likelihood: The frailties z i enter linearly into that part of the log-likelihood which depends on the regression and gamma parameters, and the conditional expectation of each z i is also easy to calculate because its full conditional distribution is also gamma.Thus the E-step is very straightforward.There are a number of alternative schemes for implementing the M-step (Clayton and Cuzick, 1985b;Klein, 1992;Nielsen, Gill, Andersen and Sorenson, 1992).
When some observations are left-censored, this intoduces into the likelihood terms of the form (1 − e −z i H 0j (t ij ) ), so that the conjugacy with the gamma frailty model is lost, and so is the linearity in the complete-data log-likelihood.Thus the usual approaches to inference outlined above cannot be applied.
One alternative approach is penalized likelihood (McGilchrist and Aisbett, 1991;McGilchrist, 1993;Therneau, Grambsch and, Pankratz, 2000), treating the frailty densities in the complete-data log-likelihood above as penalty terms, and maximisng over the frailties rather than integrating them out.If a parametric model is assumed for the survival functions this method can be applied even with doubly-censored data (Jones and Rocke, 2002), although the properties of the estimates are uncertain.The semi-parameric Cox form of the model cannot be applied because there is no partial likelihood function for doubly-censored data.
Another alternative is to use Markov chain Monte Carlo (MCMC) methods, in which each of the parameters in the model is iteratively resampled using its conditional density given the current values of other parameters.It is helpful to visualize the parameters as nodes in a network.We visit each node in turn and resample its value by making a random drawing from its conditional distribution given the data and the values of all the other nodes.For an elementary introduction see Gilks, Richardson and Spiegelhalter (1996).This produces a Markov Chain which, when converged, gives us a sample from the full posterior distribution of any parameters of interest.From a theoretical perspective, this implies a Bayesain approach to inference.However in practice it has much in common with the marginal likelihood approach of integrating out nuisance parameters, particularly if non-informative priors are used.Again several alternative schemes are possible.Clayton (1991) uses Gibbs sampling to fit frailty models to clustered failure data with exact or right-censored observations, sampling iteratively from the full conditional distributions of H 0j , β, z and ν with H 0j as an independent-increments gamma process (Kalbfleisch, 1978).Sargent (1998) avoids the difficulty of modelling and sampling from H 0j by using the Cox partial likelihood.He recommends using Metropolis-Hastings steps except when exact conjugacy is available.
To extend either of these approaches to more complicated censoring schemes, it is natural to consider the imputation of censored data as an additional step in the MCMC sampling scheme, since the censored values can be regarded as additional nodes in the network and resampled from their full conditionals in the same way as the model parameters.Since right-censored data can already be handled it is sufficient to apply this only to left-or interval-censored values.Two imputation schemes have been proposed, by Satten (1996) and Satten, Datta and Williamson (1998), for frailty-less models with grouped or interval-censored data.In the first a Gibbs sampling scheme is used to generate rankings consistent with the data and covariate values, by choosing a convenient parametric model from the proportional hazards family.Because only the ranks are considered the distribution will not depend on the baseline hazard function.The generated rankings are then used in a stochastic approximation scheme to maximize the marginal likelihood based on the ranks.Satten, Datta and Williamson (1998) propose an alternative imputation scheme in which actual failure times are generated from an assumed parametric model, after which the imputed times are ranked and a rank-based procedure is used to estimate covariate effects.They suggest that this method is robust to mis-specification of the baseline distribution and has several advantages over the rank-imputation method.We adopt their second approach below.
In this paper we investigate and compare a number of different methods for imputing left-and interval-censored failure times, within an MCMC procedure for fitting a frailty model.For illustration purposes we use data on the use of accutane for treatment of fibrodysplasia ossificans progressiva (Zasloff, Rocke, Crofford, Gregory and Kaplan, 1998).The first method, following Satten, Datta and Williamson (1998), uses a parametric model to impute the failure times conditional upon the frailties and other parameters, then a rank-based procedure to generate new frailty and parameter values.The second is based on Clayton's (1991) scheme and imputes failure times from a Nelson-Aalen type estimate of the current H 0j .This method places all imputed times as equal to one of the observed times, so requires some uncensored observations.The third uses a smoothed estimate of the current H 0j .These are compared with a fully-parametric model, which also requires MCMC estimation.

Example
Fibrodysplasia ossificans progressiva (FOP) is an extremely rare and disabling genetic disorder characterized by progressive heterotopic ossification of soft tissues.Fibrous nodules appear in soft connective tissues and herald impending ossification at an anatomic site.Nodules mature to form permanent foci of heterotopic bone that bridge and immobilize adjacent joints.In a clinical study reported by Zasloff, Rocke, Crofford, Gregory and Kaplan (1998), twenty-one FOP patients underwent treatment with the drug accutane for varying periods.The status of each of eleven joints per patient before, during and after treatment was to be compared with data from an external control group of 40 patients, previously analyzed by Rocke, Zasloff, Peeper, Cohen and Kaplan (1994), to assess the efficacy of accutane treatment.There is both right-and left-censoring in each group, and the treatment covariate is time-dependent.For more detail see Jones and Rocke (2002).
To allow for within-patient correlation, we can model the hazard rate for the jth joint of the ith patient at age t as where Here t i is the age of patient i when treatment commenced and τ is the proportional reduction in hazard from using accutane.For patients in the control group τ ≡ 1.The cumulative hazard is then The contribution to the log-likelihood of an observed age t ij for the jth joint of the ith patient will be Where joints are in right-left pairs, it seems reasonable to assume that the hazard function is the same for each side; we can thus pool the data for right and left sides, allowing two observations per patient per joint, and these are assumed to be conditionaly independent given the frailties.This reduces the number of hazard functions to j = 7.Table 1 summarizes the amount of data available for each joint.Rocke, Zasloff, Peeper, Cohen and Kaplan (1994) examined the marginal distributions for the control group using both Weibull models and Turnbull's non-parametric estimator (Turnbull, 1974), and found that the Weibull model gives results similar to the Turnbull estimates.One approach then is to take: For the reasons given in the Introduction this model cannot be fitted directly or using the EM algorithm, but inference using MCMC methods is reasonably straightforward.The convenience of the gamma frailty distribution is lost, so we consider instead a lognormal frailty distribution, To perform an MCMC algorithm we can conveniently partition the model parameters into the Weibull hazard parameters {(ρ j , κ j ) : j = 1, . . ., 7}, the log-frailties {u i : i = 1, . . ., 61}, the treatment effect τ and the frailty variance σ 2 .The form of our model implies that the subject-specific parameters u i are conditionally independent given all other parameters, as are the joint-specific parameters (ρ j , κ j ).This makes the implementation of the MCMC resampling scheme easier, and ensures reasonably fast convergence.We now consider the form of the full conditional distributions for each parameter set.To do this we can just focus on those terms in the likelihood which contain the parameters of interest, combining them with our chosen prior distributions.Here we have used non-informative, or vague, priors for all parameters.
The log-likelihood for the Weibull parameters (ρ j , κ j ) can be taken as and with prior p(ρ j , κ j ) the full conditional posterior density is proportional to e (ρ j ,κ j ) p(ρ j , κ j ).Even without left-censored data this does not have a convenient form, but for a moderate number of observations per joint and with a vague prior this can be reasonably accurately approximated by a normal distribution based on the maximum likelihood estimates (ρ j , κj ).Thus to resample each (ρ j , κ j ) pair we can fit a Weibull distribution separately for each joint, then simulate from a bivariate normal centred at the MLEs with covariance matrix taken from the observed information matrix.An alternative would be to use Metropolis-Hastings sampling or an adaptive rejection method (see Gilks, Richardson and Spiegelhalter, 1996).
For each frailty z i the (penalized) log-likelihood, conditional on the other parameters, is (z i ) = j ij +log f (z i ), where f (z) is the frailty distribution.From equations (2.1) and (2.2) we can see that when there is no left-censoring, appropriate choice of f (z i ) makes this the log of a gamma density, but the left-censoring contributes terms of the form log(1 − e −z i H 0j (t ij ) ).Thus the posterior will not have a simple form.Moreover the number of terms in the log-likelihood is small (11 joints per patient) so the use of an asymptotic normal approximation is not sufficiently accurate.Examination of the shape of the full conditional likelihood functions, at typical values of the other parameters, reveals however that the logfrailties u i do give an approximate normal shape, and that this approximation is improved by using a lognormal frailty distribution, say u i ∼ N (0, σ 2 ).To sample u i from its full conditional posterior, we can use the rejection sampling approach of Zeger and Karim (1991) with the majorizing function a rescaled normal density based on the MLE.Again adaptive rejection or Metropolis-Hastings could be used instead.
Next consider the multiplicative treatment effect parameter τ .For these data there were no left-censored times during treatment because patients were being closely monitored, so the log-likelihood for τ has the form: where n ft is the total number of joints which failed during treatment (only two in this case), t i is the age at which treatment commenced, and the summation is over all joints which had not failed before the start of treatment.This has the form of a gamma log-density.If we use the non-informative prior p(τ ) ∝ 1/τ then the full conditional distribution from which we resample τ will be Gamma(n ft , Finally, to resample the frailty variance σ 2 , we note that the full conditional likelihood is based on the current z i values of the 61 patients, so is proportional to (σ 2 ) −61/2 e − z 2 i /σ 2 .If we take the usual vague prior p(σ 2 ) ∝ 1/σ 2 we find that z 2 i /σ 2 ∼ χ 2 61 .This is well approximated by a normal distribution, so we simulate from this normal distribution and then solve for σ 2 .
3. Fit the log-frailty u i separately for each subject i, holding other parameter values fixed.Use the fitted value and its estimated covariance in a proposal distribution for resampling each z i by rejection sampling.
4. Resample τ from the Gamma(n ft , , evaluated at the current values of the other parameters.

Resample σ 2 by dividing z 2
i by a random drawing from a χ 2 61 or N (61, 122) distribution.
6. Output all parameter values of interest.7. Repeat 2-6.8.After establishing convergence of the chain, summarize the resulting posterior distributions.
The above MCMC scheme was implemented in Fortran, with the results shown in Table 2 (Weib column) in the next section.Convergence was assessed using the CODA suite of programs (Best, Cowles and Vines, 1995).To facilitate comparison between lognormal and gamma frailty distributions the lognormal variance σ 2 was converted to a coefficient of variation parameter.
We now consider relaxation of the assumption of Weibull survival functions.This can be done by adding to the MCMC scheme an extra step in which the left-censored data is imputed from the current values of the model parameters.

Imputation of Censored Data
We consider here three schemes, the first (PL) using partial likelihood for the parameters and data imputation from a surrogate parametric model (based on the approach of Satten, Datta and Williamson, 1998).The second approach (H1) uses Clayton's (1991) MCMC method, with the addition of a data imputation step using the non-parametric hazard estimate.The third (H2) is similar but smoothes the hazard estimate by linear interpolation.Using imputed data t * ij for the left-censored times allows us to use the non-parametric methods developed for data which are either exact or right-censored.
In the PL approach, MCMC sampling for the frailties z i and treatment effect τ is done using the Cox partial likelihood as the likelihood component (see Sargent, 1998).Again there is no computational advantage in using a gamma frailty distribution, so a lognormal was used.The current parameter values are then used to fit baseline Weibull distributions to the data for each joint, given the frailties and treatment effects.This parametric fitting can be done either using the actual right-censored information or, when they become available, the current imputed times t * ij .To impute a new time t * ij which was left-censored at t we Figure 1: Imputation of event times t * for an observation left-censored at t, based on the current estimated survival curve (adjusted for covariates).Here u is drawn randomly from a Uniform(S i , 1) where S i is the estimated survival at the largest observed event times less than or equal to t.
adjust the corresponding Weibull distribution for the frailty and simulate from the truncated distribution as follows where ρ = ρ j z 1/κ j i , κ = κ j and u is a random drawing from a Uniform(0,1) distribution.Note that, although a distributional assumption is used for the missing data imputation step, the parameters of interest are generated nonparametrically.This should provide some robustness to mis-specification of the parametric survival curves.
To eliminate entirely the dependence on a parametric assumption for the survival curve, it is natural to ask whether the data imputation can be based on a non-parametric hazard estimate.If we use Clayton's MCMC approach we have at our disposal an estimated hazard function, based on the Nelson-Aalen estimator (Aalen, 1978), which is a step function with jumps at the observed event times, each jump drawn from a Gamma(α, λ) distribution where α is the number of "deaths" at that time and λ the total risk score, ie the total cumulative hazard for all joints of that type which were still at risk at that time.To impute a new time t * ij which was left-censored at t we adjust this hazard estimate for the frailty and simulate from the truncated distribution as shown in Figure 1.Denoting the observed event times by t 1 , t 2 , . . .t n and the corresponding estimated values of the survival function by S 1 , S 2 , . . .S n , we find the largest t i < t, draw u randomly from Uniform(S i ,1), and take t * = t k where S k is the largest value less than u.Note that this method does not add new event times: computationally we just increase by one the number of observed events at time t k .The fact that this hazard estimate is undefined beyond the largest observed event time is not a problem for this method: if an event is left-censored beyond this time it is assigned to one of the observed event times, with probabilities proportional to the jumps in current estimate of the survival function at those times.If on the other hand an event is left-censored at a time smaller than the earliest observed event time, it can be assigned t * = 0. Computationally this means that we should always include zero as a possible event time, possibly with S 0 = 1 initially.With the left-censored data imputed, the conjugacy of the gamma frailty model returns, so in implementing this method we can assume a Gamma(ν, ν) frailty distribution, ie with mean 1 and coefficient of variation ν −0.5 .The Gibbs step for each frailty then involves sampling from a gamma distribution.Sampling from the full posterior of ν is described by Clayton (1991).This second method (H1) has the clear disadvantage that it requires a considerable proportion of the data to be uncensored: otherwise we would have few event times each with many tied events.It would not work at all for intervalcensored data.An obvious extension is to smooth the hazard estimate so that t * is drawn from a continuous distribution.In the third method (H2) we use linear interpolation of the estimated cumulative hazard function.Events leftcensored before the earliest observed event time are now assigned times between zero and the censoring time.Unfortunately there is now a difficulty with events left-censored after the last observed event time.Such a situation might be supposed to be rare, but there were such in the data considered here: five joints for one individual which were right-censored at age 36.A number of ad hoc solutions are possible.One could take S(36) = 0.One solution is to choose a time T larger than all censored or uncensored times in the data and to take S(T ) = 0. We can then investigate the sensitivity of the solution to the choice of T .In our example two values, T = 40 and T = 100, produced no discernible differences in the posterior distributions.If the choice of T is material one might want to specify a prior distribution and include it in the MCMC algorithm, thus averaging over a range of values.The posterior distributions resulting from the four MCMC procedures are summarized in Table 2, giving the mean and the 2.5th and 97.5th percentiles for the treatment effect (τ ), the coefficient of variation of the frailty distribution (CV), and two of the individual frailties for subjects #24 and #52.The posterior distribution of the treatment effect (Figure 2) is similar for all four methods and it is clear that it is significantly less than one, indicating a significant reduction in hazard from the use of accutane.The distribution of the coefficient of variation of the frailty distribution is almost identical for the Weibull and partial likelihood methods, which both assumed lognormal frailties, but is somewhat different for the two methods which use the Nelson-Aalen hazard estimator and assume a gamma frailty distribution.This in turn affects the posterior distributions for the individual frailties.We focus here on two individuals, #24 for whom all data are uncensored and #52 for whom all data are either left-censored at 2.75 or right-censored at 7.75.In the former case only the smoothed Nelson-Aalen result is substantially different from the others; the latter case seems to be affected much more by the choice of frailty distribution, perhaps not surprisingly since there is less information in the data for this case.Finally we compare in Figure 3 the estimated survival curves for the elbow joint using the parametric Weibull model and the non-parametric Nelson-Aalen estimate.There is perhaps a suggestion in this that the Weibull model is underestimating the hazard in the early years.
The main parameter of interest however is the treatment effect.From the confidence intervals we can conclude that the use of accutane leads to a significant reduction in the incidence of joint failures of between 85 and 99%, a result similar to that found by Jones and Rocke (2002) using penalized profile likelihood.

Discussion
This paper examines the use of Markov chain Monte Carlo techniques for the estimation of frailty models in multivariate survival analysis, in the presence of both left-and right-censoring.Since right-censored observations are easily dealt with using existing methods, all that is required is for the left-and intervalcensored times to be imputed within an MCMC scheme.Three ways of doing this have been considered, and these have been compared with a fully-parametric Weibull model for a particular dataset.The results suggest that the estimation of fixed effects (here a treatment effect) is not greatly affected by the choice of method.Estimation of the frailty parameter, and the prediction of individual frailties, can vary considerably, but this appears to be due at least as much to the choice of frailty distribution as to the method of imputation used.These conclusions may well depend on the amount of censoring present.The method of imputation using an unsmoothed Nelson-Aalen estimator requires a reasonable number of exact death times, so would not be possible if, for example, all times were interval-censored.The convergence proprties of the smoothed estimator are unknown.A reasonable compromise solution might be the use of discrete time with a suitably sub-divided scale.

Figure 2 :
Figure 2: Posterior distribution of treatment effect τ using four different MCMC methods.

Table 1 :
Numbers of censored and uncensored observations for each joint type.

Table 2 :
Point estimates and confidence intervals for the treatment effect, the frailty cv and two individual frailties, for each MCMC method