On the Estimation and Comparison of Lifetime Morbid Risks

Lifetime morbid risks are usually determined either by the Kaplan-Meier product limit estimator or by simpler estimators such as the lifetime prevalence, the Weinberg method or the Schulz method, which can be considered an elaboration of the Weinberg method. We show that the Kaplan-Meier product limit estimator of lifetime morbid risk may yield unreliable estimates. Although the simplicity of the Schulz method and the Weinberg method is appealing, we suggest that under a proper model, those methods can be replaced by the original Strömgren estimator, which is almost equally simple and more accurate. Increased accuracy is achieved when the investigators have prior information regarding the distribution of the ages at onset for those affected by the disorder, even when that information is vague and only limited knowledge of the distribution is available.


Introduction
The assessment of lifetime morbid risk is an integral part of the epidemiologic assessment of many diseases and disorders. Estimation of lifetime morbid risk is among the goals of many cross-sectional surveys, either in the population at large or in subpopulations such as particular socio-economic groups, families of affected individuals or individuals with other co-morbid diseases (e.g. Venkat Narayan et al., 2003; Wolfe et al., 2003; Seshardi et al., 2006; Lloyd-Jones et al., 2002; Feuer et al., 1993).
Our goal in this paper is to compare the various methods for estimating the lifetime morbid risk from cross-sectional survey data: the lifetime prevalence, the estimators of Weinberg, Schulz and Strömgren, the Kaplan-Meier estimator and the maximum likelihood (ML) estimator. The survey might be a random sample from the population at large or a focused study on a particular sub-population. In a common example of the latter type of study, the sub-population consists of affected individuals and the goal is to assess co-morbidities associated with the primary disease.
The estimators differ in the way that they exploit survey data to estimate morbid risk. Some of the estimators adjust for the onset ages of those subjects who have the disease, whereas others adjust for the age at the time the subject is interviewed. Some of the estimators take advantage of concomitant information on the distribution of age-at-onset among all individuals who contract the disease.
We examine the quality of the estimators under the assumption that the age-at-onset distribution is known and also under the assumption that only partial information on the distribution is available. Our results demonstrate that some of the most commonly used estimators can be seriously biased. The Kaplan-Meier method, although widely used (e.g. Pauls et al., 1995; Nicolson et al., 2003; Asarnow et al., 2001; Do Rosario-Campo et al., 2005), can yield disturbingly unreliable estimates of lifetime morbid risk. Moreover, simple and more accurate estimators are readily available.
We limit our study to lifetime morbid risk in a single population. An interesting and useful extension is to include group comparisons or general dependence on covariates. Kuk and Chen (1992) showed how the Kaplan-Meier approach to lifetime morbid risk could be used to address these questions. Their approach combines a proportional hazards model to account for the covariates with a nonparametric model for the baseline hazard.

The Estimators of Lifetime Morbid Risk
We begin with the simplest estimators of lifetime morbid risk: the lifetime prevalence and the Weinberg, Schulz and Strömgren estimators. Then we present the ML and Kaplan-Meier estimators.
Lifetime Prevalence. Some studies (e.g. Kringlen et al., 2001; Nestadt et al., 2000; Bienvenu et al., 2000; Black et al., 1992) report only the sample proportion of affected subjects, known as lifetime prevalence (e.g. McGuffin et al., 1994) and denoted by L-MR = $A/n$, where $A$ is the total number of affected individuals in a sample of size $n$. The lifetime prevalence typically underestimates the lifetime morbid risk, denoted by $p$, because at the time of the study some disease-free subjects will still be at risk of contracting the disease.
Weinberg and Schulz. Weinberg (1925, 1928) and Schulz (1937) proposed simple estimators of the lifetime morbid risk that can be used when the disease has an established risk period (say, 17-45 for a disorder like schizophrenia). These estimators also adjust for the ages of the subjects at the time of the interview. The Weinberg method is widely used in empirical studies (e.g. Reddy et al., 2001; Somanath et al., 2002, 2002a; Silverman et al., 1993). It is similar in form to the lifetime prevalence, but modifies the denominator to reflect whether subjects were affected or not and, among the latter, weights the observations according to age at the time of the interview. Let $U_1$, $U_2$ and $U_3$ be the numbers of unaffected subjects who were younger than the minimal risk period age, within the period of risk, or older than the maximum risk period age, respectively. The Weinberg estimator for lifetime morbid risk is given by

$$\text{W-MR} = \frac{A}{A + U_2/2 + U_3}.$$

The denominator, often referred to by the German term Bezugsziffer, or BZ, is meant to approximate the number of lifetimes at risk. Subjects who have not entered the risk period are not counted at all and unaffected subjects currently in the risk period are counted with weight 0.5. Schulz (1937) suggested a modification of the Weinberg method in which the weights for subjects in the risk period account for their specific age at the time of interview. The Schulz estimator is defined as the ratio $A/T$, where $T = \sum_i l_i$ and $l_i$ denotes the proportion of the risk period that the $i$-th individual has completed at the time of the interview. Unlike the Weinberg method, the Schulz estimator does not implicitly assume that the ages at the time of the interview are a random sample from the corresponding distribution of onset ages. However, it does implicitly assume that age at onset is uniformly distributed within the risk period, which can be very inaccurate and, as shown below, may yield substantially biased estimates.
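The three simple ratio estimators can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the convention that the completed share $l_i$ of the risk period grows linearly between its first and last age is an assumption (the text does not fix how the endpoints are treated).

```python
from typing import Sequence


def risk_period_fraction(age: float, start: float, end: float) -> float:
    """Share of the risk period [start, end] completed by `age`, clipped to [0, 1].

    Assumption: the share grows linearly with age between start and end.
    """
    return min(1.0, max(0.0, (age - start) / (end - start)))


def lifetime_prevalence(affected: Sequence[bool]) -> float:
    """Sample proportion of affected subjects, A / n."""
    return sum(affected) / len(affected)


def weinberg(ages: Sequence[float], affected: Sequence[bool],
             start: float, end: float) -> float:
    """A / (A + U2/2 + U3): unaffected subjects inside the risk period count 0.5."""
    a = sum(affected)
    u2 = sum(1 for age, y in zip(ages, affected) if not y and start <= age <= end)
    u3 = sum(1 for age, y in zip(ages, affected) if not y and age > end)
    return a / (a + 0.5 * u2 + u3)


def schulz(ages: Sequence[float], affected: Sequence[bool],
           start: float, end: float) -> float:
    """A / T, with T the summed shares of the risk period completed by all subjects."""
    t = sum(risk_period_fraction(age, start, end) for age in ages)
    return sum(affected) / t


# Toy data (made up for illustration), risk period 17-45.
ages = [10, 17, 31, 45, 60]
affected = [False, False, True, False, False]
print(lifetime_prevalence(affected),
      weinberg(ages, affected, 17, 45),
      schulz(ages, affected, 17, 45))
```

Note that only the Weinberg denominator depends on who is affected; the other two denominators are fixed by the interview ages alone.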

Strömgren.
The Strömgren method (Strömgren, 1935) and the modified Strömgren method (Strömgren, 1938) are used to estimate lifetime morbid risk in many psychiatric studies (e.g. Baron et al., 1985; Lenane et al., 1990). These estimators assume that the conditional age-at-onset distribution for affected individuals is known. The Strömgren denominator sums weights for each individual that reflect the proportion of risk for onset that the individual has experienced, rather than the proportion of risk period time. The original Strömgren estimator is defined as

$$\text{SO-MR} = \frac{A}{\sum_{i=1}^{n} D(a_i)},$$

where $a_i$ is the age at the time of the interview of the $i$-th individual in the sample, and $D(a_i)$ is the corresponding conditional probability of being affected by age $a_i$, given that an individual is affected. In general, as in Risch (1983), we discretize the onset-age distribution to the nearest year, and consider a period of risk of $J$ years beginning at age $\gamma_1$ and ending at age $\gamma_J$. We denote by $d_j$ the conditional probability that the disorder occurs at age $\gamma_j$, given that the disorder occurred during the lifetime, and by $D_j = d_1 + \cdots + d_j$ the cumulative conditional probability that the disease occurs by age $\gamma_j$, given that the disease does occur. For any age $\gamma_K \ge \gamma_J$, $D_K = 1$. The overall probability of disorder by age $\gamma_J$ is $p$, i.e. the lifetime morbid risk. At the other end of the scale, $D_L = 0$ for $\gamma_L < \gamma_1$. The original Strömgren estimator is unbiased and its standard deviation is (see e.g. Winokur et al., 1964)

$$\sigma(\text{SO-MR}) = \frac{\sqrt{\sum_{i=1}^{n} p\,D(a_i)\,[1 - p\,D(a_i)]}}{\sum_{i=1}^{n} D(a_i)}.$$
In some cases, especially when sample sizes are small and the lifetime morbid risk is large, the original Strömgren estimator can give estimates that exceed 1. To avoid this problem, Strömgren suggested setting the weights of the affected to 1, as in the Weinberg estimator, which gives the modified Strömgren estimator (Strömgren, 1938):

$$\text{SM-MR} = \frac{A}{\sum_{i=1}^{n} \left[ Y_i + (1 - Y_i)\,D(a_i) \right]},$$

where $Y_i$, $i = 1, 2, \ldots, n$, is a dichotomous variable which assumes the value one if the $i$-th individual is affected, and zero otherwise. The revised denominator assures that the modified Strömgren estimator is always less than or equal to 1, but also induces bias. An alternative estimator, which is slightly biased but in general less so than SM-MR, is obtained by setting the original Strömgren estimator equal to 1 whenever the computed value of SO-MR exceeds 1.
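A minimal Python sketch of the original, modified and truncated Strömgren estimators follows. The function names are hypothetical. The standard deviation assumes that the affected indicators are independent Bernoulli variables with success probability $p\,D(a_i)$, which is the model used for the maximum likelihood estimator.

```python
import math
from typing import Sequence


def stromgren_original(D: Sequence[float], affected: Sequence[bool]) -> float:
    """SO-MR: A divided by the sum of D(a_i) over all subjects."""
    return sum(affected) / sum(D)


def stromgren_modified(D: Sequence[float], affected: Sequence[bool]) -> float:
    """SM-MR: affected subjects get weight 1, unaffected subjects weight D(a_i)."""
    denom = sum(1.0 if y else d for d, y in zip(D, affected))
    return sum(affected) / denom


def stromgren_truncated(D: Sequence[float], affected: Sequence[bool]) -> float:
    """SO-MR set to 1 whenever the computed value exceeds 1."""
    return min(1.0, stromgren_original(D, affected))


def stromgren_sd(D: Sequence[float], p: float) -> float:
    """SD of SO-MR, assuming independent Bernoulli(p * D(a_i)) indicators."""
    var_a = sum(p * d * (1.0 - p * d) for d in D)
    return math.sqrt(var_a) / sum(D)


# Toy cumulative onset probabilities at each subject's interview age (made up).
D = [0.0, 0.5, 1.0, 1.0]
affected = [False, True, False, True]
print(stromgren_original(D, affected), stromgren_modified(D, affected))
```

In practice `p` is unknown when computing the standard deviation; as the text notes for the other ratio estimators, SO-MR can be substituted for it.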
Comparison of Ratio Estimators. The lifetime prevalence, the Weinberg estimator, the Schulz estimator and the original Strömgren estimator all have the form $A/\sum_i w_i$, where $w_i$ is a weight associated with the $i$-th subject. It is instructive to compare the weighting schemes. The Strömgren estimator assumes knowledge of the age-at-onset distribution and weights according to the probability of onset by the subject's age. The Schulz estimator weights by the proportion of the risk period and corresponds to a special case of the Strömgren estimator, in which the onset distribution is uniform throughout the risk period. The Weinberg estimator gives full weight to each affected subject and half-weight to unaffected subjects in the risk period, in effect assuming that the ages of those in the risk period are a random sample from the age-at-onset distribution.
For the lifetime prevalence and the Schulz estimator, the denominators are constants, so we can easily compute their expected values:

$$E(\text{L-MR}) = p\,\frac{\sum_{i=1}^{n} D(a_i)}{n}, \qquad E\!\left(\frac{A}{T}\right) = p\,\frac{\sum_{i=1}^{n} D(a_i)}{T},$$

where $A/T$ is the Schulz estimator. The lifetime prevalence has a clear negative bias. Following our comparison above of the weighting schemes, the bias of Schulz's estimator depends on whether the age at onset for affected individuals is early or late in the risk period.
The standard deviations of the lifetime prevalence and the Schulz estimator are obtained by multiplying $\sigma(\text{SO-MR})$ by the corresponding constants from the formulas for the expected values. The standard deviations can be estimated by substituting the unbiased estimator SO-MR for the unknown $p$.
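The expected values of the fixed-denominator estimators follow in one step if each affected indicator $Y_i$ is taken to be Bernoulli with success probability $p\,D(a_i)$, as in the maximum likelihood model. A short derivation under that assumption, where $w_i$ stands for any set of fixed weights:

```latex
E(A) = \sum_{i=1}^{n} E(Y_i) = p \sum_{i=1}^{n} D(a_i)
\quad\Longrightarrow\quad
E\!\left( \frac{A}{\sum_{i} w_i} \right) = p\,\frac{\sum_{i} D(a_i)}{\sum_{i} w_i}.
% w_i = D(a_i): the ratio is 1, so SO-MR is unbiased.
% w_i = 1:      the ratio is \sum_i D(a_i)/n \le 1, the negative bias of the lifetime prevalence.
% w_i = l_i:    the ratio is \sum_i D(a_i)/T, above or below 1 depending on the onset distribution.
```

The same multipliers scale the standard deviation, since dividing $A$ by a different constant rescales it without changing its distribution.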

Maximum Likelihood
The maximum likelihood estimator is based on the fact that the probability that subject $i$ is affected at the time of the interview is $p\,D(a_i)$. Assuming that the age-at-onset distribution is known and that the subjects are statistically independent, the maximum likelihood estimator (MLE) for the lifetime morbid risk $p$ can be found by solving the equation (Risch, 1983):

$$\frac{A}{p} = \sum_{i:\,Y_i = 0} \frac{D(a_i)}{1 - p\,D(a_i)},$$

where the sum runs over the unaffected subjects.
The MLE for $p$ (which we denote by MLp) depends on the ages at the time of interview for unaffected subjects, but ignores all age data, both interview and onset, for affected subjects. If the solution to the equation is greater than 1, or if all subjects are affected, MLp = 1. Risch (1983) shows that the ML method can also be extended to jointly estimate the lifetime morbid risk and the age-at-onset distribution.
The computation of MLp requires numerical methods. We briefly outline a simple method of solution in Appendix 1, where we also present formulas for computing an approximate standard error for MLp. Our simulation study also indicates that, at least on average, the differences between MLp and the unbiased estimator SO-MR are very small.

Kaplan-Meier.
In recent years the Kaplan-Meier (KM) estimator (Kaplan and Meier, 1958) has probably been the method most commonly used to assess lifetime morbid risks. The KM estimator provides a non-parametric estimate of the entire survival function. The KM estimator of the probability $S(g)$ of remaining disease-free at age $g$ is

$$\hat{S}_{KM}(g) = \prod_{j:\,g_j \le g} \frac{s_j}{r_j},$$

where $r_j$ and $s_j$ are the size of the risk set (i.e. the number of subjects who had potential to develop the disorder) and the size of the subset of the sample who remained disease-free, respectively, at age $g_j$. The associated KM estimator of lifetime morbid risk is

$$\text{KM-MR} = 1 - \hat{S}_{KM}(g_M),$$

where $g_M$ is the maximum onset age in the sample. Maller and Zhou (1992, 1994) consider use of the Kaplan-Meier method to estimate the proportion of immunes in a censored sample by $\hat{S}_{KM}(g_M)$. The proportion of immunes is the complement of the lifetime morbid risk and their estimator is the complement of the estimator KM-MR presented above. Maller and Zhou (1992, 1994) present properties of the estimator, including conditions that assure that it is a consistent estimator. Their results will obviously hold for KM-MR as well. The primary condition is the need for "sufficient follow-up", meaning that the data must include many censored times that exceed the maximum possible failure time. Maller and Zhou (1994) develop a non-parametric test to examine this question. They do not suggest using external data on the onset age distribution, but it would appear that such data could also be valuable in checking for sufficient follow-up.

The Sensitivity of the Kaplan-Meier Estimate for Lifetime Morbid Risks
The KM estimator was designed for survival analysis and estimates the entire survival curve. However, it is not necessarily ideal for estimating lifetime morbid risk. The KM method implicitly assumes that "death" will eventually occur for all subjects. Indeed, at the extreme, the lifetime mortality rate is obviously 1, regardless of the data. The lifetime morbid risk, on the other hand, is less than 1. Many diseases have well-defined risk periods, so theoretical hazards should be set at zero for ages outside that period. Obviously, the Kaplan-Meier estimator does not account for such prior knowledge.
The KM method is the only estimator of lifetime morbid risk that uses data on the onset ages of affected subjects. An important practical concern is thus the reliability of the recorded time of onset of a disorder. Often occurrence times are not exactly known, yet they have a strong influence on the KM estimator. We illustrate this phenomenon with data from a study of schizophrenia among the oldest siblings of probands with schizophrenia and OCD. Among 72 siblings, 3 were affected with schizophrenia, one reporting onset at age 20 and two at age 25. At the age of 20, 69 siblings had the potential to develop the disorder (3 were interviewed at age 19 and had no schizophrenia). At the age of 25, 53 siblings remained at risk. The Kaplan-Meier estimate of the probability of lifetime survival (i.e. no onset) is thus (51/53) × (68/69) = 0.9483, and the estimated lifetime morbid risk is 0.0517, or 5.17%. Now suppose that one or two of the subjects whose actual onsets were at the age of 25 reported onset at age 44. These changes increase the KM estimate of the lifetime morbid risk from 5.17% to 12.10% and to 17.87%, respectively. Most investigators are likely to feel uneasy knowing that the estimator used can be affected so dramatically by a change in the age of onset of just one or two individuals.
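The sensitivity can be reproduced with a small Python sketch. The individual interview ages are not reported here, so the censoring ages below are hypothetical. They are chosen only to match the reported risk-set sizes (69 at age 20 and 53 at age 25), and the count of 10 unaffected siblings still under observation at age 44 is back-solved from the reported 12.10% and 17.87%, so it is an assumption rather than a reported figure.

```python
def km_lifetime_morbid_risk(times, events):
    """Kaplan-Meier estimate of lifetime morbid risk, 1 - S(g_M).

    times[i]:  onset age if subject i is affected, else interview (censoring) age.
    events[i]: True if subject i is affected.
    Convention (an assumption): a subject censored at age g still counts
    in the risk set at age g.
    """
    surv = 1.0
    for g in sorted({t for t, e in zip(times, events) if e}):
        r = sum(1 for t in times if t >= g)                         # risk set at age g
        d = sum(1 for t, e in zip(times, events) if e and t == g)   # onsets at age g
        surv *= (r - d) / r
    return 1.0 - surv


def sibling_data(onsets):
    """72 siblings: 69 unaffected with hypothetical interview ages, 3 affected."""
    unaffected = [19] * 3 + [22] * 15 + [30] * 41 + [45] * 3 + [50] * 7
    times = unaffected + list(onsets)
    events = [False] * len(unaffected) + [True] * len(onsets)
    return times, events


for onsets in ([20, 25, 25], [20, 25, 44], [20, 44, 44]):
    times, events = sibling_data(onsets)
    print(onsets, round(100 * km_lifetime_morbid_risk(times, events), 2))
```

Moving an onset from age 25 to age 44 places the event in a much smaller risk set, which is why a single reported age has such a large effect on the product.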
In contrast to the Kaplan-Meier statistic, none of the other estimators presented above uses the ages at onset of the affected subjects.
For the example presented above with the period of risk from 17 to 45 years, Weinberg's estimator has $A = 3$, $U_1 = 0$, $U_2 = 65$, $U_3 = 7$ and W-MR = 7.06% for all three scenarios. The other estimators will depend on the specific distribution of age at the time of the interview, but not on the onset ages.
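As a check on the arithmetic, with affected subjects weighted 1, unaffected subjects inside the risk period weighted 0.5 and older unaffected subjects weighted 1, the quoted counts give:

```latex
\text{W-MR} = \frac{A}{A + U_2/2 + U_3}
            = \frac{3}{3 + 32.5 + 7}
            = \frac{3}{42.5} \approx 0.0706 .
```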

Robustness to the Onset-age Distribution
The calculations of SO-MR and MLp assume knowledge of the conditional distribution of the ages at onset (the $d_j$'s and $D_j$'s). These values are obviously never known exactly. However, for various diseases, national registries and results of previous studies provide good approximations. Moreover, as shown in the simulation (next section), even partial knowledge of the distribution may be used to improve estimation of the lifetime morbid risk. We represent partial information by adopting constant $d_j$'s for several consecutive ages in specific segments within the period of risk.
The estimator SO-MR becomes biased when the actual onset-age distribution differs from that assumed in calculating the estimator; see Appendix 2 for details. Our results can be used, for example, to bound the bias of the estimator by positing two "extreme" distributions for the onset-age distribution, one with the youngest onset ages and one with the oldest onset ages that seem plausible.

Comparison of the Estimators by Simulation
The performance of the estimators under various conditions was assessed in a large simulation study. The statistics assessed were the Kaplan-Meier morbid risk estimator, the Weinberg estimator, the original and the modified Strömgren estimators and the maximum likelihood statistic. The Schulz estimator is included as a special case of the SO-MR estimator when the age-at-onset distribution is taken as uniform (i.e. $d_j \equiv 1/J$, see below), which might be adopted as an approximation when there is limited information about onset ages.
The simulation study was designed as a 3 × 3 experimental design with three patterns of conditional distributions of onset age and three distributions of ages at the time of the interview. The period of risk was defined to be from age 17 to 45 (inclusive), which corresponds to the accepted period of risk for adult schizophrenia.
The simulations include as one pattern of ages the data from a recent family study (Poyurovsky et al., 2005) of 92 siblings of probands with schizophrenia. In the data set there were six affected siblings. For comparability, we used $n = 92$ for all our simulations. The (unknown) lifetime morbid risk used for the first set of simulations was $p$ = 10.67%, which is the Kaplan-Meier estimate of lifetime morbid risk in that family study. The subsequent simulations were performed under the same experimental design with various morbid risk values.
The patterns of ages of individuals at the time of the interviews were as follows:
(a1) Ages were generated from the Poisson distribution with mean λ = 24, censored from below at the age of 17.
(a2) The actual ages at the time of the interview from our study of siblings of schizophrenia probands with $n = 92$, with a mean of 32.54 and a standard deviation of 12.04. As can be observed from the ratio between the mean and the standard deviation, the distribution is far more dispersed than expected from the Poisson distribution.
The three interview age distributions were crossed with three different onset age distributions: (b1) early onset: the median of the ages at onset lies at 3/10 of the period of risk (in our case, at age 25); (b2) mid-onset: the median is in the middle of the period of risk (age 31); and (b3) late onset: the median lies at 7/10 of the period of risk (age 37).
The assigned $d_j$ values increased in equal steps up to the median of the distribution and decreased in equal steps afterwards. Thus, for the $J_1$ and $J_2$ years before and after the median ($J_1 + J_2 = J$), $d_j = c_0 + j c_1$ and $d_j = c_0 - c_1 J_1 - (j - J_1) c_2$, respectively. In the simulations, the values assigned to the $c_1$ slopes were 0.0083, 0.0044 and 0.0014 for the early-, mid- and late-onset distributions, respectively. The corresponding $c_2$ slopes were −0.0024, −0.0044 and −0.0050. The events "affected" and "not affected" are generated according to those actual probabilities.
We compute the estimator SO-MR using the actual $d_j$ and $D_j$ values and also using two cases of "partial knowledge". In the first case, we only assume knowledge of the median onset age for the specific disease, and set the risks to be constant at one level up to the median and at another level after the median, i.e. for $\gamma_j \le$ median we assume $d_{j,\mathrm{App1}} = 0.5/J_1$, while for $\gamma_j >$ median, $d_{j,\mathrm{App1}} = 0.5/J_2$.
The second case assumes no knowledge of the distribution, and sets equal risks such that $d_{j,\mathrm{App2}} = 1/J$ for each age. The probability of being affected is then proportional to the portion of the risk period completed by the age at interview. This assumption corresponds to Schulz's statistic. The App1 approximation can be used when the median onset is roughly known, whereas the App2 approximation might be adopted when there is no information.
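The two partial-knowledge approximations are easy to construct explicitly. The sketch below uses hypothetical function names and represents an onset distribution as a dictionary from age to conditional onset probability; counting the median age itself among the $J_1$ "before" years is an assumption.

```python
def two_level_onset(start, end, median):
    """App1: half of the probability spread evenly over ages <= median,
    the other half spread evenly over ages > median."""
    ages = list(range(start, end + 1))
    j1 = sum(1 for g in ages if g <= median)   # years up to and including the median
    j2 = len(ages) - j1                        # years after the median
    return {g: (0.5 / j1 if g <= median else 0.5 / j2) for g in ages}


def uniform_onset(start, end):
    """App2: equal onset probability 1/J at every age of the risk period."""
    j = end - start + 1
    return {g: 1.0 / j for g in range(start, end + 1)}


def cumulative(d, age):
    """D(age): probability that onset has occurred by `age`, given that it occurs."""
    return sum(prob for onset_age, prob in d.items() if onset_age <= age)


app1 = two_level_onset(17, 45, 25)   # early onset, median at age 25
app2 = uniform_onset(17, 45)
print(cumulative(app1, 31), cumulative(app2, 31))
```

Either dictionary can be turned into the weights $D(a_i)$ of the Strömgren denominator by evaluating `cumulative` at each interview age.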
The simulations are intended to assess the deviations of the various statistics from the (unknown) actual lifetime morbid risk.
Based on the generated data on the individuals in the sample, i.e. (i) their age at the time of the interview, (ii) whether the individual is affected, and if so, (iii) the age of onset, we calculated the sample statistics that assess the lifetime morbid risk.
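One way to generate such data is sketched below in Python. The paper does not describe its generator in detail, so the two-stage draw used here is an assumption: each subject is first marked as someone who would ever contract the disorder with probability $p$, and then an onset age is drawn from the conditional distribution. This makes the probability of being affected at interview equal to $p\,D(a_i)$. The function name and data layout are hypothetical.

```python
import random


def simulate_sample(ages, d, p, rng):
    """Return one simulated sample as (interview_age, affected, onset_age) triples.

    ages: interview ages; d: dict mapping onset age -> conditional onset
    probability (summing to 1); p: lifetime morbid risk; rng: random.Random.
    """
    onset_ages = list(d.keys())
    weights = list(d.values())
    sample = []
    for age in ages:
        onset = None
        if rng.random() < p:  # subject would contract the disorder at some age
            candidate = rng.choices(onset_ages, weights=weights, k=1)[0]
            if candidate <= age:  # onset has already occurred by the interview
                onset = candidate
        sample.append((age, onset is not None, onset))
    return sample


# Example: uniform onset over ages 17-45, n = 92 subjects all interviewed at age 30.
uniform = {g: 1.0 / 29 for g in range(17, 46)}
sample = simulate_sample([30] * 92, uniform, 0.1067, random.Random(1))
print(sum(1 for _, affected, _ in sample if affected), "affected out of", len(sample))
```

Subjects whose drawn onset lies beyond their interview age are recorded as unaffected, which is exactly the censoring that the estimators must correct for.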
The results for the first set of simulations, with $p$ = 10.67%, are summarized in Table 1. The rows in the three panels of the table are the 3 × 3 categories of patterns of age at onset within the risk period and patterns of interview ages. The first panel of the table presents the means and the standard deviations of the various statistics. The second panel presents the ratio of the mean to the population value of 10.67%. Note that when the median onset is in the middle of the period of risk the two approximations coincide, i.e. $d_{j,\mathrm{App1}} = d_{j,\mathrm{App2}}$ (see Table 1).
The results obtained with the same estimators for the other simulated values of $p$ ($p$ = 5%, 50% and 80%) can be found in Table 2. For the same values of $p$, Table 3 compares the results obtained by the original Strömgren estimator with those obtained by the maximum likelihood method for both exact and partial knowledge of the age-at-onset distribution.
The results clearly indicate that whenever accurate or partial information on the distribution of the age at onset is available, incorporating it in the analysis can substantially improve the accuracy of the estimate of lifetime morbid risk. As expected, under the actual $D_j$'s the means of the unbiased SO-MR statistic deviate only slightly from the lifetime morbid risk in the population (10.67%). In this case, the value of SO-MR did not exceed 1 in any of the simulations. When the actual distribution of the age at onset is known, the modified Strömgren estimator is biased downwards.
The lifetime prevalence has a clear negative bias, except in the sample of older persons with early onset, where most subjects are interviewed at ages with little remaining risk. The Weinberg method is usually biased downward in the younger sample (26%, 40% and 79% of the population value for late-, mid- and early-onset, respectively) and upward in the older sample (133% and 150% of the population value for mid- and early-onset, respectively). The level of accuracy of the Schulz method (which is identical to the values achieved by SO-MR under the uniformity assumption) depends mainly on the distribution of onset (as the distribution of ages is well accounted for). The method yields downward biased estimates for late onset (between 50% and 80% of the population value) and upward biased estimates for early onset (between 130% and 160% of the population value). The Kaplan-Meier method performed poorly in the cases when the subjects are relatively young with respect to cumulative risk. In those cases the biases can be very substantial. With late onset the estimate is about 33% of the population value, and for the mid-onset case it is about 60% of the population value. The negative bias in these cases is not as severe as with the lifetime prevalence or the Weinberg method. When the subjects' ages are in the other categories, the Kaplan-Meier method performed well.
We also observe from Table 1 that even quite vague prior knowledge about the onset age distribution leads to a good estimator. Indeed, under the App1 approximation, which only assumes knowledge of the median and sets the risks to be uniform in the two segments, the resulting SO-MR statistic is superior to all the statistics that make no assumption about the distribution of the age at onset (including the Kaplan-Meier estimator). This finding may have an even more important practical implication than those derived from the case of perfect knowledge of the age-at-onset distribution.
The results from Table 2 suggest that the performance of the statistics is not very sensitive to the level of the lifetime morbid risk. Indeed, the results for $p$ = 5%, 50% and 80% are similar to those from Table 1 (where $p$ = 10.67%). A minor exception is observed in the performance of the Weinberg statistic for older subjects with early and mid-onset. The statistic performs considerably better when $p$ = 50% or 80% than when $p$ = 5% (where the results are comparable to those in Table 1). Another technical difference occurs when the morbid risk is very high ($p$ = 80%). In this case, the occurrences of estimated morbidity exceeding 1 are not negligible, and the modified Strömgren estimator is superior in a limited number of cases. However, the truncated original Strömgren estimator is superior in the majority of the cases that we examined.
The results in Table 3 show only minor differences between the original Strömgren estimator and the maximum likelihood estimator, both when the age-at-onset distribution is known and in the partial information case. As mentioned, the drawback of the MLp statistic is that its computation requires numerical methods. The maximum likelihood estimator is, however, to be preferred when both the lifetime morbid risk and the distribution of the age of onset are to be estimated simultaneously.
It is worth recalling that in specific samples, the deviations from the actual values may be even more pronounced than those in Tables 1-3, which present means and standard deviations of 1,000 samples (each sample with $n = 92$). We illustrated such deviations in the section on the sensitivity of the Kaplan-Meier estimate, in which we perturbed one or two ages at onset. Thus, it seems even more important to use statistics that are robust to such fluctuations, which can result from incorrect reporting.

Discussion
Morbid risk is characterized by specific features that differ from those that characterize mortality risk. The period of risk is usually well defined and the hazard rate often does not increase monotonically with age. Furthermore, the lifetime morbid risk is far from being 100%.
We have shown that incorporating the onset age distribution can improve the estimation of the lifetime morbid risk. As mentioned, for various diseases there are reasonably reliable data on the distribution of the ages at onset for those who suffer from the disorder. Furthermore, for some disorders, the level of knowledge is even more detailed. As an example, Suvisaari et al. (1998) present distributions of the age at onset of schizophrenia specific for gender and the degree of familial loading.
As a first conclusion from the simulation study, it is clear that, although the simplicity of the Weinberg and the Schulz methods is certainly appealing, those methods can be replaced by the original Strömgren method, which is almost equally simple and yields far more accurate estimates. Furthermore, the well-known bias of the modified Strömgren estimator (e.g. Thomson and Weissman, 1981) is illustrated in the simulations. It is shown that the modification of the original Strömgren estimator is in general inadvisable. At least in medium and large samples, when the lifetime morbid risk is not very large, the probability that the original method will lead to risk estimates greater than 1 is very low.
The results of the study show that the Kaplan-Meier method is in general inferior to the original Strömgren method in assessing the lifetime morbid risk. The Kaplan-Meier estimator gave poor results when the subjects are young relative to likely onset ages. Our simulation results match the theoretical study of Maller and Zhou (1992, 1994). They found that the KM estimator was prone to poor performance when the data lack sufficient follow-up, i.e. when the non-affected individuals are relatively young compared to the oldest affected individuals. The problem of insufficient follow-up is most severe in our simulations with young subjects and the late onset distribution. A further drawback of the KM estimator is its sensitivity to mis-reporting of onset ages for affected subjects, whereas the original Strömgren method does not require these ages. Given the wide use of the Kaplan-Meier estimator in assessing lifetime morbid risk, the results on its relative inferiority may have considerable practical significance.
The SO-MR and MLp statistics incorporate the onset age distribution among the affected in the estimation of lifetime morbid risk. Given knowledge of the actual onset age distribution, the SO-MR and MLp statistics are unbiased and maximum likelihood estimators, respectively, of the lifetime morbid risk.
In the single sample case, we found that the SO-MR and MLp statistics yield very similar results. The MLp statistic requires iterative computation, so the SO-MR statistic has a clear advantage in this respect. However, as suggested by Risch, the MLp statistic can be used to estimate simultaneously the morbid risk and the age of onset distribution. This is certainly a desirable feature and may be useful when the distribution of age of onset for the studied population is not available or is unreliable. However, if the sample size is not very large, given the many parameters to be estimated, we suggest that one should proceed with caution in using such estimates. Also, it may be advisable to fit appropriate models for the distribution of the age of onset, to avoid cases of estimated parameters which may be in disagreement with clinical observation. As an example, in the analysis of the data of Winokur et al. (1964) on major affective illness, the MLEs for the probability of onset in six discrete 10-year age intervals were 8%, 15%, 30%, 19%, 7% and 21%, corresponding to a multimodal distribution of age at onset. The drop in the 50-59 age group from 19% to 7%, followed by an increase to 21%, may contradict clinical observation. To avoid such occurrences, the use of a parametric distribution or a smoothing procedure may be advisable.
Furthermore, this study has shown that even if the onset information is imperfect, very simple approximations still yield good estimators. At least under the first approximation, which assumes only knowledge of the median of the onset age distribution, the results are superior both to the Kaplan-Meier estimator and to the ratio estimators whose denominators account only for the periods of exposure of the individuals in the sample and the age range of risk, but not for the distribution of age of onset within the period of risk. In extreme cases, when a uniform distribution is assumed, the Kaplan-Meier statistic may yield more accurate estimates. This happens, for example, in Table 1 when we use an uninformative distribution for age at onset (the second approximation) and the actual distribution is early- or late-onset. However, when some relevant information on ages at onset is available, the SO-MR and MLp statistics typically yield more accurate estimates.
As a practical comment, we suggest that the fact that the SO-MR and MLp statistics do not depend on the ages at onset can be considered a further advantage over the Kaplan-Meier estimator. Indeed, the level of accuracy for reported ages of disease onset is usually much lower than that of the corresponding information on mortality ages.

Appendix 1
We now turn to finding the MLE for $p$. The log likelihood is given by

$$\log L = A \log p + \sum_{i:\,Y_i = 1} \log d(a_i) + \sum_{i:\,Y_i = 0} \log\left[ 1 - p\,D(a_i) \right],$$

where $\sum_{Y_i = 1} \log d(a_i)$ is independent of $p$. If all the subjects are affected, the likelihood is monotone increasing in $p$ and so MLp = 1, and if none are affected, MLp = 0. Otherwise, we proceed by equating the first derivative of $\log L$ to zero, obtaining the likelihood equation

$$T(p) = \frac{A}{p} - \sum_{i:\,Y_i = 0} \frac{D(a_i)}{1 - p\,D(a_i)} = 0.$$

As $p$ tends to 0, $T(p)$ tends to infinity. So if $T(p)$ has no roots between 0 and 1, it is always positive there, meaning the log likelihood is monotone increasing and MLp = 1. If there is a root, it must be unique, because both terms in $T(p)$ are monotone decreasing functions of $p$ for $0 \le p \le 1$.
Simple iterative numerical methods can be used to solve the likelihood equation. Suppose we have at the $m$-th iteration a candidate solution $p^{(m)}$. Then we can expand $T(p)$ in a first-order Taylor series about $p^{(m)}$ and solve the resulting equation for $p$. The revised candidate solution is

$$p^{(m+1)} = p^{(m)} - \frac{T(p^{(m)})}{T'(p^{(m)})},$$

where $T'(p)$ is the derivative of $T(p)$ and is given by

$$T'(p) = -\frac{A}{p^2} - \sum_{i:\,Y_i = 0} \frac{D(a_i)^2}{\left[ 1 - p\,D(a_i) \right]^2}.$$

We can use the unbiased estimator SO-MR as the initial candidate value $p^{(0)}$ in the above iterative scheme. We can also use the above results to provide an approximate standard error for MLp. Standard asymptotic results for maximum likelihood estimators imply that $(\text{MLp} - p)/[-T'(\text{MLp})]^{-1/2}$ has approximately a standard normal distribution, so that $[-T'(\text{MLp})]^{-1/2}$ is an approximate standard error for MLp. Simulations like those carried out to compare the estimators find that the SE for MLp is quite similar to the SE for the unbiased estimator SO-MR. An alternative estimator based on the expected information was proposed by Larsson and Sjögren (1954).
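The iteration can be sketched in Python as follows. The function names are hypothetical. As a design choice that goes beyond the plain Newton scheme described above, the sketch keeps a bracket around the root and falls back to bisection whenever a Newton step would leave it; this safeguard is an addition for robustness, not part of the original description.

```python
def score(p, a, d_unaffected):
    """T(p) = A/p - sum over unaffected subjects of D/(1 - p*D)."""
    return a / p - sum(d / (1.0 - p * d) for d in d_unaffected)


def score_derivative(p, a, d_unaffected):
    """T'(p) = -A/p^2 - sum over unaffected subjects of (D/(1 - p*D))^2."""
    return -a / p**2 - sum((d / (1.0 - p * d)) ** 2 for d in d_unaffected)


def mle_lifetime_risk(D, affected, tol=1e-12, max_iter=200):
    """MLp from cumulative onset probabilities D(a_i) and affected indicators."""
    a = sum(affected)
    if a == 0:
        return 0.0
    if a == len(affected):
        return 1.0
    d0 = [d for d, y in zip(D, affected) if not y]
    # T is strictly decreasing, so T(1) >= 0 means no root inside (0, 1).
    if max(d0) < 1.0 and score(1.0, a, d0) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0                      # T > 0 towards lo, T < 0 towards hi
    total = sum(D)
    p = a / total if total > 0 else 0.5    # SO-MR as the starting value
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    for _ in range(max_iter):
        t = score(p, a, d0)
        if t > 0:
            lo = p
        else:
            hi = p
        newton = p - t / score_derivative(p, a, d0)
        if abs(newton - p) < tol:
            return min(max(newton, lo), hi)
        # Safeguard (an addition): bisect if the Newton step leaves the bracket.
        p = newton if lo < newton < hi else 0.5 * (lo + hi)
    return p


def mle_standard_error(p_hat, D, affected):
    """Approximate standard error [-T'(MLp)]^(-1/2)."""
    a = sum(affected)
    d0 = [d for d, y in zip(D, affected) if not y]
    return (-score_derivative(p_hat, a, d0)) ** -0.5
```

When every subject is past the risk period, so that $D(a_i) = 1$ for all $i$, the likelihood equation reduces to the binomial case and the solution is $A/n$, which gives a convenient sanity check.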

Appendix 2
We assess here the properties of SO-MR when an approximate onset age distribution is used. As before, we denote by $d_j$ and $D_j$ the actual onset and cumulative onset probabilities, conditional on contracting the disease. Now, suppose we do not know the true probabilities and instead use the approximate values $\tilde{D}(a_i)$. The estimator of lifetime morbid risk is then $\widetilde{\text{SO-MR}} = A/\sum_{i=1}^{n} \tilde{D}(a_i)$, and its expected value is given by

$$E(\widetilde{\text{SO-MR}}) = p\,\frac{\sum_{i=1}^{n} D(a_i)}{\sum_{i=1}^{n} \tilde{D}(a_i)}.$$

Use of the approximate onset distribution induces bias, as given by the ratio of the sums of cumulative onset probabilities. A practical upper bound on the bias can be found by supposing that the true onset distribution is the "stochastically smallest" that is feasible, i.e. by setting the true cumulative probabilities to the largest values that are deemed feasible. Similarly, setting these probabilities to their minimum feasible values gives a practical lower bound on the bias. The standard deviation of SO-MR is affected by the same multiplier:

$$\sigma(\widetilde{\text{SO-MR}}) = \sigma(\text{SO-MR}) \cdot \frac{\sum_{i=1}^{n} D(a_i)}{\sum_{i=1}^{n} \tilde{D}(a_i)}.$$
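The bounding argument amounts to two ratios of sums. A minimal Python sketch, with hypothetical function names and made-up cumulative probabilities evaluated at three interview ages; the earliest plausible onset distribution has the largest cumulative probabilities and therefore gives the upper bound on the expected-value ratio:

```python
def bias_ratio(true_D, assumed_D):
    """Ratio E(estimator)/p when assumed_D replaces true_D in the denominator."""
    return sum(true_D) / sum(assumed_D)


def bias_bounds(assumed_D, latest_onset_D, earliest_onset_D):
    """(lower, upper) bounds on the ratio from two 'extreme' candidate
    cumulative onset probabilities, each evaluated at the interview ages."""
    return (bias_ratio(latest_onset_D, assumed_D),
            bias_ratio(earliest_onset_D, assumed_D))


# Made-up cumulative probabilities at the interview ages of three subjects.
assumed = [0.2, 0.5, 0.9]
latest = [0.1, 0.3, 0.8]      # oldest plausible onsets: smallest cumulative values
earliest = [0.4, 0.7, 1.0]    # youngest plausible onsets: largest cumulative values
print(bias_bounds(assumed, latest, earliest))
```

A ratio below 1 means the estimator understates $p$ on average and a ratio above 1 means it overstates it, so the pair brackets the relative bias under the stated extremes.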

Table 1: Means, standard deviations and biases of the estimated lifetime morbid risks by the various methods. The "real" lifetime morbid risk is 10.67%.

Table 3: Ratios of the simulation means and the "real" lifetime morbid risks for various p-values