Maximum Likelihood Estimation for Ascertainment Bias in Sampling Siblings

When a disease is rare in a population, it is inefficient to take a random sample to estimate a parameter of interest. Instead one takes a random sample of all nuclear families with the disease by ascertaining at least one affected sibling (proband) of each family. In such studies, an estimate of the proportion of siblings with the disease will be inflated. For example, the question of whether a rare disease shows an autosomal recessive pattern of inheritance, where the Mendelian segregation ratios are of interest, has been investigated for several decades. How do we correct for this ascertainment bias? Methods, primarily based on maximum likelihood estimation, are available to correct for the ascertainment bias. We show that, although maximum likelihood estimation is optimal under asymptotic theory, it can perform badly for ascertainment bias. The problem is exacerbated in the situation where the proband probabilities are allowed to vary with the number of affected siblings. We use two data sets to illustrate the difficulties of the maximum likelihood estimation procedure, and we use a simulation study to assess the quality of the maximum likelihood estimators.


Introduction
When there is a rare disease in a population, it is inefficient to take a random sample to estimate a parameter of interest. Instead one takes a random sample of all nuclear families with the disease by ascertaining at least one sibling (proband) of each family. In such studies, an estimate of the proportion of siblings with the disease will be inflated. Sometimes the situation is even worse: the investigator takes all families that appear in the hospital. Thus, there is a selection bias (e.g., Patil and Rao, 1978). Fisher (1934) illustrated the importance of adjusting for the selection bias. For a discussion of the problems of ascertainment bias in the analysis of family data, see Crow (1965). For example, studies of whether a rare disease shows an autosomal recessive pattern of inheritance, where the Mendelian segregation ratios are of interest, have been conducted for several decades. For a rare disease, the Mendelian segregation ratio is p = 0.5 for an autosomal dominant disease and p = 0.25 for an autosomal recessive disease. These follow from Mendel's first law. For a rare disease one would be interested to know whether it is autosomal dominant or recessive, that is, whether p = 0.5 or p = 0.25 respectively. But because the disease is rare, the investigator will select all those nuclear families that appear. Then there is a selection bias; specifically, the estimates will be inflated. How do we correct for this ascertainment bias? Methods, primarily based on maximum likelihood estimation, are available to correct for the ascertainment bias. See Lange (2002, chap. 2) and Sham (1998, chap. 2) for very clear pedagogy on this problem.
Table 1 gives a set of data which was presented by Fisher (1934) to illustrate the need to take account of the method of ascertainment in segregation analysis. The data consist of 340 families, all with five offspring. Each family was ascertained through at least one affected offspring. One can count the total number of offspring to be 1700, the total number of affected offspring to be 623, and the total number of probands to be 432. [Sham (1998) gave an incorrect total of 434.] Thus, one might estimate the segregation ratio to be 623/1700 = .3665, and the ascertainment probability to be 432/623 = .6934. Unfortunately, these simple estimates are inflated. Sham (1998) also used these data for illustration. We note that Fisher (1934) did not state that the data are on albinism, but one might believe so because his work was motivated by the study of albinism. It is now known that there are various forms of albinism, involving several chromosomes (9, 10, 11, 13, 15 and X), in which mutations prevent the proper proteins from forming, making the person albino; thus albinism does not come from a single chromosome. For illustration using these data, we will treat the disease as autosomal recessive, as Fisher (1934) did.
Table 2 gives a set of data on cystic fibrosis which was presented by Crow (1965) to illustrate the need to take account of the method of ascertainment in segregation analysis. Cystic fibrosis is a hereditary disease that affects the mucus glands of the lungs, liver, pancreas, and intestines, causing progressive disability due to multisystem failure. The CFTR gene, found on chromosome 7, is the cause of cystic fibrosis; mutations result in proteins that are too short because of a premature end to production. One can count the total number of offspring to be 269, the total number of affected offspring to be 124, and the total number of probands to be 90. Thus, one might estimate the segregation ratio to be 124/269 = .4610, and the ascertainment probability to be 90/124 = .7258. Again, these simple estimates are inflated. Note that 46.1% is far in excess of the 25% expected under simple recessive inheritance (cystic fibrosis is autosomal recessive). One reason for the excess is the ascertainment bias: the exclusion of families where the parents are heterozygous but fail to have a homozygous recessive child. These families would add to the number of normal children and thereby reduce the proportion affected. This data set was also used by Lange (2002) for illustration. Current data on cystic fibrosis of the same form from the state of Georgia are available, but because of confidentiality they cannot be used.
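The naive (uncorrected) estimates quoted for both data sets can be reproduced directly from the published totals; this is a minimal sketch using only the counts given above.

```python
# Naive (biased) estimates computed from the published totals of the
# two data sets; these ignore the ascertainment bias entirely.

def naive_estimates(total_offspring, affected, probands):
    """Return (segregation ratio, ascertainment probability) ignoring bias."""
    return affected / total_offspring, probands / affected

# Fisher (1934): 1700 offspring, 623 affected, 432 probands.
p_f, pi_f = naive_estimates(1700, 623, 432)
# Crow (1965): 269 offspring, 124 affected, 90 probands.
p_c, pi_c = naive_estimates(269, 124, 90)

print(round(p_f, 4), round(pi_f, 4))  # 0.3665 0.6934
print(round(p_c, 4), round(pi_c, 4))  # 0.461 0.7258
```

Both segregation-ratio estimates are well above the 0.25 expected under autosomal recessive inheritance, which is exactly the inflation the paper sets out to correct.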
There are two major differences between the two data sets. First, in Fisher's data the family sizes are all the same, but in Crow's data the family sizes vary from 1 to 10. Second, there are 340 families in Fisher's data, but only 80 families in Crow's data. Therefore, because maximum likelihood estimation has optimal asymptotic properties, it may be more appropriate for Fisher's data.
We describe the ascertainment bias problem in the study of rare autosomal recessive disorders. It is almost always the case that a disease is inherited from carrier parents when the disease is rare in the entire population. The number of at-risk parents is usually small (i.e., the number of parents capable of producing affected siblings is very small relative to the number not capable of producing affected siblings). So if a sample is taken at random from the entire population, there could be no at-risk families. Hence, at-risk families are divided into two groups, those with at least one affected sibling and those with none. A sample is then drawn from the families with at least one affected sibling, thereby introducing an ascertainment bias. Thus, our two examples can be viewed in this manner, and as is evident in both examples, a direct estimate of the proportion of affected siblings will be too large; one needs to adjust for the ascertainment bias.
When all families with affected offspring are ascertained, we say that there is complete ascertainment. When there are families with affected offspring who are not probands, we say that there is incomplete ascertainment. Fisher (1934) first analyzed the data in Table 1 using complete ascertainment. His analysis was done using a truncated binomial distribution. However, Fisher (1934) also described a simpler method for the more appropriate incomplete ascertainment for these data. This discussion was further developed by Bailey (1951) and Morton (1959). In this paper, we will focus on incomplete ascertainment, as is evident in the data in both Tables 1 and 2. Crow (1965) pointed out the need to adjust for ascertainment bias and incomplete ascertainment for the cystic fibrosis data.
The key idea for the correction of ascertainment bias is to find the correct sampling distribution under the ascertainment bias. Let x represent the quantity being measured, A denote the ascertainment event, and θ a parameter. Without the ascertainment bias, f(x | θ) is the sampling distribution for a random sample. However, when there is an ascertainment bias, we need f(x | θ, A). In general, the two sampling distributions f(x | θ, A) and f(x | θ) are different, f(x | θ, A) being the more appropriate sampling distribution. Correcting for ascertainment bias means that we need to construct the sampling distribution f(x | θ, A). A simple example, introduced by Fisher (1934) for complete ascertainment, concerns the number r of affected siblings in a family of size s in a binomial model with r > 0. Then

f(r | θ, A) = C(s, r) θ^r (1 − θ)^{s−r} / {1 − (1 − θ)^s}, r = 1, . . ., s,

where θ is the proportion of affected siblings, A is the event that r > 0, and C(s, r) denotes a binomial coefficient; this is the binomial distribution truncated at 0. More importantly, the binomial probabilities are re-weighted (increased in this case) so that the mass points are 1, . . ., s; 0 is excluded.
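A minimal sketch of this zero-truncated binomial re-weighting; the family size and segregation proportion below are illustrative values, not estimates from the data.

```python
from math import comb

def truncated_binomial_pmf(r, s, theta):
    """P(r | r > 0) for r ~ Binomial(s, theta) truncated at zero."""
    if not 1 <= r <= s:
        return 0.0
    binom = comb(s, r) * theta**r * (1 - theta)**(s - r)
    return binom / (1 - (1 - theta)**s)

# The re-weighted masses on 1..s sum to one, and each exceeds its
# untruncated counterpart because the zero cell has been removed.
s, theta = 5, 0.25
total = sum(truncated_binomial_pmf(r, s, theta) for r in range(1, s + 1))
print(round(total, 10))  # 1.0
```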
The problem of ascertainment is not new to survey samplers. For finite population sampling, Sverchkov and Pfeffermann (2004) defined the sample and sample-complement distributions as two separate weighted distributions (Patil and Rao, 1978) for developing design-consistent predictors of the finite population total; see also the more recent presentation of Pfeffermann and Sverchkov (2007). Malec, Davis and Cao (1999) used a hierarchical Bayesian method to estimate a finite population mean for binary data. These works are not directly applicable to our situation, but the ideas they portray are important for the issues associated with ascertainment bias. For probability proportional to size (PPS) sampling, Nandram (2007) implemented surrogate sampling techniques to provide simulated random samples by using a model which reverses the selection bias. Under PPS sampling, Nandram et al. (2006) used a method, developed by Chambers, Dorfman and Wang (1998), to do Bayesian predictive inference when a transformation is needed.
We wish to study how inference about the segregation ratio changes with the proband probability. So we consider two cases. In the first case we consider a single proband probability, and we discuss maximum likelihood estimation extensively. In the second case, we consider how inference about the segregation parameter will change when there are different proband probabilities. In fact, we allow the proband probabilities to depend on the number of affected siblings in each family. Because there are more parameters in the analysis of the same data, maximum likelihood estimation should be relatively inefficient.
In this paper we provide some new distributional results and algorithms for maximum likelihood estimation in the ascertainment bias problem, in which we assume incomplete ascertainment. The plan of the rest of the paper is as follows.
In the next section we review maximum likelihood estimation, and we present some new analytical results. Specifically, we discuss the existence of maximum likelihood estimators, how to compute them, and what inferential difficulties exist. In the third section, we present numerical results and a simulation study. We also show how to incorporate different proband probabilities. As we will show, this task is particularly challenging for maximum likelihood estimation. The final section has a discussion where, in addition to a summary, we discuss how one might fix the problems associated with the maximum likelihood estimation procedure. Thompson (1986) discussed many ascertainment models. In this paper, we discuss the simplest ascertainment model (Sham, 1998; Lange, 2002). Essentially, Lange (2002) showed how to adjust for the ascertainment bias using the expectation-maximization (EM) algorithm (Dempster, Laird and Rubin, 1977); Sham (1998) used Fisher's scoring. We will introduce a couple of new methods as well.


Maximum likelihood estimation
Suppose there are n families selected through ascertainment sampling. Letting the k-th ascertained family have s_k siblings, we assume that there are r_k affected siblings and a_k ascertained siblings. In Fisher's data the s_k all equal 5, and in Crow's data the s_k vary from 1 to 10. Let p denote the segregation probability and π the proband probability. Here, p is primarily the parameter of interest. The simplest ascertainment model specifies that

p(a_k, r_k | p, π) = C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} C(r_k, a_k) π^{a_k} (1 − π)^{r_k − a_k}, 0 ≤ a_k ≤ r_k ≤ s_k. (2.1)

Note that (2.1) provides the likelihood for any family without conditioning on whether it is ascertained or not. To incorporate the ascertainment bias, we need to adjust (2.1) to the support 1 ≤ a_k ≤ r_k ≤ s_k. Now, the probability that a family with s_k siblings is ascertained is 1 − (1 − pπ)^{s_k}, leading to the truncated probability mass function

p(a_k, r_k | p, π, A) = C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} C(r_k, a_k) π^{a_k} (1 − π)^{r_k − a_k} / {1 − (1 − pπ)^{s_k}}, 1 ≤ a_k ≤ r_k ≤ s_k. (2.2)

Thus, (2.2) provides the likelihood for a family that has been ascertained. In the terminology of missing data, while (2.1) is the complete data likelihood, (2.2) is the incomplete data likelihood. This is a fairly long section, so we present a plan. First we present some new theoretical results on the properties of the joint probability mass function. Then we present maximum likelihood estimation; most of this is known, but we present some new ideas as well.
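As a sketch, the two likelihood contributions (2.1) and (2.2) can be coded directly; the model here is the standard one in which r_k ~ Binomial(s_k, p) and a_k | r_k ~ Binomial(r_k, π), and the parameter values are illustrative.

```python
from math import comb

def complete_pmf(a, r, s, p, pi):
    """(2.1): r ~ Binomial(s, p) affected, a | r ~ Binomial(r, pi) probands."""
    if not 0 <= a <= r <= s:
        return 0.0
    return (comb(s, r) * p**r * (1 - p)**(s - r)
            * comb(r, a) * pi**a * (1 - pi)**(r - a))

def ascertained_pmf(a, r, s, p, pi):
    """(2.2): (2.1) restricted to a >= 1, renormalized by the
    ascertainment probability 1 - (1 - p*pi)**s."""
    if a < 1:
        return 0.0
    return complete_pmf(a, r, s, p, pi) / (1 - (1 - p * pi)**s)

# Sanity check: the ascertained pmf sums to one over 1 <= a <= r <= s.
s, p, pi = 5, 0.25, 0.5
total = sum(ascertained_pmf(a, r, s, p, pi)
            for r in range(1, s + 1) for a in range(1, r + 1))
print(round(total, 10))  # 1.0
```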
Henceforth, unless otherwise stated, all inference will be conditioned on s_k and A. However, for convenience we will drop this notation.

Properties of the joint probability mass function
We present some properties of the joint probability mass function p(a_k, r_k | p, π) in (2.2). Again, note that we still have the conditioning on s_k and A, but it will be suppressed for convenience. We provide some interpretations as well. We note that some of the results are new.
First, we consider the marginal distribution of r_k. Using (2.2) and summing over a_k = 1, . . ., r_k,

p(r_k | p, π) = C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} {1 − (1 − π)^{r_k}} / {1 − (1 − pπ)^{s_k}}, r_k = 1, . . ., s_k,

and all other points have zero probability. In Appendix A we show that E(r_k | p, π) is bigger than s_k p, with the discrepancy related to p, π and s_k. With some cumbersome algebraic manipulation, we also obtain the variance in Appendix A. Also, for a family that has not been ascertained (i.e., a_k = 0), it is easy to show that

P(r_k ≥ 1 | a_k = 0, p, π) = 1 − {(1 − p)/(1 − pπ)}^{s_k}

is the probability of having at least one affected sibling in the k-th family with a_k = 0.
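A quick numerical check of this marginal distribution, under the standard model in which each family contributes Binomial(s_k, p) affected siblings of whom Binomial(r_k, π) are probands; the parameter values are illustrative, not estimates from the data.

```python
from math import comb

def marginal_r(r, s, p, pi):
    """Marginal of r under ascertainment: the binomial weight times the
    probability that at least one of the r affected siblings is a proband,
    renormalized by the ascertainment probability."""
    if not 1 <= r <= s:
        return 0.0
    return (comb(s, r) * p**r * (1 - p)**(s - r) * (1 - (1 - pi)**r)
            / (1 - (1 - p * pi)**s))

# Cross-check against summing a out of the joint ascertained pmf.
s, p, pi = 5, 0.25, 0.5
for r in range(1, s + 1):
    joint = sum(comb(s, r) * p**r * (1 - p)**(s - r)
                * comb(r, a) * pi**a * (1 - pi)**(r - a)
                for a in range(1, r + 1)) / (1 - (1 - p * pi)**s)
    assert abs(joint - marginal_r(r, s, p, pi)) < 1e-12

# The ascertained mean of r exceeds the unconditional mean s*p.
mean_r = sum(r * marginal_r(r, s, p, pi) for r in range(1, s + 1))
print(mean_r > s * p)  # True
```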
The marginal probability mass function of a_k is

p(a_k | p, π) = C(s_k, a_k) (πp)^{a_k} (1 − πp)^{s_k − a_k} / {1 − (1 − πp)^{s_k}}, a_k = 1, . . ., s_k,

and all other points have zero probability. That is, p(a_k | p, π) is a truncated binomial probability mass function. It is easy to show that

E(a_k | p, π) = s_k πp / {1 − (1 − πp)^{s_k}},

and that the variance falls below s_k πp(1 − πp). Thus, as expected, E(a_k | p, π) increases from s_k πp, and Var(a_k | p, π) decreases from s_k πp(1 − πp).
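These truncated-binomial moment statements can be verified by direct enumeration; again the parameter values below are illustrative.

```python
from math import comb

def a_pmf(a, s, p, pi):
    """Truncated Binomial(s, pi*p) pmf for the number of probands a >= 1."""
    q = pi * p
    if not 1 <= a <= s:
        return 0.0
    return comb(s, a) * q**a * (1 - q)**(s - a) / (1 - (1 - q)**s)

s, p, pi = 5, 0.25, 0.5
q = pi * p
mean_a = sum(a * a_pmf(a, s, p, pi) for a in range(1, s + 1))
var_a = sum(a * a * a_pmf(a, s, p, pi) for a in range(1, s + 1)) - mean_a**2

# Truncation at zero raises the mean above s*q and lowers the variance
# below s*q*(1-q).
print(abs(mean_a - s * q / (1 - (1 - q)**s)) < 1e-12)  # True
print(mean_a > s * q, var_a < s * q * (1 - q))         # True True
```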
In Appendix B, we derive the covariance between a_k and r_k; it involves the factor 1 − (1 − πp)^{s_k − 1}{1 + (s_k − 1)πp}, which we also show is nonnegative. Thus, the correlation between a_k and r_k is nonnegative, and therefore there may be important information about p (via the r_k) in π (via the a_k).
In fact, the conditional probability mass function of r_k given a_k is also interesting. It is easy to show that, given a_k, r_k − a_k follows a Binomial{s_k − a_k, p(1 − π)/(1 − pπ)} distribution, so that

E(r_k | a_k, p, π) = a_k + (s_k − a_k) p(1 − π)/(1 − pπ), Var(r_k | a_k, p, π) = (s_k − a_k) p(1 − π)(1 − p)/(1 − pπ)^2.

Thus, in the conditional probability mass function, the expectation increases with a_k and the variance decreases with a_k [i.e., knowledge of a_k is informative, consistent with Sham (1998)]. Sham (1998) used Fisher's data to illustrate this issue, but here we have an analytical argument. Finally, letting a = ∑_{k=1}^{n} a_k, r = ∑_{k=1}^{n} r_k and s = ∑_{k=1}^{n} s_k, without selection bias the maximum likelihood estimators of p and π are p̃ = r/s and π̃ = a/r respectively. These are the MLEs under the model without the ascertainment bias in (2.1). We will denote the MLEs with selection bias, which are to be determined, by p̂ and π̂. These are the MLEs under the model with the ascertainment bias in (2.2).
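One standard form for this conditional distribution (our reading, consistent with the truncated model) is r_k − a_k | a_k ~ Binomial{s_k − a_k, p(1 − π)/(1 − pπ)}: each of the s_k − a_k non-proband siblings is affected-but-not-ascertained with probability p(1 − π)/(1 − pπ). The monotonicity of the conditional moments can then be checked numerically; the parameter values are illustrative.

```python
from math import comb

def r_given_a_pmf(r, a, s, p, pi):
    """Conditional pmf of r given a probands:
    r - a ~ Binomial(s - a, p(1-pi)/(1-p*pi))."""
    t = p * (1 - pi) / (1 - p * pi)
    if not a <= r <= s:
        return 0.0
    return comb(s - a, r - a) * t**(r - a) * (1 - t)**(s - r)

s, p, pi = 5, 0.25, 0.5
t = p * (1 - pi) / (1 - p * pi)
means = [a + (s - a) * t for a in range(1, s + 1)]
variances = [(s - a) * t * (1 - t) for a in range(1, s + 1)]

# Conditional mean increases with a; conditional variance decreases.
print(all(m2 > m1 for m1, m2 in zip(means, means[1:])))          # True
print(all(v2 < v1 for v1, v2 in zip(variances, variances[1:])))  # True
```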

Estimation procedures
We discuss maximum likelihood estimation of p and π under the reasonable assumption that the families are sampled independently. This is the same assumption used throughout the historical development since the pioneering work of Fisher (1934); see Lange (2002, chap. 2) and Sham (1998, chap. 2). Then, the likelihood function for all ascertained families is

L(p, π) = ∏_{k=1}^{n} C(s_k, r_k) p^{r_k} (1 − p)^{s_k − r_k} C(r_k, a_k) π^{a_k} (1 − π)^{r_k − a_k} / {1 − (1 − pπ)^{s_k}}. (2.5)

It is pertinent for us to show when the maximum likelihood estimators (MLEs) of p and π exist. For example, if a_k = 1, k = 1, . . ., n, the MLEs may not exist. That is, if exactly one sibling is ascertained in each family, MLEs may not exist. Also, if each family has exactly one sibling, the likelihood function is constant on the unit square (i.e., 0 ≤ p, π ≤ 1), and every point in the unit square is an MLE (i.e., the MLE is not unique). The maximum point (p̂, π̂) of the likelihood function exists inside the unit square if r > n, s > r, a > n and r > a; this holds because the log-likelihood then tends to −∞ on the boundary of the unit square. These conditions are satisfied in both Fisher's and Crow's examples. There are at least four methods to find the maximum likelihood estimators of p and π. One can use an optimization routine such as the Nelder-Mead algorithm, or Newton's method, directly. Sham (1998) used a Fisher scoring algorithm, and Lange (2002) used the EM algorithm. We have developed a much simpler algorithm.
It is worth noting here that if we differentiate the log-likelihood function in (2.5) to obtain the maximum likelihood estimators of p and π, we need to solve the two equations simultaneously:

p = r/s − p(1 − p)πq/s, π = a/r − pπ(1 − π)q/r, where q = ∑_{k=1}^{n} s_k (1 − pπ)^{s_k − 1} / {1 − (1 − pπ)^{s_k}}. (2.6)

These are the equations that constitute our new iterative method. We start with p set at p̃ = r/s and π set at π̃ = a/r in the right-hand sides of these equations to update the left-hand sides. Thus, it is mathematically clear that p̃ and π̃ are inflated by πp(1 − p)q/s and pπ(1 − π)q/r respectively, and subtracting these terms accounts for the ascertainment bias. More importantly, it is easy to solve these equations iteratively by simply substituting the current values of p and π into the right-hand sides and updating the left-hand sides accordingly; it is sensible to start with p = p̃ and π = π̃.
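A minimal sketch of this iterative scheme, assuming the score equations take the fixed-point form just described (naive estimates minus the inflation terms, with q the sum over families of s_k(1 − pπ)^{s_k−1}/{1 − (1 − pπ)^{s_k}}); this is our reading of (2.6), and the toy data are illustrative, not from Tables 1 or 2.

```python
def mle_iterate(s_list, r_list, a_list, tol=1e-10, max_iter=10000):
    """Fixed-point iteration for the MLEs (p, pi) under ascertainment.

    Assumed form of the score equations:
        p  = r/s - p(1-p)*pi*q/s,
        pi = a/r - p*pi*(1-pi)*q/r,
    where r, a, s are totals over families and
        q = sum_k s_k (1 - p*pi)^(s_k - 1) / (1 - (1 - p*pi)^s_k).
    """
    s, r, a = sum(s_list), sum(r_list), sum(a_list)
    p, pi = r / s, a / r          # start at the naive (inflated) estimates
    for _ in range(max_iter):
        q = sum(sk * (1 - p * pi)**(sk - 1) / (1 - (1 - p * pi)**sk)
                for sk in s_list)
        p_new = r / s - p * (1 - p) * pi * q / s
        pi_new = a / r - p * pi * (1 - pi) * q / r
        if abs(p_new - p) < tol and abs(pi_new - pi) < tol:
            break
        p, pi = p_new, pi_new
    return p, pi

# Toy data: three families of size 5 (satisfying r > n, s > r, a > n, r > a).
p_hat, pi_hat = mle_iterate([5, 5, 5], [2, 3, 1], [1, 2, 1])
print(0 < p_hat < 6 / 15, 0 < pi_hat < 4 / 6)  # True True
```

Note that both corrected estimates land strictly below their naive counterparts, which is the direction the bias correction should work in.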
In fact, we have maximized the logarithm of the likelihood function of (p, π) in (2.5) directly using the Nelder-Mead algorithm (Nelder and Mead, 1965) to get the maximum likelihood estimators. Unlike Newton's method and the Fisher scoring algorithm, the Nelder-Mead algorithm is derivative-free; both Newton's method and Fisher scoring need the first derivative, and while Newton's method needs the Hessian matrix, the Fisher scoring algorithm needs the information matrix (i.e., the expected value of the negative Hessian matrix). Both of these derivative-based methods are rather inefficient near the boundaries of the parameter space (e.g., p or π near 0 or 1).
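For comparison, the log-likelihood can be maximized directly with a derivative-free routine; the sketch below assumes SciPy is available and uses its Nelder-Mead implementation on toy data (the data and starting values are illustrative).

```python
import numpy as np
from scipy.optimize import minimize  # SciPy assumed available

def neg_log_lik(theta, s_list, r_list, a_list):
    """Negative log-likelihood under the ascertained model; the binomial
    coefficients are constant in (p, pi) and are dropped."""
    p, pi = theta
    if not (0 < p < 1 and 0 < pi < 1):
        return np.inf
    ll = 0.0
    for s, r, a in zip(s_list, r_list, a_list):
        ll += (r * np.log(p) + (s - r) * np.log(1 - p)
               + a * np.log(pi) + (r - a) * np.log(1 - pi)
               - np.log(1 - (1 - p * pi)**s))
    return -ll

# Toy data; start the derivative-free search at the naive estimates.
s_list, r_list, a_list = [5, 5, 5, 5], [2, 3, 1, 2], [1, 2, 1, 1]
start = [sum(r_list) / sum(s_list), sum(a_list) / sum(r_list)]
fit = minimize(neg_log_lik, start, args=(s_list, r_list, a_list),
               method="Nelder-Mead")
p_hat, pi_hat = fit.x
print(0 < p_hat < start[0])  # True: the corrected estimate is smaller
```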
Lange (2002) used the expectation-maximization (EM) algorithm. However, he used an additional assumption in the EM algorithm. His key argument is, "If we view ascertainment as a sampling process in which unascertained families of size s_k are discarded one by one until the k-th ascertained family is finally ascertained, then the number of unascertained families discarded before reaching the k-th ascertained family follows a shifted geometric distribution with success probability 1 − (1 − πp)^{s_k}." His EM algorithm gives the MLEs by solving iteratively as in (2.6). No measure of variability was presented, and any measure of variability will be too small because of the additional assumption. Essentially, Lange (2002) assumes that the missing sibship sizes are known, but he did not say this explicitly. In fact, no EM algorithm exists in the original model with missing sibship sizes. However, we observe that it is much easier to solve the MLE equations in (2.6) by first updating p only. Using (2.6), we can express πp in terms of p; substituting πp into (2.6) and solving for p and π, we get (2.8) and (2.9), where w = {(1 − p)(1 − πp)}^{s_k}. Thus, we start with p set at the naive estimate r/s in the right-hand side of (2.8) to obtain p on the left-hand side, and iterate until convergence. Then, we substitute this p into (2.9) to get π without iteration. Of course, convergence is much faster than updating (p, π) simultaneously.
It is also easy to find the inverse negative Hessian matrix to get an approximation for the covariance matrix of (p, π). Sham (1998) gave a form for the standard errors from his Fisher scoring, but he did not present the correlation between the estimators. Lange (2002) gave the EM algorithm, but he did not present any measure of precision of his estimators. It is standard practice to use the inverse negative Hessian matrix, evaluated at the MLEs, to get an approximation of the covariance matrix. Thus, no matter which method is used to get the MLEs, the covariance matrix is the same. In Appendix B we present the covariance matrix. We note in Appendix B that if the MLEs exist, the covariance matrix will be positive definite, and therefore the MLEs are unique.
In Appendix B we have also shown that a sufficient condition for the correlation between p and π to be nonnegative is s_k − 1 ≥ 4πp. In the study of autosomal recessive diseases, typically π, p ≤ .50, so that 4πp ≤ 1 and a sufficient condition for nonnegativity is s_k ≥ 2. This excludes families with one sibling, but if there are not too many of these, the correlation will be nonnegative.

Results
This section has three parts.First, we present numerical results for Fisher's data and Crow's data.Second, we perform a simulation study to assess the performance of the maximum likelihood estimators of the segregation ratio and proband probabilities.Third, we show that there are further difficulties of the maximum likelihood estimators when the proband probabilities vary with the number of affected siblings within a family.

Numerical results
Essentially we have used all the numerical methods we have discussed, and we have found that they give the same estimates of the MLEs; in particular, they agree with the Nelder-Mead algorithm. For Fisher's data, the EM algorithm gives p = .253, π = .475; these are consistent with the estimates provided by Sham (1998). Their standard errors of .0129 and .0310 are also consistent with the ones we obtained. We also have a moderate correlation of .250 between p and π. We have used both the Fisher information and the negative inverse Hessian matrix (without taking expectation) to compute the standard errors and correlation; they are in perfect agreement. Sham (1998) used the Fisher information to obtain the standard errors of p and π, but he did not present the correlation between p and π. We note that for the case in which the information on the probands is ignored, Sham (1998) reported standard errors of .0286 and .2400, showing large gains in precision for the method that includes the probands.
For Crow's data, the EM algorithm gives p = .268, π = .359; the standard errors are respectively .0347 and .0814, with a small correlation of .248. These are consistent with the estimates given by Lange (2002); the standard errors were not provided there. As pointed out by Lange (2002), these estimates are consistent with the theoretical value of .25 for an autosomal recessive disease such as cystic fibrosis.
It is possible to provide an approximate 95% confidence interval for p by using the asymptotic normality of maximum likelihood estimators. We have used the interval p ± 1.96 STE(p), where p is the maximum likelihood estimator and STE(p), the standard error, is obtained from the information matrix. Similarly for π, we have used the interval π ± 1.96 STE(π), where π is the maximum likelihood estimator and STE(π), the standard error, is obtained from the inverse negative Hessian matrix. For Fisher's data the approximate 95% confidence interval for p is .253 ± 1.96 × .0129, which gives (.228, .278). Note that this 95% confidence interval for p contains .250, consistent with an autosomal recessive inheritance. For Crow's data the approximate 95% confidence interval for p is .268 ± 1.96 × .0347, which gives (.200, .336). We also consider inference about π. For Fisher's data the approximate 95% confidence interval is .475 ± 1.96 × .031, which gives (.414, .536). For Crow's data the approximate 95% confidence interval is .359 ± 1.96 × .081, which gives (.200, .519). Note that, as for Fisher's data, the 95% confidence interval for p contains .250, consistent with an autosomal recessive model.
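The interval arithmetic can be reproduced directly from the reported estimates and standard errors (rounding in the last digit can differ slightly from the quoted intervals):

```python
def wald_ci(est, se, z=1.96):
    """Approximate 95% confidence interval est +/- z * STE(est)."""
    return round(est - z * se, 3), round(est + z * se, 3)

# Reported MLEs and standard errors from the text.
print(wald_ci(0.253, 0.0129))  # Fisher's data, p:  (0.228, 0.278)
print(wald_ci(0.268, 0.0347))  # Crow's data, p:    (0.2, 0.336)
print(wald_ci(0.475, 0.0310))  # Fisher's data, pi: (0.414, 0.536)
print(wald_ci(0.359, 0.0814))  # Crow's data, pi:   (0.199, 0.519)
```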

Simulation study
We have performed a small simulation study to assess the performance of the maximum likelihood estimation procedure. We have generated data from the model with ascertainment bias in (2.2), and we have fit the model using the maximum likelihood estimation procedure. We have taken p = .257, π = .371 to obtain data similar to Crow's data. To study the effect of the sample size n, we have taken n = 25, 50, 100, 200; smaller values of n should challenge the maximum likelihood procedure.
We have generated 1000 data sets from the model that includes the ascertainment bias. From Crow's data, we have obtained the distribution of the ten family sizes 1, 2, . . ., 10; the frequencies of the family sizes are 9, 24, 16, 13, 9, 2, 4, 1, 1, 1. Thus, using the table method, we draw n family sizes for each of the 1000 simulated data sets. We then use the composition method: we draw a_k from p(a_k | p, π), and with this value of a_k, we draw r_k from p(r_k | a_k, p, π), where r_k = a_k, . . ., s_k. Here p(a_k | p, π) is a truncated binomial probability mass function, so we draw a_k from Binomial(s_k, πp) and accept it if it is larger than 0. Because r_k − a_k given a_k follows a binomial distribution, we draw r_k − a_k from that binomial probability mass function and add a_k to it. We repeat this process for all n families.
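The data-generation step can be sketched as follows, assuming the composition scheme described above with r_k − a_k | a_k ~ Binomial{s_k − a_k, p(1 − π)/(1 − pπ)}; the seed and the number of sampled families are arbitrary choices for illustration.

```python
import random

def sample_family(s, p, pi, rng):
    """Draw (a, r) for one ascertained family by composition:
    a from Binomial(s, pi*p), rejected until a > 0 (truncated binomial),
    then r = a + Binomial(s - a, p(1-pi)/(1-p*pi))."""
    while True:
        a = sum(rng.random() < pi * p for _ in range(s))
        if a > 0:
            break
    t = p * (1 - pi) / (1 - p * pi)
    r = a + sum(rng.random() < t for _ in range(s - a))
    return a, r

rng = random.Random(1)  # arbitrary seed
# Family-size frequencies for sizes 1..10, as counted from Crow's data.
freqs = [9, 24, 16, 13, 9, 2, 4, 1, 1, 1]
sizes = [sz for sz, f in zip(range(1, 11), freqs) for _ in range(f)]
families = [(s, *sample_family(s, 0.257, 0.371, rng))
            for s in rng.choices(sizes, k=100)]
print(all(1 <= a <= r <= s for s, a, r in families))  # True
```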
In Table 3 we present the results for the simulation study. We consider each measure in turn. As expected, the MLEs of p and π converge to their respective true values. However, p is closer to the true value than π for all sample sizes; π is noticeably far off for n = 25. As they must, the standard errors go down with increasing n (as must the interval widths). The coverage is not so good for n = 25 or n = 50 when p is estimated, and it is worse when π is estimated. The MSEs seem fine for p, but are off for π, especially at n = 25 and n = 50.
Therefore, as expected, maximum likelihood estimation does not perform well for small sample sizes. However, also as expected, the maximum likelihood estimation procedure does perform well for larger sample sizes. In fact, we have found that for small sample sizes the lower ends of the 95% confidence intervals under maximum likelihood estimation fall below 0, a standard problem with maximum likelihood estimation. An interval extending below zero has to be truncated at zero, and extended beyond two standard errors from the maximum likelihood estimate to attain the nominal coverage of 95%; in practice just the truncation is done.

Discussion
When one wants to find out about the proportion of people with a rare disease, one cannot take a random sample from the population. It is convenient to take a random sample of the cases that appear in a doctor's office. Clearly this sample is biased (i.e., there is an ascertainment bias). An important example in genetics occurs when one is interested in the segregation ratio for a rare recessive disease. This problem has existed for over a century, and there are many solutions depending on the sampling scheme. More generally, the selection bias problem is important whenever a non-random sample is taken from a population, as in population genetics.
We have considered the problem of estimating the segregation ratio and the proband probabilities when there is an autosomal recessive disease. We have summarized some approaches for finding MLEs in the ascertainment bias problem, and we have provided some new theoretical results. We have also provided a new, potentially faster algorithm, and we have presented some new interpretations of the associated formulas. We also considered the case in which the proband probabilities change with the number of affected siblings in each at-risk family. This is a challenge for asymptotic theory as in maximum likelihood estimation. The two popular methods, Newton's method and the Fisher scoring method, cannot be used in this case. The use of different proband probabilities can lead to changes in inference about the segregation parameter over the case of a single proband probability. This is true for both Fisher's data and Crow's data, more so for Fisher's data. Finally, we discuss a Bayesian alternative to maximum likelihood estimation. The basic difference between the Bayesian method and maximum likelihood estimation is that in the Bayesian method the parameters are random, and they have prior distributions which arise from historical data and may or may not be informative. The prior distributions permit some flexibility with small sample sizes and many parameters (e.g., when the proband probabilities are allowed to vary with the number of affected siblings). For a single proband probability, we can take p, π iid ∼ Beta(α, β), where α and β are to be specified. For example, α = 1, β = 1 gives a noninformative and proper prior, and α = 1/2, β = 1/2 gives the Jeffreys prior. In general, one can incorporate important prior information using this prior distribution, and this can remove all the difficulties associated with the maximum likelihood estimation procedure. Once a decision on a model is made, all the information about the parameters exists in their joint posterior distribution, which is obtained
using Bayes' theorem. The Bayesian method is more versatile than standard maximum likelihood estimation. Two advantages of Bayesian methods are that they have a simple interpretation, and that they enjoy the recent developments in Markov chain Monte Carlo methods, the workhorse of Bayesian data analysis. We have found (not presented in this paper because of space restrictions) that the Bayesian procedure does overcome some of the difficulties associated with the maximum likelihood estimation procedure, especially in the case where there are different proband probabilities.
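As a minimal sketch of this Bayesian alternative (our illustration, with Beta(1, 1) priors, a grid approximation rather than Markov chain Monte Carlo, and toy data, not either of the two data sets):

```python
import numpy as np

def grid_posterior(s_list, r_list, a_list, m=200):
    """Posterior of (p, pi) on an m-by-m grid under independent Beta(1, 1)
    priors and the ascertained likelihood (a grid sketch, not MCMC)."""
    grid = (np.arange(1, m + 1) - 0.5) / m      # interior grid points
    p, pi = np.meshgrid(grid, grid, indexing="ij")
    log_post = np.zeros_like(p)                 # flat prior: log prior = 0
    for s, r, a in zip(s_list, r_list, a_list):
        log_post += (r * np.log(p) + (s - r) * np.log(1 - p)
                     + a * np.log(pi) + (r - a) * np.log(1 - pi)
                     - np.log(1 - (1 - p * pi)**s))
    post = np.exp(log_post - log_post.max())
    return grid, post / post.sum()

# Toy data; the posterior mean of p is a point estimate that always lies
# inside (0, 1), unlike a Wald interval that can extend below zero.
grid, post = grid_posterior([5, 5, 5, 5], [2, 3, 1, 2], [1, 2, 1, 1])
p_mean = float((grid * post.sum(axis=1)).sum())
print(0 < p_mean < 1)  # True
```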
In addition, the Bayesian procedure can solve not only the problem of unequal proband probabilities; it will also permit us to solve two more problems. First, we can include a familial correlation in our model; for example, see Nandram and Choi (2005). The number of affected siblings in each family is not quite a binomial random variable: it is expected that one sibling being affected will be related to the other siblings, because they are in the same nuclear family sharing the same genes. For this problem, we have seen some improvements of the Bayesian procedure over the maximum likelihood procedure as well. Second, we can consider the ascertainment bias that occurs in single nucleotide polymorphism (SNP) discovery, one of the issues that motivated this work. In SNP discovery a small sample of people (the panel) is taken from the population, and these individuals are sequenced for a large number (≈ 10^6) of nucleotides. However, because of the low density of polymorphisms, many of the nucleotides are not polymorphic in the panel, and so they are eliminated from the panel. The discovery goes on to sequence a larger sample for the variable nucleotides (i.e., the remaining nucleotides). But if the panel sample were larger, some of the discarded nucleotides could have been polymorphic. Thus, there is an ascertainment bias; for example, see Signorovitch (2003) for a description of this problem. The Bayesian procedure can be implemented to solve this problem, and we have some on-going activities in this area.

Appendix B

Here σ_p^2 = {d(1 − ρ^2)}^{−1}, σ_π^2 = {e(1 − ρ^2)}^{−1} and ρ = −f(de)^{−1/2}. Note that in this approximation σ_p^2 is the variance of p, σ_π^2 is the variance of π, and ρ is the correlation between p and π.
Finally, we show that the correlation ρ is nonnegative. Writing the relevant quantity as a sum of terms B_k over families, we need the condition for B_k to be nonnegative for each k, and this is the same as s_k − 1 ≥ {1 − (1 − πp)^{s_k}}/{πp(1 − πp)}. This leads to the condition for nonnegativity that s_k − 1 ≥ 4πp.