Estimation of the Proportion of Sterile Couples Using the Negative Binomial Distribution

Abstract: A sterile family is a couple that has no children, either by deliberate choice or because of biological infertility. Couples who are childless by chance are not considered sterile. The objective is to estimate the proportion of sterile couples in Jordan indirectly from the 1994 population census by separating childless couples into sterile and fertile couples. Three methods of fitting a negative binomial distribution to the completed family size data from the 1994 census are investigated, and the third method gives the best fit. Based on the fitted distribution, the proportion of sterile couples is estimated at 6.1% of all couples. This is much lower than the corresponding estimate for the USA, which is 11%. The difference may be due to socio-cultural factors influencing the deliberate choice of couples to have no children. The method of estimation can be applied to other populations.


Introduction
The distribution of completed family size (or sibship size) has been a subject of interest for human biologists, geneticists, demographers, and social scientists. Since the variance of the family size distribution is much larger than its mean, a Poisson distribution is unlikely to fit. However, there is good empirical evidence that the distribution is nearly negative binomial (Kojima and Kelleher, 1962).
Waller et al. (1973), in their paper entitled Heterogeneity of Childless Families, fitted the negative binomial distribution to observed frequencies of completed family size from various sources. They noticed that the number of childless families is much greater than the expected number, although the fit is good for the rest of the distribution. This led them to suggest that childless families are a mixture of two types. The first type is biologically fertile and could have children but, by chance, did not. Families of this type should be part of the general negative binomial distribution of family size. The second type is either biologically or electively not fertile (sterile) and thus has no children. Families of this type should not be part of the general negative binomial distribution of family size. The proportion of families of the second type is expected to vary among populations because of sociocultural factors influencing the deliberate choice to have no children. Waller et al. also discussed the theoretical considerations that justify the use of a negative binomial distribution, one of which is that a birth process leads to a negative binomial distribution.
The negative binomial random variable $X$ is a non-negative discrete random variable with probability function
$$P[X = x] = \binom{k + x - 1}{x} p^k q^x, \quad x = 0, 1, 2, \ldots,$$
where $0 < p < 1$, $q = 1 - p$, and $k > 0$. The mean of $X$ is $\mu = kq/p$ and the variance is $\sigma^2 = kq/p^2$. Note that the variance is always greater than the mean. Of particular interest to the problem under consideration is the first term of the distribution. For childless fertile families, $X = 0$ and $P[X = 0] = p_0 = p^k$, which is the theoretical proportion of childless fertile couples. Hence, to estimate this proportion we need estimates of $p$ and $k$.
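As a minimal sketch of the distribution described above, the following Python function evaluates the negative binomial probability in the parametrization whose mean is kq/p, whose variance is kq/p^2, and whose first term is p^k. Non-integer k is handled through the log-gamma function. The function name nb_pmf, the parameter values, and the truncation point of the numerical check are illustrative assumptions, not part of the original analysis.

```python
import math

def nb_pmf(x, k, p):
    """P[X = x] for a negative binomial with shape k > 0 and 0 < p < 1."""
    q = 1.0 - p
    # log of Gamma(k + x) / (Gamma(k) * x!), valid for non-integer k
    log_coef = math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
    return math.exp(log_coef + k * math.log(p) + x * math.log(q))

# Numerical check of the stated mean and variance (illustrative values).
k, p = 3.0, 0.5
q = 1.0 - p
xs = range(0, 400)  # assumed truncation point; the tail beyond it is negligible here
probs = [nb_pmf(x, k, p) for x in xs]
mean = sum(x * pr for x, pr in zip(xs, probs))
var = sum((x - mean) ** 2 * pr for x, pr in zip(xs, probs))
```

Comparing mean with k * q / p and var with k * q / p ** 2 confirms that this is the parametrization used in the text, and nb_pmf(0, k, p) reproduces p ** k.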
This distribution has been found to provide a useful representation in many fields. Its applicability to birth and death processes was shown by Furry (1937) and Kendall (1949). It was used to model family size by Rao et al. (1973). Wilson et al. (1983) and Binns (1986) used it for modeling entomological data, and Kault (1996) used it for modeling the number of sexual partners. Because the distribution has been used extensively on biological data, there have been attempts to give an ecological meaning to the mean $\mu$ and the shape parameter $k$. The mean $\mu$ has been thought of as the density of organisms in the area of interest, because $\mu$ increases when the population grows in size or becomes denser; see Wilson et al. (1984). Anscombe (1949) noted that there is theoretical evidence that $k$ depends on the intrinsic power of a species to reproduce itself, and Waters (1959) suggested that $k$ is a measure of aggregation.
With this in mind, three methods of fitting this distribution to the family size data obtained from the 1994 general census of the Jordanian population are employed. The best-fitting distribution is used to estimate (or approximate) the proportion of sterile couples in the Jordanian population. To the best of our knowledge, there has not been any attempt to estimate this quantity directly, possibly because of the sensitivity of the issue.

Methods of fitting
Three methods of fitting the negative binomial distribution to the observed data have been employed (see Waller et al., 1973).

Method I: (Complete)
This method consists of approximating the mean ($\mu$) and the variance ($\sigma^2$) of the negative binomial distribution directly from the observed data. The parameters $p$ and $k$ are then estimated using the formulas:
$$p = \frac{\mu}{\sigma^2}, \qquad k = \frac{\mu^2}{\sigma^2 - \mu}. \quad (1)$$
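The moment relations mu = kq/p and sigma^2 = kq/p^2 invert to p = mu/sigma^2 and k = mu^2/(sigma^2 - mu). The sketch below applies this inversion, first to a pair of moments and then to a frequency table. The function names are hypothetical, and dividing by N rather than N - 1 in the variance is an assumption, since the text does not say which was used.

```python
def moments_to_params(mean, var):
    """Invert mean = k*q/p and var = k*q/p**2 to obtain (p, k)."""
    if var <= mean:
        raise ValueError("a negative binomial fit needs variance > mean")
    p = mean / var
    k = mean ** 2 / (var - mean)
    return p, k

def method_one(freqs):
    """Moment estimates from a table {family size x: frequency f_x}."""
    n = sum(freqs.values())
    mean = sum(x * f for x, f in freqs.items()) / n
    # Assumption: population variance (divide by n, not n - 1).
    var = sum(f * (x - mean) ** 2 for x, f in freqs.items()) / n
    return moments_to_params(mean, var)

# Moments reported in the text for the Jordanian data.
p_hat, k_hat = moments_to_params(4.32, 9.19)
```

With the rounded moments 4.32 and 9.19, the result lies close to the values p = 0.4703 and k = 3.837 reported for Method I. The small difference is presumably due to rounding of the moments.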

Method II: (truncated)
In this method the zero class is treated as missing, and the parameters $k$ and $p$ are estimated from the incomplete (truncated) distribution. The estimates are computed from sums over the observed frequencies, where $f_x$ is the frequency of families of size $x$ and all summations range from $x = 0$ to the maximum class value (family size). The estimating formulas are given by Rider (1955).
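The following sketch shows one way to estimate the parameters without the zero class. It assumes that the sums in question are the power sums T_j = sum of x^j f_x for j = 1, 2, 3. This assumption and the algebra in the comments follow from the factorial moments of the negative binomial distribution. They are offered as an illustration in the spirit of Rider (1955), not as a transcription of the formulas in that paper. The zero class contributes nothing to T_1, T_2, or T_3, so the ratios T_2/T_1 and T_3/T_1 do not depend on it.

```python
import math

def nb_pmf(x, k, p):
    """Negative binomial probability with mean k*(1-p)/p (helper for the demo)."""
    log_coef = math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
    return math.exp(log_coef + k * math.log(p) + x * math.log(1.0 - p))

def truncated_moment_fit(freqs):
    """Estimate (p, k) from T_j = sum(x**j * f_x); the zero class is not needed."""
    t1 = sum(x * f for x, f in freqs.items())
    t2 = sum(x ** 2 * f for x, f in freqs.items())
    t3 = sum(x ** 3 * f for x, f in freqs.items())
    # With P = q/p:  T2/T1 = 1 + (k+1)P  and  T3/T1 = 1 + 3(k+1)P + (k+1)(k+2)P^2
    a = t2 / t1 - 1.0            # (k + 1) P
    b = t3 / t1 - 3.0 * a - 1.0  # (k + 1)(k + 2) P^2
    big_p = b / a - a            # P = q / p
    k = a / big_p - 1.0
    p = 1.0 / (1.0 + big_p)
    return p, k

# Synthetic check: exact expected frequencies with the zero class left out.
true_k, true_p = 4.0, 0.4
freqs = {x: 1_000_000 * nb_pmf(x, true_k, true_p) for x in range(1, 301)}
p_hat, k_hat = truncated_moment_fit(freqs)
```

On the synthetic table the function recovers the generating parameters even though no zero class is supplied. Third moments are sensitive to sampling noise, so estimates from real data are less stable than this idealized check suggests.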

Method III: (iteration)
This method iterates from a set of initial trial values of $p$ and $k$ obtained from Method I or Method II. Let $T_0$ be the total number of families in the data set with at least one child. At each step, the calculated new total number for the distribution is $\tilde{N} = T_0/(1 - p^k)$, and the estimated number for the zero class is $\tilde{N}_0 = p^k \tilde{N}$. We then replace the observed total number and the observed zero-class frequency with the calculated values $\tilde{N}$ and $\tilde{N}_0$, and calculate new values of $p$ and $k$ using the formulas of Method I. The iteration stops when successive values of $\tilde{N}$ and $\tilde{N}_0$ no longer change appreciably. See Waller et al. (1973).
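The iteration can be sketched in Python as follows. The census table is not reproduced here, so the demonstration uses synthetic data: exact negative binomial frequencies with an artificial surplus of childless families added. All function names, the tolerance, the iteration cap, and the synthetic parameters are assumptions made for illustration.

```python
import math

def nb_pmf(x, k, p):
    """Negative binomial probability (helper for the synthetic demo)."""
    log_coef = math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
    return math.exp(log_coef + k * math.log(p) + x * math.log(1.0 - p))

def moment_fit(freqs):
    """Method I: p = mean/var, k = mean^2/(var - mean)."""
    n = sum(freqs.values())
    mean = sum(x * f for x, f in freqs.items()) / n
    var = sum(f * (x - mean) ** 2 for x, f in freqs.items()) / n
    if var <= mean:
        raise ValueError("variance must exceed the mean")
    return mean / var, mean ** 2 / (var - mean)

def iterate_zero_class(freqs, tol=1e-6, max_iter=200):
    """Re-estimate the zero class until the calculated N0 stabilizes."""
    work = dict(freqs)
    t0 = sum(f for x, f in freqs.items() if x >= 1)  # families with children
    n0_prev = work.get(0, 0.0)
    for _ in range(max_iter):
        p, k = moment_fit(work)
        n_new = t0 / (1.0 - p ** k)   # calculated total
        n0_new = p ** k * n_new       # calculated zero class
        work[0] = n0_new              # the total then equals t0 + n0_new = n_new
        if abs(n0_new - n0_prev) < tol:
            break
        n0_prev = n0_new
    return p, k, n_new, n0_new

# Synthetic data: 100000 fertile families from NB(k=5, p=0.5) plus 5000 extra zeros.
data = {x: 100_000 * nb_pmf(x, 5.0, 0.5) for x in range(0, 81)}
data[0] += 5000.0
p_hat, k_hat, n_hat, n0_hat = iterate_zero_class(data)
sterile_hat = data[0] - n0_hat  # excess of observed zeros over the fitted zero class
```

Replacing only the zero-class frequency is enough, because the remaining classes sum to T_0 and the new total is therefore the calculated total automatically. In this synthetic case the surplus zeros are separated out again; with real data the result also depends on how well the negative binomial describes the fertile families.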

Population of Jordan
Based on the general census of Jordan (1994), the observed family size distribution is shown in Table 1, where $x$ is the sibship size (number of children in the family) and $f_x$ is the number of families with $x$ children (frequency). We now apply the three methods of fitting to this population.

Method I
Using Table 1, and taking $x = 13$ for all families of size 13 or more, we find that $\mu = 4.32$ and $\sigma^2 = 9.19$. Hence, using formula (1), we have $p = 0.4703$ and $k = 3.837$. The fitted negative binomial distribution is then
$$P[X = x] = \binom{x + 2.837}{x} (0.4703)^{3.837} (0.5297)^x, \quad x = 0, 1, \ldots, 13.$$
Table 1 contains the fitted family size distribution based on Method I. Column 1 contains the family size ($x$). Column 2 contains the observed number of families $f_x$ for each value of $x$. Column 3 contains the relative frequency of each value of $x$. Column 4 contains the theoretical probability of each value of $x$ based on the negative binomial distribution with $p = 0.4703$ and $k = 3.837$. The last column contains the fitted frequencies.
It can be seen from this table and Figure 1 that the fitted curve and the actual curve have a similar general shape, but with a large gap at $x = 0$. Furthermore, both curves are skewed to the right, with $x = 3$ being the mode as well as the median of both distributions. The large gap between the two curves at $x = 0$ suggests that the families of this class are heterogeneous: some have no children by chance alone, while others are biologically or deliberately sterile and hence should not be part of the distribution. In Methods II and III below, this second type of family is isolated.

Method II
Here we treat the zero class as missing and use the method of Rider (1955) to estimate the parameters from the incomplete distribution. This gives $p = 0.6382$ and $k = 8.5965$, and then
$$\tilde{N} = \frac{T_0}{1 - p^k} = 589227.9, \qquad \tilde{N}_0 = p^k \tilde{N} = 12410.0.$$
The fitted distribution is:
$$P[X = x] = \binom{x + 7.5965}{x} (0.6382)^{8.5965} (0.3618)^x, \quad x = 0, 1, \ldots, 13.$$
Table 2 contains the family size distribution based on Method II. Column 1 contains the family size ($x$). Column 2 contains the observed number of families $f_x$ for each value of $x$, with the zero class adjusted. Column 3 contains the relative frequency of each value of $x$. Column 4 contains the theoretical probability of each value of $x$ based on the negative binomial distribution with $p = 0.6382$ and $k = 8.5965$. The last column contains the fitted frequencies.
It can be seen from this table and Figure 2 that the fitted curve and the actual curve have a very similar general shape, with some discrepancies between the empirical and the theoretical probabilities. The number of electively or biologically sterile families is the excess of the observed zero class over $\tilde{N}_0$. In other words, based on this method of fitting, about 79.3% of all childless families are electively or biologically sterile.
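The zero-class arithmetic for Method II can be checked with a few lines of Python. The sketch uses the reported estimates p = 0.6382 and k = 8.5965 together with T_0 = 576817, the number of families with at least one child quoted in the Method III algorithm. The variable names are illustrative.

```python
p, k = 0.6382, 8.5965  # Method II estimates reported in the text
t0 = 576817            # families with at least one child (quoted under Method III)

p0 = p ** k                 # theoretical proportion of childless fertile families
n_total = t0 / (1.0 - p0)   # calculated total number of families
n_zero = p0 * n_total       # calculated number of childless fertile families

print(round(n_total, 1), round(n_zero, 1))
```

The printed values should land close to the reported 589227.9 and 12410.0. Small differences are expected because p and k are given to only four decimal places.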

Method III
Method III was carried out by computer, beginning with the values of $p$ and $k$ from Method I. The algorithm is:
1. Find the mean ($\mu$) and variance ($\sigma^2$) from the frequency distribution.
2. Obtain $p$ and $k$ using $p = \mu/\sigma^2$ and $k = \mu^2/(\sigma^2 - \mu)$.
3. Obtain $\tilde{N}$ (the total) and $\tilde{N}_0$ (the number of zeros) using $\tilde{N} = T_0/(1 - p^k)$, where $T_0 = 576817$, and $\tilde{N}_0 = \tilde{N} p^k$.
4. Replace $f_0$ by $\tilde{N}_0$ and $N$ by $\tilde{N}$.
5. Repeat steps 1 to 4 ($L$ times) until the values of $\tilde{N}$ and $\tilde{N}_0$ converge.
It can be seen from Table 4 and Figure 3 that the fitted curve and the actual curve are very close.
Table 5 contains the fitted frequencies for the three methods along with the actual frequencies. Table 6 contains a summary of the estimated $\mu$, $\sigma^2$, $p$, and $k$ for the three methods.

Results
It can be seen from the preceding tables and figures that the third method gives the best fit, so we adopt it. Based on this method, the number of childless fertile couples is approximately 21320, and the number of childless sterile (biologically or electively) couples is approximately 38659. Hence, the proportion of childless sterile families in Jordan is approximately 6.1%. To the best of our knowledge, no value for the proportion of sterile families in Jordan has been reported previously.
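The arithmetic behind these figures can be written out as follows. It is assumed here that the proportion is taken relative to all observed couples, $N = T_0 + f_0$, where $f_0$ is the observed number of childless couples. The symbols $S$ and $\hat{\pi}$ are introduced for illustration only.

```latex
S = f_0 - \tilde{N}_0, \qquad
\hat{\pi} = \frac{S}{T_0 + f_0}, \qquad
\frac{S}{f_0} = \text{share of childless couples that are sterile}
```

Here $\tilde{N}_0$ is the converged zero-class value from Method III, so $S$ is the excess of observed childless couples over those expected to be childless by chance.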

Conclusions
The methods of fitting the negative binomial distribution to the population data indicate how to estimate the proportion of sterile (infertile) families in the Jordanian population ($\pi$). It should be noted that not all childless families are sterile. Thus, estimating the proportion of sterile families from all childless families (a crude estimate) would give a number higher than the actual one. To approximate $\pi$, we consider childless families as a mixture of two types: (1) one type is biologically fertile and could have children but did not, and should be part of the general negative binomial distribution of family size; (2) another type is either biologically or electively not fertile and thus has no children. Since Method III gives the best fit, we may conclude that $\hat{\pi} = 0.0607$ is a good estimate of $\pi$. Hence, the percentage of electively or biologically sterile couples in Jordan is about 6.1%. It can be seen from the previous tables that about 64% of the childless families are infertile. Inspection of Table 5 and Figure 3 also gives some indication that the numbers of 3-child and 4-child families are in excess; that is, some of those families may have become sterile (not fertile, either electively or biologically).

Figure 1: Observed and fitted curve of family size using Method I; bold = observed, smoky = fitted.

Figure 2: Observed and fitted curve of family size using Method II; bold = adjusted value, smoky = fitted value.
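The fitted distribution under Method III is:
$$P[X = x] = \binom{x + 4.4406}{x} (0.5418)^{5.4406} (0.4582)^x, \quad x = 0, 1, \ldots, 13. \quad (7)$$
Table 4 contains the family size distribution based on Method III. Column 4 contains the theoretical probability of each value of $x$ based on the negative binomial distribution.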

Figure 3: Observed and fitted curve of family size using Method III; bold = adjusted value, smoky = fitted value.

Table 1: Family Size Distribution of Jordan Population by Method I

Table 2: Family Size Distribution of Jordan Population by Method II

Table 3: Results of Iteration for the Jordanian Families Population

Table 3 contains the results of the first 11 iterations.

Table 4: Family Size Distribution of Jordan Population by Method III

Table 5: Observed and Fitted Number of Families of Various Sizes

Table 6: Estimates of the parameters of the fitted negative binomial distribution for the Jordanian population for the three methods.