Interval Estimation for Ratios of Correlated Age-Adjusted Rates

Providing reliable estimates of the ratios of cancer incidence and mortality rates across geographic regions has been important for the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program as it profiles cancer risk factors as well as plans cancer control. A fundamental difficulty arises, however, when such ratios must be computed to compare the rate of a subregion (e.g., California) with that of a parent region (e.g., the US), a comparison that is often made for policy-making purposes. Based on F-approximations as well as normal approximations, this paper provides new confidence intervals (CIs) for such rate ratios. Intensive simulations, which capture the real issues with the observed mortality data, reveal that these two CIs perform well. In general, for rare cancer sites, the F-intervals are often more conservative, and for moderate and common cancers, all intervals perform similarly.


Introduction and Preliminaries
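Let $\Omega$ denote a region such as the entire US or a state in the US, and let $X$ denote a subregion (a proper subset) of $\Omega$. Denote the rest of the region by $X^c = \Omega \setminus X$. Let $R_X$, $R_{X^c}$ and $R_\Omega$ denote the age-adjusted rates for $X$, $X^c$ and $\Omega$, respectively, defined as
\[
R_X = \sum_{j=1}^{J} w_j \frac{d_{Xj}}{n_{Xj}}, \qquad R_{X^c} = \sum_{j=1}^{J} w_j \frac{d_{X^c j}}{n_{X^c j}}, \qquad R_\Omega = \sum_{j=1}^{J} w_j \frac{d_{\Omega j}}{n_{\Omega j}},
\]
where the $w_j$ are known standards normalized to sum to 1 over the $J$ age-groups, and $d_{Xj}$, $d_{X^c j}$, $d_{\Omega j}$ and $n_{Xj}$, $n_{X^c j}$, $n_{\Omega j}$ are the numbers of cancer cases or deaths and the numbers of person-years in $X$, $X^c$, $\Omega$, respectively. We define the underlying true rates as $\mu_X = E(R_X)$, $\mu_{X^c} = E(R_{X^c})$ and $\mu_\Omega = E(R_\Omega)$, respectively. Under the assumption that the counts are independent Poisson random variables, the variances of the rates can be approximated by
\[
v_X = \sum_{j=1}^{J} w_j^2 \frac{d_{Xj}}{n_{Xj}^2}, \qquad v_{X^c} = \sum_{j=1}^{J} w_j^2 \frac{d_{X^c j}}{n_{X^c j}^2}, \qquad v_\Omega = \sum_{j=1}^{J} w_j^2 \frac{d_{\Omega j}}{n_{\Omega j}^2}.
\]
Our interest is to construct an approximate $100(1-\alpha)\%$ confidence interval for the rate ratio of $X$ to $\Omega$, namely for the parameter $\theta = \mu_X / \mu_\Omega$, e.g., the cancer rate ratio comparing the rate of California with that of the US. Such comparisons are often necessary for policy-making purposes. However, statistical difficulties arise when the estimates of the rates (e.g., those of California and the US) are correlated.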
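As a rough illustration of these definitions, the sketch below computes a directly age-adjusted rate and its Poisson-based variance approximation. The function name `age_adjusted_rate` and the three-age-group data are hypothetical and chosen only for illustration; they are not SEER values.

```python
from typing import Sequence, Tuple


def age_adjusted_rate(
    weights: Sequence[float],
    counts: Sequence[float],
    person_years: Sequence[float],
) -> Tuple[float, float]:
    """Return (rate, variance approximation) for a directly age-adjusted rate.

    rate     = sum_j w_j * d_j / n_j
    variance = sum_j w_j**2 * d_j / n_j**2  (Poisson counts: Var(d_j) ~ d_j)
    """
    if not (len(weights) == len(counts) == len(person_years)):
        raise ValueError("all inputs need one entry per age group")
    rate = sum(w * d / n for w, d, n in zip(weights, counts, person_years))
    var = sum(w * w * d / (n * n) for w, d, n in zip(weights, counts, person_years))
    return rate, var


# Hypothetical data for J = 3 age groups (illustrative only).
w = [0.5, 0.3, 0.2]                                      # standards, sum to 1
d_X, n_X = [2, 10, 40], [50_000, 30_000, 20_000]         # subregion X
d_Xc, n_Xc = [10, 60, 250], [450_000, 270_000, 180_000]  # rest of the region
d_Om = [a + b for a, b in zip(d_X, d_Xc)]                # parent region counts
n_Om = [a + b for a, b in zip(n_X, n_Xc)]                # parent region person-years

R_X, v_X = age_adjusted_rate(w, d_X, n_X)
R_Om, v_Om = age_adjusted_rate(w, d_Om, n_Om)
theta_hat = R_X / R_Om  # point estimate of the subregion-to-region rate ratio
print(R_X, v_X, R_Om, theta_hat)
```

Because the counts of $X$ enter both `R_X` and `R_Om`, the numerator and denominator of `theta_hat` are correlated, which is the difficulty the intervals in this paper address.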
In a simpler context, Fay (1999) and Tiwari et al. (2006) derived confidence intervals for $\phi = \mu_X / \mu_{X^c}$ using the F distribution as an approximation to the distribution of the ratio of two independent Gamma random variables. Specifically, denote by $\hat\phi = R_X / R_{X^c}$ the estimator of $\phi$. The F-based confidence interval given in Tiwari et al. (2006) is referred to here as (1.1). The derivation of (1.1) views the ratio $\hat\phi$ as an approximately F-distributed random variable, given the independence of the numerator and the denominator of $\hat\phi$. Tiwari et al. (2006) noted the need for comparing the rate of a subregion (e.g., California) with that of a larger region (e.g., the US), and gave a possible solution for computing CIs for $\theta$ (the ratio of a subregion to its parent region) that accounts for the correlation between the age-adjusted rates. However, their method was not fully developed. In this paper, we propose new F- and normal-based confidence intervals that conveniently compute CIs for the ratio of two correlated rate estimates. We demonstrate via simulations that the new intervals perform well and retain the nominal coverage probabilities.
The rest of the paper is organized as follows. In Section 2, we derive the new confidence intervals, and in Section 3 we evaluate their performance in terms of their empirical coverage probabilities. Section 4 gives a short discussion and Section 5 ends this paper with a conclusion.

Two New Confidence Intervals for θ
To derive the confidence interval for $\theta$ based on its point estimate $\hat\theta$, we first assume that
\[
\frac{n_{Xj}}{n_{\Omega j}} \approx p_X, \qquad j = 1, \ldots, J.
\]
That is, the ratio of person-years in $X$ (e.g., California) to that of $\Omega$ (e.g., the US) is approximately the same across all age-groups. This so-called proportional age-distribution assumption is common in comparing age-adjusted rates across different geographical areas and was found to be a good approximation for the US population; see, e.g., Pickle and White (1995).
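A reasonable value for $p_X$ is given by $\hat p_X = n_X / n_\Omega$, where $n_X = \sum_j n_{Xj}$ and $n_\Omega = \sum_j n_{\Omega j}$. Now, we can write
\[
R_\Omega = p_X R_X + p_{X^c} R_{X^c},
\]
where $p_{X^c} = 1 - p_X$. Write the estimators of $\theta$ and $\phi$ as
\[
\hat\theta = \frac{R_X}{R_\Omega}, \qquad \hat\phi = \frac{R_X}{R_{X^c}}, \qquad \text{so that} \qquad \hat\theta = \frac{\hat\phi}{p_X \hat\phi + p_{X^c}}.
\]
Hence confidence intervals for $\phi$ lead to those for $\theta$ (and vice versa), as derived below.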
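The decomposition of $R_\Omega$ follows directly from the definitions of the age-adjusted rates. The short derivation below is a sketch that assumes the proportionality $n_{Xj}/n_{\Omega j} = p_X$ holds exactly, and uses $d_{\Omega j} = d_{Xj} + d_{X^c j}$ and $n_{\Omega j} = n_{Xj} + n_{X^c j}$.

```latex
\begin{align*}
R_\Omega
  &= \sum_{j=1}^{J} w_j \, \frac{d_{Xj} + d_{X^c j}}{n_{\Omega j}} \\
  &= \sum_{j=1}^{J} w_j \, \frac{d_{Xj}}{n_{Xj}} \cdot \frac{n_{Xj}}{n_{\Omega j}}
   + \sum_{j=1}^{J} w_j \, \frac{d_{X^c j}}{n_{X^c j}} \cdot \frac{n_{X^c j}}{n_{\Omega j}} \\
  &= p_X R_X + (1 - p_X) R_{X^c}, \\[4pt]
\hat\theta
  &= \frac{R_X}{R_\Omega}
   = \frac{R_X}{p_X R_X + p_{X^c} R_{X^c}}
   = \frac{\hat\phi}{p_X \hat\phi + p_{X^c}},
  \qquad \hat\phi = \frac{R_X}{R_{X^c}} .
\end{align*}
```

The last step divides numerator and denominator by $R_{X^c}$, so it requires $R_{X^c} > 0$.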
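Since $\hat p_X$ and $\hat p_{X^c}$ consistently estimate $p_X$ and $p_{X^c}$, respectively, and $\theta = \phi / (p_X \phi + p_{X^c})$ is an increasing function of $\phi$, we obtain the approximate $100(1-\alpha)\%$ confidence interval for $\theta$ by transforming the endpoints $(\hat\phi_L, \hat\phi_U)$ of the F-based interval (1.1) for $\phi$:
\[
\left( \frac{\hat\phi_L}{\hat p_X \hat\phi_L + \hat p_{X^c}}, \; \frac{\hat\phi_U}{\hat p_X \hat\phi_U + \hat p_{X^c}} \right). \tag{2.1}
\]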
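The endpoint transformation can be sketched as follows. This is a minimal illustration that takes an already computed interval for $\phi$ (for example, an F-based interval) as input and does not itself implement the F-interval of Tiwari et al. (2006). The names `theta_from_phi` and `theta_interval` and the numeric inputs are hypothetical.

```python
import math
from typing import Tuple


def theta_from_phi(phi: float, p_x: float) -> float:
    """Map phi = mu_X / mu_Xc to theta = mu_X / mu_Omega.

    Uses theta = phi / (p_x * phi + (1 - p_x)), which is increasing in phi
    for 0 < p_x < 1 and phi >= 0, with limit 1 / p_x as phi -> infinity.
    """
    if not 0.0 < p_x < 1.0:
        raise ValueError("p_x must lie strictly between 0 and 1")
    if phi < 0.0:
        raise ValueError("phi must be non-negative")
    if math.isinf(phi):
        return 1.0 / p_x
    return phi / (p_x * phi + (1.0 - p_x))


def theta_interval(phi_interval: Tuple[float, float], p_x: float) -> Tuple[float, float]:
    """Transform a confidence interval for phi into one for theta.

    Monotonicity of the map means the endpoints transform one-to-one.
    """
    lower, upper = phi_interval
    return theta_from_phi(lower, p_x), theta_from_phi(upper, p_x)


# Hypothetical interval for phi and population share p_x (illustrative only).
lo, hi = theta_interval((0.8, 1.5), p_x=0.12)
print(lo, hi)
```

Note that the transformed interval is always contained in $[0, 1/\hat p_X)$, which reflects the fact that a subregion's rate cannot exceed $1/p_X$ times the rate of the region containing it.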

A normal approximation
From the perspective of a normal approximation, we can also derive a confidence interval for $\theta$. First note that the estimator $\hat\theta$ is asymptotically normal. With $Z_\alpha$ denoting the upper $100\alpha$ percentile point of the standard normal distribution, the resulting normal confidence interval for $\theta$ is referred to as (2.2).
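One common way to obtain a normal interval of this kind is the first-order delta method applied to $\hat\theta = R_X / (p_X R_X + p_{X^c} R_{X^c})$, treating $R_X$ and $R_{X^c}$ as independent. The sketch below uses that approach; it is an assumption made for illustration and is not necessarily the exact expression behind (2.2). The function name `normal_interval_theta` is hypothetical, and the truncation of a negative lower limit at 0 follows the convention stated in the simulation section.

```python
import math
from statistics import NormalDist
from typing import Tuple


def normal_interval_theta(
    r_x: float, v_x: float, r_xc: float, v_xc: float, p_x: float, alpha: float = 0.05
) -> Tuple[float, float]:
    """Delta-method normal interval for theta = mu_X / mu_Omega (a sketch).

    Assumes R_X and R_Xc are independent and R_Omega = p_x*R_X + (1-p_x)*R_Xc.
    With g(a, b) = a / (p*a + q*b):
        dg/da =  q*b / (p*a + q*b)**2
        dg/db = -q*a / (p*a + q*b)**2
    so Var(theta_hat) ~ q**2 * (b**2 * v_a + a**2 * v_b) / (p*a + q*b)**4.
    """
    p_xc = 1.0 - p_x
    r_omega = p_x * r_x + p_xc * r_xc
    if r_omega <= 0.0:
        raise ValueError("the parent-region rate must be positive")
    theta_hat = r_x / r_omega
    var_theta = p_xc ** 2 * (r_xc ** 2 * v_x + r_x ** 2 * v_xc) / r_omega ** 4
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    half_width = z * math.sqrt(var_theta)
    # A negative lower limit is replaced by 0, since a rate ratio is non-negative.
    return max(0.0, theta_hat - half_width), theta_hat + half_width


# Hypothetical rates and variances (illustrative only).
print(normal_interval_theta(5.2e-4, 5.2e-9, 3.5556e-4, 4.0e-10, p_x=0.1))
```

The factor $p_{X^c}^2$ in the variance shows how the overlap matters: as the subregion's share $p_X$ approaches 1, $\hat\theta$ approaches 1 and its variance shrinks to zero.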

Simulation Studies
We carried out simulations along the lines of Tiwari et al. (2006). We used the 2004 US cancer mortality data for tongue, esophagus, and lung cancer sites. These sites were selected to reflect the spectrum of cancer incidence, from a rare cancer (tongue), to a moderate cancer (esophagus), to a common cancer (lung).
The data were used to generate Poisson counts $d_{Xj}$, where $X$ represents each of the 51 regions (50 states and Washington D.C.) and $j$ indexes the 19 age-groups. The true means of the Poisson distributions are taken to be the observed values of $d_{Xj}$. We generated 10,000 sets of Poisson counts and computed the age-adjusted rates using the 2000 US standards, normalized so that $\sum_{j=1}^{19} w_j = 1$. Approximate 95% confidence intervals were obtained for the ratios of the age-adjusted rates for each of the 51 regions as compared to the overall US rate using the modified versions of the two CIs, as discussed in Tiwari et al. (2006).
The two modified intervals are the F-interval and the normal interval. For the normal-approximation based intervals, if the lower limit is negative, we replace it by 0. Also note that the correction fraction added to the counts $d_{Xj}$ and $d_{X^c j}$ makes no appreciable numerical difference; it merely avoids zero rates.
Each of Tables 1-3 gives the ratio of age-adjusted rates, $R_X / R_\Omega$; the estimate, $\hat p_X$, of the ratio of the population of region $X$ to that of the US; the empirical coverage probabilities; and the widths of the 95% intervals for the two methods, namely the F- and normal-approximation based CIs. Because of space limitations, we report only selected states in these tables; the full tables are available from the authors.
The results show that the two CIs perform comparably. For tongue cancer, all intervals have higher coverage probabilities and larger widths than those for esophagus and lung cancers. Indeed, for the latter two cancers, the empirical coverage probabilities are much closer to 95%. For large states such as California, Florida, New York, Pennsylvania, and Texas, all intervals perform similarly in terms of interval widths and coverage probabilities.
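The simulation design described in this section can be sketched in a few lines. The example below is only a rough illustration under several assumptions that are not taken from the source: it uses a delta-method normal interval (rather than the modified intervals of the paper), omits the correction fraction added to the counts, uses hypothetical three-age-group data with exactly proportional person-years, and runs 2,000 rather than 10,000 replications. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication sampler; adequate for modest means (lam < ~500)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rate_and_var(w, d, n):
    """Age-adjusted rate and its Poisson-based variance approximation."""
    rate = sum(wj * dj / nj for wj, dj, nj in zip(w, d, n))
    var = sum(wj * wj * dj / (nj * nj) for wj, dj, nj in zip(w, d, n))
    return rate, var


def coverage(w, mean_x, mean_xc, n_x, n_xc, reps=2000, alpha=0.05, seed=1):
    """Empirical coverage of a delta-method normal interval for theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_x = sum(n_x) / (sum(n_x) + sum(n_xc))
    p_xc = 1.0 - p_x
    # True ratio implied by the Poisson means (the "observed" counts).
    mu_x, _ = rate_and_var(w, mean_x, n_x)
    mu_xc, _ = rate_and_var(w, mean_xc, n_xc)
    theta = mu_x / (p_x * mu_x + p_xc * mu_xc)
    hits = 0
    for _ in range(reps):
        d_x = [poisson(rng, m) for m in mean_x]
        d_xc = [poisson(rng, m) for m in mean_xc]
        r_x, v_x = rate_and_var(w, d_x, n_x)
        r_xc, v_xc = rate_and_var(w, d_xc, n_xc)
        r_om = p_x * r_x + p_xc * r_xc
        if r_om == 0.0:
            continue  # no interval can be formed; counted as a miss
        theta_hat = r_x / r_om
        half = z * p_xc * math.sqrt(r_xc ** 2 * v_x + r_x ** 2 * v_xc) / r_om ** 2
        lower = max(0.0, theta_hat - half)  # negative lower limit set to 0
        hits += lower <= theta <= theta_hat + half
    return hits / reps


# Hypothetical data (illustrative only); person-years of X are 10% in every group.
w = [0.5, 0.3, 0.2]
cov = coverage(w, [2, 10, 40], [10, 60, 250],
               [50_000, 30_000, 20_000], [450_000, 270_000, 180_000])
print(f"empirical coverage: {cov:.3f}")
```

Interval widths could be tracked in the same loop by accumulating `theta_hat + half - lower` over the replications.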

Discussion of the Results
The SEER Program of NCI has implemented the F-intervals of Fay (1999) and the modified F-interval of Tiwari et al. (2006) in the SEER*STAT software to compare the age-adjusted rates for two nonoverlapping regions. However, as pointed out in Tiwari et al. (2006), there is an emerging need for confidence interval formulae that compare the age-adjusted rate of a subregion with that of the larger region containing it. This paper fills that gap by providing the needed confidence interval formulae, namely, the F-based and the normal-based intervals.
Note that these two intervals depend on the ratio of the population sizes of the subregion and the region, and on their age-adjusted rates. The results in Tables 1-3 show the effect of the size of the overlap in the two populations on the confidence intervals. To avoid the situation where the observed age-adjusted rates are zero, we adopted a corrected version of the age-adjusted rates by adding a small constant, as in Tiwari et al. (2006). We note that the framework of the normal-approximation method allows computation for the following two common scenarios in cancer surveillance. The first concerns a partially overlapping situation. Consider X and Ω two regions

Table 1: Performance of the Derived Confidence Intervals based on the 2004 Tongue Cancer Mortality Data. The empirical coverage probabilities and the widths of the intervals were based on 10,000 simulations.

Table 2: Performance of the Derived Confidence Intervals based on the 2004 Esophagus Cancer Mortality Data. The empirical coverage probabilities and the widths of the intervals were based on 10,000 simulations.

Table 3: Performance of the Derived Confidence Intervals based on the 2004 Lung Cancer Mortality Data. The empirical coverage probabilities and the widths of the intervals were based on 10,000 simulations.