Abstract: Li and Tiwari (2008) recently developed a corrected Z-test statistic for comparing the trends in age-adjusted cancer mortality and incidence rates across overlapping geographic regions, which properly adjusts for the correlation between the slopes of the fitted simple linear regression equations. One of their key assumptions is that the errors have an unknown but common variance. However, since the age-adjusted rates are linear combinations of mortality or incidence counts, which arise naturally from an underlying Poisson process, this constant-variance assumption may be violated. This paper develops a weighted-least-squares-based test that accommodates heteroscedastic errors, and thus significantly extends the work of Li and Tiwari. In simulations and in an application to age-adjusted mortality data from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute, the proposed test generally outperforms the aforementioned test.
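To illustrate why the constant-variance assumption can fail, the following is a minimal sketch and not the authors' test. It assumes the counts in each age group are independent Poisson variables, so the variance of an age-adjusted rate can be estimated from the counts themselves, and it then fits a trend line by weighted least squares with inverse-variance weights. The function names (`age_adjusted_rate`, `wls_slope`) and the example numbers are hypothetical, and the sketch ignores the correlation between overlapping regions that the paper's test addresses.

```python
import numpy as np

def age_adjusted_rate(counts, pops, std_weights):
    """Age-adjusted rate and its estimated variance for one region-year.

    Assumption: counts in each age group are independent Poisson, so
    Var(sum_j w_j d_j / n_j) is estimated by sum_j w_j^2 d_j / n_j^2.
    """
    d = np.asarray(counts, dtype=float)
    n = np.asarray(pops, dtype=float)
    w = np.asarray(std_weights, dtype=float)
    rate = np.sum(w * d / n)
    var = np.sum(w ** 2 * d / n ** 2)
    return rate, var

def wls_slope(years, rates, variances):
    """Weighted-least-squares slope of rate on year, weights = 1 / variance.

    The returned standard error treats the supplied variances as known.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(rates, dtype=float)
    wt = 1.0 / np.asarray(variances, dtype=float)
    xbar = np.sum(wt * x) / np.sum(wt)   # weighted mean of the years
    ybar = np.sum(wt * y) / np.sum(wt)   # weighted mean of the rates
    sxx = np.sum(wt * (x - xbar) ** 2)
    slope = np.sum(wt * (x - xbar) * (y - ybar)) / sxx
    se = np.sqrt(1.0 / sxx)
    return slope, se

# Hypothetical example: two age groups observed over four years.
std_weights = [0.6, 0.4]
counts = [[12, 40], [10, 38], [9, 35], [8, 33]]
pops = [[50_000, 20_000]] * 4
pairs = [age_adjusted_rate(c, p, std_weights) for c, p in zip(counts, pops)]
rates, variances = zip(*pairs)
slope, se = wls_slope([2000, 2001, 2002, 2003], rates, variances)
```

Because the estimated variance of each rate depends on the counts, years with fewer events receive different weights, which is the heteroscedasticity that an unweighted fit ignores.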
Abstract: Providing reliable estimates of the ratios of cancer incidence and mortality rates across geographic regions has been important for the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program, both for profiling cancer risk factors and for cancer control planning. A fundamental difficulty arises, however, when such ratios must be computed to compare the rate of a subregion (e.g., California) with that of its parent region (e.g., the US). Such comparisons are often made for policy-making purposes. Based on F-approximations as well as normal approximations, this paper provides new confidence intervals (CIs) for such rate ratios. Intensive simulations, which capture the real issues with the observed mortality data, reveal that these two CIs perform well. In general, the F-intervals are often more conservative for rare cancer sites, while for moderate and common cancers all intervals perform similarly.
Predictor envelopes model the response variable by using a subspace of dimension d extracted from the full space of all p input variables. Predictor envelopes have a close connection to partial least squares and enjoy improved estimation efficiency in theory. As such, predictor envelopes have become increasingly popular in chemometrics. Often, d is much smaller than p, which seemingly enhances the interpretability of the envelope model. However, the process of estimating the envelope subspace adds complexity to the final fitted model. To better understand the complexity of predictor envelopes, we study their effective degrees of freedom (EDF) in a variety of settings. We find that in many cases a d-dimensional predictor envelope model can have far more than $d+1$ EDF and often has close to $p+1$. However, the EDF of a predictor envelope depend heavily on the structure of the underlying data-generating model, and there are settings under which predictor envelopes can have substantially reduced model complexity.
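For orientation, one widely used definition of effective degrees of freedom is the covariance-based one shown below. The abstract does not state which definition is used, so this is an assumption; the symbols $y_i$, $\hat{y}_i$, $n$, and $\sigma^2$ are introduced here for illustration only.

$$
\mathrm{EDF} \;=\; \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \operatorname{Cov}\!\left(\hat{y}_i,\, y_i\right),
$$

where $y_i$ is the $i$-th observed response, $\hat{y}_i$ is its fitted value, and $\sigma^2$ is the error variance. Under this definition, ordinary least squares with an intercept and all $p$ predictors has exactly $p+1$ EDF, and a regression on $d$ fixed (not estimated) linear combinations of the predictors has $d+1$, which gives the two reference points the abstract compares against.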