Abstract: We compare two linear dimension-reduction methods for statistical discrimination in terms of average probabilities of misclassification in reduced dimensions. Using Monte Carlo simulation, we compare the dimension-reduction methods over several different parameter configurations of multivariate normal populations and find that the two methods yield very different results. We also apply the two dimension-reduction methods examined here to data from a study on football helmet design and neck injuries.
Abstract: We derive three likelihood-based confidence intervals for the risk ratio of two proportion parameters using a double sampling scheme for misclassified binomial data. The risk ratio is also known as the relative risk. We obtain closed-form maximum likelihood estimators of the model parameters by maximizing the full-likelihood function. Moreover, we develop three confidence intervals: a naive Wald interval, a modified Wald interval, and a Fieller-type interval. We apply the three confidence intervals to cervical cancer data. Finally, we perform two Monte Carlo simulation studies to assess and compare the coverage probabilities and average lengths of the three interval estimators. Unlike the other two interval estimators, the modified Wald interval always produces close-to-nominal confidence intervals for the various simulation scenarios examined here. Hence, the modified Wald confidence interval is preferred in practice.
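For orientation, a minimal sketch of a naive Wald-type interval for the risk ratio is given below. This is the textbook log-scale Wald interval under simple binomial sampling; it ignores the misclassification and double-sampling structure of the paper's model, and the function name and example counts are illustrative, not from the paper.

```python
import math
from statistics import NormalDist

def wald_ci_risk_ratio(x1, n1, x2, n2, alpha=0.05):
    """Naive Wald interval for the risk ratio p1/p2 (log scale).

    x1, x2: event counts; n1, n2: sample sizes.
    Assumes simple binomial sampling with no misclassification.
    """
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # Delta-method standard error of log(rr)
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rr * math.exp(-z * se), rr * math.exp(z * se)

# Illustrative counts: 30/100 events vs 20/100 events (rr = 1.5)
lo, hi = wald_ci_risk_ratio(30, 100, 20, 100)
```

Working on the log scale before exponentiating keeps the interval inside (0, infinity), which is one reason Wald-type intervals for ratios are usually built this way.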
For statistical classification problems where the total sample size is only slightly greater than the feature dimension, regularized statistical discriminant rules may reduce classification error rates. We review ten dispersion-matrix regularization approaches, four for the pooled sample covariance matrix, four for the inverse pooled sample covariance matrix, and two for a diagonal covariance matrix, for use in Anderson’s (1951) linear discriminant function (LDF). We compare these regularized classifiers against the traditional LDF for a variety of parameter configurations, and use the estimated expected error rate (EER) to assess performance. We also apply the regularized LDFs to a well-known real-data example on colon cancer. We find that no regularized classifier uniformly outperforms the others. However, the more contemporary classifiers (e.g., Thomaz and Gillies, 2005; Tong et al., 2012; and Xu et al., 2009) tend to outperform the older classifiers, and certain simple methods (e.g., Pang et al., 2009; Thomaz and Gillies, 2005; and Tong et al., 2012) perform very well, calling into question the need for involved cross-validation in estimating regularization parameters. Nonetheless, an older regularized classifier proposed by Smidt and McDonald (1976) yields consistently low misclassification rates across all scenarios, regardless of the shape of the true covariance matrix. Finally, our simulations show that regularized classifiers relying primarily on asymptotic approximations with respect to the training sample size rarely outperform the traditional LDF, and are thus not recommended. We discuss our results as they pertain to the effect of high dimension, and offer general guidelines for choosing a regularization method for poorly posed problems.
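The traditional (unregularized) LDF referenced above can be sketched as follows: classify an observation x to population 1 when (m1 - m2)' S^{-1} (x - (m1 + m2)/2) > 0, where m1, m2 are the sample mean vectors and S is the pooled sample covariance matrix. The sketch below is a plain-Python illustration for two features with an identity-covariance toy example; the function names and data are illustrative only, and the regularized variants surveyed in the paper would replace S^{-1} with a regularized estimate.

```python
def inv2(S):
    """Inverse of a 2x2 matrix given as nested lists (toy helper)."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]

def anderson_ldf(x, m1, m2, Sinv):
    """Anderson-type LDF score W(x) = (m1-m2)' Sinv (x - (m1+m2)/2).

    Classify x to population 1 when W(x) > 0, else to population 2.
    """
    d = [a - b for a, b in zip(m1, m2)]            # mean difference
    mid = [(a + b) / 2 for a, b in zip(m1, m2)]    # midpoint of means
    xc = [a - b for a, b in zip(x, mid)]           # centered observation
    t = [sum(Sinv[i][j] * xc[j] for j in range(len(xc)))
         for i in range(len(xc))]                  # Sinv (x - mid)
    return sum(di * ti for di, ti in zip(d, t))

# Toy example: well-separated means, identity pooled covariance
Sinv = inv2([[1.0, 0.0], [0.0, 1.0]])
score = anderson_ldf([2.0, 2.0], [1.0, 1.0], [-1.0, -1.0], Sinv)
```

In practice S^{-1} is computed from the pooled sample covariance matrix; when the sample size barely exceeds the dimension, that inverse is unstable, which is precisely the poorly posed setting the regularization methods address.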