Estimating Optimum Linear Combination of Multiple Correlated Diagnostic Tests at a Fixed Specificity with Receiver Operating Characteristic Curves

Receiver operating characteristic (ROC) methodology is widely used to evaluate diagnostic tests. In medical practice, multiple diagnostic tests are often applied to the same study sample. A variety of methods have been proposed to combine such potentially correlated tests to increase diagnostic accuracy. Usually the optimum combination is sought based on the area under a ROC curve (AUC), an overall summary statistic that measures the distance between the distributions of the diseased and non-diseased populations. For many clinical practitioners, however, a more relevant question may be "What would the sensitivity be for a given specificity (say, 90%), or what would the specificity be for a given sensitivity?" Generally, no single linear combination is superior to all others over the entire range of specificities or sensitivities. Under the framework of a ROC curve, this paper presents a method to estimate the optimum linear combination that maximizes sensitivity at a fixed specificity, assuming that the diagnostic tests follow a multivariate normal distribution. The method is applied to a real-world study in which the accuracy of two biomarkers was evaluated in the diagnosis of pancreatic cancer. The performance of the method is also evaluated by simulation studies.


Introduction
In recent years, considerable effort has been devoted to studying biomarkers that could provide accurate and non-invasive means of disease diagnosis or prognosis. Many of these biomarkers are measured on a continuous scale, and the receiver operating characteristic (ROC) curve is widely used for evaluating the accuracy of such continuous diagnostic tests (Hanley and McNeil 1984; Hanley 1989; Begg 1991). Suppose that, based on some gold standard independent of the diagnostic tests to be evaluated, each subject belongs to one of two conditions: diseased ($D^+$) or non-diseased ($D^-$). The ROC curve evaluates the ability of the diagnostic test to discriminate between the two conditions. By plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) across all possible thresholds, the ROC curve reflects the trade-off between true and false positive rates. The area under the ROC curve (AUC) measures the distance between the distributions of the diseased and non-diseased populations and is frequently used as a global measure of the accuracy of the diagnostic test (Swets and Pickett 1982; DeLong, Vernon and Bollinger 1985; Ma and Hall 1993). If a test could discriminate perfectly, there would exist a cut-point above which all members of one group (diseased or non-diseased) fall and below which all members of the other group fall. The ROC curve would then pass through the point (0, 1) on the grid [0, 1] × [0, 1], with an AUC of one. The closer the AUC comes to this ideal, the greater the discriminating ability of the test. Zhou et al. (2002) and Pepe (2003) provide excellent reviews of existing methods for the analysis of ROC curves.
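As a small illustration of how a ROC curve and its AUC arise from thresholding a continuous test, the following Python sketch computes empirical ROC points and a trapezoidal AUC. The function names and the score values are hypothetical and chosen only for illustration; they are not taken from any study discussed here.

```python
def roc_points(diseased, healthy):
    """Empirical ROC: (false positive rate, true positive rate) at every threshold."""
    # Sweep the cut-point from the largest observed value downwards.
    thresholds = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        tpr = sum(x >= c for x in diseased) / len(diseased)  # sensitivity
        fpr = sum(y >= c for y in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores; higher values indicate disease.
separated = roc_points(diseased=[5.1, 6.3, 7.0], healthy=[1.2, 2.4, 3.3])
overlapping = roc_points(diseased=[2.0, 4.0, 6.0], healthy=[1.0, 3.0, 5.0])
```

With the perfectly separated scores the curve passes through (0, 1) and the area equals one; with the overlapping scores the area drops to two thirds, which matches the proportion of diseased/healthy pairs that are correctly ordered.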
In clinical studies, multiple diagnostic tests are often applied to the same sample. In such cases, the diagnostic tests are likely to be correlated. A variety of methods have been proposed to evaluate and compare the performance of such correlated diagnostic tests. Greenhouse and Mantel (1950) and Linnet (1987) compared two sensitivities at a single fixed specificity. McClish (1987) proposed a way to assess the relative diagnostic accuracy of independent ROC curves using the difference between the areas under the curves (AUC). Metz et al. (1984) generalized the statistical comparison of the binormal ROC model (i.e., assuming the data in both the diseased and non-diseased groups follow normal distributions) to the bivariate case for comparing the difference in AUC between correlated ROC curves. DeLong et al. (1988) and Venkatraman and Begg (1996) developed nonparametric methods to compare the areas under two ROC curves. Wieand and colleagues (1989) proposed a more general family of nonparametric statistics to compare weighted averages of sensitivities. Their method can be used to compare diagnostic tests either over a restricted range of specificity or over the entire ROC curve.
However, since different markers usually reflect different aspects of a disease, it is desirable to combine the correlated tests to increase diagnostic accuracy. Assuming that the biomarkers of interest have a multivariate normal distribution in each of the diseased and non-diseased populations, Su and Liu (1993) worked under a linear discriminant analysis framework to separate the two conditions. They showed that the linear combination derived from the discriminant function maximizes the area under the ROC curve. Under the same binormal assumptions, Xiong and colleagues (2004) proposed an approach to construct the optimum linear combination over all possible linear combinations under a ROC analysis framework. Based on the eigenvalue of the optimum linear combination of the diagnostic tests, they presented closed forms for the estimate of the maximum AUC and its variance. Both of these methods derive the optimum linear combination from the area under a ROC curve, an overall summary statistic that measures the distance between the distributions of the diseased and non-diseased populations. In clinical applications, however, a marker's usefulness is generally determined by the specific setting in which it is used. For example, a test with a 20% false positive rate (80% specificity) may be acceptable for cancer prognosis, but that rate is usually too high for cancer screening. Therefore, a question more frequently raised by clinicians is "How much sensitivity (specificity) can be achieved at a given specificity (sensitivity)?"
This article addresses the problem of combining multiple correlated diagnostic tests under a framework similar to that of Xiong et al. (2004). Instead of searching for the linear combination that maximizes the AUC, our method seeks the optimum linear combination for discriminating between the diseased population and the healthy population at a single fixed specificity. More specifically, we consider all possible linear combinations of multiple diagnostic tests and numerically search for the set of coefficients (weights) that maximizes the sensitivity at a given specificity of interest. The standard deviation and 95% confidence interval of the estimate are constructed using a parametric bootstrap approach (Gentle 2002). The method is exemplified with a study on the diagnosis of pancreatic cancer in which two serum markers were measured in 90 patients with pancreatic cancer and 51 patients with pancreatitis. The performance of the method is also evaluated by simulation studies.

Method
We assume that a total of $r$ tests are applied to each subject in both the diseased population and the healthy population. Without loss of generality, we assume that higher values of each test are associated with positive results. Let $D^+$ and $D^-$ denote the diseased (i.e., the positive condition) group and the non-diseased (i.e., the negative condition) group, respectively. Let $X = (X_1, X_2, \ldots, X_r)^t$ ($t$ stands for the transpose) be the values of the $r$ test results for a subject in group $D^+$, and let $Y = (Y_1, Y_2, \ldots, Y_r)^t$ be the values of the $r$ test results for a subject in group $D^-$. We assume that $X$ follows a multivariate normal distribution $MVN_r(\mu^+, \Sigma^+)$ with mean vector $\mu^+ = (\mu_1^+, \ldots, \mu_r^+)^t$ and covariance matrix $\Sigma^+ = (\sigma_{ij}^+)$, and that $Y$ follows $MVN_r(\mu^-, \Sigma^-)$ with mean vector $\mu^- = (\mu_1^-, \ldots, \mu_r^-)^t$ and covariance matrix $\Sigma^- = (\sigma_{ij}^-)$. As mentioned earlier, we assume that $\mu^+ > \mu^-$ in each test. We also assume that $\Sigma^+$ and $\Sigma^-$ are positive definite.
Considering the scenario of a single test (i.e., the $i$-th test), let $(\mu_i^+, \sigma_{ii}^+)$ and $(\mu_i^-, \sigma_{ii}^-)$ denote the means and variances in the diseased and non-diseased groups, respectively. For a given specificity $Q$, the cut-off value $C_\gamma$ in the non-diseased group can be determined as
$$C_\gamma = \mu_i^- + \sqrt{\sigma_{ii}^-}\,\Phi^{-1}(Q).$$
After applying $C_\gamma$ to the diseased group, the corresponding sensitivity will be
$$P(X_i > C_\gamma) = \Phi\left(\frac{\mu_i^+ - C_\gamma}{\sqrt{\sigma_{ii}^+}}\right).$$
Therefore, the binormal ROC model can be written as
$$g(Q) = \Phi\left(\frac{\mu_i^+ - \mu_i^- - \sqrt{\sigma_{ii}^-}\,\Phi^{-1}(Q)}{\sqrt{\sigma_{ii}^+}}\right),$$
where the double-subscripted $\sigma$ represents variances, $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $\Phi^{-1}(\cdot)$ is its inverse, and $Q$ is a given specificity. Note that in our notation $\sigma$ represents variances rather than standard deviations. The above model plays a central role in ROC analysis, similar to the role of the normal distribution in classical statistical modeling, and it has been shown to provide a good approximation to a wide range of ROC curves encountered in practice (Pepe 2003; Hanley 1996).
In the presence of multiple (correlated) tests, we seek a linear combination of the $r$ diagnostic tests such that the sensitivity is maximized over all possible linear combinations when the specificity is fixed at $Q$ (preferably $0.5 < Q < 1$). Let $w = (w_1, w_2, \ldots, w_r)^t$ be a set of weights (coefficients), and let $S = w^t X$ and $T = w^t Y$ be the scores of the linear combination of the $r$ diagnostic tests in the diseased and healthy populations, respectively. The corresponding ROC curve associated with $S$ and $T$ is given by
$$g(Q) = \Phi\left(\frac{w^t(\mu^+ - \mu^-) - \sqrt{w^t \Sigma^- w}\,\Phi^{-1}(Q)}{\sqrt{w^t \Sigma^+ w}}\right).$$
Since the cumulative distribution function $\Phi$ of the standard normal distribution is strictly monotonic, the maximization of $g(Q)$ for a given $Q$ over the choice of $w$ is equivalent to the maximization of
$$f(w) = \frac{w^t(\mu^+ - \mu^-) - \sqrt{w^t \Sigma^- w}\,\Phi^{-1}(Q)}{\sqrt{w^t \Sigma^+ w}}, \qquad (2.2)$$
where $w = (w_1, w_2, \ldots, w_r)^t$ is obtained numerically under the constraint $\sum_{i=1}^{r} w_i^2 = 1$.
Since the distribution of the maximal sensitivity, $g(Q)$, is analytically intractable, the standard deviation and confidence interval of the estimated sensitivity are constructed using a parametric bootstrap approach. Specifically, the means and covariance matrices are estimated from the study sample, and 1000 samples (each with the same number of observations as the original data) are generated from multivariate normal distributions based on these estimates. For each of the generated samples, a maximal sensitivity, $g(Q)^*$, is estimated based on (2.2). The standard deviation and 95% confidence interval of $g(Q)$ can then be obtained from the estimated distribution of $g(Q)^*$ (Gentle 2002).
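To make the maximization concrete, the following Python sketch evaluates the binormal sensitivity of a linear combination at a fixed specificity and, for the special case of two tests, searches the unit circle $w = (\cos a, \sin a)$ on a fine grid. This is only a minimal sketch under stated assumptions: the function names, the grid search, and all parameter values are hypothetical, and the grid search stands in for whatever numerical optimizer is preferred.

```python
from math import cos, sin, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def combined_sensitivity(w, mu_pos, mu_neg, cov_pos, cov_neg, q):
    """Binormal sensitivity of the score w'X at specificity q."""
    idx = range(len(w))
    # Mean difference of the combined score: w'(mu+ - mu-).
    delta = sum(w[i] * (mu_pos[i] - mu_neg[i]) for i in idx)
    # Variances of the combined score in each group: w' Sigma w.
    var_pos = sum(w[i] * cov_pos[i][j] * w[j] for i in idx for j in idx)
    var_neg = sum(w[i] * cov_neg[i][j] * w[j] for i in idx for j in idx)
    z_q = STD_NORMAL.inv_cdf(q)
    return STD_NORMAL.cdf((delta - sqrt(var_neg) * z_q) / sqrt(var_pos))


def best_weights_two_tests(mu_pos, mu_neg, cov_pos, cov_neg, q, steps=3600):
    """Grid search over unit-length weights w = (cos a, sin a) for r = 2."""
    best_w, best_sens = None, -1.0
    for k in range(steps):
        a = 2 * pi * k / steps
        w = (cos(a), sin(a))  # satisfies w1^2 + w2^2 = 1 by construction
        sens = combined_sensitivity(w, mu_pos, mu_neg, cov_pos, cov_neg, q)
        if sens > best_sens:
            best_w, best_sens = w, sens
    return best_w, best_sens


# Hypothetical parameters for two correlated tests (not taken from the paper).
mu_pos, mu_neg = (2.0, 1.0), (0.0, 0.0)
cov_pos = ((1.0, 0.3), (0.3, 1.0))
cov_neg = ((1.0, 0.3), (0.3, 1.0))
w_hat, sens_hat = best_weights_two_tests(mu_pos, mu_neg, cov_pos, cov_neg, q=0.90)
single_1 = combined_sensitivity((1.0, 0.0), mu_pos, mu_neg, cov_pos, cov_neg, 0.90)
single_2 = combined_sensitivity((0.0, 1.0), mu_pos, mu_neg, cov_pos, cov_neg, 0.90)
```

Because each single test corresponds to one particular weight vector on the circle, the combined sensitivity found by the search cannot fall below that of either test alone, up to the resolution of the grid. For more than two tests a grid becomes impractical and a general-purpose optimizer is the more natural choice.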

Application: Biomarkers for the Diagnosis of Pancreatic Cancer
For illustration, we apply the preceding method to real-world data on the diagnosis of pancreatic cancer with two tumor markers (CA19-9 and CA125). CA19-9 is a carbohydrate antigen that tends to be elevated especially in subjects with carcinomas of the gastrointestinal tract, while CA125 is a cancer antigen associated with a variety of malignancies, including those of the breast, cervix, pancreas, and lung. A study conducted at the Mayo Clinic considered 90 "cases" of patients with pancreatic cancer ($D^+$) and 51 "controls" of patients with pancreatitis ($D^-$). Serum CA19-9 and CA125 were measured in each of these patients, and both markers were measured on continuous scales. The data were first presented by Wieand et al. (1989) to compare the relative accuracy of the two biomarkers for the diagnosis of pancreatic cancer. Zhou and others (2002) used the data to illustrate the maximum likelihood method, and more recently Cai and Moskowitz (2004) used the data to illustrate two semi-parametric approaches for fitting ROC models. The objective of our current analysis is to derive the optimum linear combination of the two markers that maximizes the sensitivity over all possible linear combinations at a fixed specificity (90%, say).
Let $X = (X_1, X_2)^t$ be the values of CA19-9 and CA125 for pancreatic cancer patients and $Y = (Y_1, Y_2)^t$ be the marker values for pancreatitis patients. The original distributions of CA19-9 and CA125 are strongly skewed to the right because some of the marker values are extremely large, and thus a logarithmic transformation was applied to both markers to improve normality. Based on the behavior of the majority of the data (Figure 1), we can assume that $\log(X)$ and $\log(Y)$ each have a bivariate normal distribution, $\log(X) \sim MVN(\mu^+, \Sigma^+)$ and $\log(Y) \sim MVN(\mu^-, \Sigma^-)$.
Without considering the possible correlation between the two markers, at a given 90% specificity approximately 78% sensitivity can be achieved with CA19-9 alone, while the maximum sensitivity with CA125 alone is 34%. With our proposed method, the numerical search yields optimum weights of 0.89 for CA19-9 and 0.455 for CA125. At a fixed 90% specificity, the resulting linear combination, $0.89 \times \log(\text{CA19-9}) + 0.455 \times \log(\text{CA125})$, achieves a sensitivity of approximately 80% (SD = 4.0%), with a 95% confidence interval of [72.0%, 87.4%]. The dotted line in Figure 2 corresponds to the 90% specificity of the resultant optimum linear combination, which is the best one over all possible linear combinations $w_1 \times \log(\text{CA19-9}) + w_2 \times \log(\text{CA125})$ such that $w_1^2 + w_2^2 = 1$.
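The standard deviation and confidence interval reported above come from the parametric bootstrap described in the Method section. The Python sketch below outlines that procedure for two markers. It is an illustration under stated assumptions rather than the analysis code: the means, variances and correlations are hypothetical placeholders (not the Mayo Clinic estimates), only the group sizes 90 and 51 are taken from the study, 200 replicates are used instead of 1000 to keep the run short, a grid search over the unit circle stands in for the numerical optimizer, and a percentile interval is one of several possible interval constructions.

```python
import random
from math import cos, sin, pi, sqrt
from statistics import NormalDist, mean, stdev

PHI = NormalDist()  # standard normal distribution


def sample_bivariate(rng, n, mu, var, rho):
    """Draw n pairs from a bivariate normal with given means, variances, correlation."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = mu[0] + sqrt(var[0]) * z1
        x2 = mu[1] + sqrt(var[1]) * (rho * z1 + sqrt(1 - rho * rho) * z2)
        pairs.append((x1, x2))
    return pairs


def estimate(sample):
    """Sample means, variances and correlation (divisor n, as for the MLE)."""
    n = len(sample)
    m = [mean(p[k] for p in sample) for k in (0, 1)]
    v = [sum((p[k] - m[k]) ** 2 for p in sample) / n for k in (0, 1)]
    c = sum((p[0] - m[0]) * (p[1] - m[1]) for p in sample) / n
    return m, v, c / sqrt(v[0] * v[1])


def max_sensitivity(pos, neg, q, steps=360):
    """Largest binormal sensitivity at specificity q over w = (cos a, sin a)."""
    (mp, vp, rp), (mn, vn, rn) = pos, neg
    cp = rp * sqrt(vp[0] * vp[1])  # covariance in the diseased group
    cn = rn * sqrt(vn[0] * vn[1])  # covariance in the non-diseased group
    z_q, best = PHI.inv_cdf(q), 0.0
    for k in range(steps):
        a = 2 * pi * k / steps
        w1, w2 = cos(a), sin(a)
        delta = w1 * (mp[0] - mn[0]) + w2 * (mp[1] - mn[1])
        sd_pos = sqrt(w1 * w1 * vp[0] + 2 * w1 * w2 * cp + w2 * w2 * vp[1])
        sd_neg = sqrt(w1 * w1 * vn[0] + 2 * w1 * w2 * cn + w2 * w2 * vn[1])
        best = max(best, PHI.cdf((delta - sd_neg * z_q) / sd_pos))
    return best


def bootstrap_ci(pos, neg, n_pos, n_neg, q, n_boot=200, seed=1):
    """Bootstrap SD and approximate 95% percentile interval of the maximal sensitivity."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        # Simulate from the fitted normals, re-estimate, then re-maximize.
        boot_pos = estimate(sample_bivariate(rng, n_pos, *pos))
        boot_neg = estimate(sample_bivariate(rng, n_neg, *neg))
        reps.append(max_sensitivity(boot_pos, boot_neg, q))
    reps.sort()
    lower = reps[round(0.025 * n_boot)]
    upper = reps[round(0.975 * n_boot) - 1]
    return stdev(reps), (lower, upper)


# Hypothetical fitted parameters on the log scale: (means, variances, correlation).
pos = ((2.0, 1.0), (1.0, 1.0), 0.3)
neg = ((0.0, 0.0), (1.0, 1.0), 0.3)
sd, (lo, hi) = bootstrap_ci(pos, neg, n_pos=90, n_neg=51, q=0.90)
```

Each replicate repeats the whole estimation, including the search for the weights, so the resulting spread reflects the uncertainty in the weights as well as in the means and covariances.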

Simulation Studies
Simulation studies are designed to evaluate the performance of the proposed method in the presence of multiple correlated diagnostic tests. In practice, the true mean and the true covariance matrix of a vector of multivariate diagnostic tests are rarely known, and the best linear combination has to be derived from the estimated means and covariance matrices. Therefore, it is important to assess how the sample size and the inter-marker correlation affect the performance of the estimated optimum combination. The simulation assumes 3 correlated diagnostic tests. These diagnostic tests in the diseased ($D^+$) group are assumed to have a 3-dimensional normal distribution $MVN(\mu^+, \Sigma^+)$ with variance vector $(\sigma_1^+, \sigma_2^+, \sigma_3^+) = (3, 2, 1)$. The tests in the healthy ($D^-$) group are also assumed to have a 3-dimensional normal distribution, $MVN(\mu^-, \Sigma^-)$, with variance vector $(\sigma_1^-, \sigma_2^-, \sigma_3^-) = (6, 2, 4)$. For simplicity, we consider a common correlation parameter ($\rho = \rho^+ = \rho^-$) in our simulation and let $\rho$ take 3 values ($\rho$ = 0.2, 0.5, 0.8). We also assume that the diseased and healthy groups have equal sample sizes, and 4 sample sizes ($N$ = 25, 50, 100 and 200) are considered for each group. For each selected sample size, 1000 random samples are generated from $MVN_3(\mu^+, \Sigma^+)$ and $MVN_3(\mu^-, \Sigma^-)$ at a given $\rho$. The simulation was implemented in the statistical package S-Plus (version 6.2). The random samples were generated with the function RMVNORM (the random generation function for the multivariate normal distribution), while the optimum weights (coefficients) $w = (w_1, w_2, w_3)^t$ for the linear combinations were searched numerically with the function NLMINB (the function for nonlinear minimization subject to box constraints). To satisfy the constraint $\sum_{i=1}^{3} w_i^2 = 1$, the actual minimization was performed on the unconstrained parameters $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_r)^t$ such that $w_i = \gamma_i / \sqrt{\sum_{j=1}^{r} \gamma_j^2}$. In the simulations, we evaluated the performance of the method at $Q$ = 80% and 90%, two specificities that are usually of most interest to clinicians.
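The reparameterization through unconstrained parameters can be sketched in Python as follows. This is a hedged illustration, not the S-Plus code: the variance vectors (3, 2, 1) and (6, 2, 4) and the common correlation follow the simulation design, whereas the mean vectors are hypothetical placeholders, all function names are invented for the example, and a crude random search stands in for NLMINB.

```python
import random
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def normalize(gamma):
    """Map unconstrained gamma to weights w with sum(w_i^2) = 1."""
    norm = sqrt(sum(g * g for g in gamma))
    return [g / norm for g in gamma]


def common_corr_cov(variances, rho):
    """Covariance matrix with the given variances and one common correlation rho."""
    r = len(variances)
    return [[variances[i] if i == j else rho * sqrt(variances[i] * variances[j])
             for j in range(r)] for i in range(r)]


def sensitivity(w, mu_pos, mu_neg, cov_pos, cov_neg, q):
    """Binormal sensitivity of the score w'X at specificity q."""
    idx = range(len(w))
    delta = sum(w[i] * (mu_pos[i] - mu_neg[i]) for i in idx)
    var_pos = sum(w[i] * cov_pos[i][j] * w[j] for i in idx for j in idx)
    var_neg = sum(w[i] * cov_neg[i][j] * w[j] for i in idx for j in idx)
    return PHI.cdf((delta - sqrt(var_neg) * PHI.inv_cdf(q)) / sqrt(var_pos))


def random_search(objective, dim, iters=5000, seed=0):
    """Stand-in optimizer: perturb gamma, keep a candidate only if it improves."""
    rng = random.Random(seed)
    gamma = [1.0] * dim
    best = objective(normalize(gamma))
    step = 1.0
    for _ in range(iters):
        cand = [g + rng.gauss(0, step) for g in gamma]
        val = objective(normalize(cand))
        if val > best:
            gamma, best = normalize(cand), val
        else:
            step = max(step * 0.999, 1e-3)  # slowly shrink the step after failures
    return normalize(gamma), best


# Variances and common correlation as in the simulation design; means are hypothetical.
mu_pos, mu_neg = (2.0, 2.0, 2.0), (0.0, 0.0, 0.0)
rho, q = 0.5, 0.90
cov_pos = common_corr_cov((3.0, 2.0, 1.0), rho)
cov_neg = common_corr_cov((6.0, 2.0, 4.0), rho)


def objective(w):
    return sensitivity(w, mu_pos, mu_neg, cov_pos, cov_neg, q)


w_hat, sens_hat = random_search(objective, dim=3)
```

The reparameterization works because the objective is unchanged when the weights are multiplied by any positive constant, so the unit-length constraint only fixes the scale of $w$ and does not restrict the set of attainable sensitivities.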
Assuming that all the mean vectors and variance-covariance matrices in both the diseased and healthy populations are known, Table 1 shows the optimum weights and the expected maximum sensitivities at different combinations of $\rho$ and $Q$. These optimum weights produce the best linear combination, that is, the one giving the maximum sensitivity over all possible linear combinations of the 3 diagnostic tests. The results show that the optimum weights (and thus the maximum sensitivity) are a function of the inter-marker correlation. When there is a weak correlation among the 3 biomarkers ($\rho$ = 0.2) at a given $Q$ = 90%, for example, the optimum weights $\hat{w} = (-0.079, 0.820, 0.567)^t$ give the best linear combination as $S = \sum_{i=1}^{3} \hat{w}_i X_i$ in the diseased sample and $T = \sum_{i=1}^{3} \hat{w}_i Y_i$ in the healthy sample. A maximum sensitivity of 52% can then be achieved based on the scores $S$ and $T$. In contrast, a different set of optimum weights, $\hat{w} = (-0.589, 0.521, 0.617)^t$, is obtained in the presence of a strong correlation ($\rho$ = 0.8) among the 3 tests, and the resultant combined test yields a maximum sensitivity of 72% at the fixed $Q$ = 90%. The optimum weights in Table 1 are consistent in sign with the expected weights $\hat{w} = (-0.3845, 0.6767, 0.1692)$ reported by Xiong et al. (2004), who used a parameter setup similar to ours but searched for the optimum combination maximizing the area under the ROC curve (AUC). Our simulations show that the optimum weights are also a function of the specificity ($Q$). In the presence of a weak inter-marker correlation ($\rho$ = 0.2), for example, the optimum coefficients are $\hat{w} = (-0.080, 0.626, 0.776)^t$ for $Q$ = 80% and $\hat{w} = (-0.079, 0.820, 0.567)^t$ for $Q$ = 90%, and the corresponding linear combinations result in maximum sensitivities of 75% and 52%, respectively. Our finding is consistent with the work of Anderson and Bahadur (1962), who showed that generally there is no unique linear combination superior to all others over the entire range of specificities (sensitivities).
In real-world applications, the true mean and the true covariance matrix of a vector of multivariate diagnostic tests are rarely known, and the best linear combination of the diagnostic tests has to be derived from the estimated means and covariance matrices. Table 2 presents the averages of the estimated maximum sensitivity and its standard deviation based on 1000 random samples. The results show that the estimated maximum sensitivity moves closer to the expected value as the sample size increases, and that an accurate estimate can be achieved even with a relatively small sample size. The last column in Table 2 shows the empirical coverage probabilities of the 95% confidence interval (CI). Although the empirical coverage probabilities tend to be lower than the nominal 95% when sample sizes are relatively small, the estimated confidence intervals perform very well for moderate to large sample sizes.
Table 2: The averages of the estimated maximum sensitivity, the average of the estimated standard deviation (SD), and the empirical coverage of 95% confidence intervals based on 1000 random samples, where ρ represents intertest correlation and N is the sample size in each group.

Discussion
In this paper, we proposed an approach to estimate the maximum sensitivity at a fixed specificity in the presence of multiple correlated diagnostic tests. Although we focused on seeking the maximum sensitivity at a fixed specificity, the extension to obtaining the maximum specificity at a given sensitivity is straightforward. Assuming multivariate normal distributions for the diagnostic tests in both the diseased and healthy populations, the optimum linear combination is searched numerically over all possible linear combinations under a binormal ROC setting. The method is exemplified with real-world data on the diagnosis of pancreatic cancer. The performance of the method is also assessed with simulation studies. The results show that the proposed method can provide an accurate point estimate of the expected maximum sensitivity even with a relatively small sample size. The performance of the estimated confidence interval is also evaluated in terms of attaining the nominal 95% coverage, based on the empirical coverage probability in the simulation study. The results show that better coverage is obtained with moderate to large sample sizes.
The means and covariances of most populations are unknown in practice, and the corresponding maximum likelihood estimates (MLEs) from the observed samples are frequently used. It is important to point out that the results of this work depend on the assumption of multivariate normality for the multiple diagnostic tests. In addition, our maximization process is based on the MLEs of the first two moments rather than on individual measurements, so the proposed method may be relatively more sensitive to the normality assumption. In cases where real-world data do not satisfy this assumption, some transformation may be necessary to improve normality, and the proposed method can then be applied to the transformed data. When the normality assumption for $X$ and $Y$ fails, there will in general be some degradation in the performance of our method, similar to that seen in classical binormal ROC curve modeling. Note that the weights are dimensionless and thus are more appropriate for diagnostic tests with similar units. Otherwise, some data preparation (such as transformation or normalization) is needed to reduce the dissimilarity among values from different tests. It should also be pointed out that, as explained by Anderson and Bahadur (1962), the method for identifying the optimum linear combination at a fixed specificity (sensitivity) may become problematic when the specificity (sensitivity) is extremely large. In such a case, we work at a location of the normal distribution that is far from its center, and thus the variance will dominate the estimation procedure.

Figure 2: Scatter plot of log(CA19-9) versus log(CA125) in 51 pancreatitis patients ($D^-$) and 90 pancreatic cancer patients ($D^+$), where the dotted line corresponds to the optimum linear combination that maximizes sensitivity at a fixed 90% specificity.

Table 1: The optimum weights and the expected maximum sensitivity at a fixed specificity (Q) when the means and variance-covariance matrices of the diagnostic tests are known, where ρ represents the inter-test correlation.