Pairwise Comparison in Repeated Scores-Application in Palliative Cancer Patients

In clinical studies, subjects or patients may be exposed to a succession of diagnostic tests or medications over time, and interest lies in determining whether there is progressive remission of conditions, disease or symptoms that have been measured collectively as quality of life or outcome scores. In addition, subjects or study participants may be required, perhaps early in an experiment, to improve significantly in their performance rates at the current trial relative to the immediately preceding trial; otherwise withdrawal or dropping out is inevitable. The common research interest is then to determine a critical minimum marginal success rate to guide management in deciding whether to implement certain policies. Success rates lower than the minimum expected value would indicate a need for remedial action. In this article, a method of estimating these rates is proposed, assuming the requirement applies at the second trial of any particular study. Pairwise comparisons of proportions of success or failure by subjects are considered in the repeated outcome measure situation to determine which subjects or combinations are responsible for the rejection of the null hypothesis. The proposed method is illustrated with a dataset on palliative care outcome scores (POS) of cancer patients.


Introduction
Research in many areas frequently involves study designs in which repeated measurements are obtained. Studies in which the response variable is measured at multiple points in time from each subject are one important and commonly used application. In other applications, the response from each experimental unit is measured under multiple conditions rather than at multiple time points. In some settings in which repeated measurements data are obtained, the independent experimental units are not individual subjects. For example, in a toxicological study, the experimental units might be litters; responses are then obtained from the multiple newborns in each litter. In a genetic study, experimental units might be defined by families; responses are then obtained from the members of each family (for more illustration, see Davis, 2002).
Researchers often collect multiple observations from many individuals. For example, in research examining the relationship between stress and mood, a research participant may complete measures of both these variables every day for several weeks, and so daily measures are grouped within participants (for details, see M Ataharul Islam et al., 2009). In relationship research, a respondent may report on characteristics of his or her interactions with a number of different friends. In developmental research, individuals may be measured at many different times as they develop. In cognition research, reaction times may be observed for multiple stimuli.
In each of the above-cited examples, the interest may be in determining whether the subjects improved their performances or chances of success over the set of conditions during the study or experimental period. In other words, researchers may be interested in testing whether the proportions of positive responses are the same or different over a set of conditions. If the hypothesis of no improvement is rejected, there might exist some improvements in performance or increases in the proportions of positive responses; one may then search further to examine statistically any observed patterns in these increases, with a view to ascertaining which of the conditions or their combinations might have led to the rejection of the null hypothesis. Often the interest in these situations is in determining whether the subjects, on average, successively improve their performance rather than in multiple comparisons of all the conditions. Therefore, research interest may be in pairwise comparisons of proportions of success or failure by subjects or candidates in a consecutive series of experiments or trials over time or space. In the literature, many nonparametric methods exist for answering these questions, usually based on rank ordering the observations for each subject or candidate across the treatment conditions and then applying any of the nonparametric methods used in analyzing ordered data (for more insights, see Conover, 1980; Kempthorne, 1979; Prentice, 1978; Page, 1963; Sen and Puri, 1967).
In diseases such as cancer, repeatedly recorded data are commonly used to examine disease progression or to understand treatment responses. Patients diagnosed with advanced disease mostly undergo palliative care, where the main concern is to provide a better quality of life at each stage of care. Therefore, it is necessary to understand the performance of a randomly selected patient at any time point, so that a decision can be taken on whether or not to proceed with the current status of care.
In this article, we propose a method of pairwise comparisons in repeated measures that is suitable when interest is not only in testing whether the null hypothesis of no difference is rejected or accepted but also, if the null hypothesis is rejected, in identifying which individual subjects or their combinations actually contributed to the rejection. Section 2 details this method. The hypotheses of research interest and the odds of better performance are discussed in Sections 3 and 4, respectively. Finally, in Section 5, a dataset of advanced cancer patients in whom serial measurements of palliative care outcome scores (POS) have been obtained is analyzed to illustrate the proposed method.

The Method
Let us assume that there are $k$ independently drawn subjects involved in a study. Each of them is observed at regular time intervals, and performance scores are obtained at each time period, location or treatment condition (collectively termed a "visit" hereafter).
Let $y_{ij}$ ($i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, v$) denote the score earned by the $i$th subject at the $j$th visit. Define

$$u_{ij} = \begin{cases} 1, & \text{if } y_{ij} < y_{i(j-1)} \\ 0, & \text{otherwise} \end{cases} \qquad j = 2, 3, \ldots, v. \tag{1}$$

Here, we assign the $i$th subject a numerical score of 1 if the subject's outcome score at the current visit is lower than that at the immediately preceding visit; otherwise a numerical score of 0 is assigned. Note that the lower the outcome score, the better the condition of the subject. Hence, at any visit, if the outcome score is less than that of the immediately preceding visit, the subject is in a better condition, which is counted as a success.
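The coding rule just described can be sketched in a few lines of Python. This is a minimal illustration only: the function name `code_improvements` and the example scores are hypothetical and are not taken from the study data.

```python
from typing import List, Sequence


def code_improvements(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """Code each subject's scores as 1/0 for visits j = 2, ..., v.

    A 1 means the score at the current visit is strictly lower (better)
    than the score at the immediately preceding visit; otherwise 0.
    """
    coded = []
    for row in scores:
        # Pair each visit with its immediate predecessor.
        coded.append([1 if cur < prev else 0 for prev, cur in zip(row, row[1:])])
    return coded


# Hypothetical scores for three subjects over three visits (not study data).
example_scores = [
    [20, 15, 12],
    [18, 18, 10],
    [25, 27, 27],
]
coded_example = code_improvements(example_scores)
print(coded_example)  # [[1, 1], [0, 1], [0, 0]]
```

A tie between consecutive visits is coded 0 here, since the rule assigns 1 only when the current score is strictly lower.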
The raw outcome measurement data are arranged as shown in Table 1. Let

$$t_j = \sum_{i=1}^{k} u_{ij}, \qquad j = 2, 3, \ldots, v.$$

Note that $t_j$ is the number of 1's (successes, in the sense of Bernoulli trials) scored by the subjects at the current visit relative to the immediately preceding visit. The corresponding number of 0's (failures) is $(k - t_j)$. Let

$$t = \sum_{j=2}^{v} t_j$$

be the total number of successes (1's) over all the $v$ visits, and let $k(v-1) - t$ be the total number of failures (0's) over all the $v$ visits. Here $u_{ij} \sim \text{Bernoulli}(1, \pi_j)$, so that $E(u_{ij}) = \pi_j$ and $\text{Var}(u_{ij}) = \pi_j(1 - \pi_j)$. Note that the $t_j$'s are binomial random variates with parameters $k$ and $\pi_j$. Therefore, $E(t_j) = k\pi_j$ and $\text{Var}(t_j) = k\pi_j(1 - \pi_j)$. It is to be noted that $\pi_j$ represents the proportion of successes at the current $j$th visit relative to the immediately preceding $(j-1)$th visit. $\pi_j$ can be estimated as

$$p_j = \frac{t_j}{k}.$$

In particular, if the proportions of successes are the same for all the $v$ visits, then the common proportion of success is estimated as

$$p = \frac{t}{k(v-1)}, \qquad \text{where } t = \sum_{j=2}^{v} t_j.$$

The above results can be summarized in a $2 \times (v-1)$ table, as shown in Table 2.
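As a small illustration of these summaries, the sketch below computes the per-visit success counts, the per-visit proportions and the common proportion from a matrix of 1/0 codes. The function name `success_summary` and the coded matrix are hypothetical, chosen only to show the arithmetic.

```python
from typing import List, Sequence, Tuple


def success_summary(coded: Sequence[Sequence[int]]) -> Tuple[List[int], List[float], float]:
    """Return (t_j list, p_j list, pooled p) from a k x (v-1) matrix of 1/0 codes."""
    k = len(coded)                 # number of subjects
    n_comparisons = len(coded[0])  # v - 1 consecutive-visit comparisons
    # t_j: number of successes in each comparison (column sums).
    t = [sum(row[j] for row in coded) for j in range(n_comparisons)]
    # p_j = t_j / k estimates the success proportion for each comparison.
    p_j = [t_val / k for t_val in t]
    # Pooled estimate p = t / (k (v - 1)) under equal proportions.
    p = sum(t) / (k * n_comparisons)
    return t, p_j, p


# Hypothetical coded data for four subjects and three visits (not study data).
coded_example = [[1, 1], [0, 1], [1, 0], [1, 1]]
t, p_j, p = success_summary(coded_example)
print(t, p_j, p)  # [3, 3] [0.75, 0.75] 0.75
```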

Hypotheses of Interest
Now suppose we want to test the null hypothesis ($H_0$) that the proportions of successes are equal for all the visits against a two-sided alternative. The expected numbers of successes (1's) and failures (0's) are, respectively,

$$E_{1j} = kp \qquad \text{and} \qquad E_{2j} = k(1 - p)$$

for $j = 2, 3, \ldots, v$.
Hence, under $H_0$, the test statistic is given by

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=2}^{v} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{5}$$

Note that the corresponding observed frequencies are $O_{1j} = t_j$ and $O_{2j} = (k - t_j)$, for $j = 2, 3, \ldots, v$.
The above statistic has approximately a chi-square distribution with $(v - 1)$ degrees of freedom (d.f.).
This can be used to test the hypothesis of no difference in success rates. On simplification, from (5) we get

$$\chi^2 = \sum_{j=2}^{v} \frac{(t_j - kp)^2}{kp(1 - p)}.$$

An alternative expression in terms of proportions is

$$\chi^2 = \frac{k \sum_{j=2}^{v} (p_j - p)^2}{p(1 - p)} \tag{6}$$

which can be used to test the equality of the proportions of success for adequate combinations of $k$ and $v$.
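The simplification can be written out step by step. The derivation below is a sketch that assumes the expected frequencies under the null hypothesis are $kp$ for successes and $k(1-p)$ for failures, with observed frequencies $t_j$ and $k - t_j$, and that $p_j = t_j/k$.

```latex
\begin{align*}
\chi^2
  &= \sum_{j=2}^{v}\left[\frac{(t_j - kp)^2}{kp}
     + \frac{\{(k - t_j) - k(1 - p)\}^2}{k(1 - p)}\right] \\
  &= \sum_{j=2}^{v}(t_j - kp)^2\left[\frac{1}{kp} + \frac{1}{k(1 - p)}\right]
     \qquad \text{since } (k - t_j) - k(1 - p) = -(t_j - kp) \\
  &= \sum_{j=2}^{v}\frac{(t_j - kp)^2}{kp(1 - p)} \\
  &= \frac{k\sum_{j=2}^{v}(p_j - p)^2}{p(1 - p)}
     \qquad \text{since } t_j = k p_j .
\end{align*}
```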
The null hypothesis ($H_0$) is rejected at the $100\alpha\%$ level of significance in favor of the alternative if the observed $\chi^2 \ge \chi^2_{\alpha;(v-1)}$, and is accepted otherwise. Here $\chi^2_{\alpha;(v-1)}$ is the point of the chi-square distribution with $(v - 1)$ d.f. such that the area to its right is $\alpha$.
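The overall test can be sketched as follows, assuming the statistic takes the count form $\sum_j (t_j - kp)^2 / \{kp(1-p)\}$. The function names and the counts are hypothetical, and the critical value is passed in by the caller (5.991 is the tabulated 5% point for 2 d.f.).

```python
from typing import Sequence


def overall_chi_square(t: Sequence[int], k: int) -> float:
    """Chi-square statistic for equal success proportions across comparisons.

    t holds the success counts t_j (j = 2, ..., v); k is the number of subjects.
    """
    n_comparisons = len(t)
    p = sum(t) / (k * n_comparisons)  # pooled proportion of success
    if p <= 0.0 or p >= 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    return sum((t_j - k * p) ** 2 for t_j in t) / (k * p * (1.0 - p))


def reject_h0(statistic: float, critical_value: float) -> bool:
    """Decision rule: reject H0 when the statistic reaches the critical value."""
    return statistic >= critical_value


# Hypothetical counts for k = 50 subjects and v = 3 visits (not study data).
t_example = [40, 25]
k_example = 50
stat = overall_chi_square(t_example, k_example)
decision = reject_h0(stat, critical_value=5.991)
print(round(stat, 3), decision)
```

Passing the critical value as an argument keeps the sketch free of any statistical library; a table lookup or a library quantile function could supply it instead.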
If $H_0$ is rejected, one may wish to investigate further, and an obvious question is which visits led to the rejection of the hypothesis that the proportions of successes are equal for all the visits. To be more specific, one may be interested in testing whether or not the subjects are successively improving their performances over the visits.
To answer such questions, we need to formulate the necessary hypotheses once $H_0$ is not accepted. Let $\pi_r$ and $\pi_s$ be the population proportions of positive responses (successes) at the $r$th and $s$th visits respectively, for $r, s = 2, 3, \ldots, v$ and $r \ne s$. The sample estimates of $\pi_r$ and $\pi_s$ are given by $p_r = t_r / k$ and $p_s = t_s / k$, respectively.
It is worth mentioning here that $\pi_r$ and $\pi_s$, respectively, measure the percentage increases in performance of the subjects, in the population sense, at the $r$th and $s$th visits relative to their performances at the $(r-1)$th and $(s-1)$th visits. We may be interested in testing either that
i. relative improvement rates differ by some constant, or
ii. there is no relative improvement.
Using standard notation, we may wish to test either of the hypotheses

$$H: \pi_r - \pi_s \ge \pi_0 \ \text{(a constant)} \quad \text{against} \quad K: \pi_r - \pi_s < \pi_0 \tag{7}$$

$$H: \pi_j \ge \pi_0 \quad \text{against} \quad K: \pi_j < \pi_0, \qquad j = 2, 3, \ldots, v \tag{8}$$

To test the null hypothesis given in (7), the sample estimate of $(\pi_r - \pi_s)$ is given by $(p_r - p_s)$. Under the null hypothesis of no difference between the population proportions of success, the overall estimate of $\pi_j$, $\hat{p}_j$, is $p$ as given earlier. Hence the variance of $p_j$ can be estimated as

$$\widehat{\text{Var}}(p_j) = \frac{p(1 - p)}{k}.$$

Hence the test statistic for testing $H$ given in (7) reduces to

$$\chi^2 = \frac{(t_r - t_s - k\pi_0)^2}{2kp(1 - p)}.$$

Equivalently, in terms of proportions, for testing $H$ in (7), we have

$$\chi^2 = \frac{k\,(p_r - p_s - \pi_0)^2}{2p(1 - p)} \tag{11}$$

which approximately follows a chi-square distribution with 1 degree of freedom.
The test statistic obtained in (11) can be used to test the null hypothesis that the proportion of successes at the $r$th visit is higher than the corresponding proportion at the $s$th visit by at least some pre-assigned constant value, $\pi_0$.

Note: The observed value of the statistic obtained using (11) may be compared with an appropriately chosen critical value of the chi-square distribution with one degree of freedom at a specified significance level $\alpha$. However, it is suggested that all comparisons be made against critical chi-square values with $(v - 1)$ degrees of freedom and a specified $\alpha$ in order to control the type-I error.

Now, to test the other hypothesis, as mentioned in (8), the test statistic is

$$\chi^2 = \frac{(p_j - \pi_0)^2}{\widehat{\text{Var}}(p_j)} \tag{12}$$

which, again, approximately follows a chi-square distribution with 1 degree of freedom. Note that the null hypothesis in (8) states that the success rate at the $j$th visit is greater than the corresponding success rate at the $(j-1)$th visit by at least some constant, $\pi_0$. So, the test statistic obtained in (12) can similarly be used to test the hypothesis that the proportion of successes at the current $j$th visit is at least equal to the corresponding proportion of successes at the immediately preceding visit, viz. the $(j-1)$th.
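A pairwise comparison of two visits can be sketched as below. The form of the statistic is an assumption reconstructed from the definitions in this section: the variance of $(p_r - p_s)$ is taken as the sum of two variances, each estimated by $p(1-p)/k$ with the pooled proportion $p$. The function name and the input values are hypothetical.

```python
def pairwise_chi_square(t_r: int, t_s: int, k: int, p_pooled: float,
                        pi_0: float = 0.0) -> float:
    """Squared standardized difference between success proportions of two visits.

    Assumes Var(p_r - p_s) = Var(p_r) + Var(p_s), with each variance
    estimated by p_pooled * (1 - p_pooled) / k.
    """
    if not 0.0 < p_pooled < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    p_r = t_r / k
    p_s = t_s / k
    var_diff = 2.0 * p_pooled * (1.0 - p_pooled) / k
    return (p_r - p_s - pi_0) ** 2 / var_diff


# Hypothetical counts for k = 50 subjects (not study data).
pair_stat = pairwise_chi_square(t_r=40, t_s=25, k=50, p_pooled=0.65)
# Following the note above, compare against the (v - 1) d.f. critical value;
# 5.991 is the 5% point for 2 d.f.
is_significant = pair_stat >= 5.991
print(round(pair_stat, 3), is_significant)
```

With only three visits there is a single pair of comparisons, so this statistic coincides with the overall one computed from the same counts.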

Odds of Better Performance
If, early in the study or experiment, the subjects are required to improve significantly on their success rates, the null hypothesis of no difference between two consecutive visits must not be accepted. For example, if the success rate at the second visit is expected to improve over that of the first visit, the null hypothesis of no difference between these two visits must be rejected. Hence, for a given sample size $k$ and level of significance $\alpha$, the test statistic given in (11) must be such that

$$\chi^2 \ge \chi^2_{\alpha;(v-1)},$$

which, for the second visit relative to the first, gives

$$\frac{p_2}{1 - p_2} \ge \frac{\chi^2_{\alpha;(v-1)}}{k} \tag{14}$$

The inequality obtained in (14) provides a lower bound of an estimate of the odds that a randomly selected subject performs significantly better at the second visit than at the first one. From this we can also find a lower bound of the probability, $p_2$, that a randomly selected subject in some experiment or study performs significantly better (i.e., significantly improves his/her performance) at the second visit relative to the first visit as

$$p_2 \ge \frac{\chi^2_{\alpha;(v-1)}}{k + \chi^2_{\alpha;(v-1)}} \tag{15}$$
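The step from a lower bound on the odds to a lower bound on the probability is short algebra. The sketch below assumes the odds inequality has the form $p_2/(1-p_2) \ge c/k$, where $c$ denotes the chosen chi-square critical value, $k > 0$ is the number of subjects and $0 < p_2 < 1$; this form is an assumption, not a quotation of the source equations.

```latex
\begin{align*}
\frac{p_2}{1 - p_2} \ge \frac{c}{k}
  &\iff k\,p_2 \ge c\,(1 - p_2) \\
  &\iff p_2\,(k + c) \ge c \\
  &\iff p_2 \ge \frac{c}{k + c},
  \qquad c = \chi^2_{\alpha;(v-1)} .
\end{align*}
```

The bound shrinks as $k$ grows, so larger studies require a smaller minimum success rate to reach significance.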

Application: Palliative Care Outcome Scores (POS) in Cancer Patients
Before proceeding to the data analysis and illustration of the proposed method, we briefly describe the application area and the data considered in this section. Patient-reported outcome (PRO) measures are widely used in health research to describe patient populations or to assess the effectiveness of interventions, but they are not, as yet, always incorporated into routine clinical practice. However, with the increasing focus on patient autonomy, equitable service delivery and transparent information compelling service providers, healthcare commissioners and funders to demonstrate effectiveness and value for money, PRO measurement is becoming a more important procedure to consider. These outcomes can be measured using a variety of tools, for example the palliative care outcome score (POS). The POS has been shown to be a credible clinical, research and audit tool that is acceptable to both patients and staff (for more details, see Hearn and Higginson, 1999). Individual POS question items are all valid, and the POS is equally valid when used as a summary scale. The POS is sensitive to change over time. Notably, it is responsive to changes in patients' conditions and generates different results accordingly (see Siegert et al., 2010). The POS has extensive applications in the palliative care of advanced cancer patients for assessing their quality of life (cf. Hearn and Higginson, 1997; Stromgren et al., 2002; Stevens et al., 2005). It is also an important tool for assessing and controlling pain and allied symptoms of advanced cancer patients under home-based palliative care, with the aim of providing better care and improving quality of life (for more details, see Harding et al., 2003).
The data from a study on the home-based palliative care service provided by Malabar Cancer Centre, Thalassery, India (CTRI number: CTRI/2014/03/004477) are considered here to illustrate the proposed methods described in the previous sections. We considered the POS of 100 cancer patients (scores ranging from 0 to 40) recorded during 3 consecutive home visits (from October 2010 to December 2013). The data set utilized here was drawn randomly from the data on 108 subjects (see Biji M S et al., 2014; 2015) under the same study. A part of the data set (subjects from serial no. 33 to serial no. 45) is shown in Table 3. The research interest is to determine whether or not patients in the study progressively improved their performances as captured by their POS. To answer this question, we apply the method given in (1) to code the data shown in Table 3 with 1's and 0's. Table 4 shows the coded data. Comparing the observed value of the test statistic (6.664) with the tabulated value of the chi-square distribution with $(3 - 1) = 2$ degrees of freedom at the 5% level of significance ($\chi^2_{0.05;2} = 5.991$), we cannot accept the null hypothesis, and hence the test is statistically significant. This means that the patients' performance seems to differ from visit to visit. The corresponding p-value can be calculated as $\text{Prob}\{\chi^2 > 6.664 \mid H_0 \text{ is true}\}$, which is also less than the conventional 0.05. We now compare the visits in terms of the success rates achieved to determine which visit might be responsible for the rejection of $H_0$. For example, one may be interested in comparing visit 3 with visit 2 to determine whether there is any significant difference in the relative success rates for these two visits or, in other words, in testing the null hypothesis mentioned in (7) with $\pi_0 \ge 0$.
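The p-value mentioned above can be checked without tables, because the upper-tail probability of a chi-square variate with 2 degrees of freedom has the closed form $\exp(-x/2)$. The following minimal Python sketch uses only the observed value 6.664 quoted in the text; the function name is illustrative.

```python
import math


def chi_square_sf_2df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variate with 2 d.f.

    For 2 degrees of freedom the distribution is exponential with mean 2,
    so the tail probability is exactly exp(-x / 2).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.exp(-x / 2.0)


observed = 6.664  # observed statistic quoted in the text
p_value = chi_square_sf_2df(observed)
print(round(p_value, 4))  # 0.0357
```

This closed form holds only for 2 degrees of freedom; other degrees of freedom need tables or a statistical library.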

Conclusion
The proposed method of pairwise comparison of repeated measures is suitable when interest is not only in accepting or rejecting the hypothesis of no difference but also in answering more specific questions, such as whether successive improvement is significant. The procedures described in this article do not need any sophisticated statistical software for application in real-life situations and hence are user friendly in field applications. Moreover, at each stage of data collection, the performance of a randomly selected study participant can easily be understood, and a decision can be taken on whether or not to proceed with the study. Further, the method can easily be extended to a situation where a given minimum difference in scores is considered as the threshold for defining the success of a given intervention.
$$Z = \frac{(p_r - p_s) - \pi_0}{\sqrt{\text{Var}(p_r - p_s)}}$$

where $\text{Var}(p_r - p_s) = \text{Var}(p_r) + \text{Var}(p_s)$, as $\text{Cov}(p_r, p_s) = 0$. So, under $H$,

$$Z = \frac{(p_r - p_s) - \pi_0}{\sqrt{\text{Var}(p_r) + \text{Var}(p_s)}}$$

is a standard normal variate. Hence $Z^2$ has approximately a chi-square distribution with 1 degree of freedom.

Table 2: Table for analyzing repeated measures.

Table 3: POS data of home based palliative care

Table 4: Data coded as per the method described

Table 5: Patterns of 1's and 0's for the coded data

The comparison of visit 3 with visit 2 is statistically significant at the 5% level, and hence we can conclude that there is a significant difference in the relative success rates for the two successive visits. Similar comparisons can be made for any other pairs of visits if there are more visits, i.e., if the POS had been recorded more frequently. In comparing visit 2 with visit 1, we considered the null hypothesis mentioned in (8) that there is no relative improvement at visit 2 relative to visit 1. Using (12), the result is again statistically significant at the 5% level. The above results indicate that patients improved their performances significantly from the first to the second visit and from the second to the third visit as well. Now, if it is required that patients achieve some minimum critical score at the end of the second visit relative to the first one, then for $\alpha = 0.05$ the required minimum score obtained using (15) is much lower than the average success rate of 98% achieved at the second visit relative to the first visit. Moreover, for a small sample, say $k = 30$, and $\alpha = 0.05$, a patient in the study or program of interest would have to earn at least a 16.6% better outcome score at the second visit relative to the first to be able to continue with the study or program.
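The minimum rates discussed above can be checked numerically. The sketch below assumes that the lower bound has the form $c/(k + c)$, where $c$ is the chi-square critical value with $(v - 1) = 2$ degrees of freedom (5.991 at the 5% level); this assumed form reproduces the 16.6% figure quoted for $k = 30$. The function name is hypothetical.

```python
def minimum_success_rate(k: int, critical_value: float) -> float:
    """Assumed lower bound c / (k + c) on the second-visit success proportion."""
    if k <= 0 or critical_value <= 0:
        raise ValueError("k and critical_value must be positive")
    return critical_value / (k + critical_value)


CRITICAL_2DF_5PCT = 5.991  # tabulated chi-square point for 2 d.f. at the 5% level

rate_k100 = minimum_success_rate(100, CRITICAL_2DF_5PCT)
rate_k30 = minimum_success_rate(30, CRITICAL_2DF_5PCT)
print(round(rate_k100, 3), round(rate_k30, 3))  # 0.057 0.166
```

Under this assumption the bound for 100 patients is far below the 98% success rate reported for the second visit, in line with the conclusion drawn in the text.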