New Change-Point Rank Tests

New rank-based test statistics are proposed for the problem of a possible change in the distribution of independent observations. We extend the two-sample test statistic of Damico (2004) to the change-point setup. The finite-sample critical values of the proposed tests are estimated. We also conduct a Monte Carlo simulation to compare the power of the new tests with that of their competitors. Using the Nile data of Cobb (1978), we demonstrate the applicability of the new tests.


Introduction
Statistical inference typically relies on a collection of observations that share some common stochastic properties. However, the characteristics of the observations may vary over the observational domain. The analysis of such a change is important in many scientific fields. For example, in quality control, it is of interest to check whether the production process remains constant during the whole period of time or changes over time. Other examples include structural change in economics, the incidence of a disease in epidemiology, and the study of archaeological sites in archaeology. In the statistical literature such problems are called "change point" problems.
Suppose that $X_1, \ldots, X_n$ is a sequence of independent random variables with distribution functions $F_1, \ldots, F_n$. In this article we are interested in testing the null hypothesis of no change,

$$H_0: F_1 = F_2 = \cdots = F_n, \tag{1.1}$$

against the at most one change point (AMOC) alternative hypothesis,

$$H_1: F_1 = \cdots = F_{[n\tau]} = F \neq G = F_{[n\tau]+1} = \cdots = F_n, \tag{1.2}$$

where $F \neq G$ are unknown distribution functions and $\tau \in [0, 1]$.
Considerable research has been done in the area of rank inference for the change-point problem. Lombard (1983) derives the asymptotic distribution of some linear rank change-point test statistics under small parameter perturbations. Wolfe and Schechtman (1984) discuss general nonparametric change-point inference, including rank-based statistics. Aly, Csörgő, and Horváth (1987) study rank processes and tests for the one change-point problem. They prove some important weighted approximations for the P-P plot, empirical and quantile rank processes. Lombard (1987) was the first to derive the asymptotic distribution of general functional linear rank test statistics in the change-point setting. Aly and Abd-Rabou (2000) discuss, theoretically and via simulation, some general change-point linear rank test statistics assuming that the sample size, $n$, is random. Hušková (2004) proves weak invariance principles for some change-point regression rank test statistics.
The aim of this paper is to propose linear rank test statistics for the change-point problem in (1.2). These tests extend the two-sample test statistic of Damico (2004). The remainder of the paper is organized as follows. Section 2 presents the proposed rank test statistics for the at most one change point (AMOC) alternative in (1.2). We also present the asymptotic distributions of the proposed test statistics. Section 3 is devoted to two simulation studies. In Section 3.1, the finite-sample properties of the two proposed tests are assessed. In Section 3.2, the power of the proposed tests is compared empirically with that of the test statistics of Lombard (1987). Finally, in Section 4, the proposed tests are applied to a real data set, the Nile data.

The Proposed Tests
Assume that $F(\cdot)$ and $G(\cdot)$ are two unknown, generally different, distribution functions. Let $X_1, \ldots, X_m, X_{m+1}, \ldots, X_n$ be independent random variables such that $X_i$, for $1 \le i \le m$, has the distribution function $F(\cdot)$, and $X_i$, for $m + 1 \le i \le n$, has the distribution function $G(\cdot)$. Note that the problem in (1.2) is the typical two-sample problem if the integer $m = [n\tau]$ is known. However, if $m$ is unknown, the more complicated change-point problem arises.
For the two-sample problem, Damico (2004) suggests the use of the test statistic in (2.1), where $[k]$ is the integer part of $k$ and $R_i = R(X_i)$, $1 \le i \le n$, is the rank of $X_i$ in the combined sample $X_1, \ldots, X_m, X_{m+1}, \ldots, X_n$. Damico reports a few cumulative probabilities of the test in (2.1) when the two samples are of size $m = 4$ and $n - m = 6$.
In the change-point context, the value of $m$ is typically unknown. This complicates the problem, and Damico's test cannot be used in a straightforward way. In this setting, we suggest the test statistics $D_{n,1}$ in (2.2) and $D_{n,2}$ in (2.3). As in earlier research, we suggest estimating the shift location $m$ by the point $\hat{m}$ at which $D_{n,1}$ attains its maximum.

We now state the asymptotic distributions of the proposed test statistics in (2.2) and (2.3) under the null hypothesis $H_0$ in (1.1), following the general rank test scheme of Lombard (1987). Let $\phi$ be an arbitrary score function defined on the interval (0, 1) and satisfying the integrability conditions of Lombard (1987). Lombard (1987) has shown that, under the null hypothesis of no change, the associated rank process $\{B_n(u);\ 0 \le u \le 1\}$ converges weakly to a standard Brownian bridge. We recall that the standard Brownian bridge is a Gaussian process with zero mean and covariance structure $E[B(s)B(t)] = \min(s, t) - st$ for $0 \le s, t \le 1$.

Applying Lombard's scheme to our test statistics in (2.2) and (2.3), and using Theorem 5.5 of Billingsley (1968), we obtain the limiting null distributions of $D_{n,1}$ and $D_{n,2}$ as functionals of $B(\cdot)$, where $B(\cdot)$ is a standard Brownian bridge.
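To make the general rank scheme more concrete, the following Python sketch computes a rank CUSUM statistic and the corresponding estimate of the shift location. It is only an illustration under stated assumptions: it uses the Wilcoxon score $\phi(u) = u$ as a stand-in, not Damico's scores, so it does not reproduce $D_{n,1}$ or $D_{n,2}$. The function names `ranks` and `wilcoxon_cusum` are hypothetical.

```python
import math
import random


def ranks(xs):
    """Ranks 1..n of the observations (continuous data assumed, ties not handled)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def wilcoxon_cusum(xs):
    """Return (statistic, m_hat) for a Wilcoxon-score rank CUSUM.

    Stand-in for the paper's statistics: the score is phi(u) = u,
    NOT the score underlying D_{n,1} or D_{n,2}.
    """
    n = len(xs)
    r = ranks(xs)
    mean = (n + 1) / 2.0                 # mean of the ranks 1..n
    sd = math.sqrt((n * n - 1) / 12.0)   # standard deviation of the ranks 1..n
    partial, best, m_hat = 0.0, 0.0, 0
    for k in range(1, n):                # candidate change points 1..n-1
        partial += r[k - 1] - mean       # centred partial sum of the first k ranks
        value = abs(partial) / (sd * math.sqrt(n))
        if value > best:
            best, m_hat = value, k
    return best, m_hat


# Synthetic example: a location shift after the 30th of 100 observations.
random.seed(1)
sample = ([random.gauss(0.0, 1.0) for _ in range(30)]
          + [random.gauss(3.0, 1.0) for _ in range(70)])
stat, m_hat = wilcoxon_cusum(sample)
print(stat, m_hat)
```

The point at which the absolute partial sum is largest plays the role of $\hat{m}$; with a different score function only the centring and scaling of the ranks would change.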

Simulation Studies
In this section, simulation studies are used to evaluate the performance of the proposed tests. The first subsection is devoted to estimating the critical values of the proposed tests, $D_{n,1}$ and $D_{n,2}$. In the second subsection the focus is on comparing the powers of the proposed tests with those of some competitors under different alternative distributions.

Estimated critical values
In this section we estimate, through a Monte Carlo study, the critical values of the proposed tests in finite samples. The sample sizes are chosen as 5, 10, 15, ..., 100, i.e. we have 20 different values of the sample size $n$. For each sample size $n$, we generate random permutations of the integers 1, 2, ..., $n$. In each case we calculate the proposed test statistics $D_{n,1}$ and $D_{n,2}$ as in (2.2) and (2.3), respectively, under the assumption of no change. This setting is replicated $r$ times. For each value of $r$, the obtained values of each statistic ($D_{n,1}$ and $D_{n,2}$) are sorted and the $(1 - \alpha)$th percentiles are obtained. The $\alpha$ values are chosen to be 10%, 5% and 1%. Finally, the values of $r$ are fixed at 500, 1000, 2000, 3000, 5000 and 10000.
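The Monte Carlo procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the tables: the statistic `rank_cusum` is a Wilcoxon-score rank CUSUM used as a stand-in, because the sketch does not implement (2.2) or (2.3), and the names `rank_cusum` and `critical_values` are hypothetical.

```python
import math
import random


def rank_cusum(r):
    """Stand-in statistic: max_k |sum_{i<=k} (R_i - (n+1)/2)| / (sd * sqrt(n)).

    This is NOT D_{n,1} or D_{n,2}; replace it with the statistic of interest.
    """
    n = len(r)
    mean = (n + 1) / 2.0
    scale = math.sqrt((n * n - 1) / 12.0 * n)
    partial, best = 0.0, 0.0
    for k in range(n - 1):               # partial sums over the first 1..n-1 ranks
        partial += r[k] - mean
        best = max(best, abs(partial))
    return best / scale


def critical_values(n, reps, alphas=(0.10, 0.05, 0.01), seed=0):
    """Estimate the (1 - alpha) quantiles of the statistic under no change."""
    rng = random.Random(seed)
    perm = list(range(1, n + 1))
    stats = []
    for _ in range(reps):
        rng.shuffle(perm)                # under H0 the ranks are a uniform random permutation
        stats.append(rank_cusum(perm))
    stats.sort()
    # empirical quantile: the ceil((1 - alpha) * reps)-th order statistic
    return {a: stats[min(reps - 1, math.ceil((1 - a) * reps) - 1)] for a in alphas}


print(critical_values(n=20, reps=2000))
```

Because only the ranks enter the statistic, permutations of 1, ..., $n$ suffice and no underlying distribution has to be specified. Rerunning with larger `reps` shows how the estimated quantiles stabilize.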
The critical values of the proposed statistics $D_{n,1}$ and $D_{n,2}$, for $r = 10000$, are displayed in Table 1 and Table 2, respectively. The results for the remaining values of $r$ are not reported, for parsimony and because the qualitative conclusion is the same. However, the results are more stable with higher values of $r$. The asymptotic (nominal) critical values of the test statistics $D_{n,1}$ and $D_{n,2}$, at each $\alpha$, are included in Table 1 and Table 2. These values correspond to $n = \infty$ at the bottom of each table. The results in Table 1 and Table 2 are also depicted in Figure 1, where the horizontal lines represent the nominal critical values at each significance level $\alpha$. The simulation results show that the estimated critical values decrease as the sample size $n$ increases. As $n$ increases, the estimated critical values converge to their corresponding asymptotic (nominal) values. We noticed that the critical values converge faster to their nominal values when the number of replications $r$ increases. It is also noticeable that the estimated critical values approach their asymptotic limits from above. Hence, using the asymptotic critical values in small samples would give slightly liberal tests.
The aim of the simulation is to verify that the estimated critical values of the proposed tests converge to their asymptotic counterparts. In addition, the results can serve in practice as critical values of the proposed tests at the corresponding sample size $n$ and significance level $\alpha$.

Power comparison
As change-point rank test statistics, Lombard's test statistics are natural competitors to our proposed test statistics, and it is natural to compare their powers. The Lombard-type test statistics for the single abrupt change-point problem are denoted $L_{n,1}$ and $L_{n,2}$; they involve a constant $\beta$, which is taken to be 1.5 and 2.0. To calculate the powers, we simulate 10000 realizations of samples of size $n = 10$ under the alternative hypothesis in (1.2). In each realization we compute the four test statistics $D_{n,1}$, $D_{n,2}$, $L_{n,1}$ and $L_{n,2}$ at random shift positions $m$. We then obtain, over the 10000 realizations, the fraction of times each test statistic exceeds its critical value at $\alpha = 5\%$. The whole simulation is then repeated for samples of size $n = 20$, 40 and 80. The results of these simulations are reported in Tables 3-7. In the last two tables (Tables 6 and 7) we also report, in brackets, the powers of the four tests when the alternative distributions $F := U(0, 1)$ and $G := G_i$, $i = 4, 5$, are switched, that is, $F := G_i$, $i = 4, 5$, and $G := U(0, 1)$.
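The power simulation described above follows a simple pattern: estimate a critical value under no change, then count rejections under the alternative. The Python sketch below illustrates that pattern under explicit assumptions that differ from the study: the statistic is a Wilcoxon-score rank CUSUM stand-in (not $D_{n,1}$, $D_{n,2}$, $L_{n,1}$ or $L_{n,2}$), the alternative is a hypothetical normal location shift, and the shift position `m` is fixed by the caller rather than drawn at random. All function names are illustrative.

```python
import math
import random


def ranks(xs):
    """Ranks 1..n of the observations (continuous data assumed)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_cusum(r):
    """Stand-in statistic (Wilcoxon-score CUSUM), not one of the paper's tests."""
    n = len(r)
    mean = (n + 1) / 2.0
    scale = math.sqrt((n * n - 1) / 12.0 * n)
    partial, best = 0.0, 0.0
    for k in range(n - 1):
        partial += r[k] - mean
        best = max(best, abs(partial))
    return best / scale


def null_critical_value(n, alpha, reps, rng):
    """(1 - alpha) quantile of the statistic over random permutations."""
    perm = list(range(1, n + 1))
    stats = []
    for _ in range(reps):
        rng.shuffle(perm)
        stats.append(rank_cusum(perm))
    stats.sort()
    return stats[min(reps - 1, math.ceil((1 - alpha) * reps) - 1)]


def estimated_power(n, m, shift, alpha=0.05, reps=2000, seed=0):
    """Fraction of simulated samples in which the statistic exceeds its critical value.

    Assumed alternative: N(0, 1) for the first m observations and
    N(shift, 1) for the remaining n - m (a hypothetical choice).
    """
    rng = random.Random(seed)
    crit = null_critical_value(n, alpha, reps, rng)
    rejections = 0
    for _ in range(reps):
        xs = ([rng.gauss(0.0, 1.0) for _ in range(m)]
              + [rng.gauss(shift, 1.0) for _ in range(n - m)])
        if rank_cusum(ranks(xs)) > crit:
            rejections += 1
    return rejections / reps


print(estimated_power(n=40, m=20, shift=1.0))
```

Setting `shift=0.0` gives a quick check of the empirical size, which should be close to `alpha`.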
Examining the entries of Tables 3, 4 and 5, we can see that the proposed new tests are superior to their competitors in all cases. The second new test, $D_{n,2}$, has the highest power as the shift position $m$ moves toward the middle of the sample.
In Tables 6 and 7 the powers of the competitor tests are higher than those of the new tests. But when we switch the alternative distributions, the proposed tests outperform the competitors. This reversal indicates that the proposed tests work very well when the alternative is one-sided, i.e. $F(x) \le G(x)$ for all $x \in \mathbb{R}$, or $F(x) \ge G(x)$ for all $x \in \mathbb{R}$.

Application to the Nile data
As an illustration we apply the new tests $D_{n,1}$ and $D_{n,2}$ to the Nile data of Cobb (1978). These data represent the annual water discharge from the Aswan dam for the years 1871-1970, in units of $10^8$ cubic metres. The data sequence is plotted in Figure 2. Visual inspection of the sequence in Figure 2 indicates that there may be a change in the sequence around the year 1900.
These data have been studied by many authors in the area of change-point problems. These studies indicate that there was a shift in the flow levels starting from the year 1898. This shift in 1898 is attributed partly to weather changes and partly to the start of construction work for a new dam at Aswan. Cobb (1978) assumes that the Nile observations are independent normal variables with a common variance for the whole sequence. He approximates the conditional distribution of the maximum likelihood change-point estimator for the data. The results show that the year 1898 is the most likely change point. Cobb cites independent meteorological evidence that this change is real. Carlstein (1988) proposes strongly consistent nonparametric estimators for the change-point location. Applied to the Nile data, these estimators place the change at 1898. Dümbgen (1991) introduces asymptotically valid confidence regions for the change-point location by inverting bootstrap tests. As an example, this method was applied to the Nile data, and the 95% bootstrap confidence set was found to be the region given by the years [1896, 1899]. Zeileis, Kleiber and Krämer (2003) apply a dynamic programming algorithm for dating break points to the Nile data and confirm the above results.
Applying the proposed tests $D_{n,1}$ and $D_{n,2}$ to these data, we obtain the following results. The first test gives $D_{n,1} = 3.0098$, which is significant at a level below 1%. We also find that the shift position is at $m = 28$ (year 1898). The second test gives $D_{n,2} = 2.6758$, which is also significant at a level below 1%. These results agree with the previous findings.

Figure 1: The empirical critical values of the proposed tests, (a) the first test $D_{n,1}$, (b) the second test $D_{n,2}$. The horizontal lines represent the appropriate nominal critical values at each $\alpha$.

Figure 2: Graphical representation of the Nile data

Table 1: Estimated critical values of $D_{n,1}$

Table 2: Estimated critical values of $D_{n,2}$

Table 3: Estimated power percentage at α