Test Procedures for Change Point in a General Class of Distributions

Abstract: This paper is concerned with change point analysis in a general class of distributions. The quasi-Bayes and likelihood ratio test procedures are considered for testing the null hypothesis of no change point. Exact and asymptotic behaviors of the two test statistics are derived. To compare the performance of the two test procedures, numerical significance levels and powers are tabulated for selected values of the parameters. Estimation of the change point based on these two test procedures is also considered. Moreover, the epidemic change point problem is studied as an alternative to the single change point model. A real data set exhibiting an epidemic change is analyzed using both test procedures.

The initial parameter $\theta_0$ may be known or unknown. The change point $k_0$ ($k_0 = 1, \ldots, n-1$) and the magnitude of change $\delta$ are unknown parameters. Without loss of generality, let $\delta \ge 0$. The following regularity conditions are needed.
The following example shows that these conditions are satisfied in two broad families of distributions.
Example 1. In the exponential family with density function $f_\theta(x) = h(x)\exp\{\phi_1(\theta)u(x) + \phi_2(\theta)\}$, condition (i) is satisfied provided … In the location family, where the derivative $f'(\cdot)$ exists, condition (i) is satisfied provided … For example, for the logistic distribution $L(\theta, 1)$, … Condition (ii) is typically satisfied.
Chernoff and Zacks (1964) considered quasi-Bayesian change point analysis for independent normal observations. Kander and Zacks (1966) (KZ) extended the work of Chernoff and Zacks (1964) to exponential family distributions. Nonparametric methods in change point analysis can be found in Brodsky and Darkhovsky (1993). Broemeling and Gregurich (1996) surveyed Bayesian estimation of the change point via resampling methods. An excellent reference on change point analysis is Csörgő and Horváth (1997). Gupta and Ramanayake (2001) used KZ's quasi-Bayes method to study the epidemic change point in the exponential distribution. For more references, see Hjort and Koning (2002) and Habibi et al. (2005), among others.
In this note, we consider quasi-Bayes and likelihood ratio test procedures to detect a change in a general class of distributions. The paper is organized as follows. The quasi-Bayes test is studied in Section 2, where the exact distribution of the test statistic in some special cases and its asymptotic distribution in general are also derived. Section 3 contains the exact and asymptotic distributions of the likelihood ratio test statistic. The performances of the two test procedures are compared in Section 4, where estimation of the change point based on the two test procedures is also considered. Section 5 considers the epidemic change point model, an alternative to the single change point model, and analyzes a real data set. Although this paper extends earlier work, its presentation of the results in terms of stochastic integrals is of independent interest. It also treats change point detection in a general class of distributions under both the single and the epidemic change point models, a topic that has not been considered before.

Quasi-Bayes Test
In this section, following KZ, the quasi-Bayes test statistic is derived. Assume that $k_0 = [nt_0]$ for some unknown $t_0 \in (0, 1)$. We treat $t_0$ as a random variable with prior density $\pi(t)$, $t \in (0, 1)$. First, suppose that $\theta_0$ is known. The marginal likelihoods of the sample under $H_0$ and $H_1$ are
$$\prod_{k=1}^{n} f_{\theta_0}(x_k) \quad \text{and} \quad \int_0^1 \pi(t) \prod_{k=1}^{[nt]} f_{\theta_0}(x_k) \prod_{k=[nt]+1}^{n} f_{\theta_0+\delta}(x_k)\, dt,$$
respectively, so the ratio of the marginal likelihood under $H_1$ to that under $H_0$ is
$$\int_0^1 \pi(t) \prod_{k=[nt]+1}^{n} \frac{f_{\theta_0+\delta}(x_k)}{f_{\theta_0}(x_k)}\, dt.$$
Following KZ, as $\delta \to 0$, the marginal likelihood ratio can be approximated by … and it can be expressed as … Then, to test $H_0$, the corresponding test statistic becomes … By partitioning [0, 1] into $n$ equal subintervals, it can be shown that … KZ derived the test statistic $T^\pi_n$ for exponential families. The test procedure based on $T^\pi_n$ is locally most powerful (see KZ). Under the noninformative prior $\pi(t) = 1$ for $t \in (0, 1)$, the test statistic $T_n$ is obtained. Habibi et al. (2005) studied the behavior of this test statistic.
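The partition step can be illustrated under an assumption: that the first-order term in $\delta$ is a sum of scores $g(\theta_0, x_k) = \partial \log f_\theta(x_k)/\partial\theta$ at $\theta_0$ (the notation $g(\theta_0, X)$ used later suggests this, but the exact form and normalization of $T^\pi_n$ are not recoverable here). Observation $k$ lies after the change point $[nt]$ exactly when $t < k/n$, so $\int_0^1 \pi(t)\sum_{k>[nt]} g(\theta_0, x_k)\,dt = \sum_{k=1}^n \Pi(k/n)\, g(\theta_0, x_k)$, where $\Pi$ is the prior distribution function; for the uniform prior the weights are $k/n$. The sketch below (function names hypothetical, no scaling constant) computes both sides and lets one check the identity numerically.

```python
import math
from typing import Callable, Sequence


def quasi_bayes_stat(scores: Sequence[float],
                     prior_cdf: Callable[[float], float]) -> float:
    """Unnormalized quasi-Bayes statistic sum_k Pi(k/n) * g_k.

    scores[k-1] holds g(theta0, x_k) for observation k = 1..n. Observation k
    lies after the change point [n t] exactly when t < k/n, so its weight is
    the prior probability Pi(k/n) = P(t0 < k/n).
    """
    n = len(scores)
    return sum(prior_cdf(k / n) * g for k, g in enumerate(scores, start=1))


def quasi_bayes_by_integration(scores: Sequence[float],
                               prior_pdf: Callable[[float], float],
                               grid: int = 100_000) -> float:
    """Midpoint-rule value of int_0^1 pi(t) * sum_{k > [n t]} g_k dt."""
    n = len(scores)
    # tail[j] = sum of g_k over observations k > j (1-based), for j = 0..n
    tail = [0.0] * (n + 1)
    for j in range(n - 1, -1, -1):
        tail[j] = tail[j + 1] + scores[j]
    h = 1.0 / grid
    return sum(prior_pdf((m + 0.5) * h) * tail[math.floor(n * (m + 0.5) * h)] * h
               for m in range(grid))
```

For example, with scores `[0.3, -1.2, 0.8, 2.0, -0.5]` and the uniform prior (`prior_cdf = lambda t: t`, `prior_pdf = lambda t: 1.0`), both functions give about 1.16. The closed form avoids numerical integration entirely, which is why the partitioned version is convenient in practice.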
Example 2. The exact null distribution of $T_n$ can be found in some special cases. In exponential families, $T_n$ reduces to the KZ test statistic, so the exact null distribution of $T_n$ can be obtained for the normal, exponential, and binomial distributions (see KZ). The exact null distribution of $T_n$ can also be found for the logistic distribution, as follows. Without loss of generality, let $\theta_0 = 0$. It is easy to verify that … where $F(\cdot)$ is the distribution function of the standard logistic distribution $L(0, 1)$. Let $S_n = \sum_{i=1}^{n} (i-1)F(X_i)$ and let $g_n(\cdot)$ be the density function of $S_n$. Then $S_n$ is an affine function of $T_n$ (with coefficient $-n^2/2$ on $T_n$), so the exact null distribution of $T_n$ follows from $g_n(\cdot)$.
However, since the exact distribution of $T^\pi_n$ (or $T_n$) is very complicated in many cases, the asymptotic distribution of $T^\pi_n$ is considered in Theorem 1. Let $\sigma^2 = I(\theta_0)$ denote the Fisher information at $\theta_0$.
Theorem 1. Under regularity conditions (i) and (ii) and the null hypothesis, …
Proof. Consider the stochastic process $S_n(t)$ defined by … where $W(\cdot)$ is the standard Brownian motion on [0, 1] and $d$ is the Skorokhod metric (see Billingsley, 1968). The map $\Lambda$ is defined as … Integration by parts can be applied to show that …
Remark 1. When the initial parameter $\theta_0$ is unknown, it is replaced by $\hat\theta_0$, the maximum likelihood estimate of $\theta_0$ under the null hypothesis, resulting in the test statistic … It is easy to show that, under the null hypothesis (since $\hat\theta_0$ …),
Example 3. As a special case of Remark 1, consider a sequence of independent random variables $X_i$ such that … The initial mean $\theta_0$ is unknown and is replaced by $\bar X_n$. Then the test statistic is given by … It is seen that …
Next, the asymptotic distribution of $T_n$ under the alternative hypothesis is considered. To do so, the following extra condition (iii) is assumed. … Let $\mu_{\theta_1} = E_{\theta_1}(g(\theta_0, X_{k_0+1}))$ and $I(\theta_0, \theta_1) = \mathrm{Var}_{\theta_1}(g(\theta_0, X_{k_0+1}))$.
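Two facts behind Example 2, Theorem 1, and Example 3 can be checked numerically. For $L(\theta, 1)$, a direct calculation gives the score $\partial \log f_\theta(x)/\partial\theta$ at $\theta = 0$ as $2F(x) - 1$; under $H_0$ the $F(X_i)$ are iid uniform, so $\sigma^2 = I(\theta_0) = \mathrm{Var}(2U - 1) = 1/3$ (the paper's own sign and scaling conventions for $g$ and $T_n$ are not recoverable from this text). Assuming, hypothetically, the normalization $n^{-1/2}\sum_k (k/n)\, g(\theta_0, X_k)/\sigma$ for $T_n$, its null variance tends to $\int_0^1 t^2\,dt = 1/3$ when $\theta_0$ is known, and to $1/12$ when the scores are centred, as happens when the normal mean $\theta_0$ is replaced by $\bar X_n$ (the Brownian bridge case). The helper names below are illustrative.

```python
import math


def logistic_cdf(x: float) -> float:
    """Distribution function F of the standard logistic L(0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_logpdf(x: float, theta: float) -> float:
    """log f_theta(x) for the logistic location family L(theta, 1)."""
    z = x - theta
    return -z - 2.0 * math.log1p(math.exp(-z))


def logistic_score(x: float, theta0: float = 0.0) -> float:
    """Score d/dtheta log f_theta(x) at theta0, which equals 2F(x - theta0) - 1."""
    return 2.0 * logistic_cdf(x - theta0) - 1.0


def null_variance(n: int, centered: bool = False) -> float:
    """Variance of n^{-1/2} sum_k (k/n) e_k for iid unit-variance e_k.

    With centered=True each e_k is replaced by e_k - mean(e), which is the same
    as centring the weights k/n; this mimics replacing theta0 by the sample mean.
    """
    w = [k / n for k in range(1, n + 1)]
    if centered:
        wbar = sum(w) / n
        w = [wk - wbar for wk in w]
    return sum(wk * wk for wk in w) / n
```

In closed form, `null_variance(n)` equals $(n+1)(2n+1)/(6n^2)$ and `null_variance(n, centered=True)` equals $(n+1)(n-1)/(12n^2)$, which approach $1/3$ and $1/12$.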
Corollary 2. Under conditions (i), (ii), and (iii) and $H_1$, … Although deriving Corollary 2 from Theorem 1 is straightforward, we present a brief proof.
Corollary 3. The approximate power of the test of size $\alpha$ based on $T_n$ is given by …

Remark 3.
We can also estimate the location of the change point using the quasi-Bayes test. To see this in detail, consider the special case $X_i = \theta_0 + \delta I(i \ge k_0 + 1) + N_i$, where the $N_i$ are iid $N(0, 1)$ random variables and $\delta > 0$. The change point estimator $\hat k_n$ based on the quasi-Bayes test is given by … that is, $U_{[nt]}$ is close to its mean function $E(U_{[nt]})$. Hence, to study the limiting behavior of $U_{[nt]}$, it is enough to study the limiting behavior of $E(U_{[nt]})$. It is easy to see that $E(U_{[n\cdot]}) \to U(\cdot)$, where … This shows the consistency of $\hat t_n = \hat k_n / n$, that is, $\hat t_n \xrightarrow{p} t_0$ as $n \to \infty$ (see Bai, 1994).

Likelihood Ratio Test
Here, the likelihood ratio test is considered for testing the null hypothesis of no change point. First, assume $\theta_0$ is known. The ratio of the likelihood function under $H_1$ to that under $H_0$ is given by … It is easy to verify that, as $\delta \to 0^+$, the likelihood ratio function can be approximated by … (see Section 2). One rejects $H_0$ whenever the observed value of $T^*_n$ is large, where … One can show that, under the null hypothesis $H_0$, as $n \to \infty$, … where $N$ has the standard normal distribution (see Billingsley, 1968).
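A minimal sketch, assuming (hypothetically, since the display is missing here) that $T^*_n = \max_{1\le k\le n-1}\sum_{i>k} g(\theta_0, X_i)/(\sigma\sqrt n)$ with $\sigma^2 = I(\theta_0)$. Its null limit is then $\sup_{0<t<1}(W(1) - W(t))$, which by time reversal has the law of $\sup_{0\le s\le 1} W(s)$, and by the reflection principle that equals the law of $|N|$, consistent with a standard normal $N$ appearing in the limit. The asymptotic p-value is therefore $2(1 - \Phi(x))$.

```python
import math
from statistics import NormalDist
from typing import Sequence


def lr_stat_known_theta(scores: Sequence[float], sigma: float) -> float:
    """max_{1<=k<=n-1} sum_{i>k} g_i / (sigma * sqrt(n)), a hypothetical scaling.

    scores[i-1] holds g(theta0, x_i); sigma**2 is the Fisher information I(theta0).
    """
    n = len(scores)
    tail, best = 0.0, -math.inf
    for k in range(n - 1, 0, -1):      # k = n-1, ..., 1
        tail += scores[k]              # adds g_{k+1}, so tail = sum_{i>k} g_i
        best = max(best, tail)
    return best / (sigma * math.sqrt(n))


def lr_asymptotic_pvalue(x: float) -> float:
    """P(sup_{0<=s<=1} W(s) > x) = P(|N| > x) = 2(1 - Phi(x)) for x > 0."""
    if x <= 0:
        return 1.0
    return 2.0 * (1.0 - NormalDist().cdf(x))
```

Under these assumed scalings, the 5% critical value is $\Phi^{-1}(0.975) \approx 1.96$ for the likelihood ratio statistic, whereas the uniform-prior quasi-Bayes statistic scaled the same way, $\sum_k (k/n) g_k/(\sigma\sqrt n)$, has null variance near $1/3$ and critical value about $\Phi^{-1}(0.95)/\sqrt 3 \approx 0.95$. This is in line with Remark 4.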

Remark 4.
The likelihood ratio test statistic $T^*_n$ is larger than the quasi-Bayes test statistic $T^\pi_n$. To see this, note that … This shows that the critical values of the likelihood ratio test are larger than those of the quasi-Bayes test.
Remark 5. When the initial parameter $\theta_0$ is unknown, $\theta_0$ is again replaced by $\hat\theta_0$, the maximum likelihood estimate of $\theta_0$ under the null hypothesis, resulting in the test statistic … It is easy to show that, under some mild conditions, …
Example 4. To illustrate Remark 5, consider the special case … Since $\theta_0$ is unknown, it is estimated by $\bar X_n$, and the test statistic is given by … Under the null hypothesis, … where $B(\cdot)$ is the standard Brownian bridge on [0, 1]. Since $B(\cdot) \overset{d}{=} -B(\cdot)$, the continuous mapping theorem implies that $\sqrt{n}\, T^*_n \xrightarrow{d} \sup_{0<t<1} B(t)$. Under the null hypothesis, the random vector $v = (v_1, \ldots, v_{n-1})$ has a multivariate normal distribution $N_{n-1}(0, \Sigma)$ with … (see Hawkins, 1977). Under $H_1$, $v \sim N_{n-1}(\delta\mu, \Sigma)$, where $\mu = (\mu_1, \ldots, \mu_{n-1})$ with … The exact distribution of $T^*_n$ is therefore the distribution of the maximum component of a multivariate normal vector, so the $\alpha$-th quantile of $T^*_n$ is the $\alpha$-th equicoordinate quantile of a multivariate normal distribution, which is considered by Genz (1992).
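For the normal model of Example 4 with unit variance, a sketch under the hypothetical form $T^*_n = \max_{1\le k\le n-1} n^{-1}\sum_{i>k}(X_i - \bar X_n)$, chosen so that $\sqrt n\, T^*_n \to \sup_{0<t<1} B(t)$. The limit has the well-known tail $P(\sup_{0<t<1} B(t) > y) = e^{-2y^2}$. As a simple alternative to computing the equicoordinate quantile by Genz's method, the exact finite-$n$ critical value can be approximated by simulation, because the centred statistic does not depend on $\theta_0$. Function names are illustrative.

```python
import math
import random
from typing import Sequence


def cusum_stat_unknown_mean(x: Sequence[float]) -> float:
    """Assumed form T*_n = max_{1<=k<=n-1} (1/n) sum_{i>k} (x_i - xbar), sigma = 1."""
    n = len(x)
    xbar = sum(x) / n
    tail, best = 0.0, -math.inf
    for k in range(n - 1, 0, -1):      # tail = sum_{i>k} (x_i - xbar)
        tail += x[k] - xbar
        best = max(best, tail)
    return best / n


def bridge_sup_pvalue(y: float) -> float:
    """Asymptotic p-value P(sup_{0<t<1} B(t) > y) = exp(-2 y^2) for y > 0."""
    return math.exp(-2.0 * y * y) if y > 0 else 1.0


def mc_critical_value(n: int, alpha: float = 0.05, reps: int = 20000,
                      seed: int = 1) -> float:
    """Monte Carlo (1 - alpha) quantile of sqrt(n) * T*_n under H0.

    The centred statistic does not depend on theta0, so N(0, 1) data suffice.
    """
    rng = random.Random(seed)
    sims = sorted(
        math.sqrt(n) * cusum_stat_unknown_mean([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return sims[math.ceil((1 - alpha) * reps) - 1]
```

The asymptotic 5% critical value is $\sqrt{-\log(0.05)/2} \approx 1.224$; the simulated value for moderate $n$ tends to be somewhat smaller, because the maximum is taken over a finite grid rather than over all $t \in (0, 1)$.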
Remark 6. When $\delta > 0$, the change point estimator $\hat k_n$ based on the likelihood ratio test is given by … This suggests plotting $V_k$ for $k = 1, 2, \ldots, n-1$; the first point $\hat k_n$ at which $V_k$ attains its minimum is the likelihood ratio change point estimator.
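The definition of $V_k$ is missing from this text; the sketch below takes it, as an assumption, to be the partial sum $V_k = \sum_{i=1}^{k} g(\theta_0, X_i)$. Since $\sum_{i>k} g(\theta_0, X_i) = \sum_{i=1}^{n} g(\theta_0, X_i) - V_k$, minimizing $V_k$ is then the same as maximizing the tail sum that appears in the small-$\delta$ likelihood ratio approximation. The function name is hypothetical.

```python
from typing import Optional, Sequence


def lr_change_point_estimate(scores: Sequence[float]) -> Optional[int]:
    """First k in 1..n-1 minimizing V_k = sum_{i<=k} g_i (assumed definition of V_k).

    scores[i-1] holds g(theta0, x_i); for N(theta, 1) data this is x_i - theta0.
    """
    n = len(scores)
    v, best_k, best_v = 0.0, None, float("inf")
    for k in range(1, n):
        v += scores[k - 1]             # v = V_k
        if v < best_v:                 # strict '<' keeps the first minimizer
            best_k, best_v = k, v
    return best_k
```

For the normal special case of Remark 3 with $\theta_0 = 0$, the scores are the observations themselves, so `lr_change_point_estimate([-0.1, 0.2, -0.3, 1.1, 0.9, 1.3])` returns 3, the last index before the upward shift.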

Comparisons
In this section, we compare the performance of the quasi-Bayes and likelihood ratio tests by studying their significance levels and powers. The significance levels of the quasi-Bayes and likelihood ratio tests are $\alpha_n$ and $\alpha^*_n$, respectively, where … In what follows, we compare the rates at which $\alpha_n$ and $\alpha^*_n$ converge to $\alpha$ for the logistic distribution. For a given $n$, we compute $\alpha_n$ and $\alpha^*_n$ by a Monte Carlo experiment with $R = 20000$ repetitions. Let $\hat\alpha_{nR}$ ($\hat\alpha^*_{nR}$) be the proportion of the $R$ repetitions in which the null hypothesis $H_0$ of no change is rejected by the quasi-Bayes test (likelihood ratio test). By the strong law of large numbers, $\hat\alpha_{nR}$ ($\hat\alpha^*_{nR}$) (see Tables 1 and 2) is close to $\alpha_n$ ($\alpha^*_n$). Both $\alpha_n$ and $\alpha^*_n$ appear to converge to $\alpha$ quickly, with $\alpha_n$ converging slightly faster.
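A minimal sketch of such a size study, under assumptions not taken from the paper: $\theta_0 = 0$ is known; the quasi-Bayes statistic is the uniform-prior weighted score sum $\sum_k (k/n)\, g(0, X_k)$ standardized by its exact null standard deviation and compared with $\Phi^{-1}(1-\alpha)$; the likelihood ratio statistic is $\max_{1\le k\le n-1}\sum_{i>k} g(0, X_i)/(\sigma\sqrt n)$ compared with the $|N|$ cutoff $\Phi^{-1}(1-\alpha/2)$. For $L(\theta, 1)$ the score is $g(0, x) = 2F(x) - 1$ and $\sigma^2 = 1/3$. The output is illustrative and is not meant to reproduce Tables 1 and 2.

```python
import math
import random
from statistics import NormalDist


def std_logistic(rng: random.Random) -> float:
    """Draw from L(0, 1) by inverting F: X = log(U / (1 - U))."""
    u = rng.random()
    while u == 0.0:                    # guard against log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def empirical_sizes(n: int, alpha: float = 0.05, reps: int = 20000, seed: int = 1):
    """Monte Carlo rejection rates under H0 (theta0 = 0 known) for the two sketched tests."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / 3.0)       # Fisher information of L(theta, 1) is 1/3
    w = [k / n for k in range(1, n + 1)]
    sd_qb = sigma * math.sqrt(sum(wk * wk for wk in w))   # exact null sd of sum w_k g_k
    c_qb = NormalDist().inv_cdf(1 - alpha)                 # one-sided normal cutoff
    c_lr = NormalDist().inv_cdf(1 - alpha / 2)             # P(|N| > c_lr) = alpha
    rej_qb = rej_lr = 0
    for _ in range(reps):
        # scores 2F(x) - 1 at theta0 = 0 for L(0, 1) draws
        g = [2.0 / (1.0 + math.exp(-std_logistic(rng))) - 1.0 for _ in range(n)]
        if sum(wk * gk for wk, gk in zip(w, g)) / sd_qb > c_qb:
            rej_qb += 1
        tail, best = 0.0, -math.inf
        for k in range(n - 1, 0, -1):  # tail = sum_{i>k} g_i
            tail += g[k]
            best = max(best, tail)
        if best / (sigma * math.sqrt(n)) > c_lr:
            rej_lr += 1
    return rej_qb / reps, rej_lr / reps
```

The returned pair plays the role of $(\hat\alpha_{nR}, \hat\alpha^*_{nR})$ for these sketched statistics. Because the likelihood ratio maximum is taken over finitely many $k$, its rejection rate tends to sit somewhat below $\alpha$ for moderate $n$.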

Approximate power of the two tests
Here, we compare the powers of the two test procedures for logistic observations $L(\delta, 1)$. The power $\beta_\alpha(\delta)$ of the quasi-Bayes test (see Corollary 2) is given in Table 3 for $\alpha = 0.05$ and $k_0 = 1, 3, \ldots, 49$. Table 3 also contains the power $\beta^*_\alpha(\delta)$ of the likelihood ratio test, estimated by a Monte Carlo simulation with $R = 20000$ repetitions. To keep the table to a reasonable size, only the case of sample size $n = 50$ and magnitudes of change $(\delta_1, \delta_2) = (0.09, 1)$ at significance level $\alpha = 0.05$ is reported. Table 3 shows that the power of the quasi-Bayes test is larger than that of the likelihood ratio test in all cells. The power of the likelihood ratio test is very small for $\delta_1 = 0.09$. Both tests achieve higher power when $k_0$ occurs near the beginning of the sequence.
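A sketch of a Monte Carlo power study for the change model $X_i \sim L(\delta\, I(i > k_0), 1)$ with $\theta_0 = 0$ known. The statistics and cutoffs are hypothetical stand-ins, not the paper's exact definitions: the quasi-Bayes statistic uses uniform-prior weights $k/n$ on the scores $g(0, x) = 2F(x) - 1$ with a one-sided normal cutoff, and the likelihood ratio statistic is the maximum tail sum of scores scaled by $\sigma\sqrt n$ ($\sigma^2 = 1/3$) with the $|N|$ cutoff. The numbers it produces are illustrative and are not those of Table 3.

```python
import math
import random
from statistics import NormalDist


def mc_power(n: int, k0: int, delta: float, alpha: float = 0.05,
             reps: int = 20000, seed: int = 7):
    """Monte Carlo power when X_i ~ L(delta * 1{i > k0}, 1) and theta0 = 0 is known."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / 3.0)               # Fisher information of L(theta, 1)
    w = [k / n for k in range(1, n + 1)]       # uniform-prior weights Pi(k/n)
    sd_qb = sigma * math.sqrt(sum(wk * wk for wk in w))
    c_qb = NormalDist().inv_cdf(1 - alpha)
    c_lr = NormalDist().inv_cdf(1 - alpha / 2)
    hits_qb = hits_lr = 0
    for _ in range(reps):
        g = []
        for i in range(1, n + 1):
            u = rng.random()
            while u == 0.0:                    # guard against log(0)
                u = rng.random()
            x = math.log(u / (1.0 - u)) + (delta if i > k0 else 0.0)
            g.append(2.0 / (1.0 + math.exp(-x)) - 1.0)   # score at theta0 = 0
        if sum(wk * gk for wk, gk in zip(w, g)) / sd_qb > c_qb:
            hits_qb += 1
        tail, best = 0.0, -math.inf
        for k in range(n - 1, 0, -1):          # tail = sum_{i>k} g_i
            tail += g[k]
            best = max(best, tail)
        if best / (sigma * math.sqrt(n)) > c_lr:
            hits_lr += 1
    return hits_qb / reps, hits_lr / reps
```

Looping over $k_0$ with $\delta$ fixed gives one row of a table in the style of Table 3; setting `delta=0.0` recovers the empirical sizes, which is a useful sanity check.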

Epidemic Change Point
The epidemic change point model is an alternative to the single change point model. Yao (1993) published a survey of the available test procedures together with their comparisons. Brodsky and Darkhovsky (1993) constructed estimators of change points and studied their properties. In this section, the epidemic change point is considered in a general class of distributions. Epidemic change point analysis has many practical applications, and studying it in a general class of distributions is of interest. Consider a sequence of independent random variables $X_1, \ldots, X_n$ with density functions $f_{\theta_i}(x_i)$, $\theta_i \in \Theta$, $i = 1, \ldots, n$. We test the null hypothesis $H_0: \theta_1 = \cdots = \theta_n = \theta_0$ against the alternative hypothesis
$$H_1: \theta_i = \begin{cases} \theta_0, & i = 1, 2, \ldots, k_0,\\ \theta_0 + \delta, & i = k_0 + 1, \ldots, k_1,\\ \theta_0, & i = k_1 + 1, \ldots, n. \end{cases}$$
As in Section 2, the quasi-Bayes test rejects $H_0$ when $T^{\pi e}_n$ is large, where …
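Both epidemic statistics can be sketched under stated assumptions. Assume a uniform prior on $\{0 < t_0 < t_1 < 1\}$ (density 2) with $k_0 = [nt_0]$, $k_1 = [nt_1]$, and the same small-$\delta$ score expansion as in the single change case. Observation $k$ lies in the epidemic segment exactly when $t_0 < k/n \le t_1$, which has prior probability $2s(1-s)$ with $s = k/n$, so the unnormalized quasi-Bayes statistic is $\sum_k 2s_k(1-s_k)\, g(\theta_0, X_k)$. A likelihood-ratio-type statistic maximizes the segment sum $\sum_{i=k_0+1}^{k_1} g(\theta_0, X_i)$ over $1 \le k_0 < k_1 \le n-1$, which prefix sums with a running minimum compute in $O(n)$. The prior, the range of $(k_0, k_1)$, and the function names are all assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def epidemic_qb_stat(scores: Sequence[float]) -> float:
    """sum_k 2 s_k (1 - s_k) g_k with s_k = k/n (uniform prior on 0 < t0 < t1 < 1)."""
    n = len(scores)
    return sum(2.0 * (k / n) * (1.0 - k / n) * g for k, g in enumerate(scores, start=1))


def epidemic_lr_stat(scores: Sequence[float]) -> Tuple[float, Optional[Tuple[int, int]]]:
    """Max over 1 <= k0 < k1 <= n-1 of sum_{i=k0+1}^{k1} g_i, with the maximizing (k0, k1).

    Uses prefix sums P_k = g_1 + ... + g_k and a running minimum of P_{k0}, so the
    search is O(n) instead of O(n^2).
    """
    n = len(scores)
    prefix = [0.0]
    for g in scores:
        prefix.append(prefix[-1] + g)
    best, arg = -math.inf, None
    min_p, min_k = prefix[1], 1        # smallest P_{k0} seen so far, k0 >= 1
    for k1 in range(2, n):             # k1 <= n-1 keeps the last segment nonempty
        if prefix[k1] - min_p > best:
            best, arg = prefix[k1] - min_p, (min_k, k1)
        if prefix[k1] < min_p:
            min_p, min_k = prefix[k1], k1
    return best, arg
```

For example, with scores `[0.2, -1.0, 1.5, 0.7, -0.3, 0.9, -2.0, 0.1]`, `epidemic_lr_stat` selects $(k_0, k_1) = (2, 6)$, the stretch of observations 3 to 6 with the largest summed score.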

Remark 7.
When $\theta_0$ is unknown, it is estimated by $\hat\theta_0$. The above asymptotic distributions of the quasi-Bayes and likelihood ratio statistics continue to hold with $W(\cdot)$ replaced by $B(\cdot)$, the standard Brownian bridge on [0, 1].

Stanford Heart Transplant Data
The data set (taken from Kalbfleisch and Prentice, 1980) contains 35 patients with known age groups. The average survival times of the patients are indexed by age group. It is natural to ask whether this sequence contains an epidemic change. To check this possibility, we applied the two test procedures to this data set. The p-values of the quasi-Bayes and likelihood ratio tests are 0.0235 and 0.0552, respectively. Hence, at the 5% level the quasi-Bayes test rejects the null hypothesis of no change in favor of an epidemic change, while the likelihood ratio test is borderline (it rejects at the 10% level but not at the 5% level). The ML estimates of the two change points are 29 and 48 years, respectively.