Multiple Change Point Analysis for the Regular Exponential Family using the Product Partition Model

Abstract: As an extension to previous research efforts, the product partition model (PPM) is applied to the identification of multiple change points in the parameter that indexes the regular exponential family. We define the PPM for Yao's prior cohesions and contiguous blocks. Because the exponential family provides a rich set of models, we also present the PPM for some particular members of this family in both continuous and discrete cases, and the PPM is applied to identify multiple change points in real data. First, to illustrate the discrete case, multiple changes are identified in the rates of crimes in one of the biggest cities in Brazil. Then, to illustrate the continuous case, multiple changes are identified in the volatility (variance) and in the expected return (mean) of some Latin American emerging market return series.


Introduction
Most of the methodologies considered in change point analysis assume that the number of change points is known and fixed (see Chen and Lee, 1995; Geweke and Terui, 1993; Hawkins, 2001, and many others). Other authors have studied the one-change-point problem using a Bayesian approach (see Menzefricke, 1981; Hsu, 1984; Smith, 1975, for example). The product partition model (PPM) developed by Hartigan (1990) introduces more flexibility into the analysis of these problems since it treats the number of change points as a random variable. As shown by Barry and Hartigan (1992), by applying the PPM one can easily obtain product estimates for the parameters of interest at each point in time, the posterior distribution of the random partition generated by the change points, and the posterior distribution of the number of change points. Applications and extensions of the PPM can be found in Barry and Hartigan (1993), Crowley (1997), Quintana and Iglesias (2003), Loschi and Cruz (2005) and others (for more on the Bayesian approach to multiple change points, see Elliott and Shope, 2003).
Despite all the flexibility introduced by the PPM in the analysis of change point problems, we have noticed that in many situations the posterior distribution for the random partition, as originally defined by Barry and Hartigan (1993), puts similar or equal mass on a large number of partitions. In such cases, it is hard to identify the change points. One way to measure the evidence of a change is to compute the posterior probability of each instant being a change point, as established by Loschi and Cruz (2005). Another way to solve this problem is to consider the decision-theoretic approach for choosing the best partition introduced by Quintana and Iglesias (2003).
In this paper we apply the PPM and its extensions proposed by Loschi and Cruz (2005) to identify multiple change points in the parameter that indexes the regular exponential family. We suppose that only contiguous blocks are possible and consider the prior cohesions proposed by Yao (1984). The change point model considered by Yao (1984) is a discrete time version of the model discussed by Barnard (1959). We also present the PPM for some particular members of the exponential family in both discrete and continuous cases. Consequently, we extend to a more general family the methodology earlier developed by Loschi et al. (2003) to identify change points in normal means and variances.
Two real data sets are analyzed. First, to illustrate the discrete case, we identify multiple change points in the rates of crimes in Belo Horizonte, one of the biggest cities in Brazil. Then, multiple change points are identified in the volatility (variance) and in the expected return (mean) of some Latin American emerging market return series, illustrating the use of the PPM in the continuous case.
This paper is organized as follows. In Section 2, the parametric PPM is defined. Some computational procedures proposed in the literature to manage the PPM are also described. In Section 3, we apply the PPM to the exponential family and to some particular distributions. In Section 4, the results are applied to the analysis of two real data sets. Finally, in Section 5 some conclusions close the paper.

The Product Partition Model
In this section we briefly describe the PPM as well as its extensions to compute the posterior probability of each instant being a change point. The computational method involved is also presented.

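Let $X_1, \ldots, X_n$ be an observed data sequence. Consider a random partition $\rho = \{i_0, i_1, \ldots, i_b\}$ of the set $I = \{1, \ldots, n\}$, with $0 = i_0 < i_1 < \cdots < i_b = n$, and a random variable $B$ that denotes the number of blocks in $\rho$.
Consider that each partition divides the data sequence $X_1, \ldots, X_n$ into $B = b$ contiguous subsequences (blocks), which are denoted here by $X_{[i_{r-1} i_r]} = (X_{i_{r-1}+1}, \ldots, X_{i_r})$, $r = 1, \ldots, b$. Let $c_{[ij]}$ be the prior cohesion associated with the block $[ij] = \{i+1, \ldots, j\}$, for $i, j \in I \cup \{0\}$ and $j > i$, that represents the degree of similarity among the observations in $X_{[ij]}$ and that may be interpreted here as transition probabilities in the Markov chain defined by the change points.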
Let $0 \le p \le 1$ be the probability that a change occurs at any instant in the sequence. The prior cohesion for block $[ij]$ proposed by Yao (1984) is given by:

$$c_{[ij]} = \begin{cases} p(1-p)^{j-i-1}, & \text{if } j < n, \\ (1-p)^{j-i-1}, & \text{if } j = n, \end{cases}$$

for all $i, j \in I \cup \{0\}$, $i < j$. These prior cohesions imply that the sequence of change points establishes a discrete renewal process, with occurrence times identically distributed with geometric distribution.
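As an illustration, the following minimal Python sketch evaluates Yao's cohesions and multiplies them over the blocks of a partition. It assumes the usual form of these cohesions, $p(1-p)^{j-i-1}$ for $j < n$ and $(1-p)^{j-i-1}$ for $j = n$; the function names are hypothetical and chosen only for this example.

```python
def yao_cohesion(i, j, n, p):
    """Prior cohesion of block [ij] = {i+1, ..., j} (assumed Yao form)."""
    if not 0 <= i < j <= n:
        raise ValueError("need 0 <= i < j <= n")
    stay = (1.0 - p) ** (j - i - 1)      # no change at the interior instants
    return stay if j == n else p * stay  # a change closes the block unless j = n


def partition_prior(rho, p):
    """Product of cohesions over the blocks of rho = [0, i_1, ..., n]."""
    n = rho[-1]
    prob = 1.0
    for i, j in zip(rho[:-1], rho[1:]):
        prob *= yao_cohesion(i, j, n, p)
    return prob
```

For a partition with $b$ blocks the product collapses to $p^{b-1}(1-p)^{n-b}$, so the prior probability depends on the partition only through its number of blocks.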
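Let $\theta_1, \ldots, \theta_n$ be a sequence of unknown parameters conditional on which the random variables $X_1, \ldots, X_n$ have conditional marginal densities $f_1(X_1 \mid \theta_1), \ldots, f_n(X_n \mid \theta_n)$, respectively. The prior distribution of $\theta_1, \ldots, \theta_n$ is constructed as follows. Given a partition $\rho = \{i_0, \ldots, i_b\}$, for $b \in I$, one has that $\theta_l = \theta_{[i_{r-1} i_r]}$ for every $i_{r-1} < l \le i_r$, $r = 1, \ldots, b$, and that $\theta_{[i_0 i_1]}, \ldots, \theta_{[i_{b-1} i_b]}$ are independent and also independent from $p$, with $\theta_{[ij]}$ having prior (block) density $\pi_{[ij]}(\theta)$, $\theta \in \Theta_{[ij]}$, where $\Theta_{[ij]}$ is the parameter space corresponding to the common parameter, say, $\theta_{[ij]} = \theta_{i+1} = \cdots = \theta_j$, that indexes the conditional density of $X_{[ij]}$. Hence, the PPM for Yao's prior cohesions is defined as follows.

i) Given $p$, the prior distribution of $\rho$ is the product distribution

$$P(\rho = \{i_0, \ldots, i_b\} \mid p) = \prod_{r=1}^{b} c_{[i_{r-1} i_r]} = p^{b-1}(1-p)^{n-b},$$

for every $b \in I$ and all $0 = i_0 < i_1 < \cdots < i_b = n$.

ii) Given $\rho = \{i_0, \ldots, i_b\}$ and $p$, the sequence $X_1, \ldots, X_n$ is independent from $p$ and has the joint density given by:

$$f(X_1, \ldots, X_n \mid \rho) = \prod_{r=1}^{b} f_{[i_{r-1} i_r]}(X_{[i_{r-1} i_r]}),$$

where $f_{[ij]}(X_{[ij]}) = \int_{\Theta_{[ij]}} f_{[ij]}(X_{[ij]} \mid \theta)\, \pi_{[ij]}(\theta)\, d\theta$ is the joint density of the random vector $X_{[ij]}$, called data factor.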
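It is also shown that the posterior distribution of $\theta_s$ is given by:

$$\pi(\theta_s \mid X_1, \ldots, X_n) = \sum_{i=0}^{s-1} \sum_{j=s}^{n} r^*_{[ij]}\, \pi_{[ij]}(\theta_s \mid X_{[ij]}), \qquad (2.1)$$

and that the posterior expectation (or product estimate) of $\theta_s$ is given by:

$$E(\theta_s \mid X_1, \ldots, X_n) = \sum_{i=0}^{s-1} \sum_{j=s}^{n} r^*_{[ij]}\, E(\theta_s \mid X_{[ij]}), \qquad (2.2)$$

for $s = 1, \ldots, n$, where $r^*_{[ij]} = P([ij] \in \rho \mid X_1, \ldots, X_n)$ denotes the posterior relevance for the block $[ij]$. More details can be found in Barry and Hartigan (1992).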
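Let us assume that $p$ has prior distribution $\pi(p)$. As a consequence of these assumptions, it follows that the posterior distributions of $\rho$ and $B$ are given, respectively, by:

$$P(\rho = \{i_0, \ldots, i_b\} \mid X_1, \ldots, X_n) \propto \left[ \prod_{r=1}^{b} f_{[i_{r-1} i_r]}(X_{[i_{r-1} i_r]}) \right] \int_0^1 p^{b-1}(1-p)^{n-b}\, \pi(p)\, dp$$

and

$$P(B = b \mid X_1, \ldots, X_n) \propto \sum_{\mathcal{C}_b} \left[ \prod_{r=1}^{b} f_{[i_{r-1} i_r]}(X_{[i_{r-1} i_r]}) \right] \int_0^1 p^{b-1}(1-p)^{n-b}\, \pi(p)\, dp,$$

where $\mathcal{C}_b$ denotes the set of all partitions of $I$ into $b$ contiguous blocks.
Because the product estimates are strongly influenced by the prior distribution of $p$, it is important to obtain its posterior distribution. Assuming that $p \sim \pi(p)$, such posterior distribution is given by:

$$\pi(p \mid X_1, \ldots, X_n) \propto \pi(p) \sum_{\rho} \left[ \prod_{r=1}^{b} f_{[i_{r-1} i_r]}(X_{[i_{r-1} i_r]}) \right] p^{b-1}(1-p)^{n-b},$$

in which the summation is over all partitions of $I$.
One measure of the evidence for a change point is the posterior probability of each instant being a change point. Let $\mathcal{C}_l$ be the set that contains all partitions that include the instant $l$ as a change point, that is, each partition in $\mathcal{C}_l$ assumes the form $\{i_0, \ldots, i_{k-1}, i_k = l, i_{k+1}, \ldots, i_b\}$ for some $k \in I$. The event $A_l$ denotes that the instant $l$ is a change point, for $l = 1, \ldots, n-1$. Thus, the posterior probability of $A_l$ is given by:

$$P(A_l \mid X_1, \ldots, X_n) = \frac{\sum_{\mathcal{C}_l} \left[ \prod_{r=1}^{b} f_{[i_{r-1} i_r]}(X_{[i_{r-1} i_r]}) \right] \int_0^1 p^{b-1}(1-p)^{n-b}\, \pi(p)\, dp}{\sum_{\rho} \left[ \prod_{r=1}^{b} f_{[i_{r-1} i_r]}(X_{[i_{r-1} i_r]}) \right] \int_0^1 p^{b-1}(1-p)^{n-b}\, \pi(p)\, dp},$$

where the summation in the denominator is over all partitions of $I$.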
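When partitions are drawn from the posterior by simulation, these quantities are commonly approximated by relative frequencies. The sketch below is one such approximation, assuming that each sampled partition is stored as a list `[0, i_1, ..., n]` of block end points; the function names are hypothetical.

```python
from collections import Counter


def change_point_probabilities(partitions, n):
    """Fraction of sampled partitions in which instant l = 1..n-1 is a change point."""
    counts = [0] * (n - 1)
    for rho in partitions:
        for l in rho[1:-1]:  # interior end points are the change points
            counts[l - 1] += 1
    return [c / len(partitions) for c in counts]


def block_count_distribution(partitions):
    """Empirical distribution of B, the number of blocks."""
    tally = Counter(len(rho) - 1 for rho in partitions)
    return {b: k / len(partitions) for b, k in sorted(tally.items())}


def most_probable_partition(partitions):
    """Most frequently sampled partition and its relative frequency."""
    rho, k = Counter(tuple(r) for r in partitions).most_common(1)[0]
    return list(rho), k / len(partitions)
```

The quality of these estimates depends on the sampled partitions being (approximately) draws from the posterior distribution of $\rho$.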

Computational methods
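To compute the posterior distributions of $\rho$, $B$ and $p$, the product estimates of $\theta_s$, and the posterior probability of $A_l$, the following algorithm was proposed in the literature. Assume that, given $\rho$, $\theta_l = \theta_{i+1} = \cdots = \theta_j$ for the block $[ij]$ that contains $l$, for $l = 1, \ldots, n$ and $i, j \in I \cup \{0\}$, $i < j$. Let $X_{[0n]} = (X_1, \ldots, X_n)$ and $\theta = (\theta_1, \ldots, \theta_n)$, and denote by $\theta_{-l}$ the vector $(\theta_1, \ldots, \theta_{l-1}, \theta_{l+1}, \ldots, \theta_n)$. The full conditional distributions of $p$, $\rho$, and $\theta_l$, for $l = 1, \ldots, n$, are given, respectively, by:

$$\pi(p \mid \rho, \theta, X_{[0n]}) \propto p^{b-1}(1-p)^{n-b}\, \pi(p),$$

$$P(\rho = \{i_0, \ldots, i_b\} \mid p, X_{[0n]}) \propto \left[ \prod_{r=1}^{b} f_{[i_{r-1} i_r]}(X_{[i_{r-1} i_r]}) \right] p^{b-1}(1-p)^{n-b},$$

$$\pi(\theta_l \mid \rho, p, \theta_{-l}, X_{[0n]}) = \pi_{[ij]}(\theta_l \mid X_{[ij]}), \quad i < l \le j.$$

Notice that, because all partitions must be considered, it may be very difficult to sample directly from the full conditional distribution of $\rho$ when long sequences are considered. Let us define the auxiliary random quantity $U_l$, such that $U_l = 1$ if $\theta_l = \theta_{l+1}$, and $U_l = 0$ otherwise, for $l = 1, \ldots, n-1$. Notice that the random partition $\rho$ is immediately identified by considering the vector $U = (U_1, \ldots, U_{n-1})$ of these random quantities. Each partition $(U_1^s, \ldots, U_{n-1}^s)$, $s \ge 1$, is generated by Gibbs sampling, considering the following ratio:

$$R_r = \frac{P(U_r = 1 \mid U_{-r}, X_{[0n]}, p)}{P(U_r = 0 \mid U_{-r}, X_{[0n]}, p)} = \frac{f_{[xy]}(X_{[xy]})\,(1-p)}{f_{[xr]}(X_{[xr]})\, f_{[ry]}(X_{[ry]})\, p}, \qquad (2.3)$$

for $r = 1, \ldots, n-1$, where $U_{-r}$ denotes the current values of the remaining components of $U$, $x$ is the last change point before $r$ and $y$ is the next change point following $r$ (see Loschi and Cruz (2005) for all the details).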
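A single sweep of such a scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the ratio compares the merged block $[xy]$ against the split blocks $[xr]$ and $[ry]$ with prior odds $(1-p)/p$, conditions on the current value of $p$, and takes a user-supplied function `log_data_factor(i, j)` (a hypothetical name) returning $\log f_{[ij]}(X_{[ij]})$.

```python
import math
import random


def gibbs_sweep(u, p, log_data_factor, rng=random):
    """One Gibbs sweep over U = (U_1, ..., U_{n-1}); u[r-1] holds U_r.

    U_r = 1 means no change at instant r, U_r = 0 means r is a change point.
    """
    n = len(u) + 1
    for r in range(1, n):
        # x: last change point before r (0 if none); y: next one after r (n if none)
        x = max((l for l in range(1, r) if u[l - 1] == 0), default=0)
        y = min((l for l in range(r + 1, n) if u[l - 1] == 0), default=n)
        # log of the ratio: merged block [xy] versus split blocks [xr] and [ry]
        log_ratio = (log_data_factor(x, y) + math.log1p(-p)
                     - log_data_factor(x, r) - log_data_factor(r, y) - math.log(p))
        # P(U_r = 1) = R / (1 + R), evaluated without overflow
        if log_ratio > 0:
            prob_one = 1.0 / (1.0 + math.exp(-log_ratio))
        else:
            e = math.exp(log_ratio)
            prob_one = e / (1.0 + e)
        u[r - 1] = 1 if rng.random() < prob_one else 0
    return u
```

Working on the log scale avoids underflow when data factors of long blocks are multiplied. In a full sampler, each sweep would be followed by a draw of $p$ from its full conditional distribution.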

The Product Partition Model for the Regular Exponential Family
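In this section we extend the PPM to identify multiple change points in the parameter that indexes the exponential family. In order to permit a tractable implementation of the PPM, some results related to the conjugacy for the regular exponential family established in the literature are considered (see Bernardo and Smith, 1994, for example). Let $X_1, \ldots, X_n$ be such that, conditional on $\theta_1, \ldots, \theta_n$, they are independent and $X_l \mid \theta_l$ has density in the regular exponential family,

$$f(X_l \mid \theta_l) = a(X_l)\, g(\theta_l) \exp\left\{ \sum_{m=1}^{k} c_m\, \phi_m(\theta_l)\, h_m(X_l) \right\}.$$

It follows that the common parameter $\theta_{[ij]}$ for the block $X_{[ij]}$ has the following conjugate prior distribution:

$$\pi_{[ij]}(\theta \mid \tau_{[ij]}) = [K(\tau_{[ij]})]^{-1}\, [g(\theta)]^{\tau_0^{[ij]}} \exp\left\{ \sum_{m=1}^{k} c_m\, \phi_m(\theta)\, \tau_m^{[ij]} \right\}, \qquad (3.1)$$

where $\tau_{[ij]} = (\tau_0^{[ij]}, \tau_1^{[ij]}, \ldots, \tau_k^{[ij]})$ and $K(\tau_{[ij]})$ is the normalizing constant. As a consequence of such assumptions, it follows that the prior predictive distribution of $X_{[ij]}$ is

$$f_{[ij]}(X_{[ij]}) = \left[ \prod_{l=i+1}^{j} a(X_l) \right] \frac{K(\tau_{[ij]} + t(X_{[ij]}))}{K(\tau_{[ij]})}, \qquad (3.2)$$

with $t(X_{[ij]}) = \left( j - i, \sum_{l=i+1}^{j} h_1(X_l), \ldots, \sum_{l=i+1}^{j} h_k(X_l) \right)$, and the block posterior distribution of $\theta_{[ij]}$, given $X_{[ij]}$, is

$$\pi_{[ij]}(\theta \mid X_{[ij]}) = \pi_{[ij]}(\theta \mid \tau_{[ij]} + t(X_{[ij]})). \qquad (3.3)$$

Consequently, from expressions (3.3) and (2.1), the posterior distribution of $\theta_s$, $s = 1, \ldots, n$, assumes the following expression:

$$\pi(\theta_s \mid X_1, \ldots, X_n) = \sum_{i=0}^{s-1} \sum_{j=s}^{n} r^*_{[ij]}\, \pi_{[ij]}(\theta_s \mid \tau_{[ij]} + t(X_{[ij]})).$$

However, we should notice that a general formula for the product estimates can be specified only for the natural parameter. In such a case, we should consider some results presented in Diaconis and Ylvisaker (1979). The product estimates for special members of the exponential family are given in Sections 3.1 to 3.6.
From (2.3) and (3.2), the estimates of the posterior distributions of $\rho$, $B$, $A_l$ and $p$, as well as the estimates of the posterior relevancies, can be generated by considering the following expression:

$$R_r = \frac{K(\tau_{[xy]} + t(X_{[xy]}))\, K(\tau_{[xr]})\, K(\tau_{[ry]})}{K(\tau_{[xy]})\, K(\tau_{[xr]} + t(X_{[xr]}))\, K(\tau_{[ry]} + t(X_{[ry]}))} \cdot \frac{1-p}{p}.$$

Notice that, for the exponential family, the sampling scheme is simplified since it is not necessary to know the complete expression of the predictive distribution: the factors $\prod a(X_l)$ cancel in the ratio.
Since the exponential family provides a rich set of models, the PPM will be applied to some particular members of this family, as follows.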

Bernoulli data sequence
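Let $\theta_l \in (0, 1)$, $l = 1, \ldots, n$. Assume that, conditional on $\theta_1, \ldots, \theta_n$, the random variables $X_1, \ldots, X_n$ are independent and $X_l \mid \theta_l$ has Bernoulli distribution with parameter $\theta_l$. Let $\tau_{[ij]} = (\tau_0^{[ij]}, \tau_1^{[ij]})$. From (3.1) it follows that the conjugate prior distribution for the common parameter $\theta_{[ij]}$ is the beta distribution with parameters $(\tau_1^{[ij]} + 1, \tau_0^{[ij]} - \tau_1^{[ij]} + 1)$, whose density function is:

$$\pi_{[ij]}(\theta) = \frac{\theta^{\tau_1^{[ij]}} (1-\theta)^{\tau_0^{[ij]} - \tau_1^{[ij]}}}{B(\tau_1^{[ij]} + 1, \tau_0^{[ij]} - \tau_1^{[ij]} + 1)}, \quad 0 < \theta < 1,$$

where $\tau_1^{[ij]} > -1$ and $\tau_0^{[ij]} - \tau_1^{[ij]} > -1$. The block posterior distribution is the beta distribution with parameters $\tau_1^{[ij]} + \sum_{l=i+1}^{j} X_l + 1$ and $\tau_0^{[ij]} + j - i - \tau_1^{[ij]} - \sum_{l=i+1}^{j} X_l + 1$, and the data factor is

$$f_{[ij]}(X_{[ij]}) = \frac{B\left(\tau_1^{[ij]} + \sum_{l=i+1}^{j} X_l + 1,\; \tau_0^{[ij]} + j - i - \tau_1^{[ij]} - \sum_{l=i+1}^{j} X_l + 1\right)}{B(\tau_1^{[ij]} + 1, \tau_0^{[ij]} - \tau_1^{[ij]} + 1)}. \qquad (3.5)$$

Thus, from (2.2) the product estimates of $\theta_s$, $s = 1, \ldots, n$, are given by:

$$E(\theta_s \mid X_1, \ldots, X_n) = \sum_{i=0}^{s-1} \sum_{j=s}^{n} r^*_{[ij]}\, \frac{\tau_1^{[ij]} + \sum_{l=i+1}^{j} X_l + 1}{\tau_0^{[ij]} + j - i + 2}.$$

The posterior relevancies and the posterior distributions of $\rho$, $p$, $B$ and $A_l$ can be generated by substituting (3.5) into (2.3).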
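For Bernoulli blocks with a beta prior, the data factor is a ratio of beta functions, which is convenient to evaluate on the log scale. The sketch below assumes a generic Beta(`a`, `b`) prior on the block parameter; the names `a`, `b` and the function names are hypothetical and not taken from the paper.

```python
import math


def log_beta(a, b):
    """log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bernoulli_log_data_factor(block, a=1.0, b=1.0):
    """log marginal density of a 0/1 block under a Beta(a, b) prior."""
    s, m = sum(block), len(block)
    return log_beta(a + s, b + m - s) - log_beta(a, b)


def bernoulli_block_posterior_mean(block, a=1.0, b=1.0):
    """Posterior mean of the block parameter, i.e. the mean of Beta(a + s, b + m - s)."""
    return (a + sum(block)) / (a + b + len(block))
```

With observations stored in a list `x`, the block $[ij] = \{i+1, \ldots, j\}$ corresponds to the slice `x[i:j]`.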
Let $\tau_{[ij]} = (\tau_0^{[ij]}, \tau_1^{[ij]})$. From (3.1) it follows that the conjugate prior distribution for the common parameter $\theta_{[ij]}$, related to the block $[ij]$, is a gamma distribution whose parameters are determined by $\tau_{[ij]}$, and the corresponding data factor is given by expression (3.8). Because the block posterior distribution is again a gamma distribution, the product estimates of $\theta_s$, $s = 1, \ldots, n$, follow from (2.2) as the relevance-weighted sums of the block posterior means. The posterior relevancies and the posterior distributions of $\rho$, $p$, $B$ and $A_l$ can be generated by substituting (3.8) into (2.3).

Normal data sequence with unknown means
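Let $\theta_l \in \mathbb{R}$, $l = 1, \ldots, n$. Assume that, conditional on $\theta_1, \ldots, \theta_n$, the random variables $X_1, \ldots, X_n$, $X_l \in \mathbb{R}$, $l = 1, \ldots, n$, are independent and $X_l \mid \theta_l$ has normal distribution with unknown mean $\theta_l$ and the same known variance $\sigma^2$, denoted by $X_l \mid \theta_l \sim N(\theta_l, \sigma^2)$. Consequently, for the block $X_{[ij]}$ we have that

$$f(X_{[ij]} \mid \theta_{[ij]}) = (2\pi\sigma^2)^{-(j-i)/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{l=i+1}^{j} X_l^2 \right\} \exp\left\{ -\frac{(j-i)\,\theta_{[ij]}^2}{2\sigma^2} \right\} \exp\left\{ \frac{\theta_{[ij]}}{\sigma^2} \sum_{l=i+1}^{j} X_l \right\}. \qquad (3.9)$$

Let $\tau_{[ij]} = (\tau_0^{[ij]}, \tau_1^{[ij]})$. From (3.1) it follows that the conjugate prior distribution for the common parameter $\theta_{[ij]}$ is the normal distribution with mean $\tau_1^{[ij]} / \tau_0^{[ij]}$ and variance $\sigma^2 / \tau_0^{[ij]}$, whose density function is:

$$\pi_{[ij]}(\theta) = \left( \frac{\tau_0^{[ij]}}{2\pi\sigma^2} \right)^{1/2} \exp\left\{ -\frac{\tau_0^{[ij]}}{2\sigma^2} \left( \theta - \frac{\tau_1^{[ij]}}{\tau_0^{[ij]}} \right)^2 \right\},$$

where $\tau_0^{[ij]} > 0$ and $\tau_1^{[ij]} \in \mathbb{R}$. The data factor is

$$f_{[ij]}(X_{[ij]}) = (2\pi\sigma^2)^{-(j-i)/2} \left( \frac{\tau_0^{[ij]}}{\tau_0^{[ij]} + j - i} \right)^{1/2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{l=i+1}^{j} X_l^2 + \frac{(\tau_1^{[ij]})^2}{\tau_0^{[ij]}} - \frac{\left(\tau_1^{[ij]} + \sum_{l=i+1}^{j} X_l\right)^2}{\tau_0^{[ij]} + j - i} \right] \right\}. \qquad (3.11)$$

Since from (3.9) we have that $t(X_{[ij]}) = \left(j - i, \sum_{l=i+1}^{j} X_l\right)$, the block posterior distribution is normal with mean $\left(\tau_1^{[ij]} + \sum_{l=i+1}^{j} X_l\right) / (\tau_0^{[ij]} + j - i)$ and variance $\sigma^2 / (\tau_0^{[ij]} + j - i)$. Thus, from (2.2) the product estimates of $\theta_s$, $s = 1, \ldots, n$, are given by:

$$E(\theta_s \mid X_1, \ldots, X_n) = \sum_{i=0}^{s-1} \sum_{j=s}^{n} r^*_{[ij]}\, \frac{\tau_1^{[ij]} + \sum_{l=i+1}^{j} X_l}{\tau_0^{[ij]} + j - i}.$$

The posterior relevancies and the posterior distributions of $\rho$, $p$, $B$ and $A_l$ can be generated by substituting (3.11) into (2.3). For different approaches to this PPM, see Barry and Hartigan (1993) and Crowley (1997).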
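The data factor for normal blocks with known variance has a closed form. The sketch below evaluates its logarithm under the assumption that the block mean has a normal prior $N(m_0, \sigma^2 / k_0)$; the hyperparameter names `m0` and `k0` are hypothetical and introduced only for this example.

```python
import math


def normal_log_data_factor(block, sigma2, m0=0.0, k0=1.0):
    """log marginal density of a block under X ~ N(theta, sigma2), theta ~ N(m0, sigma2 / k0)."""
    m = len(block)
    s1 = sum(block)
    s2 = sum(v * v for v in block)
    quad = s2 + k0 * m0 ** 2 - (k0 * m0 + s1) ** 2 / (k0 + m)
    return (-0.5 * m * math.log(2.0 * math.pi * sigma2)
            + 0.5 * math.log(k0 / (k0 + m))
            - quad / (2.0 * sigma2))


def normal_block_posterior_mean(block, m0=0.0, k0=1.0):
    """Posterior mean of the block mean: a weighted average of m0 and the data."""
    return (k0 * m0 + sum(block)) / (k0 + len(block))
```

For a single observation the marginal reduces to a normal density with mean `m0` and variance `sigma2 * (1 + 1 / k0)`, which offers a quick sanity check.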

Applications
We shall now illustrate some of the results given in the previous sections by applying them to two real data sets. As an example for the discrete case, we consider the series of counts of violent crimes in a central neighborhood of Belo Horizonte, the capital city of the State of Minas Gerais, Brazil. Given the rate, we assume that the number of crimes is distributed according to a Poisson distribution. To illustrate the continuous case, we consider the return series of four Latin American emerging market indices and assume that, conditional on the expected return and on the volatility (measured here as variance), the returns are normally distributed with unknown mean and variance.
In order to estimate the posterior distributions, 10,000 samples of 0-1 values were generated with the dimension of the time series, starting from a sequence of zeros. Since convergence was reached before the 1,000th step (not shown), the initial 1,000 iterations were discarded as burn-in. In order to avoid correlation among vectors, a lag of 10 was selected. Further discussion of the number of iterations to be discarded, as well as the lag to be taken, can be found in the literature (Gamerman, 1997).
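The burn-in and lag amount to simple slicing of the stored chain. The sketch below assumes that the lag is applied after the burn-in is removed, which is the usual convention; the function name is hypothetical.

```python
def thin_chain(samples, burn_in=1000, lag=10):
    """Drop the first burn_in draws, then keep every lag-th draw."""
    return samples[burn_in::lag]
```

Under this convention, a chain of 10,000 vectors with a burn-in of 1,000 and a lag of 10 leaves 900 vectors for estimation.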

Change point analysis for criminality data
In this section we consider the series of counts of violent crimes in a central neighborhood (5th CIA) of Belo Horizonte, recorded monthly from January, 1998, to September, 2001. The time series is plotted in Figure 1 along with the product estimates for the rate of crimes. Our interest is to verify whether "Policing with Results", a program introduced by the State of Minas Gerais Military Police Command in the late 1990s, reduced the crime rate. We assume that, given the rate, the numbers of violent crimes within the same block are independent and distributed according to the Poisson distribution $P(\theta_{[ij]})$. We also adopt the natural conjugate prior distribution for the rate of crimes $\theta_{[ij]}$ which, in this case, is a gamma distribution. As we do not have precise information about the rate of violent crimes, we assume that $\theta_{[ij]} \sim G(2.01, 0.001)$, which is a weakly informative prior distribution with mean and standard deviation equal to $2.01 \times 10^3$ and $1.43 \times 10^3$ violent crimes per month, respectively.
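For Poisson counts with a gamma prior, the data factor is available in closed form. The sketch below assumes the shape-rate parametrization $G(a, b)$ with mean $a/b$, which is consistent with the prior mean of $2.01 \times 10^3$ stated above; the function names are hypothetical.

```python
import math


def poisson_log_data_factor(block, a=2.01, b=0.001):
    """log marginal probability of a block of counts under a Gamma(shape a, rate b) prior."""
    s, m = sum(block), len(block)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + m)
            - sum(math.lgamma(x + 1) for x in block))


def poisson_block_posterior_mean(block, a=2.01, b=0.001):
    """Posterior mean of the block rate, i.e. the mean of Gamma(a + s, b + m)."""
    return (a + sum(block)) / (b + len(block))
```

With such a small rate parameter the posterior mean of a block is very close to the block's sample mean, so the prior has little influence on the product estimates.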
Since a small number of changes is expected in the series, we also assume that the probability $p$ of having a change at any instant has a beta prior distribution with parameters $\alpha = 1.5$ and $\beta = 28.5$ (see details in Section 4.2). Consequently, we are considering in the prior evaluation that the expected number of change points in the series is 2.2 (equivalently, 3.2 blocks for the 45 monthly observations), with a standard deviation of 2.2.
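Prior summaries of this kind can be checked numerically. Under Yao's cohesions the number of change points $B - 1$, given $p$, is binomial with $n - 1$ trials, so with a beta prior on $p$ it is beta-binomial. The following sketch computes the implied prior mean and standard deviation of $B$; the function name is hypothetical.

```python
import math


def prior_blocks_mean_sd(n, alpha, beta):
    """Prior mean and sd of the number of blocks B when p ~ Beta(alpha, beta)."""
    trials = n - 1                     # instants 1..n-1 may be change points
    total = alpha + beta
    mean_changes = trials * alpha / total
    var_changes = (trials * alpha * beta * (total + trials)
                   / (total ** 2 * (total + 1.0)))
    return 1.0 + mean_changes, math.sqrt(var_changes)
```

For $n = 120$, $\alpha = 5$ and $\beta = 50$ this gives approximately 11.82 blocks with standard deviation 5.53, and for $n = 45$, $\alpha = 1.5$ and $\beta = 28.5$ it gives about 2.2 expected change points with standard deviation 2.2.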
From Figure 1 we can notice the increase in the rate of crimes from January, 1998, to December, 2000. The rate reaches its maximum in December, 2000 (approximately 68 crimes per month), and decreases in January and June, 2001. These reductions could be a positive effect of the "Policing with Results" program.
Figure 2 presents the posterior distribution of the probability $p$ of a change occurring at any instant. This distribution is slightly asymmetric and has only one mode. The posterior distribution of $p$ concentrates most of its mass in values close to 0.20.
Notice from Table 1 that, if a squared-error loss function is considered, the posterior estimate for the probability of a change at any instant is 19.71%, which is higher than the prior estimate, 5.0%. The posterior estimate for the number of change points ($B - 1$) is also higher than in the prior evaluation (posterior mean = 13.03 change points). The posterior variance of $B$ is small, which means that there is less uncertainty about $B$.
From Figures 3 and 4 we can see the posterior most probable partition and the posterior probability of each instant being a change point, respectively. The posterior most probable partition occurs with probability 40.78% and indicates that the following months are change points: June and October, 1998; March, July, and November, 1999; February, May, August, October, and November, 2000; and January, April, and July, 2001. Besides this, we can also observe that the following months have probability higher than 80% of being a change point: July and December, 1998; April, August, and December, 1999; March, June, September, and November, 2000; and March and June, 2001.

Change point analysis for Latin American emerging indices
In this section we compare the behavior of four important Latin American stock market indices by means of their return series: the MERVAL of Argentina (Índice de Mercado de Valores de Buenos Aires), the IBOVESPA of Brazil (Índice da Bolsa de Valores do Estado de São Paulo), the IPSA of Chile (Índice de Precios Selectivos de Acciones), and the IPyC of Mexico (Índice de Precios y Cotizaciones), from October 31, 1995, to October 31, 2000, recorded fortnightly. A return series is defined by using the transformation $R_t = (P_t - P_{t-1})/P_{t-1}$, where $P_t$ is the last price observed in the $t$th fortnight. In this period, three important financial crises involving emerging markets occurred: Mexico's crisis, in January, 1995, Asia's crisis, in August, 1997, and Russia's crisis, in July, 1998. As is well known, these events could produce changes in Latin American stock market behavior. It is noticeable from Figure 5 that the behavior of the returns of all these indices suggests the existence of some changes in the variance and in the expected return. In fact, Loschi et al. (2005) proposed a PPM to analyze the behavior of the volatility of these indices, concluding that they possess volatility clusters. Our purpose here is to extend Loschi et al.'s (2005) analysis to show that, within the period from October 31, 1995, to October 31, 2000, the MERVAL, IBOVESPA, IPSA, and IPyC series possess both expected return and volatility clusters, and also to provide the probability of each instant being a change point for each return series.
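The return transformation can be written directly from its definition. The sketch below assumes the prices are held in a plain list ordered in time; the function name is hypothetical.

```python
def simple_returns(prices):
    """R_t = (P_t - P_{t-1}) / P_{t-1} for consecutive closing prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices[:-1], prices[1:])]
```

A series of $n + 1$ prices yields $n$ returns, so the first fortnight's price serves only as a base value.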
We suppose that returns within the same block are conditionally independent and distributed according to the normal distribution $N(\mu_{[ij]}, \sigma^2_{[ij]})$. We also adopt the natural conjugate prior distribution for the parameter $(\mu_{[ij]}, \sigma^2_{[ij]})$ which, in this case, is the normal-inverted-gamma distribution, whose hyperparameters $a_{[ij]}$ and $b_{[ij]}$ assume the values given in Table 2. Consequently, we are assuming that the returns are distributed according to a Student-t PPM, which has heavier tails than the normal distribution and discloses a structure of correlation among the returns within the same block. Table 2 presents the descriptive statistics for the prior distributions of $\sigma^2_{[ij]}$ for each index. Notice from Table 2 that we are assuming that, in the prior evaluation, the IPSA has the smallest volatility (the prior mean and mode are $1.67 \times 10^{-4}$ and $1.00 \times 10^{-4}$, respectively). The volatilities of MERVAL, IBOVESPA and IPyC are considered the same (the prior mean and mode are $2.50 \times 10^{-4}$ and $1.25 \times 10^{-4}$, respectively). Observe also that there is less uncertainty about the volatility of IPSA since the variance of its volatility is the smallest.
Since a small number of changes is expected for MERVAL, IBOVESPA, IPSA, and IPyC, we also assume that the probability $p$ of having a change at any instant has a beta prior distribution with parameters $\alpha = 5$ and $\beta = 50$. This prior distribution has modal value equal to 0.076, mean equal to 0.091, standard deviation equal to $3.84 \times 10^{-2}$, and concentrates most of its mass in small values of $p$. This prior specification implies that the expected number of blocks in the prior evaluation for all indices is 11.82 and the standard deviation is 5.53 blocks. The modal value of the number of blocks is 10, which means that the most probable number of change points in the four indices is 9. That is, for us the four indices are equally susceptible to shocks and to the global political atmosphere.
Figures 6 and 7 present the product estimates of the fortnightly expected returns and volatilities, respectively, for the MERVAL, IBOVESPA, IPSA, and IPyC return series. We can notice that all indices present changes in both expected return and volatility. It is also noticeable that the product estimates for the volatility tend to increase through time and the product estimates for the expected return tend to decrease for all indices. Important changes can easily be identified. In general, comparing the four indices, we notice from Figure 8 that IPSA presents the smallest volatility and expected return. From the 1st fortnight of July, 1997, IBOVESPA presents the highest volatility, and the expected return for IBOVESPA is smaller than those obtained for MERVAL and IPyC. Before the 1st fortnight of July, 1997, the product estimates for the volatility of IBOVESPA and IPSA are very close. We can also notice that important changes in all indices occur almost simultaneously.
It is noticeable from Figure 9 and Table 3 that, for all indices, the posterior distributions for the number of blocks are asymmetric, have unique modes, and typically concentrate most of their mass in small values. The posterior estimates for the number of blocks in each index are much smaller than those considered in the prior evaluation (mean 11.82 and mode 10). That is, these posterior distributions disclose that all four indices are more stable than we considered in the prior evaluation. Moreover, we can perceive that MERVAL and ...
Table 5 presents the posterior most probable partition for the MERVAL, IBOVESPA, IPSA, and IPyC series. We see that the posterior most probable partitions for MERVAL, IPSA, and IPyC indicate that there are no changes in these indices, with probability 65.89%, 26.78% and 99.56%, respectively. The posterior most probable partition for IBOVESPA, {0, 40, 115, 120}, indicates that this index experiences changes in its behavior at observations 40 (2nd fortnight, June, 1997) and 115 (1st fortnight, June, 2000). Notice that this partition occurs with probability 13.11% only.
Figure 10 presents the probability of each instant being a change point. We notice that for MERVAL, IPSA, and IPyC the probability of each fortnight being a change point is less than 20.89%. For IBOVESPA we observe that the 1st fortnight of July, 1997, is a change point with probability 35.56%, the 2nd fortnight of August, 2000, has probability 59.67% of being a change point, and the other instants have probability no greater than 21.34% of being a change point.
Figure 11 and Table 6 present the posterior distributions of $p$ for each index and their descriptive statistics, respectively. Notice that all these distributions are asymmetric, have unique modes, and typically concentrate most of their mass in small values. For all indices, the posterior estimates for the probability of a change at any instant are smaller than the value assumed in the prior evaluation (prior mean = 0.0909, for all indices), and the estimate is different for each index. We also notice that the probability $p$ of a change at any instant is highest for IBOVESPA (posterior mean = 0.0439) and smallest for IPyC (posterior mean = 0.0287).
Notice that the change points identified by the PPM in the MERVAL, IPSA, and IPyC indices are close to an important international event, Asia's crisis in August, 1997.

Summary and Conclusions
The product partition model (PPM) was defined for Yao's cohesions and applied to the identification of multiple change points in the parameter that indexes the exponential family. Because the exponential family provides a rich set of models, the PPM was also defined for some particular members of this family.
In order to illustrate the use of the PPM, a series of counts of violent crimes in Belo Horizonte, Brazil, and some Latin American emerging market indices (MERVAL-Argentina, IBOVESPA-Brazil, IPSA-Chile, and IPyC-Mexico) were analyzed. We concluded that the rate of violent crimes in a particular neighborhood of Belo Horizonte is high and presents a high number of change points. We also concluded that all indices possess clusters in volatility and expected return and that the changes are almost simultaneous. We noticed that the changes experienced by MERVAL and IPyC are of smaller magnitude and that, as expected, IPSA felt the effect of the crisis later.
The results indicate that the PPM is effective, as it may provide useful inferences. In particular, for the data sets analyzed here, it could be observed that the posterior probability of each instant being a change point provides an efficient tool for decision making.
Despite the good performance of the PPM in the data analyzed here, some other Bayes estimates could be considered when the interest is not in a retrospective analysis as considered here; see, for instance, Quintana and Iglesias (2003).

Figure 1: Product estimates for the rate of crimes.

Figure 3: The posterior most probable partition.

Figure 4: Posterior probability of a change.

Figure 7: Product estimates for the volatility.

Figure 10: Posterior probability of a change point.

Table 1: Descriptive statistics for the posterior distributions of B and p.

Table 2: Parameters and descriptive statistics for the prior distribution of the volatility.

Table 3: Descriptive statistics for the posterior distribution of the number of blocks.

Table 5: The posterior most probable partitions and their prior and posterior probabilities.

Table 6: Descriptive statistics for the posterior distribution of p.