On a Stepwise Hypotheses Testing Procedure and Information Criterion in Identifying Dynamic Relations between Time Series

This paper studies an effective stepwise hypotheses testing procedure in identifying dynamic relations between time series, and its close connection with popular information criteria such as AIC and BIC. This procedure, labeled M2, extends Chen and Lee’s (1990) procedure to cover both the strong and weak form dynamic relations; and to be used with a guided choice of significance levels which are adapting in nature. Intuitively, procedure M2 can be viewed as a backward-elimination approach that simplifies the all-possible pairwise comparisons approach implied by information criterion. New insights concerning identification of strong and weak form dynamic relations using these approaches are given. Extensive simulation experiments are conducted to illustrate the performance of the IC and M2 approach in different settings. For applications, we study the dynamic relations between price level and interest rate in US and UK, and the robustness of the model identified is also addressed.


Introduction
Popular model selection criteria such as the Akaike information criterion (AIC) and Schwarz's Bayesian information criterion (BIC) are widely used and have been successfully implemented in many applications. In general, under the information criterion (IC) approach, the choice of an appropriate model depends on a tradeoff between goodness-of-fit and model complexity. Other commonly used criteria can be found, for example, in Billah, Hyndman and Koehler (2005). On the other hand, the stepwise hypotheses testing approach is also a popular model selection strategy in specific applications. For instance, in choosing the order of a vector autoregressive model, a sequential likelihood ratio testing procedure can be used to determine whether the next higher order AR term is needed (e.g., Reinsel (1993, chap. 4)). In phylogenetics, a popular model selection strategy is hierarchical likelihood ratio testing (see, for example, Posada and Buckley (2004) and the references therein). For choosing an appropriate dynamic relation between time series, a stepwise likelihood ratio testing procedure was developed in Chen and Lee (1990).
Information criteria and hypotheses testing are apparently different approaches to model selection, and it is important to study their connection. First, the basic connection between the penalty term in an information criterion and the significance level in hypotheses testing is well known (e.g., Smith and Spiegelhalter (1980) and Aitkin (1991)). Second, the IC approach can be viewed as a decision procedure that involves all-possible pairwise testing of the candidate models, with the critical values used in the tests determined by the penalty terms of the criteria (e.g., Pötscher (1991)). Stoica, Selén and Li (2004) discussed an implementation of generalized likelihood ratio tests that is equivalent to the IC rule and applied it to sparse models.
In this paper, we study an extended Chen and Lee's (1990) procedure and investigate its interesting and close connection to popular information criteria in the context of specifying an appropriate dynamic relation between time series.
In many business and economics applications, it is crucial to identify the presence or absence of specific dynamic relations between the time series of interest. For instance, does a full feedback relation exist between the series, or is the relationship solely uni-directional, or are the series even independent? In addition, is there concurrent association between the series or not? Following the concept of Granger causality, such relations can be represented by imposing specific restrictions on the parameters of a vector ARMA model; see, for example, Chen and Lee (1990) and the references therein. They proposed an effective sequential inference procedure based on a series of likelihood ratio tests on various model structures to identify the best structure. It was used to study the relationship between price and interest rate and to shed light on Gibson's paradox. Chen and Wu (1999) extended the procedure to more than two series, and studied the dynamic relations among corporate dividends, earnings and prices.
A natural question is how this effective procedure compares to classic model selection criteria such as AIC and BIC. We begin in Section 2 by extending Chen and Lee's (1990) procedure to handle dynamic relations in both the weak and strong forms. The strong form excludes concurrent association between the series while the weak form does not. See, for example, Geweke (1986), who emphasized the importance of distinguishing the presence of concurrent relations in economics. The extended procedure, labeled M2, begins by determining an appropriate form (weak or strong) for the dynamic relations, and then seeks to move up a hierarchy of models with increasingly simplified structure. The model selected is the simplest one the procedure can move up to without significantly reducing the likelihood. It is important to point out that the choice of significance level for the various tests is critical to the success of the procedure. To assess its effect, we perform an extensive simulation study. We experiment with a wide range of significance levels and find that a better choice in our setting is around 2.5%, which lies within the usual range of 1 to 5%. Furthermore, when M2 is conducted at the 2.5% level, the procedure gives an encouraging overall empirical probability of about 82% of correctly identifying the different types of dynamic relations under consideration. The rationale behind this preferred level, and an even better choice of using adapting levels, will be addressed in Section 4.
Section 3 turns to the performance of information criteria in identifying both strong and weak form dynamic relations. To accommodate strong form relations, an effective number of parameters is used in the IC rule. Simulation results show that this adjustment works well. To the best of our knowledge, this adjustment has not been studied in the literature. A variety of penalty weights, including AIC, BIC, and lighter and heavier weights relative to BIC, are investigated in the simulation. Results in Section 3.1 show that BIC serves well overall in selecting the right dynamic relation. Its overall empirical probability of identifying the model correctly is about 84%, which is only about 2 percentage points higher than that of M2. Moreover, we note that the performance of M2 conducted at the 12.5 to 15% significance level is close to that of AIC. This is striking, as M2 and IC are apparently different approaches, yet the results suggest that a close connection between them exists.
It is well known that when cast in the testing framework, the IC approach can be viewed as a decision procedure that involves all-possible pairwise comparisons of the candidate models (see Smith and Spiegelhalter (1980), Aitkin (1991), Pötscher (1991) and Stoica, Selén and Li (2004) for further discussion). In this light, procedure M2, when used with adapting significance levels, can be viewed as a backward-elimination approach that simplifies the all-possible search by taking the model structure into consideration. A notable feature of procedure M2 is that the model under the null hypothesis is always nested within the alternative, and the number of restrictions is usually small due to the stepwise structure. This feature is appealing because the null and alternative hypotheses are not necessarily nested under the all-possible comparisons approach; see Vuong (1989) for likelihood ratio tests for non-nested hypotheses. Here, the notion that a guided choice of significance levels can be made according to the penalty term leads to a further improvement of M2. Simulation studies in Section 4.1 confirm that the overall probability of correct identification using M2 with adapting significance levels improves to about 84%, which is essentially the same as that of BIC.
For empirical applications, we revisit the price level and interest rate series in Chen and Lee (1990). We strengthen their result by further distinguishing between the strong and weak form relations in the dynamic structure. Moreover, we illustrate how the robustness of the selected dynamic relation can be effectively assessed through the use of a variety of adapting significance levels and penalty weights.
The rest of the paper is organized as follows. Section 2 studies procedure M2 in choosing dynamic relations. Section 3 studies information criteria in choosing dynamic relations, including both the strong and weak forms. Section 4 studies their connection and improves M2 by using adapting significance levels. Section 5 presents empirical applications and concludes the paper.

Choosing Dynamic Relations with a Stepwise Hypotheses Testing Procedure
Consider a stationary and invertible bivariate vector ARMA (VARMA) model $\Phi(B)\,x_t = \Theta(B)\,\varepsilon_t$, with $\Phi(B) = [\phi_{ij}(B)]$ and $\Theta(B) = [\theta_{ij}(B)]$, $i, j = 1, 2$, where $\varepsilon_t = (a_t, b_t)'$ is independently and identically distributed Gaussian noise with mean zero and covariance matrix $\Sigma$ with elements $\sigma_{ij}$, $i, j = 1, 2$. The $\phi_{ij}(B)$ and $\theta_{ij}(B)$ are the usual AR and MA components and $B$ is the back-shift operator. In many applications, it is important to determine an appropriate dynamic relation between the time series given the observed data. For instance, is the relationship uni-directional or in feedback form, and are the errors contemporaneously related or not?
These various dynamic relations lead to different restrictions on the model parameters. The parametric constraints and corresponding hypotheses, which are sufficient (but not necessary) for the various dynamic relations to hold, are summarized in Table 1. In the table, a strong form relation has the same structural parameterization as the weak form, and further requires the error covariance $\sigma_{ij}$ ($i \neq j$) to be zero. In other words, a strong form relation excludes contemporaneous association between the series (see, for example, Geweke (1986)).
Chen and Lee (1990) developed a stepwise likelihood ratio testing procedure to identify the best dynamic specification. Their method with a minor modification, labeled M1, is described in the Appendix using our notation. In essence, it is a backward-elimination approach that seeks to move up a hierarchy of models with increasingly simplified structures. The model selected is the simplest one the procedure can move up to. The specifications they considered, however, do not include strong form relations. Below, we extend their procedure to cover this form.
Among the eight possible structures under the hypotheses we consider, it is clear that not all hypotheses between the most restrictive $H_\wedge$ and the least restrictive $H_F$ are nested. However, there are subsequences, such as {$H_F$, ($H_U$ or $H_L$), $H_B$} or its strong form counterparts, which are nested. A notable feature of our stepwise testing procedure is that the null hypothesis is always nested within the alternative.
To facilitate the design of the procedure, observe that if a weak form relation is estimated under the more restrictive strong form specification, the resulting log determinant of the error covariance matrix ($\ln|\Sigma|$) will be substantially affected. On the other hand, $\ln|\Sigma|$ will not be much affected if a strong form relation is estimated as a weak form. Our limited experience from simulation studies suggests that this is the key to distinguishing between the strong and weak forms. Our extension of Chen and Lee's (1990) procedure begins by choosing a form (strong or weak) through testing three pairs of hypotheses, ($H_{L0}$, $H_L$), ($H_{U0}$, $H_U$) and ($H_{F0}$, $H_F$). If none of these three strong/weak form pairs shows a significant difference, the procedure searches only among strong form relations; otherwise it searches among weak form relations.
To simplify presentation, we use the notation ($H_j$, $H_i$) to denote testing the null hypothesis $H_j$ against the less restrictive alternative $H_i$ using the usual likelihood ratio test. The determinant of the error covariance matrix of the VARMA model obtained under the parametric constraints imposed by hypothesis $H_k$ is denoted by $|\Sigma_k|$. (For simplicity of notation, this is understood to be the sample estimate.) The result of the likelihood ratio test is represented by an 'x' or an 'o', as judged by comparing $n \ln(|\Sigma_j|/|\Sigma_i|)$ to the chi-square critical value $\chi^2_{\#_r}(\alpha)$. The degrees of freedom $\#_r$ equal the number of parametric restrictions between the null and alternative hypotheses, $\alpha$ is the pre-specified significance level, and $n$ is the effective number of observations. An 'x' indicates that $H_j$ is rejected when compared to $H_i$ at the $\alpha$ level; an 'o' is used if $H_j$ is not rejected. In other words, an 'x' indicates that $\ln|\Sigma_j|$ is significantly different from $\ln|\Sigma_i|$; an 'o' indicates that it is not. Similar notation is used for testing more than one pair of hypotheses. For example, in testing ($H_j$, $H_i$) and ($H_l$, $H_k$), the result, say, (x,o), indicates that $H_j$ is rejected in the first test and $H_l$ is not rejected in the second.
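To make the marking rule concrete, consider a purely hypothetical pair of fitted models with $n = 99$, $\#_r = 1$ and $\ln|\Sigma_j| - \ln|\Sigma_i| = 0.045$. These numbers are illustrative assumptions and are not taken from the simulations or data in this paper. The test statistic and the standard chi-square critical values with one degree of freedom compare as

$$
n \ln\frac{|\Sigma_j|}{|\Sigma_i|} = 99 \times 0.045 = 4.455, \qquad \chi^2_{1}(0.05) = 3.841 \;<\; 4.455 \;<\; 5.024 = \chi^2_{1}(0.025).
$$

Under these assumed values the pair would be marked 'x' at the 5% level and 'o' at the 2.5% level, which shows how the choice of $\alpha$ alone can change the outcome of a single step.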
(Weak form) Stage one: Use Chen and Lee's (1990) procedure as described in the Appendix.
It is clear that this is not the only way to design the stepwise hypotheses testing procedure. We have also tried some alternative designs and found procedure M2 to be simple and to perform well. Moreover, the approach is intuitively appealing in the sense that the partially nested structure is exploited in the testing procedure. The other designs we tried are reported in an earlier version of the paper and are omitted here for brevity. It is important to point out that the significance level chosen for the various tests plays an important role in the success of the procedure. In the simulation study below, we experiment with a wide range of significance levels for exploratory purposes. The issue of choosing a better significance level will be addressed in subsequent sections.

Simulation study: Procedure M2 conducted at various fixed significance levels
Bivariate ARMA(1,1) models are simulated according to the eight types of dynamic relations we consider. The parameter values are chosen according to the estimation results for the price level and interest rate series studied in Chen and Lee (1990) and their related working paper, which are given as follows: For $H_U$, Φ = ) . For ) .
From each of these eight underlying models, we simulate 200 observations and discard the first 100; the remaining 100 observations form the simulated time series, to which M2 is applied to select the best specification. Specifically, we fit eight different VARMA(1,1) models (with intercept term) to the simulated time series, each according to the restrictions imposed by one of the eight dynamic relations under consideration, and record the corresponding sample $\ln|\Sigma|$. The best model is then identified by M2, conducted at a pre-specified significance level that is held fixed across all tests. In this experiment, we try a wide range of levels, namely 0.1, 0.5, 1, 2.5, 5, 10, 12.5 and 15%, for exploration and to locate a better choice. The whole experiment is replicated 5000 times. A simulated series is discarded if its sample $\ln|\Sigma|$ under the $H_F$ and $H_\wedge$ structures are not the smallest and largest values, respectively. The number of series discarded ranges from about 1% to 5% of the total number of series simulated from each of the eight underlying models.
For each underlying model $i$, denote the probability of selecting the correct specification by $P_{i|i}$ (that is, correctly selecting $H_i$ given that $H_i$ is true), and that of incorrectly selecting another specification $j$ by $P_{j|i}$. The first panel of Table 2 reports the empirical probability of correct selection $P_{i|i}$ for each $i$ using M2 at various significance levels. Detailed tabulations of $P_{j|i}$ for each underlying model are available on request. It is useful to have an overall performance measure to summarize the effectiveness of M2. Here, we use a geometric mean measure defined as $P_{all} = (\prod_{i=1}^{8} P_{i|i})^{1/8}$ to summarize the eight individual empirical probabilities of correct identification. Similarly, denote $P_W = (\prod_i P_{i|i})^{1/4}$, $i = F, U, L, B$, and $P_S = (\prod_i P_{i|i})^{1/4}$, $i = F0, U0, L0, \wedge$, as the overall performance measures for identifying the weak and strong form relations, respectively. These measures are reported in Table 3. The following observations can be made. Results in the first panel of Tables 2 and 3 show that a better choice of significance level is 2.5%, which gives an encouraging $P_{all} = 0.823$, or about an 82% chance of selecting correctly using M2. The individual $P_{i|i}$ range from about 70% to 95%. Among the other significance levels, the performance at the 1% level is fairly similar, with $P_{all} = 0.804$, but the performance starts to decline at levels outside the 1 to 2.5% range.
Two interesting questions then arise naturally. First, why is 2.5% a better choice of significance level in this setting? Second, how does procedure M2 compare to classic model selection criteria such as AIC and BIC? These issues are investigated in the next two sections.
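As a small illustration of the geometric mean summary described above, the following sketch computes the overall, weak form and strong form measures from per-model probabilities of correct selection. The function name and the eight input probabilities are hypothetical placeholders and are not the entries of Table 2.

```python
import math

def geometric_mean(probabilities):
    """Geometric mean of probabilities in (0, 1]."""
    if not probabilities:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in (0, 1]")
    # Average the logs and exponentiate, which equals the n-th root of the product.
    return math.exp(sum(math.log(p) for p in probabilities) / len(probabilities))

# Hypothetical P_{i|i} values (illustration only, not from the paper's tables).
weak = [0.80, 0.75, 0.78, 0.82]      # stands in for F, U, L, B
strong = [0.88, 0.90, 0.86, 0.95]    # stands in for F0, U0, L0 and the simplest model

p_w = geometric_mean(weak)
p_s = geometric_mean(strong)
p_all = geometric_mean(weak + strong)
```

Because the two groups have equal size, the overall measure is the geometric mean of the two group measures, so `p_all` equals `sqrt(p_w * p_s)`. The values quoted in the text are consistent with this identity: the square root of 0.777 times 0.871 is about 0.823.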

Choosing Dynamic Relations with Information Criteria
By treating models with different parametric constraints as different VARMA models, the issue of identifying an appropriate dynamic relation can be addressed by model selection criteria. The IC approach provides a tool for trading off the complexity of a model against its goodness-of-fit in the process of model selection. Specifically, let $IC_i = n \ln|\Sigma_i| + p_n \#_i$ (3.1) denote the information criterion for model $i$, where $\#_i$ is the number of estimated parameters, $n$ is the effective number of observations and $p_n$ is the penalty weight. AIC corresponds to a fixed penalty of $p_n = 2$, while BIC uses a heavier penalty of $p_n = \ln(n)$. These criteria are well studied: it is well known that BIC is consistent if the underlying true model is finite-dimensional, while AIC is motivated by a focus on prediction.
In general, many other choices of penalty term besides the popular AIC and BIC have been considered in the literature. Generally speaking, the choice of penalty term can be made according to the underlying assumptions as well as the specific objective of the modelling exercise. In our simulation studies below, we try a wide range of $p_n$ weights other than AIC and BIC to explore the effect of other types of information criteria. Our purpose is exploratory and we do not focus on any particular alternative choice.
Typically, the $\#_i$ in (3.1) refers to structural parameters in the model, and no zero restrictions are placed on the error covariance matrix. Therefore, to account for strong form relations, we need to adjust the standard IC rule in (3.1). In this paper, we use a simple adjustment to the number of parameters in (3.1), as described below. To the best of our knowledge, this adjustment has not been studied in the literature.
Consider a $k$-dimensional VARMA(1,1) model $x_t - \Phi x_{t-1} = \varepsilon_t - \Theta \varepsilon_{t-1}$, where $\varepsilon_t$ has unrestricted covariance matrix $\Sigma$. Decompose $\Sigma = LDL'$, where $L$ is a lower triangular matrix with unit diagonal and $D$ is a diagonal matrix. Transforming $x_t$ by $L^{-1}$, the original VARMA(1,1) model can be rewritten as $L^{-1}x_t - L^{-1}\Phi x_{t-1} = e_t - L^{-1}\Theta L\, e_{t-1}$, where $e_t = L^{-1}\varepsilon_t$ has covariance matrix $D$. In this form, the contemporaneous relation in the error term is explicitly expressed through structural parameters in the model. Note that by restricting the below-diagonal elements of $L^{-1}$ to be zero, the model becomes a VARMA(1,1) in strong form. Hence, we can regard the strong form version of a model as its weak form with restrictions, where the number of restrictions is equal to the number of $\sigma_{ij} = 0$, for $j > i$. In other words, when compared to the weak form, a strong form has $k(k-1)/2$ fewer parameters to estimate. The same discussion applies to higher order VARMA($p$, $q$) models. Based on this rationale, we modify the definition in (3.1) to accommodate both strong and weak form relations by using the effective number of parameters as follows: $IC_i = n \ln|\Sigma_i| + p_n(\#_i - \#_\Sigma)$ (3.2). For the strong form, $\#_\Sigma = k(k-1)/2$ is the number of restricted zeroes. For the weak form, $\#_\Sigma = 0$ as there are no restrictions.
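The adjustment above can be sketched in a few lines of code. This is a minimal illustration only: it assumes the criterion is scaled as $n \ln|\Sigma_i|$ plus the penalty weight times the effective number of parameters, and the function names, dictionary layout and fitted values are all hypothetical rather than taken from the paper.

```python
import math

def information_criterion(logdet_sigma, n_params, n_obs, penalty, strong_form, k=2):
    """Assumed form: n * ln|Sigma| + p_n * (structural params - covariance zeros)."""
    # A strong form model restricts k(k-1)/2 covariances to zero; a weak form none.
    n_sigma_zeros = k * (k - 1) // 2 if strong_form else 0
    return n_obs * logdet_sigma + penalty * (n_params - n_sigma_zeros)

def select_model(candidates, n_obs, penalty):
    """Return the name of the candidate with the smallest criterion value."""
    return min(
        candidates,
        key=lambda name: information_criterion(
            candidates[name]["logdet"], candidates[name]["n_params"],
            n_obs, penalty, candidates[name]["strong"]),
    )

# Hypothetical fitted values for two bivariate candidates (illustration only).
candidates = {
    "weak_full":   {"logdet": -2.00, "n_params": 8, "strong": False},
    "strong_full": {"logdet": -1.97, "n_params": 8, "strong": True},
}
n_obs = 99
chosen_aic = select_model(candidates, n_obs, 2.0)              # p_n = 2
chosen_bic = select_model(candidates, n_obs, math.log(n_obs))  # p_n = ln(n)
```

With these made-up numbers the strong form loses 2.97 in fit (99 times 0.03) but saves one effective parameter, so the lighter AIC weight keeps the weak form while the heavier BIC weight switches to the strong form.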

Simulation study: IC approach with a variety of penalty weights
For comparison on the same basis, we apply the IC approach to the same simulated series used in Section 2.1 to study the performance of M2. In addition to AIC ($p_n = 2$) and BIC ($p_n = \ln(n)$), we also use some lighter and heavier penalty weights for exploration and to gain insight. In particular, we experiment with adjusting the BIC penalty upward and downward as $\ln(n) \pm v$, where $v$ = 0 through 3 with an increment of 0.5. For 100 observations simulated from a VARMA(1,1) model, these weights cover a wide range of penalties beyond AIC and BIC. The empirical probabilities $P_{i|i}$ of correct model identification using IC with this range of penalty weights and the overall measures are summarized in the second panel of Tables 2 and 3, respectively. A number of interesting observations can be made.
The simulation results show that BIC, without adjustment, serves well in selecting the correct model from the eight underlying structures as a whole. This is not surprising, as our simulation is based on a finite-order VARMA model and BIC is known to be consistent in selecting the right model. We also note that the strong form adjustment (3.2) works well. In addition, the results show that the highest $P_{all}$ is about 84%, obtained using BIC or BIC with a slight downward penalty adjustment ($v = -0.5$). Interestingly, this $P_{all}$ is just about 2 percentage points higher than that of M2 (with $P_{all} = 0.823$). Moreover, we note that not only is the performance of M2 conducted at the 1 to 2.5% level close to that of BIC, but M2 at the 12.5 to 15% level is also close to AIC. Such a close connection is striking, as M2 and IC are apparently different approaches. In Section 4, we examine in detail why such correspondence occurs.
While BIC performs well as a whole, further insights emerge with regard to identifying strong and weak form relations. First, it is important to point out that each dynamic structure has its own preferred penalty weight. Consider the full model $H_F$, for instance. The second panel of Table 2 shows that the empirical probability of identifying this specification increases as $v$ becomes more negative. Intuitively, a larger downward adjustment means a lighter penalty on the number of parameters, and therefore favours selecting the full model. In fact, our simulations show that a lighter penalty weight (relative to BIC) is preferred by the other weak form relations as well. Specifically, the best adjustments in identifying $H_L$, $H_U$ and $H_B$ are all non-positive, with $v_L = -1.5$, $v_U = -1$ and $v_B = 0$, respectively. In sum, for all weak form relations collectively as a group, a lighter penalty weight of $\ln(n) - 1$ increases $P_W$ to 0.822 from the 0.794 obtained with BIC's $\ln(n)$ weight (see the second panel of Table 3).
On the other hand, for the strong form model $H_\wedge$, a large positive adjustment is preferred. Intuitively, a high penalty $\ln(n) + v$ imposed on the number of parameters favours the selection of the simplest model. In fact, the preferred adjustment for all strong form relations is non-negative, with $v_{L0} = 0.5$, $v_{U0} = 0$ and $v_{F0} = 0.5$, respectively. In sum, for all strong form relations collectively as a group, a heavier penalty weight of $\ln(n) + 0.5$ increases $P_S$ slightly to 0.889 from the 0.884 obtained with BIC's $\ln(n)$ weight. Even though the advantage over BIC is minimal, further evidence can be found in the testing framework.
That each underlying specification has its own preferred penalty adjustment is also seen in the testing framework when M2 is used, where it is manifested as a preferred choice of significance level. Results in the first panel of Table 3 show that a larger significance level is preferred in identifying weak form relations ($P_W = 0.782$ using 5% improves slightly over $P_W = 0.777$ using 2.5%), while a smaller level is preferred in identifying strong form relations ($P_S = 0.908$ using 0.5% improves over $P_S = 0.871$ using 2.5%). Intuitively, a larger/smaller significance level makes it easier/harder for the procedure to reject the more restrictive null hypothesis in favour of the less restrictive alternative. This is consistent with imposing a lighter/heavier penalty on the number of parameters. Further discussion is given in Section 4.
To summarize, our simulation results support BIC as a good overall criterion for identifying the eight dynamic relations of interest, and no major advantage is seen in using a lighter or heavier penalty. However, if prior information about the form (strong or weak) of the dynamic relation is available, then the penalty or significance level can be adjusted as described above to favour the identification of that type of relation, at the expense of the other type.
This section illustrates how the penalty weight $p_n$ and the number of parameters $\#_i$ in the standard information criterion for a VARMA model may be subject to adjustment. This occurs when prior information about the model type is available and when there are zero restrictions in $\Sigma$, respectively. Formal investigation of these adjustments is interesting future work.

The Connection between Information Criterion and Procedure M2
The connection between the penalty term $p_n$ in an information criterion and the significance level $\alpha$ in a likelihood ratio test is well known (e.g., Smith and Spiegelhalter (1980) and Aitkin (1991)). For convenience, its rationale is outlined as follows. Consider two VARMA models such that model 1 is nested within model 2. Using the minimum IC rule, if $IC_1 < IC_2$, or equivalently $n \ln(|\Sigma_1|/|\Sigma_2|) < p_n(\#_2 - \#_1)$, model 1 will be selected. On the other hand, we can use a likelihood ratio test to test model 1 as the null hypothesis against model 2 as the alternative. Model 1 is concluded if $n \ln(|\Sigma_1|/|\Sigma_2|) < \chi^2_{\#_r}(\alpha)$, where $\chi^2_{\#_r}(\alpha)$ is the chi-square critical value with degrees of freedom $\#_r = \#_2 - \#_1$ at the $\alpha$ level. Therefore, if $\alpha$ is chosen as a function of $\#_r$ and $p_n$ such that it solves $p_n \#_r = \chi^2_{\#_r}(\alpha)$, the likelihood ratio test reaches the same conclusion as the minimum IC rule.
This connection implies that the minimum IC approach can be viewed as a decision procedure that involves pairwise comparisons of all candidate models under examination. See Pötscher (1991) and Stoica, Selén and Li (2004) for details and for how the critical values in the testing framework are determined by the penalty terms of the criteria. For convenience, the key characteristic of this IC-induced all-possible pairwise testing procedure is summarized as follows. The idea is to find a specification $H_*$ which is 'close' to all other specifications $H_i$ with more parameters (i.e., $n \ln(|\Sigma_*|/|\Sigma_i|) < p_n(\#_i - \#_*)$); yet 'far away' from all other specifications $H_j$ with fewer parameters (i.e., $n \ln(|\Sigma_j|/|\Sigma_*|) > p_n(\#_* - \#_j)$); and is the 'best' (i.e., $n \ln(|\Sigma_k|/|\Sigma_*|) > 0$) within its own class ($\#_k = \#_*$). The 'closeness' (or significance) is judged by the critical value $p_n \#_r$, or by its implied significance level $\alpha(\#_r, p_n)$, which is not fixed. In this testing framework, the adapting significance level $\alpha(\#_r, p_n)$ plays the same critical role as the penalty term $p_n$ in the information criterion. Some of the likelihood ratio tests involved in the IC-induced testing procedure are not standard because the null and alternative hypotheses are not necessarily nested.
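The pairwise characterization above can be checked numerically against the minimum IC rule. The sketch below is an illustration under stated assumptions: the criterion is taken to be $n \ln|\Sigma|$ plus the penalty weight times the number of parameters, and the model labels, log determinants and parameter counts are hypothetical values chosen only for the example.

```python
import math

def ic_value(model, n_obs, penalty):
    """Assumed criterion: n * ln|Sigma| + p_n * (number of parameters)."""
    logdet, n_params = model
    return n_obs * logdet + penalty * n_params

def passes_pairwise_test(star, other, n_obs, penalty):
    """Check the 'close', 'far away' or 'best in class' condition for one pair."""
    logdet_star, k_star = star
    logdet_other, k_other = other
    if k_other > k_star:   # larger rival: the candidate must be 'close' to it
        return n_obs * (logdet_star - logdet_other) < penalty * (k_other - k_star)
    if k_other < k_star:   # smaller rival: the candidate must be 'far away' from it
        return n_obs * (logdet_other - logdet_star) > penalty * (k_star - k_other)
    return n_obs * (logdet_other - logdet_star) > 0   # same size: candidate fits better

def pairwise_choice(models, n_obs, penalty):
    """Names of all candidates that pass every pairwise comparison."""
    return [name for name, star in models.items()
            if all(passes_pairwise_test(star, other, n_obs, penalty)
                   for rival, other in models.items() if rival != name)]

# Hypothetical (ln|Sigma|, number of parameters) pairs, for illustration only.
models = {"F": (-2.00, 8), "U": (-1.98, 7), "L": (-1.90, 7), "B": (-1.85, 6)}
n_obs, penalty = 99, math.log(99)

by_tests = pairwise_choice(models, n_obs, penalty)
by_min_ic = min(models, key=lambda name: ic_value(models[name], n_obs, penalty))
```

Each of the three pairwise conditions is an algebraic rearrangement of one criterion value being smaller than another, so the candidate that survives every comparison coincides with the minimizer of the criterion. In this made-up example both routes pick the model labeled `U`.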
In accordance with the above interpretation, procedure M2 can be viewed as a backward-elimination procedure that simplifies the all-possible comparisons procedure implied by the minimum IC approach. In other words, procedure M2 captures the essence of the IC-induced procedure by seeking the 'closest' simplified model (without significantly reducing the likelihood) in a sequential manner, starting from the full feedback model. Using the same idea, one can also design a forward-addition approach similar to M2 that begins the search with the simplest model $H_\wedge$. The performance of M2 when used with adapting significance levels is expected to be similar to that of the IC approach. This is illustrated by simulations next.

Simulation study: Procedure M2 with adapting significance levels
We apply the testing procedure M2 with adapting significance levels to the simulated series in Sections 2.1 and 3.1, and the results are compared to those obtained using fixed levels as well as the IC approach.
In our simulation set-up, it is easy to check that the adapting significance levels corresponding to AIC are $\alpha_A$ = (0.1573, 0.1353, 0.1116, 0.0916) for 1 to 4 restrictions, respectively, and the adapting levels corresponding to BIC are $\alpha_B$ = (0.0321, 0.0101, 0.0032, 0.0010) when the effective sample size is 99. Note that for tests involving one or two parameter differences, BIC corresponds to significance levels of about 3% and 1%, and AIC to about 16% and 14%. Recall that many of the hypothesis tests in M2 involve only 1 or 2 parameter differences. Hence, if a fixed significance level were to be used in all tests, a level of 1 to 2.5% would generally perform like BIC, and a level of 12.5% to 15% like AIC. This provides a justification for the findings in Sections 2.1 and 3.1. Obviously, the choice of a better fixed significance level depends on the sample size as well as on the most common $\#_r$ used, and may therefore change across applications.
Just as we used a variety of penalty weights $p_n$ for exploration in Section 3.1, we also try a variety of adapting significance levels here. We explore two proportionally larger levels (2 and 4 times $\alpha_B$), as well as two proportionally smaller levels (1/2 and 1/4 times $\alpha_B$), to gain insight and assess the sensitivity of the model selected. The probability of correct model selection under each specification, $P_{i|i}$, and the overall measures are summarized in the third panel of Tables 2 and 3, respectively. In comparison to the results of the IC approach, M2 with adapting significance level $\alpha_B$ clearly performs similarly to BIC, and $\alpha_A$ to AIC. Their probabilities of correct selection over the eight underlying models follow very similar patterns. The highest $P_{all} = 0.839$, obtained using $\alpha_B$, is essentially the same as the 0.838 obtained using BIC.
For individual specifications, the remark made in Section 3.1 that each specification has its own preferred penalty adjustment is again borne out here. In identifying the full feedback model $H_F$, AIC or $\alpha_A$ is preferred. AIC imposes a lighter penalty on the number of parameters, and therefore tends to favour models with more parameters. Similarly, using a larger significance level $\alpha_A$ (relative to $\alpha_B$) makes it easier to reject the null hypothesis and conclude the less restrictive alternative. Intuitively, this means that it is harder for the procedure to move up the hierarchy of models with simplified structures, thus favouring the selection of the full model. On the contrary, a heavier penalty term or a smaller significance level favours the identification of the simplest model $H_\wedge$.
In between the two extremes $H_\wedge$ and $H_F$, broadly speaking, weak form relations prefer a relatively large significance level or light penalty term, and the reverse is true for strong form relations. Consider $H_L$, for example. The use of $\alpha_B$ or BIC correctly identifies the true model about 78% of the time, but this increases to about 85% when 4 times $\alpha_B$ is used. For weak form relations collectively as a group, a relatively large significance level is preferable ($P_W = 0.835$ using 2 times $\alpha_B$ improves over $P_W = 0.801$ using $\alpha_B$), while the reverse is true for strong form relations ($P_S = 0.893$ using 0.5 times $\alpha_B$ improves over $P_S = 0.879$ using $\alpha_B$). Recall that procedure M2 begins with selecting a form by testing three strong and weak form pairs. Hence, if the true structure is in strong (weak) form, it is desirable to use a smaller (larger) significance level or a heavier (lighter) penalty weight, as found in Section 3.1, because this makes the strong form null hypothesis harder (easier) to reject.
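The adapting levels quoted above can be reproduced by evaluating the upper tail of the chi-square distribution at the critical value $p_n \#_r$. The following sketch uses only the standard library and a closed-form tail probability that is valid for positive integer degrees of freedom; the helper names `chi2_upper_tail` and `adapting_levels` are hypothetical and introduced here purely for illustration.

```python
import math

def chi2_upper_tail(x, dof):
    """P(chi-square with integer dof > x), via the finite-sum closed forms."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if dof % 2 == 0:
        # Even dof = 2m: exp(-h) * sum_{j=0}^{m-1} h^j / j!
        term = math.exp(-h)
        total = term
        for j in range(1, dof // 2):
            term *= h / j
            total += term
        return total
    # Odd dof = 2m+1: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{m} h^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= h / (j + 0.5)
    return total

def adapting_levels(penalty, max_restrictions=4):
    """Significance level alpha solving penalty * r = chi2_r(alpha), r = 1..max."""
    return [chi2_upper_tail(penalty * r, r) for r in range(1, max_restrictions + 1)]

alpha_aic = adapting_levels(2.0)            # AIC weight p_n = 2
alpha_bic = adapting_levels(math.log(99))   # BIC weight with effective n = 99
```

Rounded to four decimals, `alpha_aic` and `alpha_bic` agree with the values of $\alpha_A$ and $\alpha_B$ reported in the text. For two restrictions the tail probability reduces to $\exp(-p_n)$, so the BIC level is exactly $1/n = 1/99$, which makes the dependence on the sample size explicit.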

Empirical Analysis: Prices and Interest Rates
For empirical analysis, we revisit the price (P t ) and interest rate (I t ) dynamics studied in Chen and Lee (l990).They examined the full sample period as well as two sub-periods selected according to monetary standard.Using the various approaches described in this paper, Tables 4 and 5 report the model selection results for the US and UK series respectively.The conclusions we draw confirm their results and that all P ⇒ I relations found in US second and full periods as well as UK full period can be strengthened to P ⇒ 0 I.Further details regarding the performance of various procedures as well as the robustness of the structure selected are discussed below.
In Tables 4 and 5, column 2 reports ln|Σ i | when the model is estimated under the parametric restriction imposed by H i . To simplify the tables, we do not list all likelihood ratio comparisons made in our procedure, but present only those with respect to the full model H F . In particular, the statistic n * ln(|Σ i |/|Σ F |) of a hypothesis H i against the base H F is reported in column 3. The model selected by M2 using level α A is marked by a ' ‡'. For the adapting level α B together with two proportionally higher and two proportionally lower levels (4, 2, 1, 0.5 and 0.25 times α B ), the model selected by k of these levels is marked by a ' k '. Details regarding the corresponding k levels are described in the text. Columns 4 and 5 report the AIC and BIC values, and the model selected accordingly is marked by a '*'. We also try different penalty adjustments to the BIC weight, p n (v) = ln(n) + v, where v ranges from −3 to 3 in increments of 0.5. For those v that select differently from BIC, the model selected is marked by an '•', and details regarding the corresponding v are described in the text. As expected, procedure M2 with levels α A and α B selects similarly to AIC and BIC respectively. For the US series, a strong form dynamic relationship is supported over the full period and both sub-periods. In sub-period 2, we note that the H L 0 structure identified is robust. This is the model chosen by M2 with all 6 adapting significance levels, as well as by AIC, BIC and BIC with all adjustments (except for v = −3, which chooses H F 0 , but a large negative adjustment is not preferred for strong form relations). Also, the likelihood ratio test statistics show an obvious jump after model H L 0 as the model structure is simplified further.
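The robustness check described above, which varies the BIC weight as p n (v) = ln(n) + v, can be sketched as follows. The sketch assumes that the criterion for model H i relative to the full model is n ln(|Σ i |/|Σ F |) minus the penalty weight times the number of parameters removed. The model names, statistics, restriction counts and sample size are hypothetical and are not taken from Tables 4 and 5.

```python
import math

def select_model(models, penalty):
    """Pick the model minimising (LR statistic vs full) - penalty * restrictions.

    models maps a name to (lr_vs_full, restrictions); the full model is (0.0, 0).
    """
    return min(models, key=lambda m: models[m][0] - penalty * models[m][1])

def sweep_adjustments(models, n, v_min=-3.0, v_max=3.0, step=0.5):
    """Return {v: selected model} for penalty weights ln(n) + v on a grid of v."""
    count = int(round((v_max - v_min) / step)) + 1
    grid = [v_min + i * step for i in range(count)]
    return {v: select_model(models, math.log(n) + v) for v in grid}

# Hypothetical candidates: (n * ln(|Sigma_i| / |Sigma_F|), parameters removed)
candidates = {"full": (0.0, 0), "restricted": (1.0, 2), "independent": (12.0, 4)}
n = 100  # hypothetical sample size

print("AIC-type choice:", select_model(candidates, 2.0))
print("BIC-type choice:", select_model(candidates, math.log(n)))
for v, name in sweep_adjustments(candidates, n).items():
    print(f"v = {v:+.1f}: {name}")
```

A model that keeps being selected over a wide range of v would be regarded as robust in the sense used in the text, while the value of v at which the choice changes indicates the natural second choice.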
In sub-period 1, we also select H ∧ , as do Chen and Lee (1990). First, note that the largest likelihood ratio test statistic among all models is just 4.68, for H ∧ , so that H ∧ is not 'far' from the full feedback model. Also, model H ∧ is chosen by BIC and by M2 conducted at α B and at all smaller levels, which are the preferred levels for identifying strong form relations. Moreover, the selection is robust to BIC and all its positive adjustments (all v ≥ 0). BIC with negative adjustments (−2.5 ≤ v ≤ −0.5) chooses H L 0 , and when an even larger negative adjustment of v = −3 is used, H L is chosen. However, a negative adjustment or small penalty weight is not preferred in identifying strong form relations. Our intuition is that a smaller penalty or larger significance level makes it harder for the procedure to move up the hierarchy of models. Here, the procedure essentially stops at H L 0 before reaching H ∧ . In this sense, we regard H L 0 as a second choice to H ∧ . This is noteworthy because H L 0 is the structure selected in sub-period 2.
For the full period, we choose H L 0 over H F 0 since the use of AIC or α A is not preferred in identifying H F 0 . Also, BIC with all adjustments (−3 ≤ v ≤ 3) chooses the same model H L 0 . The choice of the H L 0 structure for the full period is reasonable in light of the structures identified over sub-periods 1 and 2, which are H ∧ (with H L 0 as a second choice) and H L 0 respectively. To summarize, we find that US price and interest rate have no concurrent association (that is, the relation is in strong form) over the full period as well as the two sub-periods chosen according to monetary standard. Interest rate is related linearly to previous prices but not the other way around, except in sub-period 1, when the two series are simply independent.
Turning to the UK prices and interest rates, we also find (see Chen and Lee (1990)) that the strong form relation is appropriate for sub-period 2 but not for sub-period 1. For sub-period 1, we choose the full feedback model H F because it is selected by AIC, BIC (including all negative v), and M2 with larger significance levels; these are the preferred criteria for identifying weak form relations. BIC with positive adjustments (v ≥ 0.5) and M2 with smaller levels choose H L . Our intuition is that under such criteria it is easier for the procedure to move up the hierarchy of models, in this case reaching H L from H F . For sub-period 2, the independent structure H ∧ is chosen, with H L as a second choice. H L is chosen when a larger significance level or a lighter penalty (v ≤ −1.5) relative to BIC is used. Again, these criteria make it harder for the procedure to move up, and in this case it stops at H L before reaching H ∧ . For the full period, the results favour H L 0 using BIC and M2 with the various significance levels except 4α B . A second choice would be H L , which is chosen by the larger level and by BIC with large negative adjustments (v ≤ −2.5).
To summarize, we confirm that, in contrast to the US, price and interest rate in the UK are independent in sub-period 2, whereas a full feedback relation is detected in sub-period 1. The selection of a strong form H L 0 structure in the full period, with a weak form H L structure as second choice, reflects the difficulty of compromising between the vastly different behaviours over the two sub-periods. In fact, the preference for a strong form relation over a weak form relation is not overwhelming. Finally, by exploring different adapting significance levels and penalty adjustments, we find that H L is a common second choice for both sub-periods and the full period. This consistency is notable in view of the vastly different dynamic relations selected over the different sub-periods.
In conclusion, this paper sheds new light on a stepwise hypotheses testing procedure and information criteria for choosing an appropriate dynamic relation between time series. The simulation studies and empirical applications illustrate the similarity between the IC approach and M2 conducted at adapting significance levels. Moreover, we illustrate how the use of BIC with penalty adjustments or M2 with a variety of adapting levels helps assess the robustness and sensitivity of the model selected, and yields further insights regarding the underlying structures. The analysis performed in this paper should be useful in studying other similar stepwise hypotheses testing procedures.

Table 1 :
Parametric constraints and the corresponding hypotheses which are sufficient
Initial step: Test three pairs (H F 0 , H F ), (H U 0 , H U ), and (H L 0 , H L ). If the result is (o,o,o), go to (Strong form) Stage one to test for strong form relations; otherwise, go to (Weak form) Stage one to test for weak form relations.
(Strong form) Stage one: Test two pairs (H U 0 o): go to (Strong form) Stage two and check if further parametric constraints can be imposed.
(Strong form) Stage two: Test two pairs (H ∧ ,

Table 2 :
Simulation studies: Empirical percentage of correctly identifying the true underlying specification using three different procedures.

Table 3 :
Overall measure of the empirical percentage of correctly identifying the true underlying specification using three different procedures.

Table 4 :
US price (P t ) and interest rate (I t ): Model selection using M2 and IC approach
In column 3, the notations ‡ and k denote the models chosen by M2 at significance levels α A , α B , and some proportionally larger and smaller levels of α B . The notation * in columns 4-5 denotes the models chosen by AIC and BIC, and the notation • denotes the model selected by those adjustments v whose choice differs from that of BIC (the specific v are given in the text). For the full period and sub-period 2, all multiples of α B chose the same model. For sub-period 1, all multiples of α B chose H ∧ except for 4 and 2 times α B , which chose H L 0 .

Table 5 :
UK price (P t ) and interest rate (I t ): Model selection using M2 and IC approach
In column 3, the notations ‡ and k denote the models chosen by M2 at significance levels α A , α B , and some proportionally larger and smaller levels of α B . The notation * in columns 4-5 denotes the models chosen by AIC and BIC, and the notation • denotes the model selected by those adjustments v whose choice differs from that of BIC (the specific v are given in the text). For the full period and sub-period 2, all multiples of α B chose the same model except for 4 times α B , which chose H L . For sub-period 1, all multiples of α B chose H F except for 1/4 and 1/2 times α B , which chose H L .