A Comparison between Bayesian and Frequentist Methods in Financial Volatility with Applications to Foreign Exchange Rates

In this paper, we compare volatility estimation in the Bayesian and frequentist settings. We compare the predictive performance of the two approaches under the generalized autoregressive conditional heteroscedasticity (GARCH) model. Our results indicate that frequentist estimation provides better predictive potential than the Bayesian approach, a finding contrary to some of the work in this line of research. To illustrate the finding, we use six major foreign exchange rate datasets.


Introduction
In the last few decades, volatility in financial time series has been of key interest to both academics and practitioners, as uncertainty is at the heart of financial decisions. Volatility plays a critical role in pricing derivatives, calculating measures of risk, and hedging. Since the abandonment of the gold standard in 1971, asset prices and stock markets have fluctuated more broadly, and the search for predictive volatility models has been one of the major areas of time series analysis. Early work on volatility includes the ARCH (autoregressive conditional heteroscedasticity) model of Engle (1982) and the GARCH (generalized autoregressive conditional heteroscedasticity) model of Bollerslev (1986), which have become the benchmark models for estimating volatility. ARCH/GARCH and their extensions have proven to be successful tools for modeling the conditional variance of financial time series data. A few examples are as follows. Wang et al. (2010) investigated volatility on the Shanghai Stock Exchange with high-frequency intraday data. Huang et al. (2012) investigated the performance of GARCH models in option pricing. More recently, Jahufer (2015) used GARCH models to examine the Sri Lanka stock market using a non-parametric specification test.
The traditional frequentist approach uses the (conditional) maximum likelihood estimation (MLE) technique to estimate the parameters of GARCH or GARCH-type models. We briefly describe this method in the next section; one can refer to Fan and Yao (2005) for more details. Another technique that has gained momentum in recent years is the Bayesian approach, which takes prior information into account to estimate the posterior distribution. Nakatsuma (1999) developed three Bayesian methods, Markov chain Monte Carlo, Laplace approximation, and a quadrature formula, to estimate the parameters of the ARMA-GARCH model. Bauwens (1998) explained how a Gibbs sampler can be implemented to perform inference on Bayesian GARCH models. Vrontos (2012) proposed a full Bayesian analysis of the GARCH and Exponential GARCH (EGARCH) models covering parameter estimation, model selection, and volatility prediction.
The Bayesian method has become an alternative way to model datasets in many different fields, and the comparison of GARCH models under the frequentist and Bayesian frameworks has garnered some attention in research. Nakatsuma (1996) conducted a study focusing on this comparison: based on a small-sample Monte Carlo experiment, they found that the Bayesian approach performs better than the frequentist approach when comparing the mean square errors of the posterior mean in ARMA-GARCH models. Hoogerheide (2012) examined density prediction of stock index returns using GARCH models under both frequentist and Bayesian estimation. They showed that there is no significant difference in the quality of the whole-density forecasts, while Bayesian estimation exhibits better left-tail forecast accuracy. More recently, Sigauke (2016) modeled the Johannesburg Stock Exchange (JSE) using the Bayesian and frequentist approaches and concluded that the Bayesian Autoregressive Moving Average-Generalized Autoregressive Conditional Heteroskedasticity (BARMA-GARCH-t) model provided a better fit for the data than the standard ARMA-GARCH-t model. In a more general setting, studies have compared the Bayesian and frequentist methods: Wagenmakers et al. (2008) advocate the use of Bayesian inference in the field of psychology, Samaniego (2010) compares the Bayesian and frequentist approaches to estimation, and Albers et al. (2018) outline the ramifications of using frequentist and Bayesian analyses. In our work, we show that the traditional frequentist approach renders better predictive performance than the Bayesian approach.
The rest of the paper is organized as follows. Section 2 introduces the GARCH model along with the maximum likelihood and Bayesian estimation methodologies. Section 3 describes the results, and Section 4 concludes.

Methods
Let {x_t : t ∈ Z} be a stochastic process adapted to the filtration {F_t : t ∈ Z}, where F_t = σ({x_s : s ≤ t}) is the sigma-field generated by {x_s : s ≤ t}. Following Geweke (1993), we assume

x_t = μ + ε_t √(w_t) σ_t,    (1)

where the ε_t are innovations and ε_t | F_{t−1} follows either a standard normal distribution or a t-distribution with v degrees of freedom. Although the mean μ can be time dependent in practice and modeled separately, we fix this value to be zero. In this work, we are primarily concerned with the volatility σ_t. A plethora of works has been devoted to modeling this latent variable in the last thirty years and the work is still ongoing. As mentioned previously, the pioneering work on volatility is the ARCH/GARCH model of Engle (1982) and Bollerslev (1986). The GARCH model of order (1,1), or GARCH(1,1), assumes

σ_t² = α_0 + α_1 x_{t−1}² + β σ_{t−1}².

Our main focus is on this GARCH(1,1) model: we examine the predictability of σ_t² under two cases for w_t, (1) fixed w_t = v/(v − 2) and (2) w_t ~ Inv-Gam(v/2, v/2). The details are given in the following subsections. In practice, if y_t is a stock price, then the log-return series x_t is defined as

x_t = log(y_t) − log(y_{t−1}),

which measures the relative change in the stock price. The above form can also be written as

log(y_t) − log(y_{t−1}) = log(1 + (y_t − y_{t−1})/y_{t−1}) ≈ (y_t − y_{t−1})/y_{t−1}.

Many financial studies use the return series x_t instead of the price series y_t for several reasons. First, the returns are scale-free. Second, they have more attractive statistical properties than the price series, and third, they are time-additive. The reader can refer to Tsay (2010) for more details and elaboration.
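To make the recursion concrete, the following is a minimal Python sketch (the paper's computations are done in R; this sketch is purely illustrative, and the parameter values α_0 = 0.1, α_1 = 0.1, β = 0.8 are arbitrary assumptions) that simulates a GARCH(1,1) series with normal innovations and computes a log-return series from prices:

```python
import math
import random

def simulate_garch11(n, alpha0=0.1, alpha1=0.1, beta=0.8, seed=42):
    """Simulate x_t = sigma_t * eps_t with
    sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = random.Random(seed)
    s2 = alpha0 / (1 - alpha1 - beta)  # start at the unconditional variance
    x_prev = 0.0
    xs, s2s = [], []
    for _ in range(n):
        s2 = alpha0 + alpha1 * x_prev ** 2 + beta * s2
        x_prev = math.sqrt(s2) * rng.gauss(0, 1)
        xs.append(x_prev)
        s2s.append(s2)
    return xs, s2s

def log_returns(prices):
    """x_t = log(y_t) - log(y_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]
```

Since α_1 + β < 1 here, the simulated variance stays near the unconditional level α_0 / (1 − α_1 − β) while still exhibiting volatility clustering.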

Frequentist GARCH Estimation
In traditional frequentist statistics, the parameters are fixed unknown constants. Under this framework, we fix w_t = v/(v − 2), so that equation (1) becomes

x_t = √(v/(v − 2)) σ_t ε_t,

where the ε_t follow N(0, 1) or t_v. Under a standard normal distribution, the likelihood function of x = (x_1, . . . , x_T)' is

L(α_0, α_1, β | x) = ∏_{t=1}^T (2π σ_t²)^{−1/2} exp(−x_t² / (2σ_t²)),

and under a t-distribution with v degrees of freedom, the likelihood function is

L(α_0, α_1, β, v | x) = ∏_{t=1}^T [Γ((v + 1)/2) / (Γ(v/2) √((v − 2)π) σ_t)] (1 + x_t² / ((v − 2)σ_t²))^{−(v+1)/2}.

The maximum likelihood (ML) estimators are the maximizers of the functions above. Note that σ_t² is a function of the unknown parameters α_0, α_1, and β, and it depends on the past squared return series and the past squared volatility σ_{t−1}². In addition, the likelihood is conditioned on (x_1², x_2², . . . , x_p²) and (σ_1², σ_2², . . . , σ_p²). The reader is referred to Fan and Yao (2005) for more details. In our work, we used nonlinear optimization under the augmented Lagrange method, implemented in the R function solnp of Ghalanos (2011) and used by the rugarch package of Ghalanos (2016).
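The Gaussian conditional log-likelihood above can be sketched in a few lines of Python. This is an illustrative stand-in, not the rugarch/solnp implementation: it initializes σ_1² at the sample variance (one common convention; the paper's exact conditioning is not fully recoverable) and returns +∞ outside the usual positivity and stationarity constraints so that any optimizer avoids that region:

```python
import math

def garch11_nll(params, x):
    """Conditional Gaussian negative log-likelihood of a GARCH(1,1) model.

    params = (alpha0, alpha1, beta); returns +inf when the constraints
    alpha0 > 0, alpha1 >= 0, beta >= 0, alpha1 + beta < 1 are violated.
    """
    a0, a1, b = params
    if a0 <= 0 or a1 < 0 or b < 0 or a1 + b >= 1:
        return float("inf")
    s2 = sum(v * v for v in x) / len(x)  # initialize at the sample variance
    nll = 0.0
    for t in range(1, len(x)):
        s2 = a0 + a1 * x[t - 1] ** 2 + b * s2
        nll += 0.5 * (math.log(2 * math.pi * s2) + x[t] ** 2 / s2)
    return nll
```

Minimizing this function over the constrained region (here left to any general-purpose optimizer) yields the conditional ML estimates.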

Bayesian GARCH Estimation
To describe the Bayesian framework, we first write, following Geweke (1993),

x_t = ε_t √(w_t) σ_t.

Let w = (w_1, . . . , w_T)' and α = (α_0, α_1)', and regroup the unknown parameters as θ = (α, β, v)'. Upon defining the T × T diagonal matrix

Σ = Σ(θ, w) = diag(w_1 σ_1², . . . , w_T σ_T²),

the likelihood function of (θ, w), under the normal distribution, is

L(θ, w | x) ∝ |Σ|^{−1/2} exp(−(1/2) x' Σ^{−1} x).

The parameters (θ, w) are random variables characterized by a prior density, denoted by p(θ, w). Inferences are made from the posterior density

p(θ, w | x) ∝ L(θ, w | x) p(θ, w).    (8)

After observing the data, the posterior distribution gives a probabilistic description of the knowledge about the model parameters. Following Ardia (2010), we take truncated normal prior distributions for the GARCH parameters α and β:

p(α) ∝ N_2(α | μ_α, Σ_α) I[α > 0],    p(β) ∝ N_1(β | μ_β, Σ_β) I[β > 0],

where N_d(· | μ, Σ) is the d-dimensional normal density, the μ's and Σ's are the hyperparameters, and I[·] is the indicator function. Assuming that the w_t are independent and identically distributed as the inverse gamma with parameters (v/2, v/2), the prior distribution of the vector w given v is

p(w | v) = ∏_{t=1}^T Inv-Gam(w_t | v/2, v/2).

The prior distribution of v is chosen as the translated exponential with λ > 0 and δ ≥ 2:

p(v) = λ exp(−λ(v − δ)) I[v > δ].

The mass of this prior is mostly concentrated near δ when λ is large and hence the degrees of freedom can be constrained in this manner. Deschamps (2006) points out that this prior density is useful in two ways. First, bounding the degrees of freedom away from two may be important numerically to avoid a rapid divergence of the conditional variance. Second, near-normality of the errors can still be entertained while the prior remains reasonably constrained, which may allow for better convergence of the sampler.
Assuming prior independence among the parameters, the joint prior distribution is then

p(θ, w) = p(α) p(β) p(w | v) p(v).    (9)

There is no closed form for the joint posterior distribution in (8), and no conjugate prior exists for this joint posterior density. Hence, we resort to the Markov chain Monte Carlo (MCMC) method to approximate the posterior distribution by simulation. The MCMC sampling technique was initially introduced by Metropolis et al. (1953) and was later generalized by Hastings (1970). The basic idea of MCMC sampling is the creation of a Markov chain (θ^(0), w^(0)), . . . , (θ^(k), w^(k)) in the parameter space. Under some regularity conditions, as k goes to infinity, the distribution of (θ^(k), w^(k)) converges to the posterior (8). To implement the MCMC sampling technique, we used the Metropolis-Hastings (MH) algorithm; the details can be found in Chib (1995). The algorithm updates the GARCH parameters in blocks, one block for α and one block for β, while the degrees-of-freedom parameter is sampled through an optimized rejection method from the translated exponential density defined earlier. This process is implemented in the MCMC sampler of the R package bayesGARCH, which uses the approach of Ardia (2008).
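The flavor of the sampler can be conveyed with a deliberately simplified Python sketch. Unlike bayesGARCH, which uses blocked MH updates with tailored proposals and also samples w and v, this sketch fixes w_t ≡ 1 (the normal-innovation case), places independent truncated N(0, 1000) priors on the positive parameters, and uses a plain random-walk Metropolis step; all of these simplifications are assumptions for illustration only:

```python
import math
import random

def log_posterior(params, x, prior_var=1000.0):
    """Log posterior (up to a constant) for (alpha0, alpha1, beta) with
    truncated-normal N(0, prior_var) priors and Gaussian GARCH(1,1) likelihood."""
    a0, a1, b = params
    if a0 <= 0 or a1 < 0 or b < 0 or a1 + b >= 1:
        return -float("inf")  # outside the prior's support / stationarity region
    lp = -sum(p * p for p in params) / (2 * prior_var)
    s2 = sum(v * v for v in x) / len(x)  # initialize at the sample variance
    for t in range(1, len(x)):
        s2 = a0 + a1 * x[t - 1] ** 2 + b * s2
        lp -= 0.5 * (math.log(2 * math.pi * s2) + x[t] ** 2 / s2)
    return lp

def metropolis(x, n_iter=2000, step=0.02, seed=0):
    """Random-walk Metropolis over (alpha0, alpha1, beta)."""
    rng = random.Random(seed)
    cur = (0.1, 0.1, 0.8)
    cur_lp = log_posterior(cur, x)
    draws = []
    for _ in range(n_iter):
        prop = tuple(c + rng.gauss(0, step) for c in cur)
        prop_lp = log_posterior(prop, x)
        if math.log(rng.random()) < prop_lp - cur_lp:  # accept/reject
            cur, cur_lp = prop, prop_lp
        draws.append(cur)
    return draws
```

After discarding a burn-in portion, posterior summaries (e.g., means of the retained draws) play the role that the ML point estimates play in the frequentist analysis.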
For p(α), we specify two cases for the variance-covariance matrix Σ_α: Σ_α = 1000 I_2 and Σ_α = 0.01 I_2, where I_2 is the 2 × 2 identity matrix. Similarly, the prior variance for p(β) was set to 1000 and 0.01. Both prior means μ_α and μ_β were set to 0.

Model Assessment
In the frequentist setting, we assumed x_t = σ_t ε_t with ε_t having mean 0 and standard deviation 1. Therefore,

E(x_t²) = E[E(x_t² | F_{t−1})] = E[σ_t² E(ε_t² | F_{t−1})] = E(σ_t²),

where, in practice, F_t denotes the past financial information up to time t. This also holds under the Bayesian setting because ε_t and w_t are independent and E(w_t) = v/(v − 2). Using this fact, and the fact that the true squared volatility σ_t² is unknown when we deal with actual datasets, we use the squared return series x_t² as a proxy for the squared volatility. Hence, we measure the mean square error (MSE) and the mean absolute deviance error (MADE) by

MSE = (1/m) ∑_t a_t²,    MADE = (1/m) ∑_t a_t,

where a_t = |σ̂_t² − x_t²| and m is the number of forecasts. As another measure of accuracy, we use the directional accuracy (DA),

DA = (1/m) ∑_t I[(x_t² − x_{t−1}²)(σ̂_t² − x_{t−1}²) > 0].

The DA gives the average direction of the forecast volatility by measuring the correctness of the turning point forecasts.
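These accuracy measures are straightforward to compute. The Python sketch below is illustrative; in particular, the exact form of the DA indicator is not fully recoverable from the source, so the version here (does the forecast move in the same direction, relative to the last observed proxy, as the proxy itself?) is one plausible reading:

```python
def forecast_errors(sigma2_hat, x):
    """MSE and MADE with squared returns x_t^2 as the volatility proxy,
    a_t = |sigma2_hat_t - x_t^2|."""
    a = [abs(s2 - xt ** 2) for s2, xt in zip(sigma2_hat, x)]
    m = len(a)
    mse = sum(v * v for v in a) / m
    made = sum(a) / m
    return mse, made

def directional_accuracy(sigma2_hat, x):
    """Fraction of steps where the forecast and the proxy move in the same
    direction relative to the last observed proxy (one plausible DA reading)."""
    hits, n = 0, 0
    for t in range(1, len(x)):
        dp = x[t] ** 2 - x[t - 1] ** 2          # actual change in the proxy
        df = sigma2_hat[t] - x[t - 1] ** 2      # forecast change vs last proxy
        hits += (dp * df > 0) or (dp == 0 and df == 0)
        n += 1
    return hits / n
```

For example, forecasts (1, 2, 3) against returns (1, 1, 2) give absolute deviances (0, 1, 1), so MSE = MADE = 2/3 and DA = 1/2.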
To test for significance in forecasting accuracy, we carried out the Diebold-Mariano (DM) test proposed by Diebold and Mariano (1995). Let d_t = z_row,t − z_column,t denote the loss differential, where z_row and z_column are the squared deviances a_t² (or the absolute deviances a_t) from the models in the row and the column, respectively. The hypotheses associated with this test are

H_0: E(d_t) = 0    versus    H_1: E(d_t) ≠ 0.

Hence, the null hypothesis indicates "equal accuracy" of the two approaches. In large samples, the DM statistic

DM = d̄ / √(2π f̂_d(0) / m)

is asymptotically standard normal under the null, where d̄ is the sample mean of the loss differential, f̂_d(0) is a consistent estimate of the spectral density of the loss differential at frequency 0, and γ(τ) = E[(d_t − μ)(d_{t−τ} − μ)] is the autocovariance function at lag τ.
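A minimal Python sketch of the DM statistic follows. It uses the identity 2π f_d(0) = γ(0) + 2 ∑_{τ≥1} γ(τ) and truncates the autocovariance sum at lag h − 1, a common textbook choice for h-step-ahead forecasts; production implementations typically use a HAC or lag-window spectral estimate instead, so this is only an illustration:

```python
import math

def dm_statistic(a_row, a_col, power=2, h=1):
    """Diebold-Mariano statistic on the loss differential
    d_t = a_row_t**power - a_col_t**power (power=2 for squared deviance,
    power=1 for absolute deviance), with the long-run variance estimated
    by the autocovariances truncated at lag h-1."""
    d = [r ** power - c ** power for r, c in zip(a_row, a_col)]
    m = len(d)
    dbar = sum(d) / m

    def gamma(tau):  # sample autocovariance at lag tau
        return sum((d[t] - dbar) * (d[t - tau] - dbar) for t in range(tau, m)) / m

    long_run_var = gamma(0) + 2 * sum(gamma(t) for t in range(1, h))
    return dbar / math.sqrt(long_run_var / m)
```

The statistic is compared against standard normal critical values; swapping the row and column models flips its sign.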

Results
In this section, we compare the predictive potential of the GARCH(1,1) model under the frequentist and Bayesian methods using six daily exchange rates. We consider the daily exchange rates of six major currencies against the US dollar: the Euro (EUR), Japanese yen (JPY), Pound sterling (GBP), Australian dollar (AUD), Swiss franc (CHF), and Canadian dollar (CAD). We analyze the most traded pairs of currencies, commonly called the Majors: EUR/USD, GBP/USD, USD/JPY, AUD/USD, USD/CAD, and USD/CHF. Several numerical summaries for the datasets are given in Table 1. It is noticeable that the skewness and kurtosis of AUD/USD are very high, indicating that the distribution of its return series may be right-skewed with fat tails. Fat tails can also be noted in the other datasets, except for EUR/USD.
We conducted some preliminary analyses of the datasets. Table 2 shows the results of the Ljung-Box test based on the squared return series. Except for EUR/USD, for which Q(1) = 2.103, the results indicate significant serial dependence. The time series plots of the return series are shown in Figure 1. It can be seen that volatility clustering is present in the datasets. Also, the variability increases over time in the USD/CAD dataset. These features may indicate that the datasets are not stationary. Figure 2 shows the autocorrelation function (ACF) of the squared return series for each dataset. The squared series appear to be serially correlated, indicating possible dependence at a higher moment.
For each dataset, the in-sample data consist of the first 70% of the observations, used to fit the model, and the out-of-sample data contain the last 30%, used to test the model. In practice, in-sample measures do not mean much, since we are interested in the predictive performance of the model. Table 3 gives the comparison based on the three measures for both the in-sample and out-of-sample periods. Under the out-of-sample measures, the best value for each dataset is printed in bold. The results indicate that the frequentist approach is generally better than the Bayesian approach.
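The 70/30 split and the one-step-ahead out-of-sample recursion can be sketched as follows. This Python sketch is illustrative (the paper's analysis uses R); it fits nothing itself and simply rolls the fitted GARCH(1,1) recursion forward over the hold-out period, initializing σ² at the in-sample variance, which is one common convention:

```python
def split_series(x, frac=0.7):
    """First frac of the series in-sample, the remainder out-of-sample."""
    cut = int(len(x) * frac)
    return x[:cut], x[cut:]

def rolling_one_step(params, x_in, x_out):
    """One-step-ahead sigma_t^2 forecasts over the out-of-sample period,
    recursing sigma_t^2 = a0 + a1 * x_{t-1}^2 + b * sigma_{t-1}^2 and
    updating with each newly observed return."""
    a0, a1, b = params
    s2 = sum(v * v for v in x_in) / len(x_in)  # initialize at in-sample variance
    x_prev = x_in[-1]
    forecasts = []
    for xt in x_out:
        s2 = a0 + a1 * x_prev ** 2 + b * s2
        forecasts.append(s2)
        x_prev = xt  # the realized return becomes the next conditioning value
    return forecasts
```

The resulting forecasts are then scored against the squared-return proxy with the MSE, MADE, and DA measures of the previous subsection.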

Conclusion
Our main interest in this study was to compare the frequentist and Bayesian estimation approaches using the GARCH(1,1) as the basis model. Contrary to some of the existing literature, we have found that the frequentist method provides better predictive potential than the Bayesian method. We considered six foreign exchange rate datasets and computed the MSE, MADE, and DA to compare the model outcomes; the out-of-sample measures indicate that the frequentist approach performed better. We also carried out the DM test to assess the significance of these results and observed that, in general, the frequentist approach provides more accurate predictions than the Bayesian approach. Finally, the current study is limited to the GARCH(1,1) as the basis model; one could use other basis models, such as the Exponential GARCH or Integrated GARCH, as well.