Parameter Estimation of a Seasonal Poisson INAR(1) Model with Diﬀerent Monthly Means

Analysing seasonality in count time series is an essential application of statistics to predict phe-nomena in diﬀerent ﬁelds like economics, agriculture, healthcare, environment, and climatic change. However, the information in the existing literature is scarce regarding the performances of relevant statistical models. This study provides the Yule-Walker (Y-W), Conditional Least Squares (CLS), and Maximum Likelihood Estimation (MLE) for First-order Non-negative Integer-valued Autoregressive, INAR(1), process with Poisson innovations with diﬀerent monthly means. The performance of Y-W, CLS, and MLE are assessed by the Monte Carlo simulation method. The performance of this model is compared with another seasonal INAR(1) model by reproducing the monthly number of rainy days in the Blackwater River watershed located in coastal Virginia. Two forecast-coherent methods in terms of mode and probability function are applied to make predictions. The models’ performances are assessed using the Root Mean Square Error and Index of Agreement criteria. The results reveal the similar performance of Y-W, CLS, and MLE for estimating the parameters of data sets with larger sample size and values of α close to unite root. Moreover, the results indicate that INAR(1) with diﬀerent monthly Poisson innovations is more appropriate for modelling and predicting seasonal count time series.


Introduction
Time series data with seasonal characteristics can be found in a variety of fields, such as economic, healthcare, environment, and climate change. Many of them consist of counts, such as the number of patients visit a hospital, the number of road accidents, the number of polio cases, etc. The number of rainy days is one example of the count time series with seasonal characteristics that have a profound influence on flooding. Flooding has affected the economy, agriculture, tourism, and our daily life in diverse ways. Several reasons affect the magnitude of flooding, such as sea-level rise (Wang et al., 2017), precipitation characteristics (Bracken , née Bull), and seasonal variability, and extreme storms (Niroomandi et al., 2018). Changes in precipitation patterns can be associated with severe environmental events. For instance, the reduction in the number of rainy days can result in drought, while the upward trend in the frequency of days with precipitation can increase the runoff coefficient and the risk of flooding events. The critical impact of the aforementioned extreme events on human life emphasizes the importance of modelling and predicting the number of rainy days.
One of the most popular models for discrete time series data is First-order Non-negative Integer-valued Autoregressive, INAR(1), defined as (Alosh and Alzaid, 2008): where {X t : t = 0, ±1, ±2, · · · } is a sequence of non-negative integer-valued random variables, α ∈ [0, 1], {ε t } is a sequence of non-negative integer-valued random variables having mean µ and finite variance σ 2 , and • is the binomial thinning operator defined as (Steutel and van Harn, 1979): where {Y i } is a sequence of independent and identically distributed (i.i.d) random variables independent of {X} with a Bernoulli distribution B (1, α), such that P (Y i = 1) = 1−P (Y i = 0) = α. The properties of the binomial thinning operator have been investigated by Weiß (2008). Poisson INAR(1) has also been introduced by Alosh and Alzaid (2008). Its theoretical properties have been discussed by several researchers (Alosh and Alzaid, 2008;Freeland and McCabe, 2004b;McKenzie, 1988). Several methods for estimating parameters of Poisson INAR(1) have been extensively discussed: Alosh and Alzaid (2008) introduced the Y-W, CLS, and MLE methods for estimating the parameters of the INAR(1) model, and Jung et al. (2005) examined the performance of these estimators under equidispersion and overdispersion condition; Silva and Oliveira (2004) used Whittle criterion for estimating parameters of the INAR process; the higher-order moments and cumulate of INAR process have been discussed in many studies Oliveira, 2004, 2005); Keith Freeland and McCabe (2005) derived a corrected asymptotic variance matrix of the Conditional Least Squares (CLS) estimators; Enciso-Mora et al. (2009) discussed the problem of model selection for count time series;  recommended Maximum Likelihood Estimation (MLE) rather than CLS and Yule-Walker (Y-W) estimators, and Bourguignon and Vasconcellos (2015) introduced parameter's bias-reduced estimator.
In addition, several types of INAR(1) process have been introduced in literature: Ghodsi et al. (2012) proposed a Spatial INAR model (SINAR (1,1)); Weiß (2012) investigated the properties of fully observed INAR(1) process; Pedeli et al. (2015) proposed Saddlepoint approximation for the estimation of higher-order INAR models with Poisson and negative binomial innovations; Mohammadpour et al. (2018) proposed an INAR model with Poisson-Lindley marginal distribution; and Shirozhan and Mohammadpour (2018) discussed an INAR model based on the mixing Pegram and dependent Bernoulli thinning operator. The Bayesian approach has also been used to analyse count data (Bisaglia and Canale, 2016;McCabe and Martin, 2005;Silva et al., 2009).
However, few studies of seasonal count time series have been reported in the literature. Higuchi (1999) implemented quasi-periodic oscillation models for analyzing seasonal count time series. Parametric and semiparametric methods for testing seasonality and trend in count time series data were proposed by Hunsberger et al. (2002). Zhu and Joe (2006) extended the INAR process to allow non-stationarity from trends and covariates. Also, two random coefficient models for describing seasonality in INAR (p) models were presented by Zheng et al. (2006). Kim and Park (2006) proposed a signed binomial thinning operator for modelling seasonal INAR process, and Zhang et al. (2010) developed an extension of this new operator called signed generalized power series thinning operator for handling non-stationary integer-valued time series. Tian et al. (2020) discussed the limitation of using binomial thinning operator for modelling seasonal count time series with over-dispersion and proposed a new seasonal geometric INAR process using the negative binomial thinning operator with seasonal periods.
Further, two different seasonal INAR processes have been introduced by Moriña et al. (2011) and Bourguignon et al. (2016). The former model was established based on Poisson innovations with different monthly means, while the latter model considered data as a first-order seasonal process. The comparison between MLE, Y-W, and CLS estimators of the first-order seasonal method was provided by Bourguignon et al. (2016). Moriña et al. (2011) estimated the parameters of INAR (2) with different monthly Poisson innovations using MLE and provided several methods for forecasting seasonal integer-value time series. However, the MLE, Y-W, and CLS estimators of INAR(1) with different monthly Poisson innovations model have not been introduced in the literature. Moreover, the aforementioned models have not been conducted on the same data to identify which model is more appropriate for describing the seasonal count time series.
Furthermore, the prediction of the seasonal count data has rarely been discussed in the literature. For count time series, an appropriate prediction model should be forecast-coherent and produce non-negative integer values (Bisaglia and Gerolimetto, 2015). To satisfy this criterion, Freeland and McCabe (2004a) used a conditional median, whereas, Vazifedan and Shitan (2012) considered mode for producing coherent predictors.  produced predictors by considering the INAR process as a Markov Chain. Moriña et al. (2011) proposed short-term and long-term methods for forecasting seasonal count time series.
The objectives of this study were to provide Y-W, CLS, and MLE for INAR(1) with different monthly Poisson innovations and to evaluate their performances through a simulation study. This paper is structured as follows. Two methods for modelling seasonal INAR(1) model are identified in Section 2. Further, Section 2 introduces Y-W, CLS, and MLE for the first model. Section 3 provides simulation results and discusses the performances of estimation methods. Moreover, Section 3 compares the performance of proposed models by reproducing the monthly number of rainy days in the Blackwater River watershed located in east coastal Virginia. Remarks and conclusions are provided in Section 4. All statistical modelling, simulation, estimating, and forecasting was performed using R (R Core Team, 2018).

The Seasonal INAR Models
In the INAR(1) model identified by Equation (1), when {ε t } is a sequence of Poisson errors with mean and variance λ, then {X t } has a Poisson distribution with parameter λ/(1 − α). Here two models for assessing seasonal INAR(1) process with Poisson errors are discussed. Model.1 has been defined by Moriña et al. (2011) as, where error terms {ε t } are distributed by several Poisson distributions with parameter λ t , where s is the observed periodicity such that λ t = λ t+s , and, The conditional probability function of a seasonal INAR (2) model has been introduced by Moriña et al. (2011). The conditional probability function of Model.1 was computed similarly as, The Autocorrelation Function (ACF) of the Poisson INAR(1) process is given as ρ x (k) = α k , k = 0, 1, · · · (McKenzie, 1988). Another seasonal Poisson INAR(1) process, Model.2, has been introduced by Bourguignon et al. (2016) as, where s represents seasonal period and {ε t } has a Poisson Distribution with parameter λ, such that, and, The ACF of this model is given as, The main difference between Model.1 and Model.2 is that in the former one, observation at time t is correlated to the observation at time t − 1 and elements which entered the system in the interval (t − 1, t] with mean λ t , while in the latter one, observation at time t is correlated to observation at time t − s and elements which entered the system in the interval (t − s, t] with a fixed mean of λ.

Estimation of Model Parameters
The Maximum Likelihood estimators of Model.1 have been introduced by Moriña et al. (2011). The Y-W, CLS, and MLE of Model.2 and their asymptotic distributions have been discussed by Bourguignon et al. (2016). Here, we introduce the Y-W and CLS estimators for Model.1.

The Yule-Walker Estimators
The Y-W estimator of the INAR(1) process has been introduced by Alosh and Alzaid (2008). The Y-W estimators of Model.2 have been provided as (Bourguignon et al., 2016): Here, the Y-W estimators of Model.1 are proposed in the following manner. The time indicator in Equation (3) can be written as t = i + sτ , where s is the observed periodicity, (i = 1, 2, · · · , s), (τ = 0, 1, · · · , T), and T = (n − 1)/s , and · is the floor function. Then Equation (3) can be represented as X i+sτ = α • X i+sτ −1 + ε i+sτ . By taking expectation, we have, where M i = min (length (X i+sτ ) , length (X i+sτ −1 )) and α can be estimated using the sample auto-covariance function. The covariance at lag k is given as (Alosh and Alzaid, 2008): Then, the Y-W estimator of α is obtained as,

Conditional Least Squares Estimators
The CLS estimators of parameters Model.2 have been proved as (Bourguignon et al., 2016), Here, the CLS estimators of Model.1 is proposed by considering E (X t | X t−1 ) = αX t−1 + λ t and minimizing L = n t=2 (X t − α.X t−1 − λ t ) 2 . Hence, the CLS of parameter λ is obtained as, Similarly, CLS of α can be obtained as, where from Equation (9) we can conclude that λ t = M i λ i n−1 , then,

Maximum Likelihood Estimators
Although several methods for estimating parameters of the INAR(1) process have been introduced, conditional MLE has been reported to produce smaller biases and lower RMSE (Bourguignon et al., 2016;. MLE of parameters (α, λ t ) are those values that maximize the likelihood functions of Model.1, which is defined as where , which is defined by Equation (4). And MLE of parameters (α, λ) can be computed by maximizing the likelihood function of Model.2 (Bourguignon et al., 2016), where P (α, λ) (X t |X t−s ) is the conditional probability of X t given X t−s defined by Equation (6). Estimators' Standard Errors (SE) were obtained using the inverse Hessian matrix and were used to calculate 95% Confidence Intervals (CI) for the parameter estimators. The optimization of Equation (11) and Equation (12) and Hessian matrix were implemented by using numerical methods in R (R Core Team, 2018).

Simulation
This section investigates the results of Monte Carlo simulation in this study. The properties of different estimators for the parameters of Model.2 have been discussed in detail by Bourguignon et al. (2016). Here, the performance of Y-W, CLS, and MLE for parameters of Model.1 is carried out for different finite sample sizes n = (120, 360, 600), with s = 12, α = (0.2, 0.5, 0.8), and two sets of monthly averages {λ s = 8.5, 8.0, 7.0, 6.0, 5.5, 4.5, 5.0, 6.5, 7.0, 7.8, 8.5, 9} and {λ s = 12. 5, 12, 11.0, 10.0, 9.5, 8.5, 9.0, 10.5, 11, 11.8, 12.5, 13}. The empirical results are presented by Table 1 to 3 and Figure 1 to 2. The obtained biases and RMSE were calculated over 1000 replications. The performance of Y-W, CLS, and MLE for parameters of Model.2 has been discussed (Bourguignon et al., 2016). According to Bourguignon et al. (2016), Y-W and CLS estimators were susceptible to a process that was closer to non-stationary such that the bias and RMSE increased by increasing α. They concluded that the performance of MLE was profoundly better than Y-W and CLS estimators (Bourguignon et al., 2016).
Though, the simulation results of this study revealed that Y-W, CLS, and MLE had similar performances for estimating parameters of Model.1 for simulated data with larger sample size      and α closer to the unite root ( Figure 1). Moreover, despite the sample size, all estimators produced comparable estimators of α close to the unit root (Table 1 to 3). For α = (0.2, 0.5), the performance of Y-W and CLS estimators improved when the sample size decreased. Also, when α = 0.5, the Y-W and CLS were preferred over MLE for estimating parameters of small-sized simulated data (Figure 1). Although MLE produced smaller RMSE by increasing the sample size, the RMSE obtained by Y-W, CLS, and MLE were similar for small values of α and small-sized simulated data ( Figure 2).
In general, the RMSE and standard error (SE) of estimators were almost equal, which is evidence of small bias values generated by the estimation methods proposed in this study (Figure 2). Going by RMSE and bias values, Y-W, CLS, and MLE had comparable performances for simulated datasets with larger sample sizes and α values close to the unite root. Still, since MLE uses the full information of the distribution, it was more robust to the changes in the values of α and n.

Real Data
The Blackwater River is located near south-eastern Virginia. It starts to flow from its source located in Prince George Country, the city of Petersburg, Virginia, for about 105 miles (170 km) through the Inner Coastal Plain region of Virginia. The Blackwater River joins the Nottoway River to form the Chowan River, which empties into Albemarle Sound. Since the Blackwater River drains an extensive area of southern Virginia, intense flooding occasionally happens in several regions once heavy rainfall occurs all over the area. Important counties that have been affected by flooding are Zuni and Franklin city. As the river's channel is limited to those areas, flood stages can occur quickly.
In this study, the precipitation data of four stations that are closed to the Blackwater River were used, namely: Holland, Hopewell, Stony Creek, and Suffolk Lake Kilby. Here a rainy day was defined as a day with precipitation more than 0 mm. The monthly numbers of rainy days were computed from daily data collected from the National Oceanic and Atmospheric Administration website (NOAA, 2018).
In this section, both Model.1 and Model.2 are conducted on the monthly number of rainy days in Blackwater River Franklin, VA, from January 1980 to December 2014. Figure 3 illustrates the time series plot of data and boxplot of rainy days per month, and Table 4 represents the mean, standard deviation (s.d.), and five-number summary of data (Shitan and Vazifedan, 2011). The sample ACF and PACF of data were close to the theoretical ACF and PACF, such that ACF values decayed slowly, while PACF values were not significantly different from zero after lag one. Moreover, ACF and Partial ACF (PACF) plots of residuals indicate that a first-order seasonal model is a valid model for this data, and higher-order correlation does not exist between observations (Figure 4).  Maximum likelihood estimators of parameters of Model.1 and Model.2 were obtained using Equation (11) and Equation (12) ( Table 5). The estimated covariance matrix of parameters of Model.1 and Model.2, using the inverse Hessian matrix, was calculated asΣ 1 andΣ 2 , respectively, Diagonal elements of the above matrices, i.e., SE, were used to compute 95% CI. Calculated CIs indicated that all the parameters are significantly different from zero, except α in Model.2. Moreover, Table 5 shows that λ t : t = 1, 2, · · · , 12 is a good estimator of the average number of rainy days per months, whileλ is a good estimator of the overall average of the number of rainy days.

Model Prediction
In this paper, two different approaches were considered for forecasting the monthly number of precipitation days: using the mode of simulated point predictors (Vazifedan and Shitan, 2012), and using conditional probability function . In the first approach, 1000 one-step-ahead forecast values were generated for X t given X t−1 using Equation (3) and Equation (5), and the mode of them was taken as the final predictor. This process was repeated to obtain the h-step ahead predicted value. In the second approach, time series model was considered as a Markov Chain such that at time t, the probabilities of all values that can be observed at time t + 1 were calculated using Equation (4) and Equation (6). Then the value with maximum probability was considered as the predicted value. Note that the number of rainy days in a month is a finite number between 0 and 28-31. However, it has been suggested that even when the number of possible values is infinite, there is a large positive integer M such that the probability of observing more than M values is minimal .
In this research, forecasted values of 12 steps ahead, starting from Jan 2014, were calculated by implementing the above approaches in both Model.1 and Model.2. Several measures have been introduced to assess the suitability of competing models. In this study, Index of Agreement (I.A.) (Jorquera et al., 1998) and RMSE were used to evaluate models' performance and accuracy. The model with an I.A. value closer to one and a smaller RMSE is the preferred model, where: wherex t is the average of observations andx t s are predictors. Table 6 represents the observed values, predicted values, calculated I.A., and RMSE. In general, Model.1 was more appropriate for modelling rainy days since it produced smaller RMSE and generated I.A. values closer to 1. According to the results, the RMSE values were calculated as 10.54 and 12.96 for Model.1, whereas 21.73 and 20.25 for Model.2 using mode and probability function, respectively. Further, the I.A. values for Model.1 were reported as 0.55 and 0.29, while the I.A. for Model.2 were 0.10 and 0.16 using mode and probability function, respectively. By comparing RMSE and I.A. values, it can be concluded that modelling and forecasting count time series by implementing Model.1 and using mode is more appropriate amongst the models considered in this paper.
Furthermore, Figure 5 illustrates the forecast plots of the monthly number of rainy days. Figure 6 shows the histogram of forecasted values by conducting Model.1 and implementing mode. The comparison between the reported results in Table 6 and calculated mean and standard deviation in Figure 6 suggested that estimating the average number of rainy days in a month using mode produced smaller bias than using conditional probability function.  The performances of Y-W, CLS, and MLE were evaluated, and it was revealed that all estimation methods produced comparable estimations for datasets with larger sample sizes and values of α close to the unite root. Although MLE produced smaller biases and RMSE for smaller values of α, Y-W and CLS had similar performances as MLE for small-sized simulated data. Furthermore, both seasonal models were conducted on the monthly rainy days, and two forecast-coherent approaches were implemented to produce integer-valued predictors. The RMSE and I.A. criteria were calculated to evaluate models. According to the results, considering mode as a forecasted value in the INAR(1) model with different monthly Poisson innovations was a better choice for predicting seasonal non-negative integer-valued time series with the Poisson distribution.