A Geostatistical Approach to Predict the Average Annual Rainfall of Bangladesh

: In this paper we tried to fit a predictive model for the average annual rainfall of Bangladesh through a geostatistical approach. From geostatistical point of view, we studied the spatial dependence pattern of average annual rainfall data (measured in mm) collected from 246 stations of Bangladesh. We have employed kriging or spatial interpolation for rainfall data. The data reveals a linear trend when investigated, so by fitting a linear model we tried to remove the trend and, then we used the trend-free data for further calculations. Four theoretical semivariogram models Exponential, Spherical, Gaussian and Matern were used to explain the spatial variation among the average annual rainfall. These models are chosen according to the pattern of empirical semivariogram. The prediction performance of Ordinary kriging with these four fitted models are then compared through 𝑘 -fold cross-validation and it is found that Ordinary Kriging performs better when the spatial dependency in average annual rainfall of Bangladesh is modeled through Gaussian semivariogram model.


Introduction
Bangladesh is an agro-economic country, most of the agricultural productions of the country largely depend on the annual amount of precipitation. In addition, the geographical location along with the low elevation above the sea level made Bangladesh vulnerable for climate change issues. Also, increasing population density in the country became an alarming issue to deal with given its limited natural resources. Moreover, optimal use of its natural resources is not possible by ignoring the natural calamities like flood, drawn etc. Therefore, to meet the demands it has become very urgent to ensure the maximum production of food. The production of various types of crops, especially rice is extremely dependent on adequate amount of rainfall (Rana et al., 2007), thus, estimation of rainfall is frequently required issue for agricultural planning, water resource management, hydrological and ecological modeling. Moreover, hydrologists often require point estimate of rainfall at unmeasured locations from the gauged rainfall at surrounded locations.
Besides traditional approaches geostatistical methods like Thiessen polygon, inverse distance weighting (IDW) and isohyetal methods (Thiessen, 1911;Shepard, 1968;McCuen, 1989) are now used as alternative approaches for rainfall prediction. Geostatistics mainly deals with the characterization of spatial attributes, employing primarily random models in a manner similar to the way in which time series analysis characterizes temporal data (Olea Ricardo, 1999). The concept of geostatistics was first coined by Danie Krige in 1950, basically from the mining and petroleum industries. In contrast to traditional statistics, geostatistics examine both the statistical distribution of the sample data as well as the spatial correlation among the sample data.
According to Goovaerts (1997), Geostatistics is based on the theory of regionalized variables and provides a set of statistical tools for incorporating the spatial correlation of observations in data processing. They utilized the spatial correlation between neighbouring observations to predict attributed values at unsampled locations (Goovaerts, 1999). A detailed discussion of the methods can be found in Webster & Oliver (2007). A number of authors (Tabios & Salas, 1985;Phillips et al., 1992;Dubrule, 1983;Burrough & McDonnell, 1998;Stein & Corsten, 1991;Laslett, 1994) have shown that geostatistical interpolation techniques (kriging) provide better estimates of rainfall than conventional methods. Also Goovaerts (2000); Lloyd (2005); Hutchinson (1981a,b) concluded that kriging method yields more realistic spatial behaviour of the climatological variable of interest.
The objective of this paper is to explore the semivariogram model among the Exponential, Spherical, Gaussian and Matern semivariogram models for which the Ordinary Kriging (OK) method best predicts the average annual rainfall of Bangladesh. The prediction performance of OK is evaluated through k-fold cross-validation technique.

Data
The study data have been collected from official website www.barc.gov.bd of Bangladesh Agriculture Research Council (BARC). The data were recorded by Bangladesh Water Development Board (BWDB) and uploaded by BARC. The BARC uploaded this data for helping in research works of different agricultural institutes under BARC. The data contain information about monthly average rainfall for 246 different rainfall gauge stations of Bangladesh. The locations of the rainfall gauge stations are shown in Figure 1. The Cartesian coordinates along with altitude for all the stations are also available in the data. Average annual rainfall is calculated by taking sum over the long term average of monthly rainfall. The straight line distances between all possible pairs of rainfall stations are calculated by using the Cartesian coordinates of the rainfall stations.

Methodology
Geostatistical approach to predict the precipitation is composed by a set of chained procedures that aims at choosing an inferential model that explicitly considers the spatial relationships present in the phenomenon. In general, the modeling process is preceded by the phases of exploratory analysis associated with visual presentation of the data in the form of graphs and maps and the identification of spatial dependency patterns in the phenomenon under study. The Ordinary Kriging is used in this study as a geostatistical approach for predicting the average annual rainfall. To accomplish this task, we first fit suitable theoretical semivariaogram model and then Ordinary Kriging is used to predict the rainfall.

Estimation of Sample Semivariogram
The semivariogram measures the spatial continuity for two dimensional spatial data. Assuming the considered stochastic process as stationary or intrinsic (Armstrong, 1998) the traditional experimental semivariogram is defined as, where the summation is over all distinct pairs of sites and in the sample that are distance ℎ apart and ℎ is the number of pairs those are ℎ distance apart.
The semivariogram (h) in equation (1) representing spatial covariance, usually computed by using the available sample data, is the key part of the kriging procedure. A semivariogram model takes ℎ, the distance between two sampled points as input, and gives an estimate of semivariogram (h) as output that explains the variance in the attribute over the distance between two points.
To estimate the sample semivariogram (h) in equation (1), the distances between all pairs of sample points are usually computed using any suitable measure of distance. Then zero to half of the maximum distance is divided into a set of discrete intervals. The width of each discrete interval ℎ must be large enough so that there are enough points of pair for estimation. A rule of thumb in computing sample semivariogram is to have at least 30 points of pair in each discrete interval. Finally, for each discrete interval, formula in equation (1) is used to compute the sample semivariogram.

Theoretical Semivariogram Model
A theoretical semivariogram model is important in order to compute the value (ℎ) of semivariogram at any given lag ℎ which is required for interpolation algorithms (Goovaerts, 1997). There are several possible theoretical semivariogram models in geostatistics. Among those models, we considered Exponential, Spherical, Gaussian and Matern models based on the similarity with the pattern of empirical semivariogram.

Exponential Model
The exponential model describes a semivariogram that approaches the sill gradually but never converges with the sill,

Spherical Model
The spherical model is a modified quadratic function that describes a semivariogram which actually reaches an asymptote at the sill,

Gaussian Model
The Gaussian model is similar to the exponential model, and describes a variogram that begins rising slowly from the -intercept, and then rises more rapidly toward the sill,

Matern Model
The Matern model is a family of semivariogram models that includes an additional parameter known as smoothness parameter in the model. Both the Exponential and Gaussian models are member of this family. The Matern model with nugget effect can be defined as, where, Γ is the gamma function and (. ) is the modified Bessel function of the second kind of order (Gelfand et al., 2010;Bowman, 1958). This model in equation (2) would be reduces to Exponential when its smoothness parameter = 0.5 and to Gaussian when → ∞. For the semivariogram models mentioned, 0 , and 0 represent nugget effect, sill variance and range respectively are the three parameters of a theoretical semivariogram model. These parameters are required to be estimated to fit such semivariogram models.
In literature, there are four methods usually used for estimating parameters of a theoretical semivariogram model (Cressie, 1993). These methods are visual, ordinary least square, weighted least square and maximum likelihood method. This study used weighted least square estimation technique to estimate the parameters. Cressie (1985) proposed the simplified form the weighted least square criterion as, where ̂(ℎ ) is the empirical semivariogram and (ℎ ; ) is the parametric semivariogram at lag ℎ , is the number of discrete intervals for which empirical semivariogram has been computed and summation is taken over all the discrete intervals. The three parameters nugget, sill and range are estimated in this study by minimizing the criterion in equation (3) in case of the theoretical models considered in this present study.

Ordinary Kriging
Formally krignig refers to a family of least squares linear regression algorithm that attempts to predict values of regionalized variable, or a phenomenon that is spread out in space, at locations where data for the variable is not available, based on the spatial correlation of the available data. In short, Kriging is an optimal interpolation method based on regression against observed values of surrounding data points, weighted according to spatial correlation values (Goovaerts, 1997). Among the several possible kriging methods only Ordinary Kriging is applied here for estimating the average annual rainfall.
The Ordinary Kriging assumes the value of the stochastic process at the unobserved locations as weighted sum of the values at observed locations. Suppose, values of the stochastic process at observed locations are ( 1 ), ( 2 ), … , ( ) then kriged value at an unobserved location 0 is where the coefficients 1 , 2 , … , are estimated so that the mean square prediction error ( ( 0 ) −̂( 0 )) 2 is minimum subject to the constraint that (̂( 0 )) = ( ( 0 )).

Assuming
is the semivariogram value for the difference between and locations. The solution obtained by the Lagrange multiplier method (Cressie, 1992 , where, * is a constant obtained along with the coefficients { } =1 , is the Lagrange multiplier. The mean square prediction error of ( 0 ) is,

Performance Assessment
In this paper, the prediction performance of different methods of fitting semivariogram models is assessed by cross-validation. Cross-validation is one of the processes used for choosing the best model among several competing candidates. It has achieved extensive popularity after the works of Stone (1974) and Geisser (1975). The main idea of cross-validation techniques is to omit a portion of the observed data for model building. The fitted model is then validated by predicting the omitted data through the prediction error which is usually calculated using observed and predicted values. A general description of cross-validation approaches has been given by Geisser (1975). However, in geostatistics, the cross-validation techniques can be used to choose the best variogram model for the spatial dependency (Cressie, 1993).
Though "leave-one-out" (LOO) is a classical approach of cross-validation, it has several limitations such as computational complexity and sensitiveness with the small change in data. That is why, this study utilized -fold cross-validation to overcome such limitations. In -fold cross-validation, firstly, the data are split into groups randomly. Secondly, the ℎ group is omitted from the model building and the values of response in the ℎ group are predicted, the process is iterated for all = 1,2, … , . To do this, oridnary kriging described in Section 3.3 is used. After the completion of folding algorithm, the following two measurements (Cressie, 1993) based on standardized prediction residuals have been computed to compare the competitive models considered in this study.
where is the number of locations where the value of is interpolated, ( ) is the observed value of at location , − ( ) is the kriged value of at location omitting ( ) from the model building and − ( ) is the standard error of the prediction at location . Cressie (1993) suggested that the in equation (4) should be closer 0 and in equation (5) should be closer to 1 if the model fit the data well.

Results and Discussion
In geostatistical approach the stochastic process under consideration is needed to be second-order stationary for accurate semivariogram model building. Second-order stationary in geostatistics means that 1 and 2 moments in the process are constant throughout the study region Othman et al. (2011). This stationarity assumption is checked by investigating whether any trend in the log of average annual rainfall exists throughout the study region. One of the ways to carry out such investigation is exploratory analysis. A surface plot of log average annual rainfall is drawn to investigate the presence of any trend and shown in Figure 2, where a clear evidence supporting spatial monotonic increasing trend in log average annual rainfall is observed. More precisely, rainfall is increasing towards the North-East direction of the country. Therefore, a generalized linear model (GLM) technique is used to remove the monotonic trend found in the average annual rainfall of Bangladesh. In GLM, we assumed that the log average annual rainfall is a function of and coordinates as in the equation (6), where represents log average annual rainfall, represents coordinates and represents coordinates. In the model (6), the random error is assumed to have two components that is = + , where is spatially correlated that is satisfy the assumptions and is the pure noise term. After fitting the model in equation (6), we investigated the stationarity assumptions in case of the component of the random error .
In Figure 3, we showed the surface plot for the residuals obtained after fitting the model (6) on original rainfall measurements. It is clearly evident that the monotonic behaviour in log average annual rainfall in Figure 2 is almost removed while we considered the residuals. Since, the residuals from model (6) are nearly trend free; hence, the geostatistical analysis is done on the residuals. Here, it is important to note that after predicting the residuals through geostatistical analysis; the results will be back transformed using equation (6) for obtaining the prediction for average annual rainfall. At the first phase of geostatistical analysis, empirical semivariogram according to the method described in Section 3.1 is estimated to model the spatial dependency of average annual rainfall. Figure 4 shows the empirical semivariogram estimated using equation (1). The pattern revealed through plotting the estimated values of (ℎ) in equation (1) against different values of ℎ suggests that any of the theoretical semivariogram models Exponential, Spherical, Gaussian or Matern can be appropriate for describing spatial dependency presence in average annual rainfall. These candidate models are then fitted using the weighted least square criteria in equation (3) and the fitted models are represented in Figure 5. The graph representing the fitted models in Figure 5 depicts that the Spherical and Matern models fit much closer to empirical semivariogram whereas Gaussian model seems to cover the pattern of the empirical semivariogram best. Among the four candidate models, it is also observed in Figure 5 that the Exponential model does not fit well as compared to the other models. The simplified weighted least square (WLS) criterion (Cressie, 1985) in equation (3) is utilized for estimating the parameters of the four theoretical semivariogram models. The estimated parameters obtained after fitting the four semivariogram models are presented in Table 1. Though the nugget ( 0 ) is found to be 0 in case of Exponential and Spherical models, non zero nuggets are found for the two other models. But, both of them seem to be too small which indicates that the average annual rainfall of Bangladesh at a given location has a small variation. In addition, the results by all the semivariogram models deduce that the underlying process vary finitely over the spatial domain of Bangladesh because we observed finite values for the sill variance ( ) in case of all the models. In addition, although the estimates of sill variance do not vary substantially for different models, the estimates of range parameter ( 0 ) vary substantially across semivariogram models. This leads to the conclusion that different semivariogram models consider spatial dependency up to different lags. Based on the estimates in Table 1, it can be interpreted that if spatial interpolation is done at unobserved locations, the Exponential model will assume that the average annual rainfall at two locations apart by more than 322.59 kilo meters are spatially independent. In case of Spherical, Gaussian and Matern models, these distances are 269.21, 229.71, and 279.77 kilo meters respectively. Ordinary Kriging (OK) is then applied using all the four fitted models for predicting the average annual rainfall of Bangladesh. Here, we predict the average annual rainfall for a (200×200) grids covering the range of and coordinates. The prediction of the average annual rainfall has been made by taking exponential of log rainfall found by the back transformation using equation (6) of the residuals that was obtained through Ordinary Kriging. Figure 6 shows the kriged (using OK) average annual rainfall for the four semivariogram models. The points in graphs show the sampled locations of the stations from where the average annual rainfall were observed. From the kriged values it is obvious that the OK with Exponential and Spherical models behaved in a similar manner. Although, the maps in Figure 6 revealed similar interpolated value of average annual rainfall according to the OK in case of different models, the pattern of contour is different in different maps. That is, prediction of the average annual rainfall at hot spots is different when different semivariogram models are considered. In Figure 2, of observed data we can see that the average annual rainfall is higher at Sylhet division (North Eastern stations) of the country. Our interpolation technique (OK) using all the models gives indication of similar rainfall as observed at the same region. Moreover, Ordinary Kriging tells us that the South Western gauge stations of Bangladesh experience low rainfall.
At the final phase of our analysis, we investigated the prediction performance of Ordinary Kriging (OK) to find the semivariogram model that fits spatial dependency of average annual rainfall of Bangladesh relatively well. The interpolation performance was assessed on the basis of the cross-validation technique mentioned in Section 3.4. Thefold cross-validation has been applied for both = 5 and = 10 (Arlot et al., 2010). The performance indicators in equations (4) and (5) are computed for OK with the four semivariogram models fitted earlier. The results obtained from cross-validation of semivariogram models with Ordinary Kriging are shown in Table 2. In both cases, the Ordinary Kriging with Gaussian model produces values closer to the standard values as described in Section 3.4 than those found in other scenarios. Further, from the sign in it is also notable that with the Exponential and Spherical models Ordinary Kriging underestimates the rainfall. On the other hand, with Gaussian and Matern model OK performed overestimation. Though, with the four considered semivariogram models, prediction through Ordinary Kriging does not differ substantially, both the indicators provide evidence supporting that the Gaussian model describe spatial dependency relatively well.

Conclusion
The main aims of this study were to model the spatial dependence pattern and to predict the average annual rainfall (measured in mm) of Bangladesh through Ordinary Kriging. Investigating the data, we found that a spatial trend is present in the rainfall from South-West to North-East of Bangladesh. Removing this trend, we have seen that Exponential, Spherical, Gaussian and Matern models captured the spatial dependency of average annual rainfall well. In addition, all four models suggest a finite spatial variation among the rainfall of Bangladesh. Specially, this study attempted to predict the average annual rainfall at ungauged locations and found that Ordinary Kriging with Gaussian semivariogram model best predict the average annual rainfall of Bangladesh. Thus maps produced using the models can therefore be helpful for agricultural and meteorological planning in Bangladesh.