A Multivariate Non-Parametric Hazard Model for Earthquake Occurrences in Turkey

This paper provides an introduction to multivariate non-parametric hazard model for the occurrence of earthquakes since the hazard function defines the statistical distribution of inter-event times. The method is applied to the Turkish seismicity since a significant portion of Turkey is subject to frequent earthquakes and presents several advantages compared to other more traditional approaches. Destructive earthquakes from 1903 to 2009 between the longitudes of (39-42)N◦ and the latitudes of (26-45)E◦ are used. The paper demonstrates how seismicity and tectonics/physics parameters that can potentially influence the spatio-temporal variability of earthquakes and presents several advantages compared to more traditional approaches.


Introduction
The statistical modelling of the spatio-temporal distribution of earthquakes is an indispensable tool for extracting information from the available data on the physics of the earthquake occurrence process, and for making reliable earthquake forecasts.There are several statistical methods that can reveal the precursory seismic activity based on earthquake catalogues (Vere-Jones, 1970;Shimazaki and Nakata, 1980;Nishenko, 1985;Boschi Gasperini and Mulargia, 1995;Ellsworth et al., 1998;Ogata, 1998;Kagan and Jackson, 2000;Stock and Smith, 2002;Posadas et al., 2002), but the results obtained seem, in some cases, contradictory.Apparently, the factors that contribute mainly to such differences are the magnitude threshold and the spatial scale investigated.A spatio-temporal clustering for mainshock-aftershock sequences is widely accepted, but it has been suggested that the statistical distribution of large earthquakes might be different (Faenza et al., 2003).The distribution of large earthquakes is studied only in the temporal domain, by selecting small seismic areas where the hypothesis of spatially homogeneous sampling probably holds.Unfortunately, these small areas do not usually contain a sufficient number of earthquakes to test adequately any hypothesis and model (Jackson and Kagan, 1993).For this reason, quite different distributions, such as Poisson (Kagan and Jackson, 1994), Poisson generalized (Kagan, 1991), Brownian passage time (Ellsworth et al., 1998), Weibull (Nishenko, 1985), lognormal (Nishenko and Buland, 1987;Michael and Jones, 1998) distributions, or competitive models, such as general seismic gap hypothesis (McCann et al., 1979), time-predictable model (Shimazaki and Nakata, 1980;Papazachos, 1992), clustering of earthquakes (Kagan and Jackson, 2000), are still commonly used.
Several studies have modelled earthquakes in Turkey as a Poisson process ( Özmen and Kocaefe, 1999;Kasap and Gürlen, 2003;Kalyoncuoglu, 2007).In those, the time between earthquakes is assumed to be exponentially distributed.This assumes that the probability of observing an earthquake at any given time is independent of both the elapsed time since the last earthquake and its severity.
The technique described here based on survival analysis and may account for any tectonics/physics factor that can potentially influence the distribution of the events, and test statistically their importance.This is accomplished by attaching at each inter-event time between earthquakes a vector of covariates, which describes the seismic events.The technique is robust because it considers all of the spatially inhomogeneous data simultaneously to build the statistical model; moreover, it allows use of the censoring times that carry out a certain amount of information on the process of earthquake occurrence and that are not usually considered in traditional approaches.At the same time, the technique is very flexible, being based on a small number of mild assumptions and does not require the knowledge of distribution (Faenza et al., 2003).
The aim of this paper is to give some insight on the spatio-temporal distribution of large earthquakes in Turkey, including 111 destructive earthquakes having magnitude M ≥ 5.0 between years 1903-2009.A multivariate non-parametric hazard model is used to present several advantages compared to traditional approaches and this model has not been used before for the earthquake analysis in Turkey.This paper is organized as follows.In Section 2, survival analysis and its key properties are introduced.In Section 3, sample data of the Turkish seismicity, determination of tectonics/physics parameters are included and the results obtained applying the technique to the M ≥ 5.0 earthquakes that occurred in past century are reported.Finally, in Section 4, some future research perspectives are discussed.

Introduction
Survival analysis is a class of statistical methods for studying the occurrence and timing of events in both social and natural sciences.Survival time is defined as the time to the occurrence of a given event.This event can be the death, development of a disease, response to a treatment, equipment failures, births, marriages, divorces, promotions, retirements, earthquakes and so forth.Let T represents the survival time being in a given state or the time between two events.When subjects have not experienced the event of interest at the end of the study, the exact survival times of these subjects are unknown and these are called censored observations or censored times.In this study, the occurrence of an earthquake is defined as an event.The non-occurrence of an earthquake by the end of the study is referred to a censored observation, i.e. the most recent earthquake in each neotectonic zone represents a censored observation.Such data observations can be modeled as censored data and analyzed by survival models.
The distribution of survival time is characterized by probability density function f (t), hazard function h(t) or survival function S(t).S(t) gives the probability that failure will occur after time t and is given by Many parametric (exponential, Weibull, log-logistic, log-normal, etc.) and non-parametric (Kaplan and Meier, 1958) approximations are used to estimate S(t).Kaplan-Meier estimator is more often used and given by is the ordered failure time; n j is the number of individuals still at risk at time t j , n j − d j is the number of individuals surviving longer than t j .Hazard function is defined by for t > 0 and represents the probability that an individual alive at t experiences the event in the next period ∆t.

Proportional Hazard Model
The well-known model within survival models is proportional hazard model (PHM) which used to investigate the relation between the survival time and some covariates.Although it is based on proportional hazard assumption, no particular form of the probability distribution is assumed for the survival times.Therefore it is called as a semi-parametric model.
The data based on a sample of size n, consists of (t i , d i , x i ), i = 1, • • • , n, where t i is the time on study for the i th individual, d i is the event indicator (d i = 1 if the event has occurred and d i = 0 if the lifetime is censored) and x i is the vector of covariates for the i th individual.Hazard function for PHM is given by where h 0 (t) is the baseline hazard function and β is a p × 1 vector of unknown parameters.
The ordered earthquake occurrence times are denoted by and the set of places which are at risk at time t j are denoted by R(t j ), so that R(t j ) is the set of places which are uncensored at a time just prior to t j .Then, the likelihood for PHM is given by , where x j is the vector of covariates for the place where the earthquake occurs at the j th ordered earthquake occurrence time, t j .
Survival times are usually recorded to the nearest day, month or year and so tied survival times can arise.In order to accommodate the tied observations, Breslow (1974) proposed an approximation to the likelihood function, (2.1) In (2.1), the s j is the vector of sums of each of the p covariates for those individuals who die at the j th earthquake occurrence, t (j) , j = 1, • • • , r.If there are d j earthquake occurrence at t (j) , the h th element of s j is s hj = Parameter estimation is obtained by the Newton-Raphson procedure in PHM.Let u(β) be the p × 1 vector of first derivatives of the log-likelihood function with respect to the β parameters.Let I(β) be the p × p matrix of negative second derivatives of the log-likelihood, so that the (j, r) th element of I(β) is An estimate of the vector of β parameters at the (s + 1) th cycle of the iterative procedure, βs+1 , is βs+1 = βs + I −1 ( βs )u( βs ), (2.3) for s = 0, 1, 2, • • • , where u( βs ) is the vector of efficient scores and I −1 ( βs ) is the inverse of the information matrix, both evaluated at βs .The process is started by taking β0 = 0 and terminated when the change in the log-likelihood function is sufficiently small, or when the largest of the relative changes in the values of the parameter estimates is sufficiently small.When the iterative procedure is converged, the variance-covariance matrix of the parameter estimates is approximated by the inverse of the information matrix, evaluated at β, that is, If the number of covariates is relatively large, the number of possible models that need to be fitted is computationally expensive.In this situation, automatic routines based on forward selection, backward elimination or stepwise procedures for variable selection are used.
In survival analysis, comparisons between a number of possible model is made on AIC (Akaike's information criteria), AIC = −2 log L + αq where q is the unknown β parameters in the model and α is a predetermined constant.The values of α between 2 and 6 can be used in computing the value of statistic.The choice for α = 3 is equivalent to using a 5% significance level in judging the difference between the values of −2 log L for the two nested models which differ by between one and three parameters.

Data Set and Tectonics/Physics Parameters
Turkey and its surrounding area are known as an excellent natural laboratory to study postcollisional intracontinental convergence and tectonic escaperelated deformation, and the consequent structures that include fold and thrust belts, suture zones, active strike-slip faulting, and active normal faulting and the associated basin formation (Kalyoncuoglu, 2007).It is located on a highly active Eurasian Plate which has caused numerous large scale earthquakes throughout history.A significant portion of Turkey is subject to frequent earthquakes, most significantly from the North Anatolian Fault Zone (NAFZ), which stretches across the country and is responsible for many of Turkey's largest historical earthquakes (S ¸engör et al., 2004).Other main sources of seismic activity in Turkey are, East Anatolian Fault Zone (EAFZ) and West Anatolian Graben Complex (Erdik, 2001).To better understand the neotectonic features and active tectonics of Turkey, the simplified tectonic map of Turkey is given in Figure 1.Turkish earthquake catalogue is examined in this study and the investigated area is presented in Figure 2.This data are obtained from an earthquake monitor which is based on real-time earthquake list of Kandilli Observatory for Turkey.In this area, 111 earthquakes with magnitude (M) 5 or larger have occurred in the past 106 years.The examined area is located between between the longitudes of 36 • 42N and the latitudes of 26 • 45E.As seen in Figure 3, the whole country is divided into the five zones.Zone I has the highest level of seismic hazard and the majority of Turkey is in Zone I. Turkey's economy is becoming more dependent on industry in major cities and these major cities are in Zone II.
Survival model for Turkish earthquake catalogue would be useful to predict earthquake probabilities.In this study, PHM is applied to destructive earthquakes in Turkey.Seismic hazard analysis requires assessment of earthquake source parameters in order to estimate the seismic source potential.Our model deals with two kinds of random variable, namely the inter-event time (IET) between two consecutive earthquakes, and the censoring time (CT), i.e. the time interval between the present time and the last time earthquake occurred.At each one of these random variables a vector of covariates is attached that carries out any kind of information relative to the IET or CT considered.
In this study, at each inter-event time (IET) and censoring time (CT), a covariate vector is attached with components: neotectonic subdivision of Turkey (N), transformed energy of earthquake (TE), depth of earthquake (D), number of foreshocks (FS), number of aftershocks (AS), magnitude of the earthquake (M), Mercalli scale of intensity (I), calculation of acceleration values that will effect the construction (AV) and fault system (FS).It is now worthwhile to overview these covariates that are supposed to be statistically significant.
• Neotectonic Subdivision (N): As seen in Figure 1, Turkey and adjacent areas are divided into four main neotectonic domains: area of extensional neotectonic regime, area of strike-slip neotectonic regime with normal component, area of strike-slip neotectonic regime with thrust component and area of contractional neotectonic regime.
• Transformed Energy (TE): The energy of an event is chosen as measure of its strength.The formula of Gutenberg ( 1956) is used to calculate the energy of an earthquake (Jarkov, 1983) log E s = 11.8 + 1.5M.
To reduce, the very broad variability range of the energy released in an earthquake, this quantity is transformed through the formula where E max and E min are respectively the upper and lower bound of the observed values, and E is the energy of the i th event (Gospodinov and Rotondi, 2001) • Depth (D): Earthquakes occurring at a depth of less than 70 km are classified as 'shallow-focus' earthquakes, while those with a focal-depth between 70 and 300 km are commonly termed 'mid-focus' or 'intermediate-depth' earthquakes.In fact, the great majority of earthquake in Turkey are shallow.Shallow earthquakes often tend to cause serious damage.That is, the deeper an earthquake is, the less effect it has in the surface of the Earth.
• Foreshock (FS) and Aftershock (AS): To detect the significant seismic quiescence, it is necessary to decluster the earthquake catalogue.For this process, it has been made use of the algorithm introduced by Reasenberg (1985).It removes all the dependent events from each cluster, and substitutes them with a unique event.The foreshocks are searched with M ≥ 3.0 within a distance of 50 km and five days from each earthquake with M ≥ 5.0.The aftershocks of each earthquake with magnitude M ≥ 4.0 within one month and shallower than 40 km are used.
• Magnitude (M): The great majority of earthquake in Turkey are shallow and surface wave magnitude is generally used for shallow earthquakes.Surface wave magnitude is estimated using the formula M = log(A/t) + 1.66 log ∆ + 3.3, where A is the maximum amplitude (in micrometers) of the Rayleigh waves, T is the period (usually about 20 seconds) and ∆ is the distance (in degrees).
• Mercalli Scale of Intensity (I): Intensity measures the strength of shaking produced by the earthquake at a certain location.The Mercalli intensity scale is a scale used for measuring the intensity of an earthquake.The scale quantifies the effects of an earthquake on the Earth's surface, humans, objects of nature, and man-made structures on a scale of I through XII, with I denoting not felt, and XII total destruction.
• Fault System (FS): Fault is a planar or gently curved fracture in the Earth's crust across which there has been relative displacement.There are six kinds of fault systems in Turkey that are strike-slip fault, thrust fault with strike-slip component, subduction zone, suture zone, oblique-slip normal fault, strike-slip fault with normal component as seen in Figure 1.Since subduction zone did not produce earthquakes with M ≥ 5.0 from 1903 to 2009, five fault zones are used.

The Results of Survival Analysis
A probabilistic seismic hazard and methodology is applied to Turkish seismicity in order to determine earthquake occurrences.The collective median survival time for all earthquakes is estimated at 1.40 years.Considering each neotectonic subgroup, median survival time and the number of earthquakes are presented in Table 1.Table 1 shows that most earthquakes have been occurred in Zone I and II.It is also found that Zone IV had the longest median survival (24.80 years), while Zone II had the shortest survival.In the non-parametric approach to seismicity, the estimates of Kaplan-Meier survival probability are presented in Table 2.
As shown in Table 2, the risk of the earthquake occurrence after the preceding one is 0.29 in six months, 0.42 in one year, 0.83 in five years and 0.99 in 25 years, respectively.Kaplan-Meier method provides useful graphical presentation of survival distribution as well as estimates of survival probabilities.Kaplan-Meier estimate of the survival function is plotted in Figure 4.  Considering the four neotectonic zones, the estimates of Kaplan-Meier survival probability are summarized in Table 3.As expected, Zone II shows the highest recurrence probability for an earthquake within six months.Since the major industry centers are located in Zone II, this result is important for Turkey.The recurrence probability of an earthquake within 1.5 years is 0.5593 in Zone I; 0.6681 in Zone II; 0.375 in Zone III, respectively.On the other hand, the survival probability of Zone IV differs from other zones because of different seismic characteristics.An earthquake will be occurred in twenty five years with the probability 0.5.
Kaplan-Meier curves for each zone in Figure 5 also indicate that Zone IV has a better survival curve than other zones.Moreover, as the number of years increases, the curves appear to get farther apart.However, the weakness of this approach is that it does not provide a comparison of the total survival experience of the four groups, but rather gives a comparison at some arbitrary time point (s).For this reason, the survival functions of the neotectonic zones are also  4. As seen in Table 4, PHM is statistically significant at 95% confidence level.By using equations (2.1)-(2.3),we obtain β M = −0.69± 0.17 and β TE = 3.2 ± 0.1.The results show that the models containing magnitude (M) and transformed energy (TE) are found as important covariates which effect the occurrence of earthquakes whereas others are not significant at a 95% confidence level.
The estimated hazard of magnitude is exp(−0.68) .= 0.50; that is, for a 1-unit decrease in magnitude, the risk for occurrence of earthquake increases a 0.5unit.For transformed energy, the value of exp(3.20)corresponds to the change in hazard for an increase of a unit in the log scale.Thus, the estimated hazard increases 24.5 times if the transformed energy is higher by a factor of 10.

Discussion
United Nations Development Programme (UNDP) announces Turkey as the third country after Iran and Yemen according to the number of deaths as a result of earthquakes.Turkey is located on a highly active Eurasian Geological Plate and there have been 111 earthquakes with magnitudes 5.0 or greater.Sixteen earthquakes with casualties more than 1.000 have occurred since 1903.For this reason, a research on earthquake occurrences in Turkey can be informative for understanding the seismicity of the investigated area.
In this paper, a statistical analysis of the spatio-temporal distribution of Turkish seismicity is performed.A nonparametric multivariate statistical model is used for the first time that is able to account for seismological/geological parameters simultaneously.The method has been applied to Turkey by using a regionalization based on tectonic parameters for the period 1903-2009.Kaplan-Meier estimations provide that neotectonic Zones I and II are more risky relative to neotectonic Zones III and IV.The majority of Turkey is located in Zone I and Zone II is one of Turkey's most industrialized regions, home to much of Turkey's heavy industry, including petrochemical plants, car manufacturers, tire companies, paper mills, steel fabrication plants, cement plants and pharmaceutical firms.The results of PHM suggest that magnitude and transformed energy of earthquake are the important factors that influence the occurrence of earthquake.

Figure 1 :
Figure 1: Simplified map showing the neotectonic subdivision of Turkey and adjacent areas (Koçyigit and Özacar, 2003)

Figure 5 :
Figure 5: The graph of cumulative survival probabilities for neotectonic zones

Table 1 :
The median survival time for each neotectonic zone

Table 2 :
Kaplan-Meier estimations of survival probability

Table 4 :
Partial likelihood inference on Turkish seismicity data from PHM