On the spatial correlation between areas of high coseismic slip and aftershock clusters of the Mw 8.8 Maule earthquake

We study the spatial distribution of clusters associated with the aftershocks of the Mw 8.8 Maule megathrust earthquake of 27 February 2010. We use a recent clustering method which hinges on a nonparametric estimate of the underlying probability density function to detect subsets of points forming clusters associated with high-density areas. In addition, we estimate the probability density function of each of these clusters using a nonparametric kernel method. This allows us to identify a set of regions where frequency of events and coseismic slip are associated. Our results suggest that high coseismic slip spatially correlates with high aftershock frequency.


Introduction
The most recent large Chilean earthquake occurred on 27 February 2010 (Mw = 8.8) and propagated northward and southward, achieving a rupture length of about 450 km (35–37°S; e.g., Lay et al., 2010). The earthquake filled a seismic gap (Ruegg et al., 2009) that had experienced little seismic activity since 1835, when it broke with an estimated magnitude of Mw ∼8.5 (Darwin, 1851). Previous work showed that the two regions of high coseismic slip of the Maule earthquake appeared to be highly locked before the earthquake. Subsequent geodetic studies have established that the main coseismic slip patch (>15 m) is located in the northern part of the rupture area, with a secondary concentration of slip to the south (5–12 m) (e.g., Lay et al., 2010; Delouis et al., 2010; Lorito et al., 2011; Vigny et al., 2011; Moreno et al., 2012). Lorito et al. (2011) concluded that increased stress on the unbroken southern patch may have increased the probability of another great earthquake there in the near future, although their model has poor resolution in this area. In addition, Moreno et al. (2012) suggested that coseismic slip heterogeneity at the scale of single asperities appears to indicate seismic potential for future great earthquakes. These studies are not limited to geodetic data; seismic and tsunami data have been used as well.
The aim of the present work is to examine the spatial correlation between areas of high coseismic slip and aftershock frequency (seismic clusters). To achieve this, we identify the distribution of clusters in the rupture area of the 2010 earthquake. The hypocenter data are taken from the SSN (Servicio Sismologico Nacional de Chile), and we use the coseismic model of Moreno et al. (2012), which includes all available geodetic data for the Maule earthquake. We adopt the nonparametric clustering (NPC) formulation of Azzalini & Torelli (2007), which has the advantage of not requiring subjective inputs such as the number of clusters. We apply this methodology to the SSN aftershock catalogue of the Maule earthquake and focus on the area between Valparaíso and Tirúa (33–38.5°S), aiming at the identification of seismic clusters and their spatial relationship with regions of coseismic slip. Finally, we discuss possible improvements of the adopted methodology in terms of available geophysical data.

Methodology
Applications of clustering techniques range over an enormous set of disciplines, both in the natural sciences (see e.g., Contreras-Reyes & Arellano-Valle, 2012) and in the social sciences (see e.g., Azzalini & Torelli, 2007; Menardi, 2010). A standard account is the book of Kaufman & Rousseeuw (1990), which focuses mainly on the more classical methods, based on the notion of distance between objects. An alternative, more recent approach is the model-based clustering formulation, which regards the observed data as generated by a mixture of multivariate random variables whose distributions belong to some parametric family. More specifically, for any point x in the multivariate set of possible observations, we associate a probability density function f(x) and assume that a decomposition of the type

f(x) = Σ_{j=1}^{J} p_j f_j(x)

exists, where the p_j's are mixing probabilities and f_j(x) denotes the j-th component of the mixture, which also corresponds to the j-th cluster of points, for j = 1, ..., J. In the implementation of this approach, the most common option is to adopt the multivariate Gaussian assumption for each of the components f_j(x) and estimate their parameters using an EM-type algorithm. An especially useful account of this approach is provided by Dasgupta & Raftery (1998). By applying the model-based clustering approach to earthquake data in the coastal area of central California, Dasgupta & Raftery (1998) obtained six clusters, some of which are clearly linked to active faults, such as the San Andreas, Calaveras, and Hayward faults; however, one or two of their clusters do not correspond to any already identified area.
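As an illustration of the model-based formulation, the following sketch (Python with synthetic data, not part of the original analysis) fits a two-component univariate Gaussian mixture by EM and assigns each point to the component with the highest posterior probability; the function name and initialisation are our own choices:

```python
import numpy as np

def em_gmm_1d(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns the mixing probabilities p_j, component means and standard
    deviations, and hard cluster labels (argmax of the posterior), i.e.
    the model-based clustering rule described in the text.
    """
    # crude initialisation: place the two components at the quartiles
    mu = np.array([np.quantile(x, 0.25), np.quantile(x, 0.75)])
    sd = np.array([x.std(), x.std()])
    p = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.stack([
            p[j] * np.exp(-0.5 * ((x - mu[j]) / sd[j]) ** 2)
            / (sd[j] * np.sqrt(2 * np.pi))
            for j in range(2)
        ])
        resp = dens / dens.sum(axis=0)
        # M-step: update mixing probabilities, means and variances
        nj = resp.sum(axis=1)
        p = nj / len(x)
        mu = (resp * x).sum(axis=1) / nj
        sd = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nj)
    return p, mu, sd, resp.argmax(axis=0)
```

With well-separated components the hard labels recover the two groups; in the multivariate Gaussian case the same scheme is applied with mean vectors and covariance matrices.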
In a further approach to the clustering problem, the notion of an underlying density function f(x) is retained, but the assumption that f(x) is a mixture of components is removed, and with it the parametric assumptions on the components. This approach is therefore based on the construction of a nonparametric estimate f̂(x) of f(x), and on the association of a cluster with each of the observed modes of the density. Several alternative formulations of this approach exist (see e.g., Botev et al., 2010; Zhu, 2013), but we concentrate on the kernel-type estimate, which is the most popular and conceptually very intuitive (Bowman & Azzalini, 1997). In the univariate case, the kernel estimate is defined by

f̂(x) = (1/n) Σ_{i=1}^{n} ϕ(x − x_i; h),

where ϕ(z; h) denotes the normal density function with mean 0 and standard deviation h evaluated at point z. The normal density is adopted for simplicity and could be replaced by another density symmetric about 0 without much effect on the outcome. A much more important role is played by h, which in this context is called the 'smoothing parameter'. In the d-dimensional case, ϕ(z; h) is replaced by a multivariate density with a zero mean vector; the simplest and most popular choice is the product of d such terms, with a different smoothing parameter for each of the d components.
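The univariate kernel estimate above can be computed directly; a minimal sketch (Python, hypothetical data and bandwidth) is:

```python
import numpy as np

def kernel_density(x_eval, data, h):
    """Univariate kernel estimate f_hat(x) = (1/n) * sum_i phi(x - x_i; h),
    where phi is the normal density with mean 0 and standard deviation h."""
    z = (np.asarray(x_eval)[:, None] - np.asarray(data)[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
```

In the bivariate case used in this work (latitude, longitude), the product form simply multiplies two such kernels, one per coordinate, each with its own bandwidth.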
Clusters are then formed by the data points associated with the modes of f(x), and the clusters are separated by regions of low point density. This procedure is referred to as nonparametric clustering (NPC).

Nonparametric clustering
We briefly describe the NPC method proposed by Azzalini & Torelli (2007). It assumes that the available set of d-dimensional observations S = {x_1, ..., x_n} represents a set of points drawn from a continuous multivariate random variable having an unknown probability density function f(x), x ∈ R^d. In the case we are concerned with, x = (latitude, longitude) denotes a geographical position, hence d = 2; the data points x_1, ..., x_n represent the positions of the observed seismic events. For any given constant α such that 0 ≤ α ≤ max_x f(x), consider the high-density region defined by

R_α = {x : f(x) ≥ α},

which has an associated probability p_α = ∫_{R_α} f(x) dx. The region R_α is, in general, formed by a number m of connected sets, where m ∈ {1, 2, ...}. Now let α move along its range. This causes both m and p = p(α) to move accordingly; since p is monotonic with respect to α, we can regard m as a function of p, written m(p). Note that m(p) is, in contrast, not a monotonic function. With the additional conventional settings m(0) = m(1) = 0, it can be shown that the total number of increments of the step function m(p) is equal to the number M of modes of f(x), hence to the number of clusters, in the sense defined earlier. As α varies along its range, and p(α) with it, the corresponding connected components of R_α form a hierarchical tree structure.
Translating the above idea into a working methodology requires some additional specifications and algorithmic work. We sketch here the main steps of the procedure; full details of the method are given by Azzalini & Torelli (2007), and its implementation in the language R is provided by Azzalini et al. (2011). This procedure will later be referred to as the 'pdfCluster method'.
1. A nonparametric estimate f̂(x) of the density is obtained from the observed sample S. Any sort of nonparametric estimate can be employed, but in practice both Azzalini & Torelli (2007) and pdfCluster adopt a kernel-type estimate. This involves choosing a set of kernel bandwidths, one for each component of x. When the target is the estimation of the density itself, the outcome depends critically on the choice of the bandwidth. However, in the present context, where f̂(x) is only an intermediate step toward the final target of clustering, the bandwidth is much less influential and can be varied over a quite wide range without affecting the end result.
2. For any α in the range 0 ≤ α ≤ max_i f̂(x_i), the sample analogue of R_α is considered, given by

S_α = {x_i : f̂(x_i) ≥ α}.

The notion of connected subsets of S_α is introduced via the Delaunay triangulation, which is built using computational geometry tools. This establishes a certain set of line segments joining some of the data points in such a way that, for any two data points, there always exists a path joining them by a sequence of these segments. If x_r and x_s are both points in S_α and the path joining them does not include any point outside S_α, then we say that x_r and x_s are connected at the given choice of α.
3. The above step is replicated for a grid of values spanning the admissible range of α, from 0 to max_i f̂(x_i). This generates the mode function m(p) and a tree structure of the modes. Notice, however, that the Delaunay triangulation needs to be determined only once for all α's.
4. At the end of the earlier steps we have obtained a tree structure of the modes of f̂(x). Moreover, for each of the M modes, we have allocated some elements of the sample S to the given mode. These M subsets of data points form the 'cluster cores', to which the remaining points, not belonging to any cluster core, must be aggregated. For each unallocated point x_0, we must select one of the distributions f̂_1(x), ..., f̂_M(x), which represent the estimated densities of the cluster cores. This allocation is most naturally based on the likelihood ratio; that is, we allocate x_0 to the j-th cluster core for which the ratio

f̂_j(x_0) / max_{k ≠ j} f̂_k(x_0)

is highest. In practice, there are some variant options, depending on whether the f̂_j(x) densities of the cluster cores are updated at each new allocation or not, or according to some intermediate strategy.
The final outcome of the clustering process is a partition of the sample S into a set of clusters, say C_1, ..., C_M.
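The level-set and connectivity machinery of steps 1-3 can be sketched compactly. The following Python fragment is an illustrative reimplementation under simplifying assumptions (a product Gaussian kernel with a single common bandwidth, connectivity via scipy's Delaunay triangulation, and a simple union-find); it is not the pdfCluster code:

```python
import numpy as np
from scipy.spatial import Delaunay

def product_kernel_density(points, h):
    """Product Gaussian kernel estimate of the bivariate density,
    evaluated at the observed points themselves, with common bandwidth h."""
    d = points[:, None, :] - points[None, :, :]
    k = np.exp(-0.5 * (d / h) ** 2).prod(axis=2)
    return k.sum(axis=1) / (len(points) * 2 * np.pi * h ** 2)

def connected_at_level(points, fhat, alpha):
    """Connected components of S_alpha = {x_i : fhat(x_i) >= alpha},
    where two retained points are connected if a Delaunay edge joins them
    and both endpoints lie in S_alpha (step 2 of the procedure)."""
    keep = fhat >= alpha
    tri = Delaunay(points)
    parent = {i: i for i in np.where(keep)[0]}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # union the endpoints of every triangulation edge inside S_alpha
    for simplex in tri.simplices:
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = int(simplex[a]), int(simplex[b])
                if keep[i] and keep[j]:
                    parent[find(i)] = find(j)
    roots = {i: find(i) for i in parent}
    return len(set(roots.values())), roots
```

Sweeping alpha over a grid and recording the number of components at each level yields the mode function m(p); at alpha = 0 all points form a single component, while at higher levels well-separated high-density cores appear as distinct components.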

Density-based silhouette diagnostic
In cluster analysis, the term 'silhouette' refers to a diagnostic tool for validating the outcome of a clustering process (Rousseeuw, 1987). This technique provides a graphical representation of how well each object lies within the cluster to which it has been allocated; hence it indicates how appropriately the data have been clustered. The idea arises from comparing the distance of each observation to the cluster to which it has been allocated with a measure of separation from the closest alternative cluster.
Since the original silhouette is based on a distance measure between the observations, it is not adequate for NPC methods, where distances do not play an explicit role. An adaptation of the silhouette idea to density-based clustering methods has been proposed by Menardi (2010). The method, called the density-based silhouette (DBS) diagnostic, is based on the posterior probabilities

p_j(x_i) = π̂_j f̂_j(x_i) / Σ_{k=1}^{M} π̂_k f̂_k(x_i),

where π̂_j plays the role of the prior probability of cluster C_j; in practice it is taken to be the proportion of points allocated to C_j. The DBS index of observation x_i is

dbs(x_i) = log( p_{j_0}(x_i) / p_{j_1}(x_i) ) / max_r log( p_{j_0}(x_r) / p_{j_1}(x_r) ),

where j_0 denotes the cluster to which x_i has been allocated, and j_1 refers to the alternative cluster index for which p_j(x_i) is maximum, j ≠ j_0. In our case, after partitioning the SSN data with the pdfCluster method, we applied the DBS diagnostic to assess the quality of the outcome.
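A minimal sketch of this index, assuming the posterior probabilities p_j(x_i) have already been computed and the allocated cluster is the one with the largest posterior (sign conventions in Menardi (2010) differ slightly for misallocated points):

```python
import numpy as np

def dbs_index(post):
    """Density-based silhouette from an (n x M) matrix of posterior
    probabilities p_j(x_i): for each point, the log of the ratio between
    the posterior of the allocated cluster and that of the best
    alternative, normalised by the maximum log-ratio over the sample."""
    post = np.asarray(post, dtype=float)
    order = np.argsort(post, axis=1)
    p0 = post[np.arange(len(post)), order[:, -1]]  # allocated cluster j0
    p1 = post[np.arange(len(post)), order[:, -2]]  # best alternative j1
    logratio = np.log(p0 / p1)
    return logratio / logratio.max()
```

Values near 1 indicate points that sit firmly inside their cluster; values near 0 indicate points lying on the boundary between two clusters, which is the pattern reported for the Maule aftershock data.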

Temporal analysis
We briefly describe four indexes used to assess the consistency of the NPC method across time. These indexes measure the accuracy of a forecast in predicting the correct category, relative to random chance. In Table 1, n(F_i, O_j) denotes the number of forecasts in category i that had observations in category j, N(F_i) denotes the total number of forecasts in category i, N(O_j) denotes the total number of observations in category j, N is the total number of forecasts, and i, j = 1, ..., 5 are the indexes of the five identified clusters.
The index NSS corresponds to a normalized skill score with range [0, 1], such that 0 indicates no skill and 1 indicates a perfect score. The index HSS corresponds to the Heidke skill score (Brier & Allen, 1952), with range (−∞, 1], where 0 indicates no skill and 1 indicates a perfect score:

HSS = [ Σ_i n(F_i, O_i)/N − Σ_i N(F_i)N(O_i)/N² ] / [ 1 − Σ_i N(F_i)N(O_i)/N² ].

The index HK corresponds to the Hanssen & Kuipers discriminant (Hanssen & Kuipers, 1965), with range [−1, 1], such that 0 indicates no skill and 1 indicates a perfect score. It is similar to the Heidke skill score, except that the fraction of correct forecasts due to random chance in the denominator is that of an unbiased forecast:

HK = [ Σ_i n(F_i, O_i)/N − Σ_i N(F_i)N(O_i)/N² ] / [ 1 − Σ_i N(O_i)²/N² ].

Hubert & Arabie (1985) noticed that the Rand index is not corrected for chance: its expectation is not zero for random partitions having the same number of objects in each class. They introduced the corrected (adjusted) Rand index, whose expectation under random labelling is equal to zero. The adjusted Rand index is based on three values: the number r of pairs jointly grouped in both O and F, its expected value E(r), and its maximum value max(r) among partitions of the same type as O and F. This leads to the formula

HA = ( r − E(r) ) / ( max(r) − E(r) ).
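The first three categorical scores can be computed from the contingency table of Table 1 as follows; this is an illustrative sketch using the standard multicategory definitions of the Heidke and Hanssen-Kuipers scores, with a hypothetical two-category table in the example:

```python
import numpy as np

def skill_scores(table):
    """Multicategory skill scores from a contingency table with
    table[i, j] = n(F_i, O_j). Returns the proportion correct (PC),
    the Heidke skill score (HSS) and the Hanssen-Kuipers discriminant (HK)."""
    t = np.asarray(table, dtype=float)
    N = t.sum()
    pc = np.trace(t) / N               # proportion of correct forecasts
    pf = t.sum(axis=1) / N             # forecast marginals N(F_i)/N
    po = t.sum(axis=0) / N             # observed marginals N(O_j)/N
    e = (pf * po).sum()                # expected hit rate by random chance
    hss = (pc - e) / (1 - e)           # Heidke skill score
    hk = (pc - e) / (1 - (po ** 2).sum())  # Hanssen & Kuipers discriminant
    return pc, hss, hk
```

For a perfect (diagonal) table all scores reach 1; for a forecast no better than chance, HSS and HK are near 0.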

Numerical Analysis
Our numerical work is based on data extracted from the SSN catalogue, available at http://ssn.dgf.uchile.cl/. For the numerical processing, we used the R computing environment (R Development Core Team, 2012). Most of the work was done using the R package pdfCluster by Azzalini et al. (2011); this package comprises the functions pdfCluster for the NPC method, dbs for the DBS diagnostic, and kepdf for kernel density estimation. In addition, R provides the kruskal.test function for a nonparametric analysis of variance and the wilcox.test function for individual two-sample tests. Figure 1 displays the geographical map of the area of interest, with points denoting the locations of the events. These points have been clustered using the pdfCluster method described in Section 2, leading to the different coloring. The bottom portion of the figure shows the cluster tree and the silhouette diagnostic. These indicate a lack of clear separation among clusters, which is not surprising given the close geographical proximity of the clouds of points visible in the top panel of the figure.

Clustering process
In the second stage of our numerical work, we introduced two variants. One was to consider only the areas where slip did take place in a non-negligible form, since our aim was precisely to examine the implications of the slip model. In addition, some numerical exploration indicated that a log-transformation of the quantities, slip and density function, leads to a more meaningful outcome. Since for a number of points the slip value is 0, we adopted the modification commonly used in this case of adding a small positive quantity, that is, working with log(k + slip), where k is some small quantity. Since slip is measured in meters, we adopted k = 0.01, which represents a perturbation of only 1 cm of the original data. In the following, the term log-slip will be used to refer to log(0.01 + slip). Furthermore, the association between slip and event density can be examined in two different ways. One is to choose a regular grid of points in the region of interest and evaluate these variables, or their log-transforms, over this grid. The other option is to evaluate these variables at the observed locations of the seismic events. Figure 2 refers to the first form, for three choices of the geographical area over which the grid of points is constructed. More precisely, the sets of points for which the computations have been performed were obtained as the intersection of rectangular grids of sizes (192 × 214), (149 × 214), and (119 × 214) with the three regions shown on the left side of Figure 2, of different geographical size. This process led to three non-rectangular grids, comprising 19630, 10696, and 4812 points, respectively. The area covered by the first grid includes the largest number of points associated with the seismic events of the data set, while the last one refers to the area with the highest concentration of events. The right side of the figure displays the scatter plots of log-slip and log-density of the points, separately for each cluster.
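The grid-based form of the comparison can be sketched as follows, with entirely synthetic data standing in for the SSN catalogue and the Moreno et al. (2012) slip model (the event positions, slip surface, and region are hypothetical; the k = 0.01 m offset matches the text):

```python
import numpy as np
from scipy.stats import gaussian_kde, spearmanr

# synthetic events clustered around an assumed slip patch
rng = np.random.default_rng(0)
events = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2))
kde = gaussian_kde(events.T)                       # bivariate kernel estimate

# regular grid over the (synthetic) region of interest
gx, gy = np.meshgrid(np.linspace(-2, 2, 40), np.linspace(-2, 2, 40))
grid = np.vstack([gx.ravel(), gy.ravel()])

# synthetic slip surface peaking at the same patch (max 16.6 m)
slip = 16.6 * np.exp(-(grid[0] ** 2 + grid[1] ** 2))
log_slip = np.log(0.01 + slip)                     # k = 0.01 m, as in the text
log_dens = np.log(kde(grid))

# rank correlation between log-slip and log-density over the grid
rho, pval = spearmanr(log_slip, log_dens)
```

A scatter plot of log_slip against log_dens over the grid reproduces the kind of (possibly non-linear) association examined in Figure 2; the rank correlation quantifies it without assuming linearity.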
Only the clusters labelled No. 1, 2, 3, 6, and 8 are considered here, since the other clusters of Figure 3 have been dropped, for the reason explained earlier. Even if the selection of the three grids is somewhat subjective, the overall indication provided by Figure 2 gives convincing evidence of an association between the variables under consideration, specifically for the clusters labelled No. 1, 2, and 6. This association becomes increasingly marked as we move down from the first to the last grid, that is, as we focus on the area with greater intensity of events. The type of association is definitely non-linear, and admittedly it does not lend itself to simple interpretation, but it is clearly present, especially in the bottom portion of the figure. Figure 4 refers instead to the second form of comparison, where the evaluation of log-slip and log-density is performed at the observed locations of the events instead of on a regular grid of points. This figure also exhibits some noteworthy features. One is that in the red, black, violet, and grey clusters there exists a clear positive association between log-pdf and log-slip. Hence, maximum slip is associated with a high frequency of events. The green cluster does not display any association, presumably because several events coinciding with the null-slip zone were probably not related to the main earthquake; in any case, the slip produced in this zone is lower. The city of Pichilemu (34.38°S, 72.02°W) is located approximately in the middle of the red cluster, where the maximum slip is 16.6 m. The sky-blue cluster corresponds to seismic activity produced by the Puyehue volcanic eruption (June 2011). Mocha Island (38.39°S, 73.87°W) is located at the bottom of the gray cluster, where the maximum slip is 11.9 m (see Figure 1). In the black and gray clusters we can see a positive association: log-slip increases as log-pdf increases.

Nonparametric analysis of variance
A positive diagnostic of the NPC outcome allows us to apply the Kruskal-Wallis one-way analysis of variance by ranks (KW; Kruskal & Wallis, 1952) to assess whether slip differs significantly among the clusters and to test, at a given confidence level, whether the slip of one cluster exceeds that of the others. The KW test is a classical method for testing whether samples originate from the same distribution; the null hypothesis is that the groups from which the samples originate have the same median. Being a nonparametric method, the KW test does not assume a normal population or any other specific distribution, but its purpose is analogous to that of the classical analysis of variance for normal populations. The KW statistic is corrected for ties when computing the p-value, and large values of the test statistic lead to rejection of the null hypothesis that the group medians are equal. We used this method to test whether the clusters produced by the NPC method are associated with different slip measurements. The numerical outcome of the R function kruskal.test was 1861.5, with an associated p-value below 10^-15, which provides an extreme indication of heterogeneity of slip among the clusters. To examine where the differences among the groups lie, we make use of the Wilcoxon test (Wilcoxon, 1945). This test performs individual comparisons between two groups under the null hypothesis that there is no difference between the two medians. The results are shown in Table 2. If each p-value is considered in isolation, there is only one non-significant comparison at the 5% level, but we must make an allowance for repeated testing; in this case, 10 testing procedures have been performed. The most classical form of allowance for repeated testing is the Bonferroni correction, which here leads to a 0.05/10 = 0.005 significance level. Therefore, the value 0.0181 must also be regarded as non-significant.
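The testing scheme of this section can be sketched in Python with scipy (mirroring the R calls kruskal.test and wilcox.test; the slip samples below are synthetic illustrations, not the Moreno et al. (2012) model, and the cluster names are placeholders):

```python
import numpy as np
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# synthetic slip samples for three hypothetical clusters
rng = np.random.default_rng(0)
slip_by_cluster = {
    "red":   rng.normal(12.0, 2.0, 80),
    "black": rng.normal(8.0, 2.0, 80),
    "green": np.abs(rng.normal(0.1, 0.05, 80)),
}

# Kruskal-Wallis test of equal medians across all clusters (ties-corrected)
stat, p = kruskal(*slip_by_cluster.values())

# pairwise rank-sum tests with a Bonferroni-adjusted significance level
pairs = list(combinations(slip_by_cluster, 2))
alpha_bonf = 0.05 / len(pairs)
results = {}
for a, b in pairs:
    _, p_ab = mannwhitneyu(slip_by_cluster[a], slip_by_cluster[b])
    results[(a, b)] = (p_ab, p_ab < alpha_bonf)
```

With five clusters, as in the paper, the 10 pairwise comparisons lead to the 0.05/10 = 0.005 Bonferroni level used in Table 2.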
We can see in Table 3 that the red cluster, representing the Pichilemu zone, has the highest maximum slip in comparison with the black (Constitución zone) and violet (Pichilemu offshore) clusters. The gray (Arauco zone) and green (Rancagua zone) clusters present the lowest slip values; the green cluster shows practically no slip.
The cluster estimation can change depending on the number of days elapsed after the megathrust earthquake and on the observations added each day, so it may produce unequal results. Hence, for t > 1, we compare the results at day t with those at day t − 1, that is, we compare two alternative partitions of the same set. In each comparison it is necessary to keep the same number of clusters at days t and t − 1. Figure 5 shows the consistency of the NPC method along the 200 days following the great earthquake, with values higher than 0.89 for the HA index, 0.95 for the NSS and HSS indexes, and 0.7 for the HK index. In the first 200 days, some comparisons between the cluster estimates at day t and day t − 1 produce lower values of the indexes owing to the incorporation of one or more new groups related to the added observations.
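The HA (adjusted Rand) index used to compare the day-t and day-(t−1) partitions can be computed directly from the two label vectors; a sketch following the Hubert & Arabie construction described above:

```python
from math import comb
from collections import Counter

def adjusted_rand(labels_a, labels_b):
    """Hubert & Arabie adjusted Rand index between two partitions of the
    same set: HA = (r - E(r)) / (max(r) - E(r)), where r is the number of
    pairs of objects joined together in both partitions."""
    n = len(labels_a)
    nij = Counter(zip(labels_a, labels_b))   # contingency counts
    ai = Counter(labels_a)                   # marginal cluster sizes in A
    bj = Counter(labels_b)                   # marginal cluster sizes in B
    r = sum(comb(v, 2) for v in nij.values())
    sum_a = sum(comb(v, 2) for v in ai.values())
    sum_b = sum(comb(v, 2) for v in bj.values())
    expected = sum_a * sum_b / comb(n, 2)    # E(r) under random labelling
    max_r = (sum_a + sum_b) / 2              # max(r) for these marginals
    return (r - expected) / (max_r - expected)
```

Identical partitions (up to relabelling) give HA = 1, while partitions agreeing no better than chance give HA near 0, matching the behaviour summarised in Figure 5.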

Discussion
Pronounced crustal aftershock activity with mainly normal faulting mechanisms is found in the Pichilemu region (Farías et al., 2011; Lange et al., 2012; Rietbrock et al., 2012). Lange et al. (2012) considered the events processed between 15 March and 30 September 2010 to estimate local magnitudes (Ml) in the Pichilemu region, where those magnitudes are comparable with the SSN magnitudes for large events. Specifically, crustal aftershock activity is found in the region of Pichilemu (∼34.5°S), where the crustal events occur in a ∼30 km wide region with sharply inclined boundaries, oriented obliquely to the trench. On the other hand, aftershock seismicity parallel to the trench is apparent at 50-120 km distance perpendicular to the trench (see Table 3). Significant aftershock activity occurred after the megathrust earthquake near ∼35°S and in the southern part of the rupture at ∼38°S. This seismicity occurs in regions of high coseismic slip (see Table 3). Aftershocks and coseismic slip of the 2010 Maule earthquake terminate ∼50 km south of the prolongation of the subducting Mocha Fracture Zone around (73.5°W, 38.5°S), near the bottom of the gray cluster (see Figure 1).
We have proposed an alternative way to cluster the aftershock seismicity of the 2010 Mw 8.8 Maule earthquake. The nonparametric clustering has proven consistent as the daily aftershock events are added to the analysis, and we have presented diagnostic tools to illustrate this feature. We used a nonparametric kernel method to fit the empirical aftershock frequency, which is highly correlated with the coseismic slip model used. Our findings can be explored further by considering an extended data set, including events with delayed effects, and by modeling the relationship between high coseismic slip areas and aftershock clusters. This catalogue should also be considered for studying the behavior of a typical aftershock sequence, to identify outliers and to classify sequences into groups exhibiting similar aftershock behavior (Schoenberg & Tranbarger, 2008; Schoenberg et al., 2006). Finally, this analysis should be considered as an attempt to help identify an increased risk of occurrence of another great earthquake.