Earthquake clustering in modern seismicity and its relationship with strong historical earthquakes around Beijing, China

XI or more) than the reported epicentre. The present-day event rates are similar to those predicted by the modiﬁed Omori law but there is no evidence of ongoing decay in event rates. Accordingly, the index is more likely to be picking out the location of persistent weaknesses in the lithosphere. Our results imply zones of high seismic density index could be used in principle to indicate the location of unrecorded historical of palaeoseismic events, in China and elsewhere.


I N T RO D U C T I O N
Sensitive instruments have been able to record ever smaller earthquakes in different parts of the world, especially since the 1950s. As the data on these small earthquakes accumulated, two fundamental questions emerged: (i) what are their characteristics and physical properties? and (ii) how do they relate to larger shocks? One of the prime characteristics of small earthquakes is their clustering in space and time, for example, as sequences of foreshocks, aftershocks, and earthquake swarms (Richter 1958;Mei 1960;Bak et al. 2002;Baiesi & Paczuski 2004;Zaliapin & Ben-Zion 2013;Gu et al. 2013;Moradpour et al. 2014;Zhang & Shearer 2016). Aftershocks occur almost universally, as a transient sequence of smaller events occurring after a main shock, at a rate that decays over a few months or years to the background level in the form of an inverse power law known as the Omori-Utsu law (Utsu et al. 1995). Swarms are clusters of events in space and time with no clear main shock.
In contrast, there are several areas of the world, especially of intraplate continental seismicity, where aftershock sequences sometimes appear to persist for centuries. This term has been described by several alternative names, such as 'a long-term aftershock sequence' (Wang 1985), 'a continuing aftershock sequence' (Mueller et al. 2004), 'a long aftershock sequence' (Stein & Liu 2009), 'a long-lasting aftershock sequence' (Castro et al. 2010) and 'a longlived aftershock sequence' (Page & Hough 2014). This diversity reflects the fact that there is no common definition of the characteristic features of the phenomenon, or indeed agreement on its underlying mechanism. However, it is recognized that such longlived aftershocks could be used in principle to indicate the location of unrecorded historical or palaeoseismic events (Ebel et al. 2000).  Table 1 (circles containing letters, dates and magnitudes as shown).
The interpretation of such persistent clusters of energy release as aftershocks of historical or palaeoseismic events is not always straightforward. For example, Page & Hough (2014) showed that the rate of present-day seismic activity in the New Madrid zone of the eastern US, an area known to be associated with three large earthquakes in 1812-1812, is not currently decaying with time at all (Page & Hough 2014), as one might expect from the Omori law. Instead such seismicity may be revealing areas of persistent weakness in the lithosphere.
In this paper we examine the case for persistent clusters for the case of seismicity around Beijing, China. It is a highly densely populated and an important economic region in an intraplate continental setting that has been associated with large historical earthquakes. The region has a rich historical archive that is still being evaluated and updated, as well as a high-quality earthquake catalogue from modern instrumental data since 1970. We introduce a multi-dimensional 'seismic density index' as a metric for clustering of earthquake energy release. The index depends on the event rate, the degree of spatial clustering of epicentres relative to a given position, and the magnitude of a population of earthquakes. We then test the hypothesis that regions of high density in the modern-day catalogue are persistent clusters of historical events.

S E I S M I C DATA I N T H E R E S E A RC H R E G I O N
Beijing is the capital of China, a metropolis which also includes administrational districts that cover some 16 800 km 2 . Our research region is chosen as a quadrangle to cover these administrational districts (115.5-117.5 • E, 39.5-41.0 • N, shown in Fig. 1). A seismic observation network was established around the capital area in 1966. Initially, there were only 8 stations, increasing to 20 in 1975, and 39 from 1984 to the present (also shown in Fig. 1). The network has been providing the best instrumentally recorded catalogue in China and relatively high-quality catalogue data is available since 1970. The number and distribution of these stations, and the instrument response determines the magnitude of completeness of reporting, and depends on time. From 1970 to 1974, earthquakes with M L ≥ 2 should be complete; Since 1975, earthquakes with M L ≥ 1 have been complete (Jiao et al. 1986;Institute of Geophysics 2006).
The locations and magnitudes of the modern events are shown in Fig. 1, in comparison with those of the largest 8 historical events listed in Table 1. The instrumentally determined epicentres are plotted to the nearest 2 or 3 km, representing the accuracy of the inferred location. This results in a more gridded pattern at a small scale on Fig. 1 than might be expected in reality. (Jiao et al. 1986;Institute of Geophysics 2006).

Earthquake clustering in modern seismicity 1007
The area also has an extensive historical archive that has been used to determine the location, magnitude and felt area at different levels of intensity for eight large (M ≥ 6) historical earthquakes, as described in the next section.

Historical earthquake data
Historical data on earthquakes and other natural events have been recorded in China since the Ying Dynasty (16-11th century B.C.), when official historians were appointed by governors to record a range of natural phenomena, including meteorological and astronomical events, flooding, landslides and earthquakes. As a consequence there is an extensive archive of contemporary observations of historical earthquakes, especially in North China (110-125 • E, 30-43 • N). These include other documents available from the historical archive of other government departments, in addition to those recorded by the appointed historians. By the time of the Yuan Dynasty (1271-1368 A.D.) such annals had become routine. Since 1949, there have been two large-scale collections of historical earthquake archives and four formal editions of the Chinese historical earthquake catalogue. Most earthquakes are concentrated in East China during the period between the mid-Ming dynasty (around 1470 A.D.) to the end of Qing dynasty (1911 A.D.) (Min et al. 1995;Wang 2004).
The research region in Fig. 1 has an abundant historical archive, providing the basic intensity data needed to estimate the felt and damaging effects of the earthquake, and to infer the location and magnitude of the causative earthquake. The strong historical earthquakes identified by Min et al. (1995) and Wang (2011) for earthquakes of magnitude M ≥ 6 have parameters as listed to the nearest half-unit in Table 1. This accuracy is limited by the greater uncertainty in magnitude inferred from historical data compared to modern instrumental determinations. The earliest occurred on September, 294 A.D., with an estimated magnitude of 6, and the largest on 1679 September 2, with an estimated magnitude of 8. This great earthquake had a dramatic impact, and hence was recorded in many written archives, providing the source observations for a detailed map of felt and damaging effects at different levels of intensity. Events labelled in lower case in Table 1 occur near the boundary, outside the administrative region.
The historical earthquake archive tends to become more complete with time. The earthquakes labelled E and F have enough recordings to draw out several macroseismic intensity contours. In this case earthquake magnitude (and to some extent depth) can be inferred from the areas defined by the intensity contours. In contrast the parameters of the earthquakes C and D can only be estimated by their peak intensity. For event B, there are fewer historical recordings, but the existing evidence is sufficient to have reasonable confidence in the parameters for event B inferred by in Gu et al. (1983) (Institute of Geophysics 1986Wang 2011). There is even less information for event A, allowing room for a significant a dispute about its parameters. The epicentre for event A was initially located in a range between 115.8 to 116.0 • E and 40.4 to 40.6 • N (Li et al. 1960;Institute of Geophysics 1990a;Min et al. 1995). Here we use the more accurate epicentre re-determined from additional intensity information and a more comprehensive analysis by Wang (2011). For the earthquakes in Table 1, only two earthquakes (labelled E and F) have sufficient recordings to estimate the contours of macroseismic intensity (Institute of Geophysics 1990b).

Instrumental earthquake data
In our research region, 5366 earthquakes with M L ≥ 1 have been recorded between 1970 and 2014. The largest earthquake in this period had an estimated local magnitude of M L 5.0 and occurred in 1990. This maximum magnitude from the instrumental period is much smaller than the estimated magnitude of the historical events listed in Table 1. The frequency-magnitude distribution from this catalogue is illustrated in Fig. 2.
The catalogue is not complete in terms of recorded earthquake depth. Only 2603 of the 5366 earthquakes have a reported depth, some 49 per cent of total. Other events most likely had insufficient data to constrain the depth. The distribution of recorded focal depth is listed Table 2. More than 90 per cent of earthquakes with a reported depth have one in the range 0-20 km. This is not atypical for an intraplate continental setting.
The catalogue also contains an estimate of the uncertainty in the epicentre location. Its distribution is listed in Table 3. Of those  events with a reported uncertainty, a large majority (95 per cent) have an estimated uncertainty in epicentre location of less than 10 km. Some 2542 earthquakes not included in Table 3 have no recorded uncertainty, or 47 per cent of the total of all events. The scientific or operational reasons for the omission are not recorded.

A N E W M E T R I C T O Q UA N T I F Y E A RT H Q UA K E C L U S T E R I N G
There have been many attempts to introduce a quantitative measure of the degree of clustering in earthquake populations. Kagan & Knopoff (1980) introduced the two-point correlation function as a measure the scaling properties of the relative location of event pairs. The data often exhibit power-law scaling indicating scaleinvariant or 'fractal' clustering over a finite range. However, the data are often insufficient to establish scale-invariance over a sufficient range of spatial scales to prefer the scale-invariant hypothesis over others (Aviles et al. 1987). As an alternative Frohlich & Davis (1990) used a single-link cluster analysis, based on the distance between elements in a defined set, to identify earthquake nests, isolated events, aftershock sequences, and zones of seismic quiescence. Eneva et al. (1992) used the technique of pair analysis on data from the Garm region of former Soviet Central Asia, confirming that the most prominent spatial pattern is observed in the form of clustering at short inter-event distances. Simpler methods, such as counting earthquake number in meshes (Mei 1960;Usami et al. 1984) have been attempted to quantify clustering, the main problem being that how to choose the size of the mesh (Ouchi & Uekawa 1986). Zschau (1996) developed the 'SEISMOLAP' method in an attempt to quantify time-dependent hazard from a combination of seismic clustering and quiescence, using the gridding technique described by Wiemer & Wyss (1994). Bak et al. (2002) applied a rescaling analysis using cells of different size L to show that the distribution of waiting times between earthquakes occurring in California obeys a simple unified scaling law valid from tens of seconds to tens of years. A metric to quantify correlations between earthquakes has been suggested, which consists of the time interval and spatial distance between two events. The windows have a spatial radius varying with time, which span a hyperbolic space-time region (Baiesi & Paczuski 2004). Another, based on nearest-neighbour distances of events in the space-time-energy domain, has been introduced and applied in a high-resolution earthquake catalogue covering Southern California (Gu et al. 2013;Zaliapin & Ben-Zion 2013;Zhang & Shearer 2016). In this paper we introduce a new multi-dimensional metric for clustering that depends on the number of events, the distances between their epicentres and the relevant grid node, and their magnitude, using data from an annulus centred on the node. The inner circle of this annulus is determined by the spatial resolution of the epicentres. The resulting 'seismic density index' can then be plotted for the entire grid and contoured to provide a map of the degree of spatial clustering of seismic energy release in the research area.

Method and definition of the seismic density index
For the given region and time period, we first delineate a grid with interval in longitude and latitude. For each grid node, we specify an outer circle centred on the node of maximum radius R. For the jth node, only those earthquakes with epicentres inside the circle will contribute the index. Those outside are assigned to a separate node in due course. The grid size should approximate to the resolution in epicentre location, but not be so large as to introduce too much spatial aliasing and smoothing due to overlapping data for different node points. We also introduce an inner annulus or minimum radius R min to account for the limits of resolution in the catalogue. Between these limits we calculate the distance between jth grid node and the ith earthquake epicentre (r ij ). There is no uncertainty in the node position, so the uncertainty in r ij is equivalent to the uncertainty in the epicentre location. This is a major advantage over methods using the interevent distance, where both locations are uncertain.
At this point we have the number of events n, the distances r ij and the magnitude M i of the ith event within the annulus. In principle the seismic density index for a node j at time t, denoted I j,t , should increase with respect to M i and n, and decrease inversely with respect to r ij . The index is then a mixture of physical and geometric measures of clustering of location, number and rupture dimension in proportion to magnitude. For pragmatic reasons, we use metrics that are logarithmic (such as the magnitude), mainly to reduce the potential for large statistical sampling error that might occur using linear measures (such as energy or moment release). For consistency, we also use a logarithmic measure of the separation distance r ij to account for the actual accuracy of epicentre location. For the jth node at time t, the seismic density index is then defined by Here m is the difference between the threshold magnitude for complete reporting and the maximum magnitude, introduced as a rough normalization factor. The finite minimum distance R min avoids problems with the singularity in 1/(ln r ij ) at r ij = 0. The maximum R is determined by a trade-off between having sufficient sampling and the resulting resolution. At this stage of the work we have not tested the possibility of using smoothed Gaussian kernels for the sampling (e.g. as in Helmstetter et al. 2006), though this would be a logical next step. Once the calculation has been applied to every node in the spatial grid, we can construct contours of the density index. The main ideas and steps are illustrated in Fig. 3.

Determination of the parameters
The parameters of grid size , search radius R and resolution R min are determined by the accuracy of epicentre location, as listed in Table 3. The first level is not larger than 5 km, so we take the grid size  as 0.05 degree of longitude and latitude, roughly equal to 5 km. The parameter R min represents the ultimate resolution of epicentre location. In China, the minimum distance that can be resolved by the networks is about 2-3 km, likely nearer the upper end of this range (Jiao et al. 1986; Institute of Geophysics 2006), so we take R min = e (e ≈ 2.71828) as a mathematically convenient number. The maximum radius R should also depend on the grid size and not contain too much overlapping data. Similarly it should be greater or equal to ( / √ 2), so that events are not restricted to within the dimension of the elementary grid cell. Here we take R = 10 km, that is, including all of the data from the nearest and next-nearest neighbour nodes. The parameter m = (M max − M min ) is 4.0, given M min = 1.0 and M max = 5.0, or 3.0 if M min = 2.0 and M max = 5.0. We use M min = 1.0 and M min = 2.0 to assess the sensitivity of the results to the choice of the threshold magnitude, which is often difficult to estimate uniquely. In addition, some methods (e.g. the 'maximum curvature' method) include events below the actual magnitude of completeness specifically in order to improve the sampling, at the expense of lower resultant b-values (e.g. Roberts et al. 2015). The resulting contours of seismic density and the epicentre distribution for the two magnitude thresholds M min = 1.0 and M min = 2.0 are drawn in Fig. 4. Having completed this sensitivity analysis, we retain M min = 2.0 as our threshold in the following.
For M min = 1.0 (Fig. 4a, upper diagram), the highest seismic density is 160, owing to a large number of earthquakes with M L ≥ 1 (Table 2). Because the resulting anomalies on the plot are so strong, with closely packed contours, we draw the contours increasing in steps of two units in this case. For M min = 2.0 (Fig. 4b, lower diagram), the highest value is only 50, almost 1/3 of that for M min = 1.0, so the contours increase in unit steps in this case. The seismic density contours in Fig. 4 map out clear anomalies associated with the characteristics of the population of earthquake magnitudes and locations. The results are not strongly sensitive to the choice of threshold magnitude, except that the amplitude of the anomalies is greater for the lower magnitude threshold. One anomaly has two subsidiary structures, a second to the NE (near Huangsongyu) and a side lobe to the SE (near Sanhe) of the main peak. In the following we will calculate the seismic density using a threshold magnitude of M min = 2.0, consistent with the estimated threshold of complete reporting.

Sensitivity analysis
The choices of parameters grid size and search radius R is to some extent arbitrary, so here we examine the effect of these choices on the outcome. For illustration we calculate the seismic density with two grid sizes = 0.025 and = 0.050 longitude and latitude degree. The results (Fig. 5a) do not show a strong sensitivity within this range. We then calculate the seismic density with different values of the search radius R (Fig. 5b). The metric is much more sensitive to the choice of R, with a much smoother pattern and a wider contour spacing for R = 12 km compared to R = 10 km, due to the greater number of overlapping data. Nevertheless the cluster identified remains robust to this choice. Much greater values of R degrade the resolution even further.

R E L AT I O N S H I P B E T W E E N T H E S E I S M I C D E N S I T Y I N D E X F O R M O D E R N E V E N T S A N D S T RO N G H I S T O R I C A L E A RT H Q UA K E S
We now analyse the relationship between that modern-day clustering illustrated in Fig. 6 and the strong historical events.

Strong historical earthquakes and seismic density anomalies
It is difficult to determine quantitatively the relationship between strong historical earthquakes and the smaller instrumentally recorded earthquakes from visual inspection alone (Fig. 1). The purpose of the density index is to highlight such candidate clusters for ongoing energy release to a greater extent using a multi-dimensional parameter. The contours for seismic density are compared with the location of the largest six historical earthquakes listed in Table 1 in Fig. 6. All of these events are associated with 'seismic density zones'. Here, we define a seismic density zone with a threshold peak density index of 5, and apply Euclidean (polygonal) boundaries approximately guided by density contours at the value 2. If two zones overlap, the dividing line is taken to follow approximately the saddle line between the two relevant peaks. In future work it would be useful to automate this choice based on such an algorithm, or to define the non-parametric boundaries that are not restricted to polygons. A total of 8 seismic density zones captured approximately by the polygons are shown Fig. 6. Five of the historical event epicentres are within 5 km or so of the location of maximum density.
However the location of earthquake D (the M6.5 event of 1665) is significantly offset to the south of the location of the peak intensity. This anomaly is also unusually strong, with closely packed   contours. There are also two anomalies without any associated historical earthquakes (labelled zones 7 and 8 on the diagram). Nevertheless, Fig. 6 identifies at least six clear candidate zones for persistent clusters. We say 'at least six' because some candidate zones may be associated with events that occurred outside the current historical record (Ebel et al. 2000). In addition events (a) and (b) are also associated with local anomalies.

Comparison of the contours of seismic density and macro-seismic intensity
The epicentre location, often determined by the location of the peak intensity for historical events, is somewhat uncertain due to the spatially variable quality of recording and the known scatter in ground shaking at different sites. Another approach is to use all of the intensity data, and estimate an epicentre as an optimum For the event in 1679, the contour at intensities X and XI has a similar shape and orientation as those of seismic density, and the centres of mass (centroid) of the areas defined by the two contours are closer to the peak density than the reported epicentres. The intensity XI contour also encloses a much weaker subsidiary peak in the anomaly to the northeast, near Huangsongyu. Such elongated contours in the density anomaly are likely to be associated with the orientation of the causative rupture plane inferred from the major axis of the macro-seismic contours. For the event in 1730, there are only two contours: intensity VIII and intensity VIII+. The main part of the intensity contours have a similar shape and orientation as those of seismic density, somewhat elongated towards the southeast, most likely due to amplification of ground shaking in alluvial deposits and/or a greater population density along the valley of a river which flows towards the northwest edge of Beijing.

Temporal distribution of earthquakes in the candidate zones
In this section, we examine the temporal distribution of instrumentally recorded earthquakes in each of the eight candidate zones outlined in Fig. 8. After selecting the earthquakes within each zone, we calculate the annual number within each magnitude bin, and plot the results in Fig. 8. The event rate for small earthquakes in Zones 2, 3, 5, 6 and 7 is relatively stationary, with small fluctuations (Fig. 8).
In contrast, the event rate for Zones 1, 4 and 8 has a single large excursion in each case. The location and seismic density of the transients will be analysed in more detail in the following section. The relatively stationary event rate in the five candidate zones may be due to a decay rate being finite but too slow to observe above the amplitude of the fluctuations in the long tail of the aftershock Omoritype decay, or may indicate true stationary background behaviour. It is not possible at this stage to rule out the possibility of being in the long tail of and Omori-type aftershock decay, because the rate would be expected to be relatively flat by now. Our prior belief is also informed by studies which have concluded aftershock decay can still persist after long delay times (Wang 1985;Mueller et al. 2004;Stein & Liu 2009;Castro et al. 2010;Page & Hough 2014). The main constraint on addressing this question is the duration of the modern catalogue compared to the time elapsed.

Spatial-temporal distribution of the transient clusters
In this section we identify the transient clusters highlighted in the previous section in more detail, calculate their number in monthly intervals separately, and then examine the effect of removing them (declustering) on the density anomalies. The sequence in Zone 1 identified in Fig. 8 occurred in July 1990, with a total of 12 earthquakes with M L ≥ 2. The main shock M L 5.0 occurred on 21 July, with no earthquakes above M L 3 and nine earthquakes with M L 2 followed in the same day. There were only two earthquakes with M L 2 on the 22 July. This transient lasted only 2 d (Fig. 9, left).  There were 45 earthquakes with M L ≥ 2 of the transient in Zone 4. The main shock, with M L 4.2, occurred on 1996 December 16, followed by 36 earthquakes concentrated in rest of the month of December. A further 8 earthquakes continued above background level in following months up to the end of April 1997 (Fig. 9, middle diagram).
For zone 8, a total of 19 earthquakes with M L 2 occurred from July 2012 to September 2013 with no obvious main shock, identifying this transient as a swarm. The largest event, with M L 2.9, occurred on 2013 March 25, towards the end of the swarm. When calculated the number in monthly intervals, the swarm was split to four small ones (Fig. 9, right).
We now consider the spatial distribution of the transient clusters identified in Fig. 9 separately, to assess their influence on the density anomalies. We use the exact same catalogues which temporal distributions illustrated in Fig. 9 to calculate the seismic density We utilize the window method suggested by Knopoff & Gardner (1972) to decluster in space and time. We found the optimal length (L) and duration (T) of the window depends on the maximum magnitude in the swarm. If M = 5.0∼5.49, L = 40 km, T (d) = 150; If M ≤ 4.99, L = 20 km, T (d) = 100. For Zone 1, the magnitude of main shock is 5.0. For Zone 4, magnitude of main shock is 4.2. For Zone 8, there are four small swarms, each of which is treated as a case of M ≤ 4.99. The declustered results are shown in Fig. 10 (right-hand diagrams). For the three cases, the seismic density contours for the transient clusters are more concentrated than those of the declustered catalogues on the right. None of the transient clusters occur outside the range of original density anomaly, although they clearly affect the density of contours and the detailed shape of the anomaly in the raw catalogue (compare the right-hand side of Fig. 10 with Fig. 8 for zones 1, 4 and 8). The density anomaly for the whole study region after declustering to remove the three transients is shown in Fig. 11.
We now investigate if there is a relationship between the seismic density index of the declustered catalogues and the magnitude of the associated historical earthquakes. The peak seismic density of each zone and the magnitude of the associated historical earthquakes are listed in Table 4. The peak seismic densities after declustering are reported, along with and the resulting annual rate of events. The annual rate is the earthquake number above M L 2 (with transient clusters excluded) divided by data time period (here 45 yr, Table 2), then normalized to unit area in units of per 1000 km 2 . The annual rate in Zones 1, 4 and 8 are corrected for the time periods of the transients that are excluded.
From Table 4, the anomalies associated with large historical earthquakes split into at least two groups. In one group, associated with strong historical earthquakes (B, E, F and D) with M ≥ 6 1 2 , the associated seismic density anomalies have peak amplitudes of about 20, that is, in zones 2, 5, 6 and 4 (after declustering), while their annual rates are larger than 1.2 (bold font in Table 4). The peak seismic density and annual rate in units of per 1000 km 2 of great earthquake M8 in Zone 5 are not higher than others, but the area of Zone 5 is largest and includes two seismic density peaks.
The event elapsed time (number of years before the catalogue end date of 2014) is also listed in Table 4. The longest elapsed time is 1720 yr, and still it is associated with a significant anomaly in the density index. The most recent associated historical earthquake occurred in 1730 A.D. Its elapsed time is more than 284 yr, 81 yr earlier than the New Madrid main shocks in Tennessee, US in 1811-12 A.D. These data imply that candidate zones for persistent clusters can persist at least 1720 yr in areas of slow continental deformation, and that the elapsed time has no clear relationship with the peak seismic density for present day seismicity. Table 4 also shows the annual rate of aftershocks predicted by the Omori-Utsu law for direct comparison with the observed rates, after declustering and normalizing to an area of 1000 km 2 . We use typical parameters established by Utsu et al. (1995), based on the observed decrease of aftershocks with time over a century or so following the 1891 M8 Nobi, Japan earthquake originally studied by Omori. Utsu et al. (1995) used this data to constrain the parameters of the modified Omori law n(t) = K(t + c) −p , already successfully applied to many aftershock sequences on shorter time scales. The best-fit parameters from regression with the aftershocks of Nobi earthquake between 1891 and 1991 are: K = 532.16, p = 1, c = 0.797 d (Utsu et al. 1995). The resulting number of aftershocks predicted by this formula is quite similar to the number currently observed in all cases (within a factor two or so) in Table 4.

D I S C U S S I O N
There are two possible interpretations of the results obtained here. One is that the six clusters of seismic energy release identified by anomalies in the density metric represent long-lived aftershocks of the six large earthquakes listed in Table 1. The data are consistent with this hypothesis to first order because the event rates predicted by the modified Omori law are similar to those observed. Moreover, each candidate zone contains a large historic earthquake, and the locations of five of these earthquakes are geographically very close to the location of peak seismic density, even after declustering. This is not the case for candidate zone 4, where there is a historical event on the outer edge of the anomaly. This could be due to a separate underlying process leading to the density anomaly in this case, or due to a strong unrecorded historical or palaeoseismic earthquake. For zone 5 there is also a subsidiary peak that occurs Figure 11. Comparison of the seismic density contours for the declustered catalogue (starting from value 2) compared to the locations of the strong historical earthquakes in Table 1 (circles). within and aligns with the Intensity contour XI, so is likely part of the same anomaly. This complexity could also be due to a second smaller event that is not recorded, or to a major subevent within the M8 earthquake, as seen, for example, in the M s 8 Wenchuan earthquake in 2008 (Zhang et al. 2008;Zhang et al. 2010). Two anomalies (Zone 7 and Zone 8) are not associated with any historical earthquake. Again this may be due to clustering from another cause, or a longer-term memory of an even older unrecorded large event. It is not possible to discriminate these possibilities at the moment, but the observation itself may be useful in prompting a targeted search and/or re-examination of historical archives or to inform future campaigns to acquire palaeoseismic data to constrain seismic hazard in the area around Beijing.
If the modern-day seismicity is part of a long-lived aftershock sequence in the six candidate zones, then we might also have expected the amplitude of the seismic density parameter and the event rate for the six zones to be correlated with the elapsed time of the historical event for the Omori-Utsu decay (Utsu et al. 1995 ; Table 4). This is not the case. Instead, the data are more consistent with an alternative hypothesis that the modern-day seismicity is picking out Table 4. Characteristics of seismicity and associated historical earthquakes in each zone. Note there are no historical earthquakes for zones 7 and 8 (Fig. 11) the locations of persistent weaknesses in the lithosphere, consistent with the relatively stationary event rates in Fig. 8. Such persistent but relatively stationary clustering is also observed in the results of Page & Hough (2014) for the New Madrid earthquake zone in the eastern US. This persistence has also been observed on a much smaller scale in the location of acoustic emissions (AE) during cyclic loading in laboratory rock deformation tests. Once a zone of weakness is created, subsequent loading in the next cycle appears to produce more AE events from the same volume (Sondergeld & Estey 1981). Generally, aftershocks delineate a finite region around the causative fault rupture, to the extent that aftershock area at 1 d was commonly used to estimate the rupture area and its shape, prior to the era of digital recording and finite size source inversion. Such spatial clustering diffuses anomalously slowly, at least for shortlived aftershock sequences or earthquake triggering more generally (Huc & Main 2003). Strongly aligned and persistently localized clustering is also a feature of the seismicity in the New Madrid area, where clear candidate fault planes for three large earthquakes in the 1811-12 sequence are delineated by clear epicentre trails (Zoback et al. 1980). In this case these aligned planes coincide with locations and orientations of faults and focal mechanisms associated with reactivation of an embryonic ridge-transform-ridge system in a failed rift arm under a different modern-day (compressive) stress regime (Zoback et al. 1980). The close association of the elongated epicentre clusters with the inferred positions of the three large earthquakes in the New Madrid seismic zone led to the hypotheses that the modern day seismicity are long-lived aftershocks, or at least represent a persistent memory of the historical events. In the data analysed here, there is no obvious alignment of epicentres of the modern-day seismicity, although the intensity XI contour in Fig. 7 includes two peaks in the density index aligned in the same direction as the major axis of the intensity contours.
Our method would only work as a method of identifying candidate zones for large historical or palaeoseismic earthquakes in cases where there is modern day seismicity. However, this is not always the case. Many faults known to be active from palaeoseismic observations in zones of shallow intraplate continental deformation are not associated with currently seismically active zones. This highlights the importance of combining studies such as ours with alternate methods such as palaeoseismic and geodetic methods, as well as expanding the historical search for significant past earthquakes.

C O N C L U S I O N S
The seismic density index is a multidimensional measure of clustering of energy release for an earthquake population, combining information on the magnitude, the number of events, and the distance between a local grid node and the location of nearby epicentres. It can be used to identify candidate zones for long-lived aftershocks, or reveal other forms of clustering associated with persistent weakness in the lithosphere and/or short-lived transients such as swarms. In either case the results imply that areas of high seismic density index could be used in principle to indicate the location of unrecorded historical of palaeoseismic events in areas of intraplate continental seismicity, thereby targeting the search for such events in historical archives. Our research area, around Beijing in China, contains six main candidate zones for persistent clusters, all consistent with the location of the epicentres of the six largest historical events. In one case there is a relatively weak subsidiary peak aligned with the intensity contours, contained within the XI contour, but not associated with another recorded historical earthquake. In three areas the density index is dominated by an anomaly caused by short-lived transients (two aftershock sequences and one swarm), but the anomaly nevertheless persists even after removing these transients by declustering the catalogue. The event rate per unit area is similar to that expected from Omori-type decay using global average values of the relevant parameters. However, it is also relatively stationary, with no evidence for a systematic decay in amplitude with time as one might expect from a long-lived aftershock sequence. Overall, the results are consistent with the hypothesis of persistent weaknesses in the lithosphere, as previously observed in the New Madrid Seismic zone of the eastern US, albeit with much longer elapse time in our case.

A C K N O W L E D G E M E N T S
The work described in this paper was initiated by a Royal Society of London China Fellowship to WANG Jian, and is currently supported by the NERC/ESRC Newton fund programme on Increasing Resilience to Natural Hazards In Earthquake-prone regions in China with the project title 'Probability and Uncertainty in Risk Estimation and Communication' (PUREC).
We are also grateful for that WU Xuan and ZHANG Zhe from Institute of Geophysics, China Earthquake Administration, who assisted in drafting the figures in this paper, and to two anonymous reviewers for their insightful and constructive comments.