SUMMARY

The productivity of a magnitude m event can be characterized in term of triggered events of magnitude above m − Δ: it is the number of direct ‘descendants’ |$\nu _\Delta$| and the number of all ‘descendants’ |$V_\Delta$|⁠. There is evidence in favour of the discrete exponential distribution for both |$\nu _\Delta$| and |$V_\Delta$| with a dominant initial magnitude m (the case of aftershock cluster). We consider the general Epidemic Type Aftershock Sequence model adapted to any distribution of |$\nu _\Delta$|⁠. Our first result shows that models with branching aftershock structure do not allow for the coincidence of distribution types of |$\nu _\Delta$| and |$V_\Delta$| (say, the discrete exponential, as in the scientific literature). The second problem is related to the tail behaviour of the |$V_\Delta$| distribution. We show the fundamental difference in tail behaviour of the |$V_\Delta$|-distributions for general-type clusters and clusters with a dominant initial magnitude: the tail is heavy in the former case and light in the latter. The real data demonstrate the possibilities of this kind. This result provides theoretical and practical constraints for distributional analysis of |$V_\Delta$|⁠.

1 INTRODUCTION

Earthquakes clustering, aftershocks occurrence in particular, is a prominent feature in observed seismicity. Different models have been proposed to describe the features of triggered events, especially in the case of aftershocks clusters with an initial dominant magnitude earthquake; yet, several theoretical aspects still need to be explored. In this study, we investigate the distributions of the number of triggered events as described by a branching process, which might provide useful practical constraints for empirical analysis of earthquake data.

Let |$V_\Delta$| denote the number of seismic events of magnitude mm0 − Δ caused by an earthquake of magnitude m0mc + Δ, where mc is the lower magnitude threshold and Δ a positive default value. |$V_\Delta$| is further referred to as the total Δ-productivity.

Considering an earthquake cluster as the result of a branching process (Zaliapin et al. 2008), we can also deal with the directly triggered events of m0, that is the first generation events of an m0-event. The number |$\nu _\Delta$| of such events is in the same magnitude range mm0 − Δ and is later named Δ-productivity of m0.

The distributions of quantities |$V_\Delta$| and |$\nu _\Delta$| have recently become the object of active analysis in the literature (Baranov & Shebalin 2019; Shebalin et al. 2020; Shebalin & Baranov 2021; Baranov et al. 2022). According to these publications, the |$\nu _\Delta$|-distribution |$f(n)=P(\nu _\Delta =n)$| is universal in shape, namely, it is geometric (or discrete exponential):
(1)
where p does not depend on the magnitude of triggering event. This result is relevant in statistical seismology because the popular Epidemic Type Aftershock Sequence (ETAS) model (Ogata 1998) is based on the Poisson distribution
(2)

Similarly, the geometric distribution (eq. 1) was adopted for statistics |$V_\Delta$|⁠, provided that it refers to clusters with a dominant initial magnitude m0, that is, to aftershocks. For the first time this hypothesis appeared in the work by Soloviev & Solovieva (1962); it was based on relatively scarce data (1954–1961) for the Pacific Belt (m0 ≈ 7; Δ = 2) and for the Kamchatka and Kuril Islands region (m0 ≈ 6; Δ = 2). Shebalin et al. (2018) expanded the analysis and confirmed the hypothesis using 850 aftershock sequences with main shocks m0 ≥ 6.5, Δ = 2 from the ANSS (1975–2018) catalogue. In the same time Kagan (2010), using aftershocks m ≥ 4.7 from PDE (1977–2007) catalogue for m0 = 7.1−7.2, found that the empirical distribution of |$V_\Delta$| is bimodal with the dominant peak at n = 0.

Zaliapin & Ben-Zion (2016) provided a thorough analysis of earthquake clustering and showed its spatial dependence and connection with heat flow production. They showed that, on the global scale, the total Δ-productivity distribution (for clusters with max m ≥ 6 and Δ = 2) looks like a power-law for locations with high heat flow (H > 0.2 W m–2), and as a log-normal distribution otherwise. These conclusions are qualitative because they are based on log–log plots. Nevertheless, the data for the regions of high heat flow differ significantly from the geometric hypothesis (eq. 1) for aftershocks. The same should be expected for any cluster with max m ≤ 5. In this case, according to Zaliapin & Ben-Zion (2016), the cluster size distribution in zones with both high and low heat flow is a power type distribution, f(n)∝n−κ − 1, κ ≈ 2.5.

These observations rise the following questions: (i) is it possible to consider the geometric distribution of |$\nu _\Delta$| and |$V_\Delta$| (for aftershocks) as laws of seismicity and (ii) how to explain the observed difference in the tail behaviour of the |$V_\Delta$| distribution for arbitrary clusters and aftershocks. A rigorous analysis of these issues is possible within the framework of a model adequately related to the concepts of |$\nu _\Delta$| and |$V_\Delta$|⁠. A model suitable for this purpose may be the general ETAS model, namely the ETAS model with an arbitrary (not only Poisson) distribution of direct aftershocks associated to any event (Saichev et al. 2005). The purpose of this note is twofold: first, to show the incompatibility of the hypotheses about the geometric distribution of |$\nu _\Delta$| and |$V_\Delta$| (for aftershocks) within the general ETAS model; and second, to prove the fundamental difference between size distributions for general type clusters and aftershock clusters.

2 ETAS MODELS

Let us consider the seismic events x = (t, g, m) occurred in the time–location–magnitude space |$\mathcal {S}=(t\ge 0, g\in \boldsymbol {R}^2, m\ge 0)$|⁠. The lower magnitude threshold mc is taken here as the zero reference point without loss of generality. According to the branching structure of the ETAS model, an earthquake cluster can be represented as follows (Saichev et al. 2005). The initial event x0 = (t0, g0, m0) generates a random number ν(m0) of events {xi: i = 1, …, ν(m0)}, where ν(m0) follows a special distribution F with an average λ(m0) = Eν(m0). The offspring events are distributed in |$\mathcal {S}$| as independent variables with the probability density
(3)
Each new event independently generates its own family of events following the generation law described above, and so on. If the mean of the direct triggered events is Λ1 = Eλ(m) < 1, then the mathematical expectation of the cluster size will be finite.

The independence of the magnitudes of direct ‘descendants’ between themselves and from the parent event x0 is an important property of the model. This allows us to consider the distribution f1(m) as a frequency–magnitude law for the cluster as a whole. This also is important for understanding the choice of the distribution models for |$\nu _\Delta$| and |$V_\Delta$|⁠.

The components of the regular ETAS model are specified as follows:

Note: The exponents (α, β) in the laws of Utsu and Gutenberg–Richter are more convenient (Saichev et al. 2005) to use with the base ‘e’ instead of the usual 10.

A theoretical analysis of various aspects of the general ETAS model with the regular components λ(m) and f1(m) is provided in the works of Saichev and co-authors (see Saichev & Sornette 2017, for further details). In particular, they proved the power law behaviour of the cluster size distribution fcl(n)∝n−1 − κ with κ ∈ [0.5, 1] (Saichev et al. 2005; Baró 2020). Note that, according to Zaliapin & Ben-Zion (2016), κ ∈ [1, 2.5] for real data.

In the general case we need to specify the family of ν(m) distributions: F = {f(nm)}. This choice will be reflected in the designation of the model as ETAS(F). In particular, we will set F = P for Poisson, and F = G for Geometric distribution. The component λ(m) = Eν(m) will be common to all models. The relation of λ(m) with the parameter p(m) of the geometric distribution is determined by the equality λ = p/(1 − p), that is
(4)
Recall that the regular ETAS model can be defined in terms of the conditional intensity of events (Hawkes & Oakes 1974):
(5)
where x = (t, g, m) and μ(g) is the rate of background seismicity. The general definition is broader because it allows for any ν(m) distribution.

To provide the G-distribution of Δ-productivity for any Δ, it is natural to consider the G-distribution of ν(m). It follows from the following seemingly obvious but useful statement.

Statement. Let the distribution type F of ν(m0) be F = G or P. Then |$\nu _\Delta (m)$|-distribution will have the same type for m0 ≥ Δ. In both cases the statistics |$\nu _\Delta (m_0)$| and ν(m0) differ only in average values:
(6)
where |$\bar{F}_1(m) = \int _m^\infty f_1(u){\rm d}u$|⁠, |$\bar{F}_1(0) = 1$|⁠. Let a = f1(0) > 0 and |$\lambda _\Delta (m_0)$| be independent of m0 in the area 0 < Δ ≤ m0 < M. Then λ(m) = ceam and |$\bar{F}_1(m)= e^{-am}$|⁠, m ∈ (0, M).

The proof of the statement and all subsequent ones are contained in the  Appendix.

 
Remark 1.

The conditions for the independence of the |$\nu _\Delta (m_0)$|-distribution from m0 are very strict, and therefore, in contrast with Shebalin et al. (2020), this property should be unstable.

 
Remark 2.

The Random Thinning (RT) property. Statement 1 allows the following generalization. Consider the RT(π) operation of a random cluster thinning: RT(π) saves each cluster event independently with probability π, and deletes it with probability 1 − π; the transformed cluster size V is RT(π)V. Given the magnitude independence in the ETAS(F) model, the productivity |$\nu _\Delta$| has the same distribution as RT(π)ν provided that π = P(mm0 − Δ).

In the case F = (P or G), E[RT(π)ν] = πEν and the distribution type of ν and RT(π)ν is identical. We will call these two properties of the discrete distribution the RT property. We can formalize the RT property as follows.

Let Vλ and Vλ(π) = RT(π)Vλ be the cluster sizes, where λ = EVλ. If |$Ez^{V_\lambda }=\varphi (z,\lambda )$|⁠, then by eq. (A1) from Appendix, |$Ez^{V_\lambda (\pi )}=\varphi (1-\pi +\pi z, \lambda )$|⁠, EVλ(π) = λπ. The RT property of Vλ implies
Let for simplicity λ0 ≥ 1. Setting π = Λ/n < 1, λ = n and ψ(n)(z) = φ(1 + z/n, n), we get:
(7)
Due to the analyticity of φ(z, λ) in the disk ∣z∣ < 1, ψ(n)(z) is analytical in Kn = {z: ∣1 + z/n∣ < 1}. Moreover, (7) implies that ψ(n)(z) and ψ(n + 1)(z) are equal in KnKn + 1 = Kn. Hence ψ(n)(z) is an analytical continuation of ψ(1)(z) = φ(1 + z, 1). The function ψ(z) = φ(1 + z, 1) specifies the type of F in the ETAS(F) model for which Statement 1 remains true. For example, ψ(z) = ez and ψ(z) = (1 − z)−1 correspond to the types of Poisson and Geometric distributions, respectively.

Examples of the RT property.

  • The Negative Binomial Distribution (NBD) for whichφ(z, λ) = (1 − λ(z − 1))−τ, τ > 0, holds the RT property. In the case τ = 1 it corresponds to the Geometric distribution.

  • Combining Nindependent clusters withRT property also preserves this property. Formally this follows from eq. (7) and the relation for the cluster characteristics φi(z, λi):
    (8)
    The NBD model was used by Kagan (2014) to describe the distribution of annual earthquake numbers for the CIT catalogue 1932–2001, m ≥ 5.0. Below we will show that the RT property, which NBD possesses, is poorly compatible with the branching cluster structure. However, in the Kagan’s case, the statistics of events occurring worldwide in one year consists mainly of independent weakly grouped events. Such case is closer to the model (eq. 8) with components: φi(z, λi) = 1 + λi(z − 1), λi < 1.
  • A cluster selected randomly with probabilities {ρi > 0} from N clusters withRT property retains RT property. In this case
    (9)
    However, if the cluster sizes distributions are of the same type, that isφi(z, λi) = ψ(λi(z − 1)), then the random cluster usually loses this type provided that λi ≠ const.

    For example, assuming a geometric distribution for both the size of a random cluster and the sizes of its components, (9) gives (1 − w)−1 = ∑i(1 − wpi)−1ρi, w = λ(z − 1), which is possible only when pi = 1. This is also true in general case if φ(z, λi) is N-smooth at z = 1. The considered case is typical for the observed clusters taken as one statistical population. Consequently, empirical distributions of |$\nu _\Delta$| or |$V_\Delta$| obtained over large territories generally do not represent the true types of their distributions. This circumstance makes it difficult to prove the universality of the type of |$\nu _\Delta (m_0)$| distribution.

 
Remark 3.

ETAS* model. To get the geometric distribution of |$\nu _\Delta$| in framework of the ETAS model, Shebalin et al. (2020) used randomization of the parameter λ(m0). They used the fact that a Poisson distribution with an exponentially distributed random parameter is equivalent to a geometric distribution. The resulting model was named ETAS*. Its analogue is the ETAS(G) model in which the average productivity does not depend on the initial magnitude, namely λ(m0) = λ. Because of (6), in this case |$\lambda _\Delta (m_0)$| becomes dependent on m0 for any f1(m) > 0.

The analogue of the ETAS* model may be interesting due to its connection with the classical branching Galton–Watson model (Harris 1963). Both models are identical if we omit the space–time component in the model. This gives us an exact formula for the distribution of the total productivity of any initial event (cluster size not including the initial event):
(10)
where Γ(x) is the Gamma function, and p < 1/2 is the parameter of the geometric distribution (Harris 1963). Formula (10) remains valid in the critical regime when λ = p/(1 − p) = 1 or p = 1/2. The distribution (eq. 10) has the following asymptotic behaviour:
(11)
where C1 = −ln (4pq)  ≥  0. Such asymptotic with the factor n−3/2 is typical for the Galton–Watson models (Harris 1963) and, consequently, for any ETAS(F) model with λ(m) = const.
The generating function of the distribution (eq. 10) is
(12)
where EV = Λ = λ/(1 − λ) = p/(1 − 2p) (see eq. 4). It is obvious that eq. (12) cannot be represented as eq. (7). Therefore, the analogue of ETAS* model has the geometric distribution for the Δ-productivity |$\nu _\Delta$|⁠, but does not have any RT property for the total Δ-productivity |$V_\Delta$|⁠.

3 THE TOTAL Δ -PRODUCTIVITY

We will consider the ETAS(F) model in a subcritical regime: Λ1 = Eλ(m) < 1. Eν(m) = λ(m) is a non-decreasing, finite, positive function; the other components, namely f1(m), f2(t) and f3(gm), are free from additional restrictions.

As above m0 > 0 is the magnitude of an initial cluster event and |$V_\Delta (m_0)$| is the total Δ-productivity of m0. Any initial event can be fixed or random, namely it has the f1(m) distribution. A further distinction concerns the dominant initial events, that is those initial events having maximum magnitude within their cluster. The case of a random m0 is important because identifying the initial cluster event is an unstable practical procedure.

Statement 2. Consider the specified ETAS(F) model. Assume that all moments of F-distribution are finite and F has the RT property.

  • Let |$V_\Delta (m_0)$|⁠, m0 > Δ > 0, be the size of the cluster with any condition C: the initial magnitude m0 is fixed or random. Then the number |$N_{V_\Delta }$| of finite moments of |$V_\Delta (m_0)$| is independent of C. Moreover,
  • Let F be Poisson or Geometric distribution and m0 is fixed and dominant. Then |$N_{V_\Delta }=\infty$| regardless of the Nλ-value. This is also true for the random dominant m0-event if additionally
    (13)
 
Remark 4.

Statement 2 allows us to judge on the tail attenuation of the |$V_\Delta$| distribution because, by the Chebyshev–Markov inequality, P(ξu) ≤ Eξn/un for any random variable ξ ≥ 0 and u > 0. Distributions having all moments will be called distributions with light tails. Otherwise, we will talk about heavy tails.

Let’s consider clusters with fixed or random initial magnitude. The conditions of the Statement 2 are easily verified for the regular ETAS components: λ(m) = λeαm, α > 0 and f1(m) = βe−βm, β > α. In this case we have Nλ < β/α. This means that the total Δ-productivity distribution has only a heavy tail for small β/α. The data of Zaliapin & Ben-Zion (2016) demonstrate the possibilities of this kind. In the considered case Saichev et al. (2005) give more information on the tail behaviour of |$V_\Delta$| distribution: if β/α is large, then the tail is subexponential and looks like eq. (11).

The case α = 0 corresponds to the analogue of the ETAS* model; here Nλ = ∞ in accordance with eq. (10).

Given αβ, condition (13) is satisfied. This guarantees the light tails in the distribution of the total Δ-productivity of main shocks. All the observations mentioned above support this fact.

 
Remark 5.

The difference in tail behaviour of |$V_\Delta$| statistics can be partially explained as follows. The total Δ-productivity of the main shock with a fixed magnitude m0 is based on events in the finite range of magnitudes (0, m0). In this case, the frequency–magnitude law f1(m) is replaced by its truncated counterpart in the interval (0, m0), where all moments of λ(m) are finite. Since |$N_{V_\Delta }=N_\lambda$|⁠, the tail of the |$V_\Delta$| distribution becomes typical for the case Nλ = ∞ that is quasi exponential. The similar effect for a random dominant magnitude is not so obvious and is a non-trivial analytical fact.

Statement 3. (1) Let m0 be a dominant initial magnitude in the ETAS(G) cluster and let f1(m) be strictly positive. Then the statistics |$V_\Delta (m_0)$| can have a geometric distribution for no more than one value Δ ∈ (0, m0).

(2) Let ν and V be the sizes of clusters of direct descendants and all descendants, respectively, in the ETAS(F) cluster with the random initial event. Assume that both distributions of ν and V have the RT property, as well as Eν2 < ∞ and ∫λ2(m)f1(m)dm < ∞. Then the distribution types of ν and V cannot be the same.

 
Remark 6.

According to Statement 3 we can say that, in the framework of generalized ETAS model, the geometric laws for |$\nu _\Delta$| and |$V_\Delta$|⁠, including the case of aftershocks, are incompatible. In the ETAS(G) model case with λ(m) = const., the geometric law of |$\nu _\Delta$| completely excludes the RT property for |$V_\Delta$| (see eq. 12). The reason for this lies in the branching structure of the cluster.

Numerical examples

To support the theoretical conclusions, we simulated a synthetic earthquake catalogue from the regular ETAS(P) model. To mimic the features of real seismicity, the set of ETAS parameters was mainly derived from the analysis of the earthquake catalogue of Northeastern Italy (Benali et al. 2020). The simulation region also corresponds to that area, namely: Lon 11.5–14.0 and Lat 45.5–47.0.

The model was based on the following parameters:

  1. The uniform background seismicity with the rate μ = 0.04 events (degree2·day)–1, which roughly corresponds to the rate of background events identified in the earthquake catalogue of Northeastern Italy. The uniformity makes it easier the declustering process, as it allows to better identify the clustered component.

  2. Magnitude range m ≥ 2; the Gutenberg–Richter parameter β = 2.07, which corresponds to the b-value 0.9.

  3. The Utsu Law parameters: λ = 0.22 (event/day), α = 1.54. The estimated λ = 0.67 for Northeastern Italy was reduced to 0.22 to satisfy the stability conditions Λ1 = Eλ(m) < 1.

  4. The Omori law parameters: c = 0.015 (day), p = 1.037. The estimated p parameter very close to 1 implies an extremely long tail of f2(t), so that offspring events may span several thousands years (Harte 2013); this practically hinders the analysis of cluster size for synthetic catalogues. However, we recall that, from the theoretical viewpoint, the temporal component of the ETAS model does not affect cluster size. Therefore, in order to prevent this inconvenience, instead of increasing the p parameter, we considered a truncated distribution f2(t) on the support t < 0.7 yr, which corresponds to the 95 per cent quantile of the cluster lifetime for Northeastern Italy.

  5. The space scattering law parameters: d = 0.0085 (degree), q = 2.25, γ = 0.62.

The synthetic earthquake catalogue, consisting of 143 568 events, spans about 500 yr and includes information about the links between each event and its direct descendants. The synthetic time-series were processed by the nearest-neighbour method (Zaliapin & Ben-Zion 2013) to identify clusters of events. This method allows partitioning earthquakes into background and clustered components, based on nearest-neighbour distances between earthquakes in the space–time–magnitude domain (Baiesi & Paczuski 2004; Zaliapin et al. 2008). In this study, the parameters necessary for the computation of nearest-neighbour distances, namely the b-value and the fractal dimension of epicentres df, were set according to the values estimated from the analysis of Northeastern Italy catalogue (Benali et al. 2020, and references therein): b-value = 0.9 and df = 1.1. The threshold of the nearest-neighbour distance, which separates the clustered and background components, was automatically set (log10η0 = −4.2) following a criterion based on a 1-D Gaussian mixture model with two modes, where the threshold is the maximum likelihood boundary between the two modes (Zaliapin & Ben-Zion 2013). The total Δ-productivity distribution was examined considering the clusters extracted from the synthetic earthquake catalogue.

Fig. 1 represents the total Δ-productivity distributions (Δ = 2) (a) for arbitrary initial events and (b) for main shocks, as identified by nearest-neighbour method applied to the synthetic ETAS(P) seismicity. In the first case, the distribution |$V_\Delta$| demonstrates a heavy tail (requiring the use of logarithmic values in the figure), which is also confirmed by the coefficient of variation |$\sigma (V_\Delta )/EV_\Delta \approx 10.8$|⁠. On the contrary, the distribution |$V_\Delta$| for main shocks (Fig. 1b) demonstrates the light tail behaviour; for comparison with the previous case |$\sigma (V_\Delta )/EV_\Delta \approx 0.7$|⁠.

Clusters identified by nearest-neighbour method applied to the synthetic ETAS(P) seismicity: The $V_\Delta$ distributions, Δ = 2, for (a) arbitrary initial events, m0 ≥ 4 and (b) main shocks, m0 ≥ 4.
Figure 1.

Clusters identified by nearest-neighbour method applied to the synthetic ETAS(P) seismicity: The |$V_\Delta$| distributions, Δ = 2, for (a) arbitrary initial events, m0 ≥ 4 and (b) main shocks, m0 ≥ 4.

Fig. 2 shows similar results based on the clusters originally defined by the branching structure of the synthetic ETAS(P) catalogue. The coefficients of variation are also consistent with previous results: (a) |$\sigma (V_\Delta )/EV_\Delta \approx 9.7$| for arbitrary initial events and (b) |$\sigma (V_\Delta )/EV_\Delta \approx 0.8$| for main shocks.

Clusters in the synthetic ETAS(P) seismicity: The $V_\Delta$ distributions, Δ = 2, for (a) arbitrary initial events, m0 ≥ 4 and (b) main shocks, m0 ≥ 4.
Figure 2.

Clusters in the synthetic ETAS(P) seismicity: The |$V_\Delta$| distributions, Δ = 2, for (a) arbitrary initial events, m0 ≥ 4 and (b) main shocks, m0 ≥ 4.

A good agreement of the empirical data Figs 1 and 2 demonstrates the robustness of conclusions about the tail behaviour of |$V_\Delta$| distribution.

4 CONCLUSION

The concept of the Δ-productivity |$\nu _\Delta$| is associated with a priori ideas about the structure of a cluster of seismic events in the form of a random branching tree. Such a representation is not the only possible one. Therefore, the Geometric distribution of |$\nu _\Delta$| can be considered as a useful alternative within the framework of the generalized ETAS(F) model. The other characteristic of the Δ-productivity, namely |$V_\Delta$| for main shocks, is more objective and stable.

For large class of the ETAS(F) models, we answered the question when the total Δ-productivity distribution has light tails for main shocks and heavy ones for arbitrary events. The real data demonstrate the possibilities of this kind.

Note the difficulties of substantiating the mentioned geometric laws for |$\nu _\Delta$| and aftershock’s |$V_\Delta$|⁠. They are associated both with the declustering of seismicity and with the effect of averaging the distributions of |$\nu _\Delta$| and |$V_\Delta$| when grouping data.

The first difficulty can be judged by experiments of Zaliapin & Ben-Zion (2013) with the nearest-neighbour method. This method adapted to the ETAS structure has 40 per cent error of incorrectly determining the parent of an event. The stability of the conclusions in this case may depend on the strict similarity in the real hierarchical structure of the seismic cluster.

The second difficulty is expressed in the fact that the mixture of geometric distributions retains the original property of monotonicity, but ceases to be geometric, that is loses the original G-type. This is unavoidable if the distribution parameter is dependent on location or the magnitude of the initial event (see eq. 6). Therefore, we can speak more confidently about the light tail of the |$V_\Delta$| distribution for the main shocks.

Finally, the theoretical result for ETAS(F) model shows that the types of distributions of |$\nu _\Delta$| and |$V_\Delta$| cannot be the same due to the branching structures of the clusters.

ACKNOWLEDGEMENTS

We are grateful to Prof J. Zhuang and another anonymous reviewer for their careful reading of the manuscript and comments that allowed us to significantly revise and improve the text.

DATA AVAILABILITY

The data (synthetic earthquake catalogue) underlying this paper will be shared on reasonable request to the corresponding author.

REFERENCES

Baiesi
M.
,
Paczuski
M.
,
2004
.
Scale-free networks of earthquakes and aftershocks
,
Phys. Rev. E
,
69
,
doi:10.1103/PhysRevE.69.066106
.

Baranov
S.
,
Narteau
C.
,
Shebalin
P.
,
2022
.
Modeling and prediction of aftershock activity
,
Surv. Geophys.
,
doi:10.1007/s10712-022-09698-0
.doi:

Baranov
S.V.
,
Shebalin
P.N.
,
2019
.
Laws of the Post-Seismic Process and the Forecast of Strong Aftershocks
, pp.
218
,
RAS
.

Baró
J.
,
2020
.
Topological properties of epidemic aftershock processes
,
J. geophys. Res.
,
125
(
5
),
e2019JB018530
,
doi:10.1029/2019JB018530
.doi:

Benali
A.
,
Peresan
A.
,
Varini
E.
,
Talbi
A.
,
2020
.
Modelling background seismicity components identified by nearest neighbour and stochastic declustering approaches: the case of Northeastern Italy
,
Stoch. Environ. Res. Risk. Assess.
,
34
,
775
791
.

Harris
T.E.
,
1963
.
The Theory of Branching Processes
,
Springer-Verlag
.

Harte
D.S.
,
2013
.
Bias in fitting the ETAS model: a case study based on New Zealand seismicity
,
Geophys. J. Int.
,
192
(
1
),
390
412
.

Hawkes
A.G.
,
Oakes
D.
,
1974
.
A cluster process representation of a self-exciting process
,
J. Appl. Probab.
,
11
(
3
),
493
503
.

Kagan
Y.Y.
,
2010
.
Statistical distributions of earthquake numbers: consequence of branching process
,
Geophys. J. Int.
,
180
(
3
),
1313
1328
.

Kagan
Y.Y.
,
2014
.
Earthquakes: Models, Statistics, Testable Forecasts
, pp.
283
,
AGU, Wiley
.

Ogata
Y.
,
1998
.
Space-time point-process models for earthquake occurrences
,
Ann. Inst. Stat. Math.
,
50
,
379
402
.

Saichev
A.
,
Sornette
D.
,
2017
.
Statistics of seismic cluster durations, preprint
, (
arXiv:1708.06970
).

Saichev
A.
,
Helmstetter
A.
,
Sornette
D.
,
2005
.
Power-law distributions of offspring and generation numbers in branching models of earthquake triggering
,
Pure appl. Geophys.
,
162
,
1113
1134
.

Shebalin
P.N.
,
Baranov
S.V.
,
2021
.
Statistical laws of post-seismic activity
, in
Statistical Methods and Modeling of Seismogenesis
,
Chapter 3
, pp.
63
103
., eds
Limnios
N.
,
Papadimitriou
E.
,
Tsaklidis
G.
,
John Wiley & Sons, Ltd
.

Shebalin
P.N.
,
Baranov
S.
,
Dzeboev
B.A.
,
2018
.
The law of the repeatability of the number of aftershocks
,
Doklady Akademii Nauk
,
481
,
963
966
.
doi: 10.1134/S1028334X18070280

Shebalin
P.N.
,
Narteau
C.
,
Baranov
S.V.
,
2020
.
Earthquake productivity law
,
Geophys. J. Int.
,
222
(
2
),
1264
1269
.

Soloviev
S.L.
,
Solovieva
O.N.
,
1962
.
Exponential distribution of the total number of subsequent earthquakes and decreasing with the depth of its average value
,
Izv. AN USSR (geophys.)
,
12
,
1685
1694
.

Zaliapin
I.
,
Ben-Zion
Y.
,
2013
.
Earthquake clusters in southern California I: Identification and stability
,
J. geophys. Res.
,
118
(
6
),
2847
2864
.

Zaliapin
I.
,
Ben-Zion
Y.
,
2016
.
A global classification and characterization of earthquake clusters
,
Geophys. J. Int.
,
207
(
1
),
608
634
.

Zaliapin
I.
,
Gabrielov
A.
,
Keilis-Borok
V.
,
Wong
H.
,
2008
.
Clustering analysis of seismicity and aftershock identification
,
Phys. Rev. Lett.
,
101
,
doi:10.1103/PhysRevLett.101.018501
.

APPENDIX

A1 Proof of Statement 1

Let |$E\left(z^{\nu (m_0)}\mid m_0\right)=\varphi (z, \lambda (m_0))$| be the generating function of the ν(m0) with the characteristic λ(m0) = Eν(m0). If {mi: i = 1, …, ν(m0)} are magnitudes of the direct descendants of the m0 event, then the |$\nu _\Delta (m_0)$| has the same distribution as |$\sum _{i=1}^{\nu (m_0)} \varepsilon (m_i)$|⁠, where |$\varepsilon (m)=\mathbf {1}_{m\ge m_0-\Delta }$| is the logical 1-0 function. Since {mi} are independent,
(A1)
where |$\bar{\pi }_{m-\Delta }=P(m\ge m_0-\Delta \mid m_0\ge \Delta )$| and |$\pi _m=1-\bar{\pi }_m$|⁠.
Recall that φ(z, λ(m)) = (1 − p(m))/(1 − p(m)z) = [1 − λ(m)(z − 1)]−1 for F = G and φ(z, λ(m)) = exp(λ(m)(z − 1)) for F = P. Substituting any of the functions φ(z, λ(m)) in the right-hand part of eq. (A1), we get the same function with the replacement of λ(m0) by
(A2)
Assume that the right-hand part of eq. (A2) is independent of m0 in the interval Δ ≤ m0 < M. Then λ(m) and F1(m) are differentiable functions simultaneously. Since |$\lambda _\Delta (m_0)={\rm const}$| for m0 ≥ Δ, we have
Here we used condition |$\bar{F}_1(0)=1$|⁠. Solving the equation |$\dot{\lambda }(\Delta )-\lambda (\Delta )f_1(0)=0$| in the interval 0 < Δ ≤ m0, we get λ(m) = λeam, m ∈ (0, M), where a = f1(0). Substitute in eq. (A2) λ(m) = λeam, |$\lambda _\Delta (m_0)=C(\Delta )$| and Δ = 0, to get |$\bar{F}_1(m)=C(0)\lambda ^{-1}e^{-am}$|⁠. Since |$\bar{F}_1(0)=1$|⁠, we have |$\bar{F}_1(m)=e^{-am}$|⁠.

A2 Proof of the statement 2

A2.1 General remarks.

We will need the following notation: NM(m) is the total number of events of magnitude ≥M triggered by an event of magnitude m. In particular, the total Δ-productivity |$V_\Delta (m_0)$| of the initial event with magnitude m0 is |$N_{M(m_0)}(m_0)$|⁠, M(m) = (m − Δ)+, where a+ = max(a, 0). The notation E(*∣m) denotes the conditional mean given m.

The spatio-temporal components of the ETAS model do not affect cluster size, since the law of generating new events depends only on the parent magnitude, and the problem of declustering does not arise in our theoretical analysis. Therefore, the spatio-temporal components in theoretical analysis can be any and the distribution of NM(m) is the same for any x = (t, g, m).

By definition, if {mi:  i = 1, …, ν(m0)} are offspring magnitudes of m0, then {NM(mi): i = 1, …, ν(m0)} are independent for a given m0. Moreover, they are identically distributed together with NM(m0) since the magnitudes mi, i ≥ 0, have a common distribution f1(m). Let us consider a new variable
(A3)
where the symbol |$\mathbf {1}_{a\ge b}$| denotes the logical function 1-0. Obviously, eq. (A3) is nothing more than counting events with magnitude ≥M in the cluster through first-generation events and its triggered events of magnitude ≥M. The component |$\mathbf {1}_{m_i\ge M}$| takes into account the ith event of the first generation if its value is greater than M. Therefore NM(m0) and |$\tilde{N}_M(m_0)$| are identically distributed, that is |$N_M(m_0)\overset{law}{=}\tilde{N}_M(m_0)$|⁠.
The moments |$\mathbf {EV_\Delta ^n}$|⁠.
We will use the notation:
Mean value: |$\mathbf {EV_\Delta }$|⁠.
Averaging eq. (A3) given m0 = m, we have
After averaging over m, we obtain the equation for |$b_M^{(1)}$|⁠:
For finite |$b_M^{(1)}$|⁠, this equality can be true if Λ1 < 1. Hence
(A4)
Since |$V_\Delta =N_{M(m)}(m)$|⁠, M(m) = (m − Δ)+, we have
that is, |$V_\Delta$| has finite mean value in conditional and unconditional situations if Λ1 < 1.
The case |$\mathbf {n\gt 1}$|⁠.

 

(a) Fixed/random initial |$\mathbf {m_0}$|
It is easy to see that for any λ(m) ≥ 0 there exists K ≤ ∞ such that Λk = Eλk(m) is finite for kK and is infinite for k > K. Let us show that there are finite constants Bk such that
(A5)
We have shown (see above) that eq. (A5) is true for k = 1. We will prove eq. (A3) by the method of induction on k. Assume that all values in eq. (A5) are finite for kn − 1. Putting |$\xi _i=N_M(m_i)+\mathbf {1}_{m_i\ge M}$| in eq. (A3), one has
(A6)
where |$R_M^{(n)}$| represents the secondary component of |$N_M^{n}(m)$|⁠. The number of terms in the first sum is ηn = ν(m)(ν(m) − 1)…(ν(m) − n + 1). By the RT property of the F-distribution, Ezν(m) = φ(z, λ(m)) = ψ(λ(m)(z − 1)), where φ(z, λ(m)) has all derivatives at z = 1 because all moments of ν(m) are finite. Hence
(A7)
where |$a_n(F)=\left(\frac{\rm d}{{\rm d}z}\right)^n \left.\psi (z)\right|_{z=0}\lt \infty$|⁠. Therefore
Similarly
(A8)
where
and
Since for any random variable ξ ≥ 0:
we get from eq. (A8):
(A9)
where Cn is some finite constant. The expectation of eq. (A6), given m, is
(A10)
where |$\delta _M^{(n)}=\tilde{b}_M^{(n)}-b_M^{(n)}$| is a secondary term of |$\tilde{b}_M^{(n)}$|⁠. By eq. (A8),
(A11)
Averaging eq. (A10) over m, we get
(A12)
The relations (A11)–(A12) show that |$b_M^{(n)}$| is finite if and only if Λn < ∞. Similarly, eqs (A10)–(A11) show that |$b_M^{(n)}(m)\lt \infty$| if and only if |$b_M^{(n)}\lt \infty$|⁠. Hence, at the nth step of induction, these three quantities are finite simultaneously, that is eq. (A5) holds. By setting M = (m − Δ)+, we get |$E(V_\Delta ^n \mid m)=b_{(m-\Delta )_+}^{(n)}(m)\lt \infty$| only when Λn < ∞.It remains to consider |$E(V_\Delta ^n)$|⁠. By eqs (A10) and(A12),
Therefore |$E(V_\Delta ^n)=\infty$| if Λn is infinite.Let Λn < ∞. Using (A10) again, we have
Due to eq. (A9), we have |$E\left(R_{M(m)}^{(n)}\mid m\right)\le \tilde{C}_n~\max(1,\Lambda _1,\dots ,\Lambda _{n-1})$|⁠; by eq. (A9), |$E\lambda (m)\delta _{M(m)}^{(n)} \le D_n\Lambda _1$|⁠. Finally, |$E\lambda (m)b_{M(m)}^{(n)}\le B_n\Lambda _1$|⁠, because we have proved (A3) for index n. Hence, |$E(V_\Delta ^n)\lt \infty$| if Λn < ∞. The proof is complete.
(b) Fixed dominant |$\mathbf {m_0}$|

Let us consider the ETAS cluster with a dominant initial magnitude m0. Let |$\hat{N}_{0M}(m)$| be the number of events of magnitude >M in the cluster with the initial event mm0 in which all events have magnitude <m0. In such cluster, direct descendants of any initial magnitude m are realized under the condition Ω(m) = {mi < m0: i = 1, …, ν(m0)}.

In term of the generating function Ezν(m) = φ(z, λ(m)), the probability of Ω(m) is
(A13)
Therefore, the conditional probability density of the direct descendants is
where
(A14)
(A15)
Substituting Geometric (eq. 1) and Poisson (eq. 2) distributions here, we get the distributions of the same type
(A16)
where
(A17)
Therefore, given the dominance of the initial magnitude m0, we are again dealing with an ETAS model in which the magnitude m is transformed into |$\hat{m}$| with a distribution (eq. A14), and the productivity ν(m) into |$\hat{\nu }(\hat{m})$| with the same distribution (Poisson/Geometric), but with new parameters |$\hat{\lambda }_0(\hat{m})$| as in eq. (A17). Therefore, as above, we will have the following relations:
(A18)
where |$\left\lbrace \hat{N}_{0M}(\hat{m}_i): i=1,\dots , \hat{\nu }(\hat{m})\right\rbrace$| are independent and identical distributed given |$\hat{\nu }(\hat{m})$| and |$\left\lbrace \hat{m}_i: i=1,\dots , \hat{\nu }(\hat{m}) \right\rbrace$|⁠.
As a result
(A19)
Since m0 is a parameter, in the random case of initial event, m0 has distribution |$F_1(m)/\bar{F}_1(\Delta )$|⁠.As we can see, eq. (A18) is a special case of eq. (A3), in which the magnitude range is finite. By eq. (A17), |$\hat{\lambda }_0(\hat{m})\le \lambda (m_0)$| and therefore |$\hat{\lambda }_0(\hat{m})$| has all moments. Consequently, the total Δ-productivity will have all the moments at any fixed m0.
(c) Random dominant |$\mathbf {m_0}$|
We need additional notation:
(A20)
Using eq. (A17) and non-decreasing function λ(m), we have for the P-model
(A21)
The same is true for the G-model. Really, by eq. (A17) we have the following representation:
where
Since λ(m) is a non-decreasing function, ψ(m) ≤ 1 on the interval (0, m0). Therefore, the integrand in square brackets is less than 1, which proves eq. (A21).As above, the relation eq. (A18) gives,
(A22)
Setting |$\xi _i=\hat{N}_{0M}(\hat{m}_i)+\mathbf {1}_{\hat{m}_i\ge M}$| in eq. (A18) and using eq. (A6), we get an analogue of eq. (A10):
(A23)
where the summation is over all integer vectors |$\mathbf {r^{(k)}}=(r_1,\dots ,r_k)$|⁠: r1 + … + rk = n, ri ≥ 1. Averaging eq. (A23) over |$\hat{m}$| we will also have
(A24)
Let’s extract the term |$\hat{\Lambda }_{01} \hat{b}_{0M}^{(n)}$| from |$\hat{\Lambda }_{01} \tilde{b}_{0M}^{(n)}$| and move it to the left-hand part of eq. (A24); then apply eq. (A21) to |$\hat{\Lambda }_{0k}$| and take into account the relation |$\hat{\Lambda }_{01}/(1-\hat{\Lambda }_{01}) \le \Lambda _1/(1-\Lambda _1)$|⁠. Then multiplying eq. (A24) by |$\hat{\lambda }_0(m_0)$|⁠, we get
(A25)
Since |$\tilde{b}_{0M}^{(n)} = E\left( \hat{N}_{0M}(\hat{m}_i) +\mathbf {1}_{\hat{m}_i\ge M}\right)^n$|⁠, the elements of the right-hand part of eq. (A25) can be obtained by summing and multiplying elements of the form: |$\hat{\lambda }_0(m_0)\hat{\pi }_{0M}$| and
(A26)
where |$D_{k,\varepsilon } \le \hat{\lambda }_0(m_0) \hat{b}_{0M}^{(k)}$|⁠.We have assumed that |$\lambda (m)\displaystyle \int _0^\Delta f_1(m-x)dx \le C$|⁠. Hence
and by virtue of eq. (A22)
Using eq. (A25) and doing induction over n, we can conclude that |$\hat{\lambda }_0(m_0) \hat{b} _{0M(m_0)}^{(n)}\le C_n$| for any n. But then, due to eq. (A19), |$EV_\Delta ^n = E\hat{b}_{0M(m_0)}^{(n)}(m_0) \le K_n\lt \infty$| for any n.

The proof is complete.

A3 Proof of the statement 3.1

Let ξ be a random variable with a geometric distribution. Then
(A27)
Consider |$\xi =V_\Delta (m_0)$|⁠, where m0 is the initial dominant magnitude in the ETAS(G) cluster. We will show that the relation (A27) is possible for at most one parameter Δ ∈ (0, m0). For simplicity of notation, we will temporarily proceed from eq. (A3). Recall that
Above we show that
(A28)
According to eq. (A10) for the ETAS(G) model, we have
(A29)
By eq. (A28),
(A30)
Averaging eq. (A29) over m, we can find |$b_M^{(2)}$|⁠; using eq. (A30), we will have
(A31)
If we substitute eq. (A31) into eq. (A29), we get
As a result,
(A32)
The condition (A27) is satisfied only if
(A33)
where |$\Lambda _2-\Lambda _1^2 = \sigma ^2(\lambda )\gt 0$|⁠.The eq. (A33) with respect to M has unique solution if πM is strictly monotone.Now we can return to the ETAS(G) cluster with dominant initial magnitude m0. In this case the main equation is (A18). It is identical to eq. (A10), but requires the following replacement of the characteristics used (see A14 and A17):
(A34)
(A35)
Since |$V_\Delta (m_0)=N_{m_0-\Delta }(m_0)$|⁠, we have to consider eq. (A33) given the substitutions (A34) and (A35) and M = m0 − Δ. Given f1(m) > 0, the function (A34) will be strictly monotonic with respect to Δ. Hence, the relation (A27) is possible for at most one value of Δ = Δ(m0).

The proof is complete.

A4 Proof of the statement 3.2

Let ν(m) and V(m) be the sizes of clusters of direct descendants and all descendants, respectively, associated to an initial event with magnitude m. Let ϕV(z, λ(m)) = EzV(m) and |$\phi _V(z)=\displaystyle \int _0^\infty Ez^{V(m)} f_1(m) dm$| be the conditional and unconditional generating functions of V(m). Due to the RT property of ν(m) distribution, we have
Then by eq. (A3)
and after integration over m we get
(A36)
Since it is assumed that ∫λ2(m)f1(m)dm < ∞ and Eν2 < ∞, it follows EV2 < ∞ from Statement 2. This means that |$\phi _V^{\prime \prime }(1)\lt \infty$|⁠. Set |$\mu =EV=\phi _V^{\prime }(1)$|⁠, |$\Lambda _k=\int _0^\infty \lambda ^k(m) f_1(m) dm$|⁠. Differentiating eq. (A36) at point 1, we get
(A37)
Differentiating eq. (A36) twice at point 1, we get
Given eq. (A37), we have
(A38)
Assume that the distribution types of ν and V are the same. Then ϕV(z) = ψ(μ(z − 1)) and |$\phi _V^{\prime \prime }(1)=\psi ^{\prime \prime }(0)\mu ^2$|⁠. Combining this equality with eq. (A38) we have
(A39)
By definition, all terms in this equality are non-negative, |$\Lambda _2\Lambda _1^{-2}\ge 1$|⁠, and (1 − Λ1)−1 > 1. Therefore, eq. (A39) is contradictory.
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)