Robust Bayesian estimator for S-wave spectra, using a combined empirical Green's function

We propose a new fully automatic and robust Bayesian method to estimate precise and reliable model parameters describing the observed S-wave spectra. All the spectra associated with each event are modelled jointly, using a t-distribution as likelihood function together with informative prior distributions for increased robustness against outliers and extreme values. The model includes the observed noise and a combined empirical Green's function. It captures source-, receiver- and path-dependent terms in the description of the observed spectra by combining a physical source and attenuation model with a spatially and event-size-dependent empirical compensation. The proposed method propagates estimation uncertainties along the entire processing chain, starting from the hypocentre location, and delivers a reliable uncertainty description of the estimands. The objective is to automatically provide robust and valid descriptions of the observed S-wave spectra generated from an earthquake source in a noisy and heterogeneous environment. The efficiency of the method is tested with synthetic seismograms, and the model is calibrated and cross-validated using 31 640 mining-induced seismic events in an iron-ore mine in northern Sweden with a comprehensive seismic network. The model is evaluated using both posterior predictive checks and residual analysis, and we found no evidence of model deficiencies with respect to central tendency, dispersion and residual trends.

R ijk Model residual.
V Expected S-wave velocity between the jth sensor and p.
r j Expected distance between the jth sensor and p.
t j Expected travel time between the jth sensor and p.
K j Set of event indices triggered by the jth sensor in the calibration data set.
ν Normality parameter, controlling the tail width of the t-distribution.

INTRODUCTION
Seismic source parameters like seismic moment, seismic energy, source radius and stress drop are important parameters for understanding the physics of earthquakes (Prieto et al. 2007; Van Houtte & Denolle 2018). Additionally, these parameters are commonly used in subsequent analysis, for example in hazard assessments (Cornell 1968; Wesseloo 2018) or aftershock analysis (Kozlowska et al. 2020), but their estimation uncertainties and systematic deviations are often overlooked. For instance, in sequential analysis using cumulative measures (see e.g. Thelen et al. 2010; Nordström et al. 2020) there is a risk of aggravating the biases as a result of the accumulation of systematic errors. For induced seismicity, like mining, the seismic waves are significantly influenced by heterogeneities along the propagation path and ambient noise, causing uncertain estimates of the source parameters. While this is also valid for tectonic seismicity, it is especially challenging for mining-induced seismicity as investigated here in LKAB's underground iron-ore mine located in Kiruna, Sweden (see Fig. 1 for a horizontal overview). In this mine, the rock mass contains large heterogeneities relative to the lengths of the ray paths between the events and the sensors (e.g. extracted rock mass, zones of fractured rock, large geological and stress variations as explained in Sjöberg et al. 2012; Vatcher et al. 2014; Nordqvist & Wimmer 2016; Vatcher 2017; Svartsjaern & Saiang 2017; Troll et al. 2019). Also, the earthquakes are usually small in size (99 per cent with M w < 1) with low signal-to-noise ratio (SNR) and the recorded waveforms are commonly contaminated with disturbances from mining operations (e.g. blasting, scaling, drilling, crushing, mucking, pumps, ventilation and heavy vehicles). Additionally, the sensors are placed within active event clusters as shown in Fig. 1, where multiple sensors may end up within the source radius, causing near-field effects like complex waveforms, flat spectra and resonance patterns (see e.g. Mikumo & Miyatake 1978; Vidale et al. 1995; Legrand & Delouis 1999; Yamada & Mori 2009).

[Figure 1 caption: Orange triangles represent the sensors in the seismic measurement system. The coloured dots represent a subset of events used for model validation (further information in Section 3) and are coloured according to the estimated event size. The histograms in the subfigure to the right-hand side show the depth (z) distribution of the events and sensors in magenta and orange, respectively.]

Therefore we claim that it is of equal importance to estimate the corresponding uncertainties as the source parameter values themselves. If we do not have any confidence in a value, that value is meaningless, and may impair subsequent analysis and conclusions. For example, estimates of stress drop may vary several orders of magnitude between individual studies (as discussed in Abercrombie 2015; Abercrombie et al. 2016) and if used for estimation of strong ground motions (see e.g. Oth et al. 2017) for hazard assessments it may have severe consequences, especially if the estimates lack confidence or the sequential models do not account for it. Source parameters can be estimated by fitting the source model to attenuation-corrected spectra (see e.g. Oye et al. 2005; Shearer et al. 2006) or by fitting the observed spectra where the model describes the attenuation (see e.g. Ide et al. 2003; Supino et al. 2019). The benefit with attenuation-corrected spectra is that they can be stacked together to estimate a mean spectrum for every sensor, which partly compensates for radiation-pattern effects (Oye et al. 2005). It is a computationally fast method that results in a smooth source spectrum, but it relies on removing the frequency coefficients that are under the influence of noise to obtain a good model fit.
Additionally, stacking based on the sample mean is also sensitive to outliers (see e.g. Gelman et al. 2004; Martinsson 2012; Kruschke 2014).
Challenges such as mining disturbances, extreme values and outliers, associated with arrival times obtained from automatic location procedures (see e.g. Martinsson 2012) in heterogeneous environments, increase the demand on robust automatic source parameter estimation with respect to uncertainties and outliers in the input data (Ross & Ben-Zion 2016). Estimates of the uncertainty can be obtained in different ways. For example, it can be estimated by: taking the standard deviation of the estimates for each spectrum (Caprio et al. 2011); removing one sensor at a time using the jackknife method (Prieto et al. 2007); taking the variance-weighted uncertainty (Supino et al. 2019), where the variance for each spectrum is obtained using Gaussian assumptions and estimated with Markov chain Monte Carlo (MCMC) methods (Gelman & Hill 2006; Kruschke 2014); or evaluating the skewness of the misfit function (Denolle & Shearer 2016).
Instead of iterative methods, we rely on the benefits provided by a joint estimation procedure (see e.g. Gelman et al. 2004; Kruschke 2014; Martinsson & Törnman 2019) and propose an MCMC estimation method similar to that of Supino et al. (2019), but here the observed spectra Y ij are modelled jointly for the ith frequency coefficient in the jth site. A valid description of all the observations is an important part of model validation (Pintelon & Schoukens 2001; Gelman et al. 2004; Kruschke 2014; Martinsson & Törnman 2019). If a model is not able to properly describe all the observations, in terms of central tendency and dispersion, then inference obtained using the model cannot be trusted. For robust methods, this description must also include disturbances, outliers and extreme values. As opposed to using a Gaussian distribution for the likelihood (Supino et al. 2019), we propose instead a Student's t-distribution due to its robustness against outliers and extreme values (Gelman et al. 2004; Martinsson 2012; Kruschke 2014). For example, in Martinsson (2012) it has been shown that the Gaussian distribution in particular performs poorly and is very sensitive to a single outlier value when estimating the hypocentre. Using it in the presence of possible outliers will result in biased estimates and unrealistic uncertainty regions, favouring heavy-tailed distributions instead. The t-distribution is symmetric and bell-shaped like the normal distribution but has heavier tails. The tail mass is controlled with an additional parameter ν, called the degrees of freedom when used in the Student's t-test, or the normality parameter when used in the proposed model to increase the robustness against outliers (see e.g. Gelman et al. 2004; Kruschke 2014). When ν = 1 the t-distribution takes the form of the Cauchy distribution and when ν > 30 it is similar to the normal distribution.
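As a minimal illustration of this robustness, the sketch below compares maximum-likelihood location estimates under a Gaussian and a t-likelihood for data containing a single outlier. The data values, the fixed scale of 0.2 and ν = 3 are illustrative assumptions, not values from the study:

```python
import numpy as np
from math import lgamma

def t_logpdf(x, mu, sigma, nu):
    # log-density of a Student's t with location mu, scale sigma and
    # normality parameter nu
    z = (x - mu) / sigma
    c = (lgamma((nu + 1) / 2) - lgamma(nu / 2)
         - 0.5 * np.log(nu * np.pi) - np.log(sigma))
    return c - (nu + 1) / 2 * np.log1p(z ** 2 / nu)

# synthetic spectral levels with one gross outlier (illustrative values)
data = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 8.0])

# grid search for the maximum-likelihood location under the t-likelihood
grid = np.linspace(0.0, 8.0, 2001)
ll_t = np.array([t_logpdf(data, m, 0.2, 3.0).sum() for m in grid])
mu_t = grid[np.argmax(ll_t)]   # robust location estimate (t, nu = 3)
mu_gauss = data.mean()         # Gaussian ML location = sample mean
```

The sample mean (the Gaussian estimate) is dragged towards the outlier, while the t-based estimate stays near the bulk of the data.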
To obtain realistic estimates of the source parameters, the model parameters need to be restricted to physically reasonable values (e.g. they need to be larger than zero). To obtain reliable estimates with corresponding uncertainties, the method needs to capture the cumulative uncertainties along the processing chain. We propose a fully automatic Bayesian method as in Pugh et al. (2016) that both restricts the estimates with prior information as in Martinsson (2012) and Stähler & Sigloch (2014) and includes the estimated uncertainties prior to source inversion. For instance, we include the estimated uncertainties in the origin time σ t 0 and the source receiver distance σ r , both obtained from the hypocentre estimation algorithm (described in Martinsson 2012).
All the properties described above (i.e. robustness against outliers and extreme values, informative prior distributions to regularize the solution, and an uncertainty description of the input covariates for proper weighting) are needed to meet our objective: to derive a fully automated and robust method that provides a valid description of the observed S-wave spectra of mining-induced events. A valid description needs to capture the variations in the propagation medium, the variations at the source and the influence of temporal noise. These are introduced in more detail in the next section.
The rest of the manuscript is structured in the following way. In Section 2, we introduce the attenuation and source model considered here, together with a description of the background noise. Section 3 is devoted to the mine site, the measurement system and the data. A short introduction to Bayesian inference is given in Section 4 followed by a more detailed presentation of the Bayesian model in Section 5. The stepwise estimation procedure is described in Section 6 and the inferential results are presented in Section 7 followed by discussion and conclusion in Sections 8 and 9, respectively.

THEORY
For earthquakes, the source, path and site effects are challenging to separate from the observations, since the effect of each contribution is jointly connected. On the other hand, the additive noise characteristics N can be estimated directly from the recorded waveforms prior to the P-wave onset. From a modelling perspective, we need to describe the observed spectra Y using commonly accepted models describing the source S and the attenuation A. These models are usually an oversimplification, which leads to uncaptured behaviour embedded in the residuals. If these behaviours are systematic for certain ray paths, they can be estimated from training data by introducing an empirical compensation term E. The origin of E will remain unknown, as Shearer et al. (2019) stated, and we can for simplicity assume that E describes site effects at the source (as in Shearer et al. 2006) or describes the propagation channel as in this study. However, due to the commutative property of these terms in the model Y = SEA + N, the result is the same regardless of whether the empirical compensation term E is devoted to the source, the propagation channel or a mix of both. In the following three subsections we will describe the source S, the attenuation A and the noise N more thoroughly.

Attenuation
The attenuation can be modelled in many ways and here we use a simple model A that involves intrinsic attenuation and geometric spreading, defined as

A = r^{-β} exp(-π f t / Q),

where f is the frequency, t the travel time, r the distance, and the quality factor Q depends on the medium and usually increases with depth (Patanè et al. 1994). At our shallow depths we expect Q < 300 (O'Neill & Healy 1973; Johnson & McEvilly 1974; Wang 2004; Oye et al. 2005; Ge et al. 2009). The geometric spreading parameter β describes the decrease in spectral level as the wave propagates away from a source. For far-field body waves in a homogeneous medium the wavefront is spherical with β = 1, in comparison to surface waves where the wavefront is cylindrical with β = 0.5. For a heterogeneous medium, Drouet et al. (2011) expect β > 1 due to scattering and reflections. Empirical estimates of β > 1 are observed, for example, by Zhu et al. (1991), Atkinson (2004), Akinci et al. (2006), Edwards et al. (2011) and Drouet et al. (2011).
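For concreteness, this attenuation model can be written as a one-line function; the parameter values below (Q, β, the path length and travel time) are only illustrative examples within the ranges discussed above:

```python
import numpy as np

def attenuation(f, t, r, Q=200.0, beta=1.2):
    """Attenuation A = r**(-beta) * exp(-pi * f * t / Q): geometric
    spreading combined with intrinsic attenuation, for travel time t (s)
    and distance r (m). Q and beta defaults are illustrative only."""
    return r ** (-beta) * np.exp(-np.pi * f * t / Q)

f = np.logspace(np.log10(14.0), 3.0, 128)   # 14 Hz to 1 kHz, as in the data
A = attenuation(f, t=0.1, r=350.0)          # a 350 m path, 0.1 s travel time
```

The exponential term removes progressively more high-frequency energy along the path, which is why the corner-frequency and falloff estimates must account for attenuation.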
There exist more complex attenuation models, for example where Q and β are frequency dependent (see e.g. Zhu et al. 1991) or where Q increases with travel time (see e.g. Wang 2004). However, the mining environment is extremely heterogeneous (Martinsson 2012; Martinsson & Törnman 2019), containing excavated volumes and complex geology, and it is continuously changing due to ongoing mining (see e.g. Vatcher et al. 2015, 2018; Nordqvist & Wimmer 2016; Troll et al. 2019). This changing environment causes frequency-dependent spatiotemporal attenuation effects on the seismic waves. These complex effects are difficult to model but can be empirically estimated using the traditional empirical Green's function EGF (Mueller 1985), the multiple empirical Green's function MEGF (Hough 1997) or the combined empirical Green's function CEGF (Shearer et al. 2006).
Traditional EGF and MEGF use smaller events in the proximity of the source with similar source time function to estimate the impulse response, assuming that the inter-event distance is much smaller than the event-receiver distance. The reason for using small events is that the source time function is shorter, generating a flatter frequency response that hopefully spans the frequencies of larger events. The CEGF proposed by Shearer et al. (2006) uses spatially homogeneous attenuation (i.e. the path effects are a simple function of travel time) together with an empirical correction for the unmodelled behaviours in the EGF events. The CEGF works for events spanning a broad range of magnitudes (Shearer et al. 2006), not limited to events one or two magnitude units greater than the smallest EGF events (calibration events) as in traditional EGF or MEGF, where the smallest events must be discarded.
In Shearer et al. (2006, 2019), the empirical correction is based on stacking of a large number of EGF events, resulting in smooth correction spectra, where each EGF event has equal impact due to the stacking. In contrast, in Abercrombie (2015), Abercrombie et al. (2016) and Ruhl et al. (2017) the empirical correction is based only on EGF events in the proximity of the source (i.e. within a distance related to the source radius) that mimic the target event (i.e. with cross-correlation above a threshold). Abercrombie (2015) observed that including distant or dissimilar EGF events resulted in biased estimates compared to using only similar EGF events in the proximity of the source. Shearer et al. (2019) observed that stacking of a large number of EGF events resulted in more consistent estimates for densely clustered earthquakes compared to traditional EGF approaches.
Here we propose a new CEGF approach that uses all EGF events, resulting in smooth compensation spectra as in Shearer et al. (2006, 2019), but that also favours similar (i.e. in size) and nearby EGF events as in Abercrombie (2015), Abercrombie et al. (2016) and Ruhl et al. (2017).
However, our method does not truncate calibration data with respect to size, correlation or distance. Instead, we interpolate with respect to both size and distance, where calibration events near the source and of similar size have a larger impact on the estimate of the Green's function G. We investigate appropriate interpolations of the EGF events based on residual analysis using cross-validation and simply propose the one that provides the best description of our observations at hand, in terms of central tendency and dispersion. Additionally, we also estimate the underlying uncertainties σ_E^2 for the empirical compensation spectra E, which is part of G. The frequency- and sensor-dependent uncertainty σ_E^2 is used in the Bayesian inference for proper weighting.
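A sketch of such size- and distance-dependent interpolation for one sensor is given below. The Gaussian kernels and their length scales are our own illustrative assumptions; the paper selects the interpolation by cross-validation rather than prescribing a particular kernel:

```python
import numpy as np

def combined_egf(spectra, sizes, dists, target_size, target_dist,
                 l_size=0.5, l_dist=100.0):
    """Weighted compensation spectrum for one sensor.
    spectra: (n_events, n_freq) log EGF spectra; sizes/dists: per-event
    size and hypocentral distance. Gaussian kernels in size (magnitude
    units) and distance (m) are illustrative, not the paper's kernel."""
    w = np.exp(-0.5 * ((sizes - target_size) / l_size) ** 2
               - 0.5 * ((dists - target_dist) / l_dist) ** 2)
    w /= w.sum()
    mean = w @ spectra                  # weighted compensation spectrum E
    var = w @ (spectra - mean) ** 2     # its frequency-dependent variance
    return mean, var
```

Events close to the target in both size and distance dominate the weighted mean, while no calibration event is discarded outright; the variance estimate plays the role of the frequency-dependent uncertainty used for weighting.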

Source model
Two commonly used seismic source models are Brune (1970) and Boatwright (1980), both built for circular cracks and based on far-field assumptions. The Brune model has a broader corner compared to the Boatwright model, which in general causes larger uncertainties in the estimates of the frequency that defines the corner (i.e. the corner frequency). However, the broader corner may allow for more realistic estimates of the corner frequency near the maximum frequency, as observed by Ruhl et al. (2017), since the broader corner allows for more use of the curvature in the spectra. There exist more complex source models that contain an additional source dimension, like rectangular or elliptical rupture geometries (see e.g. Savage 1972; Denolle & Shearer 2016), or models that include directional effects (Kaneko & Shearer 2015; Abercrombie et al. 2017). Here we use a simple Brune-type source model expressed in velocity, since that is the format of the measured data, with the parameters: corner frequency f c , seismic moment M 0 , and the spectral fit parameter n (also known as the high-frequency falloff rate for the displacement spectra). Brune (1970) proposed n = 2, which is the value most commonly used for source models (see e.g. Oye et al. 2005; Shearer et al. 2006). However, there are many cases where n needs to be adjusted to obtain a reasonable fit (see e.g. Brune et al. 1986; Iio 1992; Patanè et al. 1994; Abercrombie 1995; Prieto et al. 2007; Zhang et al. 2011; Ross & Ben-Zion 2016; Van Houtte & Denolle 2018). Although n is mainly used as a model fit parameter, there are some attempts to define its physical meaning. For example, Frankel (1991) suggested that n depends on the strength along the fault zone, Brune et al. (1986) observed a positive correlation to stress drop and Lee et al. (2003) stated that the radiated energy remains finite only if n > 1.5. Here we evaluate two possible options for the source model:

Model A: Brune-type model with fixed n (i.e. n = 2.31).

Model B: Brune-type model with event-specific n, given some constraints (i.e. n > 1.5).

More information about the value and constraints of n in Model A and Model B is given in Sections 6.1 and 6.2, respectively.
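A Brune-type spectrum in velocity can be sketched as follows; the overall scaling constant relating the low-frequency plateau to the seismic moment, and any site or radiation factors, are omitted here for simplicity:

```python
import numpy as np

def brune_velocity(f, omega0, fc, n=2.0):
    """Brune-type source spectrum expressed in velocity: the displacement
    spectrum omega0 / (1 + (f / fc)**n) multiplied by 2*pi*f. The constant
    relating omega0 to the seismic moment M0 is omitted."""
    return 2.0 * np.pi * f * omega0 / (1.0 + (f / fc) ** n)
```

Below the corner frequency the velocity spectrum rises roughly linearly with f; above it, for n = 2, it falls off as 1/f, and a larger n (as in Model A with n = 2.31, or event-specific n in Model B) steepens this falloff.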

Background noise
The background noise affects the observed spectra for both induced and natural earthquakes (Zamani & Murrell 1978). The recorded signal y t = s t + n t contains both the received signal s t from the source and the noise n t for time sample t. In a mining environment the characteristics of the noise n t are spatio-temporal (e.g. caused by vibrations from drilling, pumps, fans, traffic, etc.) and may significantly influence the estimated result if n t is not properly dealt with. A typical example of spatio-temporal mining noises is shown in Fig. 2 where the second seismogram contains periodic bursts not related to the seismic event.
The common way to handle noise when estimating S-wave spectra is either to only consider the frequency coefficients where the signal-to-noise ratio (SNR) is above a specified threshold (usually set to 2 or 3; Shearer et al. 2006; Oth et al. 2011; Drouet et al. 2011; Denolle & Shearer 2016; Ross & Ben-Zion 2016; Van Houtte & Denolle 2018), or to weight the spectral coefficients by the SNR in the estimation process (Boatwright et al. 1991; Maranó et al. 2017). Truncating above an SNR threshold will neglect certain frequencies or distant sensors despite the fact that they still carry useful information. Also, applying SNR weighting increases the impact of sensors near the source, where part of the spectra may violate the far-field assumption and, additionally, these near-source sensors are more sensitive to location errors. Due to the above, near-source sensors should preferably be downweighted in the estimation process, especially those installed close to or inside the estimated source radius.
Estimation of the source model parameters is commonly done in the frequency domain, for example using Brune's model. A frequency-domain representation is often favourable from an estimation and modelling point of view, providing not only a simple mapping of models between the continuous and discrete domain using the discrete Fourier transform (DFT, Pintelon & Schoukens 2001). The DFT also has beneficial asymptotic statistical properties of the measurement noise (Brillinger 1981; Ljung 1985), such as uncorrelated and complex normally distributed DFT coefficients, particularly useful in the estimation step (Kay 1993; Pintelon & Schoukens 2007; Martinsson 2008).

[Figure 2 caption (cf. Fig. C1 in Appendix C): The first and third rows show examples of two seismograms at different sensors triggered by the event. The noise and S-wave signal windows are coloured in grey and black, respectively, and red parts (i.e. P wave and coda) are not considered in the analysis. The model fit and the model residuals for both seismograms are provided below the seismograms in the left- and right-hand panels, respectively. The left-hand panels show the resampled (logarithmically spaced) noise (N ij) and S-wave signal spectra (Y ij) in grey and black, respectively; cyan and red curves represent the model spectra [exp(μ ij)] given the combined model in (5) with and without empirical compensation, respectively. The modelled noise and its corresponding uncertainty (± σ Nj) are shown in solid and dashed magenta lines, respectively. The right-hand panels show the model residuals (R ij) for the two sensors with and without empirical compensation and noise description in cyan and red, respectively. The orange dots in both panels are the estimated cut-off frequencies defined in Appendix A.]

The Fourier transform F is a linear transform, so the observed spectrum F(y t) can be expressed as F(s t + n t) = F(s t) + F(n t).
Here the received spectrum from the source F(s t) can be expressed as the source model S multiplied by the Green's function G. The additive noise spectrum F(n t) can be estimated from a time window prior to the P-wave onset (shown e.g. in Fig. 2).
To handle the influence of the background noise, we propose a model Ŷ = SG + N of the spectra that includes the noise N (Pintelon & Schoukens 2001), rather than truncating or weighting the spectral coefficients by the SNR. With this approach we can properly model the spectral coefficients even under great influence of the noise (Söderström & Stoica 1989; Ljung 1987; Pintelon & Schoukens 2001).
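One simple way to include a noise term when predicting an amplitude spectrum is to add the signal and noise contributions in power, assuming they are uncorrelated. Note that this additive-in-power combination is our simplifying assumption for illustration, not necessarily the exact form used in the paper's combined model:

```python
import numpy as np

def model_spectrum(S, G, N):
    """Expected amplitude spectrum of signal plus noise under the
    assumption that source and noise are uncorrelated, so their
    powers add: |Y|^2 ~ |S G|^2 + |N|^2."""
    return np.sqrt((S * G) ** 2 + N ** 2)
```

At frequencies where the noise dominates, the predicted spectrum follows the noise floor instead of forcing the source-attenuation part SG to explain it, which is what makes SNR truncation unnecessary.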

THE MINE SITE, MEASUREMENT SYSTEM AND DATA
The data used in the current investigation are recorded in LKAB's underground mine in Kiruna, located in northern Sweden. The mine contains the largest deposit of apatite-iron-oxide mineralizations in the world and is the main supplier of iron ore in Europe (Troll et al. 2019). The ore body is irregular and approximately 100 m thick, ≈5 km long (in the north-south direction) and plunges ≈65° to the east (see e.g. Kozlowska et al. 2020, for more details). The rock mass is heterogeneous with large variations in material density (Vatcher et al. 2014), rock strength (Svartsjaern & Saiang 2017), wave velocity (Vatcher et al. 2018) and stresses (Sjöberg et al. 2012; Vatcher 2017). Additionally, the rock mass contains extracted volumes, altered zones, clay and crush zones (Vatcher 2017), all together influencing the wave propagation in the medium. Moreover, the sensors are usually placed where the events occur, causing small separation between the waves (i.e. P and S waves), and multiple sensors may end up inside the estimated source radius, especially for larger events.
The seismic measurement system shown in Fig. 1 is dense (especially on the footwall side where most infrastructure is placed) and mine wide (i.e. it covers the entire length of the orebody), with approximately 250 active geophones (287 installed in total, some depleted through time). Approximately half of the geophones are triaxial and the other half are uniaxial. The frequency responses also differ depending on sensor type: roughly 47 per cent of the geophones in the system have a natural frequency at 4.5 Hz, another 49 per cent at 14 Hz and finally 4 per cent at 30 Hz. The 30 Hz sensors are the oldest sensors and are not included in this study. The majority of the sensors are installed inside drilled holes, mainly vertically oriented, and grouted into the rock mass. For the triaxial sensors, we only use the sensor component that is parallel with the hole, which is for the moment the only known component direction. The system is provided by a commercial company (Institute of Mine Seismology) and is likely one of the largest underground in-mine seismic systems in the world (Dineva & Boskovic 2017; Kozlowska et al. 2020).
The complete data set contains 31 640 seismic events recorded during the period from 2018.12.10 to 2019.01.24. The events in the complete data set are chosen to have well defined locations, with a 95 per cent confidence volume of the hypocentre distribution (Martinsson 2012) of less than 10^5 m^3 (equivalent to a sphere with a radius of 28.8 m). The hypocentre locations and corresponding uncertainties, including the arrival times, are automatically estimated (see Fig. 3) according to the technique described in Martinsson (2012). The events are randomly divided into two data sets, where 70 per cent of the events are used for calibration (defined as the calibration data set) and 30 per cent for validation (defined as the validation data set and shown in Fig. 1). For each trace (illustrated in Fig. 2) we use the noise window n t (in grey) obtained prior to the P-wave onset, and the signal window y t (in black) defined from the onset of the S wave with a length up to 500 ms.

[Figure 3 caption: Hypocentre location for the event in Fig. 2, where the cyan star is the posterior mode estimate of the hypocentre. Red areas are extracted rock mass at the depth of the hypocentre (i.e. z ≈ 820 m), used as hard prior restrictions in the Bayesian hypocentre estimation algorithm (Martinsson 2012). Orange triangles are the sensors triggered by the event (the total number of sensors is visible in Fig. 1). The lines in light and dark grey are the tunnels at two different depths, where the depths are marked as triangles on the z-axis on the right-hand side.]

The S-wave windows considered here are slightly larger than those reported in Oye et al. (2005) for a mine with a similar seismic measurement system. The reason is to capture more energy for larger events. We choose to keep the observed spectra as untampered as possible, following the guidelines in Pintelon & Schoukens (2001), using rectangular window functions (for both y t and n t).
This will also preserve the highly energetic part of y t due to the uneven energy versus time distribution (i.e. the largest part of the energy is generally located in the beginning of the signal window). Furthermore, we are primarily interested in finding a model that describes our frequency-domain observations (Kay 1993; Pintelon & Schoukens 2001) rather than in, for example, detecting single frequency components by pushing down side lobes (see e.g. Oye et al. 2005, using a cosine taper) or smoothing the observed spectra (see e.g. Thomson 1982; Prieto et al. 2007, 2009) to reduce spectral variation. Adopting a statistical approach as we do here, there is generally no need to reduce spectral variations, since this part is included in the model description as dispersion. The noise spectra N and signal spectra Y are first estimated using the DFT and then resampled to 128 logarithmically spaced frequency coefficients in the interval from 14 to 1000 Hz, to reduce the number of coefficients and enhance the impact at low frequencies (see e.g. Ide et al. 2003; Lior & Ziv 2018). Additionally, to reduce possible electrical disturbances that may contaminate the sensor measurements, that is 50 Hz and its overtones, we remove the corresponding DFT coefficients prior to resampling.
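The preprocessing steps above can be sketched as follows. The use of linear interpolation for the logarithmic resampling and the half-bin tolerance around the mains harmonics are our assumptions; the paper does not specify these details:

```python
import numpy as np

def preprocess_spectrum(y, fs):
    """DFT of a windowed trace, removal of 50 Hz mains harmonics, and
    resampling to 128 logarithmically spaced coefficients, 14-1000 Hz."""
    Y = np.abs(np.fft.rfft(y))
    f = np.fft.rfftfreq(len(y), d=1.0 / fs)
    # drop coefficients at 50 Hz and overtones (within half a DFT bin)
    df = f[1] - f[0]
    keep = np.ones(len(f), bool)
    for k in range(1, int(f[-1] // 50) + 1):
        keep &= np.abs(f - 50.0 * k) > df / 2
    f, Y = f[keep], Y[keep]
    # logarithmic resampling (here by interpolation, an assumption)
    f_log = np.logspace(np.log10(14.0), np.log10(1000.0), 128)
    return f_log, np.interp(f_log, f, Y)
```

The logarithmic spacing gives the low-frequency plateau, which carries the moment information, a weight comparable to the high-frequency falloff in the subsequent fit.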

BAYESIAN INFERENCE
One of the advantages of a Bayesian method is that it can provide a solution even if the data are poor or inconsistent, by leaning on the support of informative prior distributions (see e.g. Martinsson 2012). The aim of adopting Bayesian inference here is to embed prior knowledge when estimating the source spectral parameters of a particular event. Informative prior distributions will regularize the Bayesian inference and provide the necessary robustness for dealing with, for example, events triggered by only a few sensors or contaminated with outliers, inconsistent data or disturbances from mining noise.
Bayesian inference takes us from our prior beliefs to our posterior beliefs given some observations (i.e. the likelihood) by using Bayes' rule, which can be expressed as

p(θ | D) = p(D | θ) p(θ) / p(D),

where p is the probability density (or mass) function, θ represents the parameter vector we would like to infer, and D is the input data that we specify further in Section 5. The evidence p(D) scales the posterior distribution to be proper (i.e. to have unit probability mass). In reality we are mainly interested in the location and dispersion of the parameters given the observed data, and not the true probability density for a certain parameter value. We can therefore ignore the evidence term p(D), which is challenging to compute, and work with the following simplified expression

p(θ | D) ∝ p(D | θ) p(θ).

For a more detailed introduction to Bayesian inference, see e.g. Tarantola.
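As a toy numerical illustration of this proportionality, the grid computation below combines a Gaussian likelihood with a Gaussian prior; the data values, σ = 1 and the N(0, 2²) prior are chosen purely for the example:

```python
import numpy as np

# unnormalized posterior p(theta | D) ∝ p(D | theta) p(theta) on a grid
theta = np.linspace(-5.0, 5.0, 1001)
data = np.array([1.2, 0.8, 1.0])                               # toy data
log_like = -0.5 * ((data[:, None] - theta) ** 2).sum(axis=0)   # sigma = 1
log_prior = -0.5 * (theta / 2.0) ** 2                          # N(0, 2^2)
log_post = log_like + log_prior

# exponentiate and normalize numerically (this plays the role of 1/p(D))
post = np.exp(log_post - log_post.max())
post /= post.sum() * (theta[1] - theta[0])
theta_mode = theta[np.argmax(post)]    # posterior mode as point estimate
```

Normalizing numerically on the grid sidesteps the evidence p(D) entirely, exactly as the proportional form above suggests; MCMC achieves the same for high-dimensional θ where a grid is infeasible.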

Inferential methods
We rely on Markov chain Monte Carlo (MCMC) methods (Gelman et al. 2004; Kruschke 2014), providing realistic estimates of credible intervals of parameter values (Martinsson 2012) compared to, for example, approximations based on derivatives. The Bayesian inference here is based on MCMC samples from the posterior distribution and the posterior predictive distribution (Gelman et al. 2004; Kruschke 2014), with converged MCMC chains. The MCMC samples are provided by a slice sampler (Neal 2003) that is here modified to slice in the directions along the eigenvectors of the estimated covariance matrix obtained from the burn-in samples (Martinsson & Törnman 2019). This approach is more efficient compared to the traditional slice sampler, the commonly used Metropolis algorithm (Gelman et al. 2004; Kruschke 2014) or variations of it (Martinsson 2012). The efficiency of a sampler is usually defined by the effective sample size (ESS, see e.g. Kruschke 2014), which is a rough measure of the number of independent samples in a chain. Events in both the calibration and the validation data set are evaluated based on 10 000 MCMC samples; with an average efficiency of a factor 0.64 (ranging between 0.31 and 1.0) in this study, this is more than enough to infer measures like the mean, the mode or the standard deviation of a trace (see e.g. Kruschke 2014). Individual events shown in this manuscript are not time critical and are therefore evaluated with 100 000 samples to obtain smoother histograms. The estimated posterior distributions are summarised by the posterior mode as point estimate and the highest density interval (HDI) as credible interval. The HDI is the interval where all parameter values within the interval have higher probability density than parameter values outside the interval (see e.g. Gelman et al. 2004; Kruschke 2014).
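The HDI can be computed directly from sorted MCMC samples as the shortest interval containing the desired probability mass; the implementation below is a common sample-based sketch and assumes a unimodal posterior:

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Highest density interval: the shortest interval containing `mass`
    of the samples (valid for unimodal posteriors)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    m = int(np.ceil(mass * n))          # samples the interval must cover
    widths = x[m - 1:] - x[:n - m + 1]  # width of every candidate interval
    i = np.argmin(widths)               # shortest one
    return x[i], x[i + m - 1]
```

For a skewed posterior (e.g. an exponential-like trace for a variance parameter) the HDI hugs the mode, unlike an equal-tailed percentile interval that discards 2.5 per cent on each side.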
Model validation is a central part in justifying the proposed model (Gelman et al. 2004; Gelman & Shalizi 2013; Kruschke 2014; Martinsson & Törnman 2019). Here we evaluate the models using both synthetic and real data. In order to trust inference obtained using the model, predicted data generated using the same covariates should mimic the actual observations. This kind of comparison is generally termed a posterior predictive check (PPC) and is useful to detect model deficiencies. Predicted data can also be used to generate confidence regions around the observed data (Gelman et al. 2004; Kruschke 2014). The confidence regions are important to determine the validity of the model and to highlight the impact of the underlying covariates in the model (discussed later in Section 7.1 and shown in Fig. 9).

Figure 4. The upper row describes the prior distributions and corresponding hyperparameters (μ B , Σ B , a B , b B , λ σ , λ ν ) for the parameters to be inferred (φ, σ 2 0 , ν), where the subscript B denotes Model B. The box in the middle describes the central tendency (μ ij ) and dispersion (σ 2 i j ) for the ith frequency coefficient in the jth observed logged spectra log Y ij (μ ij and σ 2 i j correspond to the expected value and variance, respectively, under normal assumptions, i.e. when ν → ∞). The same diagram applies for Model A if we replace the Gamma distributed prior on n with a Dirac delta distribution centred at a fixed value, thereby treating n as a known constant (i.e. as we do for Q and β), and change the subscripts to μ A , Σ A .

T H E B AY E S I A N M O D E L
We use a Bayesian framework (see e.g. Gelman et al. 2004; Kruschke 2014) for joint estimation (i.e. the likelihood is a product over i and j) of the logged observed spectra log (Y ij ) triggered by the earthquake. The log-transformation provides more symmetric model residuals (Prieto et al. 2007). A graphical illustration of the model is provided in Fig. 4. The model can be implemented with limited programming effort in probabilistic programming frameworks such as Stan (Carpenter et al. 2017) or JAGS (Plummer 2003) (see e.g. Kruschke 2014, for an introductory explanation of how to do it).
The estimands of interest here are the source model parameters (Ω 0 , f c , n, where Ω 0 ∝ M 0 ), the attenuation parameters (Q, β) and the statistical parameters (ν, σ 2 0 ) which characterize the likelihood function. To simplify notation in the equations further on, we define a joint parameter vector φ = [Ω 0 , f c , n, Q, β] that contains the physical parameters describing the source (i.e. Ω 0 , f c , n) and the attenuation (i.e. Q, β). The framework provides the posterior distribution p(θ|D), where θ = [φ, ν, σ 2 0 ] and the input data D are the EGF events (i.e. events in the training set), the heterogeneous velocity model, the hypocentre and origin time distribution, and the observed noise and S-wave spectra for all sensors triggered by the earthquake.
In Bayesian statistics, the posterior probability is proportional to the likelihood function times the prior probability, and both terms are specified in the next two sections, respectively. After the model is defined we continue in Section 6 to estimate the hyperparameters and the empirical compensation data from the calibration data set, and finally we cross-validate the proposed models in Section 7.

The likelihood
To obtain robust parameter estimates, we propose a likelihood function [i.e. p(D|θ) in (4)] given by the Student's t-distribution for the ith frequency coefficient at the jth sensor, where ν is the normality parameter (Kruschke 2014). The combined model μ ij (φ) in (5) and the propagating uncertainty σ 2 i j (φ, σ 2 0 ) in (6) depend on additional parameters: the source model S i and the Green's function G ij are given by (7) and (8), respectively, and G ij depends on the modelled attenuation A ij in (9). These terms depend on the model parameter vector φ = [Ω 0 , f c , n, Q, β]. We explain these parameters further in Sections 5.1.1 and 5.1.2. To simplify expressions we will, when possible, omit dependencies and use the more compact leftmost expressions in (5)-(9). Note that S i in (7) is proportional to S in (2), as the amplitude is proportional to the seismic moment (Ω 0 ∝ M 0 ). We have also embedded 2π in this proportionality. Given the asymptotic statistical properties of the DFT (Brillinger 1981; Ljung 1985; Kay 1993; Pintelon & Schoukens 2007; Martinsson 2008), the joint likelihood is the product over i and j, where I is a set of frequency coefficients (defined between the cutoff frequencies in the observed spectra as shown in Fig. 2) and J is a set of spectra obtained from the sensors triggered by the event, as shown in Fig. 3.
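A sketch of such a robust joint log-likelihood, with μ ij and σ ij stubbed by toy arrays (the combined model itself is not reproduced here):

```python
import numpy as np
from scipy import stats

def joint_log_likelihood(log_Y, mu, sigma, nu):
    """Joint robust log-likelihood: a Student's t-density for every
    frequency coefficient i and sensor j, summed in the log domain
    (i.e. the product over i in I and j in J)."""
    return stats.t.logpdf(log_Y, df=nu, loc=mu, scale=sigma).sum()

# Toy numbers: 4 frequency coefficients x 2 sensors; mu and sigma
# would come from the combined model and the propagating uncertainty.
rng = np.random.default_rng(1)
mu = np.zeros((4, 2))
sigma = np.ones((4, 2))
log_Y = mu + rng.standard_t(df=5, size=(4, 2))

ll_robust = joint_log_likelihood(log_Y, mu, sigma, nu=5.0)
ll_normal = joint_log_likelihood(log_Y, mu, sigma, nu=1e6)  # ~ Gaussian limit
```

As ν → ∞ the t-distribution approaches the normal, so the robust likelihood nests the Gaussian one; a small ν down-weights outlying coefficients instead of letting them dominate the fit.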

The combined model (μ ij )
The combined model μ ij in (5) describes the expected logged observed spectra log Y ij and depends on the source model S i in (7), the combined empirical Green's function G ij in (8) and the observed noise spectra N ij obtained prior to the P-wave onset. G ij contains both the modelled physical attenuation A ij in (9) and the empirical compensation E ij , where A ij combines the intrinsic attenuation and the geometric spreading, with Q and β treated as mine-wide constants.

The propagating uncertainty (σ 2 i j )

The propagating uncertainty σ 2 i j in (6) depends on the uncertainties in: the origin time σ 2 t 0 , the ray-length σ 2 r j , the empirical compensation σ 2 E i j , and the uncaptured uncertainty σ 2 0 . The uncertainties σ 2 r j and σ 2 t 0 are obtained a priori from the location algorithm (Martinsson 2012) and the MCMC samples, where σ 2 r j is the variance of the distance between the jth sensor and the hypocentre distribution, and σ 2 t 0 is the variance of the origin-time estimate. See Fig. 3 for an illustration of the MCMC samples describing the posterior estimates of the hypocentre and the origin time. The last term σ 2 0 describes the uncaptured uncertainty and is treated as an additional parameter that is estimated jointly with ν and φ for each event.
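The exact form of (6) is not reproduced here, but the principle is standard first-order (delta-method) uncertainty propagation; a generic sketch with hypothetical sensitivities and variances:

```python
import numpy as np

def propagate_variance(grad, variances, extra=0.0):
    """First-order (delta-method) propagation: for a model mu(x) with
    independent inputs x_m, Var[mu] ~= sum_m (d mu / d x_m)^2 Var[x_m],
    plus directly additive variance terms (e.g. sigma_E^2 + sigma_0^2)."""
    grad = np.asarray(grad, dtype=float)
    variances = np.asarray(variances, dtype=float)
    return float(np.sum(grad**2 * variances) + extra)

# Hypothetical sensitivities of mu_ij to origin time and ray length,
# with hypothetical input variances sigma_t0^2 and sigma_rj^2:
var_mu = propagate_variance(grad=[0.5, 0.01],
                            variances=[1e-4, 4.0],
                            extra=0.02 + 0.05)   # sigma_E^2 + sigma_0^2
```

The additive terms enter directly because they are already expressed as variances of the logged spectra, whereas the origin-time and ray-length uncertainties are scaled by the model's sensitivity to them.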

The prior probability
Prior distributions are important to restrict the estimates to realistic values. This is also frequently applied in similar studies using non-Bayesian methods, simply by restricting the parameter space to or between certain values (e.g. 2 ≤ n ≤ 3 or 0 < f c ≤ 0.7 f max as in Huang et al. 2016), which in a Bayesian context is equivalent to uniform prior distributions spanning these domains. Restrictions are especially important in cases with dependencies between the parameters (Gelman & Hill 2006), as previously observed for the source model parameters (Van Houtte & Denolle 2018; Supino et al. 2019). For example, without restrictions we can obtain a reasonable model fit with unrealistic parameter estimates (e.g. n < 1.5, as observed for the grey dots in Figs 6 and 7), since one parameter can compensate for another due to the pairwise correlation between the parameters.
Instead of assigning equally likely parameter values within these restrictions, Bayesian methods give us much more flexibility to shape the prior distributions within these restrictions based on relevant data. Relevant data can be, for example historical data as in Martinsson (2012), data obtained from similar studies, hierarchical approaches that estimates the prior distributions jointly with the model parameters as in Martinsson & Törnman (2019), or using a separate calibration data set as we do here.
To avoid unrealistic parameter estimates we define reasonable boundaries according to the literature of credible values for φ in Table 1, labelled the Region of Reasonable Values (RORV). These boundaries are not to be confused with the parameter space defined by the prior distributions; they are simply used to remove unrealistic estimates for proper calibration of the prior distributions used in the model. We return to this in Section 6 when we describe the estimation process.
Without the filtering step using RORV, occasional unrealistic parameter values and outliers would widen the prior distributions and prevent the regularization effect we are looking for by using informative prior distributions. One of the benefits of Bayesian models is to apply informative prior distributions and the necessary support in cases with noisy, inconclusive, and inconsistent data (see e.g. Martinsson 2012, for the benefits of informative prior distributions in hypocentre estimation).
The prior distribution can be factored into three different parts where p( 0 , f c , n) is the prior of the source model parameters, p(Q, β) is the prior of the attenuation parameters, and p(ν, σ 2 0 ) is the prior for the statistical parameters. These three parts are defined in the next three subsections, respectively.

Prior for the source model parameters p( 0 , f c , n)
There is a relationship between Ω 0 and f c (Aki 1967) that holds for all event sizes (Sellers et al. 2003; Goodfellow & Young 2014), where f c decreases when Ω 0 increases. To capture the relationship between the logged parameters we use a bivariate normal distribution as prior, [log( f c ), log(Ω 0 )] ∼ N(μ m , Σ m ), where m ∈ {A, B} for Model A and B, respectively. In comparison to Van Houtte & Denolle (2018), where the logged parameters are assumed independent and assigned a normal distribution each, we include this dependency with a joint distribution. For the falloff rate n we have two different priors that define the difference between the model structures, defined as Model A and Model B (see Fig. 4). In Model A, n is fixed for all events, which is equivalent to assigning a Dirac delta distribution as prior on n centred at the fixed value, while in Model B the falloff rate n is event specific and we therefore assign a Gamma distributed prior with a shift of 1.5 to avoid non-physical estimates according to Lee et al. (2003). The estimation procedure for the priors based on the calibration data set is explained later in Sections 6.1 and 6.2.

Prior for the attenuation parameters p(Q, β)
The attenuation parameters Q and β are treated as known constants in the modelled attenuation A ij in (9). In Section 6.1, we describe how the values of Q and β are obtained. Treating them as known constants is equivalent to assigning Dirac delta distributed priors centred around their known values.

Prior for the statistical parameters p(ν, σ 2 0 )

For the statistical parameters ν and σ 2 0 in the likelihood, we use exponentially distributed priors following Kruschke (2014). The prior for the normality parameter ν is an exponential distribution shifted by one, with an expected value of 30 as proposed by Kruschke (2014). In Fig. 4 we adopt the simplified notation proposed by Kruschke (2014) for the shifted exponential distribution, ν − 1 ∼ Exp(λ ν ), where λ −1 ν = 29. The shift of the prior for ν restricts the likelihood from being more extreme in tail width than a Cauchy distribution (i.e. when ν = 1 the t-distribution is equivalent to a Cauchy distribution). The expected value of the prior for ν is assigned as the boundary above which the model residuals can be assumed Gaussian distributed (i.e. when the estimate of ν > 30).
The prior for the uncaptured uncertainty parameter is σ 2 0 ∼ Exp(λ σ ). As the estimated values of σ 2 0 are close to zero for the majority of the events (see e.g. Figs 12 and D1-D7), we follow Kruschke (2014) and assign a weakly informative prior in relation to these values, with an expected value of λ −1 σ = 1. This yields a more robust Bayesian inference and converged MCMC chains in all cases.
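Both statistical priors are straightforward to realize numerically; a sketch using the hyperparameter values stated above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Shifted exponential prior for the normality parameter:
# nu - 1 ~ Exp(lambda_nu) with 1/lambda_nu = 29, so E[nu] = 30 and
# nu >= 1 (never heavier-tailed than a Cauchy distribution).
nu = 1.0 + rng.exponential(scale=29.0, size=200_000)

# Exponential prior for the uncaptured uncertainty:
# sigma_0^2 ~ Exp(lambda_sigma) with expected value 1/lambda_sigma = 1.
sigma0_sq = rng.exponential(scale=1.0, size=200_000)
```

The shift guarantees ν ≥ 1 by construction, while the exponential shape still places most prior mass at small ν and small σ² 0, matching the regularization described above.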

T H E E S T I M AT I O N P RO C E D U R E
We follow the principles described in Gelman & Hill (2006). The estimation procedure consists of four steps (summarised in Fig. 5):
Step 1: Estimate mine wide constants (n, Q, β) or (Q, β) for Model A or Model B, respectively.
Step 2: Estimate informative priors p(Ω 0 , f c ) or p(Ω 0 , f c , n) for Model A or Model B, respectively.
Step 3: Estimate the empirical compensation data (E ij and σ 2 E i j ), given the estimates from Steps 1 and 2.
Step 4: Model validation of Model A and Model B with (E = Ê) and without (E = 0) empirical compensation for different spatial (c 1 ) and size (c 2 ) weighting coefficients.
Figure 5. A summary of the estimation procedure. Steps 1-3 constitute model calibration, based on the calibration data set (70 per cent of the events), and Step 4 is model validation, based on the validation data set (30 per cent; i.e. cross-validation). In Step 1 we estimate the fixed values assumed as known constants in the models, using the weakly informative priors on φ defined in Appendix B. In Step 2 we estimate informative priors for the event specific parameters in the models, using weakly informative priors on the remaining parameters in φ, given the known constants from Step 1. In Step 3 we estimate the empirical compensation data necessary for the empirical model (E ij and σ 2 E i j ), given the estimates in Steps 1 and 2 and Model A. In Step 4 we validate the models with different weighting coefficients to determine the proper spatial and size impact in the estimate, given Steps 1 and 2 and the empirical model relying on data from Step 3. Note that in Steps 1-3 we use good quality events from the calibration data set (i.e. satisfying the requirements defined in Table 1) to avoid extreme values and outliers that may affect the calibration. In the final Step 4 we process and estimate source model parameters for all events in the validation data set, relying on the constraints provided by the informative prior distributions and the help of the empirical compensation given by the calibration.
In the first step, we estimate the fixed values assumed as known constants in the models. These fixed parameters correspond to n, Q, β in Model A and Q, β in Model B. The second step is to estimate the event specific priors for both models, given the fixed values from Step 1. The third step is to estimate the data used for the empirical compensation E ij in the Green's function G ij , given the fixed values from Step 1 and the informative priors from Step 2. The final step is the model validation performed on all events in the validation data set. Here we evaluate both source models (Model A and Model B) and determine the proper impact between size and distance for the training events used in the empirical compensation, given the fixed parameters (Step 1), the estimated priors (Step 2) and the empirical compensations (Step 3). We describe each of the four steps in more detail next.

Step 1: Estimation of fixed parameters (n, Q and β)
The attenuation parameters (Q, β) and the source parameter n are often set to constants for the combined model μ ij (see e.g. Oye et al. 2005;Shearer et al. 2006). The parameters are dependent with respect to model fit, as observed in Fig. 8, and changing one parameter's value will affect the estimate of the other parameter's value. Therefore we estimate all parameters in φ simultaneously in this step (Gelman et al. 2004;Kruschke 2014), using weakly informative priors on φ defined in Appendix B. The priors p(ν, σ 2 0 ) are defined in Section 5.2.3.
Since the parameters are correlated and not well restricted in Step 1, there will be estimates where one parameter value compensates for another, causing unrealistic parameter estimates. Therefore, we only accept estimates of φ in the calibration set where the 95 per cent HDI of φ is within the RORV defined in Table 1. The accepted events within the RORV are represented as the magenta coloured dots and magenta coloured histograms in Fig. 6. The fixed values (n, Q and β) are estimated from the accepted events as a weighted average, where the weights are based on the variance of each posterior trace. The resulting Step 1 estimates of the mine wide constants to be used in Steps 2-4 are: Q = 88.33, β = 1.19 (and n = 2.31 in Model A).
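A sketch of such a weighted average over the accepted events; here we assume inverse-variance weights (so that more certain posterior traces contribute more), and the per-event numbers are hypothetical:

```python
import numpy as np

def fixed_value_estimate(modes, variances):
    """Mine-wide constant from per-event posterior summaries.

    `modes` are per-event posterior point estimates (e.g. of n, Q or
    beta) and `variances` are the variances of the posterior traces.
    Inverse-variance weighting is assumed here, so events with more
    precise posteriors contribute more.
    """
    modes = np.asarray(modes, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum(w * modes) / np.sum(w))

# Hypothetical per-event posterior modes and trace variances for n:
n_hat = fixed_value_estimate([2.2, 2.4, 2.35], [0.01, 0.04, 0.02])
```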

Step 2: Estimation of event specific priors
In this step we narrow down credible parameter values further for the sequential steps, by estimating informative prior distributions for the event specific parameters (Ω 0 , f c ) for Model A and (Ω 0 , f c , n) for Model B. These informative priors increase both the robustness against outliers and extreme values and the precision of the estimates, by adding the necessary support needed in cases with noisy, inconclusive, and inconsistent data (Martinsson 2012), and they also limit the effect of intercorrelation. The prior distributions are illustrated in the top panel of Fig. 4 together with the hyperparameters describing them. The hyperparameters are estimated from the calibration data set for both models, given the fixed parameters from Step 1 and weakly informative priors (see Appendix B) for the remaining parameters in φ, together with the statistical priors p(ν, σ 2 0 ) defined in Section 5.2.3. For Model A, where n is fixed, we estimate the expected value μ A and the covariance Σ A based on posterior estimates obtained from the calibration data. These values shape the prior distribution of the logged parameters [log (f c ), log (Ω 0 )] described in Section 5.2.1.

Figure 7. The dashed black contour lines correspond to log density values for the bivariate normal, and the solid black line in the right-hand figure corresponds to the shifted gamma distribution. The dots are coloured by the J integral (approximated here using the Riemann sum Ĵ = Σ ∀i S 2 i Δf i , for f i = 1, 2, . . . , 10 4 ), which is proportional to the radiated seismic energy (Ross & Ben-Zion 2016) and displays the relationship between Ĵ , f c and Ω 0 . The grey dots correspond to the events where the 95 per cent HDI of n ⊄ RORV; these are not used for estimating the priors in Model B. The bottom figure shows the estimated prior for Model A based on all events in the calibration set (coloured dots), since we consider fixed values of n, Q and β within the RORV.
In Model B, where n is event specific, we estimate the prior parameters μ B and Σ B as in Model A, along with the shape a B and rate b B for the shifted gamma distributed prior on n explained in Section 5.2.1. For the estimation of the priors in Model B, we also reject events where the 95 per cent HDI of n ⊄ RORV (see Table 1). The rejected events are illustrated as grey dots in Fig. 7.
The estimated informative priors are compared with the parameter values estimated using weakly informative priors in Fig. 7 for Model B. The difference between the estimated values of μ m and Σ m for the different models m ∈ {A, B} is due to the difference in flexibility between the models (i.e. Model B is more flexible, as n is estimated for each event as opposed to being fixed as in Model A).
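Estimating the hyperparameters μ A and Σ A then amounts to fitting a bivariate normal to the per-event point estimates of [log(f c ), log(Ω 0 )]; a sketch with simulated calibration estimates (the negative dependency is built into the toy data, all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated per-event posterior point estimates from a calibration
# set: columns are [log(f_c), log(Omega_0)], with the negative
# f_c-Omega_0 dependency built in.
log_fc = rng.normal(2.0, 0.4, size=500)
log_om = -4.0 - 1.2 * (log_fc - 2.0) + rng.normal(0.0, 0.3, size=500)
X = np.column_stack([log_fc, log_om])

mu_A = X.mean(axis=0)              # prior mean of [log f_c, log Omega_0]
Sigma_A = np.cov(X, rowvar=False)  # prior covariance; the off-diagonal
                                   # term captures the dependency
```

The off-diagonal element of Σ A is what distinguishes this joint prior from the independent normal priors of Van Houtte & Denolle (2018).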

Step 3: Estimation of empirical compensation (E ij and σ 2 E i j )

Here we process the events in the calibration data set using Model A, given the fixed parameters from Step 1 and the informative priors from Step 2, to obtain data for the estimation of the empirical compensation E ij and the corresponding uncertainties σ 2 E i j . The estimates obtained from Step 3 are: the model residuals R i jk = log(Y i j ) k − μ i j k , the size log(Ω̂ 0 ) k and the location p̂ k for the kth event. The size and the distance (between the target event and the events in the calibration set) are used to weight the impact of R ijk from the kth event on the jth sensor. This approach does not truncate the calibration data with respect to size or distance, as Shearer et al. (2006) proposed. Instead, all calibration events are used, and those located near the target event with similar size have a larger impact on the estimation of the empirical compensation E ij and the corresponding uncertainties σ 2 E i j . However, with a truncation free approach it is important to find the appropriate impacts (or weights) of the calibration events with respect to size and distance when estimating the target event. We use the weighted average of R ijk to estimate E ij and the weighted variance to estimate σ 2 E i j , as in (14). The weights are jointly estimated from the distance and size related impact functions parametrized by the coefficients c 1 and c 2 . The distance dependent impact function w 1k is given in (15). For the size dependent impact function w 2k in (16) we apply likelihood weighting based on a fitted normal distribution, centred around the current target size (μ w = log (Ω 0 )) and with a standard deviation (σ w ) estimated from all events triggered by the jth sensor in the calibration set (i.e. the standard deviation of log(Ω 0 ) k ∀k ∈ K j , where K j is the set of event indices triggered by the jth sensor in the calibration data set).
In Section 7.2, we investigate the impact of different values of the coefficients c 1 and c 2 .
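A sketch of the weighted mean and weighted variance in (14); the exact impact functions (15)-(16) are not reproduced here, so we assume an inverse-power distance weight and a normal likelihood weight in log-size, as described above, with hypothetical numbers:

```python
import numpy as np

def compensation(residuals, dists, sizes, target_size, c1, c2, sigma_w):
    """Empirical compensation E and its uncertainty for one sensor and
    one frequency coefficient, from k calibration-event residuals.

    Assumed impact functions (a sketch, not the paper's exact
    eqs 15-16):
      w1: inverse-power distance impact, dists ** (-c1)
      w2: normal likelihood in log-size, width governed by c2 * sigma_w
    """
    residuals = np.asarray(residuals, dtype=float)
    w1 = np.asarray(dists, dtype=float) ** (-c1)
    z = (np.asarray(sizes, dtype=float) - target_size) / (c2 * sigma_w)
    w2 = np.exp(-0.5 * z**2)
    w = w1 * w2
    w = w / w.sum()                              # normalized joint weights
    E = np.sum(w * residuals)                    # weighted mean (compensation)
    var_E = np.sum(w * (residuals - E) ** 2)     # weighted variance
    return E, var_E

# Three hypothetical calibration events for one sensor/frequency:
E, var_E = compensation(
    residuals=[0.3, -0.1, 0.2], dists=[50.0, 200.0, 100.0],
    sizes=[-4.0, -5.0, -4.2], target_size=-4.1, c1=2.0, c2=1.0, sigma_w=0.8)
```

With c 1 = 0 and an infinitely wide size window this reduces to the stacked sensor average discussed for Fig. 10, i.e. [c 1 , c 2 ] = [0, ∞].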

Step 4: Estimation of source model parameters
Here we process the source models based on the validation data set, using different combination of the compensation coefficients (c 1 , c 2 ). The input data for Step 4 are: the fixed values obtained in Step 1, the informative priors given in Step 2, and the underlying data for the empirical compensation gathered in Step 3.
The data obtained from Steps 1-3 needed to estimate the source model parameters for one event consist of: the observed spectra (Y ij ), the noise spectra (N ij ), the calibration data ([R i jk , φ̂ k , p̂ k ]), the ray-lengths (r j ), the traveltimes (t j ), the hypocentre (p̂), the uncertainty in origin time (σ 2 t 0 ) and the uncertainties in ray-lengths (σ 2 r j ). Here we use the modelled traveltimes (t j = r j /V (p̂, j)) to reduce the sensitivity to inconsistent arrival-time estimates, where the expected velocity along the ray path V (p̂, j) is both sensor and spatially dependent, as explained in Martinsson (2012).

I N F E R E N T I A L R E S U LT S
Inferential results using different variations of the proposed models are presented here, and the main focus is to address our aim, declared in Section 1: to obtain robust, reliable and precise source model parameter estimates. We start by using synthetic data to investigate the accuracy, precision, pairwise correlation, credible regions, and the impact of the underlying covariates. We then continue with real data and cross-validation to: assess the two alternative models; find appropriate values of the compensation coefficients; investigate model deficiencies with respect to size, frequency and distance using residual analysis and PPC; and evaluate the necessity of robustness against extreme values and outliers, as indicated by the normality parameter.

Synthetic model evaluation
With synthetic data we evaluate the Bayesian inference in terms of accuracy, precision, pairwise correlation between parameters and the ability to capture the synthetic data within the estimated credible regions. Synthetic data are also used to evaluate the impact of the covariates (i.e. σ 2 t 0 , σ 2 r j and N ij ) using PPC (Gelman et al. 2004; Kruschke 2014). Fig. 8 shows the Bayesian inference of 10 synthetic spectra generated at different distances r j from the source, where the upper triangular figures are scatter plots of the MCMC samples and depict the pairwise correlations (ρ) of credible values of the parameter vector φ. The 95 per cent Confidence Interval (CI) of the correlation coefficient ρ is based on Fisher's Z transform. The diagonal figures are the histograms of posterior samples of φ, with corresponding 95 per cent HDI. The centre plot in the last row shows the true (i.e. φ = φ true ) and the estimated (i.e. φ = φ̂) source spectra S i (φ) in cyan and black, respectively, together with the estimated 95 per cent HDI in grey. The HDI regions are calculated for each frequency coefficient given the source spectra S i (φ) in (7). In Fig. 8, we see that the true parameters are within the estimated confidence regions and that there are statistically significant correlations ρ (i.e. the CI of ρ excludes zero) of different magnitudes between the credible parameters in φ. The largest correlation is observed between the credible values of Ω 0 and β, suggesting either treating β as a fixed parameter value or assigning an informative prior distribution to increase inferential robustness (Gelman & Hill 2006). The model residuals of the estimates can be assumed Gaussian due to the large estimated values of the normality parameter ν. This makes sense, as the synthetic spectra are generated under Gaussian assumptions.
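The CI for ρ can be computed in a few lines; a generic sketch of Fisher's Z-transform interval (not the authors' code):

```python
import numpy as np

def fisher_z_ci(r, n, z_crit=1.959963984540054):
    """95 per cent CI for a correlation coefficient r estimated from
    n samples, via Fisher's Z transform: atanh(r) is approximately
    normal with standard error 1/sqrt(n - 3)."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3.0)
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# E.g. a sample correlation of -0.8 between the MCMC draws of two
# parameters, with a nominal n = 1000 (for autocorrelated MCMC chains
# an effective sample size should be used for n instead):
lo, hi = fisher_z_ci(-0.8, n=1000)   # CI excludes zero -> significant
```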
Also, for synthetic data generated from the model, we expect credible values of σ 2 0 around zero, as there is no additional uncertainty term in the description of the total propagating uncertainty σ 2 i j , since we assign σ 2 E i j as the variance of the model residual. We additionally observe that the true source spectra S i (in cyan) are close to the estimated Ŝ i = S i (φ̂) (in black) and within the 95 per cent credible regions (see e.g. Kruschke 2014). Fig. 9 shows the credible regions of the posterior predictive distribution (see e.g. Gelman et al. 2004; Gelman & Hill 2006; Kruschke 2014) for the same synthetic data as in Fig. 8, illustrating the impact of underlying covariates like σ 2 t 0 , σ 2 r j and N ij . The synthetic spectra from top to bottom represent the closest sensor, the middle sensor, and the most distant sensor, respectively. The spectra are coloured by the distance r j to the hypocentre. Three different 95 per cent HDIs (shown as red dashed lines, light grey area and dark grey area), obtained from replicated data drawn from the posterior predictive distribution (see e.g. Gelman et al. 2004; Kruschke 2014), illustrate the separate impact of the propagating uncertainty σ ij in (6), given different values of σ t 0 and σ r j .
In Fig. 9 we also observe that an increase in σ t 0 results in larger uncertainties at higher frequencies, and an increase in σ r j results in larger uncertainties for sensors near the hypocentre. The impact of the added noise description N ij in the combined model μ ij is clearly observed when the received signal hits the noise floor, as the credible region describes the observed spectra, in comparison to a model neglecting the noise (e.g. the model represented by the red curves describing the real event in Fig. 2). As mentioned in Section 2.3, by including a description of the noise in the model, we can properly model the spectral coefficients under great influence of the noise (Ljung 1987; Söderström & Stoica 1989; Pintelon & Schoukens 2001).

Compensation coefficients c 1 and c 2
The compensation coefficients c 1 and c 2 adjust the weights in the empirical compensation E ij and the corresponding uncertainty σ 2 E i j , where appropriate values improve the general model fit. The impact of the compensation coefficients is explained in Section 6.3 and illustrated in Fig. 10, and their effect on the model fit is summarized in Table 2. Fig. 10 illustrates the impact of different values of c 1 and c 2 , event sizes and locations on the unit-less empirical compensation E ij and the corresponding uncertainties σ E i j . Here E ij is represented as solid lines and E i j ± σ E i j as dotted lines, coloured by event size (Ω 0 ∈ {10 −6 , 10 −5 , 10 −4 , 10 −3 , 10 −2 , 10 −1 }) for the jth sensor, and evaluated at the average hypocentre location p̂ for the calibration events that are triggered by the jth sensor. The top six figures are plotted using the same sensor j = 122 but with different values of c 1 (related to distance) and c 2 (related to size). The bottom three figures use the same c 1 and c 2 but are evaluated at different sensors j ∈ {140, 241, 246} to illustrate the differences in the sensor specific response.
In Fig. 10, the upper left-hand figure (where [c 1 , c 2 ] = [0, ∞]) corresponds to the stacked average of the model residual spectra R ijk for the jth sensor, and for these values there are no dependencies on either distance or size. The weighting in (14) therefore acts solely as a sensor-dependent compensation. However, neglecting the dependencies on distance and size increases the residuals in Table 2. We observe flatter empirical compensations for smaller Ω 0 and more frequency dependent compensation for larger Ω 0 . The left-hand and middle figures in the second row show the empirical compensation using only the dependency on distance, neglecting the dependency on event size. We observe a slight difference between the inverse distance (i.e. c 1 = 1) and the inverse squared distance weighting (i.e. c 1 = 2). The right-hand figure in the middle row shows the compensation for the same sensor using the values that provide the best model fit given in Table 2. Finally, the bottom three figures use the proposed values c 1 = 2 and c 2 = 1, but evaluated at different sensors j ∈ {140, 241, 246}, to illustrate the large differences observed between sensors, like resonances at different frequencies and high-pass or low-pass compensation. Table 2 summarizes the validation of the model fit in terms of the dispersion of the residuals, given the two different source models and different values of the empirical compensation coefficients. The inference is based on dispersion measures of the model residuals R ijk for all frequency coefficients in the validation set. Model A uses a fixed n = 2.31, while in Model B the parameter n is estimated for each event. The columns are: model type, size of the 25 per cent HDI of R ijk , size of the 75 per cent HDI of R ijk , and the standard deviation σ R i jk of R ijk .
In Table 2, we observe that the compensation coefficients c 1 = 2 and c 2 = 1 result in the lowest dispersion of the model residuals, therefore indicating favourable values in terms of model-fit relative to the other evaluated alternatives. These values correspond to the inverse square distance weighting, combined with default likelihood weighting with respect to the size (i.e. using simply the standard deviation σ w in (16)).

Residual analysis
Residual analysis is conducted by investigating the histograms of all residuals in the validation data set. The main purpose is to detect remaining general model deficiencies and asymmetries with respect to frequency, distance and size (similar to Drouet et al. 2011). Residual analysis can be done on individual seismograms, as shown in Fig. 2, but this is too laborious for the large data set and would not detect general model deficiencies and trends. Also, quantitative measures like those in Table 2 are important in determining the model fit, but they do not indicate remaining dependencies in the residuals. Fig. 11 shows the histograms for all frequency coefficients of the model residuals R ijk in the validation set using the less flexible Model A. The histogram in (a) shows the residuals with (E = Ê) and without (E = 0) empirical compensation in magenta and cyan colours, respectively. The figures in (b), (c) and (d) represent 2-D histograms estimated with empirical compensation, to highlight remaining dependencies of R ijk on the sensor's hypocentre distance, frequency and event size, respectively. The colours of the bin counts in (b)-(d) are in log-scale to enhance tail differences.
In Fig. 11(a), with empirical compensation (E = Ê) we obtain significantly less dispersion in the model residuals compared to without compensation (E = 0). This was also observed in the quantitative measures in Table 2. A measure of the residual dispersion gives an indication of how well the model describes the observations in the validation data set and provides a value of the "left-overs" that the model could not describe. In Figs 11(a)-(d), the histograms are centred symmetrically around zero, indicating no significant model deficiencies with respect to (b) distance, (c) frequency, or (d) size, despite the use of the less flexible Model A.
For individual seismograms, such as those in Fig. 2, the residuals are pushed towards zero, as the combined model includes both the empirical compensation and the noise spectra in the description of the observations. The effect of modelling the noise with the proposed method is also visible, where the impact of high-frequency noise components is reduced. The more precise model fit for individual seismograms is also observed in Fig. 12 and in Figs D1-D7 in Appendix D, showing the model's ability to describe sensor-specific frequency dependencies and extreme values.
For model validation, we assess replicated data from the posterior predictive distribution to detect aspects of the data not captured by the model (see e.g. Gelman et al. 2004). Both for challenging events, such as the one in Figs 2, 3 and 12, and for the representative events of the data set in Figs D1-D7, we observe that the posterior predictive distribution (summarised by the mode and the HDI in the figures) captures the observed spectra and includes extreme values such as resonances at different frequencies and high-pass or low-pass compensation.
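The 95 per cent HDIs used to summarise the posterior and posterior predictive distributions above can be computed directly from MCMC or replicated samples. As a minimal illustration (not the paper's implementation), the following sketch finds the narrowest interval containing a given probability mass, which is valid for unimodal samples:

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Highest density interval: the narrowest window of sorted samples
    that contains at least `mass` of the probability mass (unimodal case)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    m = int(np.ceil(mass * n))           # number of samples inside the interval
    widths = x[m - 1:] - x[: n - m + 1]  # width of every candidate window
    i = int(np.argmin(widths))           # narrowest window wins
    return x[i], x[i + m - 1]
```

For a standard normal posterior this returns an interval close to (-1.96, 1.96); for skewed posteriors the HDI is shorter than the corresponding equal-tailed interval.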

Extreme values and outliers
The likelihood described in Section 5.1 and Fig. 4 is given by a Student's t-distribution, with the ability to adjust the tail width via the normality parameter ν to handle extreme values and outliers. In cases where the estimate of ν < 30, the data contain either extreme values (e.g. sensor resonances) or outliers (e.g. poorly associated seismograms) with respect to the model and underlying covariates. In Fig. 12 we observe both extreme values, like sensor-specific resonances (top right-hand panel), and outliers caused by erroneous arrival times (see e.g. the bottom left-hand panel and the low-amplitude spectra located near the hypocentre). Using a normal distribution as likelihood function for the events where the estimate of ν < 30 would result in biased parameter estimates and a poor model fit. Extreme values are here described by a more complex model (e.g. with the empirical compensation) and the outliers are handled by adjusting the tail width in the likelihood function, improving the robustness and model fit of the proposed model (see e.g. Gelman et al. 2004; Martinsson 2012; Kruschke 2014). Table 3 shows the estimates of the normality parameter ν for the validation set, with and without empirical compensation, and for Model A and Model B. The columns are: the model, the compensation coefficients, the percentage of events where ν < 30 at the 95 per cent confidence level, and the percentage of events where the mode estimate ν̂ < 30. In Table 3, we see only a slight difference in the estimates of ν between Model A and Model B (rows two and three, respectively), in comparison to the difference between the estimates with and without empirical compensation (rows one and two, respectively). Without empirical compensation, 50 per cent of the events have a mode estimate of ν < 30, whereas with empirical compensation the corresponding result is less than 15 per cent of the validation events.
An explanation is that many of the observed extreme values are caused by sensor specific resonances (see e.g. Fig. 10) that are captured by the distribution of the empirical compensation described by E ij and σ E i j .
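The robustness argument can be illustrated with a toy example outside the paper's model: fitting a location to data containing one gross outlier. Under a Gaussian likelihood the location estimate is dragged towards the outlier, whereas the Student's t fit (with ν estimated from the data) is barely affected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 200), [25.0]])  # one gross outlier

gauss_loc = data.mean()                 # Gaussian MLE of the location
nu, t_loc, t_scale = stats.t.fit(data)  # t-distribution MLE: (df, loc, scale)

# The Gaussian mean is pulled towards the outlier at 25, while the t location
# stays near 0 because the heavy tails down-weight the outlier.
```

This mirrors why events with ν̂ < 30 would receive biased estimates under a normal likelihood.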

Model A or Model B
Here we evaluate the less flexible Model A compared to Model B, where n is not fixed but event-specific. In Section 7.4 the focus was on outliers and extreme values; here the evaluation is based on quantifying the distribution of the model residuals in terms of dispersion, as summarised in Table 2. We also observe the impact of the model choice on the estimates of f c and Ω 0 in Fig. 13, which compares the estimates of f c and Ω 0 obtained using Model A with and without empirical compensation, and using Model A and Model B both with empirical compensation. The different panels show posterior mode estimates of f c (with confidence regions) and Ω 0 for the events in the validation set. In Figs 13(a) and (c), we observe the relationship (Aki 1967) between f c and Ω 0 . In panel (c), with Model B, we observe an increased number of low-frequency events (f c < 10) and fewer high-frequency events (f c > 1000) compared to Model A. In panels (b) and (d), the impact on f c for both models, with and without empirical compensation, is evaluated. We observe that for events where the estimates between the models differ more, the confidence regions are usually larger, as indicated by the thin whiskers. Panel (d) shows a difference in the estimated corner frequencies between the two models. In panel (e), where we compare the estimates of Ω 0 between Model A and Model B, we only observe minor differences in comparison to the estimates of f c . A possible reason is that there is a stronger correlation between f c and n than between Ω 0 and n, as indicated in Fig. 12.
(Fig. 10 caption: empirical compensation E ij (solid lines) and E ij ± σ E ij (dotted lines), coloured by size, for sizes Ω 0 = [10 −6 , 10 −5 , 10 −4 , 10 −3 , 10 −2 , 10 −1 ] at the jth sensor, evaluated at the average hypocentre location of all triggered events in K j . The top six panels use the same sensor j = 122 but different compensation coefficients [c 1 , c 2 ], where c 1 relates to distance and c 2 to log(Ω 0 ); the bottom three panels use the same c 1 and c 2 but different sensors j ∈ {140, 241, 246} to show the sensor-specific responses. The empirical compensation is unit-less.)
(Table 2 caption: validation of the model fit, with and without empirical compensation, for different compensation coefficients (c 1 , c 2 ) and source models (A, B), based on dispersion measures of the model residuals R ijk for the entire validation data set: the sizes of the 25 and 75 per cent HDIs of R ijk and the standard deviation σ R ijk .)
(Fig. 12 caption: the lower left-hand corner depicts the observed spectra Y ij coloured by distance r j to the estimated hypocentre p̂; scatter plots to the right of the diagonal show the MCMC samples and pairwise correlations ρ (with 95 per cent CI) of credible values of Ω 0 , f c and n in φ; diagonal panels are histograms of posterior samples with 95 per cent HDIs; the middle plot in the last row shows the estimated source spectrum Ŝ i (black) with 95 per cent HDI (grey); the right-hand column shows the observed spectra for the closest, middle and furthest sensor (top to bottom) together with the estimate Ŷ = exp(μ ij (φ̂)) (black) and 95 per cent HDIs (grey); histograms of ν and σ 2 0 are provided to the left of the diagonal.)
However, in panel (a) we can observe that with empirical compensation the most extreme dots are contracted towards the centre, indicating more robust estimates due to the increased descriptive power obtained with the empirical compensation, as previously shown in Section 7.3. In panel (f) we observe truncation effects in the histogram of n using Model B for approximately 10 per cent of the events; these events seem to favour a flatter model than is allowed by the prior restrictions (i.e. n < 1.5). In panels (g) and (h) we observe that the corresponding uncertainties σ Ω 0 and σ fc (i.e. the standard deviations of the posterior samples of Ω 0 and f c ) increase with the estimated size, for both models. The thin whiskers in panel (b) correspond to the 95 per cent HDIs. Table 2 shows different dispersion measures of the model residuals as an indication of model fit. We see that, given the same c 1 and c 2 for both models, Model B is always favourable in terms of model fit. Also, a similar model fit is obtained using Model B without size compensation (c 2 → ∞ in row 8) as using Model A with size compensation (c 1 = 1 in row 7). The best model fit is obtained using both size and distance compensation (c 1 = 2, c 2 = 1) together with the more flexible Model B.
The proposed method provides a fully automatic and robust description of S-wave spectra recorded in challenging environments. The model uses distributions to handle uncertainties and unknowns, together with physical models for the central tendency, in combination with a spatial and earthquake-size dependent empirical compensation to capture systematic deviations. The proposed model is trained and evaluated on two separate data sets, consisting of 31 640 mining-induced events in total. The processing is fully automatic and is applied without manual interaction. Processing of mining-induced events is especially challenging due to the extreme environment, containing excavated volumes, complex geology and continuously changing stresses (see e.g. Sjöberg et al. 2012; Vatcher et al. 2014; Vatcher 2017; Svartsjaern & Saiang 2017, for further description of the complexities in the mine), along with recorded waves influenced by mining noises (e.g. shown for multiple seismograms for the event in Fig. C1). Also, due to the extreme environment, it is difficult to obtain and set reliable physical properties such as material densities, velocities and stiffness of the rock mass in the studied region. Therefore, we mainly focus here on a description of the source model parameters (i.e. Ω 0 , f c and n) that can be estimated directly from the observed spectra and do not depend on material properties, as opposed to the seismic moment, radiated seismic energy and stress drop (Kijko 1994).
Stress drop is often assumed to be independent of source size (Aki 1967) and log-normally distributed (Baltay et al. 2011), and is commonly used as a quality measure of source parameters. Despite this, it seems to depend on the source model (Abercrombie et al. 2016), the frequency bandwidth (Ruhl et al. 2017) or which EGF events are included (Abercrombie 2015), and may therefore be a poor measure to quantify variations in earthquake source spectra (Shearer et al. 2019). In Fig. A3, we explore the values of Ω 0 × f c 3 (which is proportional to stress drop) with respect to location, Ω 0 , f c and n. We find no statistically significant correlation (i.e. the 95 per cent CI of ρ includes zero) with respect to Ω 0 , and we have no evidence that would suggest size dependency and reject standard scaling (i.e. f c ∝ Ω 0 −1/3 , Aki 1967). We observe some correlation with respect to f c and n, together with some spatial patterns (i.e. clusters of events with larger stress drop) as in Shearer et al. (2006). The positive correlation with respect to f c may indicate that stress drop is overestimated for larger frequencies (which is the opposite of the observations made by Ruhl et al. 2017), and the positive correlation with respect to n coincides with the finding made by Brune et al. (1986). The histogram in the bottom right rejects the log-normal assumption, since the data are slightly skewed towards higher estimates in log-scale. This may be related to the limited bandwidth, since the high-frequency events, especially those above f max , have significantly larger stress drop.
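The size-independence check above amounts to correlating the log stress-drop proxy with log Ω 0 . A minimal version, using synthetic stand-in numbers (not the paper's data) generated under standard scaling so that the proxy is independent of Ω 0 , could look like:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
log_omega0 = rng.normal(-4.0, 1.0, 2000)                 # synthetic log Omega_0
log_fc = -log_omega0 / 3.0 + rng.normal(0.0, 0.1, 2000)  # standard scaling + scatter
proxy = log_omega0 + 3.0 * log_fc                        # log of Omega_0 * f_c^3

# Under standard scaling the proxy should be uncorrelated with log Omega_0:
r, p = stats.pearsonr(proxy, log_omega0)
```

A correlation estimate near zero (with a CI including zero) is consistent with size-independent stress drop, as found in Fig. A3.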

The Bayesian approach and possible model extensions
To avoid additional complexity, the Bayesian framework is presented here using a well-known and simple description of the source (Brune 1970) and the attenuation (Shearer et al. 2006). It is straightforward to replace the components in μ ij (φ) with more complex ones. Modelling errors (in the path, at the site and of the source) are always inevitable, and the emphasis here is on capturing these errors adequately using a combination of empirical compensation terms and statistical parameters.
Our model empirically captures the unmodelled errors by favouring EGF events that are similar to the target event in terms of earthquake size and spatial closeness. The empirical compensation terms (described in Step 3) capture both path and site terms, and sometimes source-related terms (e.g. if a cluster of similar-sized events has similar rupture behaviour, as observed in Folesky et al. 2016). In general, systematic errors that are not captured by our source and attenuation model in μ ij (φ) will end up in the empirical compensation term E ij . Non-systematic errors will end up in the uncertainty terms (σ 2 E or σ 2 0 ) in the statistical description and therefore expand the HDIs for the inferred parameters and spectra in different ways, depending on their origin.
This does not mean that the modelling errors are lost; if we wish to extend the model, the information is still available in the residuals (see e.g. Fig. C2) and the empirical compensation spectra (see e.g. Fig. 10). With a more complex model we would reduce the systematic errors related to the source and consequently also reduce the unmodelled source characteristics embedded in the EGF (e.g. as we do with the more complex Model B).
An interesting extension is to include directional effects, e.g. by fitting a directivity function (Boatwright 2007) to the variations in spectral amplitude (Kane et al. 2013; Calderoni et al. 2015). However, we do not observe any evidence of directional effects, such as spectral splitting (i.e. between the modelled and observed spectra), in the representative events in Figs D1-D7. This may be because 99 per cent of the events in the validation data set are below magnitude one, which was the detectable limit for rupture effects in Folesky et al. (2016).
Another aspect we need to consider when extending the model is that an increased number of degrees of freedom generally leads to increased estimation uncertainties for the individual model parameters. With additional parameters and possible multicollinearity (i.e. intercorrelation), it is essential to restrict the parameters with informed prior distributions to provide the necessary robustness, as mentioned in Section 4. This was observed when investigating the estimation uncertainty σ fc of f c between Model A and Model B in Section 7.5. In general, model parameters in physical models [e.g. eqs (1) and (2)] are often prone to intercorrelation, as we are constrained to a specific parametrization. One of the strengths of Bayesian modelling in this regard is the inbuilt mechanism for dealing with overparametrization. This opens up the possibility of applying complex models with many parameters in combination with insufficient or inconsistent data, as long as these parameters are constrained by informative prior distributions (see e.g. Gelman & Hill 2006).
An important additional value of the full probability description proposed here is that the uncertainties in the input data are propagated all the way to the resulting source model parameters, providing realistic uncertainty descriptions of the model parameters and observed spectra. This means that they can in turn be propagated to the next step when deriving source parameters. In that step, the uncertainties in density and velocity can easily be incorporated and combined with those from the source model parameters to provide realistic uncertainties of, for example, the seismic moment, seismic energy and stress drop.
The extreme environment means that the model needs to be robust with respect to the input data and the corresponding uncertainties. This is particularly important in this work, as we rely on auto-processed input data in the form of hypocentre location, origin time and arrival times (described in more detail in Martinsson 2012). Here we depend on the unique strengths of Bayesian statistics, using informed prior distributions of the source model parameters in combination with robust likelihood functions that incorporate the uncertainties delivered together with the input data (Martinsson 2012), as shown in Fig. 3.
The estimation procedure could be addressed differently, for example by correcting the observed spectra for attenuation and fitting the stacked average spectrum to the source model (Oye et al. 2005; Shearer et al. 2006). The stacked-average alternative is computationally fast and results in a simple source spectrum. However, our proposed joint estimation method offers several benefits in comparison. For example: (i) The possibility to include additive model terms, such as the measured noise N ij , in the combined model μ ij in (5). By including a description of the noise in the model, we can properly model the spectral coefficients under great influence from the noise (Ljung 1987; Söderström & Stoica 1989; Pintelon & Schoukens 2001). Without this description, a stacked method requires large SNRs, resulting in valuable information being discarded in the estimation process.
(ii) Obtaining a valid description of the underlying individual observations that can be tested using standard model validation procedures (Ljung 1987; Söderström & Stoica 1989; Pintelon & Schoukens 2001; Gelman et al. 2004; Kruschke 2014), such as residual analysis and PPCs, described in Section 4.1 and illustrated in Figs 2, 9, 12 and Figs D1-D7. (iii) Proper uncertainty weighting of the ith frequency coefficient in the jth spectrum, depending on the underlying covariates (see e.g. Figs 9, 10 or 12). Standard stacking based on the sample mean results in equal weights for all spectra and over all frequencies.
(iv) Increased robustness against outliers and extreme values using heavy-tailed likelihood functions (see e.g. Gelman et al. 2004;Martinsson 2012;Kruschke 2014). Stacking based on sample mean follows Gaussian assumptions and is therefore sensitive to outliers (Ross & Ben-Zion 2016).

Prior knowledge
There are different alternatives for obtaining informed prior distributions. One option is to adopt a full hierarchical Bayesian model, where the source model parameters for all the events in a study are jointly estimated along with the parameters of the prior distribution (see e.g. Gelman & Hill 2006; Kruschke 2014; Martinsson & Törnman 2019). This approach is theoretically appealing but unfortunately not practical in this study with 31 640 events. Instead, we take an empirical Bayesian approach (see e.g. Gelman et al. 2004), where the prior knowledge described by our prior distributions is based on parameter estimates obtained from a separate calibration data set. Here, parameter estimates from the calibration data set shape the prior distributions used for estimating the source model parameters of an event. Additionally, we also incorporate hard restrictions on the parameters that are physically motivated (see e.g. Table 1). These restrictions should be considered with any of the above options for obtaining informative prior distributions.
Another direction is to adopt non-informative or vague prior distributions (Gelman et al. 2004; Kruschke 2014) to reduce the influence of the prior distribution, consequently neglecting available prior information. The estimate of each source model parameter would then be driven by the data observed for that particular event only. This would, however, suffer from the same limitations as classical inference methods, with estimates available only for events providing adequate data. To increase robustness, reduce intercorrelation and limit the model's flexibility, model parameters are often set to constant fixed values (e.g. n = 2 as in Shearer et al. 2006) or restricted by measurement limitations (e.g. f c restricted to the frequency bandwidth as in Huang et al. 2016) based on optimization strategies prior to inference. These approaches can also be seen as empirical Bayesian, but with a very informative prior distribution (i.e. a Dirac delta distribution centred at the particular parameter value).
The approach taken here is not to set parameters to fixed values (e.g. n = 2), but instead to let the calibration data set shape our prior distributions for the model parameters, as described in Step 1 and Step 2 (see e.g. Fig. 5).
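As a schematic illustration of this empirical Bayesian step (with synthetic stand-in numbers, not the paper's calibration data), an informative prior for a positive parameter such as f c can be shaped from calibration estimates by fitting, for example, a log-normal distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Stand-in for calibration-set estimates of f_c (log-normally spread):
calib_fc = rng.lognormal(mean=3.0, sigma=0.8, size=5000)

# Moment-match a log-normal prior to the calibration estimates:
mu = np.log(calib_fc).mean()
sigma = np.log(calib_fc).std()
prior_fc = stats.lognorm(s=sigma, scale=np.exp(mu))  # informative prior for f_c

# Hard physical restrictions (cf. Table 1) would additionally
# truncate the support of this prior.
```

The fitted `prior_fc` then plays the role of the informed prior when estimating the source model parameters of a new event.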

Empirical compensation
The empirical compensation term is sensor, frequency, spatially and event-size dependent, to account for the heterogeneous rock mass and unique sensor effects (shown e.g. in Figs 10 and 12). This term reduces the credible intervals of the parameters as well as the prediction intervals for the observed spectra, and consequently decreases the estimated value of σ 2 0 that describes uncaptured variations. Without the empirical term, the proposed model is still applicable, but the uncaptured variations (parametrized by σ 2 0 ) would increase in order to describe the unmodelled variations in the observed spectra caused by the heterogeneous rock mass and sensor effects. This is also the case when the combined model (5) is not flexible enough to capture the observed spectra (e.g. when the spectra are flatter than the model allows), as shown in Fig. E1 where σ 2 0 is significantly larger than zero. This can be compared to the posterior distributions of σ 2 0 for, for example, the representative events of the data set shown in Figs D1-D7, where all the 95 per cent HDIs include zero. This is an indication that the model describes the data well, and also an indication that the data originate from genuine mining-induced seismic events. Here, σ 2 0 would likely act as a good additional classification variable to reject non-genuine events like, for example, ore-pass noises, mucking, drilling or crushing that are commonly observed in the mine.
Typical challenges in a mining environment, such as induced noises, resonances and erroneous arrival times, are all shown for the event in Figs 2, 3, 12 and C1. The results show the strengths of our model in coping with these challenges by using the spatial- and size-dependent CEGF and including the observed noise spectra N ij in the combined model, together with proper weighting based on the underlying covariates. In Appendix D, we provide additional modelling results in Figs D1-D7 that are more representative of the entire validation data set and of different event sizes (Ω 0 ). The events are not picked to show challenges and strengths, but instead correspond to the specific event indices matching exactly the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles with respect to size (Ω 0 ) for all events in the validation data set. These results provide a more representative picture of the validity and modelling performance in general. For model validation, we assess replicated data from the posterior predictive distribution to detect aspects of the data not captured by the model (see e.g. Gelman et al. 2004), and we found no evidence that would indicate model deficiencies or flaws from a statistical modelling perspective.
In a mining environment there is likely to be considerable Q variations in the rock mass, with lower values in more disturbed rock. These variations are likely to be systematic for certain ray paths and should manifest as slope variations in the residual spectra R ijk if not accounted for. Therefore, these variations are likely to be captured by our spatial and size dependent empirical model E ij , in a similar manner as seen e.g. in the lower left-hand subfigure in Fig. 10.
By observing the residual spectra in Fig. C2, or the model fit for the representative events shown in Figs D1-D7, we cannot see any signs of uncaptured Q (i.e. slope) variations in the proposed method. Nevertheless, the variations in the slope of the residual spectra R ijk obtained in the model calibration step (see Step 3 in Fig. 5) could be studied further, for example to see whether the attenuation per unit travel time is lower for more distant sensors (e.g. shown in Wang 2004; Eberhart-Phillips 2016), or whether including the slopes of the residuals (i.e. R ijk obtained in Step 3) for the ray paths in tomographic models may reveal interesting anomalies in the rock mass.

Near-field effect and P-wave residues
Source parameter estimation is usually done in the frequency domain. Here we use the S-wave spectra due to their relatively large energy content compared to P-wave spectra in general (see e.g. Haskell 1964; Hofstetter & Shapira 2000; Oye et al. 2005). However, the method could easily be extended to apply to P-wave spectra as well, at least for sensors more distant from the source, where the measurement window for the P wave is larger and the P-wave near-field effects are less prominent (see e.g. Vidale et al. 1995; Yamada & Mori 2009). Other common near-field effects, like complex interference patterns and resonances (see e.g. Fig. E2) reported in Bouchon & Aki (1977), flatter frequency responses (see e.g. Fig. E1) in Mikumo & Miyatake (1978) observing n = 1, and larger geometrical decays (Legrand & Delouis 1999) proposing β = 2, may be present in our data set as well. These effects would cause poor model behaviour for observed spectra in the proximity of the source if not compensated for.
The remaining part of the P wave in the observed S-wave spectra is usually neglected due to its low energy content (see e.g. Haskell 1964; Hofstetter & Shapira 2000). However, it will have a notable effect for sensors near the source (Kijko 1994), as the P wave usually contains higher frequencies than the S wave and may cause a poor model fit at higher frequencies if not compensated for.
The model we propose here reduces the impact of these effects on the estimated parameters with the following measures: (i) Weighting the estimates of the ith frequency coefficient in the jth spectrum in (6) with the underlying uncertainties (σ r j and σ t 0 ) obtained from the posterior samples (see Fig. 3) of the hypocentre and origin-time distribution (Martinsson 2012). Here σ t 0 , representing the uncertainty in origin time (see Fig. 3), down-weights high frequencies in the spectra (6), as shown in Fig. 9, and σ r j is the uncertainty in ray length between the sensor and the hypocentre. This uncertainty is given as the standard deviation of all credible distances between the jth sensor and the credible hypocentre locations given by the posterior MCMC samples shown in Fig. 3. For sensors near the source, this uncertainty is large relative to the distance, and will therefore down-weight the estimates in (6) from spectra in the proximity of the source, as shown in Fig. 9.
(ii) The spatial- and size-dependent empirical compensation will partly compensate for these challenges (i.e. P-wave residues and near-field effects). If similar-sized training events near the target reveal similar behaviour, then E ij will describe it (see e.g. Figs E1 or E2). Otherwise, if there is a large discrepancy between these training events, then σ E ij will increase and the corresponding spectra are therefore down-weighted in the estimate.
(iii) Truncating the spectra above the upper cut-off frequency (explained further in Appendix A). This is primarily to increase the processing speed, but it also reduces the possible impact of high-frequency artefacts from P-wave residues and uncaptured noises.
Additionally, for a properly configured measurement system, the sensors close to the source are outnumbered by more distant sensors, relative to the source radius (Haenggi 2005; Martinsson & Jonsson 2018). Instead of empirically compensating the observed spectra with respect to distance and size, an alternative approach for dealing with some of these challenges would be to remove sensors in the proximity of the source from the estimation process, as in Oth et al. (2017), at the cost of losing valuable information. A more appealing approach would be to include the source radius as an additional term in (6) that depends on the distance to the source, similar to σ 2 r j , parametrized by f c and V (p̂) (Kijko 1994). This would down-weight the nearest sensors even more and therefore reduce the impact of possible near-field effects. Possible near-field effects are today compensated for by σ r j together with the size- and distance-dependent empirical compensation, where we rely on the outnumbered assumption above. However, we have no evidence that supports increasing the uncertainties for sensors near the source further, since the confidence regions are already valid for these spectra (see e.g. Figs D1-D7). On the contrary, a valid description of the close-to-source spectra suggests that the measures in the list above are sufficient.

High frequency falloff rate n
In the calibration Step 1 and Step 2, when n is estimated using a flat positive prior, we obtain a fair number of events with estimates of n < 1.5, shown as grey dots in Figs 6 and 7. These dots deviate significantly from the trend in the scatter between f c and Ω 0 compared with the majority of the dots. They are also unrealistic according to Lee et al. (2003), who argue that the radiated energy remains finite only when n > 1.5 under certain assumptions. In the histogram of n in Fig. 13(f), we also see a fair number of events (≈10 per cent) favouring a flatter spectrum, with values of n at its lower bound (i.e. n = 1.5). Whether these low estimates of n are due to near-field effects (Mikumo & Miyatake 1978), correlation between n and f c (Ross & Ben-Zion 2016; Van Houtte & Denolle 2018), the rupture propagation (e.g. the direction or velocity of the cracking) in relation to the sensor placement (Kaneko & Shearer 2015), or deficiencies in the measurement system [e.g. the high-frequency artefacts for these sensors reported in Cochrane et al. (2014) and observed in Figs E3 and D7], we leave for further debate.
The model could be extended further in a hierarchical structure, as in Van Houtte & Denolle (2018), to incorporate, for example, near-field effects on n over distance (Mikumo & Miyatake 1978). Instead of treating n as a fixed effect, it would be more informative to consider it as a random effect and apply, for example, group-level regression (Kruschke 2014; Martinsson & Törnman 2019), where the location of the prior distribution of n could depend on distance. However, in terms of fit we only expect minor improvements from increasing the complexity as proposed, since part of these behaviours will be captured by the empirical compensation.

Spatial and size interpolation coefficients (c 1 , c 2 )
In Table 2, we see that using both size and distance interpolation in the calculation of the weights w k in (14) improves the overall model fit compared to using the average log residual spectra over all calibration events for each sensor (i.e. corresponding to c 1 = 0 and c 2 → ∞). However, using the average is still a good choice compared to no empirical compensation (shown in Table 2 labelled 'None'), since it results in memory-efficient code and fast processing, without the need to store all log residual spectra and without the computationally heavy interpolation in (14). Using a more complex source model (e.g. Model B) will also improve the general model fit (see Table 2), but it will also impact the behaviour of the source model estimates (see Fig. 13), with an increased number of low f c estimates.
Owing to study limitations, only a few common alternatives for c 1 and c 2 are evaluated. We use inverse distance weighting because it is simple to implement and understand, remarking that further investigation of the interpolation method would be beneficial for improving the model fit. Here we simply evaluate c 1 ∈ {0, 1, 2}, corresponding to no distance weighting (i.e. the average over all events), and the commonly used inverse distance and inverse squared distance interpolation, respectively (see e.g. Martinsson 2012; Wettainen & Martinsson 2014, applying these in a similar context). The compensation coefficients seem to favour large values of c 1 and small values of c 2 in terms of model fit (see Table 2), providing more weight on similar-sized events (c 2 ) and spatially close events (c 1 ) in the calibration set relative to the target event. However, since the weighting is jointly combined in (14), going to the extreme in the proposed direction for either c 1 or c 2 may cause a poor fit for individual events, since the weighting may then rely only on the closest or the most similar-sized event in the calibration set. More complex alternatives, like Gaussian process regression (or Kriging, Krige 1951), could also be considered, including covariance dependencies on distance and size. Less complex alternatives, like nearest-neighbour interpolation, could also be considered to reduce computational costs.
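A sketch of this joint distance and size weighting is given below. The exact functional form of the weights w k in (14) is not reproduced here; for illustration we assume inverse-power weighting with exponent c 1 on distance and an exponential size kernel with scale c 2 , chosen so that c 1 = 0 and c 2 → ∞ recover the plain per-sensor average as described in the text:

```python
import numpy as np

def compensation_weights(d_k, logsize_k, logsize_target, c1=2.0, c2=1.0, eps=1e-6):
    """Normalized weights for calibration events. d_k are distances between
    the target and the calibration events, logsize_k the log sizes of the
    calibration events. eps avoids division by zero at the target location;
    c1 = 0 and very large c2 recover the plain average over all events."""
    w = (d_k + eps) ** (-c1) * np.exp(-np.abs(logsize_k - logsize_target) / c2)
    return w / w.sum()

def empirical_compensation(R, w):
    """Weighted average of calibration log residual spectra R (events x freq)."""
    return w @ R
```

Large c 1 and small c 2 concentrate the weight on the closest, most similar-sized calibration events, mirroring the behaviour favoured in Table 2.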

Robustness and extreme values
The data used in both calibration and validation are of higher quality than the majority of the triggered events observed in the mine (see Section 3). Despite this, we still observe spikes, resonances and poor frequency behaviour in many individual spectra, indicating the importance of a robust likelihood function. Some of this poor behaviour is captured by including the noise N_ij and the empirical compensation E_ij in the combined model μ_ij in (5) and its uncertainty (6). However, we still observe estimates of the normality parameter ν < 30 (see Table 3) for some events, implying the necessity of allowing the data to adjust the tail width of our likelihood function. Robust likelihoods are needed here to reduce the biased parameter estimates that would otherwise result from extreme values and outliers in the data set. An alternative to the Student's t-distribution used here is a mixture of two likelihoods with different dispersion, as in Kruschke (2014) and Martinsson & Gustafsson (2018), where the mixing coefficient plays the role of the normality parameter and is often called the contamination parameter. This approach may be beneficial from a computational perspective, since it avoids the computationally expensive gamma functions in the t-distribution.
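The robustness argument can be made concrete by comparing log-likelihood penalties. The sketch below (plain Python, not the paper's implementation) contrasts a location-scale Student's t log-density, whose tail width is controlled by the normality parameter ν, with a Gaussian and with the contaminated-normal mixture mentioned above: for an outlying residual, both robust forms assign a far smaller penalty than the Gaussian, so a single bad spectral coefficient cannot dominate the fit.

```python
import math

def gauss_logpdf(x, mu=0.0, sigma=1.0):
    """Gaussian log-density."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * z * z

def student_t_logpdf(x, nu, mu=0.0, sigma=1.0):
    """Location-scale Student's t log-density with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * sigma ** 2)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def mixture_logpdf(x, mu=0.0, sigma=1.0, eps=0.05, k=10.0):
    """Contaminated normal: mixture of two Gaussians with different
    dispersion, where eps plays the role of the contamination parameter.
    eps and k are illustrative values, not the paper's calibration."""
    return math.log((1 - eps) * math.exp(gauss_logpdf(x, mu, sigma))
                    + eps * math.exp(gauss_logpdf(x, mu, k * sigma)))
```

For a residual at 5σ, the Gaussian log-density is about −13.4 while the t-density with ν = 4 is about −5.9; as ν grows large, the t-density converges back to the Gaussian, which is why large ν estimates indicate well-behaved data.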

C O N C L U S I O N
We present a fully automatic approach for estimating the model parameters describing the S-wave spectra. In comparison to traditional empirical Green's function approaches (Mueller 1985; Hough 1997), our method works for earthquakes of any size, and all spectra associated with an event are modelled jointly within a Bayesian framework.
For precise estimates we use a combined empirical Green's function (CEGF), similar to that proposed by Shearer et al. (2006), consisting of the modelled attenuation together with a spatially and event-size dependent empirical compensation, where the model fit significantly improved when favouring data from training events of similar size in the proximity of the source. In addition to Shearer et al. (2006), our new CEGF: (1) does not truncate the calibration data but instead interpolates it with respect to distance and size between the target event and the calibration events; and (2) describes the variation in the estimates of the empirical compensation for proper weighting of the spectral coefficients.
To obtain reliable estimates, we additionally include in the Bayesian model the underlying uncertainties in ray length and origin time obtained from the hypocentre and origin-time distribution (Martinsson 2012). The proposed method is first tested on synthetic data and then calibrated and cross-validated using real mining induced events. The model gives a valid description of the observed spectra, and we found no evidence of model deficiencies with respect to frequency, distance or size of the earthquakes, using both PPCs and residual analysis. The model and underlying uncertainties are incorporated in a heavy-tailed likelihood function together with informative prior distributions (estimated from calibration data), which together provide a Bayesian solution for robust, precise and reliable estimates of S-wave spectra and corresponding model parameters.

A C K N O W L E D G E M E N T S
The authors would like to thank Rachel Abercrombie, an anonymous reviewer, and Andrew Valentine for their constructive comments and suggestions, which significantly improved the manuscript. The authors acknowledge the financial support from Luossavaara-Kiirunavaara AB (LKAB, Sweden) and strategic innovation program STRIM, a joint venture of VINNOVA, Formas and the Swedish Energy Agency.

DATA AVA I L A B I L I T Y
The data underlying this paper are obtained from the seismic monitoring system in LKAB's underground mine in Kiruna. The data are owned by the mining company and can be accessed by signing a non-disclosure agreement (NDA). This can be initiated by contacting either Mikael Myrzell or Matthias Wimmer, both employed at LKAB.

A P P E N D I X A : E S T I M AT I O N O F C U T -O F F -F R E Q U E N C I E S
The cut-off frequencies, shown as magenta dots in Fig. 2, define the frequency interval f_ij in (A1) for the jth spectrum, where φ̂_j, ξ̂_j and σ̂_N_j are estimates obtained using μ_ij in (5), and where ξ_j = [ξ_1j, ξ_2j] is a parameter vector.

A P P E N D I X B : W E A K LY I N F O R M AT I V E P R I O R S U S E D I N E S T I M AT I O N S T E P 1 A N D 2
In the model calibration steps (i.e. Step 1 and Step 2), we use weakly informative priors to find the informative priors used in the subsequent steps (i.e. Step 3 and Step 4). The priors for the parameters in φ = [Ω_0, f_c, n, Q, β] are here weakly informative with respect to the event size (see Fig. 6), the literature (see Table 1) and the frequency bandwidth (with f_max = 1000 Hz).
where U(a, b) denotes a parameter uniformly distributed between a and b, and Exp(λ) denotes an exponentially distributed parameter with expected value λ⁻¹.
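As a small sanity check on this parameterization, the sketch below draws from U(a, b) and Exp(λ) with Python's standard library and confirms that the exponential draws have mean close to λ⁻¹, matching the convention stated above. The value of λ and the uniform bounds are hypothetical, not the paper's actual prior hyperparameters.

```python
import random

random.seed(1)

lam = 2.0  # hypothetical rate; the prior mean is 1/lam = 0.5
# e.g. a bandwidth-limited uniform prior such as U(0, f_max)
u_draws = [random.uniform(0.0, 1000.0) for _ in range(100_000)]
e_draws = [random.expovariate(lam) for _ in range(100_000)]

mean_e = sum(e_draws) / len(e_draws)  # should be close to 1/lam
```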

A P P E N D I X C : S U P P O RT I V E F I G U R E S
To provide additional evidence to the manuscript, we include four supportive figures (Figs C1, C2, C3 and C4). Figs C1 and C2 contain all the waveforms and residuals, respectively, for the event shown in Figs 2, 3 and 12. Fig. C3 addresses scaling and stress drop with respect to Ω_0, f_c and n. Fig. C4 provides an additional synthetic example using heterogeneous assumptions for Model B.

A P P E N D I X D : R E P R E S E N TAT I V E E V E N T S
Here we provide additional modelling results in Figs D1-D7 that are more representative of the entire validation data set and cover different event sizes. The events are not picked to show particular challenges or strengths; instead, they correspond to the specific event indices matching exactly the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles with respect to Ω_0 in the entire validation data set. These results provide a more representative picture of the modelling performance in general.
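The selection rule above can be sketched as follows: for each requested percentile of Ω_0 (`values` below; all names are illustrative, not the paper's code), pick the index of the event whose value is closest to that percentile of the empirical distribution.

```python
def representative_indices(values, percentiles=(1, 5, 25, 50, 75, 95, 99)):
    """Return, for each percentile, the index of the event whose value lies
    closest to that percentile of the empirical distribution of `values`.
    Uses a simple nearest-rank percentile of the sorted sample."""
    ordered = sorted(values)
    n = len(ordered)
    out = []
    for p in percentiles:
        rank = min(n - 1, max(0, round(p / 100 * (n - 1))))
        target = ordered[rank]
        out.append(min(range(n), key=lambda k: abs(values[k] - target)))
    return out
```

With 101 events whose Ω_0 values are 0, 1, ..., 100, this returns the indices 1, 5, 25, 50, 75, 95 and 99, one per requested percentile.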

A P P E N D I X E : A D D I T I O N A L E V E N T S
Here we provide additional modelling results in Figs E1-E3 that are picked to show additional challenges and strengths of the data and the proposed method.

Figure C1. Waveforms for the event shown in Figs 2, 3 and 12. The seismograms are plotted on the vertical axis at distance r between sensor and hypocentre and coloured by peak amplitude. The black vertical line marks the origin time, and the red seismograms highlight the two traces shown in Fig. 2.

Figure C2. Residual plot using Model B for the event in Figs 2, 3, 12 and C1. The radial and tangential axes in the polar plot show the distance r_j and azimuth (where 0° and 90° correspond to the x- and y-axes, respectively, in the mine's coordinate system shown in Fig. 1) of the triggered sensors with respect to the hypocentre (see Fig. 3). The upper right-hand figure shows the elevation between hypocentre and sensor (negative values correspond to sensors below the hypocentre) with respect to distance. The lower right-hand figure shows the residual spectra R_ij = log Y_ij − μ_ij(φ̂) for the event, where φ̂ is the posterior mode of φ. All figures are coloured by the mean of the residual spectra (i.e. R̄_j = mean(R_ij) = |I|⁻¹ Σ_{i∈I} R_ij) for the corresponding jth sensor.

Figure C4. Bayesian inference of 10 synthetic spectra generated from an arbitrary point p = [2223, 6247, −1048] to arbitrary sensors in the mine, with our heterogeneous velocity model V(p, j) and empirical rock mass model described by E(φ, p, j) and σ²_E_ij(φ, p, j). The inference is provided with Model B. The lower left-hand subfigure depicts the 10 noisy synthetic spectra Y_ij coloured by distance r_j to the source, given predefined model parameters and noise characteristics (φ, N_ij) defined in the legend. The scatter plots to the right of the diagonal show the MCMC samples and depict the pairwise correlations ρ (with 95 per cent CI) of credible values of Ω_0, f_c and n in φ. Diagonal figures are histograms of posterior samples with corresponding 95 per cent HDI. The centre plot in the last row shows the true source spectrum S_i (in cyan) together with the estimated source spectrum Ŝ_i = S_i(φ̂) (in black) and 95 per cent HDI (in grey). The plots in the right-most column are the synthetic spectra for the closest sensor (this sensor is not triggered by the training data set, since E = 0), the middle sensor and the furthest sensor, from top to bottom, coloured by distance, together with the estimate of the observed spectra Ŷ = exp(μ_ij(φ̂)) (in black) using the posterior mode φ̂ and 95 per cent HDIs (in grey). Histograms for the statistical parameters ν and σ²_0 are also provided left of the diagonal.

Figure D1. Bayesian inference of the event index corresponding to the 1st percentile with respect to Ω_0 in the entire validation data set, estimated using Model B with c_1 = 2 and c_2 = 1 for the weighting of E_ij. The lower left-hand corner depicts the observed spectra Y_ij coloured by distance r_j to the estimated hypocentre p̂. The scatter plots to the right of the diagonal show the MCMC samples and depict the pairwise correlations ρ (with 95 per cent CI) of credible values of Ω_0, f_c and n in φ. Diagonal figures are histograms of posterior samples with corresponding 95 per cent HDI. The middle plot in the last row shows the estimated source spectrum Ŝ_i = S_i(φ̂) (in black) and 95 per cent HDI (in grey), where φ̂ is the posterior mode. The plots in the right-most column are the observed spectra for the closest, middle and furthest sensors, coloured by distance, together with the estimate of the observed spectra Ŷ (in black) and 95 per cent HDIs (in grey). Histograms for the statistical parameters ν and σ²_0 are also provided left of the diagonal.

Figure D3. As Fig. D1, but for the event index corresponding to the 25th percentile with respect to Ω_0.

Figure D4. As Fig. D1, but for the 50th percentile.

Figure D6. As Fig. D1, but for the 95th percentile.

Figure D7. As Fig. D1, but for the 99th percentile.

Figure E2. Bayesian inference of one validation event containing resonances for the sensor closest to the source, estimated using Model B with c_1 = 2 and c_2 = 1 for the weighting of E_ij. The remaining subfigures are organized as in Fig. D1.

Downloaded from https://academic.oup.com/gji/article/227/1/403/6273640 by guest on 15 July 2021