Seismic wavefield imaging based on the replica exchange Monte Carlo method

Masayuki Kano,1 Hiromichi Nagao,1,2 Daichi Ishikawa,2 Shin-ichi Ito,1 Shin’ichi Sakai,1 Shigeki Nakagawa,1 Muneo Hori1 and Naoshi Hirata1 1Earthquake Research Institute, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan. E-mail: nagaoh@eri.u-tokyo.ac.jp 2Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan


I N T RO D U C T I O N
Rapid prediction of seismic damage to infrastructure, for example, tall buildings, water and gas pipelines, highways, and power plants, based on numerical simulations is one of the key next-generation disaster reduction technologies (e.g.Fujita et al. 2014).Rapid prediction of seismic damage facilitates the assessment of the overall damage caused by earthquakes, which facilitates rapid rescues and effective recoveries.The United States Geological Survey provides ShakeMaps, which provides rapid access to maps of the intensities and ground motions excited by large earthquakes, for this purpose (e.g.Wald et al. 1999Wald et al. , 2006)).Similarly, in Taiwan, the Taiwan Earthquake Loss Estimation System software is used to assess seismic intensities, infrastructure damage, and casualties (Yeh et al. 2006).
Infrastructure damage is sometimes evaluated by analysing the seismic responses of structures due to ground motions.This type of analysis requires seismic waveforms at the bases of target structures as initial conditions, for use in numerically computing the motions of the structures.Because direct observations of the ground motions at all structures are not realistic, spatially uniform waveforms based on past earthquakes are often assumed in practical computations.This assumption is obviously invalid for realistic evaluation of seismic disasters, so a method for estimating the input waveforms at every structure from seismograms at nearby observatories is required.Our problem can be formulated as 'seismic wavefield imaging,' which is estimation of seismic wavefields utilizing an array of seismometers that are distributed more sparsely than the structures.
A number of 'physics-based' approaches, which solve the wave equation numerically using hypocentre information and models of subsurface structures, have been proposed for seismic wavefield imaging (e.g.Boore 1972;Graves 1996;Aoi & Fujiwara 1999;Pitarka 1999;Hisada & Bielak 2003;Koketsu et al. 2004;Ichimura et al. 2007).For example, Ichimura et al. (2014) conducted highperformance computations related to seismic wave propagation and urban earthquake response analysis for Tokyo, Japan, based on highfidelity subsurface and infrastructure models.The seismic wavefield and responses estimated by such physics-based approaches would be the most reliable if the assumed models were accurate.However, the practical limitations on their accuracy result in significant differences between the simulation results and observations, especially in the high-frequency band.
Interpolation of observation data is often carried out for seismic wavefield imaging when physics-based approaches are unavailable or obviously unable to produce satisfactory solutions.Methods for interpolation are broadly classified as non-stochastic or stochastic approaches.As an example of the former, Kawakami (1989) constructed a non-stochastic interpolation method that solves an optimization problem, assuming that the optimum wavefield satisfies a given cross-correlation function and is strictly consistent with observed waveforms.In contrast, stochastic approaches treat observed waveforms as realizations of a stochastic field in space and time, and estimate the stochastic field conditional on observations (Vanmarcke & Fenton 1991;Morikawa & Kameda 1993;Kameda & Morikawa 1994).For example, Kameda & Morikawa (1992) developed a method that estimates a conditional probability density function (PDF) related to Fourier coefficients of waveforms at a place of interest.Sato & Imabayashi (1999) developed a method that combines a Kalman filter (Kalman 1960) and an autoregressive model to sequentially calculate ground motions in the time domain.We term such methods, which carry out seismic wavefield imaging based only on observation data without using physical models, as 'data-driven' approaches.
Our practical target area for seismic wavefield imaging is the Tokyo metropolitan area of Japan, in which a dense seismological network named MeSO-net (Metropolitan Seismic Observation network) has been in operation since 2007.The MeSO-net comprises 296 accelerometers, located at intervals of a few kilometres, that continuously record seismograms at a sampling rate of 200 Hz (e.g.Hirata et al. 2009;Sakai & Hirata 2009).Mizusako et al. (2014) proposed an imaging method using lasso (least absolute shrinkage and selection operator; Tibshirani 1996), which is capable of automatically selecting dominant explanatory variables in a linear regression model, owing to a regularization term given as an L1-norm of unknown coefficients added to the loss function.They showed that the estimated ground motions explained the real observations well at frequencies lower than 0.20 Hz.Data-driven approaches generally suffer from reproduction of the wave propagation at higher frequencies, so that they perform better when physical constraints, such as the wave equation and subsurface structure models, are incorporated.
In this study, we adopted a seismic wavefield imaging method that combines both physics-based and data-driven approaches.
Combined methods such as the method used in this study have been developed as full waveform inversions (e.g.Tarantola 1984;Pratt et al. 1998;Dahlen et al. 2000;Askan et al. 2007;Métivier et al. 2016), which deterministically estimate subsurface structures by reducing the computational cost based on the scattering integral method (e.g.Zhao et al. 2005;Chen et al. 2007) or the adjoint method (e.g.Tarantola 1988;Tromp et al. 2005;Tape et al. 2010).For example, Tape et al. (2009) applied an adjoint method to the estimation of a crustal model of southern California, using 6800 waveform forward calculations.However, these methods are very likely to go to one of the local maxima, depending on the initial values.Furthermore, an evaluation of the estimation errors for the subsurface structures is usually skipped to avoid the complex computations required.Full waveform inversion studies seek to estimate a fine subsurface structure model.In this study, in contrast, our objective is to obtain a seismic wavefield for use as ground motion input in the rapid evaluation of seismic hazards.Therefore, we employed an approach based on information science to obtain a PDF related to source information and subsurface structures.This approach provides not only optimum estimates of the model parameters but also their uncertainties.The seismic wavefield is reproduced based on a set of optimum model parameter values that attains the PDF maximum.The model parameters included are the seismic velocities and the thickness of each layer in a 1-D subsurface structure model, in addition to source information such as the location of the hypocentre and the origin time.The method adopted in this study was the replica exchange Monte Carlo (REMC) method (e.g.Swendsen & Wang 1986;Geyer 1991;Iba 2001;Earl & Deem 2005;Nagata & Watanabe 2008), which is one of the Markov chain Monte Carlo (MCMC) methods (e.g.Metropolis et al. 1953).MCMC is the generic name for algorithms capable of obtaining realizations, which we call 'samples', from a target PDF based on the Markov process.The Metropolis-Hastings (M-H) algorithm (Hastings 1970), which is an ordinary MCMC method, can easily be trapped in a local lobe if the PDF is multimodal.For example, Nagao et al. (2013) applied the M-H algorithm to an optimization problem involving observation design in reservoir engineering, and showed that a more advanced sampling method was required to avoid getting trapped in a local lobe of the multimodal posterior PDF.The new contribution made in this study was the application of the REMC method, which is capable of obtaining samples much more effectively than the M-H algorithm and therefore overcoming the trapping problem, to seismic wavefield imaging for the first time.The REMC method contributes to better seismic wavefield imaging than the previously proposed methods by estimating the PDF related to the source information and the local subsurface structure.The proposed method, based on both physics-based and data-driven approaches, produces a seismic wavefield that quantitatively explains observations that satisfy the wave equation.
The remainder of this paper is organized as follows.Section 2 presents the formulation of a Bayesian inversion for obtaining the posterior PDF related to the model parameters, and describes the sampling algorithm applied to the PDF.Section 3 presents the results of numerical experiments conducted to verify the proposed method.Section 4 demonstrates that a 1-D subsurface structure model is adequate for the purpose of seismic wavefield imaging and compares the seismic wavefield obtained using the proposed method, the Metropolis method, and ordinary kriging, to demonstrate the importance of incorporating physics-based approaches.Section 5 presents the conclusions drawn from the results of this study.

Bayesian inversion
Let m be a model parameter vector that contains all model parameters, and d be a data vector that includes all observation data.Bayesian estimation of the unknown model parameters conditional on the observations, is based on Bayes' theorem: where p(d) is a PDF of the observations, p(m) is a prior PDF of the model parameters, the conditional PDF p(d|m) is a likelihood function, and the conditional PDF p(m|d) is a posterior PDF of the model parameters.Note that p(d) is constant because observations are fixed values, and hence the posterior PDF is proportional to the product of the prior PDF and the likelihood function.The likelihood function measures the degree of fit between the observations d and the theoretical values calculated using the model parameters m.When the estimation error follows an M-dimensional normal distribution function, the likelihood function is defined as follows: where T denotes transposition, is a variance-covariance matrix, s is a state vector that contains all variables involved in a given numerical simulation model, and g(m,s) is a non-linear operator that transforms the state s into physical quantities comparable to the data vector d.The prior PDF p(m) represents a priori information about the model parameters, subjectively assumed in accordance with experience and intuition.Bayes' theorem (eq. 1) states that the prior PDF is updated to the posterior PDF by considering the observation data in the form of a likelihood function.

M-H method
MCMC methods, which are a group of methods for sampling from a target PDF p(x), are powerful techniques, particularly in that p(x) can be computed for a given x even though the functional form of p(x) is beyond comprehension, due to the complexity of its definition.MCMC methods ensure that a set of samples converges to p(x) if the detailed balance equation, holds for all x and x * , where π (x → x * ) is a transition probability that is expressed as the product of a proposal PDF q(x * |x) and an acceptance ratio.The proposal PDF, given in advance, generates a candidate for a new sample x * with respect to the current sample x, and acceptance or rejection of the candidate is determined in accordance with the acceptance ratio.Eq. (3) represents the sufficient condition for the target PDF p(x) to be stationary, conditional on the transition probability π .This paper proposes the REMC method, which is based on the M-H method (e.g.Liu 2008), for seismic wavefield imaging.The M-H method consists of the following five steps: Step 1: Set x (0) as an initial value of x.
Step 2: Propose a candidate x * according to a given proposal PDF q(x * |x (n) ) with respect to the current sample x (n) .For example, if the proposal PDF is a normal distribution function, that is, q(x * |x (n) ) = N (x (n) , diag(β 2 m )) (m = 1, 2, . . ., M), the candidate is represented as follows: That is, the standard deviation β m represents the scale of the distance between x (n) and x * in the m-th component.
Step 3: Calculate the ratio r using the proposal and target PDFs: Step 4: Determine whether the candidate x * is to be accepted or rejected, with a probability of min(1,r).If accepted, the current sample moves to the candidate, that is, x (n+1) = x * ; otherwise the current sample remains, that is, x (n+1) = x (n) .
Step 5: The sequence of x (n) (n = 0, 1, . . ., N) values obtained are the objective samples drawn from the target PDF.
When the proposal PDF is symmetric in form, that is, q(x * |x (n) ) = q(x (n) |x * ), the algorithm is called the 'Metropolis method'.The M-H method, including the Metropolis method, works sufficiently well if the target PDF is unimodal but works poorly in the case of a multimodal PDF because of the potential for being trapped in local lobes.The target PDF is commonly multimodal in practical MCMC applications.Therefore, this paper proposes the REMC method in place of the M-H method in order to avoid the trapping problem.

REMC method
We consider again the sampling process described above, that is, obtaining samples x (n) drawn from the target PDF p(x).Let { p i (x i |T i )} i=1,2,...,I be a family of PDFs defined as follows: where the subscript i is termed the 'chain number', and Note that the case of T 1 corresponds to the target PDF and that the functional form of the PDF becomes smoother as the chain number or the temperature increases.
The REMC method obtains the samples from the target PDF by the following procedure: Step 1: Set an initial sample x (0) i for the i-th chain.For n = 0, 1, . . ., N−1, repeat Step 2 to Step 4.
Step 2: Obtain the next sample x (n+1) i in each chain using the M-H method described in Section 2.2.
Step 3: After every several steps, randomly select a pair of consecutive i-th and (i+1)-th chains (1 ≤ i ≤ I − 1), and calculate the ratio r as follows: The denominator represents the probability of the current sample, whereas the numerator represents the probability when the current samples are exchanged between the two chains.
Step 4: Exchange the current samples between the two chains, that is, x i+1 , with a probability of min(1,r).
Step 5: Obtain the objective set of samples drawn from p 1 (x 1 |T 1 = 1), which is consistent with the original target PDF p(x).
Both the REMC and the M-H methods are theoretically ensured to produce a PDF estimated from the obtained samples that converges to the target PDF as the number of samplings increases.However, the M-H method sometimes requires an enormous number of iterations when the target PDF is multimodal.A significant advantage of the REMC method is that the samples can be obtained effectively even from a multimodal PDF because a parallel search in a wide parameter space becomes possible, owing to the PDF family being annealed by various temperatures.This advantage of the REMC method can drastically reduce the computation time needed to obtain a sufficient number of samples, compared to the M-H method.

Application example
We demonstrate the effectiveness of the REMC method in comparison to that of the Metropolis method for the case of a 1-D target PDF: where μ 1 = 10, μ 2 = 0, μ 3 = −10, σ 1 = 1/2, σ 2 = 1 and σ 3 = 3/4.This PDF has three modes at x = μ 1 , μ 2 and μ 3 (Fig. 1a).A family of eight PDFs used in the REMC method is defined as shown in eq. ( 6) in which the temperature is set to T i = 2 i−1 (i = 1, 2, . . ., 8).Fig. 1 shows the family of PDFs for the cases of T i = 1, 4, 16 and 64, indicating that the PDF becomes smoother as the temperature increases.Fig. 2 shows the sampling processes for the REMC and Metropolis methods, run for 5 × 10 5 and 2.5 × 10 9 MCMC steps, respectively, starting from the initial sample x (0) = −17.0.The sequence of samples in the REMC method continuously jumps when exchanges between chains occur (Figs 2a and b), which is rarely seen in the case of the Metropolis method (Figs 2c and d).The sequence of samples obtained before 100 MCMC steps are performed is clearly affected by the initial sample (Fig. 2b).This period is termed 'burn-in', and the samples obtained during the burn-in are generally not included in the final objective samples.The histograms produced from the sample sequences obtained by both methods (Figs 2a and d) show three lobes (Figs 3b and c), which indicates that both methods successfully achieve convergence to the target PDF (Fig. 3a) with a sufficient number of MCMC steps.However, the bimodal histograms shown in Figs 3(d) and (e) indicate that the Metropolis method fails to converge within N steps, where N (= 5 × 10 5 ) is the number of MCMC steps needed to achieve convergence with the REMC method, or even within N × I (= 4 × 10 6 ) steps, at which the number of forward computations is the same for both methods.This is because the sample is often trapped in a local lobe and needs a long time to approach the global maximum.A larger variance of the proposal PDF could help the current sample move to another lobe, but a dense sampling in the regions of three lobes would be difficult with such a large variance.The Metropolis method eventually reproduces the three lobes in 3 × 10 8 steps (Fig. 3f), but the histogram is clearly different from the target PDF (Fig. 3a).This example indicates that the REMC method is much more effective than the Metropolis method when sampling from a multimodal target PDF, owing to the exchange of the current samples between two PDFs with different temperatures.

Design of the twin experiment
We conducted numerical tests using synthetic observation data in order to verify the proposed method.The synthetic data at each observation site were theoretical seismic waveforms, to which white noise was added.Throughout this study, we used the code developed by Hisada (1995) to calculate the theoretical waveforms numerically, based on the discrete wave number method (Bouchon 1981), in which seismic waveforms are represented as wave number integrals of Green's functions.This code calculates the Green's functions of a layered half-space without numerical instabilities, even at higher frequencies, by introducing the reflection and transmission matrix (Luco & Apsel 1983;Hisada 1994).Therefore, theoretical waveforms are obtained with less computational cost, compared to the finite difference method or finite element method, assuming a 1-D subsurface structure model and given source information.This simple subsurface structure model is adequate for our purpose, seismic wavefield imaging, as discussed later in Section 4. The assumed subsurface structure model, source information, and computed waveforms are regarded as the 'true' model, the 'true' source and the 'true' waveforms, respectively.We then applied our  method to the synthetic data and determined whether or not the true model, the true source and the true waveforms were reproduced.This type of numerical test is termed a 'twin experiment' in this paper.
Table 1 summarizes three twin experiments conducted in this study, assuming a horizontally layered subsurface structure model.Test 1 involved the use of a half-space subsurface structure model (Fig. 4a), that is, the model parameters were the seismic velocities (Vp and Vs).This simple setting helped us to understand the fundamental characteristics of the problem.Test 2 involved the use of a multi-layered subsurface structure model consisting of three layers and a half-space (Fig. 4b).The model parameters were the seismic velocities (Vp and Vs) and the thickness (h) of each of the three layers, that is, nine parameters in total.The seismic velocities in the half-space were given a priori in this test because the waveforms at the surface were more sensitive to the shallower structure.The model was a 1-D approximation of the Kanto Basin, in Japan, in which the Tokyo metropolitan area is located, considering a future application of the MeSO-net data.Test 3 involved estimating the source information, that is, the source location and origin time in this case, together with the subsurface structure, using the same subsurface structure model as in Test 2 (Fig. 4b).Therefore, the total number of model parameters to be estimated was 13.
In each twin experiment, we computed acceleration waveforms at the 16 observation sites shown in Fig. 5, assuming a given hypocentre solution (Table 2) and a subsurface structure model.The true model parameters are summarized in Tables 3-5.The geometric setting of the observation sites refers to the actual distribution of the MeSO-net observatories.The synthetic acceleration data to which observation noise was added can be described as follows: where d true contains the true waveforms, ε denotes observation noise that is assumed to follow a normal distribution function with a mean of 0.0 m s −2 and a standard deviation of σ = 1.0 × 10 −4 m s −2 , and I is the unit matrix.The duration of the synthetic waveforms was 40 s, starting from the origin time of an earthquake, and the sampling rate was 25 Hz.The REMC method was applied to the synthetic data to obtain samples from the target posterior PDF p(m|d), starting from the initial values summarized in Tables 3-5.The part of the prior PDF p(m) related to the subsurface structures and the source information was assumed to follow a normal distribution function; (i) the mean seismic velocity was 10 m s −1 higher than the true value, and the standard deviation was 500 m s −1 for Vp and Vs, (ii) the mean layer thickness was 10 m greater than the true value, and the standard deviation was 1000 m for each layer, (iii) the mean source location was (X m , Y m , Z m ) = (−5, 15, 20 km), that is, 5 km away from the true source location (X 0 , Y 0 , Z 0 ) = (−10, 10, 25 km) in each direction, and the standard deviation was 10 km for each direction.This prior PDF related to a source location can be established immediately after the Japan Meteorological Agency (JMA) reports the source information for an earthquake, letting the source location (X m , Y m , Z m ) be the JMA solution.The posterior PDF is assumed to be proportional to the product of the prior PDF and the likelihood function p(d|m): where d cal (m) is a vector that contains the theoretical waveforms (Hisada 1995), using the model parameter vector m.In the REMC sampling process for Tests 1-3, the numbers of chains were 8, 16, and 16, respectively, and the temperatures in the i-th chain were T i = 4 i−1 , T i = 2 i−1 and T i = 2 i−1 , respectively (Table 1).All of the standard deviations β m (m = 1, 2, . . ., M) of the proposal PDF (eq.4) in the i-th chain were assumed to be equal to a single value β = √ T i .

Test 1: half-space subsurface structure model
In Test 1, the REMC and Metropolis methods were run for 20 000 MCMC steps to obtain samples from the posterior PDF related to the seismic velocities (Vp and Vs).Fig. 6 shows the sample sequences (Figs 6a and b) and the values of the posterior PDF (Fig. 6c) before 1000 MCMC steps, indicating that the sequences become stationary after the burn-in, which corresponds approximately to the initial 200 steps.The histograms in Fig. 7, which were estimated from the samples obtained, indicate that in the case of the REMC method (Figs 7a and c) almost all of the samples were concentrated near the true values (Table 3), which is clearly much better than the results obtained using the Metropolis method (Figs 7b and d).Fig. 8 shows how the current sample moves within the parameter space when the REMC or the Metropolis method is running, compared to the actual posterior PDF obtained by a grid calculation.Note that such a grid calculation becomes more difficult as the number of model parameters increases.The sample sequence obtained by the REMC method rapidly approaches the true model parameter values, that is, the REMC method easily finds the main lobe of the posterior PDF.In contrast, the sample sequence obtained by the Metropolis method wanders in a local lobe of the posterior PDF.If we could obtain an infinite number of samples by the Metropolis method, the ensemble of the samples would, in principle, reproduce the target PDF as shown in Section 2.4, and the optimum sample representing the true values would be found.However, this is unrealistic in the case of a multimodal PDF because of the difference in posterior values between the global maximum and the local maximum O(10 5 ) on a logarithmic scale, and the current sample has difficulty escaping from the local maximum using the Metropolis method.The results for Test 1 indicate that the exchange step in the REMC method is a powerful way to obtain samples from a broad parameter space, including the region of the global maximum, and avoid getting trapped in local lobes, as occurs with the Metropolis method.Note that the posterior PDF is multimodal, as shown by the colour contours in Fig. 8, even in the case of this simple subsurface  model.The horizontally elongated structure of the colour contours indicates that Vs is more sensitive to the posterior PDF than Vp is.The likelihood function (eq.10) is defined as the difference in amplitudes; therefore, the S-wave, which has a larger amplitude than the P-wave, is considered to have a greater effect on the posterior PDF.
Ground motion at a place of interest can be reproduced from the samples obtained of the posterior PDF.To reproduce the waveforms, we used a set of optimum model parameters (Table 3), obtained by averaging the top 50 samples, which provided the first to fiftieth largest values of the posterior PDF.Fig. 9(a) compares the reproduced waveforms at observation site #6 (Fig. 5), obtained by both sampling methods with the synthetic waveform.The figure shows that the REMC method successfully explains the observation.In contrast, the waveform reproduced by the Metropolis method is obviously different from the observation, shifted in time by one wavelength.This causes a local maximum in the posterior PDF because the phases of the two waveforms are relatively similar, although the parameters estimated by the Metropolis method are significantly different from the true parameters.Fig. 9(b) shows the waveforms obtained by both methods at point A (Fig. 5), at which no observation data are recorded, in comparison to the true waveform.This indicates that the REMC method accurately reproduces the waveform, even when no observation data are obtained at the point of interest, owing to the global search of the model parameters.

Test 2: multilayered subsurface structure model
In Test 2, the REMC and Metropolis methods were run for 40 000 MCMC steps to obtain samples from the posterior PDF for the case of the multilayered subsurface structure model (Fig. 4b).The histograms in Fig. 10 indicate that the REMC method successfully obtains the majority of samples close to the true values (Table 4) in the cases of Vs and the thicknesses for all layers, whereas the Metropolis method fails to do so.However, the REMC samples for Vp are not close to the true values.The reason for this is considered to be that Vp is less sensitive to the posterior PDF than Vs, as mentioned in Section 3.2.
Fig. 11 shows the theoretical waveforms at observation site #6 and point A (Fig. 5), reproduced from the optimum model parameters (Table 4), compared to the true waveforms.Note that the synthetic observation at site #6 is the true waveform contaminated by observation noise, whereas no observation data are recorded at point A. The waveforms reproduced by the Metropolis method appear to be coincident with the true waveforms (the top panels in Figs 11a and  b), although the optimum model parameters are not close to the true values (Table 4).However, in the case of the REMC method, the differences between the reproduced and true waveforms (the bottom two panels in Figs 11a and b) are much smaller than in the case of the Metropolis method.This finding suggests the important conclusion that usage of the Metropolis method is risky in this case because it is highly possible for the optimum model parameters obtained to be significantly different from the true parameters, even though the reproduced waveforms are consistent with the observations.In contrast, the REMC method is more successful than the Metropolis method in obtaining optimum model parameters that are close to the true values.

Test 3: multilayered subsurface structure model and source information
In Test 3, the REMC and Metropolis methods were run for 40 000 MCMC steps to obtain samples from the posterior PDF of the model parameters related to the subsurface structure model and the source information.The optimum parameter values obtained by the REMC and Metropolis methods, shown in Table 5, indicate that the estimated source location is close to the true one, while the subsurface structures are not well determined.The model parameters related to the source location approach the true values within 3000 MCMC steps (Fig. 12).This means that the source location is much more sensitive to the posterior PDF than the seismic velocities or the thicknesses of each layer, because the source location determines the direction of the wave front.be used to determine the source location well but not to detemine the subsurface structures.Waveforms due to multiple earthquakes, which propagate from various directions, are essential for estimation of the subsurface structure model.Although the subsurface structure model is not well estimated, both the REMC and Metropolis methods is found to successfully reproduce the true seismic wavefield, compared to a pure data-driven approach, by introducing the physics-based approach and estimating the true source information.This is demonstrated in Section 4.

D I S C U S S I O N
The proposed method employs a 1-D subsurface structure model to reproduce waveforms.This section shows that such a simple model is adequate for the purpose of seismic wavefield imaging needed for rapid evaluation of seismic hazards.The proposed method, based on a 1-D model, was applied to synthetic waveforms generated using a 3-D structure model.The assumed area was 200 km (X) × 200 km (Y) × 100 km (Z), discretized with a grid spacing of 100 m, and the subsurface structure model was the Japan Integrated Velocity Structure Model (Koketsu et al. 2008(Koketsu et al. , 2012)).The synthetic data set was a set of acceleration waveforms with a duration of 40.92 s and a time interval of 0.005 s, generated using the finite difference method (FDM, e.g.Furumura & Chen 2004, 2005;Maeda & Furumura 2013), assuming the source information summarized in Table 6.The REMC was applied to the waveforms within a frequency band of 0.05-0.30Hz, under the same experimental conditions as used in Test 3 (Table 1).The prior PDF was assumed to follow a normal distribution function with the mean shown in Table 7 and the same standard deviation as in Test 3, and the initial values for the REMC method were sampled from the prior PDF.
Fig. 13 compares the observed waveform with the theoretical one reproduced based on the optimum model parameters obtained by the REMC method.This indicates that the direct wave between 10 and 20 s is well reproduced, while the subsequent wave that arrives at after 20 s, which was reflected at the plate boundaries, is not coincident.Fig. 14 shows the seismic wavefields at t = 17.36 s and t = 26.96s, when the direct and reflected seismic wave pass through the target region, clearly indicating the above characteristic (see Fig. S1 in the Supporting Information for movies of the seismic wavefield imaging).The reason for this is that a 1-D subsurface structure model does not consider the plate boundaries.Considering that our objective was rapid evaluation of damage to infrastructures, Downloaded from https://academic.oup.com/gji/article-abstract/208/1/529/2417077 by guest on 26 December 2018  estimation of direct waves with larger amplitudes than the reflected waves is adequate for our purpose.
To demonstrate the importance of incorporating physics-based approaches in seismic wavefield imaging, we also conducted kriging (e.g.Stein 2012), which is a classical data-driven interpolation method for spatial data.The kriging assumes that the physical quantity of interest Z at a target location x 0 can be represented as a weighted average of data observed at other locations x n (n = 1, 2, . . ., N): where the weights satisfy N n=1 w n = 1.The weights are determined based on the spatial characteristics of data, termed the 'variogram' in the following procedure.
Step 1: Calculate the variogram γ , which is defined as a function of the distance between two observation sites x i and x j , for all pairs of observation sites:     Step 2: Estimate the parameters involved in a given model function that approximates the variogram obtained in Step 1.We adopted the following spherical model (e.g.Journel & Hujibregts 1978): where a and b are the parameters to be estimated.This model is referred to as the 'theoretical variograms'.
Step 3: Determine the weights using the theoretical variogram by solving the following linear equation: where λ is a Lagrange multiplier.This equation is derived by minimizing the prediction errors at the target location (e.g.Stein 2012).
Using the weights determined, eq. ( 11) is used obtain the objective physical quantity at the target location.
We carried out seismic wavefield imaging by applying kriging, in the frequency domain, to the synthetic observation data in Test 3. The kriging spatially interpolates the amplitudes and phases in each frequency component and then obtains the waveforms at the target location in the time domain.We then compared the results of the seismic wavefield imaging with those obtained using the REMC and Metropolis methods, as described in Section 3.4.
Fig. 15 shows the seismic wavefields in the frequency band of 0.05-0.30Hz at t = 11.92 s from the origin time and compares the results of the REMC and Metropolis methods and kriging with the true wavefield (see the Supporting Information Fig. S2 for movies of the seismic wavefield imaging).The kriging seems to approximately reproduce the overall amplitude features, although the shape of the wave front is irregular compared to the true wavefield.In contrast, the wavefields reproduced by the REMC and Metropolis methods are consistent with the true wavefield.To quantitatively evaluate the difference between each of the reproduced wavefields and the true wavefield, we calculated the spatial distribution of the root mean square errors (RMSE) (Fig. 16).The RMSE at a point (x, y) is defined as follows: where K is the number of data elements in one component at that point, that is, K = 1000 (= 40 s × 25 Hz) in this case; A est (k, x, y) contains the estimated amplitudes of all three components at the k-th time step; and A true (k, x, y) contains the amplitudes of the true wavefield.Fig. 16 clearly indicates that, even in this low frequency band, the REMC and Metropolis methods perform much better than kriging, which performs well only near observation sites.
When we focus on the full frequency band, the differences between the methods become more apparent.Fig. 17 shows the seismic wavefields of the P-wave for the up-down component at t = 7.52 s, and Fig. 18 shows those of the S-wave for the two horizontal components at t = 11.92 s (see Supporting Information Fig. S3 for the movies).The kriging clearly fails in the imaging; the wave fronts are nowhere to be seen.Further investigation of this result shows a striped pattern in the RMSE spatial distribution (Fig. 19c), which indicates that the estimation works well only near observation sites.In contrast, both the REMC and Metropolis methods appear to successfully reproduce wavefields that are consistent with the  true wavefield throughout the full frequency band (Figs 17 and 18).However, a striped pattern still remains in the RMSE distribution when the Metropolis method is used (Fig. 19b), while the REMC method successfully removes the striped pattern by performing much better at obtaining the parameters than the Metropolis method (Fig. 19a).
In summary, the results indicate that physics-based approaches are essential for seismic wavefield imaging because data-driven   approaches such as kriging fail easily, even in the low frequency band.When using physics-based approaches, unknown parameters involved in the physical models must be estimated from observation data.MCMC methods enable us to obtain a posterior PDF related to such parameters, and the REMC method in particular performs much better than the Metropolis method throughout the full frequency band.
The following issues should be considered in practical applications of the REMC method to the MeSO-net data.( 1 unexpectedly large number of samplings or misses the global maximum, even though the REMC is able to effectively obtain samples from a multimodal PDF.In fact, we could not find the global maximum in Test 2 within 40 000 MCMC steps when we used initial values that were twice as large as the true values.In practical applications, appropriate initial values should be selected based on prior knowledge, such as available standard subsurface models, for example, Koketsu et al. (2009), for the Tokyo metropolitan area of Japan.
(2) Computational cost: In all of the twin experiments described in Section 3, computation times of a few days were required to obtain sufficient numbers, that is, O(10 4 ), of samplings.To accomplish real-time seismic wavefield imaging, the number of samplings should be reduced to at least O(10 3 ).Therefore, we intend to apply the REMC to the waveforms related to a number of past earthquakes in advance and let it learn the appropriate model for the subsurface structures in the target region.This could produce a better initial model for use with the REMC in application to real-time data, and could reduce the number of forward calculations.(3) Source location: In Japan, JMA reports a preliminary solution for the source location soon after an earthquake takes place, and the uncertainty associated with the hypocentre location is on the order of several kilometres.This information can be used to produce a prior PDF that follows a normal distribution function, with the mean being the preliminary JMA solution and the standard deviation being 10 km, which is slightly greater than the actual uncertainty.

C O N C L U S I O N S
We have proposed a method for seismic wavefield imaging that combines physics-based and data-driven approaches, utilizing seismograms recorded by a dense seismological network.The new contribution of this research is the use of the REMC method for the first time to obtain a posterior PDF of the model parameters related to source information and a 1-D local subsurface structure model.We conducted twin experiments to verify the proposed method, applying it to synthetic observation data for two horizontally layered subsurface structure models.The following results were obtained.
(1) The posterior PDF is multimodal even in a half-space subsurface structure model, and the REMC method is able to obtain samples much more efficiently from a multimodal PDF because it can search a broad parameter space without getting trapped in local lobes of the PDF (Test 1).The Metropolis method, which is an ordinary MCMC method, rarely avoids this trapping problem.
(2) In estimating the model parameters related to the subsurface structure model, the REMC method is able to obtain optimum model parameters close to the true values and successfully reproduces a wavefield that is consistent with the true wavefield (Test 2).The Metropolis method often fails to estimate the model parameters sufficiently well, even when the reproduced seismic wavefield appears to be coincident with the true wavefield.
(3) In simultaneous estimation of both the source information and the subsurface structure model, the source location is much more sensitive to the posterior PDF than the seismic velocities or the layer thicknesses.Therefore, both the REMC and Metropolis methods succeed in obtaining model parameters related to the source information that are close to the true values, while the estimated parameters related to the subsurface structure model are not close to the true values (Test 3).The use of seismic waves propagated from various azimuths may be a way to overcome this problem.Nonetheless, the REMC method reproduces wavefields better than the Metropolis method.
(4) Integrated physics-based and data-driven approaches, for example, the REMC method, are powerful techniques for seismic wavefield imaging.Purely data-driven approaches, for example, kriging, cannot reproduce seismic wavefields that are consistent with true wavefields, even in the low frequency band.
The main purpose of this study was to develop a method for estimating a seismic wavefield in real time for use in rapid prediction of seismic damage to infrastructures as a result of a large earthquake.We used a forward simulation model to compute theoretical waveforms in a 1-D horizontally layered subsurface structure model in order to avoid having to perform a massive number of computations.As we demonstrated in Section 4, such an elementary model is necessary and sufficient for real-time estimation using the REMC method, which requires a number of iterative calculations.The advantage of the REMC method is that, in contrast to a deterministic full waveform inversion, it can provide not only optimum estimates of model parameters but also their uncertainties without getting trapped in local maxima, which permits stochastic estimation of both seismic wavefields and damage to infrastructures.
The following practical application of the proposed method are being considered.As mentioned in Section 4, source information can be determined, even for a single earthquake, much better than the subsurface structure model can be.Therefore, we plan to run the REMC method continuously and let it learn waveforms due to a number of earthquakes to iteratively update the subsurface structure model.Ultimately, we hope to obtain fine seismic wavefield imaging, with a tuned subsurface structure model, that can serve as input in the rapid prediction of damage to infrastructures when a large earthquake occurs.
The proposed method provides a realistic wavefield imaging technique that quantitatively considers both the wave equation and observation data.The imaging result obtained can be input into analyses of the seismic responses of infrastructures and enables more accurate assessment of the overall damage caused by earthquakes, thereby contributing to rapid rescues and efficient recoveries.

A C K N O W L E D G E M E N T S
Comments from two anonymous reviewers and the editor, Prof Ludovic Métivier, were helpful to us in improving the manuscript.Discussions with Prof Peter XK Song, Dr Jonggyu Baek, Prof Kei Hirose and Dr Kenji Nagata were valuable in the development of the REMC method.We thank Dr Takuto Maeda for a fruitful discussion and for providing us with a code to compute seismic wave propagation using a finite difference method.We used the GMT (Generic Mapping Tools) by Wessel & Smith (1998) to generate several figures.This research was supported by the Special Project for Reducing Vulnerability for Urban Mega Earthquake Disasters from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Figure 1 .
Figure 1.A family of PDFs used in an application example of the REMC method.The PDFs correspond to the temperatures (a) T = 1, which is consistent with the target PDF (eq.8), (b) T = 4, (c) T = 16, and (d) T = 64.

Figure 2 .
Figure 2. Sample sequences obtained by the (a,b) REMC and (c,d) Metropolis methods.Note that the ranges of the horizontal axes in the figures are quite different.

Figure 3 .
Figure 3. (a) Target PDF (the same as in Fig. 1a) and the histograms produced from the sample sequences obtained by the (b) REMC and (c-f) Metropolis methods.The numbers in parentheses indicate the numbers of MCMC steps required to obtain the samples used to produce the histograms.

Figure 5 .
Figure 5. Geometric setting for the three twin experiments.The triangles indicate the assumed observation sites, located at intervals of a few kilometres, which is referred to the actual distribution of the MeSO-net observatories.Waveforms at observation site (#6) and a location where no observatory exists (A) are shown in Figs 9 and 11.

Figure 6 .
Figure 6.Sample sequences related to the model parameters (a) Vp and (b) Vs, and (c) values of the posterior PDF in twin experiment Test 1.The black and grey lines show the sequences obtained by the REMC and Metropolis methods, respectively.

Figure 7 .
Figure 7. Histograms of the model parameters (a, b) Vp and (c, d) Vs.These are estimated from the obtained samples of the posterior PDF in twin experiment Test 1.

Figure 8 .
Figure 8. Posterior PDF in twin experiment Test 1.The colour contours are obtained by a grid calculation in the parameter space.The white and black dots show the sample sequences obtained by the REMC and Metropolis methods, respectively.The star and rectangle indicate the true and initial values, respectively, of the model parameters summarized in Table 3.The arrows indicate the jumps in the sequence when exchanges of chains take place in the REMC method.The sequence obtained by the Metropolis method wanders into a local lobe of the PDF indicated by the dashed circle.

Figure 9 .
Figure 9. Waveforms reproduced from the optimum parameters in twin experiment Test 1 (Table 3) obtained by the REMC method (top) and the Metropolis method (middle) at (a) observation site #6 and (b) point A (Fig. 5).For comparison, the (a) synthetic observation and (b) true waveform are shown at the bottom of each figure.

Figure 10 .
Figure 10.Histograms of the model parameters (top) Vp, (middle) Vs, and (bottom) thickness of each layer.These are estimated from the samples of the posterior PDF obtained by the REMC method (black) and the Metropolis method (grey) in twin experiment Test 2.

)Figure 11 .
Figure 11.Waveforms reproduced from the optimum model parameters in twin experiment Test 2 (Table 4) obtained by the REMC method (top) and the Metropolis method (middle) at (a) observation site #6 and (b) point A (Fig. 5), compared with (a) a synthetic observation and (b) a true waveform shown at the bottom of the top panels.The bottom two panels show the differences between the waveform reproduced by each of the methods and (a) the synthetic waveform or (b) the true waveform.

Figure 12 .
Figure 12.Sample sequences related to the source locations (a) X, (b) Y, and (c) Z obtained by (top) the REMC and (bottom) Metropolis methods in twin experiment Test 3. The red lines indicate the true values.

Figure 13 .
Figure13.Waveform reproduced from the optimum model parameters obtained by the REMC method in the numerical test described in Section 4 at observation site #6 (Fig.5), compared with the synthetic waveform.

Figure 14 .
Figure 14.Seismic wavefields in the frequency band of 0.05-0.30Hz at (a) t = 17.36 s and (b) t = 26.96s from the origin time for the north-south components.These imaging results were obtained using (left) the FDM computation, assuming a 3-D subsurface structure model, and (right) the optimum model parameters estimated by the REMC method, assuming a 1-D subsurface structure model, in the numerical test described in Section 4. See Supporting Information Fig. S1 for a movie of seismic wavefield imaging in this frequency band.

Figure 15 .
Figure 15.Seismic wavefields in the frequency band of 0.05-0.30Hz at t = 11.92 s from the origin time of the assumed earthquake for the (a) up-down, (b) north-south and (c) east-west components.These imaging results are obtained using (left) the true model parameters, (centre left) the optimum model parameters estimated by the REMC and (centre right) Metropolis method (Table5) in twin experiment Test 3, compared with (right) the result obtained by kriging.See Supporting Information Fig.S2for a movie of seismic wavefield imaging in this frequency band.

Figure 16 .
Figure 16.Spatial distribution of RMSE (eq.15) in the frequency band of 0.05-0.30Hz between the true wavefield and the reproduced wavefield obtained by (a) the REMC method, (b) the Metropolis method in twin experiment Test 3, and (c) the ordinary kriging.

Figure 17 .
Figure 17.Seismic wavefields in the full frequency band for the up-down component at t = 7.52 s from the origin time of the assumed earthquake when the P-wave passes through the target region.These imaging results are obtained using (left) the true model parameters, (centre left) the optimum model parameters estimated by the REMC and (centre right) the Metropolis method (Table5) in twin experiment Test 3, compared with (right) the result obtained by kriging.See Supporting Information Fig.S3(a) for a movie of seismic wavefield imaging in the full frequency band.

Figure 18 .
Figure 18.Seismic wavefields in the full frequency band for the (a) north-south and (b) east-west components at t = 11.92 s from the origin time of the assumed earthquake when the S-wave passes through the target region.These imaging results are obtained using (left) the true model parameters, (centre left) the optimum model parameters estimated by the REMC and (centre right) the Metropolis method (Table 5) in twin experiment Test 3, compared with (right) the result obtained by kriging.See Supporting Information Figs S3(b) and (c) for a movie of seismic wavefield imaging in the full frequency band.

Figure 19 .
Figure 19.Spatial distribution of RMSE (eq.15) in the full frequency band between the true wavefield and the reproduced wavefield.Each panel corresponds to the case of (a) the REMC method, (b) the Metropolis method, in twin experiment Test 3, and (c) the ordinary kriging.

Table 1 .
Setup of the subsurface structure models and the REMC method used in the twin experiments.

Table 3 .
True and initial values of the model parameters assumed in Test 1.The optimum values are obtained by averaging the top 50 samples, which provide the first to fiftieth largest values of the posterior PDF.Downloaded from https://academic.oup.com/gji/article-abstract/208/1/529/2417077 by guest on 26 December 2018

Table 4 .
True and initial values of the model parameters assumed in Test 2. The optimum values are obtained by averaging the top 50 samples, which provide the first to fiftieth largest values of the posterior PDF.

Table 5 .
True and initial values of the model parameters assumed in Test 3. The optimum values are obtained by averaging the top 50 samples, which provide the first to fiftieth largest values of the posterior PDF.
The numerical test results indicate that waveforms excited by a single earthquake can Downloaded from https://academic.oup.com/gji/article-abstract/208/1/529/2417077 by guest on 26 December 2018

Table 6 .
Hypocentre solution assumed in the FDM seismic wave simulation.