Efficient Monte Carlo sampling of inverse problems using a neural network-based forward—applied to GPR crosshole traveltime inversion

Probabilistically formulated inverse problems can be solved using Monte Carlo-based sampling methods. In principle, both advanced prior information, based on for example, complex geostatistical models and non-linear forward models can be considered using such methods. However, Monte Carlo methods may be associated with huge computational costs that, in practice, limit their application. This is not least due to the computational requirements related to solving the forward problem, where the physical forward response of some earth model has to be evaluated. Here, it is suggested to replace a numerical complex evaluation of the forward problem, with a trained neural network that can be evaluated very fast. This will introduce a modeling error that is quantiﬁed probabilistically such that it can be accounted for during inversion. This allows a very fast and efﬁcient Monte Carlo sampling of the solution to an inverse problem. We demonstrate the methodology for ﬁrst arrival traveltime inversion of crosshole ground penetrating radar data. An accurate forward model, based on 2-D full-waveform modeling followed by automatic traveltime picking, is replaced by a fast neural network. This provides a sampling algorithm three orders of magnitude faster than using the accurate and computationally expensive forward model, and also considerably faster and more accurate (i.e. with better resolution), than commonly used approximate forward models. The methodology has the potential to dramatically change the complexity of non-linear and non-Gaussian inverse problems that have to be solved using Monte Carlo sampling techniques.


I N T RO D U C T I O N
For geophysical inverse problems, the forward problem refers to the problem of computing some data d, given a set of model parameters m (typically describing an Earth model) using a function g (typically based on knowledge about a physical relation) : The corresponding inverse problem consists of finding information about the model parameters m given some observed data d obs : For most inverse problems, the inverse operator g −1 either does not exist, or may be non-trivial to obtain. Further, as most inverse problems are underdetermined, the observed data are associated with uncertainty/noise, and g is based on some approximation to the correct physical relation (leading to modeling errors ), more than one model will be able to explain the data within its uncertainty. A probabilistic approach to inverse problems allows incorporating prior information (through the probability distribution ρ(m)) and information about measurement uncertainty and modeling errors (through the likelihood function L(m)). The solution to the inverse problem is then the a posteriori probability distribution (Tarantola & Valette 1982b): where the likelihood function is given by ρ D (g(m)) describes how well the forward response of a specific model m should match some observed data given measurement uncertainties. θ (d|m) represents the modeling error that describes any imperfections in solving the forward problem using eq. (1). The solution to probabilistically formulated inverse problems is in practice either an analytic description of σ (m), such as in the case of linear Gaussian inverse problems (Tarantola & Valette 1982a). Or, in the general case, it can be in form of a sample from σ (m) obtained using sampling methods, such as the extended Metropolis algorithm (Mosegaard & Tarantola 1995).
Using the extended Metropolis algorithm to sample the posterior distribution is attractive as it can be applied to, in principle, any inverse problem, with either a linear or non-linear forward operator g, as long as (1) the likelihood function can be evaluated and (2) the prior distribution can be sampled. That is, only the forward relation in eq. (1) needs to be evaluated, and the inverse operator in eq. (2) need not be known.
However, Monte Carlo-based sampling methods are computationally very costly, and for many geophysical inverse problems, sampling-based methods may not be useful in practice.
Evaluation of the forward model, eq. (1), is often the single most important contribution to the computational cost of running a sampling algorithm to sample the posterior distribution. Examples exist where a complex and realistic forward model is considered, but then the number of model parameters is either small and/or the number of considered data few. Most applications of using sampling methods to sample the posterior distribution of an inverse problem are based on quite simple forward models that are fast to evaluate (Mosegaard et al. 1997;Khan et al. 2000;Malinverno 2002;Minsley 2011;Hansen et al. 2012;Laloy et al. 2012;Cordua et al. 2012;Zunino et al. 2014). The choice of a simple (and fast) forward model is typically chosen as this is needed to practically make use of sampling methods, even though more accurate forward models most always exist.
The use of such approximate forward models introduce a modeling error that is quantified by θ (d|m) in eq. (4). Often the modeling error is ignored in which case the likelihood function, eq. (4), simply becomes L(m) = ρ D (g(m)) (Tarantola 2005). However, if the modeling error is ignored, the part of the data that should be treated as (modeling) errors will be treated as data. This can lead to mapping of modeling errors into the posterior distribution, which may lead to artefacts that appear as well-resolved features. Hansen et al. (2014) describe a methodology that allow sampling, that is, generating realizations of, the (usually unknown) probability density describing the modeling error, θ (d|m), related to a specific choice of a priori model ρ(m). Further they demonstrate how to infer a Gaussian model of the modeling error, θ (d|m) ∼ N (d t , C t ). This is convenient as, if the noise on the data is Gaussian ρ D (g(m)) ∼ N (d 0 , C d ) and the modeling error is Gaussian, then these Gaussian probability densities combine though addition of their mean and the covariances (Tarantola 2005). The full likelihood in eq. (4) is then also Gaussian distributed over d as This is important as the uncertainty of data is very often described by a Gaussian probability density, and because this allows taking into account modeling errors without having to evaluate the integral in eq. (4), which will in most cases be computationally intractable or expensive.
In the following, the main idea is to use this methodology of quantifying the modeling error to allow using a neural network as a computationally efficient approximation to the forward problem, while at the same time accounting for the errors due to the use of this approximation.

Solving the forward problem using regression networks
Both the forward and inverse problems in eqs (1) and (2) describe a (possibly non-linear) mapping between, which is usually, one set of continuous parameters to another set of continuous parameters (Krasnopolsky & Schiller 2003). Supervised machine learning provides a number of methods that allow inferring this non-linear mapping operator from a set of available training data that consists of a set of data and model parameters [D, M]. We will refer to these methods in general as regression networks. Examples of these methods are support vector regression (Smola & Schölkopf 2004), Gaussian processes (Williams & Rasmussen 1995), regression trees (Breiman et al. 1984) and neural networks (Bishop 1995) to name a few.
Regression networks can be used to estimate the inverse operator g −1 , eq. (2), such that a model (a solution to the inverse problem) can be directly estimated from data, as demonstrated by Hoole (1993), Amari & Cichocki (1998), Gagunashvili (2010), Maiti et al. (2012), Singh et al. (2013) and Li et al. (2015). It is, though, not trivial to properly account for the uncertainty related to data and the model, and not trivial to fully describe the posterior distribution (hence nor the uncertainty related to the estimated solution model).
Regression networks can also be used to estimate the forward operator g, eq. (1) (Poulton 2001;Krasnopolsky & Schiller 2003). g is typically parametrized using some model, and the parameters describing this model is inferred from an available training data set [D, M] representing d and m in eq. (1).
In the following, it is proposed to replace a computationally expensive forward problem, g in eq. (1), with a regression network. Subsequently, the associated modeling error θ(d|m) related to the use of this regression network is quantified using the approach of Hansen et al. (2014). The main idea is that this will allow a very fast evaluation of the forward problem and, hence, the likelihood L(g(m)), which again may allow for a more efficient approach for sampling the posterior distribution σ (m).
The developed methodology will be applicable for sampling of the posterior distribution related to inverse problems in general. As an example, the (forward and) inverse problem related to first arrival traveltime inversion of crosshole ground penetrating radar (GPR) data will be considered. An 'ideal' forward model will be based on 2-D full-waveform modeling followed by automatic first arrival picking. As an example, a multilayer perceptron neural network will be considered as an approximation of the forward problem, but note that in practice any regression-type network can be used. The modeling error related to the neural network-based forward will be quantified through a Gaussian probability distribution. Finally, the neural network-based forward model will be compared to using other widely used approximate forward models (based on physical approximations) in formulating and sampling of the posterior distribution of the inverse problem.

F I R S T A R R I VA L C RO S S H O L E T R AV E LT I M E I N V E R S I O N
Inversion of first traveltime data is a widely studied inverse problem, both with respect to elastic seismic and electromagnetic wave propagation (Bording et al. 1987;Scales 1987;Zelt & Smith 1992;Schuster & Quintus-Bosz 1993). The inverse problem consists of inferring information about the velocity field that has caused the observed decay in traveltime for specific phases in a wavefield. A traveltime-based inverse problem called 'crosshole first arrival time inversion', will be considered here, which is related to inversion of the first arrival time delay of a wavefield propagating from a (set of) source location(s) in one borehole, to a set of receiver locations in another borehole. The inverse problem consists of inferring information about the velocity distribution (described by m) between the boreholes. While initially being developed in relation to seismic traveltime data (McMechan 1983;Hole 1992), this type of traveltime inversion has also been widely studied using near-surface GPR methods, where the traveltime delay of a propagating electromagnetic wave is recorded between boreholes (Maurer & Green 1997).
Inversion of crosshole GPR data is a widely considered inverse problem using both deterministic [e.g. A synthetic case is considered, in which the reference model, m ref , of size 7 × 13 m (36 × 66 pixels of size 0.2 m × 0.2 m) is generated from a multivariate Gaussian probability distribution, with a mean velocity of 0.14 m ns −1 , a variance of 0.000215 m 2 ns −2 and a spherical-type covariance model with an isotropic range of 6 m. This is chosen similar to the parameters inferred in Looms et al. (2010) using the same geometrical setup as in Hansen et al. (2013). A set of 702 noise-free traveltime data is computed using 2-D finite-difference modeling of electromagnetic wave propagation followed by automatic first arrival picking, using the method described by Hansen et al. (2014), based on Ernst et al. (2007a), Ernst et al. (2007b), Molyneux & Schmitt (1999) and Cordua et al. (2012). This forward model is referred to as g fd and will, from heron, be considered the reference for solving the forward problem. The calculated noise-free data are then contaminated with uncorrelated zero-mean Gaussian noise with a standard deviation of 0.1 m ns −1 , N (0, 0.1 2 ), mimicking measurement uncertainty, to obtain a reference data set d ref . Fig. 1 shows the reference velocity model, m ref , and the locations of ND = 702 sets of source and receiver locations, as identified by connecting ray paths. Fig. 2 shows the corresponding simulated waveform data, as well as the automatically picked first arrival traveltime data.

Using a neural network as an approximation to the forward problem
As discussed, many types of supervised machine-learning regression algorithms exist that can be used to obtain an approximation to the reference forward mapping g fd . As an example, a simple twolayer feedforward neural network, that is, with one hidden layer, is considered to replace the accurate forward model g fd . This neural network-based approximation to the forward model will be referred to as g nn . Such a neural network that can estimate each of the ND = 702 data, d k , corresponding to a specific model m = [m 1 , m 2 , . . . , m NM ] can be formulated as where NM = 2376 is the number of model parameters and NH the number of hidden units (Bishop 1995). In this case, NH = 80 hidden units is considered. h 1 and h 2 refer to two activation functions, in this case both chosen to be of sigmoidal type. w 1 i j and w 2 jk refer to the weights of the units in the first and second layers. Given enough units in the hidden layer, any continuous mapping can be arbitrarily accurately represented (i.e. including g fd ), given a large enough training set (Bishop 1995). The main reason a two-layer feedforward network is considered, is that once trained, the evaluation of the network on a new model (i.e. solving the forward problem) is potentially very fast, and faster than, for example, using support vector regression.
The weights w 1 i j and w 2 jk must be chosen such that the neural network will be able to estimate the data d given a specific model m in an optimal way. This is obtained by 'training' the neural network based on a training data set consisting of NT sets of data and model parameters [D, M], such that D will be of size [ND, NT] and M of size [NM, NT]. Training a network in itself can be considered an inverse problem (Prato & Zanni 2008). Most often it is treated as an optimization problem, that is traditionally solved using the backpropagation algorithm.
The quality of the inferred forward operator is closely related to the size of the available training data set, NT. In some applications of neural networks obtaining a large training data set can be a challenge. However, in the context of probabilistically formulated inverse problems, the prior distribution ρ(m) is, by construction, chosen prior to inversion. Further, it can be assumed that realizations from this prior distribution can be generated, as this is needed by the extended Metropolis algorithm (Mosegaard & Tarantola 1995). Therefore, in a probabilistic setting, it is trivial to generate an arbitrarily large sample of the prior as M consisting of NT realizations. Then, a corresponding set of data D can be simulated by evaluating the forward model (which may be computationally expensive) NT times.
Five training data sets based on NT = [1000, 5000, 10 000, 20 000, 40 000] sets of model and data parameters, [D, M], are considered. All the considered models are generated as unconditional realizations from the prior distribution ρ(m), and the corresponding data computed using the exact, but expensive, forward model based on finite-difference modeling, g fd , described previously.  The network can be trained directly on the full [D, M] training set. However, acknowledging that the traveltime between a source and a receiver is mostly sensitive to an area between, and in the vicinity of the ray path between the source and receiver, the 702 data are split into eight subsets of data and corresponding model parameters. A unique neural network of the form in eq. (5) is then assigned to each subset of the data. Fig. 3 illustrates which model parameters and sets of sources and receivers that are considered for each subset.
The weights for each of the eight neural networks are found by minimizing the mean-squared error between known data D and data predicted by the neural network, using the Levenberg-Marquardt algorithm for backpropagation. The Matlab Neural Network Toolbox (Beale et al. 2016) is used for this.
This results in five forward models referred to as g 1000 , g 5000 , g 10000 , g 20000 and g 40000 , where the index defines the sample size used to train the neural network.
Note that the specific type of, and layout of, the neural network chosen here is just an example. Any method that can be used to estimate the forward model mapping between data and model parameters can in principle be used in the following framework.

Quantifying the modeling error
As the number of units in the neural network, and the size of the training data set, is finite, g nn will provide an approximation of g fd . In other words, using g nn as opposed to g fd will lead to a (forward) modeling error. A sample of the forward modeling error can be obtained by comparing the difference in the forward response of g nn and g fd from a set of Nr realizations of the prior model . Here, Nr = 6000 realizations from the prior has been generated as M * = [m 1 , m 2 , . . . ., m Nr ]. For each model in M * , a reference data set has been computed using g fd to obtain D fd , and using g nn to obtain D nn . Then, a sample of the modeling error θ (d|m) can be obtained as Finally, a Gaussian model of the modeling error, N (d t , C t ), can be estimated from D θ using the methodology described in Hansen et al. (2014). This allows treating the Gaussian modeling error as Gaussian noise on the data (e.g. Mosegaard & Tarantola 2002). For comparison, two widely used approximations for computing the first arrival are also considered. g ray refers to the linear straight ray approximation, and g eik to the non-linear bended ray     approximation based on the eikonal solution to the wave equation (Vidale 1990;Linde et al. 2006;Looms et al. 2008a, see more details on these types of forward models in Hansen et al. 2014). Fig. 4 shows the 1-D marginal probability distribution of the modeling error, obtained as the histogram of the sample of the modeling error, D θ . Fig. 5 shows the covariance model of the Gaussian model describing the modeling error, C t , for each of the seven approximations (five neural network based and two ray based) to the forward models. The mean d t and standard deviation of the inferred Gaussian modeling error is given in Table 1. Figs 4 and 5 and Table 1 (first four rows) reveal the following: (1) The use of any of the neural network-based forward model, g nn , results in a modeling error bias, d t , very close to zero. This is not surprising as the neural network is optimized to minimize mean-squared error.
(2) The magnitude of modeling error related to using g nn decreases as the size of the training set used for learning the neural network increases. Using a training set larger than N t = 5000 results in a smaller standard deviation of the prediction error than using both g ray and g eik . When the training set is larger than N t = 20 000 no significant change in the magnitude of the modeling error is noted (in the present case the modeling error due to using g 40000 is slightly higher than using g 20000 ).
(3) The use of g nn leads to a less correlated covariance model as opposed to using g eik and g ray , as seen in the (lack of) off-diagonal elements of the estimated covariance model (Fig. 5).
(4) Evaluating any of the neural network-based forward models is more than three orders of magnitude faster than evaluating the 'exact' g fd . This is the case even as the full-waveform modeling step of evaluating g fd is performed in parallel on four separate threads (using the implementation by Cordua et al. 2012). Further, it is about 15 times faster than using g eik (which is the most widely used methods for computing first arrival traveltimes) and about three times faster than using the simple linear g ray (see Table 1).
(5) The standard deviation of the modeling error becomes close to the standard deviation of the noise added to the reference data using g 20000 and g 40000 (0.13 ns versus 0.1ns). Fig. 6 shows the sensitivity kernel, that is, the first-order Fréchet derivative, for the reference velocity model for one specific sourcereceiver location for all the considered approximate forward models as well as the reference forward model (Fig. 6h). Visible 'noise' is present for all neural network-based forward models (Figs 6a-e) decreasing as the size of the training data set increases. Therefore, it may be surprising that using, for example, g 20000 and g 40000 performs better than using g eik . The 'noise' in the Fréchet derivatives when using the neural network-based forwards seems to be spatially uncorrelated and, therefore, the effect in the computed traveltime from the noise may tend to cancel out. Thus, while the forward model based on a neural network seems to predict traveltimes with relatively high accuracy, one should not use the forward model to, for example, estimate the first-order Frechet derivative and assign any physical meaning to these. One should use the network for the purpose it was trained, that is, in this case traveltime calculation.
These results are encouraging as they suggest a neural networkbased forward model can be constructed that is both faster to evaluate, and results in less modeling error, than using any of the widely used approximations g eik and g ray . However, the main point of interest is how much, and how efficient, information can be obtained from the posterior probability density using Monte Carlo sampling methods, based on the different forward models, and corresponding Gaussian models for describing the modeling error.

E F F I C I E N T M O N T E S A M P L I N G O F T H E P O S T E R I O R P RO B A B I L I T Y D I S T R I B U T I O N
The extended Metropolis algorithm is used to sample the posterior distribution, σ (m), for the crosshole GPR inverse problem using the seven considered types of approximate forward models and the corresponding estimated Gaussian model for the modeling errors using the SIPPI Matlab toolbox (for details see e.g. Hansen et al. 2016). Each Metropolis sampler is run for 10 6 iterations. The first 10 5 iterations are discarded and considered as part of the burn-in phase of the extended Metropolis algorithm. Fig. 7 shows eight independent realization of the posterior distribution σ (m) using each of the seven considered forward model approximations. Fig. 8 shows the corresponding pointwise mean and standard deviation of all realizations from the posterior distribution.
Even when using the worst neural network-based forward approximation, namely the neural network based on a training set of size 1000 g 1000 , provides results consistent with the reference model, in that no noise is apparently mapped into the posterior distribution as resolved features. In other words, the features that appear well resolved (that stand out in the mean model) are consistent with features in the reference model. This may be somewhat surprising considering the high degree of modeling error (Fig. 5a), and the very noisy Fréchet derivative (Fig. 6a).
As the accuracy of the used approximation increases, so does the resolution, which can be seen by the increase in details in the mean model (Figs 8a-h), along with a decrease in posterior variability (Fig. 7). This is also reflected in the pointwise standard deviation of the posterior distribution (Figs 8i-o), which decrease as the accuracy of the neural networks increase. Only the neural network based on training set of sample size 1000 provides a 1-D marginal posterior standard deviation lower than using g eik or g ray . This suggests that more information about the 'exact' forward model g FD is provided by the neural network-based forward models (except g 1000 ) than by the ray-based approximations. This is of course only interesting if the apparent increase in resolution is due to actual data and not simply an effect of mapping, for example, modeling errors from the data space into the model space. As a synthetic case is studied, this can be analysed by comparing realizations from the posterior distribution to the reference model. Table 1 (row 5) lists the average correlation coefficient between realizations of the posterior distribution and the reference model, revealing that g 20000 and g 40000 provide more information, than using either g eik or g ray . A higher number indicate better performance. A similar property is reflected by the mean absolute error between the reference model and 100 realizations from the posterior distribution listed in Table 1 (row 6).
These results demonstrate that using a neural network-based forward as an approximation to the 'exact' forward, f fd , allows inferring more information from a GPR crosshole traveltime inversion than using the more widely used forward models g eik and g ray . In addition, it is important that this increase in resolution, compared to using g eik and g ray , is consistent with features in the reference model. At the same time, the computational requirement to use any of the neural network-based forward models is about 15 times faster than using g eik .

D I S C U S S I O N
The results presented above demonstrate the potential of using regression-type networks, here a neural network, as a replacement for computational demanding forward models in 1530 T.M. Hansen and K.S. Cordua  physics/geophysics. The proposed methodology is general, and can potentially be applied to a wide range of forwards for inverse problems, in which the posterior distribution can be sampled as follows (i) Estimate an approximation g app to the full forward g ref using a regression network. (2) Compute the forward response of the NT realizations of the prior model, as D = [d 1 , d 2 , . . . , d NT ] using a reference forward model, g ref .
(3) Estimate an approximation to g ref as g app from the training data set [D, M] using a regression network (for example, a neural network). (ii) Estimate a Gaussian model of the modeling error θ (d|m).
(1) Generate a large sample of the prior M * (independent of M above).
(2) Compute the forward response for all models in M * , using both the correct, g ref , and approximate, g app , forward models, as D g ref and D gapp . A sample of the modeling error is then given by (3) From D θ estimate a mean and covariance model describing the modeling error as a Gaussian probability density through θ(d|m) ∼ N (d t , C t ) (iii) Accounting for the modeling error as a Gaussian uncertainty on data.
(iv) Use the extended Metropolis algorithm to sample from the posterior distribution. This approach will potentially allow using sampling-based approaches to some inverse problems that has until now not been feasible due to computational demands.
The main benefits of using the neural network-based forward model is that it provides an alternative forward that is both faster and more accurate than traditional ray-based forward models. Another key advantage of using the neural network-based forward models is that they are specifically designed for whatever strategy is used to pick the first arrival. The strategy for picking the first arrival time need not to be chosen to match a specific choice of forward model, but can be chosen from what method for picking arrival time is most suited for the data that provides the most robust estimate. It does not matter if the traveltime is chosen to be the first break, the arrival of the maximum amplitude, or any other attribute, which will directly affect the validity of any of the ray-based approaches (Jacobsen et al. 2010). If only the same strategy is used to pick the arrival time, then the neural network will learn a forward model specifically for that picking strategy.
The main challenges using the methodology are: (1) To generate a large training data set. The full forward model needs to be evaluated for as many times as the size of the training data set. This may be time-consuming, but need only to be done once, and typically the Monte Carlo sampling algorithm will need many more iterations than the size of the training data set. (2) Choosing a network design. In the present case, a very simple neural networks design is chosen, and applied successfully. However, other regression network types and designs could be chosen for better performance, better suited for other types of forward models, and other types of prior models.
(3) Training the neural network. It takes time to train the neural network. However, recently highly efficient methods for training neural networks on graphical processing units have been developed, which allow handling larger networks more efficiently.
In addition, the neural network obtained using the procedure above is applicable only to the specific recording geometry given in Fig. 1. If the recording geometry changes a new training data set needs to be computed, and a new neural network needs to be trained.
In the example considered here, the training data set was generated from a specific prior distribution. Thus, if the prior distribution changes, a training data set needs to be created and a new neural network needs to be trained. It would, however, be possible to generate a training data set, where models are generated from a much broader prior model, that is, a prior model that consist of models with more spatial variability than considered here. Such a training data set could be used as the base to learning a neural that can be used for many different types of prior models (i.e. Earth model variabilities) without the need for a specific training data set. This may be a subject for future research.

C O N C L U S I O N S
A general framework has been demonstrated that allow replacing a computationally complex forward model with a computationally efficient (fast to evaluate) forward model using a neural network. At the same time, the forward modeling error associated with the approximate neural network-based forward has been quantified. This has the potential to allow using more realistic forward models, on more realistic sized models, using otherwise computationally expensive Monte Carlo-based sampling methods to solve non-linear inverse problems.
The approach has been demonstrated on a GPR crosshole traveltime inverse problem. A realistic, but computationally heavy, forward model based on full-waveform modeling has been replaced by a neural network, which turns out to be more than three orders of magnitude faster to evaluate than the full 'exact' solution to compute the first arrival traveltime. It has been demonstrated that if the sample size of the training data set is large enough (NT > 5000), the forward model based on a neural network can be evaluated faster, with less error (in terms of the magnitude of the modeling error), compared to using other widely used approximations to compute first arrival times (i.e. the eikonal solution and the linear straight ray assumption). Moreover, using the neural network based forward model leads to inversion results faster and with better resolution. As the modeling errors are accounted for in a probabilistic model, the use of approximate forward models does not lead to any (observed) bias in the final inversion results.
In the present case, the proposed methodology has allowed an otherwise hard sampling-based inverse problem to be replaced by a much simpler and easier sampling problem. The presented methodology is general, and has potential to change the scale and complexity of non-linear inverse problems that have to be solved through Monte Carlo sampling, using, for example, the extended Metropolis algorithm.
The results demonstrate that a neural network can be used to describe a (relatively) complex forward model (g fd ), and that it can be both relatively accurate, and fast to evaluate. For inverse modeling, it provides a better resolution of the posterior probability density than two other commonly used approximate forward models, obtained with significantly less computational usage.

A C K N O W L E D G E M E N T S
We thank everyone at the Climate and Computational Geophysics group at the Niels Bohr Institute for a daily encouraging and appealing research environment. Thank you to Jacques Ernst and ETH Zurich, for making the finite-difference modeling code available, such that a parallel Matlab interface could be developed (https://sites.google.com/site/kscordua/downloads /tomographic-full-waveform-inversion). Matlab code to reproduce the results are available at https://github.com/cultpenguin/sippi/tree /master/examples/papers/hansen_and_cordua_2017_nn/.