Bayesian tomography using polynomial chaos expansion and deep generative networks


Bayesian inversion methods can account for data and modeling uncertainties as well as prior knowledge, thus representing an attractive approach for tomography and its uncertainty quantification. Nevertheless, the difficulties in specifying appropriate prior distributions and the high computational burden associated with repeated forward model evaluations often hinder proper implementations of Bayesian tomography (Chipman et al., 2001). In geophysical settings, prior distributions have traditionally been specified by assuming the subsurface to be represented by a Gaussian random field. More advanced options are made possible by relying on the information content of training images (TI), that is, large gridded 2-D or 3-D unconditional representations of the expected target spatial field that can be either continuous or categorical (Mariethoz and Caers, 2014; Laloy et al., 2017, 2018). Bayesian inversion needs not only a parameterization of the prior that accurately represents the prior information, but also one that is easy to manipulate and whose underlying distribution is such that small perturbations lead to comparatively small changes in the data response. In many practical implementations, it is advantageous to rely on another parameterization than the one used for physics-based modeling and visualization, which is typically pixel-based (Asher et al., 2015; Alizadeh et al., 2020). Physical media are typically associated with points in R^N, where N is the number of elements in the corresponding pixel-based representation. While allowing easy implementation of forward modeling schemes (e.g., Finite Difference (FD) schemes based on partial differential equations), pixel-based N-dimensional parametrizations are often not suitable to effectively parametrize the prior distribution, as N can be very large. When prior knowledge suggests constrained spatial patterns, such as covariance or connected spatial structures, the prior-compatible models populate manifolds embedded in R^N. If this manifold can be locally assimilated to a subset of R^M, with M ≪ N, the actual prior distribution reduces to a function of M variables only, which effectively implies an M-dimensional forward/inverse problem.
Various approaches can be employed to achieve manifold identification through dimensionality reduction. Among these techniques, principal component analysis (PCA) and related methods are the most common linear dimensionality reduction methods (Boutsidis et al., 2008; Jolliffe and Cadima, 2016). Based on a set of prior model realizations and the eigenvalues/eigenvectors of the corresponding covariance matrix, PCA provides optimal M-dimensional representations in terms of uncorrelated variables that retain as much of the sample variance as possible. PCA has found widespread application in geophysics, using both deterministic and stochastic inversion algorithms, with recent advancements offering the potential to reconstruct even discontinuous structures (Reynolds et al., 1996; Giannakis et al., 2021; Meles et al., 2022; Thibaut et al., 2021). In the context of complex geological media with discrete interfaces (Strebelle, 2002; Zahner et al., 2016; Laloy et al., 2018), such as those investigated in this study, PCA-based methods are ineffective in encoding the prior knowledge. Promising alternatives are offered by deep generative models (DGMs) that learn the underlying input distribution and generate synthetic samples that closely resemble the statistics of a provided dataset based on TIs. For instance, Variational Autoencoders (VAEs) encode data patterns into a compact latent space for both data reconstruction and generation; Generative Adversarial Networks (GANs) employ adversarial training to create synthetic output that closely resembles real reference data; Diffusion Models are parameterized Markov chains, trained using variational inference, to generate samples that converge to match the underlying data distribution (Kingma and Welling, 2013; Jetchev et al., 2016; Goodfellow et al., 2020; Ho et al., 2020). In the context of Bayesian inversion, DGMs possess a crucial property: they generate realizations that exhibit patterns consistent with the TI when sampling an uncorrelated standard normal or uniformly distributed latent space (Laloy et al., 2017). The incorporation of such a low-dimensional parameterization representing the prior distribution simplifies the sampling process, thus making DGMs well-suited for MCMC schemes.
Employing an effective parametrization for the prior (e.g., via PCA or DGMs) does not alleviate the computational cost associated with repeated forward modeling, which often limits the practical application of MCMC schemes. However, it opens up the possibility of employing surrogate modeling. Various classes of surrogate models, such as those based on Gaussian process models or Kriging (Sacks et al., 1989; Rasmussen, 2003) and polynomial chaos expansions (PCEs) (Xiu and Karniadakis, 2002; Blatman and Sudret, 2011), can be employed to effectively solve low-dimensional Bayesian inverse problems (Nagel, 2019; Higdon et al., 2015; Marzouk and Xiu, 2009; Marzouk et al., 2007; Wagner et al., 2020, 2021a).
Recently, Meles et al. (2022) presented a Bayesian framework that employs a shared PCA-based parameterization for characterizing prior information and modeling travel times through PCE. In this paper, we show that the direct application of that methodology produces suboptimal results when the input is parametrized in terms of latent variables associated with a DGM. Indeed, a prerequisite in this framework is that the parametrization chosen to describe and explore the prior allows for accurate surrogate modeling of the physical data used in the inversion. However, the relationship between the parametrization provided by DGMs and the corresponding physical output is typically highly non-linear, and training a surrogate capturing such a complex relationship can be extremely challenging or infeasible.
We present a strategy that allows the use of a DGM parametrization to define the prior distribution and explore the posterior distribution while still enabling accurate surrogate modeling. We achieve this goal by applying surrogate modeling to either global or local PCA decompositions of the input generated by the underlying DGM during the MCMC process. Through this decoupling of the parameterizations used for MCMC and for surrogate modeling, we preserve the advantageous prior representation capabilities of DGMs while simultaneously achieving accurate PCA-based surrogate modeling, thereby significantly speeding up forward calculations.

Bayesian inversion
Forward models are mathematical tools that quantitatively evaluate the outcome of physical experiments. We refer to the relationship between input parameters and output values as the 'forward problem':

F(u) = y + ε. (1)

Here, F, u, y and ε stand for the physical law or forward operator, the input parameters (typically representing local media properties), the n-dimensional output of a physical experiment and a noise term, respectively. Given the manuscript's significant emphasis on modeling, we deviate from the standard formalism and position the forward operator on the left-hand side of this and related subsequent equations, while placing the observed data and the noise term on the right-hand side. The goal of the 'inverse problem' is to infer properties of u conditioned on the data y while taking into account any available prior information about u. For a constant model dimension, a general solution to this problem can be expressed in terms of the unnormalized posterior distribution:

P(u|y) = P(y|u) P(u) / P(y) ∝ P(y|u) P(u). (2)

Here, P(u|y) is the posterior distribution of the input parameter u given the data y, P(y|u) (also indicated as L(u) and known as 'the likelihood') is the probability of observing the data y given the input parameter u, while P(u) and P(y) are the prior distribution in the input parameter domain and the marginalized likelihood with respect to the input parameters (also known as evidence), respectively. In the case of Gaussian noise, the likelihood takes the following form:

L(u) = (2π)^(-n/2) |C_d|^(-1/2) exp( -(1/2) (F(u) - y)^T C_d^(-1) (F(u) - y) ), (3)

where the covariance matrix C_d accounts for data uncertainty. Note that throughout this paper, we adhere to the common formalism employed in geophysics, utilizing the same symbol to denote both individual instances of a random variable and the random variable itself (Tarantola, 2005; Aster et al., 2018). To draw samples from the unnormalized posterior in Eq. (2), common practice is to use a Markov chain Monte Carlo (MCMC) method to draw samples proportionally from P(u|y) (Hastings, 1970). However, computing L(u) requires the solution of a forward problem, and this evaluation needs to be repeated many times in Bayesian inversions, which can be computationally demanding. In the following sections we discuss how this problem can be approached by using a low-dimensional latent representation and surrogate modeling to evaluate P(u) and approximate L(u), respectively.
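To make these ingredients concrete, the following minimal sketch (ours, not the authors' code) evaluates the Gaussian log-likelihood of Eq. (3) for the special case of uncorrelated noise, C_d = σ²I, and performs one Metropolis-Hastings accept/reject step; `forward` and `log_prior` are placeholder functions:

import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(u, y, forward, sigma=1.0):
    # Gaussian log-likelihood of Eq. (3) up to an additive constant,
    # assuming C_d = sigma^2 * I.
    r = forward(u) - y
    return -0.5 * np.sum(r ** 2) / sigma ** 2

def mh_step(u, logpost_u, y, forward, log_prior, step=0.1):
    # One Metropolis-Hastings update with a symmetric Gaussian random-walk proposal.
    u_prop = u + step * rng.standard_normal(u.shape)
    logpost_prop = log_prior(u_prop) + log_likelihood(u_prop, y, forward)
    if np.log(rng.uniform()) < logpost_prop - logpost_u:
        return u_prop, logpost_prop   # accept
    return u, logpost_u               # reject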

Bayesian inversion in latent spaces
Parametrizations are well suited for surrogate modeling when they encode the prior distribution and effectively simplify the input-output relationship within the problem under investigation. Meles et al. (2022) used variables defined in terms of principal components to (i) represent the prior distribution related to a random Gaussian field on a low-dimensional manifold and (ii) learn an accurate surrogate to compute the forward problem. However, it is not generally guaranteed that a parameterization can achieve both (i) and (ii). For representing the prior distribution, we can utilize manifold identification using DGMs. This involves utilizing a DGM to characterize a latent space using a set of coordinates (here indicated as z) with a statistical distribution defined prior to the training. The DGM allows mapping from this latent space to the physical space through a decoder/generator, denoted here as G_DGM. For a given random realization z, the decoder operation G_DGM(z) produces an output u in the physical space that adheres to the characteristics of the prior distribution.
The use of this new set of coordinates z casts the inverse problem on the latent manifold as:

P(z|y) ∝ P(y|z) P(z). (4)

While formally identical to Eq. (2), Eq. (4) involves significant advantages. Not only is z typically low-dimensional (consisting of a few tens of variables instead of many thousands), but we can also design the corresponding statistical prior distribution P(z) as desired. Here, we impose during training that P(z) is a multivariate standard Gaussian distribution.
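A sketch of the corresponding unnormalized log-posterior of Eq. (4), with P(z) imposed as a multivariate standard Gaussian and `decoder` (G_DGM) and `forward` (F) as placeholders:

import numpy as np

def log_posterior_latent(z, y, decoder, forward, sigma=1.0):
    log_prior = -0.5 * np.sum(z ** 2)   # standard normal prior, up to a constant
    r = forward(decoder(z)) - y         # residual of M_DGM(z) = F(G_DGM(z))
    return log_prior - 0.5 * np.sum(r ** 2) / sigma ** 2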

Decoupling of inversion and forward modeling domains in MCMC inversions
We now rewrite the forward problem in Eq. (1) using the new coordinates:

M_DGM(z) = y + ε, (5)

where M_DGM = F ∘ G_DGM, ∘ stands for function composition, and we assume no error induced by the DGM dimensionality reduction, which in turn implies that the corresponding likelihood is the same as that in Eq. (3). The complexity and non-linearity of G_DGM imply that the forward operator M_DGM exhibits considerable irregularity, making it difficult to learn a surrogate model due to typical issues of neural networks, such as those associated with limited data availability and overfitting (Bejani and Ghatee, 2021). Consequently, we investigate alternative approaches that avoid using the latent parametrization for surrogate modeling while retaining it for the prior representation.
Building upon Meles et al. (2022), we explore surrogate modeling based on PCA inputs spanning the Global spatial extent of the input.
Without any loss of generality, we consider a complete set of Global Principal Components (in the following, GPCs) for realizations of the DGM (implemented via G_GPC(x_GPC^full) = G_DGM(z) = u) and rewrite Eq. (1) as:

M_GPC(x_GPC^full) = y + ε, (6)

where G_GPC and M_GPC are the physical distribution and the model associated with the GPCs, and therefore M_GPC = F ∘ G_GPC. We will show that the linear relationship G_GPC(x_GPC^full) = u helps in implementing a surrogate of M_GPC, provided that the input and the model can be faithfully represented as operating on an effective M-dimensional truncated subset x_GPC of the new coordinates x_GPC^full, that is:

M_GPC(x_GPC) = y + ε, (7)

where ε is a term including both observational noise and modeling errors related to the projection on the subset represented by x_GPC. Due to the error introduced by the truncation, the likelihood associated with the GPC parametrization deviates from that of Eq. (3). However, when a substantial number of GPCs are utilized, the extent of such difference tends to be minimal.
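In practice, the GPC basis and the truncated coordinates x_GPC of Eqs. (6)-(7) can be obtained from a set of prior realizations via a singular value decomposition; a sketch under assumed array shapes (the file name is hypothetical):

import numpy as np

U = np.load("prior_realizations.npy")        # hypothetical file, shape (n_real, N)
u_mean = U.mean(axis=0)
# Rows of Vt are the principal components of the centered realizations.
_, _, Vt = np.linalg.svd(U - u_mean, full_matrices=False)

M = 60                                       # number of retained GPCs
V = Vt[:M].T                                 # (N, M) truncated GPC basis

def to_gpc(u):                               # x_GPC: the M-dimensional PCE input
    return V.T @ (u - u_mean)

def from_gpc(x):                             # approximate reconstruction G_GPC(x_GPC)
    return u_mean + V @ x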
The formulation given above relies on the weak hypothesis that Global proximity in the input domain leads to proximity in the output domain (i.e., in the observed quantity under investigation). Critically, when the output functionally depends mainly on a subset L of the entire domain of u, proximity in the output domain can be attained by approximating the input within this Local region only. Based on this strong, physically-informed assumption, we can achieve this goal by means of a Local PCA decomposition restricted to L:

G_LPC(x_LPC) ↾L ≈ u ↾L,  M_LPC(x_LPC) = y + ε, (8)

where LPC refers to Local Principal Components (in the following, LPCs) and ↾L restricts the validity of these relationships to the subset L. However, because the area spanned by LPCs is smaller than that of GPCs, we expect to need fewer LPCs than GPCs to achieve a satisfactory approximation in the output domain. Note that this change of coordinates is not invertible even if a complete set of LPCs is employed. Note also that for the LPC parametrization the corresponding likelihood differs from that in Eq. (3), but using a proper number of components can render this misfit negligible.
Generally speaking, a surrogate model is a function that seeks to emulate the behaviour of an expensive forward model at negligible computational cost per run. Clearly, the function a surrogate has to emulate depends on the set of coordinates used to represent the input.
For simplicity, we discuss here a surrogate for M_GPC(x_GPC), namely M̃_GPC satisfying:

M̃_GPC(x_GPC) ≈ M_GPC(x_GPC). (9)

Once available, surrogate models can be used for likelihood evaluation in MCMC inversions, with a modified covariance operator C_D = C_d + C_Tapp comprising the covariance matrices C_d and C_Tapp accounting for data uncertainty and modeling error, respectively. In such cases, the likelihood depends not only (mildly) on the parametrization but also on the surrogate model. This relationship is expressed as follows:

L(x_GPC) = (2π)^(-n/2) |C_D|^(-1/2) exp( -(1/2) (M̃_GPC(x_GPC) - y)^T C_D^(-1) (M̃_GPC(x_GPC) - y) ), (10)

where |C_D| is the determinant of the covariance matrix C_D (Hansen et al., 2014). Similar formulas for the likelihood can be derived for any other parametrization of the input space, such as those associated with DGMs or LPCs. The substantial differences in likelihoods associated with the DGM, GPC, and LPC parametrizations can potentially give rise to significantly different estimations of the posterior distribution.
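The modeling-error covariance C_Tapp of Eq. (10) can be estimated empirically from surrogate-versus-reference residuals on a validation set; the following sketch uses a plain sample-covariance estimator (one common choice, cf. Hansen et al., 2014):

import numpy as np

def modeling_error_cov(Y_ref, Y_surr):
    # Sample covariance of the residuals between reference and surrogate
    # simulations on a validation set (rows = validation runs).
    R = Y_surr - Y_ref
    return np.cov(R, rowvar=False)

def log_likelihood_full_cov(y_surr, y, C_D):
    # Gaussian log-likelihood of Eq. (10) up to the constant -n/2 log(2*pi),
    # with C_D = C_d + C_Tapp.
    r = y_surr - y
    _, logdet = np.linalg.slogdet(C_D)
    return -0.5 * (logdet + r @ np.linalg.solve(C_D, r))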
Surrogate models all adhere to a fundamental principle: the more complex the input-output relationship, the higher the computational demands needed to build the surrogate model (e.g., in terms of training set size). Furthermore, the efficiency of constructing a surrogate is significantly affected by the number of input parameters, and construction can become unfeasible when the input dimensionality exceeds several tens of parameters; surrogate modeling therefore often relies on some kind of dimensionality reduction. The dimensionality reduction step does not necessarily need to be invertible, since what holds significance is the supervised performance, specifically the minimization of modeling error (Lataniotis et al., 2020). Surrogate models exhibit their peak potential when dealing with low-dimensional input spaces, provided that such simplicity does not entail a complex input-output relationship.
Based on these general considerations, we can qualitatively anticipate different surrogate performances when operating on latent variables (i.e., M_DGM(z)), GPCs (i.e., M_GPC(x_GPC)) and LPCs (i.e., M_LPC(x_LPC)). We can expect that using the latent variables of a DGM involves a lower dimensionality, albeit at the cost of a complex input-output relationship. On the other hand, employing GPCs likely results in a considerably simpler input-output relationship but necessitates a larger number of variables, as the entire input domain needs to be approximated. In cases where the underlying forward model permits, the LPC projection is likely to offer similarly straightforward input-output characteristics as the GPC approach but with fewer parameters, thus facilitating the training of a surrogate.
To effectively combine MCMC methods with surrogate modeling, it is essential to ensure a high level of accuracy in the output domain.
To achieve this goal, we propose decoupling the parameterizations used for inversion and surrogate modeling.
The latent representation provided by the DGM is used to evaluate P(z) and explore the posterior. Once a prior has been defined and a surrogate modeling strategy devised, a Metropolis-Hastings MCMC algorithm can be used to sample the posterior distribution P(z|y) (Hastings, 1970). A sample of the posterior in the physical space, P(G_DGM(z)|y), is then also available via mere application of G_DGM to draws from P(z|y). For each step in the MCMC process, we consider three strategies, with surrogates operating on latent variables, on GPCs or on LPCs associated with the field G_DGM(z). We depict these three strategies for a travel time tomography problem in Figure 1.
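A minimal sketch (ours, not the authors' implementation) of one such decoupled MCMC iteration for the Global variant follows; `decoder`, `to_gpc`, `pce_eval` and `loglik` stand for G_VAE, the GPC projection, the trained PCE surrogate and the likelihood of Eq. (10):

import numpy as np

rng = np.random.default_rng(1)

def decoupled_mh_step(z, logpost_z, y, decoder, to_gpc, pce_eval, loglik, step=0.1):
    z_prop = z + step * rng.standard_normal(z.shape)   # proposal in latent space
    u_prop = decoder(z_prop)                           # physical model G_VAE(z')
    x_prop = to_gpc(u_prop)                            # surrogate input x_GPC
    logpost_prop = -0.5 * np.sum(z_prop ** 2) + loglik(pce_eval(x_prop), y)
    if np.log(rng.uniform()) < logpost_prop - logpost_z:
        return z_prop, logpost_prop
    return z, logpost_z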

APPLICATION TO GPR CROSSHOLE TRAVEL TIME TOMOGRAPHY
In the previous section, we briefly covered the basic principles of Bayesian methods and emphasized the significance of dimensionality reduction and surrogate modeling for their implementation. Now, we integrate these concepts to tackle GPR crosshole traveltime tomography with MCMC. GPR wave propagation is governed by the distribution of dielectric permittivity (ε) and electric conductivity (σ) in the subsurface.
The wave propagation velocity mainly depends on permittivity, which is in turn related to porosity and water content through petrophysical relationships (Gloaguen et al., 2005). GPR data can be collected in a variety of configurations, with crosshole designs being particularly well suited for groundwater investigations (LaBrecque et al., 2002; Annan, 2005).

Experimental setup
We target lossless media represented by binary images with two facies of homogeneous GPR velocity (6·10^7 and 8·10^7 m/s) resembling river channels (Strebelle, 2002; Zahner et al., 2016; Laloy et al., 2018). For the representation of the prior and posterior exploration, we consider coordinates induced by a VAE (in the following, the subscript VAE refers to this specific DGM parametrization), as recent research suggests that the lower degree of non-linearity of the corresponding networks compared with GANs makes them more amenable for modeling and inversion (Levy et al., 2022b). The details of the VAE utilized in this study can be found in Laloy et al. (2017) and Lopez-Alvis et al. (2021), with the dimension of the latent space being 20. As for the output, we consider arrival times associated with the crosshole configuration displayed in Figs. 2(a-e), with 12 evenly spaced sources and receivers located in two vertically-oriented boreholes. The distance between the boreholes is 4.6 m, while the spacing between sources/receivers is 0.9 m, leading to 144 traveltimes. We employ both an eikonal and a 2-D Finite Difference Time Domain (FDTD) solver to simulate noise-free propagation of GPR waves (Irving and Knight, 2006; Hansen et al., 2013). For the FDTD code, each source is characterized by a 100 MHz Blackman-Harris function, perfectly matched layers surrounding the propagation domain are used to prevent spurious reflections from contaminating the data, and appropriate space-time grids are employed to avoid instability and dispersion artefacts. Traveltimes are picked automatically based on a threshold associated with the relative maximum amplitude of each source-receiver pair.

PCE modeling within MCMC
For surrogate-based modeling, we rely on PCE due to its efficiency, flexibility and ease of deployment (Xiu and Karniadakis, 2002; Blatman and Sudret, 2011; Lüthen et al., 2021; Métivier et al., 2020). We here summarize the most relevant aspects of PCE modeling for a surrogate M̃_GPC(x_GPC), but similar considerations apply to M̃_DGM, F̃ or M̃_LPC, albeit with relevant caveats or advantages that will be discussed below. PCEs approximate functions in terms of linear combinations of orthonormal multivariate polynomials Ψ_α:

M̃_GPC(x_GPC) = Σ_{α ∈ A} a_α Ψ_α(x_GPC), (11)

where M is the dimension of x_GPC and A is a subset of N^M implementing a truncation scheme to be set based on accuracy requirements and available computational resources (Xiu and Karniadakis, 2002). The training of the coefficients a_α is computationally unfeasible when the input domain is high-dimensional (the case for a surrogate F̃(u) of F(u)). Moreover, when the imposed truncation pattern cannot fully account for the degree of non-linearity of the underlying model (the case for a surrogate M̃_VAE(z) of M_VAE(z)), the still unbiased PCE predictions are inevitably affected even if the input domain is low-dimensional. On the other hand, using a tailored PCA decomposition, as required by the surrogate M̃_LPC(x_LPC) of M_LPC(x_LPC), can decrease the computational burden and increase accuracy. In any case, the surrogate forward modeling predictor can be evaluated at negligible cost by direct computation of Eq. (11) and its accuracy estimated using a validation set or cross-validation techniques (Blatman and Sudret, 2011; Marelli et al., 2021).
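The paper builds its PCEs with UQLab; purely for illustration, the following numpy sketch fits Eq. (11) by least squares using orthonormal probabilists' Hermite polynomials, appropriate for standardized Gaussian inputs. The brute-force enumeration of the truncation set A is exponential in M and is only meant for small toy dimensions:

import itertools, math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def multi_indices(M, p):
    # All alpha in N^M with total degree <= p (a simple truncation set A);
    # brute force, feasible only for small M.
    return [a for a in itertools.product(range(p + 1), repeat=M) if sum(a) <= p]

def design_matrix(X, A):
    # Psi[i, j] = prod_m He_{alpha_j[m]}(X[i, m]), with orthonormal He_k.
    def he(x, k):
        return hermeval(x, [0.0] * k + [1.0]) / math.sqrt(math.factorial(k))
    return np.array([[np.prod([he(x[m], a[m]) for m in range(len(a))]) for a in A]
                     for x in X])

def fit_pce(X, Y, p=5):
    # Least-squares estimate of the coefficients a_alpha in Eq. (11).
    A = multi_indices(X.shape[1], p)
    coeffs, *_ = np.linalg.lstsq(design_matrix(X, A), Y, rcond=None)
    return A, coeffs          # predict: design_matrix(X_new, A) @ coeffs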
We here test the three strategies discussed in section 2.3 using PCE for surrogate modeling in conjunction with the VAE parametrization for prior sampling. For each strategy we build a corresponding PCE to model traveltime arrivals using the Matlab package UQLab (Marelli and Sudret, 2014; Marelli et al., 2021). To offer a fair comparison, we employ the same training and validation datasets for all proposed schemes.

VAE-PCE strategy
In the first strategy, referred to as VAE-PCE, the inputs for the PCE modeling are the 20-dimensional z vectors mapping the latent space into the physical one, that is, the surrogate emulates M_VAE(z) = F(G_VAE(z)) (Lopez-Alvis et al., 2021). This choice amounts to applying the strategy of Meles et al. (2022), as the same parameterization is employed for both prior characterization and surrogate modeling.

Global-PCA-PCE strategy
The second strategy, in the following Global-PCA-PCE, uses inputs for the PCE modeling defined in terms of projections on prior-informed PCA components spanning the entire domain. More specifically, in the Global-PCA-PCE approach we randomly create a total of 1000 slowness realizations G_VAE(z) from the prior and compute the corresponding principal components (see Fig. 3). The inputs for PCE in the Global-PCA-PCE approach are then the projections of G_VAE(z) on a to-be-chosen number of such GPCs. Following Meles et al. (2022), all PCA processes are defined in terms of slowness.
The effective dimensionality of the input with respect to M_GPC, that is, the number of GPCs representing the input, is not known a priori. Following a similar approach to Meles et al. (2022), the effective dimensionality is here assessed by comparison with the reference solution in the output domain with respect to the noise level. In Figs. 4(a) and (e), two velocity distributions are shown next to the approximate representations (Figs. 4(b)-(d) and (f)-(h)) obtained by projecting them on 30, 60 and 90 GPCs, respectively. As expected, the reconstruction quality improves as more principal components are included. To quantify the faithfulness of the various reduced parametrizations in terms of the output, we consider 100 realizations of the generative model and compute the resulting histograms of the traveltime residuals using the reference forward solver. The root-mean-square errors (in the following, rmse) of the misfit between the data associated with the original distribution and its projections on 30, 60 and 90 principal components, shown in Figs. 4(i)-(k), are 1.60, 0.85 and 0.55 ns, respectively, which are to be compared to the expected level of GPR data noise of 1 ns for 100 MHz data (Arcone et al., 1998). The number of principal components (i.e., 90 PCs) required to approximate the forward solver below the expected noise level is larger than for the example considered by Meles et al. (2022) (i.e., 50 PCs). Building a PCE on such a large basis is challenging in terms of computational requirements and efficiency, and could lead to poor accuracy if a small training set is employed. To address this, one can either reduce the number of components, which introduces larger modeling errors, or explore alternative parameterizations that offer improved computational efficiency and accuracy.
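A sketch of this assessment loop, with `sample_prior`, `project`, `reconstruct` and `eikonal` as placeholders for the prior sampler, the truncated GPC projection and reconstruction, and the reference solver:

import numpy as np

def truncation_rmse(M_values, n_real, sample_prior, project, reconstruct, eikonal):
    # rms traveltime misfit induced by truncating the input to M GPCs,
    # to be compared with the expected 1 ns data noise level.
    out = {}
    for M in M_values:                       # e.g. (30, 60, 90)
        residuals = []
        for _ in range(n_real):              # e.g. 100 prior realizations
            u = sample_prior()
            u_trunc = reconstruct(project(u, M), M)
            residuals.append(eikonal(u_trunc) - eikonal(u))
        out[M] = np.sqrt(np.mean(np.concatenate(residuals) ** 2))
    return out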
In this study, the Global-PCA-PCE approach utilizes 60 GPCs, while an alternative strategy is considered below that is based on physical considerations.

Local-PCA-PCE strategy
As mentioned in section 2.3, an improved parametrization for surrogate modeling can sometimes be found by considering the forward problem's specific characteristics. The GPCs in the Global-PCA-PCE approach refer to the input field in its entirety. However, the actual first-arrival time for a given source/receiver combination depends only on a sub-domain of the entire slowness distribution. This leads us to suggest a Local approach, in the following referred to as Local-PCA-PCE. Instead of using principal components describing the entire slowness field, we aim to employ output-specific LPCs that characterize only the sub-domains impacting the physics of the problem (Jensen et al., 2000; Husen and Kissling, 2001). We then expect that fewer LPCs are needed than GPCs to achieve the desired input/output accuracy. In practice, our construction of LPCs involves utilizing fat-ray sensitivity kernels, which capture the sensitivity of arrival times to small perturbations in the subsurface model, thus providing valuable insights into the regions (corresponding to the L subset in Eq. (8)) that have the most significant impact on the observed data (Husen and Kissling, 2001). For a given source/receiver pair, the corresponding sensitivity kernel depends on the slowness field itself and its patterns can vary significantly (see Fig. 5(a)-(j)). The sought Local decomposition needs to properly represent any possible slowness field within the prior; it is thus reasonable to define it based on a representative sample of input realizations. To reduce the overall PCE computational cost, it is also convenient to limit the number of output-driven decompositions used. To achieve these goals, we assume that the prior model is stationary with respect to translation. Instead of focusing on each specific source/receiver pair (a total of 144), we can then consider each source/receiver angle (a total of 23). We then use a total of 1000 slowness realizations G_VAE(z) from the prior and build the corresponding fat-ray sensitivity kernels using the reference eikonal solver (Hansen et al., 2013) for each of the 23 possible angles. For any given angle, we consider a cumulative kernel consisting of the sum of the absolute values of each kernel (green areas in Fig. 5(k)-(t)). Such a cumulative kernel covers an area larger than each individual kernel but is still considerably smaller than the entire slowness field. For any possible additional input model, the corresponding sensitivity kernel is then very likely to be geometrically included in the area covered by the cumulative kernel (see Fig. 5(k)-(t)). Based on this insight, we define principal components spanning only the area covered by such cumulative kernels or relevant parts thereof (e.g., a threshold can be applied to reduce the size of these sub-domains).
For the practical definition of the components, the cumulative kernels are set to 0 or 1 depending on whether the corresponding value is below or above the threshold, respectively. We then multiply, point by point, the slowness distributions with the binarized cumulative kernels, and consider the principal components of such products.
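A sketch of this construction for one source/receiver angle, with hypothetical file names and an assumed threshold value:

import numpy as np

kernels = np.load("fat_ray_kernels_angle_k.npy")   # hypothetical, (n_real, ny, nx)
S = np.load("slowness_realizations.npy")           # hypothetical, (n_real, ny, nx)

cum = np.abs(kernels).sum(axis=0)                  # cumulative kernel for this angle
mask = (cum > 0.01 * cum.max()).astype(float)      # binarized with an assumed threshold

masked = (S * mask).reshape(len(S), -1)            # point-wise product, flattened
masked_mean = masked.mean(axis=0)
_, _, Vt = np.linalg.svd(masked - masked_mean, full_matrices=False)
LPC = Vt[:30]                                      # first 30 LPCs for this angle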
The first five LPCs are shown for two given source/receiver pairs (Figs. 6(a)-(e) and (f)-(j)). Note that the pattern variability is confined within the cumulative kernels, while in the complementary areas the values are 0. Note also that, compared to the five principal components in Fig. 3, higher-resolution details can be identified. Given the same number of principal components, we can then expect the input to be better represented in the physically relevant sub-domain when the Local-PCA-PCE rather than the Global-PCA-PCE approach is followed. For all source/receiver pairs corresponding to the same altitude angle, the same kernels and principal components are used, provided they are shifted to cover the appropriate geometry.
In Figs. 7(a)-(g), the two slowness distributions from Fig. 4 are shown next to the representations obtained by projecting them on 30 LPCs. In the areas complementary to the sensitivity kernels, the speed is set to 0.07 m/ns. Input reconstructions are remarkably improved with respect to those using the same number of GPCs of the entire slowness field (compare Figs. 7(a)-(g) and (h)-(l) to Figs. 4(b) and (f)). More importantly, the modeling error obtained using just 30 sensitivity-based LPCs is lower than what was previously provided by 90 standard components (i.e., rms ≈ 0.45 ns). By incorporating these tailored LPCs, we can attain enhanced output fidelity when utilizing truncated representations of the input. This enhanced fidelity proves particularly advantageous for the implementation of PCE, allowing for more precise and efficient modeling. Consequently, this approach holds substantial promise in achieving superior accuracy and computational efficiency in PCE-based analyses.
In summary, we have introduced three different parametrizations to be used as input for PCE. We consider coordinates inherited from the VAE, and principal components derived by considering either entire slowness fields or sensitivity-based sub-domains. We refer to these three parametrizations in what follows as VAE-PCE, Global-PCA-PCE and Local-PCA-PCE, respectively.

PCE performance
We here analyze the PCE performance of the different input parametrizations for surrogate modeling introduced in Section 3.2. In agreement with Meles et al. (2022), we consider for each surrogate a training set of 900 examples consisting of noise-free travel times calculated by the eikonal solver, and a polynomial degree p of five.

VAE-PCE performance
When the VAE parametrization is used as input, the PCE performance is rather poor, with an rmse of 2.01 ns, which is well beyond the expected noise level in the data and is consequently considered unsatisfactory (Fig. 8(a)). The poor performance is due to the highly non-linear map M_VAE. In such a scenario, PCE does not provide accurate results even though the physical input of the validation set is exactly defined by G_VAE(z) = u. Note that in this scheme the evaluation of G_VAE(z) is not required to compute the corresponding PCE. Given the low accuracy of the PCE for this parametrization, we can expect the corresponding likelihood to differ significantly from that involving exact modeling.

Global-PCA-PCE performance
Despite the comparatively poor reconstruction of the input (e.g., G_G/LPC(x) ≈ u) provided by the PCA approaches, the corresponding parametrizations perform well when used as input to build PCE surrogates, with the Global-PCA-PCE approach being outperformed by the Local-PCA-PCE scheme (Figs. 8(b) and (c)). Given the high accuracy of the Local-PCA-PCE surrogate, we can expect the corresponding likelihood to only slightly differ from that involving no truncation in the LPC projection and exact modeling.
Depending on the input parameterization, PCEs approximate the eikonal and FDTD solvers to different degrees. Figure 9 represents the corresponding covariance matrices accounting for the modeling error of each surrogate model.
We now discuss the computational burden of each strategy when running on a workstation equipped with 16 GB of DDR4 RAM and powered by a 3.5 GHz quad-core processor running Matlab and Python on Linux. We emphasize that our goal with the present manuscript is to propose novel methods to combine VAE and PCE rather than offer optimized implementations.
There are up to three relevant computational steps in the execution of a forward run for the VAE-PCE, Global-PCA-PCE, Local-PCA-PCE, eikonal and FDTD simulations, namely the evaluation of the physical input G_VAE(z), its decomposition on either GPCs or LPCs, and the actual evaluation of the forward or surrogate model. Not all methods require each of these steps. The VAE-PCE model, for example, requires neither the evaluation of G_VAE(z) nor a PCA decomposition. In our setting, the evaluation of G_VAE(z) involves calls between Matlab and Python, and its cost could be reduced by executing the decoder in its native environment or by optimizing the calls to the Matlab/Python scripts in our code for greater efficiency. When run in its native environment, evaluation of G_VAE(z) is actually very fast, taking only ≈ 0.005 s and ≈ 0.08 s when operating on one or 35 inputs, respectively. Still, note that this cost is overall negligible, even in our non-optimized setting, when compared to expensive physics-based forward solvers such as FDTD. Only the Global-PCA-PCE and Local-PCA-PCE strategies require PCA decompositions. The Global-PCA-PCE decomposition is faster, requiring only up to ≈ 0.002 s and ≈ 0.05 s when applied to one and 35 input elements, respectively, while the Local-PCA-PCE one is slower, taking up to ≈ 0.06 s and ≈ 0.23 s in the same situations. For the Global-PCA-PCE method the cost of a single forward run is ≈ 0.52 s, which is significantly more than for VAE-PCE. The difference between these two PCE strategies can be attributed to the significantly larger number of input parameters of the Global-PCA-PCE scheme with respect to the VAE-PCE scheme (i.e., 60 vs 20). Note that the PCE model evaluations are vectorized and, therefore, the cost is almost the same when applied to 35 inputs (≈ 0.57 s). Moreover, the computational cost of the Global-PCA-PCE method could be reduced by applying a PCA decomposition to the output, akin to what is proposed in Meles et al. (2022). Despite involving fewer variables than the Global-PCA-PCE approach, the Local-PCA-PCE method is slightly more computationally demanding, with costs of ≈ 0.64 s and ≈ 0.65 s when operating on one or 35 inputs, respectively. The increase in cost compared to the Global-PCA-PCE method stems from the fact that, for each source-receiver angle θ, the Local-PCA-PCE scheme utilizes its own polynomial basis, denoted as Ψ_θ in the following.
In comparison with the PCE methods, the cost of the reference eikonal solver is basically a linear function of the number of input distributions it operates on. A single run requires ≈ 0.05 s, while for 35 velocity distributions the cost increases up to ≈ 1.67 s. As such, its cost is either significantly smaller or slightly larger than what is required by the Global/Local-PCA-PCE approaches. Finally, the cost of the reference FDTD code is ≈ 120 s and ≈ 4200 s when operating on one or 35 velocity distributions, respectively, which is orders of magnitude longer than for the eikonal or PCE models. These results are summarized in Table 1, where we also estimate the performance of an ideally-optimized Local-PCA-PCE method benefitting from (a) evaluating G_VAE(z) in its native environment and (b) using a single family of polynomials Ψ_θ for all angles. In numerical results not presented herein, we find that choosing any one of the Ψ_θ families for all models provides nearly identical fidelity to what is achieved by using specifically tailored polynomials for each angle, at the considerably smaller cost of ≈ 0.06 s and 0.16 s when applied to one or 35 inputs, respectively. While such a result cannot be generalized, it is always possible to test the accuracy of the corresponding PCEs with a representative evaluation set. The option of relying on a single family of polynomials for the Local-PCA-PCE method is certainly to be taken into account when optimising the approach.
Table 1. Summary of the computational cost of the various steps for a single realization/batch of 35 inputs for the proposed algorithms. In addition to the strategies used in the MCMC examples discussed in the manuscript and summarized in the first four columns (i.e., the VAE-PCE, Global- and Local-PCA-PCE and eikonal schemes), we also consider the cost of the FDTD approach using the reference code (fifth column) and an optimized Local-PCA-PCE approach ideally benefitting from executing the VAE decoder in the same environment as the PCE model and based on a single polynomial family for all angles (sixth column).

Inversion results
We now explore the performance of the different input parametrizations used for PCE-based surrogate modeling, namely VAE-PCE, Global-PCA-PCE and Local-PCA-PCE, when performing probabilistic MCMC inversion. The inversions were carried out using the UQLab Matlab-based framework (Marelli and Sudret, 2014; Wagner et al., 2021b). As reference model, we consider the field shown in Fig. 10, which is used to generate a total of 144 traveltimes using the reference eikonal and FDTD solvers. Note that this field is not used to train the PCEs.
Uncorrelated Gaussian noise characterized by σ^2 = 1 ns^2 was added to the data used for inversion.
We use a Metropolis-Hastings algorithm and run 35 non-interacting Markov chains in parallel for 4 × 10^5 iterations per chain. During burn-in, determined according to the Geweke method, the scaling factor of the proposal distribution is tuned such that an acceptance rate of about 30% is achieved for each experiment (Geweke, 1992; Brunetti et al., 2019). Finally, outlier chains with respect to the Inter-Quartile-Range statistics discussed in Vrugt et al. (2009) are considered aberrant trajectories and are ignored in the analysis of the results.
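The tuning of the proposal scaling during burn-in can be sketched as follows; the multiplicative update rule is our assumption, not necessarily the exact scheme used:

def tune_step(step, n_accepted, n_proposed, target=0.3):
    # Rescale the random-walk step so the acceptance rate approaches ~30%.
    rate = n_accepted / max(n_proposed, 1)
    return step * 1.1 if rate > target else step * 0.9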
We first present the results for training data generated by an eikonal solver. We compare VAE-PCE, Global-PCA-PCE and Local-PCA-PCE inversion results to those achieved by employing the eikonal solver, which represent the reference solution since the full-physics solver is used in the entire MCMC process. This test provides an ideal benchmark, as we compare the proposed PCE strategies with results derived from exact modeling. Note that such a comparison is not feasible for the FDTD scenario due to the computational burden associated with the corresponding MCMC process. Inversion results in terms of means and standard deviations incorporating the model error (i.e., using the PCE-derived C_Tapp in Eq. (10)) are shown in Figs. 11(a-e). The mean of the posterior obtained using the VAE-PCE poorly resembles the reference velocity field, with relevant topological differences between the two (compare Fig. 10 to Fig. 11(a)). Note that the data misfit is particularly large (i.e., 3.49 ns). This poor performance is also obtained when considering other test models (see Appendix A). The mean of the posterior provided by the Global-PCA-PCE approach shares many features with the reference distribution, but differences are apparent in the lower and upper channelized structures. The similarity between the posterior mean and the true distribution increases significantly when the Local-PCA-PCE parametrization is employed (Fig. 11(c)).

DISCUSSION
Deep generative models offer a flexible framework for encoding complex spatial priors, enabling the description of intricate input distributions. However, proper parametrization of prior distributions alone does not ensure efficient estimation of posterior distributions when using MCMC methods. In many practical situations, the use of surrogate models becomes beneficial or even essential for evaluating the likelihood functions effectively. Surrogate modeling with PCE has become widespread in many disciplines. The massive decrease of the computational costs associated with PCE is achieved by approximating demanding computational forward models with simple and easy-to-evaluate functions. A key requirement is that the number of input variables describing the computational model is relatively small (i.e., up to a few tens) and that the target model can be approximated by truncated series of low-degree multivariate polynomials. The number of coefficients defining the PCE model grows polynomially in both the size of the input and the maximum degree of the expansion. When the reference full-physics model response is highly non-linear in its input parameters, the problem is typically non-tractable (Torre et al., 2019). Since the high-fidelity mapping of complex prior distributions provided by DGMs is based on highly non-linear relationships between latent variables and physical domains, replicating the performance of such networks and/or composite functions thereof (e.g., M_VAE = F ∘ G_VAE in Eq. 5) using PCE is problematic. To circumvent this challenge, we have explored two PCA-based decompositions that facilitate a straightforward implementation of PCE. One decomposition was designed to encompass the entire input domain, while the other specifically focused on sub-domains of particular physical relevance. While this latter concept is investigated here in the context of traveltime tomography, the integration of problem-related PCs operating synergistically with DGM-defined latent parametrizations has the potential for broader applications.
Whatever the choice of input coordinates, for example based on PCA or local properties of the input, the determining criterion for evaluating the quality of the corresponding PCE should always be the performance on a representative validation set. In the case of PCA, the lower bound of the prediction misfit rmse can be estimated a priori by comparing the accuracy of the reference model acting on the full and the compressed domains, that is, M_G/LPC(x^full) and M_G/LPC(x). In our case, such a lower bound for the Global-PCA-PCE approach operating with 60 components is 0.85 ns. Using the Local-PCA-PCE scheme with 30 components only, the lower bound drops to 0.55 ns. However, the accuracy of the corresponding Global/Local-PCA-PCE surrogates is worse, with rmse of 1.31 and 0.67 ns, respectively, mainly due to the small size of the training set. Note that while the lower bound of the PCA decomposition decreases when more PCs are taken into account, the corresponding accuracy of the PCE is primarily limited by the size of the training set. Increasing the number of components can actually worsen PCE performance if the training set is insufficient to determine the polynomial coefficients, whose number, as mentioned, grows significantly with input size. In our case, using 90 components would imply an rmse of 1.39 ns, which is worse than what was obtained with the 60-component PCE. Both the VAE-PCE and Global-PCA-PCE schemes consist of 144 (i.e., the total number of source/receiver combinations) different PCE models operating, for each traveltime simulation, on an identical input, that is, the latent variables of the DGM or the 60 PCs characterizing the entire domain. In the Local-PCA-PCE scheme, instead, the inputs and polynomial families associated with different source/receiver angles are different. This is in contrast with the Global-PCA-PCE method, for which each model operates via a single family of polynomials. Thus, the Local-PCA-PCE scheme is computationally more demanding than the Global-PCA-PCE one (see Table 1). However, the use of a single family of polynomials can also be considered for the Local-PCA-PCE method, resulting in shorter run times. When considering computational performance, an optimal implementation of G_VAE(z) should also be sought.
In this study, to determine the minimum number of G/LPCs for constructing an accurate PCE, we assess the lower bound of the output prediction misfit rmse as a function of the number of G/LPCs used. We project the input onto subsets of G/LPCs, typically ranging in the tens. This process generates non-binary images, which are then used to compute the output using the reference forward modeling.
Alternatively, we could consider re-binarizing the reconstructed images, as done in Thibaut et al. (2021). This approach would bring the projected input back into the prior, but this property is not necessarily relevant for the determination of PCE accuracy. Irrespective of the chosen reconstruction algorithm, the Local approach maintains a significant advantage over the Global method: when using an equal number of components, LPCs consistently yield superior approximations of the relevant input compared to GPCs.
We have seen that once a DGM-based latent parametrization has been found to reduce the effective dimensionality of the input domain and, based on PCA decompositions, high-fidelity PCEs have been trained, MCMC inversion can be efficient. Relying on PCE rather than advanced deep learning methods for surrogate modeling can be advantageous in terms of ease of implementation, as potentially complex training of a neural network is not needed. Many effective sampling methods, such as Adaptive Metropolis, Hamiltonian Monte Carlo, or the Affine Invariant Ensemble Algorithm (Duane et al., 1987; Haario et al., 2001; Goodman and Weare, 2010), could be considered in our workflow instead of the current use of a standard Metropolis-Hastings sampling algorithm.
Adaptations of the Global-PCA-PCE strategy presented here could easily be employed for other imaging problems, such as active or passive seismic tomography at different scales (Bodin and Sambridge, 2009; Galetti et al., 2017). On the other hand, implementation of Local-PCA-PCE schemes would depend on the properties of the corresponding sensitivity kernels, which would require more careful evaluation and problem-specific design.

CONCLUSIONS
Low-dimensional latent variables associated with deep generative models, such as VAEs, conform well to complex priors and provide an attractive basis to explore posterior model distributions using MCMC strategies. MCMC methods can also benefit greatly from surrogate modeling based on PCE, provided the forward model can be approximated by low-degree multivariate polynomials. However, this type of approximation is generally not achievable when the surrogate operates directly on the latent variables of a DGM, given the highly non-linear relationship between the latent space and the physical response. By decoupling the parametrization used for prior representation and posterior exploration (the DGM latent space) from the one used for surrogate modeling (global or local PCA decompositions), accurate PCE surrogates can be trained and efficient Bayesian tomography achieved.

Figure 1. Graphical schematic of the three strategies discussed in this manuscript. The prior input distribution is always parametrized by the latent space of the DGM, with P(z) = N(0, I_dim(z)), while the input to the reference/surrogate model depends on the chosen strategy. When the DGM parametrization is chosen, draws of the latent variables are considered (left column). With the PCA strategies, Globally or Locally defined G/LPCs are used for the Global and Local approach, respectively (central and right columns). The Global approach employs a set x_GPC of PCs defined across the entire domain, illustrated by the corresponding velocity field that extends throughout the entire medium. Conversely, the Local approach utilizes a set x_LPC of PCs that cover only specific portions of the domain chosen based on physical considerations. This is exemplified by the corresponding velocity field, which is confined to a limited portion of the medium. In this illustration, as we consider travel time modeling, the Local domain is defined in terms of fat-ray sensitivity kernels (for more details, please refer to section 3.2).

Figure 2. (a-e) Representative velocity fields generated by the decoder of the employed VAE; crosses and circles stand for sources and receivers, respectively. (f-h) Corresponding exemplary traveltime gathers.

Figure 3. (a-j) The first five GPCs in the input domain corresponding to entire slowness fields. Crosses and circles stand for sources and receivers, respectively.


Figure 4. (a) and (e) Random velocity distributions and the corresponding representations using the first (b) and (f) 30, (c) and (g) 60, and (d) and (h) 90 GPCs defined with the Global approach; (i-k) corresponding histograms of traveltime residuals based on simulations on the true field and partial reconstructions for 100 velocity distributions. Crosses and circles stand for sources and receivers, respectively.

Figure 5. (a)-(j) For the velocity fields in Fig. 2, the sensitivity-based kernels for two arbitrarily selected source/receiver pairs are shown superimposed on the corresponding velocity distributions. (k)-(t) The same sensitivity kernels as in (a)-(j), but superimposed on the cumulative kernels (green shaded areas) used to define the support of the sensitivity-based LPCs used in the Local-PCA-PCE approach.

Figure 7. (a) and (g) Velocity fields with (b)-(f) and (h)-(l) the corresponding representations used for surrogate modeling based on the first 30 LPCs used in the Local-PCA-PCE approach. Different kernels are used for each source/receiver angle.

Figure 8. (a)-(c) Histograms of the model error with respect to the PCE prediction when using the VAE (20 input parameters), Global (60 input parameters) and Local (30 input parameters per angle) parameterizations of the input in the PCE-based surrogate modeling, respectively, using the eikonal solver to compute the training set. (d) Histogram of the model error using Local parameters and FDTD reference data.

Figure 9. Model error covariance matrices associated with PCE-based surrogate modeling based on: (a) VAE-PCE, (b) Global-PCA-PCE and (c) Local-PCA-PCE schemes trained with 900 eikonal datasets. (d) Model error covariance matrix for the Local-PCA-PCE scheme trained with FDTD data.

Figure 10. The reference velocity distribution used in the numerical inversion experiments.

Figure 13. Histograms of SSIM values across the posterior distributions for the various inversion strategies.

Figure 14. Random samples from the posterior obtained by employing the (a) VAE-PCE, (b) Global-PCA-PCE, (c) Local-PCA-PCE, (d) eikonal and (e) FD Local-PCA-PCE strategies. The results in (c)-(e) are visually similar.

Table 2. Assessment of the inversion results in the input and output domains for the various surrogate modeling strategies.