HMCLab: a framework for solving diverse geophysical inverse problems using the Hamiltonian Monte Carlo method

The use of the probabilistic approach to solve inverse problems is becoming more popular in the geophysical community, thanks to its ability to address nonlinear forward problems and to provide uncertainty quantification. However, such strategy is often tailored to specific applications and therefore there is a lack of a common platform for solving a range of different geophysical inverse problems and showing potential and pitfalls. We demonstrate a common framework to solve such inverse problems ranging from, e.g, earthquake source location to potential field data inversion and seismic tomography. Within this approach, we can provide probabilities related to certain properties or structures of the subsurface. Thanks to its ability to address high-dimensional problems, the Hamiltonian Monte Carlo (HMC) algorithm has emerged as the state-of-the-art tool for solving geophysical inverse problems within the probabilistic framework. HMC requires the computation of gradients, which can be obtained by adjoint methods, making the solution of tomographic problems ultimately feasible. These results can be obtained with"HMCLab", a tool for solving a range of different geophysical inverse problems using sampling methods, focusing in particular on the HMC algorithm. HMCLab consists of a set of samplers and a set of geophysical forward problems. For each problem its misfit function and gradient computation are provided and, in addition, a set of prior models can be combined to inject additional information into the inverse problem. This allows users to experiment with probabilistic inverse problems and also address real-world studies. We show how to solve a selected set of problems within this framework using variants of the HMC algorithm and analyze the results. HMCLab is provided as an open source package written both in Python and Julia, welcoming contributions from the community.


Introduction
In the probabilistic approach, an inverse problem essentially represents an indirect measurement where the knowledge about the observed data and model parameters is completely expressed in terms of probabilities (Tarantola, 2005).Within such formalism, the general solution to the inverse problem is a probability density function (PDF), i.e., the posterior PDF (see Tarantola & Valette (1982) and Mosegaard & Tarantola (2002) for a detailed explanation).The posterior PDF is constructed from the combination of two pieces of separate information: 1) the prior knowledge on the model parameters, expressed by the PDF ρ(m), where m represents the model parameters and 2) the information provided by the experiment, described by L(m).The posterior distribution, under certain fairly wide assumptions, is then given by (Mosegaard & Tarantola, 2002, Tarantola, 2005): (1) Since σ(m) is a PDF, it requires to evaluate the relevant integrals to find features of interest.For example, calculating the expected model given the data requires evaluating the following integral where M represents the whole model space.
In the particular case of linear forward models and Gaussian uncertainty, the probabilistic formalism provides the same closed-form solution than the classical least squares approach.In general, however, such high-dimensional integrals cannot be computed.Therefore, we resort to sampling, a technique to approximate the computation of high-dimensional integrals.By generating points in the model space whose density (number of points per unit volume) is proportional to the posterior σ(m), i.e., samples, one greatly reduces the amount of computations required to estimate statistics (such as E [m]) compared to a systematic grid search.Markov Chain Monte Carlo (MCMC) methods provide a clever way to construct a Markov chain that produces samples drawn for the target distribution, i.e., the posterior PDF.Statistical analysis of the samples obtained with MCMC then provides the answer to any inquiry in terms of probability of certain events, i.e., specific features of the solution.Practically, this means we can compute probabilities related to particular properties or structures of the solution.For instance, we might be interested in the probability of a certain geological body to have a certain volume, or the probability that there is a continuous permeable layer connecting two locations in the subsurface.MCMC to sample target distributions has evolved from the appearance of the original Metropolis algorithm in physics (e.g., Metropolis et al., 1953, Metropolis & Ulam, 1949) into a plethora of variants in many different scientific fields, including geophysics (see Sambridge & Mosegaard, 2002, for a review).The MCMC method to solve inverse problems in geophysics has a long history, which dates back to the pioneering work of Keilis-Borok & Yanovskaja (1967) and Press (1968), which were contemporary to the classic work of Backus & Gilbert (1967) on geophysical inverse theory.The formalization of a comprehensive theoretical framework based on probability theory for geophysical problems was started in the 80s by Tarantola & Valette (1982) and extended later on to include MCMC sampling methods to solve nonlinear inverse problems (e.g., Mosegaard, 2011, Mosegaard & Tarantola, 1995, 2002, Tarantola, 2005).An extensive review can be found in Dębski (2010).
There has been a recent increase in the use of Monte Carlo methods to solve (geophysical) inverse problems, essentially for two reasons: I) relatively recent advances in the sampling algorithms and II) substantial increase in the available computational resources.These advances now allow us to tackle medium (hundreds of model parameters) to relatively large-scale (tens of thousands model parameters) inverse problems within the probabilistic framework.Regarding I), the literature provides a large collection of generic algorithms to perform Monte Carlo sampling, including classic MCMC (e.g., Hastings, 1970), slice sampling (e.g., Neal, 2003), Gibbs sampling (e.g., Geman & Geman, 1984), rejection sampling (e.g., Gilks et al., 1995), sequential Monte Carlo (e.g., J. S. Liu & Chen, 1998) and trans-dimensional Monte Carlo (e.g., Green, 1995), just to cite some of the main categories.The amount of literature dedicated to such algorithms from different fields is so vast that it would be pointless to attempt to give an overview here.More specifically, in geophysics there has been a constant increase in the number of algorithms proposed both for (pseudo-)sampling the posterior PDF and performing global optimization.These include, for instance, an extension of the classic Metropolis-Hasting sampler (e.g., Mosegaard & Tarantola, 1995), simulating annealing for global optimization (e.g., Stoffa & Sen, 1991), a strategy based on the nearest neighbour (Voronoi) partition of the model space (e.g., Sambridge, 1999), joint inversion of different geophysical data with a cascade MCMC (e.g., Bosch, 1999), trans-dimensional MCMC inversion (e.g., Bodin & Sambridge, 2009, Malinverno, 2002) and samplers with geostatistical-based priors (e.g., Hansen et al., 2012, Zunino et al., 2015).Regarding II), the geophysicist's arsenal for solving inverse problems now includes large high-performance computing resources and more easily accessible computation accelerators such as GPUs, TPUs and high-thread-count CPUs.
Traditional algorithms such as the random walk Metropolis may find difficult to setup an efficient sampler for large problems because of two reasons.The first is the property of generating correlated samples which requires a large number of iterations to obtain reliable statistics.The second is the difficulty for the proposal mechanism to generate high-probability models.In high-dimensional spaces, simply randomly perturbing a model will be very unlikely to produce another model with a higher posterior PDF.
The reason is that high-dimensional spaces tend to be very empty ("curse of dimensionality") and so the vast majority of search directions (random perturbations) will point to low probability regions, making the overall algorithm inefficient (Curtis & Lomax, 2001b).If strong prior information is available and if sampling directly the prior is a possibility, the extended Metropolis algorithm (Mosegaard & Tarantola, 1995) can offer a substantial improvement in terms of efficiency.Nevertheless, defining a geologically realistic prior and being able to sample it may not be an easy task in practice.Other approaches are based on an adaptive algorithm which, for instance, computes an approximate local covariance matrix and then samples such information to increase the chance of moving in a direction of higher probability (e.g., Gilks et al., 1996).A recently proposed methodology for improving the proposal strategy is that of constructing an ad hoc proposal PDF based on the results of a simplified deterministic inversion (Khoshkholgh et al., 2021(Khoshkholgh et al., , 2022)), which will make the sampling more efficient in practice, without altering the final equilibrium distribution.This may enable the solution of large problems, although such methodology requires solving an additional inverse problem beforehand and performing some sort of interpretation in addition to the estimation of the modeling error.
The Hamiltonian Monte Carlo (HMC) method has recently gained attention in solid Earth geophysics because of its peculiar properties (Duane et al., 1987, Fichtner et al., 2019, Neal, 2012).It combines sampling with ideas from the field of optimization, where the proposal mechanism exploits also information coming from the gradient of the posterior distribution.This unique combination enables a more efficient solution of problems when the calculation of gradients is not computationally too expensive with respect to the cost of simulating the forward problem, e.g., when adjoint methods (Fichtner et al., 2006, Plessix, 2006, Tarantola, 1984, Tromp et al., 2005) come into play (e.g., Fichtner et al., 2019, Gebraad et al., 2020, Zunino & Mosegaard, 2018).This is effectively a result of the No Free Lunch-theorem described by Wolpert & Macready (1997): one applies prior knowledge to the objective function and its properties (i.e., cheaply available gradient information), whereby it becomes possible to select a relatively efficient algorithm.HMC is capable of generating more uncorrelated samples compared to traditional purely random-walk based algorithms such as the random walk Metropolis algorithm (Neal, 2012), producing more accurate statistical estimations with a smaller number of sampler.Thanks to this property, HMC is more suitable to address high-dimensional inverse problems than traditional derivative-free sampling methods.In fact, the cost of generating independent samples with HMC under increasing dimension n grows as O(n 5/4 ) (Neal, 2012), whereas it grows as O(n 2 ) for standard Metropolis-Hastings (Creutz, 1988) An alternative and recently popularized approach using also information from the gradient is Stein Variational Gradient Descent (Q.Liu & Wang, 2019, SVGD,) a variational inference algorithm, which aims at approximate inference by minimizing the Kullback-Leibler divergence between the proposed and target distributions.
Although the theoretical formalism and the infrastructure to perform intensive computations are there, a common framework to address different geophysical inverse problems has not emerged yet.Implementations of the HMC algorithm are typically application-specific and often not easily accessible to non-specialists.In addition, as these methods are nascent in the field of solid Earth geophysics, the community as a whole has not had time to acquire substantial expertise in the usage of these methods in order to evaluate their potential and routinely apply them to realistic problems.This work aims at facilitating at least part of the generation of this expertise, specifically in applying gradient-based sampling methods to inverse problems and analyzing their results.In this work we show how HMC can be used to obtain useful information from a set of diverse geophysical data sets through some illustrative selected examples from seismology and potential fields problems.All problems are addressed within the same framework, where generic samplers and data structures allows us to easily experiment with different data, priors and possibly to combine them.
These results are obtained with "HMCLab", a tool to solve research problems and a numerical laboratory for experimenting with inverse algorithms such as HMC for a variety of geophysical topics.HMCLab provides software for a set of geophysical problems, for which functions to solve the forward problem, compute gradients of the misfit function and several kinds of priors and samplers for the HMC method are provided.This package is currently written partly in Python (van Rossum, 1995) and partly in Julia (Bezanson et al., 2017), depending on the specific problem.It is, however, in constant evolution as new geophysical problems are added or translated into the other one of the two languages.Moreover, users can supply their own forward model functions and priors which can easily be used with the HMC samplers.In addition, several Jupyter Notebooks are provided that guide the user through the various aspects of applying MCMC algorithms and analyzing their results, for various inverse problems.
In the following, we first give a brief overview of the core of HMC algorithms and illustrate what kind of information can obtained by solving some selected example problems using HMCLab.

The original HMC sampler
HMC constructs a Markov chain over an n-dimensional probability density function σ(m) using classical Hamiltonian mechanics.The algorithm regards the current state m of the Markov chain as the location of a physical particle in an n-dimensional space M (i.e., model or parameter space).It moves under the influence of a potential energy, U , which is defined as (3) To complete the physical system, the state of the Markov chain needs to be artificially augmented with momentum variables p and a generalized mass for every dimension pair.The collection of resulting masses is contained in a symmetric positive definite mass matrix M of dimension n × n.The momenta and the mass matrix define the kinetic energy of the particle as In the HMC algorithm, the momenta p are drawn randomly from a multivariate Gaussian with covariance matrix M (the mass matrix).The sum of the location-dependent potential and momentum-dependent kinetic energy constitute the total energy, or Hamiltonian, of the system The Hamiltonian dynamics are governed by the following equations, which determine the position and momentum of the particle as a function of time τ .This time τ is artificial just like the mass matrix, it has no connection to the actual physics of the inverse problem at hand.We can simplify Hamilton's equations using the fact that kinetic and potential energy depend only on momentum and location, respectively, to obtain Evolving m over time τ generates another possible state of the system with new position m, momentum p, potential energy Ũ , and kinetic energy K. Due to the conservation of energy, the Hamiltonian is equal in both states, i.e., U + K = Ũ + K. Successively drawing random momenta and evolving the system generates a distribution of the possible states of the system.Thereby, HMC samples the joint momentum and model space, referred to as phase space.As we are not interested in the momentum component of phase space, we marginalize over the momenta by simply dropping them.This results in samples drawn from σ(m).
If one could solve Hamilton's equations exactly, every proposed state (after burn-in) would be a valid sample of σ(m).Since Hamilton's equations for non-linear forward models cannot be solved analytically, the system must be integrated numerically.Suitable integrators are symplectic, meaning that time reversibility, phase space partitioning and volume preservation are satisfied (Fichtner et al., 2019, Neal, 2012).In this work, we employ the leapfrog method as described in Neal ( 2012), with higher order symplectic integrators also implemented.However, the Hamiltonian is generally not preserved exactly when explicit time-stepping schemes are used (e.g., Simo et al., 1992).Therefore, the time evolution generates samples not exactly proportional to the original distribution.A Metropolis-Hastings correction step is therefore applied at the end of numerical integration.
In summary, at each iteration, samples are generated starting from a randomly drawn model m in the following way: The mass matrix M is one of the important tuning parameters of the HMC algorithm; details on its meaning and suggestions for tuning can be found in Fichtner et al. (2019Fichtner et al. ( , 2021)).Moreover, employing the discrete leapfrog integrator implies that there are two additional parameters that need to be tuned, namely the time step and the number of iterations L (Neal, 2012).

HMC variants
The algorithm described so far is the simplest version of HMC, however, several variants of the original algorithm exist, which mostly aim at automatically tuning some of the parameters or improving mixing (Fichtner et al., 2021, Neal, 2012, Sambridge, 2014).
A notable example is the No U-Turn Sampler (NUTS) (Hoffman & Gelman, 2014), which aims at providing an automatic tuning of the two leapfrog integrator-related parameters, and L. NUTS finds a suitable value for during the burn-in and then fixes it for the following iterations to avoid breaking the detailed balance property.The number of iterations L instead is dynamically adjusted ("dynamic HMC") at each iteration in a way such that there is no doubling back of the trajectory.This allows for long or short moves depending on the region of the model space which the algorithm is visiting.NUTS is implemented in HMCLab following Hoffman & Gelman (2014).
Additionally, methods to investigate inverse problems that might show strongly isolated modes exist.By running multiple chains with tempered (i.e.smoothed) posterior PDFs and letting these samplers exchange states, the exploration of local minima might be accelerated (Sambridge, 2014).Tempered trajectories following Neal (2012) may also help discovering isolated modes.This variation does not require multiple Markov chains, nonetheless, it is able to more easily transition between local minima, at the expense of a reduced acceptance rate.

Gradient computations
As mentioned above, one important aspect of a successful HMC strategy is the capability to efficiently compute the gradient of the potential energy ∇U (m) = ∇(− log(σ(m))).The first method that proves powerful for relatively simple models is to evaluate derivatives analytically.This typically allows for cheap computation of the gradients, but is only applicable to models that can be analytically differentiated.This is most notably used for the joint non-linear source location and medium velocity estimation as mentioned later in this manuscript.For larger problems, a tool that can provide a substantial help in making gradient calculations efficient is the adjoint method (e.g., Fichtner et al., 2006, Hinze et al., 2008, Lions, 1971, Plessix, 2006, Talagrand & Courtier, 1987, Tarantola, 1984, Tromp et al., 2005).This strategy allows us to compute the gradient ∇U with a computational cost of about two (three in practice) forward simulations, much cheaper than other approaches such as finite difference methods.We employ the adjoint technique in some of our geophysical problems, namely in the case of acoustic and elastic full waveform inversion and for the nonlinear traveltime problem (eikonal solver).
Another useful tool to compute the gradient for certain problems is automatic differentiation (e.g., Griewank & Walther, 2008, Sambridge et al., 2007), a computational technique where derivatives of a user-coded function are provided automatically by the software in the form of a function.This technique can be convenient for problems where it is difficult to derive the adjoint equations (e.g., when the forward operator is not self-adjoint) or where the forward model needs to be adapted for each specific case because it depends, e.g., on the specific rock types present in the area under study, requiring a re-derivation of the analytical derivatives (such as rock physics models (Mavko et al., 2003)).We use this tool, e.g., for the problem of inversion of amplitude-versus-angle (AVA) seismic reflection data, where the forward modelling is a combination of a rock physics model (e.g., Mavko et al., 2003) and a convolutional seismic model.

Prior information
Prior information plays an important role in solving inverse problems by providing additional information directly on the model parameters to better constrain plausible values for the solution and helping to mitigate the non-uniqueness (e.g., Curtis & Lomax, 2001a, Hansen et al., 2012, 2016, Scales & Tenorio, 2001, Zunino et al., 2015).In the probabilistic approach, prior information is represented by a PDF ρ(m) on the model parameters.
HMCLab provides a set of common PDFs for the prior, ranging from simple multivariate Gaussian distributions to more complex distributions such as a combination of Beta PDF-based marginals with a Gaussian copula to correlate the marginals.Another interesting prior is based on the Laplace distribution (related to the L1-norm) which promotes sparse (or blocky) models.Moreover, the user can provide his/her own prior by simply implementing functions with the appropriate signature (see the code documentation for more details).Any of the available priors can be combined with any of the available or user-generated forward models.

Inferring complex information about the subsurface with HMC-Lab
The HMCLab framework allows us to solve diverse inverse problems using sampling methods under a common platform.The software package includes a set of pre-defined geophysical forward and inverse problems, a set of prior distributions and allows the user to supply his/her own forward problem.In the following we show some examples of how to extract useful information about the subsurface for a set of selected geophysical inverse problems in the framework of the HMC method.
Once a collection of samples from the posterior distribution has been obtained, in order to calculate some arbitrary function φ(m) of m, we can use the following relationship: where N is the number of available samples and m i represents one of the posterior models.

2D full waveform acoustic inversion of reflection data
The first example is a 2D inversion of a seismic dataset based on the acoustic approximation.The forward problem is represented by the constant-density acoustic wave equation: where t is time, x and z the spatial coordinates, u is the pressure field, v the acoustic velocity and s the source term.Forward calculations are carried out using a finite-difference scheme (Bunks et al., 1995, Pasalic & McGarry, 2010), where the model parameters are velocity at a set of grid points with size (N x × N z ) = (160 × 90) for the x-and z-direction respectively, for a total of 14400 model parameters.The grid spacing is 10 m in both directions.We assume the observational errors to be Gaussian distributed.Therefore, we use an L 2 -norm potential energy function.The gradient of such a misfit function with respect to velocity is computed by means of the adjoint method for the acoustic wave equation, as described in Bunks et al. (1995).The use of the adjoint method enables us to efficiently evaluate the gradient, an essential prerequisite for being able to perform an HMC inversion.The geometry of the problem resembles the one typically found in exploration seismology, where active sources and receivers are located near the surface of the Earth, as shown in Fig. 1.The top boundary condition is a free surface, while the other sides are absorbing boundaries implemented as C-PML layers (Komatitsch & Martin, 2007).We use a set of 6 sources to generate synthetic data, add correlated Gaussian noise (standard deviation 0.05, correlation length 0.01 s) and use the result as the observed data to be inverted for the velocity model.To perform the inversion we use the NUTS algorithm (Hoffman & Gelman, 2014), part of HMCLab.We ran 2 × 10 5 iterations of the NUTS algorithm, collecting about 45000 samples after thinning the chain and removing the models resulting from the burn-in phase.The starting velocity model is laterally homogeneous (see Fig. 1b).The target model is a modified version of the SEG/EAGE overthrust model (Aminzadeh & Brac, 1997).Fig. 1c shows a randomly chosen model from the collection of the posterior models.The model resembles the target model well, and all the different layers are visible.The potential energy decreases rapidly within the first few hundreds of iterations, when the algorithm attempts to find a model which fits the large-scale structures (see Fig. 1d).Subsequently, the misfit keeps decreasing relatively slowly for much longer.We suspect this is due to the algorithm slowly adjusting the fine-scale structures, until it reaches a relatively stable misfit value.From the resulting collection of posterior models we can extract different pieces of information.One practical example is, e.g., calculating the probability of having a layer boundary or fault at any given node of the grid.To do so, we exploit eq. 9 using an indicator function h(m) and compute which produces a value of one in case a boundary/edge is detected and zero if not.The function h(m) in this case is represented by a Canny edge detection filter (Canny, 1986), which is applied to each velocity model in the posterior collection, so that the model is transformed into a binary image of zeros and ones.Fig. 1e shows the results of such calculations.The large majority of the boundaries present in the target model appear as high probability structures in Fig. 1e, particularly at shallow depths.Finally, Fig. 1f shows a map of probability for a vertical profile of velocity at x = 1030.0m, showing the spread of the solutions.The profiles of the starting and target model are shown for comparison.

First arrival traveltime hypocenter location using fiber-optic sensing
An archetypal seismological inverse problem is the estimation of earthquake hypocenters in a medium with unknown structure and velocity using the first arrival times of the seicmic waves excited by the events.HMCLab supplies a simple approach to the hypocenter location problem that uses first arrival data, assuming a homogeneous medium.Arrival times are modelled by straight rays propagating through this homogeneous medium.The inverse problem only requires relative arrival times of a single phase.
Although the physics of this model seem simple, the strong trade-offs between location, origin time and medium velocity make it ideal for Bayesian inference methods.Especially in the presence of tradeoffs, one would expect strongly correlated posteriors, for which the HMC algorithm works particularly well.
To illustrate how HMC performs on this inverse problem, we applied the algorithm to a dataset acquired on Grimsvötn volcano in Iceland using a 12.5 km long Distributed Acoustic Sensing (DAS) fiber (Fichtner et al., 2022, Klaasen et al., 2022), for which the acquisition geometry is shown in Fig. 2. Due to the use of DAS, this dataset has approximately 1500 separate channels.We simultaneously infer the location of multiple events for which the effective medium velocity is assumed to be the same.The events are selected based on similarity in the observed move-out, as we expect these events to be relatively close.By simulatenously inferring the location of multiple events the data better constrains the medium velocity, which reduces the trade-off between origin time and medium velocity compared to inferring location, origin time and medium velocity for a single event.We define an L 2 misfit on the relative first arrivals of the picked phases, and only include those channels where the phase is picked.This means that some events have relatively less data points and therefore less importance within the inference.As prior, we use uniform distributions on location in a bounded cube of 20 km by 20 km by 10 km (width by length by depth) centered around the DAS cable.As the medium below the field site is unkown, we construct a prior on the P-wave velocity with a logarithmic uniform distribution (to take into account the fact that velocity is a positive parameter) between 340 and 7000 m/s, the extreme ends of possible medium material, i.e. air to relatively fast rock.
The results of a parallel tempered appraisal with 10 chains using HMC are given in Fig. 2. The posterior on medium velocity, seen in Fig. 2c, shows how, despite having a model with strong trade-offs in the parameters (namely origin time and medium P-wave velocity), one is still able to infer knowledge on one of these parameters, adding knowledge compared to the prior.Event 3, which was recorded on relatively few channels of the fiber-optic cable, features a high uncertainty of its location in the subsurface.This is in contrast to event 1 and 2, which seem to have a more concentrated volume of uncertainty.These results show that the posterior PDF of the location of these events is neither unimodal nor Gaussian, something which would have been difficult to deal with using deterministic methods.In general, one can see in Fig. 2b that events with fewer picks are constrained less.Examples of relevant indicator functions (Arnold & Curtis, 2018) for this inference would be the expected average depth of an event, or the probability of the medium velocity lying within a limited interval.

First arrival tomography based on the eikonal equation
Traveltime tomography is a popular approach for seismic inversion where the arrival time of seismic waves at given locations is used to infer the velocity structure of the subsurface.In this case, the forward model is represented by the eikonal equation: where t is the traveltime, c the velocity and d = 2 or 3 is the number of dimensions.Since the forward model is nonlinear, typically ray paths are computed a priori in a reference Earth model and fixed, to linearize the problem and hence solve the inverse problem with gradient-based deterministic methods.However, such strategy where a single solution is sought might miss some important information.Because of the nonlinearity of the problem, the misfit functional may feature multiple minima which cannot be detected with deterministic methods.On the contrary, a probabilistic approach may reveal different plausible velocity models to be consistent with the observed data, as we show in the following example.
The example we address here is to solve a 2D inverse problem with a geometry depicted in Fig. 3, where we have a velocity model described by 4800 cells (80 in the x-direction and 60 in the y-direction), a set of sources randomly distributed close to the bottom of the model and a set of receivers near the surface and along the left and right sides of the model.The grid spacing is 1.25 km in both directions.The forward problem is solved using a fast marching method (FMM) (e.g., Rawlinson & Sambridge, 2004, Sethian, 1996, Treister & Haber, 2016) which computes the traveltimes at each point of a grid using a finite-difference strategy.The gradient of the Gaussian misfit functional with respect to velocity is computed by means of the adjoint method (e.g., Leung & Qian, 2006, Taillandier et al., 2009, Zunino & Mosegaard, 2018).To solve the inverse problem, we ran 2 × 10 5 iterations with the NUTS algorithm.The target model is shown in Fig. 3a, while the starting model is depicted in Fig. 3b.The latter is a laterally homogeneous velocity model.Panels c and d of Fig. 3 show a randomly selected model from the posterior collection and a histogram of the velocity at a given location as obtained by looking at all samples after the burn-in period.Interestingly, the histogram shows a multi-modal distribution, where probable velocity values cluster near three different values.This means that there are three different ranges of velocity values which are highly probable, i.e., they are all compatible with the observed data.Such finding would not be possible with an optimization method where the solution is represented by a single velocity model.The potential energy (misfit) as a function of iterations (Fig. 3) features a sharp decrease in the first hundreds of iterations and then a slow descent until equilibrium is reached around 5 × 10 5 iterations.Fig. 3f shows a map of probability for a vertical profile at x = 27.5 km, together with the profile for the starting and target model.

Magnetic anomaly inversion with polygonal bodies
The last synthetic example presented is an inversion of magnetic anomaly data using a 2.75D parameterization in terms of polygonal bodies (e.g., Campbell, 1983, Rasmussen & Pedersen, 1979).The design and detailed description of the method to construct and solve this inverse problem is the subject of another paper (Zunino et al., 2022).In this setup, each polygonal body has a homogeneous magnetization and its shape is controlled by the position of its vertices, which, in this example, are the unknowns of the inverse problem.In this setup, the relation between the position of vertices (model parameters) and the magnetic response is nonlinear.The label 2.75D means that the polygonal bodies have a given finite lateral extent in the y direction that can be different in the +y and −y directions.The observed data are represented by a profile along the x direction with about 130 observation points (see Fig. 4a).To solve the inverse problem, we ran 5 × 10 4 iterations using the NUTS algorithm.The potential energy decreases rapidly in the first few hundreds of iterations (see Fig. 4b) when the polygonal bodies move   from the starting position to a more likely configuration, similar to the target model.Afterwards, the algorithm samples the posterior distribution, producing a set of posterior models.In this example the mass matrix contains non-zero off-diagonal elements in order to force the algorithm to avoid creating geologically implausible shapes.Such off-diagonal elements control the correlation of the momentum variables and hence indirectly the correlation shown in the models visited by the algorithm.A randomly selected model from the posterior collection is shown in Fig. 4a, which fits the observed data well and is also close to the shape of the target model.In the last panel of Fig. 4a a set of position of the vertices before and after the burn-in phase are depicted, showing the relatively low uncertainty in the positions for the anomaly.
4 HMCLab framework as a software package The HMCLab framework is practically implemented in a set of open-source software packages written in the Python (van Rossum, 1995) and Julia (Bezanson et al., 2017) programming languages.HMCLab is, in fact, a numerical laboratory to allow the user to experiment with sampling algorithms on different geophysical problems, ranging from purely educational examples to research-oriented studies.The aim is to provide a user-friendly framework where it is possible to experiment with various problems and algorithms and solve realistic inverse problems, either provided by HMCLab or created by the user.Table 1 summarizes the geophysical problems and prior models which are currently available in HMC-Lab.Both Julia and Python implementations are modular in that, forward modelling, gradient calculations and sampler (or optimizer) can be combined arbitrarily.Moreover, in addition to the listed problems, the sampler (or optimizer) can be used on user-defined problems.The main categories of inverse problems currently addressed by HMCLab are: • linearized (straight rays) and nonlinear (eikonal equation) traveltime tomography, • full waveform inversion in 2D in the acoustic and elastic formulations (P-SV), • earthquake source location in 3D based on the straight-ray approximation, • joint (or independent) gravity and magnetic anomaly inversion in 2.75D using polygonal bodies, • amplitude versus angle (AVA) seismic data including a rock physics model in 3D.
In addition, a set of priors is provided, which can be combined with any problem.For all the above mentioned geophysical problems, HMCLab provides functions to solve forward problems and to compute the gradient of misfit functions with respect to model parameters.For all these physics, samplers are available to appraise the inverse problem.However, the functions of these physics can be used independently of the sampler, hence the user can construct his/her own inversion scheme.Also, the user may supply his/her own forward model, prior and gradient calculation code and subsequently use the available inversion algorithms by providing a minimum set of functions with the appropriate signature as described in the documentation.
HMCLab is not limited to flavors of the HMC algorithm, but includes other more traditional algorithms such as the random walk Metropolis-Hastings algorithm (Hastings, 1970, Metropolis et al., 1953), where gradients are not used.Moreover, we provide an interface (Gebraad, 2022) to the Stein Variational Gradient Descent (SVGD, Q. Liu & Wang (2019)) variational inference algorithm, providing an alternative probabilistic appraisal algorithm to the included MCMC algorithms.As already mentioned, HMCLab includes functions to compute gradients of the misfit functional, and, as such, deterministic inversions are also possible (e.g., Zunino & Mosegaard, 2019).An example is basic gradient descent, where the modes (local minima) of the defined posterior distribution can be found deterministically.As such, HMCLab also facilitates the usage of Python (e.g., Virtanen et al., 2020) and Julia optimization libraries, including popular algorithms such as the (Limited Memory) Broyden-Fletcher-Goldfarb-Shanno and Newton Conjugate Gradient algorithms (Nocedal & Wright, 2006).
Noteworthy is the inclusion of several notebooks that illustrate the basic concepts of MCMC sampling in general, applied to HMCLab in particular.This ranges from investigating the basic properties of a Markov chain, such as number of proposals, stepsizes and resulting acceptance rates, to tuning the various included algorithms and even implementing one's own inverse problems.These notebooks are available to all users to run out-of-the-box in our supplied Docker environment, but are also available without a Python or Julia interpreter as plain HTML.HMCLab homepage can be found at https://hmclab.science,while the Julia and Python versions at https://gitlab.com/JuliaGeophand https://python.hmclab.science/index.html,respectively.Finally, HMCLab is constantly updated and expanded and contributions from the community are welcomed.

Conclusions
In this work we have shown how a common framework for probabilistic inversion can be utilized to solve a diverse range of geophysical problems.In particular, the HMC method can be used to solve a variety of nonlinear geophysical inverse problems.In contrast to other MCMC algorithms, HMC exploits the information derived from the gradient of the posterior pdf to drive the sampling towards regions of high probability, hence being able to traverse the model space more efficiently.This property is very beneficial, especially for problems which allow efficient computation of the gradients, e.g., when using the adjoint method, analytical derivatives or automatic differentiation.In this paper we have shown a set of example problems including full-waveform acoustic inversion, hypocenter location, traveltime tomography and magnetic anomaly inversion using polygonal bodies.The examples presented have been solved using the software implementation HMCLab, a numerical laboratory for better understanding and solving inverse problems in a probabilistic manner, written in the high-level languages Julia and Python.By providing a collection of models as the solution of the inverse problem, HMCLab enables the user to perform a statistical analysis and retrieve desired probabilistic information.HMCLab is in constant evolution, and hopefully will be augmented by contributions from interested users.In addition to the available forward problems and priors, we made it accessible to the user to construct their own inverse problem and easily apply the methods provided by HMCLab.Moreover, several types of prior information are available, allowing the user to adequately describe prior certainties and uncertainties.
1. Propose momenta p according to the Gaussian with mean 0 and covariance matrix M; 2. Compute the Hamiltonian H of model m with momenta p; 3. Propagate m and p for some time τ to m and p, using the discretized version of Hamilton's equations and a suitable numerical integrator; 4. Compute the Hamiltonian H of model m with momenta p; 5. Accept the proposed move m → m with probability p accept = min 1, exp(H − H) .(8) 6.If accepted, use (and count) m as the new state.Otherwise, keep (and count) the previous state.Then return to 1.

Figure 1 :
Figure 1: Acoustic waves inversion for velocity.a) Target model of velocity, i.e., the one used to calculate the synthetic data.b) The starting velocity model used in the inversion.c) A randomly selected model from the collection of posterior models.d) A plot of the potential energy (misfit) as a function of iteration number, where the red dot indicates the potential energy of the model shown in panel d).e) A map of the probability of having a layer boundary of a fault computed using the collection of posterior models.f) A vertical profile of velocity showing the probability as computed from the collection of posterior models.The profile location is shown by a vertical line in panel c).

Figure 2 :
Figure 2: Source hypocenter location for data recorded on the Grimsvötn volcano.a) Overview of Iceland and location of the inset b).b) Geometry of the DAS acquisition fiber with posterior means and standard deviation ellipses oriented along principal axes.c) Marginal distribution of medium velocity prior and posterior to the inference.Note that the prior on medium velocity extends beyond the range of the plot.d)-f) Marginal distributions for the Y and Z components of the first 3 events.Note how for event 1 and 2, the volumes of uncertainty are more concentrated than for event 3.

Figure 3 :
Figure 3: Nonlinear traveltime inversion.a) Target model used to generate the "observed" data, b) starting model, c) a selected model from the posterior collection, d) histogram of velocity at (x, y) = (92, 5, 26.25) (marked by a red dot in c)), e) potential energy, f) profile of velocity showing a probability map as computed from the posterior collection of models (the profile location is shown by a vertical line in panel c)).

Figure 4 :
Figure4: Magnetic anomaly problem.a) Three panels showing (from top to bottom): 1) the observed magnetic anomaly, the one calculated from the starting and selected models, 2) the starting, target and selected polygonal bodies including topography and position of measurements and 3) a scatter plot of the position of the vertices before and after the burn-in phase (every 10 iterations).b) A plot of the potential energy (log.scale) as a function of iteration number.

Table 1 :
Snapshot of currently available inverse problems and priors.HMCLab is constantly evolving, therefore changes are expected.1-2-3D refers to the physical dimensions of the problem, "parall."means a parallelized (multi-core) implementation is available.