Summary

Estimating the model parameters from measured data generally consists of minimizing an error functional. A classic technique to solve a minimization problem is to successively determine the minimum of a series of linearized problems. This formulation requires the Fréchet derivatives (the Jacobian matrix), which can be expensive to compute. If the minimization is viewed as a non-linear optimization problem, only the gradient of the error functional is needed. This gradient can be computed without the Fréchet derivatives. In the 1970s, the adjoint-state method was developed to efficiently compute the gradient. It is now a well-known method in the numerical community for computing the gradient of a functional with respect to the model parameters when this functional depends on those model parameters through state variables, which are solutions of the forward problem. However, this method is less well understood in the geophysical community. The goal of this paper is to review the adjoint-state method. The idea is to define some adjoint-state variables that are solutions of a linear system. The adjoint-state variables are independent of the model parameter perturbations and in a way gather the perturbations with respect to the state variables. The adjoint-state method is efficient because only one extra linear system needs to be solved.

Several applications are presented. When applied to the computation of the derivatives of the ray trajectories, the link with the propagator of the perturbed ray equation is established.

Introduction

One of the important tasks in data processing consists of determining model parameters from observed data. These tasks can be formulated as inverse problems, namely as the minimization of a functional, for instance the least-squares misfit between synthetic and observed data. In geophysics, this includes tomography, migration/inversion and automatic velocity analysis. When a local (descent) optimization technique, such as the conjugate gradient method, is used, the gradient of the functional is required, Gauthier et al. (1986), Liu & Bleistein (2001) and Mora (1989). The efficiency of the method greatly depends on the accuracy and efficiency of the computation of this gradient. With physical problems, the functional depends on so-called state variables. These state variables are the variables computed from the state equations, namely the equations that define the problem, sometimes called forward equations. For example, in the tomography problem, the state equations can be the ray equations, and the state variables the spatial coordinates and the slowness vectors describing the ray trajectories. The definition of the state variables depends on the mathematical formulation of the physical problem. The state equations depend on the model parameters. For the tomography problem this can be the background velocity (or slowness). The functional depends on those model parameters mainly through the dependency on the state variables.

The gradient of the functional, which depends on a set of state variables, solutions of the forward equations, can be obtained with (a set of) the Fréchet derivatives of the state variables. The Fréchet derivatives are the derivatives of the state variables with respect to the model parameter. For instance, for the tomography problem described earlier, these derivatives are the derivatives of the spatial coordinates and the slowness vectors with respect to the slowness background. This gives the so-called Jacobian or sensitivity matrix. The Jacobian matrix can be used to linearize the functional, and the minimization problem can be solved by successively solving linearized problems using linear optimization techniques. However, the computation of the Fréchet derivatives can be expensive.

If non-linear optimization techniques, such as the non-linear conjugate gradient method, are used, only the gradient of the functional may be needed, Gill et al. (1981). In the 1970s, a method based on the adjoint state was introduced in the theory of inverse problems by Chavent (1974) to efficiently compute the gradient of a functional without the Fréchet derivatives. This approach originated from control theory, Lions (1972). Several authors in geophysics have applied this method, for instance, Lailly (1983), Bécache (1992), Chavent & Jacewitz (1995), Plessix et al. (1999) and Shen et al. (2003). The goal of this note is to review this method that is well known in the numerical community, to give a recipe for applying it based on an augmented functional, also called the associated Lagrangian, and to describe several examples demonstrating its practical use.

The adjoint-state method is a general method to compute the gradient of a functional that depends on a set of state variables, which are solutions of the forward equations. The adjoint-state variables are the solutions of an adjoint linear system and can be seen as variables that gather a global measure of the perturbation of the problem with respect to the state variables. Numerically, this approach is attractive because only one extra linear system needs to be solved, and often the computation of the gradient with respect to the model parameters is equivalent to one or two evaluations of the forward modelling. This cost is often almost independent of the number of model parameters, which is not always the case when the Fréchet derivatives are computed. However, the adjoint-state method does not provide the sensitivity of the solution to errors. For that, the Fréchet derivatives are needed, or a Monte Carlo-type method with a large number of forward computations is required.

The outline of the paper is the following. In the first section, the adjoint-state variables are introduced from perturbation theory. Then, based on an augmented functional, a recipe to systematically define the adjoint-state equations is described. In the next three sections, examples are given. The first one is the least-squares migration, Tarantola (1987). This is almost a textbook example. The functional is the least-squares misfit between the synthetics and the measured reflection seismic data. The adjoint states correspond to the backpropagated field, and the gradient of this least-squares misfit is a migration operator, Lailly (1983) and Tarantola (1984). The second example is the computation of the gradient of the differential semblance optimization (DSO) functional, Symes & Carazzone (1991) and Shen et al. (2003). This example shows the power of the adjoint-state technique. The third example is stereotomography, Billette & Lambaré (1998) and Lambaré et al. (2004). The link between the adjoint-state variables and the propagator of the differential equation defining the ray trajectory perturbations, Cervený (2001) and Farra & Madariaga (1987), is established.

Method

The goal of this section is to explain the adjoint-state method for computing the gradient of a functional, J(m), when J depends on u(m). J is defined with the functional, h, by

 
(1)  J(m) = h(u(m), m)

The state variables, u, satisfy the state equations defined with the mapping, F,  

(2)  F(u, m) = 0
F is also called the forward problem or forward equation. m is the model parameter and belongs to the model parameter space M. M is a real space in this article. u belongs to the state variable space, U. U is a real or complex space. A state variable, u, is a physical realization if F(u, m) = 0. F is a mapping from U × M to U. In order to distinguish between a physical realization and any element of U, the elements of U are denoted by ũ. h is a functional from U × M to R, the real space, whereas J is a functional from M to R. Synthetic data are generally a subset of the state variables. It is assumed that h, F and J are at least continuously differentiable and that u(m) is uniquely defined and continuously differentiable.

A simple example is the linear case h(ũ, m) = ½ ‖ũ − d‖² and F(ũ, m) = ũ − Am, with d the observed data. This corresponds to the simple least-squares misfit with a linear forward problem. The physical realization is defined by u(m) = Am, with A a linear operator, in the discrete case a rectangular matrix.

The adjoint-state method from the perturbation theory

A perturbation, δm, of the model parameter, m, induces a perturbation, δu, of the physical realization, u, and a perturbation, δJ, of the error functional, J. u + δu should be a physical realization with the model parameter m + δm. Therefore, to the first order:

(3)  F(u + δu, m + δm) = 0

Since F(u, m) = 0, the first-order development gives:
(4)  (∂F/∂u) δu = −(∂F/∂m) δm
For the linear case this gives δu = Aδm.

The first-order development of J gives:

(5)  δJ = ⟨∂h/∂u, δu⟩_U + ⟨∂h/∂m, δm⟩_M

where ⟨,⟩_U is the scalar product in U and ⟨,⟩_M is the scalar product in M.

For the simple least-squares misfit, δJ = ⟨u − d, δu⟩_U.

Assuming that for any model parameter m of M there exists a unique solution u of U, u + δu is the unique solution of F(u + δu, m + δm) = 0. Therefore, at the first order, δu is the unique solution of eq. (4) and can be written with the inverse of ∂F/∂u. This gives:

(6)  δJ = ⟨∂h/∂u, −(∂F/∂u)⁻¹ (∂F/∂m) δm⟩_U + ⟨∂h/∂m, δm⟩_M
     = ⟨−((∂F/∂u)*)⁻¹ ∂h/∂u, (∂F/∂m) δm⟩_U + ⟨∂h/∂m, δm⟩_M

(* denotes the adjoint.) For the linear example with the least-squares misfit, this gives δJ = ⟨u − d, Aδm⟩ since ∂F/∂u = I in this case; I is the identity operator.

In the second line of eq. (6), the terms that do not depend on the perturbation δm have been gathered; the idea is to avoid the computation of the Fréchet derivatives, ∂u/∂m, because this can be expensive. Let us now define λ by

(7)  (∂F/∂u)* λ = ∂h/∂u
The perturbation δJ now reads:
(8)  δJ = −⟨λ, (∂F/∂m) δm⟩_U + ⟨∂h/∂m, δm⟩_M
For the linear example with the least-squares misfit, this simply gives λ = u − d and δJ = ⟨u − d, Aδm⟩_U.

λ belongs to the dual space of U. It is called the adjoint-state variable and eq. (7) is the adjoint-state equation. This is a system of linear equations. The linear operator is the adjoint of the operator formed by the derivatives of the state equations (the mapping F) with respect to the state variables. The right-hand side consists of the derivatives of the functional, h, with respect to the state variables. In a sense the adjoint states gather the information on the perturbations of the state variables, viewed as independent variables. The computation of ∇_m J with eqs (7) and (8) is called the adjoint-state method.

The gradient of J can be computed either via the Fréchet derivatives of u with eqs (4) and (5) or via the adjoint-state method with eqs (7) and (8). The important differences between the two approaches are related to eqs (4) and (7). Indeed, on one hand, eq. (7) is independent of δm and needs to be solved only once. On the other hand, the right-hand side of eq. (4) depends on δm and this equation needs to be solved for each perturbation to obtain ∂u/∂m, namely M times if M is the number of elements in m. The computational time of the adjoint-state method is often almost independent of M, because the time to compute eq. (8) is often negligible compared with the time to solve eq. (7). This makes this approach very efficient. Eq. (7) depends on the adjoint of ∂F/∂u evaluated at (u, m); this means that u should be completely known before solving it.

The adjoint-state equations can also be obtained with the use of an augmented functional, also called associated Lagrangian.

A recipe with the augmented functional

Let us define the augmented functional, 𝓛, from U × U* × M to R (U* is the dual of U) by:

(9)  𝓛(ũ, λ̃, m) = h(ũ, m) − ⟨λ̃, F(ũ, m)⟩_U

where λ̃ is any element of U* and, therefore, does not depend on m, as ũ is any element of U.

u is a physical realization, therefore F(u, m) = 0, and for any λ̃

(10)  J(m) = 𝓛(u(m), λ̃, m)

and since λ̃ is independent of m,
(11)  ∇_m J = ∂𝓛/∂m (u, λ̃, m) + ⟨∂𝓛/∂ũ (u, λ̃, m), ∂u/∂m⟩_U
We can then choose λ in U* such that:
(12)  ∂𝓛/∂ũ (u, λ, m) = 0
This equation is identical to eq. (7) and is the adjoint-state equation. With this choice we retrieve the result of eq. (8):
(13)  ∇_m J = ∂𝓛/∂m (u, λ, m) = ∂h/∂m (u, m) − ⟨λ, ∂F/∂m (u, m)⟩_U

𝓛 can also be viewed as the Lagrangian associated with the minimization problem: find the minimum of h(ũ, m) under the constraint F(ũ, m) = 0. The theory of optimization with equality constraints, Ciarlet (1989), tells us that u is the minimum if (u, λ) is a saddle point of 𝓛. λ are called the Lagrange multipliers. At the saddle point the derivatives of 𝓛 are equal to 0. The derivatives of 𝓛 with respect to ũ and λ̃ are:

(14)  ∂𝓛/∂ũ = ∂h/∂ũ − (∂F/∂ũ)* λ̃
      ∂𝓛/∂λ̃ = −F(ũ, m)

Therefore, ∂𝓛/∂λ̃ = 0 gives the state equations and ∂𝓛/∂ũ = 0 gives the adjoint-state equations, and ∇_m J = ∂𝓛/∂m, as seen previously. Notice again that when differentiating 𝓛 with respect to m, ũ and λ̃ are independent of m.

This link with optimization theory is not needed to apply the adjoint-state method; for those familiar with this theory, it helps to recall the method. As in optimization theory with equality constraints, where one scalar Lagrange multiplier is associated with each scalar equation defining the constraints, one scalar adjoint state is associated with each scalar equation defining the mapping F in the augmented functional.

The computation of the gradient with the adjoint states can be summarized in the following recipe when u has been found from F(u, m) = 0:

  • (i)

    Build the augmented functional (associated Lagrangian), 𝓛. 𝓛, a functional of the independent variables ũ, λ̃ and m, is defined by

    (15)  𝓛(ũ, λ̃, m) = h(ũ, m) − ⟨λ̃, F(ũ, m)⟩_U

    If F is composed of N scalar equations, λ̃ is a vector with N components, since an adjoint state is associated with each scalar equation of F, and ⟨λ̃, F(ũ, m)⟩_U is defined by:
    (16)  ⟨λ̃, F(ũ, m)⟩_U = Σ_{k=1..N} λ̃_k F_k(ũ, m)
    For the linear case with the least-squares misfit, 𝓛(ũ, λ̃, m) = ½ ‖ũ − d‖² − ⟨λ̃, ũ − Am⟩_U.

  • (ii)

    Define the adjoint-state equations. The adjoint-state equations are simply defined by ∂𝓛/∂ũ = 0, where the derivatives are evaluated at the point (u, λ). This gives

    (17)  ∂h/∂ũ (u, m) − (∂F/∂ũ (u, m))* λ = 0

    or
    (18)  (∂F/∂u (u, m))* λ = ∂h/∂u (u, m)
    The solution of this system determines the adjoint state, λ. For the linear case with the least-squares misfit, λ = u − d.

  • (iii)

    Computation of the gradient of J. The gradient of J consists of the derivatives of 𝓛 with respect to m:

    (19)  ∇_m J = ∂𝓛/∂m (u, λ, m)

    or
    (20)  ∇_m J = ∂h/∂m (u, m) − ⟨λ, ∂F/∂m (u, m)⟩_U
    To compute the derivative of the augmented functional, we recall that ũ and λ̃ are independent of m.

    For the linear case with the least-squares misfit, ∇_m J = A*(Am − d).

If u has complex values, since J(m) is real, the real part should be taken in the right-hand side of eqs (19) and (20).

The linear example with the least-squares misfit is a trivial example. In the next sections more complicated examples are described.
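To make the recipe concrete, here is a minimal numerical sketch (in Python/NumPy; it is not part of the original derivation, and the operator A, the data d and the model m are arbitrary illustrative inputs). It applies the three steps to h(ũ) = ½ ‖ũ − d‖² and F(ũ, m) = ũ − Am, and checks the adjoint-state gradient A*(Am − d) against a finite-difference approximation of ∇J.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 5, 3                      # state and model dimensions (illustrative)
A = rng.standard_normal((n, M))  # linear forward operator
d = rng.standard_normal(n)       # observed data
m = rng.standard_normal(M)       # current model

# Solve the state equation F(u, m) = u - A m = 0.
u = A @ m

# Step (ii): adjoint-state equation (dF/du)* lambda = dh/du, i.e. lambda = u - d.
lam = u - d

# Step (iii): gradient = dh/dm - (dF/dm)* lambda = A* lambda, since dF/dm = -A.
grad = A.T @ lam

# Finite-difference check of the gradient of J(m) = 0.5 ||A m - d||^2.
def J(m):
    return 0.5 * np.sum((A @ m - d) ** 2)

eps = 1e-6
fd = np.array([(J(m + eps * e) - J(m - eps * e)) / (2 * eps) for e in np.eye(M)])
print(np.allclose(grad, fd, atol=1e-5))  # True
```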

Least-Squares Migration

In this section, we formulate migration as an inverse problem. The problem consists of minimizing, with respect to the square of the slowness, the least-squares misfit between the synthetics, obtained by solving the wave equation, and the recorded (observed) reflection seismic data. The minimization of J should give the exact slowness. Unfortunately, in practice, J has many local minima, and a gradient optimization will only provide the best perturbation of the initial model inside a certain basin of attraction. This basin is generally not the basin of the global minimum, Gauthier et al. (1986). This is the reason why this problem is called the least-squares migration problem in this paper.

This is a good example for understanding how the adjoint-state method can be applied. It also allows us to demonstrate again that the gradient of J is a migration. This was discovered in the 1980s, Lailly (1983) and Tarantola (1984).

We will develop the computation for multiple sources and multiple receivers, first in the frequency domain because it is simple, then in the time domain.

Frequency domain

In the frequency domain, the wave equation operator reads L = −ω²σ² − Δ, with σ the slowness. Note that the dependency on the spatial coordinates, x, is not written. The finite-difference discretization of L u_s = f_s with given boundary conditions leads to a complex linear system, Marfurt (1984):

(21)  A(m) u_s = f_s
A is a complex matrix of size n by n, where n is the total number of discretization points of the earth model. f_s, a complex vector of n elements, represents the source function at the source point s. u_s, a complex vector of n elements, corresponds to the pressure field due to the shot at s. The model parameter, m, is a vector of M elements, and represents the values of the squared slowness at the discretization points.

The least-squares functional is

(22)  J(m) = ½ Σ_ω Σ_s Σ_r ‖S_{s,r} u_s(ω) − d_{s,r}(ω)‖²
d_{s,r} are the data recorded at the receiver position r due to the source f_s. S_{s,r} is the restriction matrix onto the receiver r of the shot s.

The augmented functional reads, with ũ = (ũ_s) and λ̃ = (λ̃_s) (the dependence on x is not written, but ũ_s and λ̃_s depend on the space variables x):

(23)  𝓛(ũ, λ̃, m) = ½ Σ_ω Σ_s Σ_r ‖S_{s,r} ũ_s − d_{s,r}‖² − Σ_ω Σ_s Re ⟨λ̃_s, A(m) ũ_s − f_s⟩_x

where ⟨,⟩_x is the scalar product in Cⁿ. The λ̃_s are complex vectors of n elements, since the forward system, eq. (21), contains n scalar equations.

The derivative of 𝓛 with respect to ũ_s evaluated at (u, λ̃) gives:

(24)  ∂𝓛/∂ũ_s = Σ_r S*_{s,r} (S_{s,r} u_s − d_{s,r}) − A*(m) λ̃_s

The adjoint state is defined by ∂𝓛/∂ũ_s = 0:
(25)  A*(m) λ_s = Σ_r S*_{s,r} (S_{s,r} u_s − d_{s,r})
There is one adjoint system per shot and per angular frequency.

The matrix A propagates the shot into the earth and u_s is the incident field originating at s. The adjoint of A propagates its source term backwards, Lailly (1983). The source term, the right-hand side of eq. (25), is the sum over the receivers of the shot s of the residuals between the synthetics and the data. λ_s is then the backpropagated residual field.

The gradient of J is:

(26)  ∇_m J = −Σ_ω Σ_s Re ⟨λ_s, (∂A/∂m) u_s⟩_x

The gradient of J is a vector of M elements. If we impose that m is discretized on the same grid as u_s, M = n. Outside the boundary points, ∂A/∂m(x) is equal to −ω². At the discretization point x, we obtain
(27)  ∇J(x) = Σ_ω Σ_s ω² Re( λ̄_s(ω, x) u_s(ω, x) )
Up to a multiplication factor, the gradient is similar to a migrated image and the formula is kinematically similar to the imaging principle, Claerbout (1985). A demonstration of this result without the adjoint-state method can be found in Plessix & Mulder (2004).
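As a complement, the following minimal 1-D sketch illustrates eqs (21), (25) and (27) for a single shot and a single frequency. It is an assumption-laden toy rather than a practical implementation: A(m) = −ω² diag(m) + D − iηI is a damped Helmholtz-type matrix, with D a second-difference (minus) Laplacian on a unit grid with Dirichlet boundaries and η a small absorption term; the receivers sit at every grid point (S = I) and the observed data are a dummy zero trace.

```python
import numpy as np

n, w, eta = 50, 2.0 * np.pi, 0.5           # grid size, angular frequency, damping
D = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # minus Laplacian
m = np.full(n, 0.25)                       # squared slowness (v = 2)
f = np.zeros(n, dtype=complex); f[2] = 1.0 # point source near the left boundary

def A(m):
    return -w**2 * np.diag(m) + D - 1j * eta * np.eye(n)

u = np.linalg.solve(A(m), f)                 # incident field, eq. (21)
d = np.zeros_like(u)                         # dummy observed data
lam = np.linalg.solve(A(m).conj().T, u - d)  # backpropagated residual, eq. (25)
grad = w**2 * np.real(np.conj(lam) * u)      # gradient per grid point, eq. (27)

# Finite-difference check of one component of the gradient.
def J(m):
    return 0.5 * np.sum(np.abs(np.linalg.solve(A(m), f) - d) ** 2)

k, eps = n // 2, 1e-6
e = np.zeros(n); e[k] = eps
print(grad[k], (J(m + e) - J(m - e)) / (2 * eps))  # the two values should agree
```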

Time domain

We now develop the same approach in the time domain. The application of the adjoint-state method is slightly more complicated because of the initial conditions.

The wave operator is L = σ² ∂²/∂t² − Δ. With the initial conditions, the pressure field u_s due to the source f_s satisfies:

(28)  σ² ∂²u_s/∂t² − Δu_s = f_s
      u_s(x, 0) = 0
      ∂u_s/∂t (x, 0) = 0
u_s and f_s depend on the time and on the spatial coordinates.

The least-squares functional reads

(29)  J(m) = ½ Σ_s Σ_r ∫₀ᵀ dt ‖S_{s,r} u_s(t) − d_{s,r}(t)‖²

T is the recording time. S_{s,r} is the restriction operator onto the receiver position; it depends on the spatial coordinates. The model parameter is the squared slowness, m = σ².

In the time domain u_s is real. For simplicity, we do not mention the spatial boundary conditions.

We associate the adjoint states µ̃_{0s} and µ̃_{1s} with the initial conditions, and λ̃_s with the wave equation. The augmented functional is defined by:

(30)  𝓛(ũ, λ̃, µ̃₀, µ̃₁, m) = ½ Σ_s Σ_r ∫₀ᵀ dt ‖S_{s,r} ũ_s − d_{s,r}‖²
      − Σ_s ∫₀ᵀ dt ⟨λ̃_s, m ∂²ũ_s/∂t² − Δũ_s − f_s⟩_x
      − Σ_s ⟨µ̃_{0s}, ũ_s(0)⟩_x − Σ_s ⟨µ̃_{1s}, ∂ũ_s/∂t (0)⟩_x

with ⟨,⟩_x the real scalar product in the coordinate space.

After two integrations by parts:

(31)  ∫₀ᵀ dt ⟨λ̃_s, m ∂²ũ_s/∂t²⟩_x = ∫₀ᵀ dt ⟨m ∂²λ̃_s/∂t², ũ_s⟩_x
      + ⟨m λ̃_s(T), ∂ũ_s/∂t (T)⟩_x − ⟨m ∂λ̃_s/∂t (T), ũ_s(T)⟩_x
      − ⟨m λ̃_s(0), ∂ũ_s/∂t (0)⟩_x + ⟨m ∂λ̃_s/∂t (0), ũ_s(0)⟩_x
With eqs (30) and (31) we can now compute the derivatives with respect to ũ_s and evaluate them at (u, λ) to obtain the adjoint-state equations:
(32)  m ∂²λ_s/∂t² − Δλ_s = Σ_r Sᵀ_{s,r} (S_{s,r} u_s − d_{s,r})
      λ_s(T) = 0
      ∂λ_s/∂t (T) = 0
(ᵀ denotes the transpose.)

The gradient of J at the point x is

(33)  ∇J(x) = −Σ_s ∫₀ᵀ dt λ_s(x, t) ∂²u_s/∂t² (x, t)
The adjoint states, µ_{0s} and µ_{1s}, do not play a role in the gradient of J. We can ignore them.

The system (32) has final conditions. To solve it, the computation is done backwards in time from T to 0. To give a physical sense to the adjoint state and to interpret the integral, eq. (33), a new adjoint state, q_s, is defined by a change of variables in the time axis:

(34)  q_s(t) = λ_s(T − t)

The new adjoint-state system reads
(35)  m ∂²q_s/∂t² − Δq_s = Σ_r Sᵀ_{s,r} (S_{s,r} u_s − d_{s,r})(T − t)
      q_s(0) = 0
      ∂q_s/∂t (0) = 0
q_s satisfies the same wave equation as u_s but with a different source term. S_{s,r} u_s(T − t) − d_{s,r}(T − t) is the residual, the difference between the synthetics and the recorded data, in reverse time. Eq. (35) propagates the residual into the earth starting from the final time. q_s is called the backpropagated field of the residual. The gradient of J now reads:
(36)  ∇J(x) = −Σ_s ∫₀ᵀ dt q_s(x, T − t) ∂²u_s/∂t² (x, t)

This result has been demonstrated by Lailly (1983).
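A schematic 1-D sketch of this procedure is shown below (Python/NumPy; the grid, wavelet, source and receiver positions are illustrative assumptions, and the discrete cross-correlation only approximates the continuous formula, eq. 33). It performs the forward sweep of eq. (28) with a leapfrog scheme, solves the adjoint system of eq. (32) backwards in time with the residual as source term (which is the backpropagation of eq. 35), and accumulates the gradient of eq. (33) on the fly.

```python
import numpy as np

nx, nt, dx, dt = 200, 1000, 5.0, 1e-3       # grid and time stepping (illustrative)
m = np.full(nx, 1.0 / 2000.0**2)            # squared slowness (v = 2000 m/s)
src, rec = 20, 180                          # source and receiver grid indices
t = np.arange(nt) * dt
f = np.exp(-((t - 0.05) / 0.01) ** 2)       # source wavelet
d_obs = np.zeros(nt)                        # dummy observed trace

def laplacian(u):
    out = np.zeros_like(u)
    out[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    return out

# Forward sweep: m u_tt - Laplacian(u) = f, zero initial conditions, eq. (28).
u = np.zeros((nt, nx))
for it in range(1, nt - 1):
    rhs = np.zeros(nx); rhs[src] = f[it]
    u[it + 1] = 2.0 * u[it] - u[it - 1] + dt**2 / m * (laplacian(u[it]) + rhs)

res = u[:, rec] - d_obs                     # residual at the receiver

# Backward sweep: same scheme run from t = T to 0 with zero final conditions
# and the residual as source, eq. (32); the gradient of eq. (33) is
# accumulated as grad(x) = -sum_t lam(x, t) u_tt(x, t) dt.
lam = np.zeros((nt, nx))
grad = np.zeros(nx)
for it in range(nt - 2, 0, -1):
    rhs = np.zeros(nx); rhs[rec] = res[it]
    lam[it - 1] = 2.0 * lam[it] - lam[it + 1] + dt**2 / m * (laplacian(lam[it]) + rhs)
    u_tt = (u[it + 1] - 2.0 * u[it] + u[it - 1]) / dt**2
    grad -= lam[it] * u_tt * dt
```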

Shot-based Differential Semblance Optimization

As explained in the introduction of the previous section, the least-squares formulation is not satisfactory for retrieving the long wavelength components of the velocity model (background) from reflection seismic data, because the least-squares misfit as a function of the background has many local minima. To reformulate the problem and obtain a larger basin of attraction for the global minimum, an idea is to exploit the fact that in reflection seismic data the earth is seen through different angles of incidence. If the background velocity is correct, the pre-stack migration of the data should give the same images, Al Yahya (1989). If the pre-stack migration gives different earth structures, this means that the background slowness used in the migration is erroneous. Several mathematical formulations of this idea have been proposed in the last 20 years, among them Chavent & Jacewitz (1995), Clément et al. (2001), Plessix et al. (2000) and Symes & Carazzone (1991). In order to compute the gradient of the reformulated cost functions, the authors generally use the adjoint-state formulation because it is the most systematic method, without forgetting that they are mainly mathematicians.

As an example, I will describe the gradient computation of the DSO functional introduced in Symes & Carazzone (1991) for a common shot-based approach. The principle is to migrate each shot individually and then to differentiate the pre-stack migrated result with respect to the shot position for fixed points in the migrated images. If the derivative with respect to the shot position of the pre-stack migrated data is zero, it means that the migrated images are independent of the shot position, that is, of the angle of incidence, and that the background is correct. To obtain a global formulation, the DSO functional is used as a regularization of the least-squares functional.

Using a finite-difference scheme in the frequency domain, eq. (21), the incident wavefield, u_i, due to the source function f_i located at the shot position i satisfies:

(37)  A(m) u_i = f_i
The dependency on the spatial coordinates is not explicitly written, to simplify the notation. The synthetics at the angular frequency, ω, are S_{i,j} u_i, with S_{i,j} the restriction operator onto the receiver, j, of the shot, i.

To compute the migration, we introduce the backpropagated field, v_i, defined by:

(38)  A*(m) v_i = Σ_j S*_{i,j} d_{i,j}
with d_{i,j} the measured seismic data due to the shot i recorded at the receiver j.

The shot migrated image, r_i, is then defined by:

(39)  r_i = Σ_ω ū_i v_i

Here we abuse the notation and eq. (39) means r_i(x) = Σ_ω ū_i(ω, x) v_i(ω, x), where x is a discretization point.

The functional J reads

(40)  J(m) = (α₁/2) Σ_ω Σ_i Σ_j ‖S_{i,j} u_i − d_{i,j}‖² + (α₂/2) Σ_{i=1..ns−1} ‖Re(r_{i+1} − r_i)‖²_x

ns is the number of shots. α₁ and α₂ are the weights of the least-squares functional (the first term) and the DSO functional (the second term). The real part of r_{i+1} − r_i is taken in the DSO functional because r_i is a complex number and only the real part corresponds to the migrated image.

We recall that m is the squared slowness at the discretization points and that all the state variables are differentiable functions of m. The goal is to compute the gradient of J with respect to the squared slowness.

The forward equations, eqs (37), (38) and (39), depend on the state variables u_i, v_i and r_i. u_i, v_i and r_i are discretized on the same grid; therefore, they belong to Cⁿ, with n the number of discretization points. To define the augmented functional, we associate, for each shot i, λ̃_{u,i} with eq. (37), λ̃_{v,i} with eq. (38) and λ̃_{r,i} with eq. (39). λ̃_{u,i}, λ̃_{v,i} and λ̃_{r,i} belong to Cⁿ because eqs (37), (38) and (39) each define n scalar equations. These quantities depend on ω and x. The augmented functional reads, with the state variables, (ũ_i, ṽ_i, r̃_i), and the adjoint-state variables, (λ̃_{u,i}, λ̃_{v,i}, λ̃_{r,i}):

(41)  𝓛 = (α₁/2) Σ_ω Σ_i Σ_j ‖S_{i,j} ũ_i − d_{i,j}‖² + (α₂/2) Σ_{i=1..ns−1} ‖Re(r̃_{i+1} − r̃_i)‖²_x
      − Σ_ω Σ_i Re ⟨λ̃_{u,i}, A(m) ũ_i − f_i⟩_x
      − Σ_ω Σ_i Re ⟨λ̃_{v,i}, A*(m) ṽ_i − Σ_j S*_{i,j} d_{i,j}⟩_x
      − Σ_i Re ⟨λ̃_{r,i}, r̃_i − Σ_ω conj(ũ_i) ṽ_i⟩_x
The adjoint states λ_u, λ_v and λ_r are obtained by setting the derivatives of 𝓛 with respect to (r̃_i, ũ_i, ṽ_i) to zero at the point (u, v, r, λ_u, λ_v, λ_r, m):
(42)  λ_{r,i} = −α₂ [ Re(r_{i+1} − r_i) − Re(r_i − r_{i−1}) ]   (with r₀ = r₁ and r_{ns+1} = r_{ns})
      A*(m) λ_{u,i} = α₁ Σ_j S*_{i,j} (S_{i,j} u_i − d_{i,j}) + λ_{r,i} v_i
      A(m) λ_{v,i} = λ_{r,i} u_i
The gradient of J is obtained by
(43)  ∇J(x) = Σ_ω Σ_i ω² Re( λ̄_{u,i}(ω, x) u_i(ω, x) + λ̄_{v,i}(ω, x) v_i(ω, x) )
since ∂A/∂m(x) = −ω² outside the boundary points. Notice that the products in eq. (43) are taken per discretization point x; this is why we should have written λ_{u,i}(ω, x) and u_i(ω, x).

This application shows the numerical interest of the adjoint-state method with the augmented functional. Indeed, a systematic use of the method automatically produces the result. Using the perturbation approach described in the first section can lead to the same result, but its application is a bit more difficult and cumbersome, as shown in the Appendix. A physical interpretation of the adjoint states is difficult to find. We can notice that λ_r is just the perturbation with respect to r of the DSO functional, that λ_u satisfies the adjoint wave equation and that λ_v satisfies the wave equation.
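As a small, self-contained illustration of the first adjoint-state equation of eq. (42), the sketch below (Python/NumPy; the array sizes and data are hypothetical) builds the adjoint source λ_{r,i} from a stack of shot images, with the edge conventions r₀ = r₁ and r_{ns+1} = r_{ns}.

```python
import numpy as np

def dso_adjoint_source(r, alpha2):
    """lam_r[i] = -alpha2 * (Re(r[i+1] - r[i]) - Re(r[i] - r[i-1])),
    a second difference along the shot axis, with r_0 = r_1 and
    r_{ns+1} = r_{ns} at the edges."""
    rr = np.real(r)                 # only the real part enters the DSO term
    pad = np.concatenate([rr[:1], rr, rr[-1:]], axis=0)
    return -alpha2 * (pad[2:] - 2.0 * pad[1:-1] + pad[:-2])

# ns = 6 shot images, each on n = 100 grid points (illustrative sizes).
rng = np.random.default_rng(1)
r = rng.standard_normal((6, 100)) + 1j * rng.standard_normal((6, 100))
lam_r = dso_adjoint_source(r, alpha2=1.0)   # one real adjoint source per shot
```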

Stereotomography

The last example describes an application based on the ray equations. The functional depends on the traveltimes and on other ray-based parameters. The derivatives of the traveltimes with respect to the velocity are efficiently computed by integrating the slowness perturbations along the ray. There is no real gain in introducing the adjoint states when only the derivatives of the traveltimes with respect to the velocity parameters are required, because the adjoint-state method does not give a more efficient algorithm. The derivatives of the ray trajectories can be evaluated from the paraxial ray equations and the propagator associated with this linear differential system, Cervený (2001) and Farra & Madariaga (1987). When the functional depends on the ray trajectories, the adjoint-state method provides a faster approach to the computation of the gradient of the functional. This case is illustrated with the stereotomography functional.

The purpose of stereotomography, as described in Billette & Lambaré (1998), is to retrieve the velocity background not only from traveltimes picked on the seismic data but also from the slopes of locally coherent events in the common source and common receiver gathers. This approach differs from classic traveltime tomography because the picks are interpreted independently from each other, without any association to a given interface, and they can represent either reflection or refraction events. The (observed) data are a set of source positions, x_s, receiver positions, x_r, two-way traveltimes, T^m_{sr}, the slopes, p_s, at the source locations, and the slopes, p_r, at the receiver locations.

Following Billette & Lambaré (1998), the model parameters are x_d, the subsurface reflection points; θ_s and θ_r, the take-off angles of the rays going towards the source, x_s, and towards the receiver, x_r; T_s and T_r, the traveltimes along the rays from x_d towards the source and the receiver; and the parameters (v_k) defining the continuous velocity field, v. The integration parameter along the ray is the time, t. The ray equations are:

(44)  dx_a/dt = v² p_a
      dp_a/dt = −∇v / v

With y_a = (x_a, p_a), the ray equations become
(45)  dy_a/dt = f(y_a, v)
      y_a(0) = y₀(x_d, θ_a, v)
y_a is computed from t = 0 to t = T_a. The subscript a represents s or r. y₀ is the function defining the initial conditions. The error functional is
(46)  J_sr = (c_T/2) (T_s + T_r − T^m_{sr})² + ½ ⟨C_s (y_s(T_s) − y^m_s), y_s(T_s) − y^m_s⟩ + ½ ⟨C_r (y_r(T_r) − y^m_r), y_r(T_r) − y^m_r⟩
with y^m_a = (x_a, p_a), for a = s, r, the measured positions and slopes. In eq. (46), c_T is a scalar coefficient and C_s and C_r are two diagonal matrices.

Gradient with the adjoint-state method

After the integration by parts of the terms ∫₀^{T_a} dt ⟨λ̃_a, dỹ_a/dt⟩, the augmented functional reads

(47)  𝓛 = J̃_sr − Σ_{a=s,r} ( ⟨λ̃_a(T_a), ỹ_a(T_a)⟩ − ⟨λ̃_a(0), ỹ_a(0)⟩ − ∫₀^{T_a} dt [ ⟨dλ̃_a/dt, ỹ_a⟩ + ⟨λ̃_a, f(ỹ_a, v)⟩ ] ) − Σ_{a=s,r} ⟨µ̃_a, ỹ_a(0) − y₀(x_d, θ_a, v)⟩

𝓛 is a functional of the state variables, ỹ_a, the adjoint-state variables, λ̃_a and µ̃_a, and the model parameters, T_s, T_r, θ_s, θ_r, x_d and (v_k). The derivatives with respect to the state variables, ỹ_a, give the adjoint-state equations:
(48)  dλ_a/dt + (∂f/∂y (y_a, v))ᵀ λ_a = 0
      λ_a(T_a) = C_a (y_a(T_a) − y^m_a)
with µ_a = λ_a(0).

The derivatives of the augmented functional with respect to the model parameters correspond to the derivatives of J_sr with respect to the model parameters. This yields

(49)  ∂J_sr/∂T_a = c_T (T_s + T_r − T^m_{sr}) + ⟨λ_a(T_a), f(y_a(T_a), v)⟩
      ∂J_sr/∂θ_a = ⟨λ_a(0), ∂y₀/∂θ_a⟩
      ∂J_sr/∂x_d = Σ_{a=s,r} ⟨λ_a(0), ∂y₀/∂x_d⟩
      ∂J_sr/∂v_k = Σ_{a=s,r} ( ∫₀^{T_a} dt ⟨λ_a, ∂f/∂v_k (y_a, v)⟩ + ⟨λ_a(0), ∂y₀/∂v_k⟩ )

For the computation of the derivatives with respect to T_a, we have used the fact that the ray equations are satisfied at T_a.
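The structure of eqs (45), (48) and (49) can be illustrated on a toy ODE much simpler than the ray system: a rotation dy/dt = f(y, v) with one scalar parameter v playing the role of a velocity coefficient. In the sketch below (Python/NumPy; f, f_y, f_v, the fixed initial condition and C_a = I are local illustrative assumptions, not the stereotomography equations), the state is integrated forward, the discrete adjoint is integrated backwards from its final condition, the term ∫⟨λ, ∂f/∂v⟩ dt is accumulated, and the result is checked against a finite difference.

```python
import numpy as np

def f(y, v):                 # dy/dt = f(y, v): rotation with rate v
    return np.array([v * y[1], -v * y[0]])

def f_y(y, v):               # Jacobian of f with respect to y
    return np.array([[0.0, v], [-v, 0.0]])

def f_v(y, v):               # derivative of f with respect to v
    return np.array([y[1], -y[0]])

T, n = 1.0, 2000
dt = T / n
y0, y_obs, v = np.array([1.0, 0.0]), np.array([0.0, -1.0]), 2.0

# Forward sweep (explicit Euler), storing the trajectory, as in eq. (45).
y = np.zeros((n + 1, 2)); y[0] = y0
for k in range(n):
    y[k + 1] = y[k] + dt * f(y[k], v)

# Backward sweep: discrete counterpart of eq. (48),
# dlam/dt = -(df/dy)^T lam with lam(T) = y(T) - y_obs (C = I),
# while the gradient accumulates int <lam, df/dv> dt, as in eq. (49).
lam = y[n] - y_obs
grad = 0.0
for k in range(n - 1, -1, -1):
    grad += dt * lam @ f_v(y[k], v)
    lam = lam + dt * f_y(y[k], v).T @ lam

# Finite-difference check.
def J(v):
    yy = y0.copy()
    for k in range(n):
        yy = yy + dt * f(yy, v)
    return 0.5 * np.sum((yy - y_obs) ** 2)

eps = 1e-6
print(grad, (J(v + eps) - J(v - eps)) / (2 * eps))  # the two values agree
```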

Gradient with the Fréchet derivatives

A more traditional approach to compute the gradient is to first determine the Fréchet derivatives. In this case, this means the derivatives of T_s, T_r, y_s(T_s) and y_r(T_r) with respect to T_s, T_r, x_d, θ_s, θ_r and (v_k). A usual approach to compute those derivatives is to use the paraxial ray equations, Cervený (2001) and Farra & Madariaga (1987). The perturbation δy_a of the ray y_a is obtained from the propagator P_a defined by

(50)  dP_a/dt (t) = ∂f/∂y (y_a(t), v) P_a(t)
      P_a(0) = I

This gives, Billette & Lambaré (1998),
(51)  δy_a(t) = P_a(t) ( δy_a(0) + ∫₀ᵗ dt′ P_a⁻¹(t′) (∂f/∂v)(y_a(t′), v) δv )
and the Fréchet derivatives are
(52)  ∂y_a(T_a)/∂θ_a = P_a(T_a) ∂y₀/∂θ_a
      ∂y_a(T_a)/∂x_d = P_a(T_a) ∂y₀/∂x_d
      ∂y_a(T_a)/∂v_k = P_a(T_a) ( ∂y₀/∂v_k + ∫₀^{T_a} dt P_a⁻¹(t) (∂f/∂v_k)(y_a(t), v) )
      ∂y_a(T_a)/∂T_a = f(y_a(T_a), v)
The other Fréchet derivatives are equal to 0.

The derivatives of J_sr are then simply

(53)  ∂J_sr/∂ν = Σ_{a=s,r} ⟨C_a (y_a(T_a) − y^m_a), ∂y_a(T_a)/∂ν⟩

with ν equal to v_k, x_d, θ_s or θ_r, and
(54)  ∂J_sr/∂T_a = c_T (T_s + T_r − T^m_{sr}) + ⟨C_a (y_a(T_a) − y^m_a), f(y_a(T_a), v)⟩

Relation between adjoint states and propagator

From eqs (49), (52) and (53), we deduce that

(55)  λ_a(0) = Pᵀ_a(T_a) C_a (y_a(T_a) − y^m_a)

In fact, the propagator Pᵀ_a satisfies
(56)  d(Pᵀ_a)/dt (t) = Pᵀ_a(t) (∂f/∂y)ᵀ (y_a(t), v)
Pᵀ_a is the propagator of the adjoint-state differential equation with a final condition, eq. (48). The adjoint state, λ_a, is then equal to (C_a is a diagonal matrix):
(57)  λ_a(t) = ( P_a(T_a) P_a⁻¹(t) )ᵀ C_a (y_a(T_a) − y^m_a)
The main difference between the two approaches lies in the adjoint-state system (eq. 48) and the propagator system (eq. 50). Whereas the first one is a vectorial system, the second is a matrix system. This means that the adjoint-state system is d times smaller than the propagator system, with d = 4 in 2-D problems and d = 6 in 3-D problems. The adjoint-state method is then roughly d times faster. However, the adjoint-state method does not provide the Jacobian matrix, but only the gradient of J, and the Jacobian may be used to determine the sensitivity of the solution to errors. The minimization should rely on non-linear optimization techniques, such as non-linear conjugate gradient or quasi-Newton methods, Gill et al. (1981). In Billette & Lambaré (1998), the authors solve the non-linear optimization problem by successively solving the linear problems defined by the Jacobian matrices.

Conclusion

The adjoint-state method for the gradient computation of a functional has been reviewed. The technique applies when the functional depends on the model parameters through a set of state variables, which are solutions of the forward equations. The method consists of the computation of one unique extra linear system. The linear operator is formed with the adjoint of the operator defined by the derivatives of the forward model with respect to the state variables, and the right-hand side consists of the derivatives of the functional with respect to the state variables. The adjoint-state variables are the solution of this linear system. In a sense, they gather the information of the perturbations with respect to the state variables, assuming that the state variables are independent variables. Since this linear system is independent of the derivatives with respect to the model parameters, the adjoint states have to be computed only once, making the method numerically very efficient. The gradient of the functional with respect to the model parameters is then simply the scalar product between the adjoint states and the derivatives of the forward model with respect to the model parameters.

To form this extra linear system (the adjoint-state system) a recipe based on an augmented functional has been reviewed. This provides a systematic approach. The main step in the definition of this augmented functional is to consider the state variables and the adjoint-state variables as independent variables.

Several examples have been described to show the power of this approach. For the complicated example of the DSO functional, the adjoint-state equations have also been derived directly from perturbation theory in the Appendix. This shows that the use of the augmented functional is not strictly necessary, but that it simplifies the derivation.

When the forward equations are the ray equations, the transpose of the propagator of the perturbed ray equations is the propagator of the adjoint equations. The adjoint-state method is computationally more efficient because the adjoint-state system is a vectorial system whereas the system of the propagator is a matrix system. Nevertheless, the adjoint-state method only gives the gradient, and not the Fréchet derivatives. The optimization problem should then be solved with a non-linear optimization method, such as a quasi-Newton or non-linear conjugate gradient technique.

Acknowledgment

I would like to thank Gilles Lambaré from Ecole des Mines de Paris and Colin Perkins from Shell for their comments on the manuscript.

Appendices

Appendix: DSO Gradient with the Perturbation Approach

In this appendix, we retrieve the gradient of the DSO functional directly from the perturbation approach, without the help of the augmented functional. If the model, m, is perturbed by δm, the wavefields, u_i, are perturbed by δu_i, the backpropagated wavefields, v_i, are perturbed by δv_i, the migrated images, r_i, are perturbed by δr_i, and the functional J by δJ. From eq. (40) we obtain

(A1)  δJ = α₁ Σ_ω Σ_i Σ_j Re ⟨S_{i,j} u_i − d_{i,j}, S_{i,j} δu_i⟩ + α₂ Σ_i ⟨Re(r_{i+1} − r_i), Re(δr_{i+1} − δr_i)⟩_x

(ns is the number of shots, i is the shot index, ⟨,⟩ is the scalar product in the data space and ⟨,⟩_x is the scalar product in the spatial coordinate space.)

The perturbations of the state variable equations, eqs (37), (38) and (39), give

(A2)  A δu_i = −δA u_i
      A* δv_i = −δA* v_i
      δr_i = Σ_ω ( δū_i v_i + ū_i δv_i )

The dependency on the spatial coordinates and the angular frequency is not written.

The complicated part consists in defining the adjoint-state equations. However, with some experience, this is possible. For this example, we first rewrite the second term of eq. (A1)

(A3)  α₂ Σ_i ⟨Re(r_{i+1} − r_i), Re(δr_{i+1} − δr_i)⟩_x = −α₂ Σ_i ⟨Re(r_{i+1} − r_i) − Re(r_i − r_{i−1}), Re(δr_i)⟩_x

with r₀ = r₁ and r_{ns+1} = r_{ns}. And we define
(A4)  λ_{r,i} = −α₂ [ Re(r_{i+1} − r_i) − Re(r_i − r_{i−1}) ]
Replacing δr_i by its value gives:
(A5)  Σ_i ⟨λ_{r,i}, Re(δr_i)⟩_x = Σ_ω Σ_i ⟨λ_{r,i}, Re( δū_i v_i + ū_i δv_i )⟩_x
We then gather the terms depending on δu_i and the terms depending on δv_i (λ_{r,i} is real):
(A6)  δJ = Σ_ω Σ_i Re ⟨α₁ Σ_j e_{i,j} + λ_{r,i} v_i, δu_i⟩_x + Σ_ω Σ_i Re ⟨λ_{r,i} u_i, δv_i⟩_x
with Re ⟨λ_{r,i} ū_i, δv̄_i⟩_x = Re ⟨λ_{r,i} u_i, δv_i⟩_x and e_{i,j} = S*_{i,j} (S_{i,j} u_i − d_{i,j}).

By replacing δv_i by its value in the second term of eq. (A6) we obtain

(A7)  Σ_ω Σ_i Re ⟨λ_{r,i} u_i, δv_i⟩_x = −Σ_ω Σ_i Re ⟨A⁻¹ (λ_{r,i} u_i), δA* v_i⟩_x

By replacing δu_i by its value in the first term of eq. (A6) we obtain
(A8)  Σ_ω Σ_i Re ⟨α₁ Σ_j e_{i,j} + λ_{r,i} v_i, δu_i⟩_x = −Σ_ω Σ_i Re ⟨(A*)⁻¹ (α₁ Σ_j e_{i,j} + λ_{r,i} v_i), δA u_i⟩_x
We can now define
(A9)  A*(m) λ_{u,i} = α₁ Σ_j e_{i,j} + λ_{r,i} v_i
      A(m) λ_{v,i} = λ_{r,i} u_i
The equations (A4) and (A9) are the adjoint-state equations. They are the same as the ones obtained with the use of the augmented functional in the section on the DSO functional; however, the derivation is less obvious and less systematic in this appendix.

The perturbation δJ now reads:

(A10)  δJ = −Σ_ω Σ_i ( Re ⟨λ_{u,i}, δA u_i⟩_x + Re ⟨λ_{v,i}, δA* v_i⟩_x )

And with δA(x) = δA*(x) = −ω² δm(x) outside the boundary points, we obtain:
(A11)  ∇J(x) = Σ_ω Σ_i ω² Re( λ̄_{u,i}(ω, x) u_i(ω, x) + λ̄_{v,i}(ω, x) v_i(ω, x) )

References

1. Al Yahya, K.M., 1989. Velocity analysis by iterative profile migration, Geophysics, 54, 718–729.
2. Bécache, E., 1992. Reflection tomography: how to cope with multiple arrivals? Part II: a new gradient computation method: continuous and discrete problem, in PSI Consortium Annual Report.
3. Billette, F. & Lambaré, G., 1998. Velocity macro-model estimation from seismic reflection data by stereotomography, Geophys. J. Int., 135, 671–690.
4. Cervený, V., 2001. Seismic Ray Method, Cambridge Univ. Press, New York.
5. Chavent, G., 1974. Identification of function parameters in partial differential equations, in Identification of Parameter Distributed Systems, eds Goodson, R.E. & Polis, ASME, New York.
6. Chavent, G. & Jacewitz, C.A., 1995. Determination of background velocities by multiple migration fitting, Geophysics, 60, 476–490.
7. Ciarlet, P.G., 1989. Introduction to Numerical Linear Algebra and Optimisation, Cambridge University Press, New York.
8. Claerbout, J.F., 1985. Imaging the Earth's Interior, Blackwell Scientific Publications, London.
9. Clément, F., Chavent, G. & Gomez, S., 2001. Migration-based traveltime inversion of 2-D simple structures: a synthetic example, Geophysics, 66, 845–860.
10. Farra, V. & Madariaga, R., 1987. Seismic waveform modeling in heterogeneous media by ray perturbation theory, J. geophys. Res., 92, 2697–2712.
11. Gauthier, O., Virieux, J. & Tarantola, A., 1986. Two-dimensional nonlinear inversion of seismic waveforms: numerical results, Geophysics, 51, 1387–1403.
12. Gill, P.E., Murray, W. & Wright, M.H., 1981. Practical Optimization, Academic Press, New York.
13. Lailly, P., 1983. The seismic inverse problem as a sequence of before stack migrations, in Proc. Conf. on Inverse Scattering, Theory and Applications, SIAM, Philadelphia, PA.
14. Lambaré, G., Alerini, M., Baina, R. & Podvin, P., 2004. Stereotomography: a semi-automatic approach for velocity macromodel estimation, Geophys. Prospect., 52, 671–682.
15. Lions, J., 1972. Nonhomogeneous Boundary Value Problems and Applications, Springer-Verlag, Berlin.
16. Liu, Z. & Bleistein, N., 2001. Migration velocity analysis: theory and iterative algorithm, Geophysics, 60, 142–153.
17. Marfurt, K.J., 1984. Accuracy of finite-difference and finite-element modeling of the scalar and elastic wave equations, Geophysics, 49, 533–549.
18. Mora, P., 1989. Inversion = migration + tomography, Geophysics, 54, 1575–1586.
19. Mulder, W.A. & ten Kroode, A.P.E., 2002. Automatic velocity analysis by differential semblance optimization, Geophysics, 67, 1184–1191.
20. Plessix, R.-E. & Mulder, W.A., 2004. Frequency-domain finite-difference amplitude-preserving migration, Geophys. J. Int., 157, 975–987.
21. Plessix, R.-E., de Roeck, Y.-H. & Chavent, G., 1999. Waveform inversion of reflection seismic data for kinematic parameters by local optimization, SIAM J. Sci. Comput., 20(3), 1033–1052.
22. Plessix, R.-E., Mulder, W.A. & ten Kroode, A.P.E., 2000. Automatic cross-well tomography by semblance and differential semblance optimization: theory and gradient computation, Geophys. Prospect., 48, 913–935.
23. Shen, P., Symes, W.W. & Stolk, C., 2003. Differential semblance velocity analysis by wave-equation migration, in Proc. 73rd Ann. Int. Mtg, SEG, Dallas, Expanded Abstracts.
24. Symes, W.W. & Carazzone, J.J., 1991. Velocity inversion by differential semblance optimisation, Geophysics, 56, 654–663.
25. Tarantola, A., 1984. Linearized inversion of seismic reflection data, Geophys. Prospect., 32, 998–1015.
26. Tarantola, A., 1987. Inverse Problem Theory, Elsevier, Amsterdam.
27. Vinje, V., Iversen, E. & Gjøystdal, H., 1993. Traveltime and amplitude estimation using wave front construction, Geophysics, 58, 1157–1166.