Summary

Estimating the model parameters from measured data generally consists of minimizing an error functional. A classic technique to solve a minimization problem is to successively determine the minimum of a series of linearized problems. This formulation requires the Fréchet derivatives (the Jacobian matrix), which can be expensive to compute. If the minimization is viewed as a non-linear optimization problem, only the gradient of the error functional is needed. This gradient can be computed without the Fréchet derivatives. In the 1970s, the adjoint-state method was developed to efficiently compute the gradient. It is now a well-known method in the numerical community for computing the gradient of a functional with respect to the model parameters when this functional depends on those model parameters through state variables, which are solutions of the forward problem. However, this method is less well understood in the geophysical community. The goal of this paper is to review the adjoint-state method. The idea is to define some adjoint-state variables that are solutions of a linear system. The adjoint-state variables are independent of the model parameter perturbations and in a way gather the perturbations with respect to the state variables. The adjoint-state method is efficient because only one extra linear system needs to be solved.

Several applications are presented. When applied to the computation of the derivatives of the ray trajectories, the link with the propagator of the perturbed ray equation is established.

Introduction

One of the important tasks in data processing consists of determining model parameters from observed data. These tasks can be formulated as inverse problems, namely as the minimization of a functional, for instance the least-squares misfit between synthetic and observed data. In geophysics, this includes tomography, migration/inversion and automatic velocity analysis. When a local (descent) optimization technique, such as the conjugate gradient method, is used, the gradient of the functional is required, Gauthier et al. (1986), Liu & Bleistein (2001) and Mora (1989). The efficiency of the method greatly depends on the accuracy and efficiency of the computation of this gradient. With physical problems, the functional depends on so-called state variables. These state variables are the variables computed from the state equations, namely the equations that define the problem, sometimes called forward equations. For example, in the tomography problem, the state equations can be the ray equations, and the state variables the spatial coordinates and the slowness vectors describing the ray trajectories. The definition of the state variables depends on the mathematical formulation of the physical problem. The state equations depend on the model parameters. For the tomography problem this can be the background velocity (or slowness). The functional depends on those model parameters mainly through the dependency on the state variables.

The gradient of the functional, which depends on a set of state variables solutions of the forward equations, can be obtained with (a set of) the Fréchet derivatives of the state variables. The Fréchet derivatives are the derivatives of the state variables with respect to the model parameter. For instance, for the tomography problem described earlier, these derivatives are the derivatives of the spatial coordinates and the slowness vectors with respect to the slowness background. This gives the so-called Jacobian or sensitivity matrix. The Jacobian matrix can be used to linearize the functional, and the minimization problem can be solved by successively solving linearized problems using linear optimization techniques. However, the computation of the Fréchet derivatives can be expensive.

If non-linear optimization techniques, such as the non-linear conjugate method, are used, only the gradient of the functional may be needed, Gill et al. (1981). In the 1970s, a method based on the adjoint state has been introduced in the theory of inverse problems by Chavent (1974) to efficiently compute the gradient of a functional without the Fréchet derivatives. This approach originated from control theory, Lions (1972). Several authors in geophysics have applied this method, for instance, Lailly (1983), Bécache (1992), Chavent & Jacewitz (1995), Plessix et al. (1999) and Shen et al. (2003). The goal of this note is to review this method that is well known in the numerical community, to give a recipe for applying it based on an augmented functional also called associated Lagrangian, and to describe several examples demonstrating its practical use.

The adjoint-state method is a general method to compute the gradient of a functional that depends on a set of state variables, which are solutions of forward equations. The adjoint-state variables are the solutions of an adjoint linear system and can be seen as variables which gather a global measure of the perturbation of the problem with respect to the state variables. Numerically this approach is attractive because only one extra linear system needs to be solved and often the computation of the gradient with respect to the model parameters is equivalent to one or two evaluations of the forward modelling. This cost is often almost independent of the number of model parameters, which is not always the case when the Fréchet derivatives are computed. However the adjoint-state method does not provide the sensitivity of the solution to errors. For that, the Fréchet derivatives are needed or a Monte Carlo type of methods with a large number of forward computations is required.

The outline of the paper is the following. In a first section the adjoint-state variables are introduced from the perturbation theory. Then based on an augmented functional a recipe to systematically define the adjoint-state equations is described. In the three next sections, examples are given. The first one is the least-squares migration, Tarantola (1987). This is almost a school example. The functional is the least-squares misfit between the synthetics and the measured reflection seismic data. The adjoint states correspond to the backpropagated field and the gradient of this least-squares misfit is a migration operator, Lailly (1983) and Tarantola (1984). The second example is the computation of the gradient of the differential semblance optimization (DSO) functional, Symes & Carazzone (1991) and Shen et al. (2003). This example shows the power of the adjoint-state technique. The third example is the stereotomography, Billette & Lambaré (1998) and Lambaré et al. (2004). The link between the adjoint-state variables and the propagator of the differential equation defining the ray trajectory perturbations, Cervený (2001) and Farra & Madariaga (1987), is established.

Method

The goal of this section is to explain the adjoint-state method for computing the gradient of a functional, J(m), when J depends on u(m). J is defined with the functional, h, by

(1)

The state variables, u, satisfy the state equations defined with the mapping, F,

(2)
F is also called the forward problem or forward equation. m is the model parameter and belongs to the model parameter space M. M is a real space in this article. u belongs to the state variable space, U. U is a real or complex space. A state variable, u, is a physical realization if F(u, m) = 0. F is a mapping from U × M to U. In order to distinguish between a physical realization and any element of U, the elements of U are denoted by ũ.h is a functional from U × M to R, the real space, whereas J is a functional from M to R. Synthetic data are generally a subset of the state variables. It is assumed that h, F, and J are at least continuously differentiable and u(m) is uniquely defined and continuously differentiable.

A simple example is the linear case and , with d the observed data. This corresponds to the simple least-squares misfit with a linear forward problem. The physical realization is defined by u(m) = Am, with A a linear operator, in the discrete case a rectangular matrix.

The adjoint-state method from the perturbation theory

A perturbation, δm, of the model parameter, m, induces a perturbation, δu, of the physical realization, u, and a perturbation δJ of the error functional, J. u+δu should be a physical realization with the model parameter m + δm. Therefore, to the first order:

(3)
Since F(u, m) = 0, the first order development gives:
(4)
For the linear case this gives δu = Aδm.

The first order development of J gives:

(5)
where 〈,〉U is the scalar product in U.

For the simple least-squares misfit, δJ = 〈ud, δuU.

Assuming that for any model parameter m of M there exists a unique solution u of U, u + δu is the unique solution of F(u+δu, m+δm) = 0. Therefore, at the first order, δu is the unique solution of eq. (4) and can be written with the inverse of . This gives:

(6)
(* denotes the adjoint). For the linear example with the least-squares misfit, this gives δJ = 〈ud, Aδm〉 since in this case; I is the identity operator.

In the second line of eq. (6), the terms that do not depend on the perturbation δm have been gathered, the idea is to avoid the computation of the Fréchet derivatives, , because this can be expensive. Let us now define λ by

(7)
(8)
For the linear example with the least-squares misfit this simply gives λ = ud and δJ =〈ud, AδmU.

λ belongs to the dual space of U. It is called the adjoint-state variable and eq. (7) is the adjoint-state equation. This is a system of linear equations. The linear operator is the adjoint of the operator formed by the derivatives of the state equations (the mapping F) with respect to the state variables. The right-hand side consists of the derivatives of the functional, h, with respect to the state variables. In a sense the adjoint states gather the information on the perturbations of the state variables, viewed as independent variables. The computation of with eqs (7) and (8) is called the adjoint-state method.

The gradient of J can be computed either via the Fréchet derivatives of u with eqs (4) and (5) or via the adjoint-state method with eqs (7) and (8). The important differences between the two approaches are related to eqs (4) and (7). Indeed, on one hand, eq. (7) is independent of δm and needs to be solved only once. On the other hand, the right-hand side of eq. (4) depends on δm and this equation needs to be solved for each perturbation to obtain , namely M times if M is the number of elements in m. The computational time of the adjoint-state method is often almost independent of M, because the time to compute eq. (8) is often negligible compared with the time to solve eq. (7). This makes this approach very efficient. eq. (7) depends on the adjoint of evaluated at (u, m), this means that u should be completely known before solving it.

The adjoint-state equations can also be obtained with the use of an augmented functional, also called associated Lagrangian.

A recipe with the augmented functional

Let us define the augmented functional, , from U×UM to R (U* is the dual of U) by:

(9)
where is any element of U* and, therefore, does not depend on m, as ũ is any element of U.

u is a physical realization, therefore, F(u, m) = 0, and for any

(10)
and since is independent of m,
(11)
We can then choose λ in U* such that:
(12)
This equation is identical to eq. (7) and is the adjoint-state equation. With this choice we retrieve the result of eq. (8).
(13)

can be also viewed as the Lagrangian associated with the minimization problem: find the minimum u of h(ũ,m) under the constraint F(u, m) = 0. The theory of optimization with equality constraints, Ciarlet (1989), tells us that u is the minimum, if (u, λ) is a saddle point of .λ are called the Lagrange multipliers. At the saddle point the derivatives of are equal to 0. the derivatives of with respect to ũ and are:

(14)

Therefore, gives the state equations and 0 gives the adjoint-state equations. And as seen previously. Notice again that when deriving with respect to m,ũ and are independent of m.

This link with the optimization theory is not needed to apply the adjoint-state method. For those familiar with this theory, it helps to recall the method. As in the optimization theory with equality constraints where one scalar Lagrange multiplier is associated with each scalar equation defining the constraints, one scalar adjoint state is associated with each scalar equation defining the mapping F in the augmented functional.

The computation of the gradient with the adjoint states can be summarized in the following recipe when u has been found from F(u, m) = 0:

• (i)

Build the augmented functional (associated Lagrangian) . , a functional of independent variables ũ, and m is defined by

(15)
if is composed of N scalar equations, is a vector with N components, since at each scalar equation of F an adjoint state is associated, and is defined by:
(16)
For the linear case with the least-squares misfit .

• (ii)

Define the adjoint-state equations. The adjoint-state equations are simply defined by , where the derivatives are evaluated at the point (u, λ). This gives

(17)
or
(18)
The solution of this system determines the adjoint state, λ. For the linear case with the least-squares misfit λ = ud.

• (iii)

Computation of the gradient of J. The gradient of J consists of the derivatives of with respect to m:

(19)
or
(20)
To compute the derivative of the augmented functional, we recall that ũ and are independent of m.

For the linear case with the least-squares misfit .

If u has complex values, since J(m) is a real, the real part should be taken in the right-hand side term of eqs (19) and (20).

The linear example with the least-squares misfit is a trivial example. In the next sections more complicated examples are described.

Least-Squares Migration

In this section, we formulate the migration as an inverse problem. The problem consists of minimizing with respect to the square of the slowness, the least-squares misfit between the synthetics, obtained by solving the wave equation, and the recorded (observed) reflection seismic data. The minimization of J should give the exact slowness. Unfortunately, in practice, J has many local minima, and a gradient optimization will only provide the best perturbation of the initial model inside a certain basin of attraction. This basin is generally not the basin of the global minimum, Gauthier et al. (1986). This is the reason why this problem is called the least-squares migration problem in this paper.

This example is a good example to understand how the adjoint-state method can be applied. It also allows us to redemonstrate that the gradient of J is a migration. This was discovered in the 1980s, Lailly (1983) and Tarantola (1984).

We will develop the computation for multiple sources and multiple receivers, first in the frequency domain because it is simple, then in the time domain.

Frequency domain

In frequency domain, the wave equation operator reads: L = – ω2σ2−Δ, with σ the slowness. Note that the dependency on the spatial coordinates, x, is not written. The finite-difference discretization of Lus = fs with given boundary conditions leads to a complex linear system Marfurt (1984):

(21)
A is a complex matrix of size n by n, where n is the total number of discretization points of the earth model. fs, a complex vector of n elements, represents the source function at the source point s. us, a complex vector of n elements, corresponds to the pressure field due to the shot at s. The model parameter, m, is a vector of M elements, and represents the values of the squared slowness at the discretization points.

The least-squares functional is

(22)
ds,r are the data recorded at the receiver position r due to the source fs. Ss,r is the restriction matrix onto the receiver r of the shot s.

The augmented functional reads, with and (the dependence on x is not written, but and depend on the space variables x):

(23)
where 〈,〉x is the scalar product in Cn. As are complex vectors of n elements, since the forward system, eq. (21), contains n scalar equations.

The derivative of with respect to evaluated at gives:

(24)
The adjoint state is defined by :
(25)
There is one adjoint system per shot and per angular frequency.

The matrix A propagates the shot into the earth and us is the incident field originating at s. The adjoint of A propagates backward its source term, Lailly (1983). The source term, the right-hand side of eq. (25), is the sum over the receivers of the shot s of the residual between the synthetics and data. λs is then the backpropagation of the residual field.

(26)
The gradient of J is a vector of M elements. If we impose that m is discretized on the same grid as us, M = n. Outside the boundary points is equal to −ω2. At the discretization point x, we obtain
(27)
Up to a multiplication factor, the gradient is similar to a migrated image and the formula is kinematically similar to the imaging principle, Clearbout (1985). A demonstration of this result without the adjoint-state method can be found in Plessix & Mulder (2004).

Time domain

We here develop the same approach but in time domain. The application of the adjoint-state method is slightly more complicated because of the initial boundary conditions.

The wave operator is . With the initial boundary conditions, the pressure field us due to the source fs satisfies:

(28)
us and fs depend on the time and on the spatial coordinates.

(29)
T is the recording time. Ss,r is the restriction operator onto the receiver position, it depends on the spatial coordinates. The model parameter is the squared slowness, m = σ2.

In the time domain us is real. For simplicity, we don't mention the spatial boundary conditions.

We associate the adjoint states and with the initial boundary conditions, and with the wave equation. The augmented functional is defined by:

(30)
with the real scalar product in the coordinate space.

After two integrations by part:

(31)
With eqs(30) and (31) we can now compute the derivatives with respect to and evaluate them at (u, λ) to obtain the adjoint-state equations:
(32)
(T denotes the transpose.)

The gradient of J at the point x is

(33)
The adjoint states, µ0s and µ1s do not play a role in the gradient of J. We can ignore them.

The system (32) has final boundary conditions. To solve it the computation is done backwards from T to 0. To give a physical sense to the adjoint state and to interpret the integral, eq. (33), a new adjoint state, qs, is defined by a change of variables in the time axis:

(34)
(35)
qs satisfies the same wave equation that us but with a different source term. Ss,rus(Tt) – ds,r(Tt) is the residual, the difference between the synthetics and the recorded data, in reverse time. eq. (35) propagates the residual into the earth starting from the final time. qs is called the backpropagated field of the residual. The gradient of J now reads:
(36)

This result has been demonstrated by Lailly (1983).

Shot-based Differential Semblance Optimization

As explained in the introduction of the previous section, the least-squares formulation is not satisfactory to retrieve the long wavelength components of the velocity model (background) from reflection seismic data, because the least-squares misfit as a function of the background has many local minima. To reformulate the problem and obtain a larger basin of attraction for the global minimum, an idea is to exploit the fact that in the reflection seismic data the earth is seen through different angles of incident. If the background velocity is correct, the pre-stack migration of the data should give the same images, Al Yahya (1989). If the pre-stack migration gives different earth structures, this means that the background slowness used in the migration is erroneous. Several mathematical formulations of this idea have been proposed in the last 20 years, among them Chavent & Jacewitz (1995), Clément (2001), Plessix et al. (2000) and Symes & Carazzone (1991). In order to compute the gradient of the reformulated cost functions, the authors generally use the adjoint-state formulation because it is the most systematic method, without forgetting that they are mainly mathematicians.

As an example, I will describe the gradient computation of the DSO functional introduced in Symes & Carazzone (1991) for a common shot-based approach. The principle is to migrate each shot individually and then to differentiate the pre-stack migrated result with respect to the shot position for fixed points in the migrated images. If the derivative with respect to the shot position of the pre-stack migrated data is zero, it means that the migrated images are independent of the shot position, that is, of the angle of incident and that the background is correct. To obtain a global formulation, the DSO functional is used as a regularization of the least-squares functional.

Using a finite-difference scheme in frequency-domain, eq. (21), the incident wavefield, ui, due to the source function fi located at the shot position i satisfies:

(37)
The dependency on the spatial coordinates is not explicitly written to simplify the notation. The synthetics at the angular frequency, ω, are Si,jui, with Si,j the restriction operator onto the receiver, j, of the shot, i.

To compute the migration, we introduce the backpropagated field, vi, defined by:

(38)
with di,j the measured seismic data due to the shot i recorded at the receiver j.

The shot migrated image, ri, is then defined by:

(39)
Here we abuse the notation and eq. (39) means , where x is a discretization point.

(40)
ns is the number of shots. α1 and α2 are the weights of the least-squares functional (the first term) and the DSO functional (the second term). The real part of ri+1ri is taken in the DSO functional because ri is a complex number and only the real part corresponds to the migrated image.

We recall that m is the squared slowness at the discretization point and all the state variables are differentiable functions with respect to m. The goal is to compute the gradient of J with respect to the squared slowness.

The forward equations, eqs (37), (38) and (39), depend on the state variables ui, vi and ri. ui, vi, ri are discretized on the same grid, therefore, ui, vi, ri belong to Cn, with n the number of discretization points. To define the augmented functional, we associate, for each shot with eq. (37), with eq. (38) and with eq. (39). and belong to Cn because eqs (37) or (38) or (39) define n scalar equations. These quantities depend on ω and x. The augmented functional reads with the state variables, , and the adjoint-state variables, :

(41)
The adjoint states λu, λv and λr are obtained by taking the derivatives of with respect to equal to zero at the point (u, v, r, λu, λv, λr, m):
(42)
The gradient of J is obtained by
(43)
since outside the boundary points. Notice that 〈λui(ω), ui(ω)〉x is a vector, the scalar product in x is taken per component i, since we should have written λui(ω, x) and ui(ω, x).

This application shows the numerical interest of the adjoint-state method with the augmented functional. Indeed, a systematic use of the method automatically produces the result. Using the perturbation as described in the first section can lead to the same result, but its application is a bit more difficult and cumbersome, as shown in the Appendix. The physical interpretation of the adjoint states is difficult to find. We can notice that λr is just the perturbation with respect to r of the DSO functional, λu satisfies the adjoint wave equation and λv the wave equation.

Stereotomography

The last example describes an application based on the ray equations. The functional depends on the traveltimes and on the other ray-based parameters. The derivatives of the traveltimes with respect to the velocity are efficiently computed by integrating the slowness perturbations along the ray. There is no real gain to introduce the adjoint states when only the derivatives of the traveltimes with respect to the velocity parameters are required, because the adjoint-state method does not give a more efficient algorithm. The derivatives of the ray trajectories can be evaluated from the paraxial ray equations and the propagator associated with this linear differential system, Cervený (2001) and Farra & Madariaga (1987). When the functional depends on the ray trajectories, the adjoint-state method provides a faster approach to the computation of the gradient of the functional. This case is illustrated with the stereotomography functional.

The purpose of the stereotomography, as described in Billette & Lambaré (1998), is to retrieve the velocity background not only from traveltimes picked on the seismic data but also from slopes of the locally coherent events in the common source and common receiver gathers. This approach differs from the classic traveltime tomography because the picks are interpreted independently from each other without any association to a given interface and they can represent either reflection or refraction events. The (observed) data are a set of source positions, xs, receiver positions, xr, two-way traveltimes, Tmsr, the slopes, ps, at the source locations, and the slopes, pr, at the receiver locations.

Following Billette & Lambaré (1998), the model parameters are xd the subsurface reflection points, θs and θr the take-off angles of the rays going towards the source, xs, and towards the receiver, xr, Ts and Tr the traveltimes along the rays from xd towards the source and the receiver and the parameters (vk) defining the continuous velocity field, v. The integration parameter along the ray is the time, t. The ray equations are:

(44)
With the ray equations becomes
(45)
ya is computed from t = 0 to t = Ta. The subscript a represents s or r. y0 is the function defining the initial conditions. The error functional is
(46)
with . In eq. (46), cT is a scalar coefficient and Cs and Cr are two diagonal matrices.

After the integration by part of the terms , the augmented functional reads

(47)
is a functional of the state variables, , the adjoint-state variables, , and the model parameters, Ts, Tr, θs, θr, xd and (vk). The derivatives with respect to the state variables, , give the adjoint-state equations:
(48)
with .

The derivatives of the augmented functional with respect to the model parameters correspond to the derivatives of Jsr with respect to the model parameters. This yields

(49)
For the computation of the derivatives with respect to Ta, we have used the fact that the ray equations are satisfied at Ta.

A more traditional approach to compute the gradient is to first determine the Fréchet derivatives. In this case, this means the derivatives of Ts, Tr, ys(Ts), and yr(Tr) with respect to Ts, Tr, xd, θs, θr, and (vk). A usual approach to compute those derivatives is to use the paraxial ray equations Cervený (2001) and Farra & Madariaga (1987). The perturbation δya of the rays ya is obtained from the propagator Pa defined by

(50)
This gives, Billette & Lambaré (1998)
(51)
and the Fréchet derivatives are
(52)
The other Fréchet derivatives are equal to 0.

The derivatives of Jsr are then simply

(53)
with ν equals to vk, xd, θs, or θr and
(54)

Relation between adjoint states and propagator

From eqs (49), (52) and (53), we deduce that

(55)
In fact, the propagator PTa satisfies
(56)
PTa is the propagator of the adjoint-state differential equation with a final condition, eq. (48). The adjoint state, λa, is then equal to (Ca is a diagonal matrix):
(57)
The main difference between the two approaches lies in the adjoint-state system (eq. 48) and the propagator system (eq. 50). Whereas the first one is a vectorial system, the second is a matrix system. This means that the adjoint-state system is d times smaller than the propagator system, with d = 4 in 2-D problems and d = 6 in 3-D problems. The adjoint-state method is then roughly d times faster. However, the adjoint-state method does not provide the Jacobian matrix, but only the gradient of J and the Jacobian may be used to determine the sensitivity of the solution to errors. The minimization should rely on non-linear optimization techniques, such as non-linear conjugate or quasi-Newton methods, Gill et al. (1981). In Billette & Lambaré (1998), the authors solve the non-linear optimization problem, by successively solving the linear problems defined by the Jacobian matrices.

Conclusion

The adjoint-state method for the gradient computation of a functional has been reviewed. The technique applies when the functional depends on the model parameters through a set of state variables, solutions of forward equations. The method consists of the computation of one unique extra linear system. The linear operator is formed with the adjoint of the operator defined by the derivatives of the forward model with respect to the state variables and the second member consists of the derivatives of the functional with respect to the state variables. The adjoint-state variables are the solution of this linear system. In a sense, they gather the information of the perturbations with respect to the state variables, assuming that the state variables are independent variables. Since this linear system is independent of the derivatives with respect to the model parameters, the adjoint states have to be computed only once, making the method numerically very efficient. The gradient of the functional with respect to the model parameters is now simply the scalar product between the adjoint states and the derivatives of the forward model with respect to the model parameters.

To form this extra linear system (the adjoint-state system) a recipe based on an augmented functional has been reviewed. This provides a systematic approach. The main step in the definition of this augmented functional is to consider the state variables and the adjoint-state variables as independent variables.

Several examples have been described to show the power of this approach. For the complicated example with the DSO functional, the adjoint-state equations have been derived from perturbation theory. This shows that the use of the augmented functional is not strictly necessary but simplifies the approach.

When the forward equations are the ray equations, the transpose of the propagator of the perturbed ray equations is the propagator of the adjoint equations. The adjoint-state method is computationally more efficient because the adjoint-state system is a vectorial system whereas the system of the propagator is a matrix system. Nevertheless, the adjoint-state method only gives the gradient, and not the Fréchet derivatives. The optimization problem should be solved with a non-linear optimization method, such a quasi-Newton or non-linear conjugate gradient technique.

Acknowledgment

I would like to thank Gilles Lambaré from Ecole des Mines de Paris and Colin Perkins from Shell for their comments on the manuscript.

Appendices

Appendix: Dso Gradient with Perturbation Approach

In this appendix, we retrieve the gradient of the DSO function directly from the perturbation approach without the help of the augmented functional. If the model, m, is perturbed by δm, the wavefields, ui, are perturbed by δui, the backpropagated wavefields, vi, are perturbed by δvi, the migrated images, ri, are perturbed by δri, and the functional J by δJ. From eq. (40) we obtain

(A1)
(ns is the number of shots, i is the shot index and 〈,〉 is the scalar product in the data space, 〈,〉x is the scalar product in spatial coordinate space.)

The perturbations of the state variable equations, eqs (37), (38) and (39), gives

(A2)
The dependency on the spatial coordinates and the angular frequency are not written.

The complicated part consists in defining the adjoint-state equations. However with some experience, this is possible. For this example, we first rewrite the second term of eq. (A1)

(A3)
with r0 = r1 and . And we define
(A4)
Replacing δri by its value gives:
(A5)
We then gather the terms depending on δui and the terms depending on δvi:
(A6)
with Re(〈−λriu*i, δv*ix) = Re(〈−uiλr*i, δvix) and eij = S*ij (Sijuidij).

By replacing δvi by its values in the second term of eq. (A6) we obtain

(A7)
By replacing δui by its value in the first term of eq. (A6) we obtain
(A8)
We can now define
(A9)
The equations eqs (A4) and (A9) are the adjoint-state equations. They are the same that the ones obtained with the use of the augmented functional in the section on the DSO function, however the derivation is less obvious and systematic in this appendix.

(A10)
And with , we obtain:
(A11)

References

1
Al Yahya
K.M.
1989
Velocity analysis by iterative profile migration
Geophysics

54
718
729
2
Becache
E.
1992
Reflection tomography: how to cope with multiple arrivals? Part II: a new gradient computation method: continuous and discrete problem
in
PSI Consortium Annual Report

3
Billette
F.
Lambaré
G.
1998
Velocity macro-model estimation from seismic reflection data by stereotomography
Geophys. J. Int.

135
671
690
4
Cervený
V.
2001
Seismic Ray Method

Cambridge Univ. Press
New York.
5
Chavent
G.
1974
Identification of function parameters in partial differential equations
in
Identification of parameter distributed systems
eds
Goodson
R.E.
Polis
New-York
ASME 1974
6
Chavent
G.
Jacewitz
C.A.
1995
Determination of background velocities by multiple migration fitting
Geophysics

60
476
490
7
Ciarlet
P.G.
1989
Introduction to Numerical Linear Algebra and Optimisation

Cambridge University Press
New York
8
Clearbout
J.F.
1985
Imaging the Earth's Interior

Blackwell Scientific Publications
London
9
Clément
F.
Chavent
G.
Gomez
S.
2001
Migration-based traveltime inversion of 2-D simple structures: a synthetic example
Geophysics

66
845
860
10
Farra
V.
R.
1987
Seismic waveform modeling in heterogeneous media by ray perturbation theory
J. geophys. Res.

92
2697
2712
11
Gauthier
O.
Virieux
J.
Tarantola
A.
1986
Two-dimensional nonlinear inversion of seismic waveforms: numerical results
geophysics

51
1387
1403
12
Gill
P.E.
Murray
W.
Wright
M.H.
1981
Practical Optimization

NewYork
13
Lailly
P.
1983
The seismic inverse problem as a sequence of before stack migration
in
Proc. of Conf. on Inverse Scattering, Theory and Applications

SIAM
14
Lambaré
G.
Alerini
M.
Baina
R.
Podvin
P.
2004
Stereotomography: a semi-automatic approach for velocity macromodel estimation
Geophys. Prospect.

52
671
682
15
Lions
J.
1972
Nonhomogeneous boundary value problems and applications

Springer Verlag
Berlin
16
Liu
Z.
Bleistein
N.
2001
Migration velocity analysis: theory and iterative algorithm
Geophysics

60
142
153
17
Marfurt
K.J.
1984
Accuracy of finite-difference and finite-element modeling of the scalar and elastic wave equations
Geophysics

49
533
549
18
Mora
P.
1989
Inversion = migration + tomography
Geophysics

54
1575
1586
19
Mulder
W.A.
ten Kroode
A.P.E.
2002
Automatic velocity analysis by differential semblance optimization
Geophysics

67
1184
1191
20
Plessix
R. E.
Mulder
W.A.
2004
Frequency-domain finite-difference amplitude-preserving migration
Geophys. J. Int.

157
975
987
21
Plessix
R. E.
de Roech
Y. H.
Guy
Chavent
1999
Waveform inversion of reflection seismic data for kinematic parameters by local optimization
SIAM J. on Scientific Computing

20
(
3
),
1033
1052
22
Plessix
R. E.
Mulder
W.A.
ten Kroode
A.P.E.
2000
Automatic cross-well tomography by semblance and differential semblance optimization: theory and gradient computation
Geophys. Prospect.

48
913
935
23
Shen
P.
Symes
W.W.
Stolk
C.
2003
Differential semblance velocity analysis by wave-equation migration
in
Proc. of the 73th SEG int'l mtg

Dallas
Expanded Abstract
24
Symes
W.W.
Carazzone
J.J.
1991
Velocity inversion by differential semblance optimisation
Geophysics

56
654
663
25
Tarantola
A.
1984
Linearized inversion of seismic reflection data
Geophys. Prospect.

32
998
1015
26
Tarantola
A.
1987
Inverse Problem theory

Elsevier
Amsterdam
27
Vinje
V.
Iversen
E.
Gjøystdal
H.
1993
Traveltime and amplitude estimation using wave front construction
Geophysics

58
1157
1166