Summary
Estimating model parameters from measured data generally consists of minimizing an error functional. A classic technique to solve a minimization problem is to successively determine the minimum of a series of linearized problems. This formulation requires the Fréchet derivatives (the Jacobian matrix), which can be expensive to compute. If the minimization is viewed as a nonlinear optimization problem, only the gradient of the error functional is needed. This gradient can be computed without the Fréchet derivatives. In the 1970s, the adjoint-state method was developed to efficiently compute the gradient. It is now a well-known method in the numerical community for computing the gradient of a functional with respect to the model parameters when this functional depends on those model parameters through state variables, which are solutions of the forward problem. However, this method is less well understood in the geophysical community. The goal of this paper is to review the adjoint-state method. The idea is to define some adjoint-state variables that are solutions of a linear system. The adjoint-state variables are independent of the model parameter perturbations and, in a way, gather the perturbations with respect to the state variables. The adjoint-state method is efficient because only one extra linear system needs to be solved.
Several applications are presented. When applied to the computation of the derivatives of the ray trajectories, the link with the propagator of the perturbed ray equation is established.
Introduction
One of the important tasks in data processing consists of determining model parameters from observed data. These tasks can be formulated as inverse problems, namely as the minimization of a functional, for instance the least-squares misfit between synthetic and observed data. In geophysics, this includes tomography, migration/inversion and automatic velocity analysis. When a local (descent) optimization technique, such as the conjugate gradient method, is used, the gradient of the functional is required, Gauthier et al. (1986), Liu & Bleistein (2001) and Mora (1989). The efficiency of the method greatly depends on the accuracy and efficiency of the computation of this gradient. With physical problems, the functional depends on so-called state variables. These state variables are the variables computed from the state equations, namely the equations that define the problem, sometimes called forward equations. For example, in the tomography problem, the state equations can be the ray equations, and the state variables the spatial coordinates and the slowness vectors describing the ray trajectories. The definition of the state variables depends on the mathematical formulation of the physical problem. The state equations depend on the model parameters. For the tomography problem this can be the background velocity (or slowness). The functional depends on those model parameters mainly through the dependency on the state variables.
The gradient of the functional, which depends on a set of state variables solutions of the forward equations, can be obtained with (a set of) the Fréchet derivatives of the state variables. The Fréchet derivatives are the derivatives of the state variables with respect to the model parameters. For instance, for the tomography problem described earlier, these derivatives are the derivatives of the spatial coordinates and the slowness vectors with respect to the slowness background. This gives the so-called Jacobian or sensitivity matrix. The Jacobian matrix can be used to linearize the functional, and the minimization problem can be solved by successively solving linearized problems using linear optimization techniques. However, the computation of the Fréchet derivatives can be expensive.
If nonlinear optimization techniques, such as the nonlinear conjugate gradient method, are used, only the gradient of the functional may be needed, Gill et al. (1981). In the 1970s, a method based on the adjoint state was introduced in the theory of inverse problems by Chavent (1974) to efficiently compute the gradient of a functional without the Fréchet derivatives. This approach originated from control theory, Lions (1972). Several authors in geophysics have applied this method, for instance, Lailly (1983), Bécache (1992), Chavent & Jacewitz (1995), Plessix et al. (1999) and Shen et al. (2003). The goal of this note is to review this method, which is well known in the numerical community, to give a recipe for applying it based on an augmented functional, also called the associated Lagrangian, and to describe several examples demonstrating its practical use.
The adjoint-state method is a general method to compute the gradient of a functional that depends on a set of state variables, which are solutions of the forward equations. The adjoint-state variables are the solutions of an adjoint linear system and can be seen as variables that gather a global measure of the perturbation of the problem with respect to the state variables. Numerically this approach is attractive because only one extra linear system needs to be solved, and often the computation of the gradient with respect to the model parameters is equivalent to one or two evaluations of the forward modelling. This cost is often almost independent of the number of model parameters, which is not always the case when the Fréchet derivatives are computed. However, the adjoint-state method does not provide the sensitivity of the solution to errors. For that, the Fréchet derivatives are needed, or a Monte Carlo type of method with a large number of forward computations is required.
The outline of the paper is the following. In the first section the adjoint-state variables are introduced from perturbation theory. Then, based on an augmented functional, a recipe to systematically define the adjoint-state equations is described. In the next three sections, examples are given. The first one is the least-squares migration, Tarantola (1987). This is almost a textbook example. The functional is the least-squares misfit between the synthetics and the measured reflection seismic data. The adjoint states correspond to the backpropagated field and the gradient of this least-squares misfit is a migration operator, Lailly (1983) and Tarantola (1984). The second example is the computation of the gradient of the differential semblance optimization (DSO) functional, Symes & Carazzone (1991) and Shen et al. (2003). This example shows the power of the adjoint-state technique. The third example is stereotomography, Billette & Lambaré (1998) and Lambaré et al. (2004). The link between the adjoint-state variables and the propagator of the differential equation defining the ray trajectory perturbations, Cervený (2001) and Farra & Madariaga (1987), is established.
Method
The goal of this section is to explain the adjoint-state method for computing the gradient of a functional, J(m), when J depends on u(m). J is defined with the functional, h, by
J(m) = h(u(m), m). (1)
The state variables, u, satisfy the state equations defined with the mapping, F,
F(u, m) = 0. (2)
F is also called the forward problem or forward equation. m is the model parameter and belongs to the model parameter space M. M is a real space in this article. u belongs to the state variable space, U. U is a real or complex space. A state variable, u, is a physical realization if F(u, m) = 0. F is a mapping from U × M to U. In order to distinguish between a physical realization and any element of U, the elements of U are denoted by ũ. h is a functional from U × M to R, the real space, whereas J is a functional from M to R. Synthetic data are generally a subset of the state variables. It is assumed that h, F and J are at least continuously differentiable and that u(m) is uniquely defined and continuously differentiable. A simple example is the linear case, F(ũ, m) = ũ − Am and h(ũ) = ½‖ũ − d‖², with d the observed data. This corresponds to the simple least-squares misfit with a linear forward problem. The physical realization is defined by u(m) = Am, with A a linear operator, in the discrete case a rectangular matrix.
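This linear example can be checked numerically. The short sketch below (NumPy, with hypothetical sizes and random data) computes the gradient of J(m) = ½‖Am − d‖² through the adjoint state λ = u − d and verifies it against central finite differences:

```python
import numpy as np

# Linear forward problem u = A m with least-squares misfit (sizes and data are
# hypothetical). The adjoint state is simply the residual, lambda = u - d.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))        # linear forward operator
d = rng.normal(size=5)             # observed data
m = rng.normal(size=3)             # model parameters

u = A @ m                          # state equation: F(u, m) = u - A m = 0
lam = u - d                        # adjoint state (dF/du is the identity)
grad = A.T @ lam                   # gradient of J(m) = 0.5*||A m - d||^2

# verification against central finite differences
J = lambda m: 0.5 * np.sum((A @ m - d) ** 2)
eps = 1e-6
fd = np.array([(J(m + eps * e) - J(m - eps * e)) / (2 * eps) for e in np.eye(3)])
assert np.allclose(grad, fd, rtol=1e-5, atol=1e-8)
```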
The adjoint-state method from perturbation theory
A perturbation, δm, of the model parameter, m, induces a perturbation, δu, of the physical realization, u, and a perturbation, δJ, of the error functional, J. u + δu should be a physical realization with the model parameter m + δm. Therefore, to the first order:
F(u, m) + (∂F/∂u) δu + (∂F/∂m) δm = 0. (3)
Since F(u, m) = 0, the first-order development gives:
(∂F/∂u) δu = −(∂F/∂m) δm. (4)
For the linear case this gives δu = Aδm. The first-order development of J gives:
δJ = 〈∂h/∂u, δu〉_{U} + 〈∂h/∂m, δm〉_{M}, (5)
where 〈,〉_{U} is the scalar product in U and 〈,〉_{M} the scalar product in M. For the simple least-squares misfit, δJ = 〈u − d, δu〉_{U}.
Assuming that for any model parameter m of M there exists a unique solution u of U, u + δu is the unique solution of F(u + δu, m + δm) = 0. Therefore, at the first order, δu is the unique solution of eq. (4) and can be written with the inverse of ∂F/∂u. This gives:
δJ = −〈∂h/∂u, (∂F/∂u)^{−1} (∂F/∂m) δm〉_{U} + 〈∂h/∂m, δm〉_{M}
   = −〈(∂F/∂u)^{−*} (∂h/∂u), (∂F/∂m) δm〉_{U} + 〈∂h/∂m, δm〉_{M} (6)
(* denotes the adjoint). For the linear example with the least-squares misfit, this gives δJ = 〈u − d, Aδm〉 since ∂F/∂u = I in this case; I is the identity operator. In the second line of eq. (6), the terms that do not depend on the perturbation δm have been gathered; the idea is to avoid the computation of the Fréchet derivatives, ∂u/∂m, because this can be expensive. Let us now define λ by
(∂F/∂u)^{*} λ = ∂h/∂u. (7)
The perturbation δJ now reads:
δJ = −〈λ, (∂F/∂m) δm〉_{U} + 〈∂h/∂m, δm〉_{M}. (8)
For the linear example with the least-squares misfit this simply gives λ = u − d and δJ = 〈u − d, Aδm〉_{U}. λ belongs to the dual space of U. It is called the adjoint-state variable and eq. (7) is the adjoint-state equation. This is a system of linear equations. The linear operator is the adjoint of the operator formed by the derivatives of the state equations (the mapping F) with respect to the state variables. The right-hand side consists of the derivatives of the functional, h, with respect to the state variables. In a sense the adjoint states gather the information on the perturbations of the state variables, viewed as independent variables. The computation of the gradient of J with eqs (7) and (8) is called the adjoint-state method.
The gradient of J can be computed either via the Fréchet derivatives of u with eqs (4) and (5) or via the adjoint-state method with eqs (7) and (8). The important differences between the two approaches are related to eqs (4) and (7). Indeed, on one hand, eq. (7) is independent of δm and needs to be solved only once. On the other hand, the right-hand side of eq. (4) depends on δm and this equation needs to be solved for each perturbation to obtain ∂u/∂m, namely M times if M is the number of elements in m. The computational time of the adjoint-state method is often almost independent of M, because the time to compute eq. (8) is often negligible compared with the time to solve eq. (7). This makes this approach very efficient. Eq. (7) depends on the adjoint of ∂F/∂u evaluated at (u, m); this means that u should be completely known before solving it.
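The cost difference can be illustrated on a small implicit forward problem, F(u, m) = Bu − m = 0 with misfit h(u) = ½‖Su − d‖² (B, S and d are hypothetical). The Fréchet route requires one linear solve of eq. (4) per model parameter, whereas the adjoint route solves eq. (7) a single time:

```python
import numpy as np

# Implicit forward problem F(u, m) = B u - m = 0 with misfit 0.5*||S u - d||^2
# (B, S, d are hypothetical). Both gradient routes are compared.
rng = np.random.default_rng(1)
n = 50
B = np.eye(n) + 0.1 * rng.normal(size=(n, n))   # invertible forward operator
S = np.eye(n)[:5]                               # restriction to 5 "receivers"
d = rng.normal(size=5)
m = rng.normal(size=n)

u = np.linalg.solve(B, m)                       # state equation
res = S @ u - d

# Frechet route: one linear solve per model parameter (n solves for du/dm)
dudm = np.linalg.solve(B, np.eye(n))            # columns are du/dm_j
grad_frechet = dudm.T @ (S.T @ res)

# adjoint route: a single solve with the adjoint operator B^T
lam = np.linalg.solve(B.T, S.T @ res)           # (dF/du)^T lam = dh/du
grad_adjoint = lam                              # dF/dm = -I, so grad J = lam

assert np.allclose(grad_frechet, grad_adjoint)
```

Both routes give the same gradient; only the number of linear solves differs.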
The adjoint-state equations can also be obtained with the use of an augmented functional, also called the associated Lagrangian.
A recipe with the augmented functional
Let us define the augmented functional, ℒ, from U × U* × M to R (U* is the dual of U) by:
ℒ(ũ, λ̃, m) = h(ũ, m) − 〈λ̃, F(ũ, m)〉_{U}, (9)
where λ̃ is any element of U* and, therefore, does not depend on m, as ũ is any element of U. u is a physical realization, therefore F(u, m) = 0, and for any λ̃
ℒ(u(m), λ̃, m) = J(m), (10)
and since λ̃ is independent of m,
∇J = ∂ℒ/∂m (u, λ̃, m) + (∂u/∂m)^{*} ∂ℒ/∂ũ (u, λ̃, m). (11)
We can then choose λ in U* such that:
∂ℒ/∂ũ (u, λ, m) = 0. (12)
This equation is identical to eq. (7) and is the adjoint-state equation. With this choice we retrieve the result of eq. (8). ℒ can also be viewed as the Lagrangian associated with the minimization problem: find the minimum u of h(ũ, m) under the constraint F(ũ, m) = 0. The theory of optimization with equality constraints, Ciarlet (1989), tells us that u is the minimum if (u, λ) is a saddle point of ℒ. The λ are called the Lagrange multipliers. At the saddle point the derivatives of ℒ are equal to 0. The derivatives of ℒ with respect to ũ and λ̃ are:
∂ℒ/∂λ̃ = −F(ũ, m), (13)
∂ℒ/∂ũ = ∂h/∂ũ − (∂F/∂ũ)^{*} λ̃. (14)
Therefore, ∂ℒ/∂λ̃ = 0 gives the state equations and ∂ℒ/∂ũ = 0 gives the adjoint-state equations. And ∇J = ∂ℒ/∂m (u, λ, m), as seen previously. Notice again that when deriving ℒ with respect to m, ũ and λ̃ are independent of m.
This link with the optimization theory is not needed to apply the adjointstate method. For those familiar with this theory, it helps to recall the method. As in the optimization theory with equality constraints where one scalar Lagrange multiplier is associated with each scalar equation defining the constraints, one scalar adjoint state is associated with each scalar equation defining the mapping F in the augmented functional.
The computation of the gradient with the adjoint states can be summarized in the following recipe when u has been found from F(u, m) = 0:

(i)
Build the augmented functional (associated Lagrangian), ℒ, a functional of the independent variables ũ, λ̃ and m, defined by
ℒ(ũ, λ̃, m) = h(ũ, m) − 〈λ̃, F(ũ, m)〉_{U}; (15)
if F is composed of N scalar equations, λ̃ is a vector with N components, since with each scalar equation of F an adjoint state is associated, and ℒ is defined by:
ℒ(ũ, λ̃, m) = h(ũ, m) − Σ_{i=1}^{N} λ̃_{i} F_{i}(ũ, m). (16)
For the linear case with the least-squares misfit, ℒ(ũ, λ̃, m) = ½‖ũ − d‖² − 〈λ̃, ũ − Am〉.
(ii)
Define the adjoint-state equations. The adjoint-state equations are simply defined by ∂ℒ/∂ũ = 0, where the derivatives are evaluated at the point (u, λ). This gives
(∂F/∂u)^{*} λ = ∂h/∂u (17)
or
Σ_{i=1}^{N} λ_{i} ∂F_{i}/∂u_{j} = ∂h/∂u_{j} for j = 1 … N. (18)
The solution of this system determines the adjoint state, λ. For the linear case with the least-squares misfit, λ = u − d.
(iii)
Computation of the gradient of J. The gradient of J consists of the derivatives of ℒ with respect to m:
∇J = ∂ℒ/∂m (u, λ, m) = ∂h/∂m − 〈λ, ∂F/∂m〉_{U} (19)
or
∂J/∂m_{k} = ∂h/∂m_{k} − Σ_{i=1}^{N} λ_{i} ∂F_{i}/∂m_{k}. (20)
To compute the derivative of the augmented functional, we recall that ũ and λ̃ are independent of m. For the linear case with the least-squares misfit, ∇J = A^{*}(u − d).
If u has complex values, since J(m) is real, the real part should be taken in the right-hand side of eqs (19) and (20).
The linear example with the leastsquares misfit is a trivial example. In the next sections more complicated examples are described.
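Before turning to those examples, the recipe can be exercised on a nonlinear forward problem. The sketch below uses the hypothetical state equation F(u, m) = u + m u³ − f = 0, solved componentwise by Newton iterations, and follows steps (ii) and (iii) literally:

```python
import numpy as np

# Hypothetical nonlinear state equation F(u, m) = u + m*u**3 - f = 0, solved
# componentwise by Newton's method; misfit h(u) = 0.5*||u - d||^2.
rng = np.random.default_rng(2)
n = 4
m = rng.uniform(0.1, 0.5, size=n)     # model parameters (kept positive)
f = rng.normal(size=n)                # source term
d = rng.normal(size=n)                # observed data

def solve_state(m):
    u = f.copy()
    for _ in range(50):               # Newton iterations (F is monotone in u)
        u = u - (u + m * u**3 - f) / (1.0 + 3.0 * m * u**2)
    return u

u = solve_state(m)
# step (ii): adjoint-state equation (dF/du)^T lam = dh/du (diagonal here)
lam = (u - d) / (1.0 + 3.0 * m * u**2)
# step (iii): grad J = -(dF/dm)^T lam, with dF/dm = diag(u**3)
grad = -u**3 * lam

J = lambda m: 0.5 * np.sum((solve_state(m) - d) ** 2)
eps = 1e-6
fd = np.array([(J(m + eps * e) - J(m - eps * e)) / (2 * eps) for e in np.eye(n)])
assert np.allclose(grad, fd, rtol=1e-5, atol=1e-8)
```

Note that the nonlinearity of F only affects the forward solve; the adjoint system of step (ii) remains linear in λ.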
Least-Squares Migration
In this section, we formulate migration as an inverse problem. The problem consists of minimizing, with respect to the square of the slowness, the least-squares misfit between the synthetics, obtained by solving the wave equation, and the recorded (observed) reflection seismic data. The minimization of J should give the exact slowness. Unfortunately, in practice, J has many local minima, and a gradient optimization will only provide the best perturbation of the initial model inside a certain basin of attraction. This basin is generally not the basin of the global minimum, Gauthier et al. (1986). This is the reason why this problem is called the least-squares migration problem in this paper.
This is a good example for understanding how the adjoint-state method can be applied. It also allows us to re-derive the fact that the gradient of J is a migration. This was discovered in the 1980s, Lailly (1983) and Tarantola (1984).
We will develop the computation for multiple sources and multiple receivers, first in the frequency domain because it is simple, then in the time domain.
Frequency domain
In the frequency domain, the wave equation operator reads: L = −ω^{2}σ^{2} − Δ, with σ the slowness. Note that the dependency on the spatial coordinates, x, is not written. The finite-difference discretization of Lu_{s} = f_{s} with given boundary conditions leads to a complex linear system, Marfurt (1984):
A(m) u_{s} = f_{s}. (21)
A is a complex matrix of size n by n, where n is the total number of discretization points of the earth model. f_{s}, a complex vector of n elements, represents the source function at the source point s. u_{s}, a complex vector of n elements, corresponds to the pressure field due to the shot at s. The model parameter, m, is a vector of M elements, and represents the values of the squared slowness at the discretization points. The least-squares functional is
J(m) = ½ Σ_{ω} Σ_{s} Σ_{r} |S_{s,r} u_{s} − d_{s,r}|². (22)
d_{s,r} are the data recorded at the receiver position r due to the source f_{s}. S_{s,r} is the restriction matrix onto the receiver r of the shot s. The augmented functional reads, with the state variables ũ_{s} and the adjoint-state variables λ̃_{s} (the dependence on x is not written, but ũ_{s} and λ̃_{s} depend on the space variables x):
ℒ(ũ, λ̃, m) = ½ Σ_{ω} Σ_{s} Σ_{r} |S_{s,r} ũ_{s} − d_{s,r}|² − Σ_{ω} Σ_{s} Re 〈λ̃_{s}, A ũ_{s} − f_{s}〉_{x}, (23)
where 〈,〉_{x} is the scalar product in C^{n}. The λ̃_{s} are complex vectors of n elements, since the forward system, eq. (21), contains n scalar equations. The derivative of ℒ with respect to ũ_{s}, evaluated at (u, λ, m), gives:
∂ℒ/∂ũ_{s} = Σ_{r} S^{*}_{s,r}(S_{s,r} u_{s} − d_{s,r}) − A^{*} λ_{s}. (24)
The adjoint state is defined by ∂ℒ/∂ũ_{s} = 0:
A^{*} λ_{s} = Σ_{r} S^{*}_{s,r}(S_{s,r} u_{s} − d_{s,r}). (25)
There is one adjoint system per shot and per angular frequency. The matrix A propagates the shot into the earth and u_{s} is the incident field originating at s. The adjoint of A propagates its source term backward, Lailly (1983). The source term, the right-hand side of eq. (25), is the sum over the receivers of the shot s of the residuals between the synthetics and the data. λ_{s} is then the backpropagation of the residual field.
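Eqs (21) and (25) can be exercised on a one-dimensional toy Helmholtz problem (all sizes, the damping term and the source/receiver layout below are hypothetical). The adjoint solve uses the conjugate transpose of A, and the resulting gradient is checked against finite differences of the misfit:

```python
import numpy as np

# 1-D toy version of eqs (21) and (25): A(m) u = f with
# A = -omega^2 diag(m) - D2 plus a small m-independent damping term that
# makes the field complex. All sizes and values are hypothetical.
n, omega = 80, 20.0
h = 1.0 / n
m = np.ones(n)                                   # squared slowness (v = 1)
D2 = (-2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / h**2

def wave_op(m):
    return -omega**2 * np.diag(m) - D2 - 1j * omega * np.eye(n)

f = np.zeros(n, dtype=complex)
f[5] = 1.0 / h**2                                # point source
S = np.eye(n)[[60, 70]]                          # two receivers

m_true = m + 0.1 * np.exp(-((np.arange(n) - 40.0) / 5.0) ** 2)
d = S @ np.linalg.solve(wave_op(m_true), f)      # synthetic "observed" data

u = np.linalg.solve(wave_op(m), f)               # incident field, eq. (21)
res = S @ u - d
lam = np.linalg.solve(wave_op(m).conj().T, S.T @ res)   # adjoint system, eq. (25)
grad = omega**2 * np.real(np.conj(u) * lam)      # dA/dm = -omega^2 pointwise

def J(m):
    r = S @ np.linalg.solve(wave_op(m), f) - d
    return 0.5 * np.sum(np.abs(r) ** 2)

eps = 1e-6
fd = np.array([(J(m + eps * e) - J(m - eps * e)) / (2 * eps) for e in np.eye(n)])
assert np.allclose(grad, fd, rtol=1e-4, atol=1e-8)
```

The last line of the gradient computation is the pointwise crosscorrelation of the incident field with the backpropagated residual, which is the imaging-principle structure discussed next.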
The gradient of J is a vector of M elements. If we impose that m is discretized on the same grid as u_{s}, M = n. Outside the boundary points, ∂A/∂m is equal to −ω^{2}. At the discretization point x, we obtain
∇J(x) = Σ_{ω} Σ_{s} ω^{2} Re(ū_{s}(x) λ_{s}(x)). (26)
Up to a multiplication factor, the gradient is similar to a migrated image and the formula is kinematically similar to the imaging principle, Claerbout (1985). A demonstration of this result without the adjoint-state method can be found in Plessix & Mulder (2004).
Time domain
Here we develop the same approach in the time domain. The application of the adjoint-state method is slightly more complicated because of the initial conditions.
The wave operator is L = σ^{2} ∂^{2}/∂t^{2} − Δ. With the initial conditions, the pressure field u_{s} due to the source f_{s} satisfies:
σ^{2} ∂^{2}u_{s}/∂t^{2} − Δu_{s} = f_{s},
u_{s}(0) = 0,
∂u_{s}/∂t(0) = 0. (27)
u_{s} and f_{s} depend on the time and on the spatial coordinates. The least-squares functional reads
J(m) = ½ Σ_{s} Σ_{r} ∫_{0}^{T} (S_{s,r} u_{s} − d_{s,r})² dt. (28)
T is the recording time. S_{s,r} is the restriction operator onto the receiver position; it depends on the spatial coordinates. The model parameter is the squared slowness, m = σ^{2}. In the time domain u_{s} is real. For simplicity, we do not mention the spatial boundary conditions.
We associate the adjoint states μ̃^{0}_{s} and μ̃^{1}_{s} with the initial conditions, and λ̃_{s} with the wave equation. The augmented functional is defined by:
ℒ = ½ Σ_{s} Σ_{r} ∫_{0}^{T} (S_{s,r} ũ_{s} − d_{s,r})² dt − Σ_{s} ∫_{0}^{T} 〈λ̃_{s}, m ∂^{2}ũ_{s}/∂t^{2} − Δũ_{s} − f_{s}〉_{x} dt − Σ_{s} 〈μ̃^{0}_{s}, ũ_{s}(0)〉_{x} − Σ_{s} 〈μ̃^{1}_{s}, ∂ũ_{s}/∂t(0)〉_{x},
with 〈,〉_{x} the real scalar product in the coordinate space. After two integrations by parts in time of the term containing ∂^{2}ũ_{s}/∂t^{2}, the time derivatives are transferred onto λ̃_{s}, at the price of boundary terms at t = 0 and t = T.
With eqs (30) and (31) we can now compute the derivatives with respect to ũ_{s} and evaluate them at (u, λ) to obtain the adjoint-state equations:
m ∂^{2}λ_{s}/∂t^{2} − Δλ_{s} = Σ_{r} S^{T}_{s,r}(S_{s,r} u_{s} − d_{s,r}),
λ_{s}(T) = 0,
∂λ_{s}/∂t(T) = 0 (32)
(^{T} denotes the transpose). The gradient of J at the point x is
∇J(x) = −Σ_{s} ∫_{0}^{T} λ_{s}(x, t) ∂^{2}u_{s}/∂t^{2}(x, t) dt. (33)
The adjoint states μ^{0}_{s} and μ^{1}_{s} do not play a role in the gradient of J; we can ignore them. The system (32) has final conditions. To solve it, the computation is done backwards in time from T to 0. To give a physical sense to the adjoint state and to interpret the integral, eq. (33), a new adjoint state, q_{s}, is defined by a change of variables in the time axis:
q_{s}(t) = λ_{s}(T − t). (34)
The new adjoint-state system reads
m ∂^{2}q_{s}/∂t^{2} − Δq_{s} = Σ_{r} S^{T}_{s,r}(S_{s,r} u_{s} − d_{s,r})(T − t),
q_{s}(0) = 0,
∂q_{s}/∂t(0) = 0. (35)
q_{s} satisfies the same wave equation as u_{s} but with a different source term. S_{s,r}u_{s}(T − t) − d_{s,r}(T − t) is the residual, the difference between the synthetics and the recorded data, in reverse time. Eq. (35) propagates the residual into the earth starting from the final time. q_{s} is called the backpropagated field of the residual. The gradient of J now reads:
∇J(x) = −Σ_{s} ∫_{0}^{T} q_{s}(x, T − t) ∂^{2}u_{s}/∂t^{2}(x, t) dt. (36)
This result has been demonstrated by Lailly (1983).
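The backward-in-time structure of the system (32), a terminal condition at T and integration from T to 0, can be mimicked on a damped scalar ODE, du/dt = −mu + f(t), discretized by forward Euler (m, f and d below are hypothetical). The adjoint recursion is the exact discrete adjoint of the Euler scheme:

```python
import numpy as np

# Scalar-ODE analogue of the backward-in-time adjoint solve: forward Euler
# on du/dt = -m*u + f(t); misfit J = 0.5*dt*sum (u_k - d_k)^2.
# (m, f and d are hypothetical.) The adjoint sweep is the exact discrete
# adjoint of the Euler scheme, with a terminal condition at t = T.
N, dt, m = 200, 0.01, 1.3
t = np.arange(N + 1) * dt
f = np.sin(3.0 * t)
d = 0.2 * np.cos(t)                       # "observed" data

def forward(m):
    u = np.zeros(N + 1)
    for k in range(N):
        u[k + 1] = u[k] + dt * (-m * u[k] + f[k])
    return u

u = forward(m)

lam = np.zeros(N + 1)                     # adjoint state, integrated from T to 0
lam[N] = dt * (u[N] - d[N])
for k in range(N - 1, -1, -1):
    lam[k] = dt * (u[k] - d[k]) + (1.0 - dt * m) * lam[k + 1]

# gradient: adjoint state against the m-derivative of the forward scheme
grad = sum(lam[k + 1] * (-dt * u[k]) for k in range(N))

J = lambda m: 0.5 * dt * np.sum((forward(m) - d) ** 2)
eps = 1e-7
fd = (J(m + eps) - J(m - eps)) / (2 * eps)
assert abs(grad - fd) < 1e-6
```

The adjoint sweep runs in the opposite time direction to the forward sweep, exactly as the change of variables t → T − t does for the wave equation.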
Shot-Based Differential Semblance Optimization
As explained in the introduction of the previous section, the least-squares formulation is not satisfactory for retrieving the long-wavelength components of the velocity model (background) from reflection seismic data, because the least-squares misfit as a function of the background has many local minima. To reformulate the problem and obtain a larger basin of attraction for the global minimum, an idea is to exploit the fact that in the reflection seismic data the earth is seen through different angles of incidence. If the background velocity is correct, the prestack migration of the data should give the same images, Al Yahya (1989). If the prestack migration gives different earth structures, this means that the background slowness used in the migration is erroneous. Several mathematical formulations of this idea have been proposed in the last 20 years, among them Chavent & Jacewitz (1995), Clément (2001), Plessix et al. (2000) and Symes & Carazzone (1991). In order to compute the gradient of the reformulated cost functions, the authors generally use the adjoint-state formulation because it is the most systematic method, without forgetting that they are mainly mathematicians.
As an example, I will describe the gradient computation of the DSO functional introduced in Symes & Carazzone (1991) for a common-shot-based approach. The principle is to migrate each shot individually and then to differentiate the prestack migrated result with respect to the shot position for fixed points in the migrated images. If the derivative with respect to the shot position of the prestack migrated data is zero, it means that the migrated images are independent of the shot position, that is, of the angle of incidence, and that the background is correct. To obtain a global formulation, the DSO functional is used as a regularization of the least-squares functional.
Using a finite-difference scheme in the frequency domain, eq. (21), the incident wavefield, u_{i}, due to the source function f_{i} located at the shot position i satisfies:
A(m) u_{i} = f_{i}. (37)
The dependency on the spatial coordinates is not explicitly written to simplify the notation. The synthetics at the angular frequency, ω, are S_{i,j}u_{i}, with S_{i,j} the restriction operator onto the receiver, j, of the shot, i. To compute the migration, we introduce the backpropagated field, v_{i}, defined by:
A^{*}(m) v_{i} = Σ_{j} S^{*}_{i,j} d_{i,j}, (38)
with d_{i,j} the measured seismic data due to the shot i recorded at the receiver j. The shot migrated image, r_{i}, is then defined by:
r_{i} = Σ_{ω} ū_{i} v_{i}. (39)
Here we abuse the notation and eq. (39) means r_{i}(x) = Σ_{ω} ū_{i}(ω, x) v_{i}(ω, x), where x is a discretization point. The functional is
J(m) = α_{1} ½ Σ_{ω} Σ_{i} Σ_{j} |S_{i,j} u_{i} − d_{i,j}|² + α_{2} ½ Σ_{i=1}^{n_{s}−1} Σ_{x} |Re(r_{i+1}(x) − r_{i}(x))|². (40)
n_{s} is the number of shots. α_{1} and α_{2} are the weights of the least-squares functional (the first term) and the DSO functional (the second term). The real part of r_{i+1} − r_{i} is taken in the DSO functional because r_{i} is a complex number and only the real part corresponds to the migrated image. We recall that m is the squared slowness at the discretization points and all the state variables are differentiable functions of m. The goal is to compute the gradient of J with respect to the squared slowness.
The forward equations, eqs (37), (38) and (39), depend on the state variables u_{i}, v_{i} and r_{i}. u_{i}, v_{i} and r_{i} are discretized on the same grid; therefore, u_{i}, v_{i} and r_{i} belong to C^{n}, with n the number of discretization points. To define the augmented functional, we associate, for each shot, λ̃^{u}_{i} with eq. (37), λ̃^{v}_{i} with eq. (38) and λ̃^{r}_{i} with eq. (39). λ̃^{u}_{i}, λ̃^{v}_{i} and λ̃^{r}_{i} belong to C^{n} because eqs (37), (38) and (39) each define n scalar equations. These quantities depend on ω and x. The augmented functional reads, with the state variables, ũ, ṽ and r̃, and the adjoint-state variables, λ̃^{u}, λ̃^{v} and λ̃^{r}:
The adjoint states λ^{u}, λ^{v} and λ^{r} are obtained by setting the derivatives of ℒ with respect to ũ_{i}, ṽ_{i} and r̃_{i} equal to zero at the point (u, v, r, λ^{u}, λ^{v}, λ^{r}, m). The gradient of J then follows from the derivatives of ℒ with respect to m, since ∂A/∂m = −ω^{2} outside the boundary points. Notice that 〈λ^{u}_{i}(ω), u_{i}(ω)〉_{x} is a vector; the scalar product in x is taken per component i, since we should have written λ^{u}_{i}(ω, x) and u_{i}(ω, x). This application shows the numerical interest of the adjoint-state method with the augmented functional. Indeed, a systematic use of the method automatically produces the result. Using the perturbations as described in the first section can lead to the same result, but its application is a bit more difficult and cumbersome, as shown in the Appendix. The physical interpretation of the adjoint states is difficult to find. We can notice that λ^{r} is just the perturbation with respect to r of the DSO functional, λ^{u} satisfies the adjoint wave equation and λ^{v} the wave equation.
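The chained structure of eqs (37)-(39), with each adjoint system solved in the reverse order of the forward chain, can be sketched with a real-valued analogue (the matrices A, S, D and the data below are hypothetical stand-ins for the wave operator, the restriction operator and the shot-difference operator):

```python
import numpy as np

# Real-valued analogue of the DSO chain: three state equations solved in
# sequence, three adjoint systems solved in the reverse order.
# A, S, D, f and the data are hypothetical stand-ins.
rng = np.random.default_rng(4)
n = 30
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # fixed part of operator
A = lambda m: np.diag(m) + K                             # "wave operator"
f = rng.normal(size=n)                                   # source
S = np.eye(n)[:4]                                        # restriction to 4 receivers
dobs = rng.normal(size=4)                                # "observed" data
D = np.diff(np.eye(n), axis=0)                           # difference (DSO) operator
a1, a2 = 1.0, 0.5                                        # weights of the two terms
m = rng.uniform(1.0, 2.0, size=n)

def states(m):
    u = np.linalg.solve(A(m), f)              # incident field, cf. eq. (37)
    v = np.linalg.solve(A(m).T, S.T @ dobs)   # backpropagated data, cf. eq. (38)
    r = u * v                                 # "migrated image", cf. eq. (39)
    return u, v, r

u, v, r = states(m)

lam_r = a2 * (D.T @ (D @ r))                  # adjoint of the image equation
lam_v = np.linalg.solve(A(m), u * lam_r)      # adjoint of the v-equation
lam_u = np.linalg.solve(A(m).T, a1 * (S.T @ (S @ u - dobs)) + v * lam_r)
grad = -(lam_u * u + lam_v * v)               # dA/dm_k = e_k e_k^T here

def J(m):
    u, v, r = states(m)
    return 0.5 * a1 * np.sum((S @ u - dobs) ** 2) + 0.5 * a2 * np.sum((D @ r) ** 2)

eps = 1e-6
fd = np.array([(J(m + eps * e) - J(m - eps * e)) / (2 * eps) for e in np.eye(n)])
assert np.allclose(grad, fd, rtol=1e-4, atol=1e-8)
```

As in the text, the adjoint of the forward-propagation equation is solved with the adjoint operator, while the adjoint of the backpropagation equation is solved with the forward operator.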
Stereotomography
The last example describes an application based on the ray equations. The functional depends on the traveltimes and on the other ray-based parameters. The derivatives of the traveltimes with respect to the velocity are efficiently computed by integrating the slowness perturbations along the ray. There is no real gain in introducing the adjoint states when only the derivatives of the traveltimes with respect to the velocity parameters are required, because the adjoint-state method does not give a more efficient algorithm. The derivatives of the ray trajectories can be evaluated from the paraxial ray equations and the propagator associated with this linear differential system, Cervený (2001) and Farra & Madariaga (1987). When the functional depends on the ray trajectories, the adjoint-state method provides a faster approach to the computation of the gradient of the functional. This case is illustrated with the stereotomography functional.
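The propagator-versus-adjoint comparison made in this section can be previewed on a generic linear ODE y' = By with a misfit on the endpoint y(T) (B, C, d and y0 below are hypothetical). Route 1 integrates the d × d propagator forward; route 2 integrates a single d-vector adjoint equation backward; both give the same gradient with respect to y(0):

```python
import numpy as np

# Generic linear ODE y' = B y with misfit J = 0.5*||C y(T) - d||^2 on the
# endpoint (B, C, d and y0 are hypothetical). The gradient with respect to
# y(0) is computed with the d x d propagator and with one backward adjoint
# solve; the adjoint system is d times smaller.
d_dim, N, T = 4, 400, 1.0
h = T / N
rng = np.random.default_rng(3)
B = 0.5 * rng.normal(size=(d_dim, d_dim))
C = rng.normal(size=(2, d_dim))
dobs = rng.normal(size=2)
y0 = rng.normal(size=d_dim)

def rk4(rhs, z, h, n):
    """Classical Runge-Kutta integration; works for vectors and matrices."""
    for _ in range(n):
        k1 = rhs(z); k2 = rhs(z + 0.5 * h * k1)
        k3 = rhs(z + 0.5 * h * k2); k4 = rhs(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

yT = rk4(lambda y: B @ y, y0, h, N)
res = C @ yT - dobs

# route 1: matrix propagator P' = B P, P(0) = I (a d x d system)
P = rk4(lambda P: B @ P, np.eye(d_dim), h, N)
grad_prop = P.T @ (C.T @ res)

# route 2: one d-vector adjoint solve, backwards from t = T
# (lambda' = -B^T lambda becomes a forward integration in reversed time)
lam0 = rk4(lambda l: B.T @ l, C.T @ res, h, N)
grad_adj = lam0

assert np.allclose(grad_prop, grad_adj)
```

The agreement reflects the identity derived later in this section: the transpose of the propagator is the propagator of the adjoint equation.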
The purpose of stereotomography, as described in Billette & Lambaré (1998), is to retrieve the velocity background not only from traveltimes picked on the seismic data but also from the slopes of locally coherent events in the common-source and common-receiver gathers. This approach differs from classic traveltime tomography because the picks are interpreted independently from each other, without any association with a given interface, and they can represent either reflection or refraction events. The (observed) data are a set of source positions, x_{s}, receiver positions, x_{r}, two-way traveltimes, T^{m}_{sr}, the slopes, p_{s}, at the source locations, and the slopes, p_{r}, at the receiver locations.
Following Billette & Lambaré (1998), the model parameters are x_{d} the subsurface reflection points, θ_{s} and θ_{r} the take-off angles of the rays going towards the source, x_{s}, and towards the receiver, x_{r}, T_{s} and T_{r} the traveltimes along the rays from x_{d} towards the source and the receiver, and the parameters (v_{k}) defining the continuous velocity field, v. The integration parameter along the ray is the time, t. The ray equations are:
dx/dt = v² p,
dp/dt = −∇v/v.
With y_{a} = (x_{a}, p_{a}), the ray equations become dy_{a}/dt = g(y_{a}, v), with the initial condition y_{a}(0) = y_{0}(x_{d}, θ_{a}). y_{a} is computed from t = 0 to t = T_{a}. The subscript a represents s or r. y_{0} is the function defining the initial conditions. The error functional is
J_{sr} = ½ c_{T} (T_{s} + T_{r} − T^{m}_{sr})² + ½ 〈C_{s} Δy_{s}, Δy_{s}〉 + ½ 〈C_{r} Δy_{r}, Δy_{r}〉, (46)
with Δy_{a} = y_{a}(T_{a}) − d_{a}, where d_{s} = (x_{s}, p_{s}) and d_{r} = (x_{r}, p_{r}) are the picked data. In eq. (46), c_{T} is a scalar coefficient and C_{s} and C_{r} are two diagonal matrices.
Gradient with the adjoint-state method
After the integration by parts of the terms 〈λ̃_{a}, dỹ_{a}/dt〉, the augmented functional reads
ℒ is a functional of the state variables, ỹ_{s} and ỹ_{r}, the adjoint-state variables, λ̃_{s} and λ̃_{r}, and the model parameters, T_{s}, T_{r}, θ_{s}, θ_{r}, x_{d} and (v_{k}). The derivatives with respect to the state variables, ỹ_{a}, give the adjoint-state equations:
dλ_{a}/dt = −(∂g/∂y_{a})^{T} λ_{a},
λ_{a}(T_{a}) = C_{a} Δy_{a}, (48)
with Δy_{a} = y_{a}(T_{a}) − d_{a}. The derivatives of the augmented functional with respect to the model parameters correspond to the derivatives of J_{sr} with respect to the model parameters. This yields
For the computation of the derivatives with respect to T_{a}, we have used the fact that the ray equations are satisfied at T_{a}.
Gradient with the Fréchet derivatives
A more traditional approach to compute the gradient is to first determine the Fréchet derivatives. In this case, this means the derivatives of T_{s}, T_{r}, y_{s}(T_{s}) and y_{r}(T_{r}) with respect to T_{s}, T_{r}, x_{d}, θ_{s}, θ_{r} and (v_{k}). A usual approach to compute those derivatives is to use the paraxial ray equations, Cervený (2001) and Farra & Madariaga (1987). The perturbation δy_{a} of the rays y_{a} is obtained from the propagator P_{a} defined by
dP_{a}/dt (t, t_{0}) = (∂g/∂y_{a}) P_{a}(t, t_{0}),
P_{a}(t_{0}, t_{0}) = I. (50)
This gives, Billette & Lambaré (1998),
δy_{a}(t) = P_{a}(t, 0) δy_{a}(0) + ∫_{0}^{t} P_{a}(t, τ) (∂g/∂v) δv(τ) dτ, (51)
and the Fréchet derivatives are
∂y_{a}(T_{a})/∂ν = P_{a}(T_{a}, 0) ∂y_{0}/∂ν for ν = x_{d}, θ_{a},
∂y_{a}(T_{a})/∂v_{k} = ∫_{0}^{T_{a}} P_{a}(T_{a}, τ) (∂g/∂v_{k}) dτ. (52)
The other Fréchet derivatives are equal to 0. The derivatives of J_{sr} are then simply
∂J_{sr}/∂ν = Σ_{a=s,r} 〈C_{a} Δy_{a}, ∂y_{a}(T_{a})/∂ν〉, (53)
with ν equal to v_{k}, x_{d}, θ_{s} or θ_{r}, and Δy_{a} = y_{a}(T_{a}) − d_{a}.
Relation between adjoint states and propagator
From eqs (49), (52) and (53), we deduce that
In fact, the transposed propagator P^{T}_{a} satisfies the adjoint-state differential equation: P^{T}_{a} is the propagator of the adjoint-state differential equation with a final condition, eq. (48). The adjoint state, λ_{a}, is then equal to (C_{a} is a diagonal matrix):
λ_{a}(t) = P^{T}_{a}(T_{a}, t) C_{a} Δy_{a}.
The main difference between the two approaches lies in the adjoint-state system (eq. 48) and the propagator system (eq. 50). Whereas the first one is a vectorial system, the second is a matrix system. This means that the adjoint-state system is d times smaller than the propagator system, with d = 4 in 2-D problems and d = 6 in 3-D problems. The adjoint-state method is then roughly d times faster. However, the adjoint-state method does not provide the Jacobian matrix, but only the gradient of J, and the Jacobian may be used to determine the sensitivity of the solution to errors. The minimization should rely on nonlinear optimization techniques, such as nonlinear conjugate gradient or quasi-Newton methods, Gill et al. (1981). In Billette & Lambaré (1998), the authors solve the nonlinear optimization problem by successively solving the linear problems defined by the Jacobian matrices.
Conclusion
The adjoint-state method for the gradient computation of a functional has been reviewed. The technique applies when the functional depends on the model parameters through a set of state variables, solutions of the forward equations. The method requires the solution of a single extra linear system. The linear operator is formed with the adjoint of the operator defined by the derivatives of the forward model with respect to the state variables, and the right-hand side consists of the derivatives of the functional with respect to the state variables. The adjoint-state variables are the solution of this linear system. In a sense, they gather the information of the perturbations with respect to the state variables, assuming that the state variables are independent variables. Since this linear system is independent of the derivatives with respect to the model parameters, the adjoint states have to be computed only once, making the method numerically very efficient. The gradient of the functional with respect to the model parameters is then simply the scalar product between the adjoint states and the derivatives of the forward model with respect to the model parameters.
To form this extra linear system (the adjoint-state system), a recipe based on an augmented functional has been reviewed. This provides a systematic approach. The main step in the definition of this augmented functional is to consider the state variables and the adjoint-state variables as independent variables.
Several examples have been described to show the power of this approach. For the complicated example with the DSO functional, the adjoint-state equations have also been derived from perturbation theory. This shows that the use of the augmented functional is not strictly necessary, but it simplifies the derivation.
When the forward equations are the ray equations, the transpose of the propagator of the perturbed ray equations is the propagator of the adjoint equations. The adjoint-state method is computationally more efficient because the adjoint-state system is a vectorial system, whereas the system of the propagator is a matrix system. Nevertheless, the adjoint-state method only gives the gradient, and not the Fréchet derivatives. The optimization problem should then be solved with a nonlinear optimization method, such as a quasi-Newton or nonlinear conjugate gradient technique.
Acknowledgment
I would like to thank Gilles Lambaré from Ecole des Mines de Paris and Colin Perkins from Shell for their comments on the manuscript.
Appendix: DSO Gradient with the Perturbation Approach
In this appendix, we retrieve the gradient of the DSO functional directly from the perturbation approach, without the help of the augmented functional. If the model, m, is perturbed by δm, the wavefields, u_{i}, are perturbed by δu_{i}, the backpropagated wavefields, v_{i}, are perturbed by δv_{i}, the migrated images, r_{i}, are perturbed by δr_{i}, and the functional J by δJ. From eq. (40) we obtain
(n_{s} is the number of shots, i is the shot index, 〈,〉 is the scalar product in the data space and 〈,〉_{x} is the scalar product in the spatial coordinate space.) The perturbations of the state variable equations, eqs (37), (38) and (39), give
The dependencies on the spatial coordinates and the angular frequency are not written. The complicated part consists in defining the adjoint-state equations. However, with some experience, this is possible. For this example, we first rewrite the second term of eq. (A1)
with r_{0} = r_{1} and r_{n_{s}+1} = r_{n_{s}}. And we define
Replacing δr_{i} by its value gives:
We then gather the terms depending on δu_{i} and the terms depending on δv_{i}:
with Re(〈−λ^{r}_{i}u*_{i}, δv*_{i}〉_{x}) = Re(〈−u_{i}λ^{r}*_{i}, δv_{i}〉_{x}) and e_{ij} = S*_{ij}(S_{ij}u_{i} − d_{ij}). By replacing δv_{i} by its value in the second term of eq. (A6) we obtain
By replacing δu_{i} by its value in the first term of eq. (A6) we obtain
We can now define
The equations (A4) and (A9) are the adjoint-state equations. They are the same as the ones obtained with the use of the augmented functional in the section on the DSO functional; however, the derivation is less obvious and less systematic in this appendix. The perturbation δJ now reads:
And with these definitions, we obtain: