True-amplitude versus trace-normalized full waveform inversion

Full waveform inversion (FWI) is a powerful method to estimate high-resolution physical parameters of the subsurface by iteratively minimizing the misfit between the observed and synthetic seismic data. Standard FWI minimizes the misfit between amplitude-preserved seismic data (true-amplitude FWI). However, to mitigate variations in sources and recording systems for data acquired over complex geological structures, as well as physics that cannot be modelled using an approximation of the seismic wave equation, the observed and synthetic seismic data are often normalized trace by trace before being used in FWI. Trace-by-trace normalization removes the amplitude effects related to offset variations and only keeps the phase information. Furthermore, trace-by-trace normalization changes the true amplitude difference because different normalization factors are used for the corresponding synthetic and observed traces. In this paper, we study the performance of true-amplitude FWI and trace-normalized-residual-based FWI in the time domain. The misfit function of trace-normalized-residual-based FWI is defined such that the adjoint source used in the gradient calculation is the trace-normalized seismic residual. We compare the two inversion schemes with synthetic seismic data simulated on laterally invariant models and the more complex 2-D Marmousi model. To simulate realistic scenarios, we apply elastic FWI, which ignores attenuation, to noisy seismic data and to synthetic data modelled using a viscoelastic modelling scheme. Comparisons of seismic data and adjoint sources show that trace-by-trace normalization increases the magnitude of seismic data at far offsets, which are usually more cycle-skipped than those at near offsets. The inverted results on linear-gradient models demonstrate that trace-by-trace normalization increases the non-linearity of FWI, so a sufficiently accurate initial model is required to guarantee convergence to the global minimum. The inverted results and the final seismic residuals computed using seismic data without trace-by-trace normalization demonstrate that true-amplitude FWI provides inverted models with higher accuracy than trace-normalized-residual-based FWI, even when the unknown density is updated using a density-velocity relationship during inversion or in the presence of noise and complex physics, such as attenuation.


INTRODUCTION
Full waveform inversion (FWI), first proposed by Lailly (1983) and Tarantola (1984), is an advanced geophysical imaging technique that attempts to retrieve quantitatively high-resolution physical parameters of the subsurface from seismic data (Virieux & Operto 2009). By taking into account traveltimes, amplitude and phase information, FWI can, in principle, effectively reconstruct the medium and short wavelength components of the elastic parameters (Neves & Singh 1996). Theoretically, FWI has a resolution power of half of the propagated wavelength, which is much higher than that of inversion methods using only traveltime information, such as wave equation traveltime tomography (e.g. Luo & Schuster 1991; Wang et al. 2014).
FWI is generally described as a local non-linear least-squares (LS) optimization problem and is usually solved using gradient-based approaches (Virieux & Operto 2009). Given an initial estimate of the elastic parameters with good accuracy, FWI improves this estimate by iteratively minimizing the mismatch between the observed data and synthetic data, the latter being modelled using a wave equation solver. The model update direction, that is the gradient, is explicitly computed by zero-lag cross-correlating the source-generated forward-propagated wavefield and the back-propagated wavefield emitted from the receiver positions with the seismic residuals as excitation sources (Tarantola 1984). To date, FWI has been successfully applied to synthetic seismic data (e.g. Mora 1987; Shipp & Singh 2002; Borisov & Singh 2015) and field seismic data (e.g. Shipp & Singh 2002; Warner et al. 2013; Arnoux et al. 2017; Górszczyk et al. 2017; Huot & Singh 2018).
True-amplitude FWI with an LS misfit function uses the difference between the amplitude-preserved observed and synthetic seismic data to construct the gradient (e.g. Tarantola 1984; Virieux & Operto 2009), and therefore fully accounts for the amplitude variation with offset (AVO) and the phase information. However, it requires an accurate source wavelet and accurate modelling of the physics in elastic media. To reduce the influence of effects on wave propagation that are not completely described by the wave equation (e.g. Shen 2010; Warner et al. 2012), such as environmental noise and attenuation in the Earth, or to mitigate receiver site effects (Tao et al. 2017) or source inaccuracy (Louboutin et al. 2017), seismic data after trace-by-trace normalization have been used in FWI both in the time domain (e.g. Morgan et al. 2013; Warner et al. 2013) and in the frequency domain (e.g. Dessa et al. 2004; Operto et al. 2004; Ravaut et al. 2004; Bleibinhaus et al. 2007; Malinowski & Operto 2008). Seismic inversion schemes using trace-by-trace normalized seismic data can be separated into two types: trace-normalized-misfit-based FWI and trace-normalized-residual-based FWI. The trace-normalized-misfit-based FWI method builds the misfit function using the LS norm of the difference between trace-by-trace normalized observed and synthetic data (e.g. Shen 2010; Morgan et al. 2016). In the time domain, this is a zero-lag cross-correlation-based misfit function, although different authors express it differently (e.g. Shen 2010; Liu et al. 2016; Louboutin et al. 2017; Tao et al. 2017). This type of misfit function is not sensitive to the absolute amplitudes of seismic data and mainly emphasizes the phase information. However, normalizing the traces increases the non-linearity of the inversion (Liu et al. 2016), which means that a better initial model is required and the standard cycle-skipping criterion is not sufficient for good convergence of the inversion.
Trace-normalized-misfit-based FWI has been widely studied (e.g. Shen 2010; Liu et al. 2016; Tao et al. 2017) and successfully applied to field seismic data (e.g. Morgan et al. 2013; Warner et al. 2013). Unlike trace-normalized-misfit-based FWI, trace-normalized-residual-based FWI takes the trace-by-trace normalized seismic residual as an adjoint source to construct the gradient directly, and the corresponding objective function is built based on the adjoint source (Louboutin et al. 2017).
The amplitude variations of first arrivals are very sensitive to the velocity gradient. For example, the high velocity gradient at the boundary of seismic layers 2A/2B in the oceanic crust can create a triplication on the shot gathers (Nedimović et al. 2008), which has very large amplitudes. In the presence of a low-velocity zone, which may be caused by the presence of gas or fluid, there is a shadow zone on the shot gathers where the amplitudes are very weak (Huot & Singh 2018). At some oceanic spreading ridges (Vera et al. 1990), the existence of both seismic layers 2A/2B (high-velocity-gradient zone) and a magma chamber (low-velocity zone) produces complicated seismograms. However, trace-by-trace normalization balances the horizontal energy distribution of seismic data, which means the AVO effect is neglected. Meanwhile, because the corresponding traces in the observed and synthetic data are normalized by different factors (each trace is normalized by its own norm), the true amplitude difference between these two traces, which corresponds to the response of the velocity anomaly, is changed. On the other hand, since all the samples in a single trace are normalized by the same factor, the relative amplitude variation within each trace is preserved, that is, the phase information is retained. Considering all these effects of trace-by-trace normalization, trace-normalized FWI may perform differently from true-amplitude FWI, which may lead the inversion to converge to a different final result.
In this paper, we focus on the comparison of true-amplitude and trace-normalized-residual-based FWI in the time domain. The difference in performance between these two inversion methods has not been investigated in detail using synthetic or field seismic data. The purpose of this work is to better understand the influence of trace-by-trace normalization on the inverted result. All the numerical examples in this paper are implemented using time domain elastic FWI workflow. The only difference between the workflows of these two inversion schemes is the calculation of the adjoint source. We compare the performance of these two inversion schemes on four laterally invariant models first in order to gain insight about the physics and we then apply them to a complex 2-D Marmousi model. The effects of unknown density, noise and attenuation are also considered.

True-amplitude FWI
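True-amplitude FWI minimizes the LS difference between the modelled and observed seismic data (Tarantola 1984):

$$J_1(\mathbf{m}) = \frac{1}{2}\sum_{i=1}^{N_s}\sum_{j=1}^{N_r}\left\|\mathbf{u}_{i,j}(\mathbf{m}) - \mathbf{d}_{i,j}\right\|_2^2, \tag{1}$$

where $\mathbf{m}$ is the model parameter vector, $\mathbf{u}$ and $\mathbf{d}$ are the synthetic and observed seismic data, respectively, $N_s$ is the number of shot gathers, $N_r$ is the number of traces per source and $\|\cdot\|_2$ represents the $L_2$ norm of a vector. This optimization problem is usually solved using gradient-based methods. The gradient of eq. (1) is obtained by taking the derivative of $J_1$ with respect to the $l$th model parameter:

$$\frac{\partial J_1}{\partial m_l} = \sum_{i=1}^{N_s}\sum_{j=1}^{N_r}\left(\frac{\partial \mathbf{u}_{i,j}}{\partial m_l}\right)^{T}\left(\mathbf{u}_{i,j} - \mathbf{d}_{i,j}\right), \tag{2}$$

where $\partial \mathbf{u}_{i,j}/\partial m_l$ is the Fréchet derivative and $T$ represents the vector transpose. Eq. (2) can be computed efficiently by zero-lag cross-correlating the source-generated forward-propagated wavefield and the wavefield generated by back-projecting the seismic residual as the adjoint source (Tarantola 1984). The adjoint source for the back-propagation is the seismic residual, defined as

$$\mathbf{s}^{(1)}_{i,j} = \mathbf{u}_{i,j} - \mathbf{d}_{i,j}. \tag{3}$$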
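The LS misfit and its adjoint source can be illustrated with a few lines of NumPy. This is only a minimal sketch: the function name `true_amplitude_misfit`, the array layout (shots, receivers, time samples), the factor of 1/2 and the random toy data are all assumptions made for illustration, and the wave-equation modelling that would produce `u` is not included.

```python
import numpy as np

def true_amplitude_misfit(u, d):
    """Return the LS misfit and the adjoint source for true-amplitude FWI.

    u, d: synthetic and observed data, shape (n_shots, n_receivers, n_samples).
    The factor 1/2 is an assumed (conventional) scaling.
    """
    residual = u - d                      # adjoint source: the plain seismic residual
    misfit = 0.5 * np.sum(residual ** 2)  # sum of squared L2 norms over all traces
    return misfit, residual

# Hypothetical toy data: 2 shots, 3 receivers, 50 time samples.
rng = np.random.default_rng(0)
d = rng.standard_normal((2, 3, 50))
u = d + 0.1 * rng.standard_normal((2, 3, 50))
J1, adjoint_source = true_amplitude_misfit(u, d)
```

With this scaling, the derivative of the misfit with respect to any synthetic sample is exactly the corresponding residual sample, which is why the residual can be back-propagated directly.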

Trace-normalized-residual-based FWI
Downloaded from https://academic.oup.com/gji/article-abstract/220/2/1421/5643916 by guest on 20 May 2020
Figure 1. Plots of true, initial and inverted velocity models of the inversion with a linear velocity gradient model using a good initial model.

The adjoint source of trace-normalized-residual-based FWI is the difference between the seismic data normalized trace by trace with the root-square values ($L_2$ norms) of the traces, and can be written as follows (Louboutin et al. 2017):

$$\mathbf{s}^{(2)}_{i,j} = \frac{\mathbf{u}_{i,j}}{\left\|\mathbf{u}_{i,j}\right\|_2} - \frac{\mathbf{d}_{i,j}}{\left\|\mathbf{d}_{i,j}\right\|_2}. \tag{4}$$

Because changing the misfit function only influences the adjoint source, rather than the algorithm for calculating the gradient (Brossier et al. 2010), the gradient of the misfit function corresponding to eq. (4) with respect to the $l$th model parameter is

$$\frac{\partial J_2}{\partial m_l} = \sum_{i=1}^{N_s}\sum_{j=1}^{N_r}\left(\frac{\partial \mathbf{u}_{i,j}}{\partial m_l}\right)^{T}\left(\frac{\mathbf{u}_{i,j}}{\left\|\mathbf{u}_{i,j}\right\|_2} - \frac{\mathbf{d}_{i,j}}{\left\|\mathbf{d}_{i,j}\right\|_2}\right). \tag{5}$$

[...], the misfit function $J_2$ of trace-normalized-residual-based FWI can be written as [...] (6). Note that eq. (6) is not a pure trace-normalized misfit function but a combination of the synthetic trace and the trace-normalized residual, in order to have the trace-normalized residual as the adjoint source.
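The trace-normalized residual described above can be sketched as follows. The function name, the small stabilizing constant `eps` and the toy traces are assumptions for illustration only. The example shows the property discussed in the text: a pure amplitude difference between synthetic and observed traces produces no adjoint source after normalization, whereas a time shift does.

```python
import numpy as np

def trace_normalized_residual(u, d, eps=1e-12):
    """Difference of traces after each is divided by its own L2 norm.

    u, d: arrays whose last axis is time; eps (an assumption) avoids
    division by zero for dead traces.
    """
    u_norm = np.linalg.norm(u, axis=-1, keepdims=True)
    d_norm = np.linalg.norm(d, axis=-1, keepdims=True)
    return u / (u_norm + eps) - d / (d_norm + eps)

# Hypothetical single trace: a windowed 5 Hz oscillation.
t = np.linspace(0.0, 1.0, 201)
d = (np.sin(2.0 * np.pi * 5.0 * t) * np.exp(-((t - 0.5) / 0.1) ** 2))[np.newaxis, :]

u_scaled = 3.0 * d                   # same phase, wrong amplitude
u_shifted = np.roll(d, 5, axis=-1)   # same amplitude, shifted in time

res_scaled = trace_normalized_residual(u_scaled, d)    # essentially zero
res_shifted = trace_normalized_residual(u_shifted, d)  # clearly non-zero
```

For comparison, the true-amplitude residual `u_scaled - d` is far from zero, so the two schemes see very different data misfits for the same pair of traces.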

NUMERICAL EXAMPLES
In this section, we compare the behavior of true-amplitude and trace-normalized-residual-based FWI by applying them to synthetic seismic data. In all examples, only the P-wave velocity is inverted. The S-wave velocity is computed from the true P-wave velocity using the relationships described in Shipp & Singh (2002) and is kept constant during inversion. The density is computed using the density-velocity relationship (Shipp & Singh 2002) and is updated in each iteration; the only exception is the true density in the unknown-density example (Section 3.1.5). Pressure data recorded by a streamer are used for inversion, and the length of the streamer is chosen to ensure that waves from the deepest parts of the model are recorded. The excitation source in all examples is a Ricker wavelet with a dominant frequency of 4 Hz (the maximum frequency is approximately 10 Hz). In each test, we run the same, sufficiently large number of iterations for true-amplitude and trace-normalized-residual-based FWI.
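As an illustration of the source described above, a 4 Hz Ricker wavelet can be generated as in the following sketch. The sampling interval, trace length and onset delay are assumptions, since the text does not specify them.

```python
import numpy as np

def ricker(f0, dt, nt, t0=None):
    """Ricker wavelet with dominant (peak) frequency f0 in Hz.

    dt: sampling interval in s; nt: number of samples;
    t0: delay of the wavelet peak in s (default 1.5 / f0, an assumption).
    """
    if t0 is None:
        t0 = 1.5 / f0
    t = np.arange(nt) * dt - t0
    arg = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

dt = 0.004                          # assumed 4 ms sampling
wavelet = ricker(4.0, dt, 1000)

# Amplitude spectrum, to check where the energy sits.
spectrum = np.abs(np.fft.rfft(wavelet))
freqs = np.fft.rfftfreq(wavelet.size, dt)
peak_frequency = freqs[np.argmax(spectrum)]
```

The spectrum peaks at the dominant frequency and has decayed to a few per cent of its peak value by 10 Hz, consistent with the quoted maximum frequency.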

Numerical examples on laterally invariant models
To gain insight into the data sensitivity of the two inversion methods, we first performed tests on laterally invariant (1-D) models. The synthetic data are modelled by solving the elastic wave equation using a finite-difference method (Levander 1988) with an optimal absorbing boundary condition (Peng & Toksöz 1995) to attenuate the artificial reflections from all boundaries. Four laterally invariant models, namely a linear velocity gradient model, a model with a high-velocity-gradient zone, a model with a low-velocity zone and a model combining the two, are considered; they describe the basic velocity structures observed in the Earth. The inversion scheme with only one shot gather (Pica et al. 1990) is applied to reduce the computational cost.

1-D linear velocity gradient model
The first model is a linear velocity gradient model with a 100-m-thick water layer on the top. The true P-wave velocity increases linearly from 1500 to 3600 m s⁻¹ with depth from 100 to 1500 m (Fig. 1, red line). The velocity gradient is higher than those observed in sedimentary basins, but is consistent with velocities observed for normal oceanic crust. We first invert the seismic data using a linear velocity gradient model that is very close to the true model, where the velocity varies from 1500 to 3400 m s⁻¹ beneath the water layer (Fig. 1, black line).
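The 1-D models of this test are simple enough to write down explicitly. The sketch below builds the true and the two initial P-wave profiles from the velocities quoted in the text; the 10 m depth sampling, the water velocity of 1500 m s⁻¹ and the function name are assumptions.

```python
import numpy as np

def linear_gradient_model(v_bottom, dz=10.0, z_max=1500.0,
                          water_depth=100.0, v_water=1500.0, v_top=1500.0):
    """1-D P-wave velocity profile: water layer over a linear gradient.

    dz (grid spacing) and v_water are assumed values.
    """
    z = np.arange(0.0, z_max + dz, dz)
    gradient = (v_bottom - v_top) / (z_max - water_depth)
    vp = np.where(z <= water_depth, v_water, v_top + gradient * (z - water_depth))
    return z, vp

z, vp_true = linear_gradient_model(3600.0)   # true model
_, vp_good = linear_gradient_model(3400.0)   # good initial model
_, vp_poor = linear_gradient_model(2900.0)   # poor initial model (second test)
```

For the true model this corresponds to a gradient of 2100 m s⁻¹ over 1400 m, that is 1.5 s⁻¹.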
Both inversion methods recover the true model correctly (Fig. 1), though true-amplitude FWI provides slightly better accuracy deeper in the model. The comparisons of the true and initial synthetic seismic data before and after trace-by-trace normalization (Figs 2a and b, respectively) demonstrate that trace-by-trace normalization removes the AVO effects and amplifies the seismic data at intermediate and far offsets. As there is no cycle-skipping between the two data sets, the inversions have converged to the global minimum.
In the second test, a poor P-wave velocity model is used as an initial model to invert the same true model shown in Fig. 1. The velocity of this poor model below the water layer increases linearly from 1500 to 2900 m s⁻¹ (Fig. 3, black line). Starting from the poor initial model, true-amplitude FWI reconstructs the true model correctly (Fig. 3, green line), and the result is comparable with that obtained using the good initial model (Fig. 1, green line). However, trace-normalized-residual-based FWI yields a result that is slightly better than the initial model but still far from the true model (Fig. 3, blue line). The variations of the misfit functions (Fig. 4) show that both misfit functions decrease rapidly at the beginning. However, the misfit of trace-normalized-residual-based FWI stops decreasing after six iterations, whereas that of true-amplitude FWI decreases to nearly zero. Figs 3 and 4 suggest that trace-normalized-residual-based FWI converges to a local minimum when the starting model is not close to the true model, and that more iterations do not improve the inverted result.
As the initial model deviates significantly from the true model, the traveltime shifts of seismic data at offsets greater than about 2.5 km are larger than half the dominant period of the source wavelet (Fig. 5a), which means there is cycle-skipping. Due to geometric spreading, traces at near offsets of the amplitude-preserved data have relatively stronger amplitudes than those at far offsets. True-amplitude FWI preserves amplitudes, so the high amplitudes of the seismic residual appear at near offsets (Fig. 5b). The signals at near offsets come from the shallow part of the model, which means that true-amplitude FWI updates the velocity model gradually from shallow to deep. Once the velocity of the shallow layers is recovered correctly, the traveltime shifts between the observed and synthetic events at far offsets become smaller, which decreases the cycle-skipping at far offsets and benefits the velocity recovery at greater depths. On the other hand, trace-by-trace normalization suppresses the AVO effects and increases the amplitude of the seismic data at intermediate and far offsets (Fig. 5c), which correspondingly increases the magnitude of the seismic residuals at far offsets (Fig. 5d) and increases the non-linearity, so the inversion converges to a local minimum.
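The mechanism described above can be reproduced with a toy gather. The sketch below is not the elastic modelling used in the paper: it assumes a single straight-ray arrival per trace in a constant-velocity medium, a 1/offset amplitude decay standing in for geometric spreading, and hypothetical velocities and offsets. It flags traces whose time shift exceeds half the dominant period and compares how the residual energy is distributed over offset with and without trace-by-trace normalization.

```python
import numpy as np

def ricker(f0, t):
    """Ricker wavelet with peak frequency f0, evaluated at times t."""
    arg = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def unit_traces(gather):
    """Divide every trace (row) by its own L2 norm."""
    return gather / np.linalg.norm(gather, axis=1, keepdims=True)

f0 = 4.0                                  # dominant frequency (Hz)
t = np.arange(0.0, 6.0, 0.004)            # assumed time axis (s)
offsets = np.linspace(500.0, 5000.0, 10)  # hypothetical offsets (m)
v_true, v_init = 2500.0, 2000.0           # hypothetical velocities (m/s)

def gather(v):
    # One arrival per trace, delayed by offset / v, decaying as 1 / offset.
    return np.array([ricker(f0, t - 1.0 - x / v) / x for x in offsets])

d, u = gather(v_true), gather(v_init)

# Cycle-skipping criterion: time shift larger than half the dominant period.
time_shift = offsets * (1.0 / v_init - 1.0 / v_true)
cycle_skipped = time_shift > 0.5 / f0

# Residual strength per trace for the two adjoint sources.
res_true = np.linalg.norm(u - d, axis=1)
res_norm = np.linalg.norm(unit_traces(u) - unit_traces(d), axis=1)

far_to_near_true = res_true[-1] / res_true[0]
far_to_near_norm = res_norm[-1] / res_norm[0]
```

In this toy setting `far_to_near_norm` is larger than `far_to_near_true`: normalization moves the weight of the adjoint source towards the far, cycle-skipped offsets.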

1-D model with a high-velocity-gradient zone
In this section, a P-wave velocity model with a high-velocity-gradient zone (Fig. 6, red line) is used as the true model to compare true-amplitude and trace-normalized-residual-based FWI. The high-velocity-gradient zone can produce a triplication with large amplitudes on the seismic section. The strong velocity gradient occurs between 900 and 1100 m depth, with velocity increasing from 2400 to 3500 m s⁻¹. Such a high velocity gradient could be present at a sediment-salt or sediment-carbonate interface, or at the Layer 2A/2B boundary in the oceanic crust. The initial P-wave velocity model is a linear velocity gradient model with a water layer above it (Fig. 6, black line).
Although no high-velocity-gradient zone is present in the initial model, both true-amplitude and trace-normalized-residual-based FWI yield inverted results which are much closer to the true model and include a high-velocity-gradient zone (Fig. 6). The velocities of layers above the high-velocity-gradient zone are correctly recovered using seismic data with and without trace-by-trace normalization. However, for the high-velocity-gradient zone and layers below it, the true-amplitude FWI correctly recovers their velocities and depths, while trace-normalized-residual-based FWI cannot completely retrieve their velocities and provides wrong depth and thickness of the high-velocity-gradient zone.
The amplitude of traces at far offsets is much weaker than that at near offsets on the amplitude-preserved seismic section, and the triplication created by the high-velocity-gradient zone shows strong amplitude (Fig. 7a, in red). There is no triplication on the initial synthetic seismic data (Fig. 7a, in black). The strong amplitude of the triplication indicates that reconstructing this triplication correctly is critical for the recovery of the layers below. All traces after trace-by-trace normalization have similar amplitudes (Fig. 7b). Figs 7(a) and (b) demonstrate that no cycle-skipping exists between the true and initial synthetic data. After trace-normalized-residual-based FWI, the phases of the true and final synthetic seismic data match well at all offsets and the triplication is recovered (Fig. 7c), which rules out convergence to a local minimum. Comparing the final seismic residuals computed using seismic data without trace-by-trace normalization (Fig. 7d), we find that the amplitude of the residual resulting from true-amplitude FWI (shown on Fig. 7) is nearly ten times smaller than that of trace-normalized-residual-based FWI. The large amplitude of the residual around the triplication (Fig. 7d, in black) illustrates that the triplication is not completely reconstructed by trace-normalized-residual-based FWI. The incomplete match of the triplication means that the high-velocity-gradient zone is not precisely recovered, which degrades the recovery of the layers below the high-velocity-gradient zone and explains the lower accuracy of the inverted result from trace-normalized-residual-based FWI (Fig. 6, blue line).

Figure 7. (a) Amplitude-preserved true data (in red) and initial synthetic data (in black) for inversion using the model with a high-velocity-gradient zone. There is a triplication on the true data. (b) Trace-normalized true data (in red) and initial synthetic data (in black). (c) Trace-normalized true data (in red) and final synthetic data resulting from trace-normalized-residual-based FWI (in black). Note the large amplitudes at far offsets for the trace-normalized data (b) and (c). (d) Comparison of final residual for true-amplitude (in red) and trace-normalized-residual-based (in black) FWI computed using data without trace-by-trace normalization plotted on the same scale. The amplitude ranges of the residuals are shown on the figure and the residual of true-amplitude FWI is nearly 10 times smaller than that of trace-normalized-residual-based FWI.

1-D model with a low-velocity zone
In this section, a P-wave velocity model with a low-velocity zone (Fig. 8, red line) is used for comparing the two inversion schemes. A low-velocity zone could be created by the presence of gas, fluid or melt, and produces a shadow zone with very weak amplitudes on the seismic data. The low-velocity zone in the true model is nearly 800 m thick, starting from a depth of 550 m. The initial P-wave velocity model (Fig. 8, black line) contains a low-velocity zone with almost the same thickness and depth as the true model to avoid cycle-skipping, but the velocities of the low-velocity zones are different.
The inverted results of true-amplitude and trace-normalized-residual-based FWI are shown in Fig. 8. Compared with trace-normalized-residual-based FWI, true-amplitude FWI provides a result with higher accuracy, which is very close to the true P-wave velocity model at all depths (Fig. 8, green line). The inverted P-wave velocity from trace-normalized-residual-based FWI is better than the initial model above the low-velocity zone (Fig. 8, blue line). However, the upper part of the low-velocity zone is poorly recovered and the velocity below the low-velocity zone is worse than the initial model.

Figure 9(d). Comparison of final residuals of true-amplitude (in red) and trace-normalized-residual-based FWI (in black) computed using data without normalization plotted on the same scale. The amplitude ranges of the residuals are shown on the figure and the residual of true-amplitude FWI is five times smaller than that of trace-normalized-residual-based FWI.

Figure 10. Plots of true, initial and inverted P-wave velocity models for inversion using the P-wave velocity model with a high-velocity-gradient zone and a low-velocity zone.
The comparisons of true and initial synthetic data before and after trace-by-trace normalization (Figs 9a and b, respectively) demonstrate that there is no cycle-skipping between the two data sets. There is a shadow zone on both the true and initial synthetic data, and the amplitude of the shadow zone is very weak on the true-amplitude seismic section (Fig. 9a). However, the weak amplitude is enhanced on the trace-normalized seismic section, indicating that the AVO effects are modified by trace-by-trace normalization. After trace-normalized-residual-based FWI, the traveltime shifts between the true and synthetic data are reduced and the phases of the first arrivals match well almost everywhere (Fig. 9c). Considering the weak amplitude of the shadow zone, the velocity of the low-velocity zone is mainly constrained by the refractions coming from layers below the low-velocity zone. However, this constraint no longer holds, because the refractions from layers below the low-velocity zone are well matched in amplitude and phase after trace-by-trace normalization (Fig. 9c, indicated by blue arrows). The large amplitude of the final true-amplitude seismic residual resulting from trace-normalized-residual-based FWI nevertheless demonstrates that the true and final synthetic data are not completely matched in amplitude when no trace-by-trace normalization is applied (Fig. 9d, in black, indicated by blue arrows). This means that the low accuracy of the result from trace-normalized-residual-based FWI is mainly caused by the loss of true amplitude differences.

1-D model with a high-velocity-gradient zone and a low-velocity zone
In the final test on a laterally invariant model, we compare the performance of the two inversion schemes using a P-wave velocity model containing a high-velocity-gradient zone and a low-velocity zone (Fig. 10, red line). The high-velocity-gradient zone extends from 750 to 975 m depth, with velocity increasing from 2500 to 4550 m s⁻¹. The low-velocity zone is 850 m thick, starting from a depth of 1750 m. Such a model reflects the oceanic crustal structure at fast and intermediate spreading centers and could also represent a salt body above fluids in sediments. There is no high-velocity-gradient zone in the initial P-wave velocity model (Fig. 10, black line), but a low-velocity zone is added to avoid cycle-skipping.
Both methods yield final results that are very close to the true model (Fig. 10) and much better than the initial model at all depths. Nevertheless, the inverted velocity model resulting from trace-normalized-residual-based FWI (Fig. 10, blue line) is more oscillatory than that obtained from true-amplitude FWI (Fig. 10, green line) above the high-velocity-gradient zone. In the low-velocity zone, true-amplitude FWI almost completely recovers the velocity and depth, while trace-normalized-residual-based FWI only correctly retrieves the velocity of the lower part and gives an incorrect depth for the bottom of the layer. Due to the incomplete recovery of the low-velocity zone using seismic data after trace-by-trace normalization, the velocities below the low-velocity zone are less accurate than those resulting from true-amplitude FWI.
Although the velocity is not completely recovered by trace-normalized-residual-based FWI (Fig. 10, blue line), the true and final synthetic data after trace-by-trace normalization match well everywhere (Fig. 11a), including the triplication (Fig. 11a, indicated by blue arrows) and the shadow zone (Fig. 11a, indicated by the blue circle). This means that the low accuracy of trace-normalized-residual-based FWI is caused by the change in amplitude. The large amplitude of the last adjoint source of trace-normalized-residual-based FWI (Fig. 11b, indicated by the red circle) appears at far offsets, which mainly update the velocity of layers below the low-velocity zone. However, the incomplete recovery of the low-velocity zone leads to an incorrect update of this velocity. The weak amplitude of the adjoint source at near offsets (Fig. 11b, indicated by blue arrows) shows that the oscillation of the velocity above the high-velocity-gradient zone cannot be removed, which has a negative influence on the velocity recovery of the layers below the high-velocity-gradient zone. This example demonstrates that trace-by-trace normalization suppresses the true-amplitude difference, which changes the order in which the model is updated and prevents full recovery of the velocity.

1-D model with unknown true density
Figure 12. (a) Plots of true, initial and inverted P-wave velocity models for inversion using the model with a high-velocity-gradient zone. Density is updated using density-velocity relationship during inversion. (b) Comparison of the true density and densities estimated from density-velocity relationship (Shipp & Singh 2002) using the initial and final inverted P-wave velocities. In this example, the true density does not follow the same density-velocity relationship used in inversion.

Until now, it was assumed that the true and inverted densities follow the same velocity-density relationship. However, in practice, the true density is unknown and does not perfectly satisfy a velocity-density relationship such as Gardner's law (Gardner et al. 1974) or its variant (Shipp & Singh 2002). To compare the performance of the two inversion schemes in this situation, we perform inversions on the 1-D high-velocity-gradient model, where the true and initial P-wave velocity models (Fig. 12a) are the same as those shown in Fig. 6 but the true density does not follow Gardner's law or its variant (Fig. 12a, red line). During the inversion, the density is updated in each iteration from the inverted P-wave velocity using the density-velocity relationship (Shipp & Singh 2002). The comparison of the different density models (Fig. 12b) shows that the densities calculated from both the initial and inverted P-wave velocities are significantly different from the true density, because they do not satisfy the same velocity-density relationship.
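For readers unfamiliar with such relationships, the classic form of Gardner's law (Gardner et al. 1974) is sketched below, with density in g cm⁻³ and P-wave velocity in m s⁻¹. This is only an illustration of how a density update from velocity works: the inversions described here use the variant of Shipp & Singh (2002), whose coefficients are not given in this text, and the function name and the sample velocities are assumptions.

```python
import numpy as np

def gardner_density(vp, a=0.31, b=0.25):
    """Classic Gardner relation: rho = a * vp**b.

    vp in m/s, rho in g/cm^3. The coefficients are the commonly quoted
    Gardner et al. (1974) values, not the variant used in the paper.
    """
    return a * np.asarray(vp, dtype=float) ** b

# Density recomputed from a (hypothetical) updated velocity profile,
# as would be done after each iteration.
vp_updated = np.array([2400.0, 3000.0, 3600.0])
rho_updated = gardner_density(vp_updated)
```

Because the density is tied to the velocity in this way, any mismatch between the true density and the assumed relationship is mapped into the amplitudes, and hence into the inverted velocity.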
The final inverted P-wave velocity models of true-amplitude and trace-normalized-residual-based FWI are shown in Fig. 12(a). Both inversion methods yield final models that are much better than the initial model at almost all depths. Just below the seafloor, the velocity obtained from true-amplitude FWI is higher than the true model, which is caused by the incorrect density (Fig. 12b, green line), and hence the incorrect reflectivity, around the seafloor. However, true-amplitude FWI provides better accuracy in recovering the high-velocity-gradient zone (from 900 to 1100 m in the true velocity model). The low accuracy of trace-normalized-residual-based FWI is caused by the loss of the true-amplitude difference due to trace-by-trace normalization. This example demonstrates that true-amplitude FWI outperforms trace-normalized-residual-based FWI even when the true and inverted densities do not satisfy the same density-velocity relationship.

Numerical example on Marmousi model
For the tests on laterally invariant velocity models, true-amplitude FWI provides inverted results with higher accuracy than trace-normalized-residual-based FWI. To further compare the two inversion schemes, we apply them to the Marmousi model (Fig. 13a), which has a complex velocity structure. Thirty-two sources uniformly deployed at 325 m intervals are used for inversion. The pressure wavefield is recorded by a 7-km-long streamer, and the first and last source-streamer configurations are shown in Fig. 13(a). This geometry provides good data coverage for the model within horizontal distances between 5 and 14 km. The initial velocity model is created by smoothing the true model with a Gaussian filter (Fig. 13b). The water layer is not updated during inversion. As before, the S-wave velocity is kept constant and the density is updated using the density-velocity relationship (Shipp & Singh 2002) during inversion.
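Building a starting model by Gaussian smoothing, as described above, might look like the following sketch. It uses a separable Gaussian kernel written in plain NumPy; the filter width, the edge padding, the function name and the two-layer test model are assumptions, since the text does not give the filter parameters and the Marmousi model itself is not reproduced here.

```python
import numpy as np

def gaussian_smooth_2d(model, sigma):
    """Smooth a 2-D model with a separable Gaussian kernel.

    sigma is the standard deviation in grid points (an assumed parameter).
    Edges are padded by repeating the boundary values.
    """
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()                      # preserve the mean level
    out = np.asarray(model, dtype=float)
    for axis in range(out.ndim):
        pad = [(radius, radius) if a == axis else (0, 0) for a in range(out.ndim)]
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)
    return out

# Hypothetical two-layer model (depth index x horizontal index).
depth_index = np.arange(100)[:, np.newaxis]
model = np.where(depth_index < 50, 1500.0, 3000.0) * np.ones((1, 60))
smooth = gaussian_smooth_2d(model, sigma=5.0)
```

The smoothed model keeps the long-wavelength trend while removing sharp contrasts, which is what a starting model for FWI is expected to provide.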
The accuracy of the inverted results is evaluated using the mean absolute percentage error (MAPE, Liu et al. 2016):

$$\mathrm{MAPE} = \frac{100}{N}\sum_{k=1}^{N}\left|\frac{m^{\mathrm{true}}_{k} - m^{\mathrm{inv}}_{k}}{m^{\mathrm{true}}_{k}}\right|,$$

where $m^{\mathrm{true}}$ and $m^{\mathrm{inv}}$ are the true and inverted models, respectively, $N$ is the total number of grid points in the discretized model $m^{\mathrm{true}}$ and $|\cdot|$ is the absolute value operator. Only velocity profiles within horizontal distances of 5-14 km are used to compute the MAPE, considering the data coverage. The MAPE of the initial model (Fig. 13b) is 6.93 per cent. Both inversion methods yield velocity structures that are close to the true Marmousi model after 200 iterations (Figs 14a and b), where the layers and faults are much clearer than in the initial model. The MAPEs of the inverted results of true-amplitude (Fig. 14a) and trace-normalized-residual-based FWI (Fig. 14b) are 4.86 and 5.20 per cent, respectively, both lower than the initial MAPE of 6.93 per cent, even though the decrease seems marginal. However, the smaller MAPE of Fig. 14(a) suggests that true-amplitude FWI provides higher accuracy than trace-normalized-residual-based FWI. The comparisons of 1-D velocity profiles located at horizontal distances of 10.5 and 12 km (Figs 14c and d) show that most of the structures above 2.5 km depth are recovered after inversion using data with or without trace-by-trace normalization. The inverted velocities below 2.5 km have lower accuracy, which could be due to the acquisition aperture and the geometric spreading of wave propagation. Compared to trace-normalized-residual-based FWI, true-amplitude FWI gives better recovery of the velocities where a strong velocity contrast exists, for example between 2 and 2.5 km depth in Figs 14(c) and (d).
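The MAPE used above is straightforward to compute. The sketch below assumes the standard definition (mean of the absolute relative error, expressed in per cent); the function name and the two-point example are hypothetical and unrelated to the Marmousi results.

```python
import numpy as np

def mape(m_true, m_inv):
    """Mean absolute percentage error between true and inverted models."""
    m_true = np.asarray(m_true, dtype=float)
    m_inv = np.asarray(m_inv, dtype=float)
    return 100.0 * np.mean(np.abs((m_true - m_inv) / m_true))

# Hypothetical example: both grid points are off by 5 per cent.
example = mape([2000.0, 4000.0], [2100.0, 3800.0])
```

To restrict the measure to the well-covered part of a 2-D model, the same function can be applied to a slice of both arrays, for example the columns between 5 and 14 km.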
The final seismic residual of true-amplitude FWI computed using amplitude-preserved seismic data is slightly smaller than that resulting from trace-normalized-residual-based FWI (amplitude values are shown on Fig. 15a), which supports the higher accuracy of the inverted result from true-amplitude FWI. The seismic residual of true-amplitude FWI shows strong amplitudes at near and intermediate offsets, which suggests that the inversion algorithm would refine the shallow and intermediate depths of the model as the inversion proceeds. However, the strong amplitude of the last adjoint source of trace-normalized-residual-based FWI (Fig. 15b, indicated by the blue circle) occurs at intermediate and far offsets, and traces at near offsets (<2 km) have weak amplitudes, which means that trace-normalized-residual-based FWI will update the deep part of the model. Considering the incomplete recovery of the shallow part of the model, trace-normalized-residual-based FWI cannot update the velocity of deeper layers correctly. This example demonstrates that trace-normalized-residual-based FWI has lower accuracy because trace-by-trace normalization has removed the true-amplitude difference.

Numerical examples on noisy and attenuated seismic data
Real seismic data are usually contaminated by environmental noise, which cannot be modelled using the elastic wave equation. To better compare the performance of the two inversion schemes, we apply both inversions to noisy seismic data. The noisy seismic data, with a signal-to-noise ratio of 20 dB, are produced by adding Gaussian noise to the previous synthetic seismic data for the Marmousi model.
The noise before the first arrivals is muted to avoid undesired artefacts. The initial velocity model for the inversion is the same as that for the noise-free data (Fig. 13b).
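The noise preparation described above can be sketched as follows. This is only an illustration under stated assumptions: the signal-to-noise ratio is taken as 10 log10 of the ratio of mean signal power to mean noise power over the whole gather, the first-arrival picks are given as sample indices, and the function names, the toy gather and its Ricker-type pulse are hypothetical rather than the data used in this study.

```python
import numpy as np


def add_gaussian_noise(data, snr_db, rng):
    """Add zero-mean Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db.

    Power is averaged over the whole gather (an assumed SNR convention).
    """
    signal_power = np.mean(data ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=data.shape)
    return data + noise


def mute_before_first_arrival(data, first_arrival_idx):
    """Zero every sample recorded before the first-arrival sample of each trace."""
    sample_idx = np.arange(data.shape[1])[None, :]
    keep = sample_idx >= np.asarray(first_arrival_idx)[:, None]
    return data * keep


# Toy gather with shape (n_traces, n_samples): one pulse per trace with linear moveout.
n_traces, n_samples, dt = 50, 1000, 0.004
t = np.arange(n_samples) * dt
first_arrival_idx = 100 + 10 * np.arange(n_traces)   # hypothetical picks
t0 = (first_arrival_idx + 40)[:, None] * dt          # pulse centre after the pick
arg = (np.pi * 10.0 * (t[None, :] - t0)) ** 2        # 10 Hz Ricker-type pulse
clean = (1.0 - 2.0 * arg) * np.exp(-arg)

rng = np.random.default_rng(0)
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=rng)
muted = mute_before_first_arrival(noisy, first_arrival_idx)
```

Muting only removes the noise ahead of the first arrivals. The noise that overlaps the signal remains in the data and therefore enters the residual of either inversion scheme.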
The final results of the two inversion schemes after 200 iterations are shown in Fig. 16. The main structures of the Marmousi model between horizontal distances of 5 and 14 km, where the data coverage is good, are recovered. The MAPEs related to Figs 16(a) and (b) are 5.00 and 5.39 per cent, respectively, which are smaller than that of the initial model. However, the MAPEs of the results from noisy data are larger than those from noise-free data, which demonstrates that noise in the seismic data decreases the accuracy of both inversion methods. As in the noise-free case, true-amplitude FWI provides higher accuracy than trace-normalized-residual-based FWI. The comparisons of 1-D velocity profiles at horizontal distances of 10.5 and 12 km (Figs 16c and d) show that both inversion schemes provide an accurate velocity estimation above 1.5 km depth. The better recovery of velocity between 1.5 and 2.5 km confirms the higher accuracy of true-amplitude FWI. These results demonstrate that trace-by-trace normalization cannot remove the negative effects of noise on the inverted result.
In addition, the Earth is not perfectly elastic, and the anelastic properties of the subsurface attenuate seismic waves during propagation through energy loss and phase distortion. These attenuation effects have a significant impact on the result when an elastic waveform inversion method that ignores attenuation is used to invert strongly attenuated seismic data. To better compare the two inversion schemes, we invert attenuated synthetic seismic data using the elastic FWI workflow, in which no attenuation is involved in the inversion. The attenuated synthetic seismic data are modelled on the Marmousi model (Fig. 13a), and the attenuation is described by introducing a Q model (Fig. 17) into the wave propagation. There is no attenuation in water. The attenuated seismic data are computed by solving the time-domain viscoelastic wave equation with the finite-difference method (Robertsson et al. 1994). The smoothed model shown in Fig. 13(b) is used as the starting model for inversion. [...] respectively, which means that the inverted result from true-amplitude FWI is better and that both inverted results are worse than those obtained from attenuation-free and noise-free seismic data (Fig. 14). Figs 18(c) and (d) compare two 1-D velocity profiles at horizontal distances of 10.5 and 12 km. The final velocity profiles resulting from the two inversion methods are better than the initial model. True-amplitude FWI gives better recovery at depths between 1.5 and 2.5 km, where the velocity contrast is strong. This demonstrates that trace-normalized-residual-based FWI is no better than true-amplitude FWI at dealing with attenuation in the seismic data. The lower accuracy of the results inverted from the attenuated seismic data demonstrates the importance of a precise description of wave propagation in FWI.

CONCLUSION
We have compared the performance of true-amplitude FWI and trace-normalized-residual-based FWI by inverting synthetic seismic data using a time-domain elastic FWI workflow. The only difference between the two inversion schemes is the computation of the adjoint source. The adjoint source of true-amplitude FWI is obtained using the amplitude-preserved seismic data, while that of trace-normalized-residual-based FWI is calculated using seismic data after trace-by-trace normalization.
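The difference between the two adjoint sources can be illustrated with a minimal sketch. It assumes that each trace is normalized by its L2 norm and that the residual is taken as synthetic minus observed; both choices, as well as the function names and the two-trace toy gather, are assumptions made for illustration and may differ from the implementation used in this study.

```python
import numpy as np


def true_amplitude_adjoint(d_syn, d_obs):
    """Residual of the amplitude-preserved data, shape (n_traces, n_samples)."""
    return d_syn - d_obs


def normalize_traces(d, eps=1e-12):
    """Scale every trace to unit L2 norm (assumed norm); eps guards dead traces."""
    norms = np.linalg.norm(d, axis=1, keepdims=True)
    return d / np.maximum(norms, eps)


def trace_normalized_adjoint(d_syn, d_obs):
    """Residual after normalizing synthetic and observed traces separately."""
    return normalize_traces(d_syn) - normalize_traces(d_obs)


# Toy gather: a strong near-offset trace and a weak far-offset trace,
# each with the same 5-sample mismatch between synthetic and observed data.
n_samples = 400
t = np.arange(n_samples)
wavelet = np.exp(-0.5 * ((t - 100) / 8.0) ** 2)
amplitudes = np.array([10.0, 0.1])   # near offset, far offset
delays = np.array([0, 150])          # arrival delay in samples
shift = 5                            # synthetic/observed mismatch in samples

d_obs = np.stack([a * np.roll(wavelet, d) for a, d in zip(amplitudes, delays)])
d_syn = np.stack([a * np.roll(wavelet, d + shift) for a, d in zip(amplitudes, delays)])

res_true = true_amplitude_adjoint(d_syn, d_obs)
res_norm = trace_normalized_adjoint(d_syn, d_obs)

# Near-to-far ratio of the residual energy carried by each adjoint source.
ratio_true = np.linalg.norm(res_true[0]) / np.linalg.norm(res_true[1])
ratio_norm = np.linalg.norm(res_norm[0]) / np.linalg.norm(res_norm[1])
print(ratio_true, ratio_norm)
```

In this toy case the near-offset trace dominates the true-amplitude adjoint source, whereas after normalization both traces contribute equally. The relative weight of the far-offset trace therefore increases, which is the effect discussed in this section.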
The numerical examples show that trace-normalized-residual-based FWI has a higher risk of getting trapped in a local minimum, which means that it requires a more accurate initial model to ensure the convergence of the inversion. The higher non-linearity is caused by the enhancement of the seismic data at far offsets, which are usually related to the deeper parts of the model and tend to be cycle-skipped.
The comparisons of the inverted results and seismic residuals demonstrate that trace-by-trace normalization decreases the accuracy of FWI, even when the initial model is good enough to ensure that the inversion converges to the global minimum. This is caused by the loss of the true amplitude of the seismic residual after trace-by-trace normalization. The better velocity recovery of true-amplitude FWI on noisy and attenuated seismic data demonstrates that trace-normalized-residual-based FWI does not necessarily mitigate the effects of unknown density, noise in the data or poor physics used for modelling. This suggests that trace-normalized-residual-based FWI is not a reliable alternative to true-amplitude FWI when inverting real seismic data, as suggested by Louboutin et al. (2017), especially when waveform inversion results are used for lithological interpretation (e.g. Singh et al. 1998; Huot & Singh 2018).

ACKNOWLEDGEMENTS
The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC Advanced Grant agreement no. 339442 TransAtlanticILAB. The numerical examples with the Marmousi model were performed on the S-CAPAD platform of IPGP, France. We would like to thank Editor Jean Virieux, Dr Oscar Calderon Agudo and an anonymous reviewer for their constructive comments.