Global seismic attenuation imaging using full-waveform inversion: A comparative assessment of different choices of misfit functionals

SUMMARY We present the results of synthetic tests that aim at evaluating the relative performance of three different deﬁnitions of misﬁt functionals in the context of 3-D imaging of shear wave attenuation in the earth’s upper mantle at the global scale, using long-period full-waveform data. The synthetic tests are conducted with simple hypothetical upper-mantle models that contain Q μ anomalies centred at different depths and locations, with or without additional seismic velocity anomalies. To build synthetic waveform data sets, we performed simulations of 50 events in the hypothetical (target) models, using the spectral element method, ﬁltered in the period range 60–400 s. The selected events are chosen among 273 events used in the development of radially anisotropic model SEMUCB-WM1 and recorded at 495 stations worldwide. The synthetic Z -component waveforms correspond to paths and time intervals (fundamental mode and overtone Rayleigh waves) that exist in the real waveform data set. The inversions for shear attenuation structure are carried out using a Gauss–Newton optimization scheme in which the gradient and Hessian are computed using normal mode perturbation theory. The three different misﬁt functionals considered are based on time domain waveform (WF) and waveform envelope (E-WF) differences, as well as spectral amplitude ratios (SA), between observed and predicted waveforms. We evaluate the performance of the three misﬁt functional deﬁnitions in the presence of seismic noise and unresolved S -wave velocity heterogeneity and discuss the relative importance of physical dispersion effects due to 3-D Q μ structure. We observed that the performance of WF is poorer than the other two misﬁt functionals in recovering attenuation structure, unless anelastic dispersion effects are taken into account in the calculation of partial derivatives. WF also turns out to be more sensitive to seismic noise than E-WF and SA. Overall, SA performs best for attenuation imaging. Our tests show that it is important to account for 3-D elastic effects (focusing) before inverting for Q μ . Additionally, we show that including high signal-to-noise ratio overtone wave packets is necessary to resolve Q μ structure at depths greater than 250 km.


808
H. Karaoglu and B. Romanowicz agreement. While some recent models, built by following the same procedure as proposed by Dalton & Ekström (2006) with varying data sets and approximate forward modelling techniques (Dalton & Ekström 2006;Bao et al. 2016;Ma et al. 2016), show improved agreement between each other (Dalton et al. 2017), the agreement with models built using different procedures is still poor even at the longest wavelengths (e.g. Adenis et al. 2017). This is primarily due to the difficulty of separating elastic and anelastic effects in the measurement of seismic wave amplitudes.
The anelastic nature of earth materials manifests itself through amplitude decay (attenuation) and velocity dispersion of seismic waves travelling through them. The main challenges with isolating the attenuation from seismic wave amplitudes are that: (i) Amplitudes are sensitive to second transverse gradients in the 3-D elastic structure (e.g. Woodhouse & Wong 1986;Romanowicz 1987), which causes de-/focusing (local de-/amplification), and, scattering at higher frequencies, (ii) the amplitude response of seismographs is not as accurately known as their phase response, (iii) the earthquake source mechanisms and seismic moments are not perfectly known, introducing possible biases in the amplitudes.
Attenuation imaging studies can be grouped into three categories based on the period range they consider. The first group uses short-period body waves (e.g. Bhattacharyya et al. 1993Bhattacharyya et al. , 1996Matheney & Nowack 1995;Bhattacharyya et al. 1996;Warren & Shearer 2002;Lawrence & Wysession 2006;Hwang et al. 2011) and relies on time or frequency domain differential waveform analysis (e.g. Kanamori 1967;Teng 1968), which has the advantage of suppressing uncertainties related to the sources and receivers. They also apply stacking and averaging to address the challenges listed above.
The second and more common approach is to use intermediateand long-period surface waves. Some of the corresponding studies try to minimize focusing effects by combining data from consecutive Rayleigh wave trains and overcome source and receiver uncertainties through careful data selection (e.g. Romanowicz 1990Romanowicz , 1994Romanowicz , 1995Durek et al. 1993). Others ignore focusing after a careful data selection by reasoning that its effect is negligible for long wavelength structures (e.g. Selby & Woodhouse 2000;Gung & Romanowicz 2004). Some others try to account for focusing using asymptotic methods (e.g Dalton & Ekström 2006;Dalton et al. 2008;Bao et al. 2016;Ma et al. 2016). The third group of attenuation imaging studies relies on the decay of amplitudes of free oscillations with time (e.g. Roult 1975;Sailor & Dziewonski 1978;Masters & Gilbert 1983;Romanowicz et al. 1987;Roult et al. 1990;Widmer et al. 1991). The advantage of these studies is that they are insensitive to source and receiver site uncertainties. However, normal mode studies require records from earthquakes large enough to excite Earth's free oscillations. This limits the available data and, combined with the long wavelength sensitivity of normal modes, renders normal-mode-based methods less suitable for high-resolution 3-D global attenuation imaging.
Using surface and body wave amplitude data, focusing effects need to be accounted for to attain higher resolution attenuation mapping. The focusing effects are usually calculated for an elastic 3-D model, which was built assuming a 1-D attenuation model. For these computations, methods based on linear approximations (e.g. Woodhouse & Wong 1986;Park 1987;Romanowicz 1987;Zhou et al. 2004) are generally used. For example, Dalton et al. (2014) and Bao et al. (2016) show a comparison between the exact ray theory and finite frequency theory used for attenuation imaging.
The state-of-the-art for computing elastic effects at the global scale is to employ the spectral-element method (SEM) (Komatitsch & Vilotte 1998) for seismic wavefield computations. Although computationally more expensive than conventional methods, SEM provides higher accuracy in estimating focusing and scattering effects. Recently, Zhu et al. (2013) applied a waveform inversion methodology based on SEM and adjoint kernels to image upper mantle attenuation structure in Europe.
A full-waveform inversion methodology based on using normal mode perturbation theory both for the forward and inverse modelling steps was introduced by Li & Romanowicz (1995). The approach was first applied to elastic shear wave imaging (Li & Romanowicz 1996;Mégnin & Romanowicz 2000;Gung et al. 2003;Panning & Romanowicz 2006) and later extended to shear attenuation imaging of the upper mantle (Romanowicz & Gung 2002;Gung & Romanowicz 2004). More recently, as accurate numerical wavefield computations have become accessible, SEM was adopted for the forward modelling of the wavefield. Employing the SEMbased approach, long-period surface wave and overtone waveforms were used to build global shear velocity models for the upper mantle (Lekić & Romanowicz 2011;French et al. 2013). Later, French & Romanowicz (2014) added body waveforms to extend the model to the whole-mantle. In these studies, a smoothed version of the 1-D attenuation model QL6 (Durek & Ekström 1996) was fixed throughout the iterative inversion process. We are now in a position to extend this method to global attenuation imaging.
The selection of a suitable misfit functional, that quantifies the differences between the data and the synthetic seismograms, is a critical step in any inverse modelling problem. For attenuation imaging, it is desirable to work with the misfit functionals that prioritize the attenuation fingerprint in the seismic waves. Most global attenuation studies to date rely on the analysis of amplitudes in the spectral domain. These analyses are based on measuring the decay rate of normal mode peaks (e.g. Roult & Clévédé 2000) or minimization of the misfit between synthetic and observed spectral amplitudes of individual surface wave trains (e.g. Dalton & Ekström 2006;Zhu et al. 2013;Ma et al. 2016) or body wave differential amplitudes (e.g. Bhattacharyya et al. 1996;Warren & Shearer 2002). In contrast, Gung & Romanowicz (2004) chose to minimize the individual waveform differences in the time domain, after careful selection of data to verify close enough phase alignment. In addition to these, misfit functionals that rely on instantaneous phase matching through Hilbert transform of seismograms have also been suggested for attenuation imaging and used for some regional studies (e.g. Tonn 1991;Matheney & Nowack 1995). More recently, Fichtner et al. (2008) suggested misfit functionals based on timeand frequency-dependent phase and envelope misfit of timefrequency transforms of seismograms allowing a complete analysis of the seismograms for imaging. Additionally, Bozdag et al. (2011) suggested the use of misfit functionals that combine the instantaneous phase and envelope misfit in the time domain. These recent studies advocate the use of envelope misfit for attenuation imaging.
This study focuses on the selection of a proper misfit functional for attenuation imaging within the scope of a full-waveform inversion methodology that relies on the comparison of individual energy wave packets. In this paper, we carry out a comparative assessment of three misfit functionals-time-domain waveform comparison (as was employed by Gung & Romanowicz (2004)), time-domain envelope and spectral amplitude ratios, using our full-waveform inversion method for attenuation imaging. We have evaluated these approaches through synthetic tests, targeting to recover heterogeneous hypothetical models. In these tests, our data set consists of SEMcomputed seismograms in the hypothetical target models, which we will refer to as 'synthetic data' in what follows. For the seismograms computed in the starting models or in models obtained by inversion of the synthetic data, on the other hand, we will use the expressions 'synthetic seismograms' or 'synthetic waveforms '. In what follows, we first introduce the waveform inversion method and the misfit functionals to be tested. Second, we present synthetic tests, beginning with a test in a radially symmetric (1-D) model, in order to evaluate the accuracy of the methodology. Then, we present tests based on a synthetic data set computed in simple 3-D heterogeneous attenuation models with and without noise, and in the presence or absence of unresolved velocity heterogeneities. In all cases presented, we invert only for 3-D Q μ structure, whether or not the 3-D elastic model is known. In particular, we test the effect of unknown elastic structure on the retrieved 3-D Q μ structure.

M E T H O D O L O G Y
We utilize a probabilistic iterative inversion scheme, as originally proposed by Tarantola (1984) and apply it in the context of a hybrid full-waveform inversion approach, which combines the accuracy of seismic wavefield computations using SEM with the efficiency of Nonlinear Asymptotic Coupling Theory (NACT) used for partial derivative computations (Li & Romanowicz 1995).
At each iteration, we minimize the misfit functional given by: where C D is the data covariance matrix that weights the data to account for the non-uniform distribution of ray paths and balances contributions from high-and low-amplitude wave packets (Li & Romanowicz 1996). C M is the covariance matrix in model space that addresses the resolution level and is defined through the introduction of correlation lengths (Lekić & Romanowicz 2011), α is a regularization parameter that is a scaling factor applied to the a priori variance of the model, and m k and m 0 are the kth iteration and reference models respectively. (u, d) is the functional quantifying the difference between the synthetics (u) and data (d) that we will investigate further. During the inversion process, the model space is updated iteratively through small perturbations (δm k ) as follows: where the subscript k denotes the iteration number. We update the partial derivative matrix G k , the model m k and the synthetics u k (waveforms or spectral amplitudes) computed for the current model iteratively.

Model parametrization
In recent transversely isotropic seismic velocity models built in our group (Lekić & Romanowicz 2011;French & Romanowicz 2014) and in some other studies (e.g. Auer et al. 2014;Moulik & Ekström 2014;Chang et al. 2015), the physical model is parametrized in terms of Voigt average isotropic S-wave velocity (V Siso ) and radial anisotropy parameter (ξ = V 2 S H /V 2 SV ). A transverse isotropic medium requires five independent elastic moduli. These are reduced to two by introducing scaling factors that relate the Voigt average P-wave velocity and density perturbations to the S-wave velocity Laterally, spherical splines are defined over quasi-equidistant nodes. For the synthetic tests presented in this paper, we employed a level 4 spherical spline parametrization (Wang & Dahlen 1995), with node spacing of ∼9 • . perturbations, and perturbations in anisotropy parameters φ and γ to the perturbation in radial anisotropy parameter ξ . Further details on this approach and the underlying physical background can be found in appendix A of Panning & Romanowicz (2006).
In our inversion scheme, the model space is parametrized radially in b-splines (B i (r)) and laterally in spherical splines (H j (φ, θ )) (e.g. Wang & Dahlen 1995) as expressed by Eq. (4) defines a continuous model represented by a set of spline coefficients (m ij ). As shown in Fig. 1, we use 20 b-splines with knots distributed from the core-mantle boundary to the shallowest Moho depth of 30 km, as in model SEMUCB-WM1 (French & Romanowicz 2014). The radii of the knots are 3480,3600,3775,4000,4275,4550,4850,5150,5375,5575,5750,5900,6050,6100,6150,6200,6250,6300,6346 and 6361 km. The knots become denser closer to the surface, as relatively higher radial resolution is expected there. Laterally, we use a grid with nodes located approximately 9 • apart. This is fine enough to recover heterogeneities of size >∼2700 km, which is 810 H. Karaoglu and B. Romanowicz more-or-less equivalent to the distance covered by three neighbouring spherical spline nodes along a given direction.

Workflow
Our workflow consists of four major steps: (1) Forward modelling. The synthetic seismograms are computed using the global earthquake simulator, developed by Capdeville et al. (2003) (hereinafter referred to as CSEM). CSEM couples computationally expensive SEM in the heterogeneous mantle with normal mode computations in the core through a Dirichlet-to-Neumann operator.
(2) Wave-packet 'picking'. We select wave packets that correspond to time windows with significant seismic energy arrivals through a comparison of data and synthetic waveforms, where the latter are computed for an iteratively updated 3-D model. The wave packets are selected automatically through a procedure that ensures high signal-to-noise ratio, and avoidance of cycle-skipping (e.g. Li & Romanowicz 1996).
(3) Sensitivity kernels. The normal mode-based sensitivity kernels are computed in the framework of NACT (non-linear asymptotic coupling theory, (Li & Romanowicz 1995)). Improving on the path average approximation (PAVA; Woodhouse & Dziewonski 1984;Romanowicz 1987) that accounts for the coupling of mode multiplets along the same branch, NACT also includes acrossbranch coupling of multiplets. This leads to 2-D sensitivity kernels that bring out the sensitivity of body waves centred on the ray path, an important consideration also for surface wave overtones (e.g. Mégnin & Romanowicz 1999).
(4) Inversion. The predefined misfit functional is minimized through an iterative Gauss-Newton optimization scheme, in which the matrix (G T k C −1 D G k + αC −1 M ) (see eq. 2) is inverted using the 'scalapack' package.
The normal mode-based sensitivity kernels are updated at each iteration for the 3-D model. As argued in Lekić & Romanowicz (2011), and because our kernels already capture the effects of heterogeneities well to first order, the accuracy of the kernels is of second order importance compared to the accuracy of the misfit functional computation. Computing an approximate Hessian efficiently allows us to employ a fast converging Gauss-Newton inversion scheme. At the long wavelengths considered, the efficiency of the inversion methodology outweighs the errors in the kernels.

NACT extension to attenuation imaging
The sensitivity kernels are computed using NACT, which is described in detail by Li & Romanowicz (1995), for the elastic case. Here, we briefly explain how we extend it to the anelastic case assuming a frequency independent absorption band Q model as proposed by Liu et al. (1976) and Kanamori & Anderson (1977). While there is evidence for frequency dependence of attenuation in the earth (e.g. Anderson & Hough 1984;Lekić et al. 2009), the assumption of constant Q is sufficient in the frequency band considered here. In this case, the shear and bulk moduli of an anelastic medium can be represented as complex parameters: where f 0 is the reference frequency at which the shear and bulk moduli are denoted as μ 0 and κ 0 respectively. Modelled as independent of the frequency, Q μ is the shear quality factor and Q κ is the bulk quality factor. The resulting expressions for synthetic seismograms and the partial derivative computations using normal mode perturbation theory are given in appendix A, where it is presented that Q μ and Q κ heterogeneities lead to perturbations not only in the multiplet decay rate but also in the frequency. The latter gives rise to physical/anelastic dispersion, thus phase anomalies. With the absorption band model, the influence of the physical dispersion on the multiplets depends on the definition of the reference frequency (f 0 ). Traditionally, this frequency is set to 1 Hz as originally recommended by Kanamori & Anderson (1977). The seismic velocity models such as PREM of Dziewonski & Anderson (1981), AK135 of Kennett et al. (1995) and SEMUCB-WM1 of French & Romanowicz (2014) were all built with reference frequency set to 1 Hz. Choosing different values of f 0 requires significant redefinition of Q μ profiles used in previously built anelastic models (Oki et al. 2004). In accordance with the traditional approach, in this study, we set f 0 to 1 Hz.
It is common practice to neglect the bulk attenuation and consider Q −1 μ as dominant in the mantle. In accordance with this, we neglect Q −1 κ in our models and inversions, reducing our physical model space to three parameters: V Siso , ξ and Q −1 μ .

Misfit functionals
Anelastic tomography requires isolating the attenuation fingerprint in seismic records as mentioned earlier.
To that end, the definition of the misfit functional, which defines the type of difference between the data (d(t)) and synthetic waveforms (u(t)), is critical. In this section, we present the three misfit functionals tested for attenuation imaging, defined as: Eq. (7) defines what we call the 'waveform misfit functional' (WF). WF is a discrete function of time and computed at each time step used in sampling the waveforms within the selected timewindows. In this form, the misfit is very sensitive to the phase of waveforms, and therefore to elastic heterogeneity (V S , ξ ). Eq. (8) defines the envelope misfit functional (E-WF) which is also defined in the time domain, and requires the computation of the Hilbert transform (H(t)) and the amplitude of the analytical function for the original waveform. This prioritizes the amplitude and suppresses the phase differences at the expense of losing some of the high frequency content in the waveforms. Finally, eq. (9) defines the misfit functional based on the spectral amplitude ratio (SA). SA is defined in the frequency domain at a set of discrete frequencies. As the phase is naturally separated from the amplitude in the spectral domain, we expect this third misfit functional to be less sensitive to elastic effects.
For the three misfit functionals considered, the partial derivative matrix (G k ) is built iteratively for the model m k , in accordance with the definition of (u, d) . For WF, it simply consists in the partial derivatives of time-domain synthetic waveforms with respect to model space parameters. For E-WF, the G matrix (we drop the iteration number k for simplicity) takes the form and for SA: where U(ω) represents the Fourier transform of the original waveform, with and denoting the real and imaginary parts. In all these computations, we employ NACT for the computation of the partial derivatives (∂u(t)/∂m, ∂H(u(t))/∂m, ∂|U(ω)|/∂m) and SEM for synthetic waveforms/spectral amplitudes (u(t), Among the misfit functionals we have tested for attenuation imaging, SA is computationally the most expensive one. This comes from working with selected seismic phase time windows (wave-packet picking). In the time domain, this procedure is equivalent to multiplying the full time history (f(t)) by a boxcar function (w(t)) that is non-zero from the beginning of wave packet (t 0 ) to the end of it (t 0 + T w ) as in eqs (12) and (13). In the spectral domain, this procedure corresponds to circular convolution as in eq. (14). The computation of the Fourier transforms and the circular convolution makes SA more expensive:

S Y N T H E T I C T E S T S
In this section, we present the results of several synthetic tests based on simple hypothetical earth models, aimed at evaluating the performance of the three misfit functionals defined in the previous section, with the goal of guiding the development of a suitable methodology for application to real data.

1-D synthetic tests
First, we test the accuracy of the multiplet frequency and attenuation perturbation computations introduced in Section 2.3, using our model parametrization, for two spherically symmetric (1-D) anelastic models. In these tests, for the elastic model (V S , ξ ), we used the 1-D structure of SEMUCB-WM1 without any 3-D perturbations.
To build the target shear attenuation models, we considered a 1-D, spherically symmetric Q μ model, where the values are set to 165 in the upper mantle (from surface to the 660 km discontinuity). The lower mantle Q μ model is as in Durek & Ekström (1996). This model is used as starting model in the following tests. To obtain the target model, we added smoothly varying 1-D (spherically symmetric) perturbations to the upper-mantle part of this model. For simplicity, in all the tests, we keep the crustal model and Moho depth fixed. We consider two hypothetical 1-D models, one with positive and the other with negative Q −1 μ perturbations (Fig. 2). As we observed

812
H. Karaoglu and B. Romanowicz from our preliminary tests (not shown) that the inversions using our method cannot recover zero-degree Q −1 μ anomalies of amplitude larger than 50 per cent in one iteration, we set the peak anomaly amplitudes in this test to 45 per cent so that we can limit the number of our iterations to one.
The quality factors (Q k ) and frequencies (ω k ) of normal mode multiplet k = (l, n), where l is the angular order and n the radial order of the multiplet, can be computed exactly for any 1-D model using normal mode theory (e.g. Woodhouse 1988). Having calculated them exactly for the starting and target (Q μ perturbed) models, we next calculated the ω k 's and Q k 's in the target model approximately, using first order perturbation theory applied to the starting model. For this calculation, we expanded the perturbation in the Q −1 μ models using our spline model parametrization (Section 2.1).
Next, we chose several different paths with lengths between 60 o -120 o and computed the corresponding Q k and ω k perturbations. This is done to test the approximations introduced from using perturbation theory as well as errors stemming from the spline parametrization. The results are presented in Figs 3 and 4. The match between the two computations is almost perfect both for Q k and ω k with slight differences due to imperfect Q μ model representation using our spline basis. Figure 3. The quality factors of spheroidal and toroidal normal modes with periods longer than 60 s for the first three mode branches (n = 0, 1, 2). The direct Q k computations, where k is the normal mode index, for the unperturbed (black solid) and perturbed models (red and blue solid) are compared with those predicted by normal mode perturbation theory (dashed lines), computed for a low Q and a high Q path in the model, as a function of angular order. The colour coding is the same as in Fig. 2.   Figure 4. The eigenfrequency perturbations (δf k ) of the spheroidal and toroidal normal modes with periods longer than 60 s with respect to the starting model for the first three mode branches (n = 0, 1, 2). The direct f k computations for the unperturbed (black solid) and perturbed models (red and blue solid lines) are compared with those predicted by the normal mode perturbation theory (dashed lines), computed for a low Q and a high Q path in the model, as a function of angular order. The colour coding is the same as in Fig. 2. We now consider a 1-D Q μ profile recovery test using the PAVA approximation for each of the perturbed models. For this simple problem, we expect the PAVA approximation to work well, even though the perturbation is large, because PAVA includes multiple forward scattering (see appendix A) and can accurately represent a constant perturbation over large distances (in contrast to the Born approximation, e.g. Romanowicz et al. 2008;Panning et al. 2009). For the inversion, we simulated a single event (2009/10/24, 14:40:43.7, M w :7.0, Banda Sea) with a hypocentral depth of 154.6 km and chose a single Z-component record of the first arriving Rayleigh wave train (R 1 ) recorded at station ANTO at a distance of ∼100 • (Fig. 5). As expected, the model with higher attenuation leads to faster amplitude decay and positive time delay with respect to the reference model, whereas the model with lower attenuation leads to the opposite.
Using the three definitions of misfit functionals presented in Section 2.4, we carried out inversions for the Q μ structure along the chosen path, starting with the unperturbed model, with and without accounting for anelastic dispersion in the computation of the Fréchet derivatives (see appendix A). The inversions are carried out for the upper mantle and the transition zone, including only the top 10 b-splines (30-621 km depth range). For the synthetic waveforms used in the inversions, the sampling rate is 10 s, to ensure a dense enough sampling for the targeted period range of 60-400 s. It is 0.1 mHz for inversions in the frequency domain, as the corresponding frequency range is 2.5-16.6 mHz.
In our inversions, the regularization is defined with correlation lengths. Following French & Romanowicz (2014), in the followings tests, the radial correlation lengths have varying values from 50 km at the shallowest depths to 200 km around 600 km depth. The radial correlation lengths are defined in accordance with the distance between neighbouring radial nodes and following insights from preliminary tests. The list of chosen radial correlation lengths for the targeted 10 b-splines beginning with the bottom one is 200,150,100,50,50,50,50,50,50,50 km. Laterally, we fixed the correlation lengths to 2400 km, which is approximately three times the mean nodal distance laterally (see Section 2.1). The inversions were carried out with decreasing norm damping parameter (α in eq. 1) until the recovered model showed unrealistically large fluctuations that first appear in depth and next laterally, eventually destabilizing all the model space. The best models recovered in one iteration, using the smallest regularization parameter among the predefined range of values that preserve the stability of the inversions, are presented in Fig. 6 for each case.
WF is quite sensitive to anelastic dispersion, as expected from its sensitivity to the phase difference between data and predicted synthetic seismograms. We cannot recover the depth range of the attenuation perturbations using WF in the absence of anelastic dispersion. E-WF and SA work equally well whether we include anelastic dis-persion or not. However, for the E-WF, if we ignore the anelastic dispersion, it is necessary to increase the regularization parameter value, illustrating the significance of the phase difference in the time domain, whether we consider waveform or envelope fitting. For the SA inversion, there is no such problem.
High sensitivity to anelastic dispersion of WF makes it the least favourable misfit functional to be employed in attenuation tomography, as expected. The other two approaches are more stable.

3-D synthetic tests
In this section, we carry out a comparative assessment of the misfit functionals in the case where 3-D heterogeneities are present. In all the tests presented in this section, we added 3-D heterogeneities to the unperturbed zero-degree structure used in the previous section. This radially symmetric model consists of the global average V S and ξ profiles of SEMUCB-WM1 and the Q μ profile with values set to 300 from surface to fixed Moho depth of 30 km and 165 from Moho depth to 660 km discontinuity. Below 660 km, we use QL6 of Durek & Ekström (1996). In all the inversions presented below, we chose this radially symmetric model as our starting model. For simplicity, we ignored the Moho depth variations, crustal model and topography.
In the presented tests, we tried to recover shear attenuation anomalies in target models built with either only Q μ or both Q μ and V S ellipsoidal heterogeneities added to the radially symmetric model described above. The perturbation amplitudes peak at the centre of the ellipsoids and fade away from it. For a model parameter m, these perturbations are defined as: where A peak is the maximum perturbation at the centre of the anomaly located at (r 0 , θ 0 , φ 0 ), h, w represent its height and width, respectively, and is the distance from the center of the anomaly. We established our synthetic waveform data set using 50 selected events out of 273 events in the event catalogue that was used in developing the SEMUCB-WM1 model. The moment magnitudes (M w ) of these 50 events vary from 6 to 7.3 and their depths range from 12 to 603 km. We used CSEM to compute the seismic wavefield in the period range of 60-400 s, which was the period range targeted in building the SEM-based upper-mantle elastic global models SEMum (Lekić & Romanowicz 2011) and SEMum2 (French et al. 2013). CSEM includes the frequency independent Q μ approximation based on the standard linear solid model (Liu et al. 1976)   coherent with our assumptions (eqs 5 and 6). Here, we restricted the full-waveform inversion to only the first two arriving fundamental and overtone Rayleigh wave phases (R 1 , R 2 , XR 1 , XR 2 ) in Z-component records, and selected the same wave packets as were picked in the construction of SEMUCB-WM1 model. This limits us in terms of the ray coverage, yet provides a realistic basis in assessing the performance of the misfit functionals tested in our synthetic experiments. Fig. 7 shows the distribution of events and receivers together with the ray coverage of the 18 087 (7697 fundamental, 10 390 overtone) wave packets selected. We used the same wave packets for all the inversion results presented below.
We decided on the number of iterations based on our preliminary tests which showed that using our hybrid full-waveform inversion technique, in one iteration, we can recover perturbations with at most 40 per cent amplitude anomalies keeping the inverse problem well-posed. As we will explain further below, the introduced shear attenuation perturbation amplitudes in the 3-D synthetic tests require at least two iterations. While the synthetic seismograms for the starting spherically symmetric model (1-D model) are computed by normal mode summation, for the second iteration, they are computed by using CSEM in the heterogeneous 3-D model recovered in the first iteration.

3-D Q μ test
For the first 3-D test, a hypothetical model was built with no elastic 3-D anomalies and with multiple positive and negative Q μ perturbations with widths ranging from 4000 to 4800 km, heights from 150 to 350 km and peak amplitude 80 per cent of the reference starting model at the corresponding depths (δln(1/Q μ ))(see eq. 16). The anomaly amplitudes are chosen in such a way, that they are not too far from the values present in global mantle attenuation models, such as QRLW8 of Gung & Romanowicz (2004) with peak values around 55 per cent, and QRFSI12 of Dalton et al. (2008) with peak values of 133 per cent, and also so that we can expect to recover them in a few iterations using our hybrid full-waveform inversion technique.
The dimensions of the heterogeneity are chosen in the specified ranges to lower the computational cost in the inversion step by using a coarse model parametrization. The smaller the dimension of the heterogeneity, the more the computational cost, in addition to a possible requirement for more iterations. We introduced heterogeneities with peaks at three depths (150, 250, 350 km). Fig. 8 shows the lateral distribution and depth cross-sections of Q μ heterogeneities of the target model. Given the ray coverage, with our model parametrization (Fig. 1), these heterogeneities should be recoverable, although we might expect better resolution in the northern hemisphere. This test provides insight on the limitations of the misfit functional choice in a more realistic way than the 1-D test presented above, and allows us to test the NACT implementation when perturbations in the attenuation model are considered. As in the case of 1-D tests, in the inversion, we use a sampling rate of 10 s in the time domain, ensuring a dense enough sampling for the targeted period range of 60-400 s. In the frequency domain, the sampling rate is 0.1 mHz.
In these inversions for the attenuation parameter (Q μ ), we computed partial derivatives using NACT, and we used an a priori model covariance matrix (C −1 M ) with correlation lengths specified as 1600 km laterally and varying values from 50 to 200 km radially, in accordance with the distance between neighbouring radial nodes (see Section 3.1 for further detail on radial correlation lengths). The correlation lengths were determined based on preliminary tests and the average distance between the spherical nodes (∼9 o ). The inversions were conducted using a range of norm damping parameter values (α in eq. 1), which were reduced until the inversion became unstable. This was decided qualitatively, observing radial and lateral fluctuations of the recovered anomalies. Below, we present the best models recovered after two iterations using the lowest norm damping parameter that preserves the stability of the inversions, for each of the 3 misfit functionals considered.
E-WF and SA both performed best as they resulted in almost full recovery of the target model, with a little more lateral and in depth smearing in the case of E-WF, as shown in Fig. 9. Here, anelastic dispersion was ignored in the computation of the partial derivatives (G matrix-see Appendix A).
For the WF misfit functional, ignoring the anelastic dispersion in the computation of the partial derivatives leads to a significant loss in depth resolution as shown in Fig. 10. Additionally, a higher regularization parameter value was required for the stability of the inversion, compared to the case with anelastic dispersion. Relatively higher level of background fluctuations in the recovered model and the fact that some heterogeneities (both positive and negative) leak to very shallow depths, even when dispersion corrections are considered in the calculation of the partial derivatives, indicates, again, that this approach is the least favourable of all. The importance of anelastic dispersion when using WF further reflects the fact that this misfit functional emphasizes misfit in the phase part of the waveform. This makes it a suitable misfit functional to image elastic heterogeneity, yet one should be cautious when employing it for attenuation.

The significance of overtones
As a complement to the previous section, we present the models recovered using only the fundamental mode Rayleigh waves. This serves to clarify the importance of including Rayleigh wave overtones for attenuation mapping. Fig. 11 presents the recovered models using only the Z-component fundamental mode Rayleigh waves, with E-WF and SA misfit functionals.
Excluding overtone Rayleigh wave data, we significantly lose the Q μ anomalies below ∼300 km depth, with both SA and E-WF. In the recovered models, the anomalies in the reference hypothetical model that peak at 350 km depth appear with significantly reduced amplitudes and the anomalies that should peak at 250 km are mapped closer to the surface. Including overtone Rayleigh wave data improves the depth resolution of the recovered models and also provides additional constraints on near-surface anomalies, reducing their smearing as a function of depth. Fig. 12 clarifies the contribution of the overtones by comparing the recovered 1-D Q μ profiles at the centre of the anomalies located in the middle of A-A , B-B and C-C paths indicated on Figs 9 and 11.

3-D Q μ test with noise included
Next, we test the E-WF and SA in the presence of seismic noise. In this test, real noise records collected from the stations considered are added to the synthetic waveforms. As presented in Fig. 13, for the recovered model with E-WF, the effect of added real noise is negligible. This is reassuring, as the wave packets we used in the Q μ inversions are the same as the ones used in building SEMUCB-WM1 model, except that they are calculated for the hypothetical models. These wave packets were picked through a careful procedure that eliminates those with low signal-to-noise ratio (see Section 2.2).
As the recovered model in the presence of noise is very similar when using the SA misfit functional as it is for E-WF (Fig. 9), we do not show it here for brevity.
The test with real noise illustrates the validity of our wave packet picking for the Q μ inversions and the robustness of E-WF and SA misfit functionals, working with carefully selected data in the presence of real noise. To test the E-WF and SA at their limits, we present recovered models with high levels of noise defined through a scaling parameter in appendix B.

Test including 3-D V S and Q μ anomalies
Finally, we evaluate the attenuation imaging performance of the three misfit functionals tested in the presence of unresolved V S heterogeneities without seismic noise. By unresolved V S anomalies, we Figure 10. The recovered models after two iterations using the waveform misfit functional with (a) and without (b) the anelastic dispersion correction as mentioned in the methodology section. The locations of the Q μ anomalies in the hypothetical model are circled by dashed lines. Ignoring the anelastic dispersion leads to a significant loss in depth resolution. For this case, we had to use a higher norm damping value for stability. Additionally, the recovered model perturbations are much weaker and they become less pronounced with depth, as indicated in the change in colour scale in (b). mean that although both V S and Q μ 3-D heterogeneities are present in the target model, we invert only for Q μ , ignoring the presence of unresolved lateral variations in V S . This is an important scenario to test potential leaks from unresolved elastic heterogeneities into the Q μ images.
The hypothetical model built for this test with both V S and Q μ heterogeneities is presented in Fig. 14. We utilized the same Q μ heterogeneities as in the 3-D Q μ test presented in the previous section with maximum anomaly amplitudes of 80 per cent, but without the deepest anomalies peaking at 350 km depth. V S heterogeneities are centred at 200 km and spatially distributed very close to Q μ perturbations and even overlap with them in the western Pacific and south-east Asia, in order to evaluate leakage from one to the other in an extreme scenario. The V S heterogeneities are 5000 km in width and 400 km in height (see eq. 16) and 3 per cent (δln(V S )) in amplitude with respect to the 1-D average model. This is meant to represent underestimated amplitudes of V S anomalies recovered from seismic tomography. In the target model, we kept ξ as in our starting model, which is a radially symmetric model, and we did not invert for ξ as we work only with Z-component records of Rayleigh waves. This simplifies the test by reducing the possible interactions between V S , ξ and Q μ , and to the first order presents the key strengths and weaknesses of the misfit functionals due to the interaction between the elastic and attenuation structure.
We present a summary of the best recovered models in Fig. 15. Both E-WF and SA are able to recover Q μ heterogeneities to a certain degree with a slight leak from V S heterogeneities, as visible both in the lateral and depth cross-sectional views. Where both types of heterogeneities overlap, the recovery of the Q μ anomalies seems to deteriorate with increased lateral and in-depth smearing (see the anomalies located in the middle of B-B and C-C paths in Fig. 15).
With WF, Q μ heterogeneities are poorly recovered even with the anelastic dispersion accounted for in the partial derivatives. Additionally a stable inversion with WF requires a relatively high norm-damping parameter value (α in eq. 1)-leading to the recovery of lower perturbation amplitudes. The recovered model is far worse in the absence of the anelastic dispersion term, since V S heterogeneities are mapped into Q μ . Figure 11. The recovered models after two iterations using the envelope waveform (a) and acceleration spectral amplitude (b) misfit functionals. The inversions are carried out using only the Z-component fundamental mode Rayleigh wave time-windows (R 1 , R 2 ). Top panels: map views of distribution of heterogeneities at 150, 250 and 350 km depth; bottom panels: corresponding depth cross-sections along the A-A , B-B and C-C paths indicated on the maps. The locations of the Q μ anomalies in the target model are circled by dashed lines.
The leakage from V S to Q μ is manifested by mapping of high attenuation in fast zones of limited lateral extent and vice versa. It is in line with expectations from focusing effects, as seismic waves passing through slow and fast zones, increase and decrease in amplitude, respectively (e.g Romanowicz 1995; Dalton & Ekström 2006).
The leakage from V S to Q μ heterogeneities is most pronounced in the WF case, especially in the absence of the anelastic dispersion term in the partial derivatives matrix. To further illustrate this last point, we present the A-A depth cross-sections in Fig. 15 (bottom panel), where V S perturbations peak in the target model, and clearly dominate the imaged Q μ structure for WF (with reversed 'colour'). However, these effects are seen even in the case of E-WF and SA, implying that the elastic heterogeneity must be accounted for as best as possible in the forward modelling step (e.g Romanowicz 1995; Dalton & Ekström 2006;Bao et al. 2016).

D I S C U S S I O N A N D C O N C L U S I O N S
We have carried out a comparative assessment of three misfit functionals based on waveform (WF), waveform envelope (E-WF) and spectral amplitude (SA) differences, within the framework of a hybrid full-waveform inversion scheme for attenuation imaging. With these misfit functionals, we also evaluated the relative significance of the two effects of intrinsic attenuation on seismic records, namely amplitude decay and anelastic dispersion. The goal was to identify the best approach in isolating attenuation anomalies from other effects such as seismic noise and elastic heterogeneities. To that end, we carried out inversions targeting only Q μ without inverting for any unresolved elastic heterogeneity.
In these tests, the earth's structure is modelled as radially anisotropic and anelastic with frequency independent Q μ . All the wavefield simulations are performed using CSEM, and anelastic behaviour is introduced by means of standard linear solid models. SEM simulations provide high accuracy in addressing (de)focusing effects due to elastic heterogeneities, thus providing a realistic basis for the comparison of the relative performance of the three misfit functionals considered.
We have presented 1-D and 3-D synthetic tests in a period range of 60-400 s, as was considered in building previous upper-mantle Figure 12. The 1-D Q μ profiles along the centre of the anomalies located at the mid-points of the A-A , B-B and C-C paths shown in Figs 9 and 11. The contribution of the overtones in reducing the smearing in the radial direction and recovering the deep anomalies below ∼300 km is clearly visible. Figure 13. The recovered models in two iterations using E-WF in the presence of seismic noise. Real noise records collected from the stations are added to the synthetic seismograms computed for the target model before the inversion. The effect of the noise is negligible for E-WF and SA. For the latter, we recovered a model very similar to the one shown in Fig. 9. The locations of the Q μ anomalies in the target model are circled by dashed lines. seismic velocity models in our group. In the 3-D tests, inversions were conducted using a realistic and limited ray coverage replicated from a subset of Z-component Rayleigh wave data (R 1 , R 2 , XR 1 , XR 2 ) used in building the SEMUCB-WM1 global elastic model. Using the synthetic waveform data set computed for target models with Q μ and, in some tests, V S anomalies, we tested the three misfit functionals for recovering the anomalies starting with a spherically symmetric Earth model. The size of anomalies in the hypothetical models vary from 4000 km to 5000 km laterally and 100 to 200 km radially. The sizes of anomalies are chosen in these ranges to allow us to employ a coarse model parametrization grid, and to achieve convergence in a few iterations reducing the computational cost. The peak amplitudes of anomalies, on the other hand, are chosen such as to match those in existing global attenuation and seismic velocity models. The conclusions reached can be generalized to the cases with smaller size heterogeneities, which would require more refined model parametrizations, computations to shorter periods, and, likely, more iterations.
In our first 1-D synthetic test, we verified the accuracy of first order normal mode perturbation theory in the PAVA approximation, and carried out a comparative assessment of the three misfit functionals for attenuation imaging along a source-toreceiver path, using a single R 1 wave packet. This exercise gave initial insights into the performance of the misfit functionals with encouraging results for E-WF and SA misfit functionals.
Moving to a more realistic setting, in the first 3-D synthetic test, we considered the recovery of a hypothetical model with both positive and negative 3-D Q μ heterogeneities, but no heterogeneity in 820 H. Karaoglu and B. Romanowicz  shear velocity. In this test, all the misfit functionals performed very well except WF in the absence of anelastic dispersion. In this case, although we were able to recover the sign of the attenuation perturbations, the depth resolution deteriorated with significant lateral and in-depth smearing. Including anelastic dispersion in the computation of the partial derivatives for the inversion led to somewhat improved images.
The significance of including overtone Rayleigh wave data in attenuation imaging is illustrated by comparing inversions with and without overtone wave packets. Excluding overtones led to significantly poorer recovery of the deep Q μ anomalies (below ∼250 km depth), for both E-WF and SA misfit functionals. We conclude from this test that overtones are needed to map 3-D Q μ structure at transition zone depths and to provide additional constraints on near-surface anomalies.
The second 3-D test was designed to evaluate the performance of E-WF and SA in the presence of seismic noise. Both methods performed well when realistic noise levels were added. Increasing the noise level to test the limits of both methods, we observed that larger norm-damping parameter values were necessary to obtain a stable model, thus reducing the recovered amplitude anomalies. Also, as the noise level increased, SA performed better than E-WF especially in recovering the deep structures. This illustrates the drawback of working in the time domain, as the introduced noise distorts not only the amplitude but also the phase of the signal. Although E-WF is less sensitive to phase anomalies than WF, it is still affected by anelastic dispersion. On the other hand, the SA approach relies only on amplitude anomalies in the spectral domain, isolating the attenuation effect more accurately, provided the appropriate time-window chosen.
Notably, the resolution of deep structure is lost with increased noise level, as presented in appendix B, for both E-WF and SA misfit functionals. This is due to the definition of our noise level which leads to the suppression of low amplitude overtones at a higher rate than fundamental mode Rayleigh waves.
For the final 3-D test, we built a more complex target model including both V S and Q μ heterogeneities. Leaving V S heterogeneities as unknown, we tried to recover the Q μ model. As it turns out, the leakage from unresolved V S anomalies to the Q μ model is inevitable regardless of the misfit functionals chosen. The leakage appears as positive attenuation heterogeneities for slow regions and vice versa, as expected for the effects of focusing. However, the level of leakage is very low in the E-WF and SA cases, whereas it is quite dominant for WF. In fact, in the absence of anelastic dispersion terms in the G matrix, WF leads to a Q μ model visibly dominated by V S leaks.
In general, we conclude that for attenuation imaging SA and E-WF are the more appropriate misfit functionals, with the former performing relatively better with increasing noise level and frequency content. WF prioritizes phase information over amplitude information, and is more sensitive to elastic heterogeneities. For all the three misfit functionals, it remains important to try and first determine the best possible elastic 3-D model, in order to accurately account for focusing effects. In accordance with these conclusions, we suggest updating seismic velocity and attenuation models sequentially using WF for the former and E-WF or SA for the latter, as well as including overtones in the inversions to map attenuation heterogeneities below ∼300 km and to better constrain the uppermantle structure.
We note that in our previous efforts at 3-D upper mantle attenuation mapping using waveforms in the time domain, we used the WF misfit functional and corrected for anelastic dispersion effects due to 1-D Q μ structure but not due to 3-D Q μ structure (Gung & Romanowicz 2004). However, in that study, a very rigorous data selection was performed, keeping only those waveforms which were in phase with the synthetic seismograms computed in a 3-D elastic model, and considering only very long wavelength 3-D Q μ structure. This resulted in realistic 3-D Q μ images down to transition zone depths, albeit with likely unreliable amplitudes of lateral variations. In more recent work, Dalton & Ekström (2006); Dalton et al. (2008) and Bao et al. (2016) adopted a SA misfit functional -albeit only for fundamental mode Rayleigh wavesapplied to a large data set (much larger than the one considered in our tests, in which we restricted the number of events due to the heavy SEM computations involved). This plus corrections for Figure 15. Summary of results of synthetic test with V S and Q μ perturbations in the target model, after two iterations. The presence of V S perturbations is ignored in the inversions. E-WF (a) and SA (b) succeed very well in recovering the Q μ perturbations with relatively small leakage from V S perturbations compared to WF. Without considering the anelastic dispersion, WF maps V S perturbations into Q μ perturbations (c). Although considering anelastic dispersion improves the performance of WF (d), the leakage is still quite visible and dominant. In both cases, we have to apply higher damping to keep the inversion stable. The level of leakage from V S to Q μ is shown in the depth cross-sections of the recovered models along A-A (e). The leakage is worst with WF, as can be seen from the amplitude of the imaged perturbations in Q μ , where there should be none. In all the panels, the locations of the Q μ anomalies in the target model are circled by dashed lines.
focusing likely allowed them to reach higher lateral resolution for Q μ in the upper 200 km of the mantle. Still, as shown here, fundamental modes are not sufficient to resolve 3-D Q μ structure in the transition zone.
It is well known that significant amplitude uncertainty in seismic waveforms stem from inaccurately known source parameters and instrument responses. However, these uncertainties cannot be avoided through the definition of the misfit functional. To that end, within the scope of waveform comparison, approaches that rely on differential comparison of similar waveforms that travel through overlapping paths have been suggested by several studies (e.g. Romanowicz 1990;Bhattacharyya et al. 1993;Ford et al. 2012). Using our hybrid full-waveform inversion through the comparison of individual waveforms in time or frequency domain, we suggest a joint or iterative inversion for source parameters and receiver terms (scalar or frequency dependent as in the work of Dalton & Ekström 2006), for 3-D Q μ perturbations, and for perturbations in elastic structure.
This study is part of an effort towards building a new global uppermantle attenuation model based on long-period full-waveform in-version. By evaluating our hybrid full-waveform inversion scheme through synthetic tests, we have developed physical and operational insights into the recovery of anelastic heterogeneities in the case of real data, the result of which are presented in a companion paper (Karaoglu & Romanowicz 2017).

A C K N O W L E D G E M E N T S
The authors thank Gabi Laske and three anonymous reviewers for their constructive comments on the manuscript. This work was supported by the European Research Council under the EC's 7th Framework Programme (FP7-IDEAS-ERC)/ERC Advanced Grant WAVETOMO. The computations were performed on OCCIGEN of the HPC national facility CINES in France, on EDISON and CORI at the National Energy Research Scientific Computing Center (NERSC), USA, and the local cluster (MALBEC) of Institute de Physique du Globe de Paris. BR acknowledges support from NSF grant EAR-1497229. δω kk (θ, φ) = 1 ω k + ω k a 0 δm(r, θ, φ)K m kk (r )r 2 dr where a is the earth's radius. Excluding shear and bulk attenuation perturbations (δ Q −1 μ , δQ −1 κ ) from these expressions leaves us with perturbations in the parameters of the elastic model m. These parameters are reduced to two of the 5 radially anisotropic elastic parameters (V Siso and ξ ) in the present case. The corresponding kernel expressions (K m kk ) are given by Woodhouse & Dahlen (1978) and Romanowicz (1987) for the cases with k = k and k = k respectively. The kernel expressions for Q −1 μ and Q −1 κ are directly related to those for shear (μ) and bulk (κ) moduli respectively, and can be written in terms of wave speed kernels computed for Voigt average isotropic S-and P-wave velocities (V S , V P ) as follows: where we have dropped subscripts (k,k ) and μ 0 , κ 0 denote the shear and bulk moduli for the 1-D average model, respectively. Expressions for K S and K P are provided by Panning & Romanowicz (2006) in terms of the velocity kernels of horizontally and vertically polarized P-and S-waves. The attenuation perturbation (δα kk ) that appears in the frequency perturbation (δω kk ) in eq. (A7) leads to physical dispersion. In the presented synthetic tests, we test the significance of this term when using the three misfit functionals considered, which turns out to be crucial for WF misfit functional.

A P P E N D I X B
We tested the E-WF and SA misfit functionals in the presence of real levels of seismic noise in Section 3.2.3. Here, we increase the noise level to test the misfit functionals, E-WF and SA, in extreme Figure B1. The recovered models using E-WF with two different noise levels. Real noise records collected from the stations are added to the synthetic data (computed in the target model) after multiplication by a factor of 2 per cent (a) and 5 per cent (b) of the peak amplitude value in the full record (3 hr). The peak values in these tests correspond to the first arriving Rayleigh wave. Thus, the noise effect is more pronounced for the overtones which are more sensitive to deeper structure. The recovered model for the 2 per cent noise level shows more smearing laterally and in depth and weaker perturbation amplitudes compared to the case with real noise (Fig. 13). Increasing the noise level to 5 per cent, we lose the depth resolution below 250 km. scenarios until they break down Q μ inversions. In this test, real noise records collected from the stations considered are added to the synthetic data after scaling their root-mean-square amplitudes by a factor defined as the percentage of the peak amplitude value in the full 3 hr synthetic record used as data. By taking the peak amplitude in the full synthetic waveform data as reference for the added noise, we ensure that it is at the same level for all the synthetic data, independent of the event magnitude or the noise level at the station. Next, we increase the noise level until we lose the recovered model accuracy significantly. In what follows, we present the best recovered models for two levels of noise (2 per cent and 5 per cent). The first noise level at 2 per cent is relatively low, and both misfit functionals perform well, whereas the second one (5 per cent) is a high noise level that suppresses the overtones and shows the difference in the performance of the two misfit functionals. Fig. B1 shows the recovered models using E-WF. For a 2 per cent noise level, the recovery of the position and shape of heterogeneities is satisfactory, although we lose the perturbation amplitude accuracy achieved in the absence of noise, with some more smearing. Once the noise level is increased to 5 per cent though, there is significant loss of resolution, especially in depth.
The peak amplitudes in the synthetic data correspond to the first arriving Rayleigh waves, therefore the noise effect is more pronounced for the overtones. This leads to a loss of resolution at larger depths with increased noise level as the overtones are more sensitive to them than the fundamental modes.
Lowering the signal-to-noise ratio for the overtones leads to a leakage of near-surface heterogeneities to greater depths. This is clear in the case with 5 per cent noise level, where heterogeneities peaking at 150 km reappear at 350 km with the reversed sign. We also see the mapping of deep structure heterogeneities to shallower depths as the ones at 350 km weakly appear at 250 km. Although it might be possible to lower the amplitudes of such leaks and fluctuations by imposing a higher damping level (α in eq. 1) in the Q μ inversion at the expense of losing recovered perturbation amplitudes, we prefer to present the figures for this case at the chosen damping level to show: (1) A pattern that can be misinterpreted when working with real data. (2) The significance of including reliable overtone waveforms when imaging the attenuation below ∼250 km depth.
SA performs better with seismic noise compared to E-WF. The recovered models are presented Fig. B2. The relatively better Figure B2. The recovered models using SA with the same noise levels as in Fig. B1. In general, we observe a better performance compared to E-WF for the same level of noise. This is so both for the depth resolution and the amplitude of the recovered perturbations.

826
H. Karaoglu and B. Romanowicz performance of SA is clear for the case with 5 per cent noise level. As opposed to E-WF, SA succeeds in recovering the deep Q μ heterogeneities below ∼250 km, although we lose the accuracy in the recovery of perturbation amplitudes.
The relatively higher performance of SA compared to E-WF shows an inherent difficulty of working in the time domain when imaging attenuation. Although E-WF suppresses the phase anomaly dominance that we see in WF, one needs to be careful as it cannot fully isolate the amplitude anomalies in the presence of high levels of noise.
Common to both the E-WF and SA cases is the necessity of increasing the value of α (eq. 1) in the inversion step in the presence of noise, as expected. The primary effect of that is to reduce the amplitudes of the recovered anomalies.