Three-dimensional elastic wave speeds in the northern Chile subduction zone: variations in hydration in the supraslab mantle

Three-dimensional elastic wave speeds in the northern Chile subduction zone: variations in hydration in the supraslab


I N T RO D U C T I O N
It is generally agreed that subducting oceanic lithosphere transports an abundance of aqueous fluid and hydrated minerals into the Earth (e.g.Ulmer & Trommsdorff 1995;van Keken et al. 2011).While much of this fluid is released at shallow depths, it is thought to be responsible for hydration of the mantle wedge beneath the overriding plate (Peacock 1993;Peacock & Hyndman 1999;Iwamori 2000;Kamiya & Kobayashi 2000;Bostock et al. 2002).Depending on the age and speed of the subducting lithosphere, a significant amount of water may be carried to depths in excess of 200 km (van Keken et al. 2011).The effects of the transportation and release of water in subduction zones are profound.For example, water can lower the melting temperature of the mantle by some 400 • C, resulting in the generation of magma that eventually finds its way to the surface in the form of a volcanic arc (Gill 1981;Moran et al. 1992;Peacock 1993;Plank & Langmuir 1993;Yogodzinski et al. 1995;Stern & Kilian 1996;Kelemen et al. 1998;Kerrick & Connolly 2001;Grove et al. 2006Grove et al. , 2009)).Dehydration reactions and phase transitions within the subducted slab are believed to be responsible for the intermediate-depth seismicity within slabs (e.g.Kirby et al. 1996;Peacock 2001;Omori et al. 2002Omori et al. , 2004;;Hacker et al. 2003;Abers et al. 2013).Water released in subduction zones by phase transition from basalt to eclogite (between 1.2 and 3.3 per cent, Hacker 1996) appears to be important in forming serpentine in the mantle wedge above the slab (Fyfe & McBirney 1975;Bostock et al. 2002;Hacker et al. 2003;Hacker 2007).If the supraslab mantle is serpentinized in this corner of the uppermost mantle, its low shear strength may control the down-dip rupture limit of great megathrust earthquakes (e.g.Hyndman et al. 1997;Bostock et al. 2002).
Because elastic wave speeds are sensitive to both serpentinization and partial melt (e.g.Hammond & Humphreys 2000;Kerrick 2002;Carlson & Miller 2003;Hacker et al. 2003;Christensen 2004;Watanabe et al. 2007;Karato 2012;Ji et al. 2013), empirical evidence for the role of water in subduction has largely come from seismic imaging and earthquake locations.For example, seismic velocity models such as those from Cascadia (Bostock et al. 2002), Costa Rica (DeShon & Schwartz 2004), New Zealand (Eberhart-Phillips & Bannister 2010) and Japan (Kamiya & Kobayashi 2000;Zhang et al. 2010), suggest that serpentinization in the mantle wedge is on the order of 10-20 wt%.While these studies are enlightening, neither the extent to which the supraslab mantle is hydrated nor the amount of the partial melt generated within that part of the mantle is well known.
In this paper, we investigate the pervasiveness of hydration and partial melt in the supraslab mantle beneath the Nazca-South America Plate boundary in northern Chile by generating elastic wave speed images through joint inversion of observations of local earthquake body waves and ambient noise surface waves.Our motivation for choosing this particular area is partly the availability of an extensive data set directly above a mantle wedge, but also because the frequent occurrence of megathrust earthquakes and the prevalent arc volcanism along this plate boundary allows an opportunity to investigate the consequences of variations in supraslab wave speed anomalies for Andean margin seismicity and volcanism.

Overview of the northern Chile Plate boundary
The central Andes evolved over the past ∼200 Ma as a series of magmatic arc systems related to the subduction of oceanic lithosphere beneath the South American continental margin (e.g.Isacks 1988;Scheuber et al. 1994;Allmendinger & Gubbels 1996;Allmendinger et al. 1997).Large volumes of calc-alkaline lavas and related plutons have been emplaced along trench-parallel provinces since the Jurassic, showing a general trend of eastward migration (Coira et al. 1982).The current magmatic arc commenced in Middle Miocene times (∼27 Ma) (Baker & Francis 1978), and deposited significant volumes of volcanic material in northern Chile during Neogene times.The arc remains very active along its ∼2000 km length.
The forearc region of the central Andes typically is subdivided into four morphotectonic units (e.g.Reutter et al. 1988;Fig. 1).From west to east, these include: (1) the Coastal Cordillera, which is the Jurassic age magmatic arc, (2) the Central Depression, a Jurassic back arc basin and mid-Cretaceous magmatic arc covered by younger sedimentary rocks, (3) the Precordillera, a late-Cretaceous magmatic arc with Mesozoic sediments lying unconformably upon metamorphic basement and plutonic rocks and (4) the Western Cordillera, the currently active volcanic arc with peaks in excess of 6000 m.The missing Jurassic forearc is presumed to have been tectonically eroded; the eastward migration of the volcanic front from the Coastal Cordillera to the Western Cordillera suggests that an EW section of more than 200 km has been consumed (Rutland 1971;Ziegler et al. 1981;Scheuber et al. 1994).
Topography in the forearc is highly variable.There is no welldeveloped accretionary wedge near the trench (Von Huene et al. 1999), and an extensional forearc is linked to the coast by a ∼1000 m escarpment.The subdued relief of the Coastal Cordillera and Central Depression give way in the east to the steep western flank of the Precordillera and high altitudes of the Western Cordillera.
The Nazca-South American Plate boundary currently has one of the fastest convergence rates on Earth at about 6-7 cm yr −1 (DeMets et al. 1990, 1994;Altamimi 2002;Sella et al. 2002).As a result, this boundary has produced on average one earthquake with magnitude M w ≥ 7.9 every 10 yr over the past 100 yr.The segment of the boundary in northern Chile presently is considered to be a seismic gap, having accumulated more than 9 m of slip deficit during the last 130 yr of interseismic coupling, and believed capable of generating an M w 9.0 earthquake (Fig. 2).The recent 2014 M w 8.2 Pisagua earthquake in northern Chile is considered by some to be a potential foreshock of that larger event.
Northern Chile and southern Peru together have experienced great earthquakes in the past (e.g. the 1868 M w 8.7 southern Peru and 1877 M w 8.9 northern Chile events (Comte & Pardo 1991)) (Fig. 2).By contrast, few shallow earthquakes related to the interplate coupled region have occurred within the forearc.Among rare examples is the 2007 M w 5.7 Pisagua earthquake, located in the outer forearc region during the interseismic stage of the subduction seismic cycle (Allmendinger & González 2010).
Modelling of GPS and InSAR data suggests that during the interseismic phase of the earthquake cycle, the plate contact in northern Chile is locked to 35 km depth with a transition zone (brittle-ductile) between 35 and 55 km depth, but with variations in those limits along strike (Norabuena et al. 1998;Bevis et al. 2001;Khazaradze & Klotz 2003;Chlieh et al. 2004;Pritchard & Simons 2006;Béjar-Pizarro et al. 2010;Métois et al. 2013).Recently, Ortega-Culaciati et al. (2015) re-analysed decades of geodetic data and obtained a detailed map of interplate coupling along the seismic gap in northern Chile (Fig. 2).To first order, they observe that the part of the boundary between Pisagua (∼21 • S) and Antofagasta (∼23 • S) is highly coupled, while that north of ∼21 • S is heterogeneous but generally less coupled.They found an isolated, highly coupled patch offshore of Pisagua surrounded by low coupled areas, in agreement with the findings of Métois et al. (2013).The coseismic slip of the 2014 M w 8.2 Pisagua earthquake occurred in this isolated patch while the postseismic slip occurred in the neighbouring low coupled regions.Despite this variability, interseismic coupling models suggest that the northern Chile seismic gap remains capable of generating a large megathrust earthquake.

DATA A N D M E T H O D O L O G Y
The data analysed in this study are a combination of arrival times from earthquake generated body waves and phase delay times from ambient noise generated surface waves recorded by seismic stations deployed in northern Chile for different periods of time over the past three decades (Fig. 3).

Body waves from local earthquakes
The abundance of local earthquake data collected from this region (Fig. 3 and Table 1) is largely in response to the frequent megathrust events and the anticipated end of the current seismic cycle in northern Chile.For example, a rapid deployment of 10 broad-band stations before, and 16 broad-band stations after, the 2014 M w 8.2  The initial body wave data set derived from these deployments consists of P and S arrival times from 33 351 events recorded over a period of 25 yr by a variety of networks comprising 360 stations (Table 1; Figs 3 and 4).From this combined data set, we selected earthquakes that were recorded by at least 10 stations and with predicted arrival times within an initial outlier residual threshold of the larger of 2 s and 10 per cent of the total traveltime (essentially this means we presume the initial model is, on average, within about ±10 per cent of the actual structure).Application of these criteria resulted in a reference data set of 11 874 events with 110 640 P and 106 680 S wave arrival times.These arrival times and their associated uncertainties are estimated manually.Nominal uncertainties for P wave arrival times are between 0.1 and 0.5 s while those for S waves typically are about twice that amount.

Ambient noise data and preprocessing
Most of the ambient noise analyzed in this study was recorded by 18 broad-band stations operated by the Integrated Plate Boundary Observatory Chile (IPOC), CSN and ONEMI over a period of 3 yr (2012 January 1-2014 December 31).This primary dataset is supplemented by recordings from 29 broad-band stations deployed by the DGF, CSN and ONEMI near Pisagua for 4 months (2014March 29-2014 July 31) and 12 broad-band stations of the Iquique network operated by GFZ Potsdam for 7 months (2012 January 10-2013 September 5).The 59 total stations in these combined networks (Fig. 3) provide 1422 contemporaneous station pairs from which Green's functions may be estimated.
For the most part we followed the pre-processing steps described in Bensen et al. (2007) to generate estimated Green's functions (EGFs) from the vertical component of cross-correlated noise, using a version of the CU-Boulder ANCC software,1 modified to permit sample rates and frequency bands appropriate for our data.Runningmean normalization was used to reduce contamination by coherent signals (mostly from earthquakes), and we pre-processed a band between periods of 2 and 150 s.
Cross-correlations for each contemporary station-pair were generated for each day of record, and then stacked over the duration of co-recording.In an attempt to improve the signal-to-noise ratio (SNR) of the final stack we iteratively removed daily correlations that deviated by more than one standard deviation from the overall stack.However, we found that generally neither the shape of the EGF nor the SNR were significantly different from those obtained from a simple stack of the entire data set.
We measured phase velocity dispersion using a technique based on that described in Yao et al. (2006).The EGFs are narrow-band filtered at 1 s intervals between 4 and 30 s in a pass-band of ±0.2 s of the central period and windowed within 3 cycles of an estimated arrival time.Noise is sampled in a part of the seismogram before the expected arrival, and seismograms with an SNR less than a given threshold (usually 10) are excluded.Those remaining are transformed to a velocity versus period space and a phase velocity dispersion curve is determined by tracking amplitude maxima over available periods.In practice, simply following the maxima does not guarantee immunity from cycle skips, and hence as a pre-processing step we generated a pilot dispersion curve from a least squares fit to all of the original curves that appear to fall within the correct cycle.In this case the pilot curve for phase velocity c (in km s −1 ) as a function of period T (in seconds) was represented by the polynomial c (T ) = 3.142226 + 0.03087T − 0.0002410516 T 2 − 0.00000065T 3 . (1) For each EGF we choose the cycles that most closely follow this curve.Points without a clear maximum near the pilot curve were visually re-examined and discarded if ambiguous.Dispersion curves were generated for a band between 5 and 20 s period.Examination of dispersion curves from several station pairs suggests that cycle skips could be reliably identified at periods longer than 5 s; hence we adopt that as a minimum period.The choice of 20 s maximum period comes from abiding by the three-wavelength rule of Bensen et al. (2007) which in this case excludes nearly all but the N-S paths at longer periods.
Because of the proximity of the stations to the Pacific Ocean (Fig. 3), and the strong dependence of path length on azimuth (most N-S paths are long, most E-W paths are short), one may be concerned that an azimuthally inhomogeneous distribution of noise could significantly bias the phase velocities determined from the EGFs.This concern was reinforced by both the asymmetry of some of the EGFs and discrepancies in dispersion curves generated by the causal and acausal sides of some of the cross correlations.
To estimate and correct for this bias we employed a technique based on that described by Yao & van der Hilst (2009).To summarize their approach, we first determine the azimuthal distribution of noise E(θ ) by considering that the observed cross correlation C n (ω, t, d n ) for station pair n as a function of frequency ω, time t, and interstation distance d n can be constructed as a weighted sum of monochromatic plane waves.Taking the Fourier transform (FT) of this construction gives where θ m is the mth azimuth and δt nm is the phase delay time, calculated by integrating the inverse of the phase velocity along the path between the two stations.W n and H nm are time windows for the surface wave train in the EGF and the obliquely arriving monochromatic waves, respectively, assuming a range of wave speeds from 2 to 5 km s −1 .The distribution of noise E(θ ) is then determined by minimizing the energy in the difference between the observed and calculated real and imaginary parts of the Fourier transform of the cross correlations for all station pairs.We slightly modify this approach by noting that, for any pair of stations, the cross correlation at each station depends only on that part of E within the 180 • of azimuth that travels towards that station.Hence, the sum in eq. ( 2) is a concatenation of two independent equations relevant to the causal and acausal parts of the cross correlation.Separating the two is advantageous for many station pairs where one side of the cross correlation has little energy above the noise level as the other side can still be used.Hence, we split the single eq.( 2) into two sums over the relevant azimuths and add a row to the normal equations for each side of the cross correlation.
Having estimated E(θ ), we use (1) to calculate a phase shift δφ n between an observed cross-correlation and one which would have resulted from an isotropic distribution of noise (i.e.E(θ ) is a constant) and define a bias μ as where t n is an estimate of the phase delay time between station-pair n derived from a 2-D phase velocity map using dispersion curves generated from averaged cross correlations (causal and acausal).
The estimates of the 'true' phase delay time, t iso , are then used to update the phase velocity maps.This process may be iterated until the estimates of t iso stabilize.For the data set used here, stability was achieved after a single iteration.
As may be expected, the noise distribution we determined for northern Chile (Fig. 5) is dominated by sources from the west, although the direction of propagation for the maximum is slightly north of east (about 70 • azimuth).The biases resulting from this noise distribution are variable but generally small (<0.6 per cent) (Fig. 6).These estimated biases were used to adjust the surface wave phase delay times and new phase velocity maps (Fig. 7) were computed.The phase delay times derived from these maps are then used in the joint inversion with the body wave arrival times.

Joint inversion methodology
Like most joint inversions, the simultaneous fitting of body wave arrival times and surface wave phase delay times is motivated by complementary sensitivities of these observations to different parts of the model.In this case, surface waves are included mostly to reduce potential trade-offs in crustal and mantle structure caused by the lack of local earthquake activity in the crust.Surface waves also provide additional constraints on shear wave speeds that compensate for the lower number of, and higher uncertainties in, shear wave arrival times.
The joint inversion methodology is based on an approach described in Nunn et al. (2014), modified for the local event case using procedures described in Roecker et al. (2004Roecker et al. ( , 2006)).The subsurface is parameterized by specifying P and S wave speeds on a 3-D grid of nodes spaced at 5 km intervals in depth and latitude and about the same distance in longitude (increments in longitude are everywhere 0.04717 • ).Intragrid wave speeds are determined by trilinear interpolation.Body wave traveltimes within the medium are calculated using a 3-D eikonal equation solver in a spherical (Earth-centred) coordinate system (Li et al. 2009;Zhang et al. 2012).Surface wave phase delay times at a given frequency ω in a given 3-D V s model are determined by first assuming that such a model can be constructed by combining 1-D models at each areal grid point (Montagner 1986).With this assumption, we calculate phase velocities c and partial derivatives ∂c/∂ V s for the 1-D model at each areal point using the locked-mode method of Gomberg & Masters (1988).The phase delay times between any two points are then calculated by integrating the reciprocal of the phase velocity along the great circle path between them.
The inverse problem is a standard linearized expansion of the forward problem, and partial derivatives for each observation are calculated along the ray paths.Following a procedure used in Roecker et al. (2004) we solve for perturbations to P slowness (U p = 1/V p ) and the ratio r = V p /V s = U s /U p rather than to shear wave slowness itself by setting U s = (rU p ) = U p r + r U p and including a shear wave observation using where T so and T sc are the observed and calculated shear wave traveltimes, respectively.The partial derivatives are taken with respect to the 4 hypocentre parameters h k and the m values of U si that exhibit sensitivity to the shear wave time (i.e.those that bound the     elementary volumes through which the ray passes).A revised shear wave speed model is then determined from U s = rU p after modifying r and U p with r and U p .We take this approach partly for technical reasons-it allows the S wave model to take advantage of the greater resolving power of P waves and provides a means to couple the sensitivity to U s of the surface waves to perturbations in U p -but also because we ultimately seek to interpret variations in r as proxies for hydration and melt fraction.Determining r by dividing individually determined V p and V s is problematic because of differences in the level of resolution for these quantities.
Phase delay times of surface waves at any frequency ω are included in a similar way by relating derivatives of shear wave slowness U s with respect to phase velocity c(ω) by the chain rule: where the indices i, j and k refer to the latitude, longitude and depth coordinates, respectively, of each node in the model.Each row of the system of linear equations is weighted to reflect the uncertainties in the observations.Body wave arrivals are weighted by the inverse of the square of the associated time uncertainty.Surface wave phase delay times are similarly weighted by estimating an uncertainty based on the SNR of the EGF used in generating the dispersion curve, multiplied by a factor related to the period (longer periods generally have less well defined maxima).Uncertainty estimates thus obtained range from a few tenths of a second to several seconds but generally are about 3 per cent of the phase delay time.Outliers in the surface wave data set are identified and removed during the creation of phase velocity maps as part of the noise bias analysis.For body waves, data quality criteria are enforced at each iteration in order to eliminate potential outliers and to disqualify less well constrained hypocentres.Thresholds for outlier identification are applied at each iteration, and observations that exceed these thresholds are disqualified for that iteration (but may be reintroduced in subsequent iterations if a wave speed adjustment brings the residual under the threshold).These thresholds are generous at the start but gradually reduced over several iterations to the larger of 0.5 seconds or 5 per cent of the traveltime.Hypocentres with fewer than 10 acceptable P and S wave observations after the criteria are applied are excluded.For most runs this culling process retained about 95 per cent of the original observations over the course of inversion.We also applied different strategies for relative weighting of body and surface wave observations in a joint inversion, but in the end found that the best result was obtained by first inverting the surface wave phase delay times alone for several iterations prior to jointly inverting both the body and surface waves with even weighting.This appears to be due to both the greater sensitivity of surface waves to shallow structure and the much larger number of body wave observations available.
The resulting system of linear equations is solved iteratively using the LSQR algorithm of Paige & Saunders (1982).In addition to regularization by damping, perturbations to the P and S models are smoothed after each iteration by averaging neighbouring nodes with a moving window.After trial-and-error tests of trade-off between variance reduction and model complexity we decided to use a window with 5 nodes in the E-W direction (the centre node and two nodes either side), 11 nodes in the NS direction (±5 nodes from the centre) and 3 nodes (±1 node from the centre) in the Z direction.The larger window in the N-S direction reflects an anticipated lesser degree of heterogeneity parallel to the strike of the Andes, but also compensates for the longer ray paths in that direction.Note that at each iteration we are smoothing the perturbations to the model as opposed to the model itself; hence smaller scale variations in wave speed still can appear over several iterations.
Hypocentre coordinates are included as variables in the inversion (eq. 3) but are also relocated in updated models prior to any iteration.The models shown here generally will have gone through about 10 iterations with only surface wave times followed by an additional 10 iterations with combined surface and body wave times.Generally, no significant change in variance nor in values of wave speeds are realized beyond that point.

Preliminary inversions and starting models
As mentioned above, we determined that the best way to incorporate surface wave observations was to allow them to adjust the model prior to the joint inversion.In developing a starting model for the surface wave only (SWO) inversion, we began with a body wave only (BWO) inversion that started with an adaptation of the 1-D model of Husen et al. (1999) for the Antofagasta region (Fig. 8).The resulting 3-D BWO model for V p and V p /V s , obtained after 13 iterations , was averaged laterally to obtain a 1-D estimate of V s .This step ensures some compatibility with the body waves, and also provides a means for determining appropriate values for regularization and thresholds for data misfit.Because surface wave inversions tend to retain biases introduced by interfaces like the Moho, we smoothed the transition from crustal to mantle velocities over a range of depths (40-70 km) suggested by previous estimates of crustal thickness in this area (e.g.Beck et al. 1996;Yuan et al. 2000Yuan et al. , 2002;;McGlashan et al. 2008).This 1-D V s model (Fig. 8) was then used as a starting model for the SWO inversion.
After 10 iterations of the SWO inversion, the wave speeds in the resulting 3-D V s model were then multiplied by a constant V p /V s ratio to construct a starting 3D V p and V s model for the joint inversion.Based on both the results from the BWO inversion and a linear fit to T p versus T s -T p for the entire body wave data set (Fig. 9), we take V p /V s = 1.76 as the preferred value.As discussed below, while the absolute values for the wave speeds and their ratios in the final models are sensitive to the choice of initial value of V p /V s , there are metrics that argue in favour of 1.76.

D E S C R I P T I O N O F T H E P R E F E R R E D M O D E L
Several trial inversions, discussed below, were performed to evaluate the effects of a variety of assumptions and choices of parameters on the final model.Those chosen for our preferred model (Figs 10-12) manage to fit the observations well while suppressing short wavelength artefacts.Relative to the 3-D starting model generated   by the SWO inversion described above, the preferred model reduces the variance in the residuals from 0.063 to 0.044 s 2 , or by about 30 per cent.The final variance is larger than the anticipated noise level variance of 0.024 s 2 , suggesting that the observations are not overfit.The body wave residual variance is slightly less than that generated by the original (and less regularized) BWO inversion (0.046 s 2 ) suggesting that the inclusion of the surface wave observations reduces the space of models that fit the body wave arrival times equally well.At the same time, as discussed below, these models are generally similar.
Because most of the trial runs we performed were designed to test the robustness of certain features of interest in the mantle, we mostly discuss wave speeds in the subducted slab and the supraslab mantle (a discussion of finer scale crustal structure will appear in a companion paper).We also focus on larger scale features that our trial inversions show to be robust, in the sense that they are not overly sensitive to subjective choices concerning how the inversion is carried out.
One of the more conspicuous features of the preferred model is an ubiquitous positive gradient in V p /V s ratio, increasing from about 1.75 within the slab to 1.78-1.82 in the supraslab mantle (Fig. 12).This gradient is in most cases perpendicular to the trend defined by the seismic zone and, while it varies in magnitude along strike, extends along much of the length of the slab within the mantle.Specifically, the contrast in the northern sections (P1-P8 in Fig. 12) is about 3 per cent while that in the south (P9-P16) is closer to 2 per cent.The high V p /V s region also appears to extend over a significantly larger volume in the sections north of about 21 • S. V p /V s within the slab itself is higher in the north (∼1.76) than in the south (∼1.74).In most sections, the supraslab region of high V p /V s extends from about 40 km depth to the end of the seismic zone at about 120 km depth, and may extend to depths as shallow as 20 km in some of the northern sections (sections P1-P3 in Fig. 12a).Note that while V p /V s is high in the supraslab region, both V p and V s are relatively low in the 40-80 km depth range (Figs 10 and 11).V p and V s both increase within the slab at depths of 80-100 km, and then remain high to the end of the seismic zone.
Sections with a well-defined double seismic zone at depths greater than 90 km (P1-P4 in Fig. 12) are characterized by a relatively low value of V p /V s in the area between the two zones.V p and V s at these depths are both lower in the lower part of the double seismic zone than in the upper part (e.g.sections P1-P3 in Figs 10  and 11).

T E S T S O F R E S O L U T I O N A N D RO B U S T N E S S
As no single type of trail inversion provides conclusive evidence about the quality of an image, a variety of tests with both real and synthetic data were performed to estimate the level of robustness of certain features in the preferred wave speed model.In order to account for the effects of observational uncertainty, each synthetic data set is modified by random noise using a normal distribution with standard deviations that reflect the expected uncertainties in the actual data.Our main concerns are the extent to which the data can resolve the features we wish to interpret, and the dependence of the model on subjective choices made while carrying out the inversion (e.g.degree of regularization, weighting of observations, or starting model assumed).

Recovery of prismatic anomalies
A standard approach to estimating resolution in arrival time inversion is the 'checkerboard' test that attempts to recover alternating positive and negative perturbations in abutting prisms.We conducted versions of this test in which ±5 per cent perturbations relative to a background 1-D model are applied to individual rectangular prisms that are 20 × 20 km in area and 10 km in depth.The perturbed elements are separated from one another by at least one prism with no perturbation so that the potential for smearing may be observed.
Results of these tests show that, in general, regions sampled with at least 10 ray paths are reasonably well recovered (Fig. 13).Moreover, despite recovered perturbations being smoothed at every iteration, the amount of smearing outside the anomaly is small over most of the region.In the test shown here (Fig. 5), V p and V s are both perturbed by ±5 per cent, with a zero V p /V s anomaly.The near identical recovery of V p and V s in this example shows that V s can be resolved when solving for V p /V s rather than V s directly, even when V p /V s itself is associated with a null signal.

Reconstruction of the model
While recovery of an isolated perturbation can be a fair representation of resolution of that variable, reconstruction of a prismatic pattern includes effects of coupling between perturbations that would be appropriate only if the Earth were indeed composed of alternatingly perturbed prisms.For example, such a representation requires 3-D ray paths to alternately focus and defocus around each perturbation.
An alternative way to demonstrate the ability of a data set to recover interpretable features at the scale and amplitude as those that appear in the model is to reconstruct the same distribution of imaged anomalies (e.g.Prevot et al. 1991).Hence, we generated a synthetic dataset using the same source-receiver distribution as the actual dataset and followed the same iterative procedure as with real data to attempt to recover the preferred model from a 1-D starting model.The results (Fig. 14) show that the recovered model generally looks very much like the true model both in pattern and amplitude, but there are some exceptions.First, while the recovered model manages to reproduce the absolute wave speeds rather well, it tends to slightly underestimate (by as much as a percent) the anomalies in the southern sections and overestimate them by a similar amount in the north.This is likely a result of the greater density of both earthquakes and stations in the north.Second, there is some lateral smearing of V p in the northernmost sections at depths greater than about 90 km that prevents the appearance of the isolated low V p /V s in the true model.The reason for this lateral smearing is not obvious; indeed, one might expect that the relative abundance of seismicity in this part of the seismic zone would lead to better resolved features at depth.This smearing suggests that real earth structure in the north and at depths greater than 90 km is likely to be laterally smeared in the modelling of real data as well, and hence that the V p and V s anomalies are in reality smaller in lateral extent than they appear in the image.

Thin slab test
One of the main features we seek to interpret is the contrast between wave speeds in the slab and the supraslab mantle, and to test the robustness of this gradient we constructed hypothetical models by placing a thin (10 km thick) body with a relatively low V p /V s (1.67 versus 1.75) ratio representing that part of the slab just beneath the mantle seismicity.The purpose of this test is to investigate the extent to which (1) localized anomalies within the slab spread across steep gradients into surrounding regions and (2) intraslab anomalies may be imaged by body waves.Several bodies with different lateral and vertical dimensions where tested; the salient results can be illustrated by models reconstructed from a V p /V s anomaly located in a thin (15 km thick), curved zone located along the underside of the (upper) seismic zone from about 50 km depth to the base of the model (Figs 15 and 16).We examine the resolution of lateral variations in the slab by restricting the lateral extent of one of the slab models to about 1 • of latitude.As with all tests, we mimic the body wave arrival time data set used in the real inversion, but as only structure below 60 km is perturbed we do not include surface waves in a joint inversion.In this case, V p remains at its 1-D value while V s is perturbed to give an appropriate V p /V s contrast.In addition to V p /V s , this allows us to examine how well V s can be recovered from estimates of V p and V p /V s .
The V p /V s and V s anomalies in the reconstruction (Figs 15 and 16) are both well recovered in both models at depths shallower than the deepest earthquakes (about 120 km depth) but, due to a lack of ray paths, no deeper.The thickness and location of the zone are well recovered, with very little contamination of the surrounding areas by the lower V p /V s in the slab, or of higher V p /V s from outside the slab.The upper termination of the slab at 50 km depth is also well constrained.Finally, the lateral variation in the extent of the slab is well reconstructed.At the same time, while the variations in V p /V s are evident in the reconstruction, the actual values are underestimated in much of the slab and show intraslab heterogeneity on the order of 0.026 along its strike and dip.

Smoothing, weighting and starting models
Among the subjective assumptions made as part of the joint inversion are the extent to which the model is smoothed, the relative influence of different date types, and the starting model chosen.Several tests were performed varying each of these assumptions individually, and the general conclusions can be summarized in a few comparative tests.Specifically, the BWO inversions discussed above use a different set of data (no surface waves), a different starting model (the 1-D Antofagasta model of Husen et al. 1999), and/or a shorter smoothing window (±1 node about the centre) than that used to generate the preferred model.A comparison of BWO models with different moving average window lengths (top and bottom panels in Fig. 17) shows that, as may be expected, the effects of smoothing are akin to passing the rough model through a low pass filter.All of the first order features that we see in the smoothed model can be identified in the rough one, but the fit to the data using the rough model is only marginally better (by about 5 per cent in variance).The difference between the BWO and joint models arises partly from the difference in starting model; in particular, the Moho transition is more gradual in the joint model than the BWO models.Moreover, the inclusion of surface waves helps constrain the wave speeds in the upper few tens of km.These choices account for differences in the supraslab mantle wave speeds determined by the BWO inversion.For example, the lower wave speeds in the upper mantle in the BWO model are likely compensating for crustal thickening towards the central Andes.Indeed, the added constraint on crust and uppermost mantle structure provided by surface waves is a principle motivation for a joint inversion.Note that while the rough BWO model manages to fit the body waves better than the joint model by about 2 per cent, the joint model does a slightly better job (by 3 per cent) than the equally smoothed BWO model.Of course, neither of the BWO models manages to fit the surface wave data nearly as well as the joint model (surface wave variance in the joint model is reduced by about 70 per cent compared to that in either of the BWO models).
Creating a starting model for a joint inversion from the results of an SWO inversion requires a choice of V p /V s ratio.As discussed above, a V p /V s of 1.76 is consistent with the arrival times in the body wave data set.At the same time, tests with a range of values from 1.73 to 1.78 shows that while the patterns and gradients in V p /V s are virtually invariant, the average value of V p /V s remains close to its starting value.The fit to the entire data set is better with a V p /V s of 1094 D. Comte et al. 1.76 than with either 1.73 or 1.78 (by about 3 per cent), and is nearly the same as 1.75.Moreover, there are about 1100 earthquakes (or roughly 10 per cent of the entire dataset) distributed throughout the study area, for which the fit to the data improves on an event by event basis with a V p /V s of 1.76 (Fig. 18).Since events are relocated prior to each iteration, the minimization of residuals on an event by event basis is a valid metric for judging the significance of a given model, and on this basis, as well as the improvement in overall variance, we take the model generated with a starting value of 1.76 as the preferred model.

Inferences from trial results
As a result of the trials described above, we conclude that the first order features in the preferred model are robust.Specifically, the increase in V p /V s , from the subducted slab into the supraslab mantle, as well as the size and location of the high V p /V s regions, appear well constrained.These features are reasonably independent of the starting model, but the actual values of V p /V s tend to stay close to the starting value.Nevertheless, we can appeal to a preference of the data for a certain range of V p /V s to argue for an optimal value.While large gradients in V p , V s and V p /V s are well constrained, localized regions with small changes (on the order of 1 per cent in velocity and 0.017 in Vp/Vs) are probably not significant.We are less confident in features in the model at depths greater than about 110 km.Although tests that recover specific features such as a low V p /V s in the region between the deeper parts of the double seismic zone were successful, other tests suggest lateral smearing may blur the image.

Hydration of the supraslab mantle
The most conspicuous first order features in our preferred model are (1) the gradient from low to high V p /V s from the slab into the supraslab region, mostly at depths greater than about 40 km, along the entirety of the northern Chile margin, and (2) the extensive region, particularly north of about 21 • S, of V p /V s > 1.8 located mostly above the interface between the slab and the supraslab mantle as represented by the Wadati-Benioff Zone (WBZ) earthquakes.This high V p /V s region is also characterized by relatively low values of V p and V s to depths of about 80 km.Because the introduction of fluids into the mantle can decrease both V p and V s and increase V p /V s by either partial melting (e.g.Hammond & Humphreys 2000;Karato 2012) or serpentinization (e.g.Christensen 1996Christensen , 2004;;Ji et al. 2013), a viable explanation of these features is the dehydration of the subducted slab and the release of volatiles into the supraslab mantle.Variations on this scenario have been suggested in previous studies of subduction zones (e.g.Kamiya & Kobayashi 2000;Bostock et al. 2002;DeShon & Schwartz 2004;Eberhart-Phillips & Bannister 2010;Zhang et al. 2010) including northern Chile (e.g.ANCORP Working Group 2003).Our results suggest that the way in which the mantle is hydrated can depend not only on the type of convergent margin, but on variations in the subduction environment within a particular margin type.
Locating the supraslab mantle along the Andean margin requires an estimation of both the continental Moho and the top of the subducting Nazca Plate.The latter is assumed to coincide with the upper boundary of the WBZ earthquakes.The depth of the continental Moho is known to some extent from previous studies in this area (Beck et al. 1996;Yuan et al. 2000Yuan et al. , 2002;;McGlashan et al. 2008) that suggest that the thickness of the continental crust increases from about 40 km beneath the Coastal Cordillera and Central Depression to about 60 km under the Western Cordillera.Hence, the supraslab mantle 'wedge' at depths of 40-60 km is more like a channel with a width of perhaps a few 10 s of km.Most sections of our model (Figs 10 and 11) show a prominent eastdipping gradient in V p and V s between 40 and 60 km depth about 25 km above the top of the WBZ that we interpret as the crust to mantle transition.
V p and V s are both low (∼7.2-7.6 km s −1 and 4.1-4.3km s −1 , respectively), and V p /V s is generally high (1.78-1.80),within the supramantle at depths between about 40 and 80 km depth, with the more extreme values found in the northern sections.These trends are consistent with serpentinization of this part of the mantle.Thermomechanical modelling of the subducting Nazca Plate (Dorbath et al. 2008) suggest that temperatures are cold enough at these depths (<700 • C) for serpentinite to be stable (Ulmer & Trommsdorff 1995).
In most cross sections, a gradient in V p , V s and V p /V s collocates with the upper part of the WBZ, and correlates quite well with the 'Nazca Reflector' seen in both receiver function (Yuan et al. 2000) and active source (ANCORP Working Group 2003) imaging.Note that while there is a positive gradient in V p and V s from the supraslab mantle into the interior of the slab, V p and V s within the slab are low.In the northern sections, V p /V s within this part of the slab is high, suggesting the presence of serpentinite within this part of the slab.These contrasts are greater north of 21 • S, suggesting more significant hydration of the subducted slab in the north.
While estimates derived from the existing literature disagree on the extent to which this part of the mantle is serpentinized, the reduced V p and V s combined with increased V p /V s is consistent with higher levels of serpentinization in the north.For example, the high P-T model of Ji et al. (2013) would predict ∼45 per cent serpentinite in the south and ∼70 per cent in the north, while predictions from the dunite-antigorite model of Christensen (2004) are closer to 30 and 42 per cent (Fig. 19).
V p and V s both increase in the WBZ between 80 and 100 km and reach 8.2 and 4.6 km s -1 , respectively.V p /V s remains high below 80 km, and in fact is higher in the 80-120 km depth range than anywhere else in the model.For the most part the highest V p /V s is in a narrow zone that runs more or less parallel to the WBZ.As with the shallower parts of the model, V p /V s is in general much higher in the  north (1.82) than in the south (1.78).Combined with the normal to high V p and V s in the supraslab mantle below 80 km depth, plus the instability of serpentinite at higher temperatures in this part of the mantle, we infer that this part of the supraslab mantle is partially melted.Based on the model of Hammond & Humphreys (2000), the observed increase in V p /V s could be achieved with a melt fraction of about 1 per cent in the north and less than half of that in the south.
We interpret the large, downdip positive gradient in V p and V s within the slab at depths between about 80 and 100 km to be a consequence of accelerated phase transitions from basalt to eclog-ite in the crust and from serpentinite to peridotite in the mantle of the slab.The region of high V p /V s extends nearly the length of seismic zone but returns to average values at about 120 km depth, suggesting that phase transitions within the slab have ceased at this depth.The increased level of seismic activity between 100 and 130 km depth is indicative of higher rates of phase transition.Therefore, as similarly suggested by Yuan et al. (2000), the lessening of wave speed contrasts between the WBZ and the supraslab mantle at these depths could signify the completion of phase transitions in the slab.

The north-south contrast and surface geology
As discussed above, while the influence of water in the slab and supraslab mantle can be seen all along the northern Chile subduction zone, there are some significant differences between the northern and southern parts of the model, mostly having to do with the size of the anomalies and the volumes over which they occur.These differences are gradational, but the largest gradients occur at about 21 • S. The most significant of these is the extent of the region of high V p /V s , which, as discussed above, we associate with hydration resulting either in serpentinization of peridotite or the presence of partial melt.The difference in the size of this region (Fig. 21) north and south of 21 • S suggests a more profound influence of water in the north.
We note that the oldest (>50 Ma) oceanic lithosphere subducted along the Nazca-South American plates boundary is located along this part of northern Chile (Fig. 2).Because the effective elastic thickness of oceanic lithosphere and the depth to the brittle-ductile transition increases with age, older oceanic lithosphere has a higher water reservoir capacity (Watts 2001;Contreras-Reyes & Osses 2010;Contreras-Reyes et al. 2011).Hence, the effects of dehydra-tion are likely to be more apparent here than anywhere else along the Andean margin.
The northern region is much more seismically active in the 70-145 km depth range (prior to the 2104 Pisagua earthquake, the rate of activity was about twice as high in the north), consistent with earthquake activity being related to rates of dehydration reactions (Kirby et al. 1996;Omori et al. 2002Omori et al. , 2004)).Moreover, there is a shift in the locations of the deeper hypocentres to the east south of 21 • S (Fig. 4), meaning that the dip of the slab is less in the south.An explanation consistent with dehydration is that the greater loss of volatiles in the north leads to increased production of eclogite within the subducted slab (e.g.Hacker et al. 2003;Green et al. 2010), which in turn increases its negative buoyancy.
There is also a north-south difference in patterns of volcanic activity.In particular, the northern part is associated with an anomalous volume of Neogene ignimbrite deposits (Wörner et al. 1994;Wörner et al. 2002), which is consistent with a larger percentage of volatiles in the ascending magma.Interestingly, the southern boundary of the large hydrated region has an extended region of low V p /V s near the surface that corresponds to the location of the Pica gap in volcanism (Wörner et al. 1994; Figs 1 and 2).The northern part Figure 18.Histogram of the number of earthquakes with standard deviations within bins of 0.05 s for different choices of V p /V s in the starting model.Note that while results from V p /V s of 1.75 (blue) and 1.76 (red) are nearly the same, these choices reduce the standard deviation for more than 1100 events when compared to low (1.73) and high (1.78)values.The starting model refers to the 1-D SWO model shown in Fig. 8. also has prominent linear topographic scarps that deform Neogene to Pleistocene deposits along the forearc, including portions that overlie the interplate zone and the Precordillera piedmont.While these are dramatic topographic features, their activation mechanism remains poorly understood (i.e.Armijo & Thiele 1990;González et al. 2003;Allmendinger et al. 2005;Farías et al. 2005;Loveless et al. 2005;González et al. 2008).
The inferred regions of enhanced hydration correlate with the occurrence of megathrust earthquakes along this part of the margin.The high V p /V s at depths below 40 km is consistent with the downdip extent of large megathrust events being limited by sepertinization (e.g.Hyndman & Peacock 2003).High V p /V s also extends up to 20 km depth between 19 • and 19.8 • S (sections P1-P3 in Fig. 12), suggesting the presence of significant amounts of water at shallower depths along the plate boundary.The importance of fluids for interplate coupling and hence for rupture mechanics is well known (e.g.Tsuji et al. 2008;Nakajima et al. 2009;Uchida et al. 2009;Abers et al. 2013), and appears here in a seismic gap capable of generating a M w ∼9.0 megathrust earthquake.Recent geodetic studies (Métois et al. 2013;Ortega-Culaciati et al. 2015, Fig. 2) show a Note that while the estimates of serpentine per cent vary significantly, both correspond to a higher percentage north of 21 • S. heterogeneous distribution of interseismic coupling within this gap, with the north (18 • -20.5 • S) being significantly less coupled than the south (20.5 • -23 • S).This less coupled region correlates well with the shallow distribution of high V p /V s (Fig. 20), suggesting that hydration of the interface along the interplate contact controls the level of coupling.
These contrasts in the effects of hydration should be related to variations in the level of hydration of the Nazca plate.While it is difficult to know the initial conditions of the subducted plate, there presently is a succession of seamounts located on a larger swell on the Nazca plate, referred to as the Iquique ridge, that intersects the trench at about 21 • S (Contreras-Reyes & Carrizo 2011; Figs 1 and  2).An extrapolation along the azimuth of this ridge into the subducted slab corresponds to the large region of high V p /V s that we see in the north.Because seamounts should be accompanied by enhanced hydrothermal circulation in the lithosphere, the subduction of the Iquique ridge could account for what we interpret as the elevated hydration of the supraslab mantle in this region.Note that this in turn suggests an additional role for subducted seamounts in the generation of megathrust earthquakes beyond the expected roughening of the plate boundary (e.g.Cloos & Shreve 1996;Scholz & Small 1997;Contreras-Reyes & Carrizo 2011;Geersen et al. 2015).

Comparison with other studies in northern Chile
A number of local seismic investigations of the crust and upper mantle beneath this part of the Andean margin have been published (e.g.Comte et al. 1994;Schmitz & Kley 1997;Graeber & Asch 1999;Patzwahl et al. 1999;Schmitz et al. 1999;Yuan et al. 2000Yuan et al. , 2002;;ANCORP Working Group 2003;Schurr et al. 2003;Rietbrock & Waldhauser 2004;Koulakov et al. 2006;Dorbath et al. 2008;Bloch et al. 2014).We note that the tomographic study of Graeber & Asch (1999) of the PISCO network deployed between 21 • S and 25 • S shows (1) a similar contrast of V p /V s from about 1.70 in the slab to 1.80 in the supraslab mantle, (2) extensive regions of low V p /V s (<1.70) in the crust, consistent with a largely felsic (i.e.granitic) lithology (Christensen 1999) and (3) a large increase in V p from about 7.5 to >8.5 km s −1 within the slab in the 80-100 km depth range.A similar change in V p within the slab was determined as well by Comte et al. (1994).
As discussed above, the Nazca Reflector (NR) imaged by the Andean Continental Research Project (ANCORP) EW seismic reflection profile at 21 • S (ANCORP Working Group 2003) aligns with the gradient in V p and V s found in our model.The high amplitude reflections of the NR terminate abruptly at 80 km depth, corresponding in our model to the increase in V p and V s to levels found in the surrounding mantle.Bloch et al. (2014) interpret the NR as an indicator of enhanced hydration, and suggest that a region of increased reflectivity above the NR to about 20 km depth corresponds to pathways for fluids within the crust and supraslab mantle, similar to those conjectured in previous studies (Springer 1999;Schurr et al. 2003;Koulakov et al. 2006).We note that this shallower region of enhanced reflectivity corresponds to a region of high V p /V s seen in sections P7 and P8 (Fig. 12a) that are closest to the ANCORP line.
The tomographic study of Dorbath et al. (2008) overlaps the northern section of our region (∼18 • -20 • S).Their estimates of V p and V s in the upper WBZ show the same increases from low to high values at about 80 km depth as our model.In fact, V p in their section P3 looks quite similar to section P3 in Fig. 10(a), including the decrease in V p in the lower seismic zone at depths >90 km.At the same time, their estimates of V s in the lower seismic zone, generally >5 km s −1 , are significantly higher than our estimate.(∼4.4 km s −1 ).As a result, they deduce (by division) a low V p /V s (∼1.68) in this part of the double seismic zone, in contrast to our value of 1.76.Our model does show a region of low V p /V s , but it appears in the region between the two zones.Dorbath et al. (2008) also do not find any coherent increase in V p /V s in the supraslab region, which is a dominant feature in our model.While the causes are not easily discerned, these differences in results are most likely due to differences in data and methodology.

C O N C L U S I O N S
Based on the results of this investigation and a comparison with results from previous studies in this and other subduction zones, we infer that the main features we observe are a consequence of dehydration of the subducting slab, phase transitions from basalt to eclogite and serpentinite to peridotite within the slab, and the hydration of the supraslab mantle.As summarized in Fig. 21, fluids from the subducting Nazca plate serpentinize the supraslab mantle between 40 and 80 km depth.At depths between 80 and 100 km, the phase transformations within the slab accelerate, resulting in higher wave speeds and more intense seismic activity.Water released into the supraslab mantle at these depths induces partial melt which makes its way up to the active arc in the Western Cordillera.At the end of the seismic zone at 140 km depth, the slab is dehydrated and the phase transformations are complete.
We note that this scenario is similar in many respects to that proposed by Oncken et al. (2003) for the ANCORP results at 21 • S from an entirely different set of observations.The current study reveals a significant contrast in the effects of hydration north and south of this latitude.We attribute the more intense effects in the north to a gradient in the level of hydration in the subducted Nazca plate that eventually results in a release of larger volumes of water into the supraslab mantle.These larger quantities of water also appear to influence the nature of volcanism in the magmatic arc and the coupling of the plate boundary at depths of 20-40 km.We suggest that the cause of this greater in-flux of water is the subduction of the Iquique ridge on the Nazca Plate.

A C K N O W L E D G E M E N T S
The ambient noise processing was greatly facilitated by the availability of the CU Boulder software and we especially thank Mike Ritzwoller for helpful advice at various stages of the processing.We also acknowledge GFZ/IPOC (doi:10.14470/PK615318)for use of their high quality data and as well the efforts of the many people who collected the data from the many other networks used in this study.We thank Frederik Tilmann and Ingo Grevemeyer for thoughtful reviews of this manuscript.This study was funded by FONDECYT Low V p and high V p /V s within the supraslab mantle at depths between about 40 and 80 km correspond to serpentinization of the mantle wedge.The gradient to high V p with high V p /V s between about 80 and 120 km transitions to partial melt that ascends (symbolized by the large arrow) to the volcanic arc in the Western Cordillera.Within the subducting Nazca Plate, a basaltic crust and serpentinized mantle at shallower depths transitions to an ecolgitic crust and peridotitic mantle at greater depths.

Figure 1 .
Figure 1.Simplified geologic map of northern Chile.The arrow to the left of the trench indicates the direction of the Nazca plate relative to South America.

Figure 2 .
Figure 2. Seismotectonic setting of northern Chile and southern Peru.Left-hand panel: estimated ruptures of historic seismicity are shown at the far left; the bold purple line corresponds to the last rupture of the northern Chile seismic gap in 1877.Recent ruptures are indicted on the map by the year in which they occurred.Map colours correspond to bathymetry of the Nazca plate as shown in the palette at the lower right.Ages of the subducting Nazca plate are shown next to dashed lines.Thin rectangle locates the map on the right.Right-hand panel: interseismic plate interface backslip rate, coseismic slip (white contours) for 2014 (M w 8.2) Pisagua main shock and largest (M w 7.7) aftershock, post-seismic slip (blue contours) induced by Pisagua main shock event, all determined from GPS observations by Ortega-Culaciati et al. (2015).The epicentre (blue star) and focal mechanism are shown for the 2014 Pisagua main shock event.Coseismic slip (white contours) is also shown for the other recent large earthquakes in the study area: the Antofagasta (M w 8.1) 1995 (Pritchard & Simons 2006) and Tocopilla (M w 7.7) 2007 (Béjar-Pizarro et al. 2010) events.

Figure 3 .
Figure 3. Locations of seismic stations used in this study, with symbols indicating provenance as shown in the insert at the upper right.The location of the northern Chile seismic gap is shown by the dashed purple line.

Figure 4 .
Figure 4. Locations of earthquakes used in this study, indicated by circles filled with colours corresponding to depth ranges shown in the insert at the lower left.White squares joined by dashed lines and labelled P1-P16 locate the ends and midpoint origins (X = 0) in the cross sections shown in Figs 10-12.

Figure 5 .
Figure5.Normalized noise distribution E as a function of propagation azimuth for a selection of periods (5-9 s).Note that most of the noise propagates to the ENE, as may be expected from the proximity of the recording stations to the Pacific coast of South America.

Figure 6 .
Figure 6.Phase velocity bias determined from the noise distributions shown in Fig. 5 as a function of period.Despite the heterogeneity in the noise distribution, the expected bias is small (<0.6 per cent) for all station pairs.

Figure 7 .
Figure 7. Phase velocity maps at selected periods (indicated at the lower right of each panel) determined from ambient noise generated Rayleigh waves.The coastline of northern Chile is plotted as a reference.Values are percent differences from an average 1-D background, and the contour interval is 2 per cent.

Figure 8 .
Figure 8. 1-D starting models used in this study.Blue lines show V p and V s from Husen et al. (1999) used in the BWO inversion.Red lines show a model deduced from a 1-D average of the BWO inversion.Vs from this model is used as a starting model for the SWO inversion.

Figure 9 .
Figure 9.A Wadati plot [P traveltime (Tp) versus S-P traveltime (T s -T p )] for all arrival times in the dataset.The red line is a least-squares fit corresponding to a V p /V s of 1.7622 ± 0.0002.

Figure 10 .
Figure 10.(a) Cross-sections P1-P8 of V p for the preferred model.Locations of sections are shown in Fig. 4, and correspond to the labels in the lower left-hand corner in each section.Depth is relative to sea level, and distance is with respect to the midpoints of each section (plotted as white squares in Fig. 4).Values are indicated by the colour palette in the upper left part of the figure, and are shown for thick contours in between the sections.Thin contours are at 0.2 km s -1 intervals.The plots are shaded to reflect density of sampling, with the brightest regions corresponding to regions samples by at least 10 ray paths.Cross-sections of hypocentres within ±15 km perpendicular to a vertical plane defined by the section are shown as small open circles.(b) Cross-sections P9-P16 of V p for the preferred model.Meanings of symbols and colours are the same as in (a).

Figure 11 .
Figure 11.(a) Cross-sections P1-P8 of V s for the preferred model.Meanings of symbols and colours are the same as in Fig. 10(a), except that the thin contour interval is now 0.5 km s -1 .(b) Cross-sections P9-P16 of V s for the preferred model.Meanings of symbols and colours are the same as in Fig. 10(a), except that the thin contour interval is now 0.5 km s -1 .

Figure 12 .
Figure 12.(a) Cross-sections P1-P8 of V p /V s for the preferred model.Meanings of symbols and colours are the same as in Fig. 10(a), except that the thin contour interval is now 0.005 and the thick contour interval is 0.02.Thick contour values start at 1.72.(b) Cross-sections P9-P16 of V p /V s for the preferred model.Meanings of symbols and colours are the same as in Fig. 10(a), except that the thin contour interval is now 0.005 and the thick contour interval is 0.02.Thick contour values start at 1.72.

Figure 13 .
Figure 13.Map view of results of the prism reconstruction test at 50 km depth for V p (left-hand panel) and V s (right-hand panel).Values are indicated by the colour palette in the upper left-hand part of the figure.The plots are shaded to reflect density of sampling, with the brightest regions corresponding to regions samples by at least 10 ray paths.The rectangles locate the bounds of the prisms that are perturbed from the background 1-D model.Note that the reconstructed images are nearly identical for V p and V s .

Figure 14 .
Figure 14.Results of the reconstruction tests for V p (top two panels), V s (middle two panels) and V p /V s (bottom two panels).In each case the 'True' model, taken from the preferred model, is shown at the top, and the reconstructed model directly beneath.These sections correspond to P12 in Figs 4 and 10-12.

Figure 15 .
Figure 15.Map views of V p /V s for the short (top panel) and long (bottom panel) hypothetical slab models shown at 100 km depth.The true model is at the right, and the reconstructed model is at the left.

Figure 16 .
Figure 16.Cross-section views reconstruction of V p /V s for the long (top panel) and short (bottom panel) slab models shown at 20 • S. The true model is shown in the middle.Bright areas are those sampled by 10 or more rays.This section corresponds to P5 in Fig. 4.

Figure 17 .
Figure 17.Comparison of models generated with body waves only (BWO) with minimal smoothing (top row), the preferred model (middle row), and BWO with the same smoothing as the preferred model (bottom) row.Shown are V p (left-hand column), V s (middle column) and V p /V s (right-hand column).Bright areas correspond to regions sampled by at least 10 rays.Each cross section corresponds to P3 in Fig. 4.

Figure 19 .
Figure 19.Values of V p , V s and V p /V s taken from the mantle wedge region of the preferred model for regions north (red symbols) and south (blue symbols) of 21 • S, plotted on the Dunite-Antigorite model of Christensen (2004) (squares) and the high PT measurements of Ji et al. (2013) (circles).Note that while the estimates of serpentine per cent vary significantly, both correspond to a higher percentage north of 21 • S.

Figure 20 .
Figure 20.Tectonic context of the region of high V p /V s (>1.78) as shown by a blue surface that envelopes the volume.Most of this region is located in the supraslab mantle north of 21 • S, and may extend close to the surface in the region where there 2014 Pisagua earthquake occurred.Seismicity is plotted as small circles filled with colours representing depth range as shown in Fig. 4. Active volcanos are shown by red circles.The dashed red line is the Pica gap.The volume within the blue surface represents a region of enhanced hydration leading to supraslab hydration and partial melting, particularly north of 21 • S.

Figure 21 .
Figure21.Summary of the principle inferences of this study as a simplified cartoon (top panel) and plotted on section P3 for V p and V p /V s (bottom panel).Low V p and high V p /V s within the supraslab mantle at depths between about 40 and 80 km correspond to serpentinization of the mantle wedge.The gradient to high V p with high V p /V s between about 80 and 120 km transitions to partial melt that ascends (symbolized by the large arrow) to the volcanic arc in the Western Cordillera.Within the subducting Nazca Plate, a basaltic crust and serpentinized mantle at shallower depths transitions to an ecolgitic crust and peridotitic mantle at greater depths.

Table 1 .
Summary of the deployments that recorded the data used in this investigation.Operator and network abbreviations refer to the Universidad de Chile (UCH), Universidad de Antofagasta (UA), Institut de Recherche pour le Developpement (IRD) the Institut de Physique du Globe de Strasbourg (IPGS), GeoForschungsZentrum (GFZ), Freie Universität Berlin (FUB), the Integrated Plate Boundary Observatory Chile (IPOC), the Chilean Centro Sismológico Nacional (CSN) seismic networks, the Oficina Nacional de Emergencia (ONEMI), the Departamento de Geofísica of the Universidad de Chile (DGF), and the Iquique Local Network (ILN-IPOC).Instrument types are short period (sp) and broad band (BB).Start and end times of deployment are in year and Julian day (JD).Data from IPOC (doi:10.14470/PK615318) is available online via the GFZ website.All other data available on request from DGF.