Imaging the seismic velocity structure of the crust and upper mantle in the northern East African Rift using Rayleigh wave tomography

This manuscript has been accepted for publication in Geophysical Journal International. It has undergone peer-review and was accepted on 14/04/2022 and available online on 22/04/2022. The manuscript is the unformatted accepted version. The final version of the manuscript is available via the ‘Peer-reviewed Publication DOI link on the right hand side of this webpage and at: https://doi.org/10.1093/gji/ggac156 .


Introduction
Continental rifting is a fundamental component of plate tectonics that has shaped the geological record. Yet understanding the mechanisms and processes of continental rifting is often obscured by the complex geology left behind in ancient rifts, and only a few tectonically active rifts are available for study in the present day. The northern East African Rift (EAR), is a primary example of a tectonically active magma-assisted rift (e.g. Ebinger & Casey, 2001;Mackenzie et al., 2005). In this end-member style of rift, magma injection weakens the relatively strong continental lithosphere, and allows tectonic forces to extend the plate (e.g. Buck, 2006;Koptev et al., 2018;Rooney et al., 2014) (Figure 1). Models and geological observations of magmatic continental rifting suggest melt production and localised emplacement occur beneath the rift valley (Buck, 2006;Corti et al., 2015;Keir et al., 2012;Rooney et al., 2005Rooney et al., , 2014. Recent velocity models also observe punctuated slow velocity zones in the upper asthenosphere, though their cause and extent, both in time and space, are debated. Crucially, many studies have focused to the rift valley and rift flanks with less attention paid to the off-rift regions. Key information about the character of the off-rift lithosphere and asthenosphere structure, and the relationship to the lithosphere-asthenosphere system within the rift, combined with the location and migration of melt, is needed to fully understand plate extension. Below we detail the pertinent information about the northern EAR system, to motivate our present study using Rayleigh wave tomography, in order to characterise the unrifted flanks and rifts and better understand these relationships.

Geophysical constraints on the crust
The crust beneath the region has been modified by rifting and magmatism both in terms of thickness and composition. The crust is 35-40 km thick beneath the Somalian Plateau (Mackenzie et al., 2005;Stuart et al., 2006) and 30-35 km thick beneath the western edge of the Ethiopian Plateau (Ogden et al., 2019). These crustal thicknesses are likely representative of pre-rifted unmodified crust (Mackenzie et al., 2005;Ogden et al., 2019;Stuart et al., 2006). In contrast, crustal thickness for the eastern part of the Ethiopian Plateau ranges from 40-45 km, with the additional crust commonly interpreted to be several kilometres of magmatic addition from the flood basalt volcanism and likely associated lower crustal intrusions (Mackenzie et al., 2005;Ogden et al., 2019;Stuart et al., 2006;Wang et al., 2021).
Crustal thicknesses beneath the rift vary from <16 to 38 km with variable internal structure.
Beneath the MER the crust is predominantly thinner than the plateaus at 25-35 km thick, with the exception of the southern MER, where crustal thickness ranges from 30-38 km Mackenzie et al., 2005;Ogden et al., 2019). Receiver function and active source tomography studies observe multiple discontinuity boundaries and faster velocities at lower crustal depths within the rift, providing evidence for magmatic intrusions Lavayssière et al., 2018;Maguire et al., 2006;Stuart et al., 2006;Wang et al., 2021). In Afar, crustal thickness ranges from <16-26 km, with the Danakil depression having the thinnest crust of the subaerial rift system (Dugda et al., 2005;Hammond et al., 2011;Makris & Ginzburg, 1987). Tomographic P-wave studies at crustal depths for both the MER and Afar image fast velocities which have previously been interpreted as cooled gabbroic intrusions (Daly et al., 2008;Mackenzie et al., 2005;Maguire et al., 2006) which is further corroborated by gravity studies (e.g. Cornwell et al., 2006). S- wave and surface wave velocity studies, which are more sensitive to fluids, observe slow velocities in the MER and Afar suggesting a fluid component, in contrast to P-wave studies (Chambers et al., 2019;Keranen et al., 2009;Kim et al., 2012;Korostelev et al., 2015).
Vp/Vs ratios for the crust are high within the rift (Daly et al., 2008), and in places >1.9, which have been interpreted as fluids and melt emplacement beneath the rift (Dugda et al., 2005;Hammond, et al., 2011;Ogden et al., 2019;Stuart et al., 2006). Magnetotelluric surveys observe high conductivity bodies within the crust beneath the rift and favour a similar interpretation of fluids and melt (Samrock et al., 2018;Whaler & Hautot, 2006).
SKS splitting studies of azimuthal anisotropy Kendall et al., 2005), coupled with surface wave studies that constrain radial (Chambers et al., 2021), or radial and azimuthal anisotropy (Bastow et al., 2010), provide evidence for both dykes and sills in the lithosphere. Within the MER anisotropic receiver function and local earthquake splitting results show the largest amounts of anisotropy in the rift (Hammond, 2014), and surface waves make similar observations which were interpreted as oriented melt pockets (Bastow et al., 2010;Chambers et al., 2021). In Afar, anisotropic studies suggest a larger contribution from dyke intrusions and vertical cracks than the MER (Hammond, 2014;Keir et al., 2011).
These observations taken together have previously been interpreted as evidence for rift aligned melt pockets within the crust and lithospheric mantle (Bastow et al., 2010;Chambers et al., 2021;Hammond, 2014).
Off-rift, there is similar evidence for ongoing lower crustal intrusions and magmatic activity.

Geophysical constraints on the mantle lithosphere and asthenosphere
Receiver functions have been used to constrain lithospheric thickness across the region. For instance, S-to-P receiver functions image negative phases interpreted as the lithosphereasthenosphere-boundary (LAB) at 60-80 km depth beneath the Ethiopian Plateau, whereas beneath the MER and Afar, S-to-P does not find significant negative LAB discontinuities (Lavayssière et al., 2018;Rychert et al., 2012). The lack of a strong LAB phase could be consistent with either a very gradual velocity decrease with depth, or the lack of a fast mantle lithosphere (e.g. Fischer et al., 2020). Given that thermal gradients with depth are not likely to be very gradual in a rifting environment, the mantle lithosphere was interpreted as missing (Rychert et al., 2012) or very thin (<~10 km) (Lavayssière et al., 2018), which would make detection by S-to-P receiver functions difficult Rychert et al., 2020). A joint inversion of Rayleigh wave group velocities and P-to-S receiver functions found similarly thick lithosphere beneath the plateau and virtually no fast lithospheric lid beneath the MER and Afar (Dugda et al., 2007). Beneath two stations in Afar the study found a thin (~10 km) layer at 40-50 km depth with a shear velocity of ~4.1 km/s that was interpreted as the mantle lithosphere.
Previous studies that considered surface waves or jointly considered surface waves and receiver functions suggest the upper mantle in the region is slow, both on and off-rift, with Vs = 3.80-4.25 km/s (Gallacher et al., 2016;Keranen et al., 2009). These velocities are substantially slower than those observed at similar depths beneath continental interiors, where shear velocities are typically >4.45 km/s (Kennett et al., 1995). The area is also characterised by Vp values that are 4-10% slower than the global mean Boyce et al., 2021;Fishwick, 2010). One explanation for the observed slow velocities are elevated mantle potential temperatures where petrological modelling suggested values elevated by 100-170°C beneath the northern EAR (Ferguson et al., 2013;Rooney et al., 2012b). While elevated temperatures may account for off-rift velocities (e.g. Gallacher et al., 2016;Rooney et al., 2012b), velocities beneath the rift require another component such as a small amount of partial melt (Bastow et al., 2010Chambers et al., 2019;Gallacher et al., 2016;Hammond et al., 2013).
At mantle depths, the slow velocities bodies are not continuous, but punctuated in character in regional and local scale studies (e.g. Bastow et al., 2005;Civiero et al., 2015Civiero et al., , 2016Gallacher et al., 2016;Hammond et al., 2013). Body wave tomography studies find evidence for punctuated slow velocity anomalies at asthenospheric depths, beneath and off set from the rift . A teleseismic surface wave study by Gallacher et al. (2016) also found punctuated slow velocity zones, although did not observe the offset from the rift in the upper mantle, possibly due to lower horizontal resolution when compared to body waves.
They interpreted these slow velocity zones as segmented buoyancy driven upwellings.

Motivation
In summary, the northern EAR ( Figure 1) has been extensively imaged creating multiple velocity models at a range of depths (e.g. Bastow et al., 2008Bastow et al., , 2005Civiero et al., 2015;Keranen et al., 2004;Kim et al., 2012). A common feature of the models that image the mantle (e.g. Bastow et al., 2008Bastow et al., , 2005Civiero et al., 2015;Gallacher et al., 2016) is the presence of punctuated slow velocity zones in the asthenosphere, though their cause is debated, with interpretations ranging from mantle plumes or plumelets (Civiero et al., 2015;Hammond et al., 2013), to focused upwellings and/or decompressing melting resulting from mechanical thinning of the lithosphere , to buoyancy driven mantle upwelling (Gallacher et al., 2016). The depth and lateral extent of the slow velocity bodies have been less well constrained leading to much debate regarding the primary controls on melt generation and whether the slow velocity zones are located solely beneath the rift (Gallacher et al., 2016), offset towards the rift flanks , or located beneath both the rift and plateaus (Civiero et al., 2015(Civiero et al., , 2016. In addition, melt migration pathways between the mantle and crust, and the nature of crustal magma storage are not wellknown and consequently debated. At lower crustal depths and uppermost mantle depths, melt may be stored at the rift scale (Jin et al., 2015;Smith 1994), or as more isolated bodies throughout the full crust (Cashman et al., 2017;Hübert et al., 2018). off-rift regions and on-rift regions were complicated to produce given the large range in methodologies, resolutions, spatial scales and assumptions between studies.
An example of variability in approach is demonstrated in the three ambient noise tomography (ANT) studies in the region (Chambers et al., 2019;Kim et al., 2012;Korostelev et al., 2015).
One study presented group velocity maps between 7 -28 s period and a full 3-D shear velocity model (Kim et al., 2012), another presented phase velocity maps from 9-25.5 s period (Korostelev et al., 2015), and a further study presented phase velocity maps from 8-33 s and a full 3-D shear velocity model (Chambers et al., 2019). Each study also covered a different region (Afar (Korostelev et al., 2015), central MER (Kim et al., 2012) and broader rifted regions (Chambers et al., 2019)). In addition, while the group and phase velocity maps give a general sense of lateral variability, they cannot be quantitatively compared due to the differing sensitivity/depth averaging of the methods. Therefore, quantitative comparisons among these studies and across the entire region are not possible.
A similar example can be given for body wave tomography studies. Previous body wave tomography in the region used relative arrival-time analysis Benoit et al., 2006;Hammond et al., 2013). This provides information relative to a regional, not global background mean making quantitative comparisons among these studies from different areas challenging. Furthermore, it is difficult to directly infer physical properties such as temperature and/or melt from these studies (e.g. Bastow, 2012). structures. The joint inversion also allows us to compare the presence/absence of slow velocity anomalies likely related to melt in the asthenosphere to those in the crust in a consistent manner to better constrain the pathways of melt and associated dynamics.

Datasets
We used data from 13 temporary seismic networks and 5 permanent stations installed

Data Processing
The vertical component data was downsampled to 1Hz, normalised and whitened with a 4 th order Butterworth bandpass filter between 0.005-0.4 Hz following the method of Bensen et al. (2007). The data were cross-correlated on the 24-hour long waveforms for each concurrently running station pair. The cross-correlograms for every day and each station pair were then stacked to improve the signal to noise ratio (SNR). Station pairs with less than 10 days of continuous recording, 10 days' worth of stacked cross-correlation functions, interstation distances <3 or a SNR <3 were removed (Bensen et al., 2007;Chambers et al., 2019;Harmon et al., 2007) resulting in 34991 total cross-correlation functions and 6716 unique cross-correlation functions (Supplementary Figure S1). We examined the stability of the cross-correlations through time, and found that typically >1 month stacks produced waveforms with phases that were within 1-2 s of the long term stack. SNR increased as expected with longer time period stacks and is consistent with numerous previous studies (e.g. Bensen et al., 2007). Then the fundamental mode Rayleigh wave data were windowed using a time variable filter (Landisman et al., 1969), and the Fourier amplitude and phase calculated at each frequency of interest via a fast Fourier transform. For the ambient noise phase data we assign an a priori error (square root of the diagonals of Cnn, the data covariance matrix from the iterative damped least squares inversion) of 6% of a cycle.

1-D Phase Velocity
The phase velocity dispersion across the region was estimated using a spatial domain technique across the entire array. A zero order Bessel function of the first kind was fitted to the real part of the Noise Correlation Function (NCF) in the Fourier domain by searching over phase velocities from 2.5-5 km/s in 0.01 km/s steps for every period of interest (Aki, 1957). For each stacked NCF the phase was measured at each period by unwrapping the phase, using the average phase velocity curve at the longest periods, to resolve cycle ambiguity (Bensen et al., 2007;Harmon et al., 2008).

2-D Phase Velocity
The phase velocity maps were then generated by inverting the phase data using the Born approximation 2-D phase sensitivity kernels (Zhou et al., 2004) and an iterative damped least squares approach (Tarantola & Valette, 1982) using the equation: where is the current model at iteration , is the change to the model after the next iteration, is the matrix of partial derivatives from the kernel at each node (Saito, 1988), is the data covariance matrix, is the model covariance matrix, is the difference between the observed and predicted phase, and is the starting model (Harmon et al., 2007;Tarantola et al., 1982). The final term ( ) damps the model towards the initial model using the a priori model covariance.
We used a regular 0.25° x 0.25° grid of nodes as our parameterisation for the inversion ( Figure 2) and averaged the sensitivity kernel between each station pair onto the nodes (Harmon et al., 2013;Yang & Forsyth, 2006). The sensitivity kernel is calculated at every period for each station pair on a densely sampled grid (0.1° x 0.1°) and then the Gaussian distance-weighted average value is taken to determine the value at each node on the coarser grid with a Gaussian width (2-sigma) of 40 km. The inversion estimates the average phase velocity at each node, then we -undo‖ the Gaussian weighted average to recover a 0.1° x 0.1° sampled grid by determining the Gaussian weighted contribution of the nearest nodes to each pixel using the same 40 km Gaussian width which further smooths the kernels as well. We present the formal uncertainty from the last iteration of the inversion, which is propagated from the nodal parameterisation to the higher density grid using the Gaussian weights of each node at each pixel using the full covariance matrix. We used an a priori model covariance of 0.2 km/s 2 in the phase velocity inversion along the diagonals of C mm (Supplementary Figure   S3). This choice stabilises the inversion but is not restrictive as the value is much larger than the standard deviations from the mean velocity at each period (Forsyth & Li, 2005). Choices smaller than 0.2 km/s resulted in damped velocity variations. This resulted in well resolved phase velocity maps between 8-26 s which are indicated in the average 1-D profiles in Figure   3 and S4, with lower uncertainties in the rift and at shallower depths.

Data Processing
We extracted amplitude and phase information from vertical component seismograms for earthquakes with >5.5 magnitude and epicentral distances of 25-150 (1053 events, Figure 1 inset and supplementary Figure S1). For each teleseismic event, raw data were demeaned and detrended and their instrument response was removed. The data were then bandpass filtered using a 4 th order Butterworth filter with corner frequencies between 0.005-0.4 Hz. Then the fundamental mode Rayleigh wave data were windowed using a time variable filter (Landisman et al., 1969), and the Fourier amplitude and phase calculated at each frequency of interest via a fast Fourier transform. The data were manually inspected to ensure a continuous dispersion curve, a SNR greater than 3 and peak energy between the approximate frequencies of interest, 0.01 Hz to 0.07 Hz resulting in 1053 events and 47227 ray paths going into the final inversion.

1-D phase velocity inversion.
We determined the average dispersion curve for the area using a 1-D version of the two-plane wave inversion method (Forsyth et al., 2005). The inversion was completed in two steps, with the first stage utilising a simulated annealing method to fit the two plane wave parameters for each event, while trying a range of starting phase velocities for the model between 3.00-4.40 km/s (Press et al., 1992). This ensured a global starting model was found for input into the second stage which utilised an iterative damped least squares inversion (Tarantola et al., 1982) ( Figure 3). The inversion simultaneously solves for the phase velocity, azimuthal anisotropy and wave parameters for each event. We do not calculate azimuthal anisotropy for the ambient noise as a systematic variation in phase with azimuth due to an inhomogeneous source distribution could easily be mapped into estimates of azimuthal anisotropy (Harmon et al., 2010;Yao & van der Hilst, 2009). We obtained an azimuthal anisotropy result for the teleseisms, but we do not interpret it as it is beyond the scope of this paper. There are some differences between the anisotropy model and that of Gallacher et al. (2016), which are due to the nodal spacing and additional data included in the inversion. For completeness we plot the anisotropy result in Supplementary Figure S10 and proceed focusing on the isotropic result. We assume an a priori error of the phase velocity model parameters (square root of the diagonals of Cmm) of 0.2 km/s (Supplementary figure S3) (Forsyth et al., 2005). The a priori error of the data (square root of the diagonals of Cnn, the data covariance matrix, eq. 1) are initially assigned as arbitrary value of 0.2, and as the data is normalized to a maximum value of 1 for each event, this yields a relatively conservative 20% error for the data. After the initial inversion the a priori error in the data for each event is defined by the misfit between the data for the event and the starting model of the previous iteration (Forsyth et al., 2005).
We set a minimum threshold for the a priori data covariance matrix of 0.03 2 or roughly 3% error of our normalized data, which are determined from the fits to the data as the inversion evolves. In other words we can fit 97% of the data to the normalized value of 1. Then we assign the average event misfit to the individual teleseismic data a priori error for a given teleseismic event and update at each step during the inversion. The reweighting of the data has an effect similar to using an L1-norm, by minimizing the effect of outliers or poorly fit data.

2-D phase velocity inversion.
We used the two plane wave method of Forsyth & Li (2005) at each period to invert for a phase velocity map using the amplitude and phase measurements described above from the teleseismic events, using 2-D finite frequency kernels (Yang & Forsyth 2006;Zhou et al., 2004). We used the same nodal parameterisation as used in the ambient noise tomography, i.e., a 0.25° x 0.25° nodal grid with the outermost row and column spaced at 1° to absorb velocity heterogeneities outside the target region ( Figure 2). The average 1-D phase velocity described in the previous paragraph is used as our starting model at each period. The phase velocity inversion used 2-D finite frequency kernels (Forsyth et al., 2005;Nishida, 2011;Tromp et al., 2010;Yang et al., 2006) and an iterative damped least squares approach (Tarantola & Valette 1982) (eq. 1), which solves for phase velocities at each node, and the plane wave parameters for each event. We use the same Gaussian averaging scheme described above to generate higher resolution phase velocity grids at 0.1 x 0.1°. The phase velocity at the nodes represent an average phase over the smoothed area around the node, so the final phase velocity maps at 0.1 x 0.1 resolution are determined from the Gaussian distance weighted contributions of the nearest nodes to each pixel. For our a priori model covariance we use 0.2 km/s 2 for the inner nodes and 2 km/s 2 for the outer nodes used to absorb velocity heterogeneity outside the array along the diagonals of C mm . We use the same scheme described above for the 1-D teleseismic phase velocity estimate, to determine the values for the diagonals of the data covariance matrix described in section 2.3.2. We present the formal uncertainty from the last iteration of the inversion, which is propagated from the nodal parameterisation to the higher density grid using the Gaussian weights of each node at each pixel using the full covariance matrix. The inversion is run twice. After the first set of inversions, events with phase misfits of >4 s are removed from the starting dataset and this is the input used for the final set of inversions. The removal of these poorly fit events is necessary as it removes waveforms with complicated source radiation patterns and other effects not accounted for in the inversion.

Joint Inversion for Shear Velocities
For the shear velocity inversion, we inverted each pixel of the phase velocity maps across all periods for a 1-D shear velocity structure at every node as a function of depth ( Figure 3). The combined 1-D shear velocities at every pixel collectively form the 3-D volume. We used the phase velocity maps from the ambient noise for 8-26 s and the phase velocity maps from the teleseismic results for 29-100 s. The transition at 26 s from one data-type to another is chosen based on the relative amounts of data, i.e. where the teleseismic has a greater number of ray paths. Where the ambient noise and teleseismic phase velocity maps overlap, they are within uncertainty of each other, e.g. at 20-33 s ( Figure S2).
We used an iterative damped least squares inversion (Tarantola & Valette 1982) and parameterised the shear velocity every 5 km vertically with 0.1 x 0.1 pixel size for the upper 50 km. For deeper depths we use an irregular spacing increasing from 10-50 km spacings to match that of Gallacher et al. (2016). We further tested a regularly spaced 5 km model and found results within the uncertainties of our preferred model (Supplementary Figure S5). The partial derivatives that relate variations in shear velocity to changes in phase velocity were calculated using DISPER80 (Saito, 1988). We assigned a nominal a priori standard error of 0.2 km/s for the shear velocity starting model and fixed the Vp/Vs ratio to 1.8, which is the crustal average from receiver function analyses Stuart et al., 2006) and also a typical mantle value (Dziewonski & Anderson, 1981).
Variations in the choice of Vp/Vs (1.5-2.1 the observed Vp/Vs ratios in this area) produced results within uncertainty and we present the formal uncertainty from the inversion in Figure   3. We show the improvement between the 1-D inversion and 2-D inversion in Supplementary Figure S6 and Table S1, and give the median values for the total phase time misfit, the RMS of the phase time misfit and RMS of the Real and Imaginary parts of the teleseismic inversions, and median and RMS of the phase misfit for the ambient noise inversion. Finally, we interpolated the velocity structure to 1 km depth for presentation purposes using a linear interpolation.

Uncertainties and resolution
To examine the resolution of the phase velocity tomography, we produced checkerboard resolution tests ( Figure 4) and synthetic structure recovery tests ( Figure 5). Checkerboard tests were produced at lateral length scales of 70 km (periods 8-40 s), and 165 km (~1.5, periods 8-100 s)(See supplementary S7 and S8 for further checkerboard tests at length scales of 110 km (1) and 220 km (2)). We show the checkerboards at the shortest and longest periods for the ambient noise (8 and 26 s) and teleseisms (29 and 100 s). In the period range where the two data sources overlap, the teleseisms offer improved resolution at periods longer than 29s, without losing resolution for crustal structure provided by the ambient noise.
We also mask results outside the 2σ standard error contour from the phase velocity inversion.
This is the formal uncertainty of the inversion from the linearised iterative least squares at the last iteration. Errors are propagated to the higher density grid in a similar fashion to the phase velocities.
The checkerboard tests indicate that 70 km length scale anomalies are well recovered for 8-40 s period, and 165 km length scale anomalies are resolved for all periods within the 2σ error contour (Figure 4). Inside the rift, anomalies are resolved at 70 km length scales for 8-40 s period, and 110 km anomaly length scale anomalies can be resolved using periods of less than 71s ( Figure S7). These periods are reflective of crustal and mantle depths which are discussed here down to ~120 km depth ( Figure S7). Off-rift, 70 km length scale anomalies are well resolved at periods less than 17s for the ambient noise and at periods less than 40 s from the teleseisms. The 110 km length scales anomalies off-rift ( Figure S7), can resolve anomalies at periods less than 71s. The 165 km length scale anomalies for all periods, both on and off-rift, are well resolved, however amplitude decreases at the longer periods. For checker anomaly length scales of 220 km, features are resolved at periods up to 100s ( Figure   S8). The decrease in amplitude recovery is likely related to the broad sensitivity kernels for the longest periods (first Fresnel zone ~500-1000 km wide for 100s period (Yoshizawa & Kennett, 2002)) and fewer ray paths. For our model, based on the recovery tests, it is apparent that we are likely underestimating the amplitude of the velocity anomalies at asthenospheric depths. This is particularly noticeable beneath the Ethiopian Plateau deeper To examine vertical resolution, we perform a spike test (Backus & Gilbert, 1970) for a range of model depths ( Figure 6). These kernels show the recovery of a spike function based on the diagonals of the formal resolution matrix. The formal resolution matrix is derived from the damped least squares inversion where Resolution: is the matrix of partial derivatives from the kernel at each node, is the data covariance matrix and is the model covariance matrix (Saito, 1988). Resolution ranges between 0-1, where 1 indicates a completely resolved model parameter (e.g. velocity of a layer) and 0 means not resolved. The kernels suggest shear velocities are resolvable down to 150 km depth (Figure 3 and Figure 6) after which the resolution starts to deviate from the typical bell curve shape. At the shallowest depths (7-22 km), depth slices averaged over ±10 km are resolved, whereas the deepest slices at 142-162 km depth also include sensitivity to a broader depth range between approximately 100 and 250 km depth.  Figure 3c we present the sensitivity kernels, which indicate the depths of peak sensitivity for the shear velocity inversion at 8,15,20,26,29,40, 71 and 100 s period. The kernels suggest 150 km depth is the limit of our sensitivity.

2-D Phase Velocities
We generate phase velocity maps from 8-100 s (Figure 7 and Figure S10)  In the phase velocity maps the slowest anomalies are typically beneath the MER (Figure 7), whereas this is only true at <40 km and >150 km depth in the shear velocity maps (Figure 8).
This likely reflects the broad depth sensitivity of phase velocity measurements and the large contrast in crustal thickness and velocity between Afar and the MER. In other words, the slow thick crust with a slow velocity mantle in the MER generates slow velocities at all periods in the phase velocity maps. In Afar, thin crust with faster but anomalously slow mantle beneath yields relatively high phase velocities at short periods (8-20 s) and lower velocities at longer periods (50-100 s) when the sensitivities are averaging more mantle than crust.

3-D Shear Velocity Structure
At lithospheric depths (10-80 km), we observe strong lateral variations in shear velocity, up to 0.85 km/s across our study region, which likely reflect a combination of significant changes in crustal thickness and variability in mantle structure. Figure  At asthenospheric depths (>60 km in Afar and >80 km in the MER and Plateau), we also observe several punctuated slow velocity regions. Specifically, in the 80-120 km depth range, the region beneath the MER is the slowest in the region with velocities <4.15 km/s (Figure 8 and S11). The slow velocity anomaly is not centred beneath the rift but is offset towards the west, beneath Addis Ababa and straddling the rift. There are two other slow velocity regions with velocities <4.15 km/s located near the Red Sea and the Gulf of Aden. Within Afar, in this depth range, the slow velocities are not as strong.
The slow velocity anomalies within the rift appear to systematically extend to shallower depths going from the MER northwards. Profile A-A', along the MER rift axis (Figure 9a), extending into Afar, shows the relationship between the slow velocity anomalies. Near the MER, the slow velocities are visible beneath the fast lid, extending from 80-120 km depth using the 4.05 km/s contour (Figure 9a). Going northwards, the slow velocity anomalies are centred at shallower depths going to 75 km and then to 65 km depth beneath Afar. Although the progression is intriguing, given our depth resolution, the shallowing itself is not particularly well-resolved. The base of the anomalies appears fairly constant at ~120 km depth. There does not appear to be much variation in structure at greater depths in our models though this is close to the limits of our depth resolution.
The strongest anomalies at crustal depths within the MER and the Ethiopian Plateau, are displaced from the strongest anomalies in the asthenosphere, while in Afar the anomalies in the asthenosphere are close to regions of geologically recent volcanism. In the MER the slow velocities at 20 km depth are located ~100 km southwest of the slowest velocities in the asthenosphere. Beneath the Ethiopian Plateau, velocities at asthenospheric depths are faster than those within the MER and the slowest asthenospheric velocities that may link to those within the crust off-rift, are again located within the asthenosphere beneath the MER.

Interpretation
Our

Crustal Structure
At shallow depths (10-40 km) we observe the largest range in shear velocity beneath the rift. This is likely owing to changes in crustal thickness, melt and temperature (Bastow et al., 2010;Chambers et al., 2019;Dugda et al., 2007;Hammond, 2014;Kim et al., 2012;Korostelev et al., 2015). Faster velocities (3.8-4.10 ±0.04 km/s) beneath Afar at 10-40 km depth are consistent with the presence of the mantle in this depth range in the region and consistent with a thinner crust (16-25 km thick crust    (Cornwell et al., 2010;Dugda et al., 2005;Ogden et al., 2019;Stuart et al., 2006). The slower crustal velocities within the MER are likely a result of higher temperatures and/or the presence of partial melt in the crust, which is similar to interpretations by previous studies which observe slow crustal velocities (e.g. Bastow et al., 2010;Chambers et al., 2019;Hammond et al., 2014) and high conductivities (Hübert et al., 2018;Whaler & Hautot 2006). If we assume the slower velocities in the MER are solely from melt we can relate the velocity reduction to melt using the experimental relationship of 0.1 km/s decrease in shear velocity requires 1% melt (Caricchi et al., 2008). Comparing the velocity reduction in the slowest region of the MER with respect to the western Ethiopian Plateau, we require 3.0% partial melt which is within the range interpreted in previous teleseismic tomography studies (0.6-4.0% partial melt) (Bastow et al., 2011;Civiero et al., 2015;Ferguson et al., 2013;Gallacher et al., 2016) and surface wave tomography (1.1-2.0%) (Chambers et al., 2019). It should be noted that while velocities are affected by volume of melt, the shape and aspect ratio of melt inclusions likely have a larger influence on velocity (Blackman & Kendall, 1997;Hammond & Kendall, 2016), thus the 4.0% estimate from previous studies is an upper bound on melt fraction.
Where slow velocities are observed beneath crustal magmatic segments in our model (>0.2 km/s slower than surroundings, Vs = 3.50 ±0.02 km/s), magmatic emplacement is also likely.
A similar interpretation has been made for the origin of the slow velocities in a surface wave derived model and geochemical analysis (e.g. Chambers et al., 2019;Rooney, 2020b) ( Figure   8). Seismicity studies find evidence for lower crustal earthquakes in the rift in areas where temperatures should be too high for brittle failure and suggest lower crustal earthquakes are generated by fluid/magmatic injection from the mantle (La Rosa et al., 2021;Muluneh et al., 2021). Furthermore, studies find higher Vp values at lower crustal depths (6.8-7.2 km/s) (e.g. Daly et al., 2008;Keranen et al., 2004;Mackenzie et al., 2005), which when coupled with high Vp/Vs ratios (1.8-2.1) (Daly et al., 2008;Hammond et al., 2011) and high conductivities (resistivity 0.8 log( m)) (Whaler et al., 2006) suggest the lower crust is undergoing significant modification by active magmatic intrusions (e.g. Hammond et al., 2010;Keir et al., 2009;Kim et al., 2012). These results are also supported by anisotropy which requires partial melt stored in dykes and sills throughout the crust (Bastow et al., 2010;Chambers et al., 2021;Hammond, 2014;Keir et al., 2011).
At the westernmost part of the Ethiopian Plateau, located 250-500 km from the rift, we can obtain information about the pre-rift structure. The distance from the rift, coupled with geological studies which find little evidence for eruptions (e.g. Jones, (1976); Rooney, (2019)), suggests the westernmost part of the Ethiopian Plateau (Figure 1) has been minimally impacted by rifting and hotspot tectonism (Jones, 1976;Rooney, 2019). We interpret the crustal velocity observations as being most similar to original plate structure before rifting. We choose this region as it is significantly further from the rift than the  (Rooney, 2017) though there is evidence for limited Quaternary activity along the YTVL and southwest of lake Tana (Kieffer et al., 2004;Meshesha & Shinjo, 2007). Simple conductive cooling calculations for the 1-D thermal heat equation indicate that a 1300C thermal anomaly would dissipate in <10 Myr ( Figure S12 and Text S1). Therefore, it is unlikely that a remnant thermal anomaly could explain our observations, and we suggest there has been further recent magmatic emplacement. Our recovery tests at > 15 s period ( Figure 5), suggest there may be some smearing between the anomalies in the region, so we cannot differentiate whether there are multiple isolated magma bodies or a broader region of melt. In addition, although there does not appear to be a slow velocity region at > 40 km depth, our lateral resolution may prevent us from imaging a small conduit for melt at these depths or if the slow velocity anomaly is an isolated and shallow magma body with no feeder system.

Lithospheric Mantle Structure
In the upper mantle we require a fast lid in the average 1-D shear velocity model for the region (Figure 3 Peninsula. The base of the fast lid is commonly associated with the LAB (Fishwick, 2010;Lavayssière et al., 2018;Rychert et al., 2012Rychert et al., , 2005, and we therefore plot the S-to-P LAB results of Lavayssière et al. (2018) on our cross-sections ( Figure 9 red diamonds in lower panels). Overall, we find good agreement between the locations of our fastest velocities (Vs >4.15 km/s) and the existence of a strong LAB phase from the S-to-P results. This suggests that the S-to-P result is detecting a strong, sharp velocity drop at the base of the fast lid where it exists. Our result, with no discernible fast lid in Afar, is consistent with S-to-P results which also did not find a strong, significant LAB phase beneath the majority of the rift (Lavayssière et al., 2018;Rychert et al., 2012). In summary, a fast velocity lid is required to achieve a negative LAB velocity gradient that is imageable by S-to-P. Since, S-to-P imaging requires strong, sharp velocity gradients it also implies that the velocity drop at the base of the lid in our model may be sharper than that in our smooth model, although our resolution precludes further investigation. The presence of the slowest asthenospheric velocities (3.98-4.06 ±0.03 km/s) beneath areas where receiver functions do not detect a significant LAB phase (Lavayssière et al., 2018;Rychert et al., 2012), is consistent with a lack of fast lid caused by a melt-infiltrated lithosphere (Bastow et al., 2010;Kendall et al., 2006Kendall et al., , 2005Lavayssière et al., 2018). We note that we do not see strong evidence for a lithospheric drip beneath the Ethiopian Plateau as hypothesized based on geochemical analyses, although we cannot exclude the possibility that it has descended deeper into the mantle outside of our resolution (e.g. Furman et al., 2016).

Asthenospheric Anomalies
At asthenospheric depths, >80 km beneath the Ethiopian Plateau we observe velocities of 4.20 ±0.05 km/s which by itself does not necessarily require the presence of partial melt. The asthenospheric velocity is lower than the global average at that depth (4.45 km/s) using ak135 (Kennett et al., 1995), but is consistent with a previous surface wave derived shear velocity model in the region which also observed mantle velocities of ~4.20 km/s (Gallacher et al., 2016) (Figure 3b and Figure 8). The velocity can be explained by a thermal anomaly of 100-170°C (e.g. Gallacher et al., 2016), which is consistent with petrological constraints of the regional mantle potential temperature Ferguson et al., 2013;Rooney et al., 2012b). However, receiver functions image a strong, sharp discontinuity beneath the Ethiopian Plateau (Lavayssière et al., 2018;Rychert et al., 2012), and these studies as well as many others, have shown that another factor besides elevated temperatures is required to explain strong sharp velocity drops with depth. Partial melt is the most straightforward explanation for a host of similar observations globally (e.g. Harmon et al., 2021;. Therefore, if melt does exist in the asthenosphere beneath the Ethiopian Plateau, it must be in a more limited depth and/or lateral area than resolvable by surface waves in this region. Within the rift we observe punctuated slow velocity anomalies (Vs <4.05 km/s) at asthenospheric depths of > 60 km, which are ~110x80 km wide, spaced ~70 km apart, and have a base at ~120 km depth Figure 9a. The deep punctuated slow velocity anomalies that we observe are similar in scale and position to features imaged in a number of previous tomographic studies Gallacher et al., 2016;Hammond et al., 2013). However, they do not correspond spatially with the crustal magmatic segments at the surface. The northernmost anomaly with the shallowest starting depth (60 km) exists beneath Afar, the region in the latest stage of rifting (9% velocity decrease when comparing to the rift flanks), while for the MER the shallowest starting depth for the slow velocity bodies is at ~80 km depth. We note that absolute depths could vary by 20-30 km given our sensitivity at asthenospheric depths, although the trend is compelling. Mantle potential temperatures in the rift are only moderately elevated, ~1450°C (Ferguson et al., 2013;Petersen et al., 2015;Rooney, Herzberg, et al., 2012) based on models of volcanic rock geochemistry. With this temperature, we would expect velocities to be reduced by ~3% using a Burgers model (Jackson & Faul, 2010) for a peridotite mantle and an estimated geotherm for the MER (Chambers et al., 2019). Composition and temperature are therefore not sufficient to account for the velocity reduction, which suggests a fluid component is required in the asthenosphere beneath the rift. We postulate the fluid component is most likely partial melt, from the abundance of Quaternary volcanism at the surface, which is in agreement with previous tomographic studies Chambers et al., 2019;Gallacher et al., 2016). This is further supported by anisotropy studies which require melt as inclusions along the rifts (Bastow et al., 2010;Chambers et al., 2021;Hammond, 2014;Hammond et al., 2010; Kendall et al., 2006). The anomalies occur at depths consistent with previous geochemical estimates of melt generation (53-120 km depth) (Ferguson et al., 2013;Rooney et al., 2005).
The punctuated slow velocities in the asthenosphere (Figure 9a), are present beneath younger areas of the rift where the crust and lithosphere is not significantly thinner than beneath the Ethiopian Plateau (40 km thick crust and 80 km thick lithosphere (Lavayssière et al., 2018;Stuart et al., 2006)). Furthermore the anomalies persist into areas of later stage rifting in the northern section of A-A' in Afar (Figure 9a) where the crust is thinner (crustal thicknesses of 22 km ). These observations suggests punctuated melt supply starts prior to significant crustal and plate thinning.
Punctuated slow velocity anomalies beneath rifts are not isolated to this study and have been observed beneath the EAR Civiero et al., 2015Civiero et al., , 2019Gallacher et al., 2016;Hammond et al., 2013) and more mature rifts and mid-ocean ridges, e.g. Gulf of California, Red Sea Rift, the Mid Atlantic Ridge Lekic et al., 2011;Ligi et al., 2012;Wang et al., 2009). The processes generating separated slow velocity anomalies beneath the EAR are debated (e.g. Bastow et al., 2008;Civiero et al., 2015;Gallacher et al., 2016;Ligi et al., 2012). The main hypotheses for the slow anomalies are decompression melting focused by buoyancy driven upwelling from the release of melt (Gallacher et al., 2016) or decompression melting focused by non symmetrical mechanical thinning of the lithosphere  and steep lithosphere asthenosphere boundary gradients that developed during early-stage rifting (Holtzman & Kendall, 2010;Kendall et al., 2005). Other hypotheses are the presence of smaller diapiric upwellings (Hammond et al., 2013) and mantle plumelets (e.g. Civiero et al., 2019).  . The offset in the anomalies suggests that preexisting structures and topography of the LAB may also have a control on melt migration.
Anisotropy in the MER is strongest near the rift flanks and studies have suggested strong azimuthal anisotropy coupled with velocities slow enough to contain melt reflect regions of steepest LAB gradient Holtzman et al., 2010;Kendall et al., 2006). One possibility is that the punctuated slow velocity zones are in part controlled by variations in LAB topography in the earlier stages of rifting, where the rift is narrow, but their shallowing and more central position at more advanced rifting suggest the current locus of extension is the primary control.
Within the MER the slowest crustal velocity anomalies at 10-40 km depth are not located directly above the slowest velocity anomalies at asthenospheric depths (> 80 km) ( Figure 9).
As previously discussed, the anomalies at both depths are slow enough to require the presence of partial melt. One possible explanation is that melt migrates laterally to the crustal anomalies, potentially guided by a permeability boundary at the base of the plate that thins towards the rift, focusing the melt there (Sparks & Parmentier, 1991). Lateral melt migration has also been proposed by previous studies in the region Gallacher et al., 2016;Holtzman and Kendall 2010;Kendall et al., 2005).
Off-rift, beneath the Ethiopian Plateau, slow crustal velocities (located at ~38° E, 11° N and ~39.5° E, 11° N Figure 8 and 9) are also not located above the slowest asthenospheric velocities which are located beneath the rift. This situation is harder to reconcile via lateral melt migration unless there are pre-existing lithospheric cracks (Sandwell & Fialko, 2004) from the rift to beneath the Ethiopian Plateau. Overall, lateral melt migration is unlikely given that neither our result nor that from receiver functions supports lithospheric thinning toward the off rift slower crustal velocities (Lavayssière et al., 2018). Another possibility is that melt is ephemeral. In other words, melt was once present in the asthenosphere beneath the slow crustal anomalies of the plateau but it has now drained to the crust and/or frozen into the mantle. This idea is based on geodyanmic modeling work that shows that melt need not necessarily immediately leave the mantle via a vertical ascent path (Hebert & Montési, 2010;Sim et al., 2020;Sparks & Parmentier 1991). For instance, recent geodynamic modelling of two-phase flow shows that for spreading rates similar to those of the MER, melt is predicted to rise and accumulate beneath a permeability boundary at the base of the plate (Sim et al., 2020). The melt eventually freezes into the base of the plate and/or moves along the base of the permeability barrier towards the surface, essentially draining the mantle of melt before eventually being replenished by upwellings from below. Both the episodic nature of the melt supply and the fact that some melt is predicted to freeze into the plate mean that the amount of melt imaged in the asthenosphere at any given time need not synchronously balance the amount of melt in the crust and/or at the surface of the Earth. Of course, this modelling work has not yet evolved to incorporate the variations in lithospheric and crustal thicknesses that occur in a continental rift. However, such work offers strong evidence of such a possibility in rift systems. Ephemeral melt has also been invoked to explain punctuated anomalies and variable presence of channelised melt beneath the oceanic lithosphere (Carbotte et al., 2021;Harmon et al., 2020Harmon et al., , 2021Rychert et al., 2021Rychert et al., , 2020Wang et al., 2020). It presumes that melt generation is episodic and that melt migration is faster than generation so that the mantle is effectively drained. A third possibility is that melt is present in the asthenosphere beneath the slow crustal velocity anomalies of the plateau (Civiero et al., 2015;Keranen et al., 2009), but is localised in pockets that are below our resolving power. This possibility is supported by our resolution tests that show that amplitude recovery beneath the plateau is muted, and therefore velocities could be slower than those of our model ( Figure 5). It is also supported by receiver function imaging of strong sharp velcoity drops beneath the plateau, which are most easily explained by a small amount of melt in the asthenosphere (Figure 9) (Lavayssière et al., 2018;Rychert et al., 2012). It should be noted that there is ongoing extension below the Ethiopian Plateau (Birhanu et al., 2016), which has been attributed to high gravitational potential energy or intrusive magmatism away from rift. While the areas of greatest extension are not in exactly the same location as the slow off-rift crustal velocities, it provides a mechanism to facilitate off rift crustal intrusion, and a mechanism to generate melt beneath it.
Overall, our result suggests melt generation and migration may be dynamic processes, which require further study to fully understand rifting processes and the factors that dictate the locations of active volcanic/hydrothermal regions.

Conclusions
We present the results from a joint inversion of ambient noise and teleseismic Rayleigh waves to produce a 3-D absolute shear velocity map from 10-150 km depth, for the northern   (Lavayssière et al., 2018), magenta , yellow , Orange , Cyan (Cornwell et al., 2010), white (Ogden et al., 2019)) and LAB in bottom sections (red (Lavayssière et al., 2018)). White rings are the same as black rings in Figure 8, for location reference, and purple tick marks indicate the crossing points of A-A' and B-B' on the respective figures.