Tomography of crust and lithosphere in the western Indian Ocean from noise cross-correlations of land and ocean bottom seismometers

wave type propagating at 0.8– 1.5 km s − 1 over distances exceeding 1000 km, presumably Scholte waves travelling through seaﬂoor sediments. The 100 highest-quality group velocity curves are selected for tomographic inversion at crustal and lithospheric depths. The inversion is executed jointly with a data set of longer-period, Rayleigh-wave phase and group velocity measurements from earthquakes, which had previously yielded a 3-D model of Indian Ocean lithosphere and asthenosphere. Robust resolution tests and plausible structural ﬁndings in the upper 30 km validate the use of noise-derived OBS correlations for adding crustal structure to earthquake-derived tomography of the oceanic mantle. Relative to crustal reference model CRUST1.0, our new shear-velocity model tends to enhance both slow and fast anomalies. It reveals slow anomalies at 20 km depth beneath La R´eunion, Mauritius, Rodrigues Ridge, Madagascar Rise, and beneath the Central Indian spreading ridge. These structures can clearly be associated with increased crustal thickness and/or volcanic activity. Locally thickened crust beneath La R´eunion and Mauritius is probably related to magmatic underplating by the hotspot. In addition, these islands are characterized by a thickened lithosphere that may reﬂect the depleted, dehydrated mantle regions from which the crustal melts where sourced. Our tomography model is available as electronic supplement.

Western Indian Ocean noise tomography 925 propagation (and therefore subsurface structures) between the two stations, CCFs can be used for tomographic studies. As microseismic noise is mainly composed of surface waves with wave periods of 3-20 s (e.g. Friedrich et al. 1998), noise cross-correlations in this period band put constraints mainly on the earth's crust and uppermost mantle. Over the past 15 yr, the number of noise tomography studies has exploded, ranging from local scale (e.g. Brenguier et al. 2007;Mordret et al. 2014Mordret et al. , 2015, to regional and continental scale (e.g. Guo et al. 2013;Zigone et al. 2015;Goutorbe et al. 2015;Corela et al. 2017;Lu et al. 2018;Xie et al. 2018) and up to global scale (e.g. Nishida et al. 2009;Haned et al. 2016). The majority of these studies used cross-correlations between land stations to investigate continental crust and/or mantle. For oceanic plates, structural details far away from continents are known only for a few areas because long-term seismological broadband arrays have rarely been deployed in the open oceans. Applied to relatively recent ocean bottom seismometer (OBS) arrays, the method of noise cross-correlations has the potential to increase our knowledge about ocean basins significantly. Only a few studies have used OBSs for noise tomography (e.g. Yao et al. 2011;Tian et al. 2013;Mordret et al. 2014;Zha et al. 2014;Ball et al. 2016;Corela et al. 2017;Ryberg et al. 2017;Tomar et al. 2018;Janiszewski et al. 2019), and the interstation distances were relatively small, ranging from several metres (e.g. Mordret et al. 2014;Tomar et al. 2018) to at most hundreds of kilometres (e.g. Yao et al. 2011;Tian et al. 2013;Zha et al. 2014;Ball et al. 2016;Corela et al. 2017;Ryberg et al. 2017;Janiszewski et al. 2019).
Here, we present an OBS noise tomography study of the western Indian Ocean with very large interstation distances of up to >2000 km. Due to very sparse instrumentation of this ocean, the few crustal tomographies that exist have low lateral resolution (e.g. Montagner & Jobert 1988;Debayle & Lévêque 1997;Ma & Dalton 2017).
From 2011 to 2015, the Réunion Hotspot and Upper Mantle-Réunions Unterer Mantel (RHUM-RUM) experiment was conducted in the western Indian Ocean around the volcanic island of La Réunion, which is located 800 km east of Madagascar and 200 km southwest of Mauritius (Fig. 1a). The hotspot volcanism of La Réunion since 65 Ma, combined with a time-progressive hotspot track that leads to a large igneous province (the Deccan Traps in India and the Seychelles) point both to a deep mantle plume beneath the island. RHUM-RUM's main objective was seismological imaging of the area from crust to core, in order to strengthen or reject the mantle plume hypothesis (Barruol & Sigloch 2013). A large array of 57 broadband OBSs was deployed over an area of 2000 × 2000 km 2 during 13 months in 2012-2013, and 37 temporary land stations were installed between 2011 and 2015 on La Réunion, Mauritius, Rodrigues Island, theÎlesÉparses, southern Seychelles and in southern Madagascar. Mazzullo et al. (2017) used earthquake data recorded by the RHUM-RUM stations to retrieve Rayleigh-wave phase velocities for periods of 30-300 s and group velocities for 16-250 s. A 3-D, regional shear-velocity model was obtained by joint inversion of phase and group velocities, but the wave periods were too long to reliably constrain crustal depths, which were essentially filled in a priori by a crustal reference model, CRUST1.0 (Laske et al. 2013).
The scope of our study is to tomographically image the crust and lithosphere beneath the RHUM-RUM deployment, using crosscorrelations of ambient seismic noise recorded by RHUM-RUM OBSs and island stations. We use phase cross-correlations (PCCs, Schimmel 1999) which do not require special pre-processing, unlike classical CCF calculations. Subsurface structure is constrained by group velocity measurements, which are derived from the PCCs. Finally, our group velocity values are inverted jointly with the earthquake-generated Rayleigh-wave data previously measured and inverted by Mazzullo et al. (2017). We obtain a 3-D, regional shearvelocity model from the surface down to ∼250 km depth, of which we discuss the upper 80 km, where the new ambient noise data appreciably add constraints.
Section 2 describes the seismic stations used in this study. Section 3 describes the calculation of the PCCs and the stacking procedure. The estimation of group velocity curves from the PCC stacks and the results of the group velocity measurements are presented in Section 4. Section 5 focuses on tomography based on these noisederived group velocity data and on the earthquake-derived data of Mazzullo et al. (2017). The results are discussed in Section 6, followed by Conclusions in Section 7.

DATA D E S C R I P T I O N
The study area is shown in Fig. 1. We use continuous waveform data of 57 OBSs that were deployed in the course of the RHUM-RUM experiment around La Réunion for 13 months (October 2012 to November 2013) at water depths of 2200-5400 m . The network comprised 44 wideband OBSs from the German DEPAS pool, 4 wideband OBSs from GEOMAR Kiel (identical in construction to the DEPAS OBSs), and 9 broadband OBSs from the French INSU pool. The DEPAS and GEOMAR OBSs were equipped with the OBS version of the Güralp CMG-40T sensor, named CMG-40T-OBS. Its corner period is usually 60 s, although 9 of the DEPAS OBSs were equipped with a prototype sensor of corner period 120 s (but these instruments failed to deliver data). The INSU OBSs were equipped with Nanometrics Trillium 240 sensors, featuring a corner period of 240 s. INSU OBSs are of the LCPO2000-BBOBS type, based on the Scripps Institution of Oceanography 'L-CHEAPO' design. More details about OBS types and their performances during the RHUM-RUM deployment can be found in Stähler et al. (2016). The actual horizontal OBS orientations on the ocean floor can be found in Scholz et al. (2017).
Our study is based on vertical component seismograms. Unfortunately, the seismometers of several OBS stations failed, mostly due to a stuck levelling mechanisms, which affected all nine 120 s-DEPAS instruments and a few others (Stähler et al. 2016). We included only one station (RR47) of a densely spaced subarray of eight OBSs on the Southwest Indian Ridge ('SWIR array' in Fig. 1), since ours was a large-distance study. In total, we could make use of 38 3 GEOMAR,9 INSU), marked by red symbols in Fig. 1(a).
In order to enhance the wave path coverage of our study area, we use 10 land stations in addition to the 38 OBSs: three stations on La Réunion (MAID, PRO, RER, see   ) are shown as red symbols (stations that worked) or unfilled symbols (stations that failed). Different red symbols mark different OBS types (DEPAS, INSU or GEOMAR), which are discussed in the text. Unfilled stations in a local sub-array ("SWIR array" in the inset map) worked but were too closely spaced to be useful. Upright triangles mark land stations that were deployed contemporaneously by RHUM-RUM (fill colour yellow); by the permanent GEOSCOPE network (white) (IPGP & EOST 1982); by the temporary MACOMO experiment (green) (Wysession et al. 2011); by the Observatoire Volcanologique du Piton de la Fournaise (blue), and by the Mauritius Seismic Network (purple). Stations are labelled by their names if they are explicitly mentioned in the text. Grey lines indicate the area's three mid-ocean spreading ridges: Central Indian Ridge (striking N-S), Southeast Indian Ridge (NW-SE) and Southwest Indian Ridge (SW-NE) plate boundaries. (b) Map zoom shows the island of La Réunion and three land stations that are used.
Madagascar (Wysession et al. 2011);FOMA, RER and ROCAM are permanent GEOSCOPE stations (IPGP & EOST 1982); PRO is a permanent station of Observatoire Volcanologique du Piton de la Fournaise (OVPF); and MRIV is a permanent station of the Mauritius Seismic Network. Each of these land stations operated for at least five months during the 13-month period of the OBS deployment.
A crucial requirement for our study is the correct timing of the seismic records. Unlike for land stations, a GPS signal acting as external reference clock cannot be received at the ocean bottom. Instead, the timing of OBS records has to rely on the OBS-internal clock, which is however prone to drift. The conventional procedure is to perform a plausible, but unverified timing correction after OBS Downloaded from https://academic.oup.com/gji/article-abstract/219/2/924/5539531 by Biblio Planets user on 29 August 2019 recovery, the so-called skew correction: the timing of the seismograms is corrected linearly based on the skew, which is the measured time difference between OBS clock and GPS clock immediately after the recovery of the OBS. A GPS synchronization immediately before the OBS deployment ensures a correct timing at the beginning of the recordings. This procedure became problematic in the RHUM-RUM experiment, where almost half of the OBSs lacked the (second) GPS synchronization after recovery, due to faulty shutdown of the OBS clocks prior to OBS recovery (Stähler et al. 2016). Therefore, Hable et al. (2018) conducted an extensive clock error study to estimate and correct clock drifts for OBSs without skew, and to verify commonly assumed (linear) clock drift patterns used to justify the skew correction procedure. Hable et al. (2018) demonstrated that noise cross-correlations can be used to measure clock drifts for large interstation distances (>300 km) with a high temporal resolution of 2 d. Hable et al. (2018) demonstrated that averaging over multiple station and component pairs improves the method's timing accuracy severalfold, to ∼20 ms. Hable et al. (2018) verified that the RHUM-RUM OBS clocks indeed drifted almost linearly. For all OBSs (except one very noisy station not included in this study), a linear clock drift value could be measured from the correlation functions (or verified, if a skew measurement existed). Drift magnitudes ranged between 0.2 and 8.8 ms d -1 . Additionally, Hable et al. (2018) could detect that three OBSs were affected by apparent clock jumps of ∼1 s, caused by (very rare) failure to successfully write batches of digital samples to disk.
Land stations should not be affected by clock errors, because frequent synchronization with a GPS signal can usually be ensured on land. An exception was station MAID on La Réunion, which suffered from large clock errors of several minutes due to a GPS failure. The timing problem of MAID (and three other land stations not part of our study) was also detected and successfully corrected by Hable et al. (2018).
For this study, we used time-corrected data for all OBSs and for land station MAID according to Hable et al. (2018).

PCC calculation
The subsurface properties between two points are reflected by the Green's function. This function represents the impulse response, that is the seismogram measured at one point if a delta pulse acts as input at the other point. Since cross-correlation functions of ambient seismic noise recorded at two seismometers (s 1 and s 2 ) converge towards the Green's function between the two stations, CCFs are a powerful tool to investigate subsurface structure, and more specifically crustal structure, because the periods that dominate in ambient microseismic noise correspond to Rayleigh waves propagating at crustal depths (e.g. Shapiro & Campillo 2004;Sabra et al. 2005;Shapiro et al. 2005). In practice, CCFs are calculated on relatively short time-series (e.g. 1 d), and are subsequently stacked over a longer duration (e.g. 1 yr) in order to achieve the best possible convergence towards the Green's function (Bensen et al. 2007;Schimmel et al. 2011). The stacking procedure enhances the signal part common to all daily CCFs, and suppresses the random noise components.
CCFs comprise a causal part and an acausal part. The causal part represents seismic waves recorded at station s 2 as a response to an impulse at the location of s 1 . Conversely, the acausal part represents waves propagating from station s 2 to station s 1 . Ideally, both sides of a CCF should be symmetric because the waves travel through the same medium. In practice, CCFs are usually not symmetric because unevenly distributed noise sources lead to different amounts of wave energies travelling in either direction (Stehly et al. 2007). Since CCF asymmetry is mainly limited to unequal amplitudes of the causal and acausal parts, stacking over both CCF parts is an appropriate technique to enhance the signal-to-noise ratio of the CCF stack.
The classical method of CCF requires pre-processing to reduce the effect of high-energy, non-random signals like earthquakes, which could deteriorate the CCFs (Bensen et al. 2007). Here we use cross-correlations based on the alignment of the phases in seismograms from two stations, termed phase cross correlations by Schimmel (1999). Schimmel et al. (2011) showed that these PCCs can improve convergence to the Green's function. PCCs are insensitive to amplitudes, in contrast to classical CCFs which are based on the alignment of high amplitudes (Schimmel et al. 2011). Hence no time-domain normalization (such as a 1-bit normalization) described by Bensen et al. (2007) is required for the PCC preprocessing. A detailed comparison of PCCs with classical CCFs is given by Schimmel et al. (2011).
The PCC definition has its roots in analytical signal theory. The analytic signal s(t) of a real time-series u(t) can be described in the complex number domain by following equation (Schimmel et al. 2011): where H(u(t)) denotes the Hilbert transform of u(t); a(t) represents the envelope and φ(t) the instantaneous phase of the waveform. Based on this representation, the following definition of PCC calculations was introduced by Schimmel (1999) and further investigated by Schimmel et al. (2011): where φ 1 and φ 2 are the instantaneous phases of two stations s 1 and s 2 ; t is the lapse time of the PCC; τ 0 is the start time of the PCC; and T is the length of its correlation window. The sensitivity of the PCC is determined by ν; we follow Schimmel et al. (2011) and use ν=1.
Our study uses the Z-component of day-long seismograms. The pre-processing consists of only three steps: (1) downsampling of the time-series to 2 Hz, from originally 50 Hz, 62.5 Hz or 100 Hz; (2) correction for the instrument response and (3) removal of the mean and linear trend. Downsampling renders the PCC calculation less time-consuming, and instrument correction is particularly important because we are cross-correlating data from different seismometer types (Stähler et al. 2016).
Before the PCCs are calculated, we split each pre-processed, daylong time-series into four 6-hr-long traces in order to increase the number of PCCs available for stacking. Next the traces are filtered by three zerophase bandpass filters of periods of 3-10, 10-20 and 20-50 s. Use of a single, widebanded filter (e.g. 3-50 s) is not recommended because this tends to hamper robust group velocity estimation, as shown by Haned et al. (2016). We demonstrate in Section 4.1 that group velocity estimation can indeed be improved by using the three narrower sub-bands.
We noticed that PCCs correlating two INSU OBSs were afflicted by almost monochromatic frequency artefacts at multiples of 0.05 Hz (0.05, 0.1 and 0.15 Hz), that is a 'ringing' at these frequencies across all lapse times of the PCC can be observed. This instrument problem necessitated the selective removal of very narrow frequency bands around these three frequencies in all INSU OBS seismograms (although only specific stations might actually have been affected). This was successfully achieved by applying three narrow bandstop filters centred on 0.05, 0.1, and 0.15 Hz; the outcome is visualized in Fig. 2. Although the problem affected only INSU-INSU correlations, it could not be observed at all INSU-INSU station pairs, probably because only the individual stations RR38, RR40, RR50 and RR52 were affected. Figs 2(a)-(c) analyses a 10-d-long record of RR38. The spectrogram in Fig. 2(b) reveals two monochromatic peaks at 0.05 and 0.1 Hz as horizontal lines throughout the recording period. Expected features are the two diffuse, high-energy bands centred on 14 and 7 s, which correspond to the primary and secondary microseismic noise bands, respectively. The 0.1 Hz artefact is largely overprinted by the secondary microseism in Fig. 2(b). In order to diagnose further, we calculated the power spectrum of each of the 10 d and performed an average over these 10 power spectra in Fig. 2(c). The frequency artefacts appear as sharp peaks at 0.05 and 0.1 Hz in the averaged daily power spectrum. Investigation of other INSU stations identified 0.15 Hz as a third potentially contaminating frequency. These monochromatic frequencies are newly discovered artefacts that occur only in INSU OBSs. Their cause remains unclear, but they are almost certainly instrument artefacts due to their unnatural, narrow-banded nature. Since the PCC definition is based on phase coherence, phase correlations of two records affected by the same persistent frequencies are dominated by exactly these frequencies. This is shown in Fig. 2(d) for INSU-INSU station pair RR38-RR50. After removal of the frequency artefacts by bandstop filtering, Rayleigh-wave arrivals emerge clearly in the PCC stack (Fig. 2e), enabling a robust group velocity measurement (Fig. 2f).
For all station pairs (OBS-to-OBS, OBS-to-land, land-to-land), we calculate PCCs of 6-hr-long traces according to eq. (2), with lapse times running from -1000 to +1000 s. It is empirically known that proper group velocity estimation requires interstation distances of at least two or three wavelengths of the longest occurring wave period (e.g. Haned et al. 2016). Hence we skip the computation of PCCs in the 20-50 s period band for station pairs spaced by less than 450 km. Next we split the PCCs into a causal and a time-reversed acausal part, which yields eight PCCs (four causal and four acausal) wave amplitudes are coloured red and blue, respectively. The move-out times of plausible Rayleigh-wave arrivals are bounded by velocities of 5.5 km s −1 (light blue) and 2.5 km s −1 (dark blue). Rayleigh waves with ∼3.5 km s −1 are most prominent in the 20-50 s period band, still clear at 10-20 s, and weak at 3-10 s. PCCs in the 20-50 s band are not calculated for distances too short (<450 km) to yield useful group velocity values. A second type of arrival is observed in the period bands of 10-20 and 3-10 s, travelling at low wave velocities between 0.8 km s −1 (red line) and 1.5 km s −1 (orange line). These are interpreted as Scholte waves propagating through seafloor sediments, evidently over distances exceeding 1000 km.
per day for each station pair. The resulting large number of more than 3000 PCCs for the 13-month-long recording period (8 PCCs × 30 d × 13 months) should ensure a high signal-to-noise ratio of the PCC stack.

Phase-weighted stacking
In contrast to the conventional linear stacking procedure, we apply a time-frequency domain phase-weighted stacking (tf-PWS), which was developed by Schimmel et al. (2011) as an extension of the phase-weighted stacking introduced by Schimmel & Paulssen (1997). The purpose of tf-PWS is to downweight the incoherent parts of the linear stack in the time-frequency domain (τ , f), which is achieved by the time-frequency phase coherence c ps (τ , f) (Schimmel et al. 2011): where S j (τ , f) denotes the S-transform of the jth individual PCC and N is the number of individual PCCs. The S-transform is used to transform the PCCs in the time-frequency domain (Stockwell et al. 1996). As before, we follow Schimmel et al. (2011) who suggested to use ν=2 for eq. (3). The tf-PWS is then defined as the product of the phase coherence c ps (τ , f) and the S-transform of the linear stack S ls (τ , f) of all PCCs (Schimmel et al. 2011):

930
S. Hable et al. Finally, an inverse S-transform is performed to transform the PCC stack from the time-frequency domain (S pws (τ , f)) back to the time domain (s pws (t)) (Schimmel et al. 2011). Schimmel et al. (2011) and Corela et al. (2017) demonstrated in detail the gain in clearer wave arrivals when using tf-PWS instead of conventional linear stacking. This means that tf-PWS provides a faster convergence towards the Green's function and results in more robust group velocity measurements.
In this study, the PCC calculation as well as the tf-PWS are performed by the software package of Schimmel et al. (2011). We make use of all possible correlations that can be derived from our data set of 48 stations (38 OBSs and 10 land stations, see Section 2); a few exceptions are correlations between station pairs on La Réunion (like MAID-PRO) or on Madagascar (like ANLA-FOMA), which cannot provide information about the western Indian Ocean. (Investigation of the crustal structure beneath La Réunion from 23 island stations is the subject of a forthcoming study.) The number of possible station pairs n is determined by following equation: where k is the number of stations used. This yields 1119 station pairs processed in this study, considering that we excluded 9 stations pairs on La Réunion and on Madagascar. Fig. 3 shows the 1119 PCC stacks calculated according to eq. (4) for each of the 1119 station pairs and sorted by interstation distance. Since we are working with ZZ correlations, we expect the PCCs to consist mainly of Rayleigh waves. Indeed, the 20-50 and 10-20 s period bands show prominent wave package arrivals corresponding to propagation velocities of roughly 3.5 km s −1 (Figs 3a and b), which can be associated with Rayleigh waves. The same observations can be made in Fig. 4, where all 47 PCC stacks of INSU OBS RR52 are presented. However, the 3-10 s period band in Fig. 3(c) is dominated by a wave package of significantly lower velocity, in the range of 0.8-1.5 km s −1 , while Rayleigh wave trains are almost absent. The same low-velocity wave package is visible in the 10-20 s period band (Fig. 3b). We think that these slow waves correspond to near-surface waves that propagate through sediments deposited on top of the oceanic crust. This notion is supported by the slow wave velocities and by the observation that the slow waves are visible mainly at shorter periods, which only penetrate the shallowest subsurface. These waves can probably be assigned to a group of interface waves called Scholte waves (Scholte. 1947). Scholte waves of similar velocities were also observed by Le et al. (2018) in data of eleven OBSs deployed in the South China Sea (interstation distances of 60-270 km).

Group velocity calculation
For measuring group velocities, we use the algorithm developed by Schimmel et al. (2017, but without their data resampling approach). Within a given velocity band, the algorithm localizes the amplitude maximum of a PCC stack in the time-frequency domain as a function of frequency f. The arrival time t(f) of the maximum can be transformed to the group velocity v g (f) by taking the distance d between the cross-correlated stations into account: The algorithm starts with determining the maximum of the lowest frequency and gradually advances to higher frequencies. For each frequency (except the first), four maxima are calculated, and the one that shows the smallest velocity jump to the preceding group velocity value is chosen. In some cases, no velocity value is determined, for example when the velocity jump is too large or the amplitude maximum is weakly developed (the latter is relative and not easily quantified by one fixed number).
For extracting group velocity values that can be associated with Rayleigh waves, we specify the velocity range of 2.5-5.5 km s −1 (cf. Fig. 3). In all PCC plots, this velocity range is indicated by a blue-shaded area (e.g. Fig. 2e). Group velocity values are illustrated Figure 5. Measurement of PCC stacks and group velocities for station pair RR40-RR50, two OBSs spaced by 729 km. PCC stacks for (a) the wide period band of 3-50 s, versus sub-bands of (b) 20-50 s, (c) 10-20 s and (d) 3-10 s. Blue-shaded areas bracket the arrival times of wave velocities between 2.5 and 5.5 km s −1 . For this velocity range, estimated group velocities are shown in (e) for the wide band, and in (f)-(h) for the three subbands. Yellow and blue colours indicate high and low amplitudes, respectively, and amplitude is normalized so that its maximum value (yellow) is identical at each frequency. Black dots mark the group velocity values selected by the algorithm of Schimmel et al. (2017), and black bars denote their 95 per cent confidence ranges. Red dots replacing black dots are group velocity values we subjectively accepted for tomography, based on the simple, smooth appearance of the sequence of black dot (see text for discussion). According to this criterion, no group velocity measurement in the 3-50 s period band would be acceptable at frequencies below ∼0.1 Hz, whereas subdivision into three narrower bands (20-50 s, 10-20 s, 3-10 s) allows confident group velocity estimation in all three bands. as amplitude spectra, which are vertically normalized (e.g. Fig. 2f). This means that each frequency column contains the maximum value of 1, which is represented by yellow colour; smaller amplitude values are shown in green and blue, with dark blue corresponding to 0. If the algorithm could retrieve a maximum, the associated group velocity value is represented by a black dot (e.g. Fig. 5).
Group velocity values whose amplitudes are within 95 per cent of the maximum amplitude are marked by dashes.
At this point, we want to emphasize the benefit of using several narrower frequency bands instead of one wide band (see Fig. 5). We calculate PCC stacks for RR40-RR50 for a wide band of 3-50 s (Fig. 5a), and for its three narrower sub-bands of 20-50, 10-20 and 3-10 s (Figs 5b-d), which we actually proceeded to use in our study. In each of the PCC stacks, clear Rayleigh-wave arrivals are observed, regardless of the width of the band. For each of the period bands, we calculate group velocity values between 2.5 and 5.5 km s −1 (blue-shaded areas in Figs 5a-d). The group velocity plots are shown in Figs 5(e)-(h), the colouring corresponds to the description given above. The scaling of the frequency axis of Figs 5(f)-(h) is adjusted such that it matches the frequency axis of Fig. 5(e). Group velocity values selected by the algorithm in the 3-10 s period band are very similar to the values selected in the 3-50 s band for the same frequencies. By contrast, no reliable group velocity measurement can be retrieved in the 3-50 s band for frequencies below ∼0.1 Hz, as the yellow ridges of high-amplitude region become oscillatory and/or multivalued. For a reliable result, we would expect the high amplitudes gradually approach to one maximum value for each frequency, forming a smooth group velocity curve as function of frequency. Such benign behaviour is indeed observed in the narrower subbands in Figs 5(f) and (g). Similar observations were made by Haned et al. (2016). We concluded that the three   Peterson (1993). Orange vertical lines at 3 and 50 s delimit the period range investigated here. At periods longer than ∼5 s, the INSU OBS is much quieter than the DEPAS OBS. This comparison between the two OBS types holds true generally and seems to be due to high self-noise on the vertical components of the Güralp CMG-40T sensors in the DEPAS seismometers (Stähler et al. 2016). Hence station pairs comprising at least one INSU station are characterized by PCC stacks with clear Rayleigh-wave arrivals and group velocity measurements of accordingly higher confidence. The best 100 group velocity curves used in this study are coloured dark blue. The orange curve denotes the average group velocities for the RHUM-RUM region, obtained by Mazzullo et al. (2017) from Rayleigh-wave tomography using earthquake sources. narrow sub-bands should be adopted to obtain more numerous and more robust group velocity measurements. We visually evaluated all 1119 group velocity plots of the kind shown in Fig. 6 in order to decide on the subsets of automatically picked values (black dots) that seemed robust and plausible enough to be considered for further processing. This value extraction is to some extent subjective, but the desired curve shape is a smooth curve that is not distorted by blue holes. An example of blue holes deforming the velocity curve can be seen in Fig. 5(g) for frequencies above ∼0.09 Hz. Moreover, the existence of only one clear maximum per frequency points towards a reliable measurement. Frequencies in the 3-10 s period band are often characterized by several maxima of similar amplitudes, i.e. by several yellow bands in the group velocity plots.

Group velocity results
We depict the manually selected group velocity values by red dots superimposed on the black dots (see Figs 2f, 5f-h, 6d and e, 6j and k, and 6p and q). The results of pairs RR34-RR50 and LAHA-RR52 demonstrate that reliable group velocity estimates can be derived from OBS seismograms for very large interstation distances of ∼2000 km.
We note that group velocity measurements on OBS-to-OBS correlations are particularly successful when at least one of the two stations is an INSU station. The reason for this becomes obvious when we compare the relative noise levels, quantified as probabilistic power spectral density plots (PPSD, McNamara & Buland 2004) of a DEPAS OBS and an INSU OBS (Fig. 7). The two sensors are situated only 150 km apart in the same abyssal plain at similar water depths, and hence presumably in similar ambient noise conditions. Yet DEPAS OBS RR26 features a dramatically higher noise level than INSU OBS RR28 for periods above ∼5 s, which is well within the period bounds of our study (3 and 50 s, orange lines in Fig. 7). The noise levels of RR26 even exceed those of the New High Noise Model (NHNM) derived by Peterson (1993) for terrestrial stations, whereas INSU OBS RR28 stays well below. Stähler et al. (2016) calculated PPSD plots for each RHUM-RUM OBS and showed that the noise level observations for RR26 and RR28 are generally valid for DEPAS/GEOMAR and INSU OBSs of the RHUM-RUM experiment. They concluded that the low power consumption of DEPAS seismometers (Güralp CMG-40T-OBS) probably comes at the cost of very high instrumental self-noise (above NHNM) of the vertical component. This problem was further investigated by Stähler et al. (2018), who recommended to use CMG-40T-OBS sensors only for short-period applications. The presence of this very high self-noise explains the difficulty for DEPAS-to-DEPAS PCCs to converge to the Green's function, which hampers group velocity measurements. Fig. 6 depicts that group velocity estimates of INSU-to-INSU OBS pairs are of comparably high quality as for land-to-OBS pairs (especially land-to-INSU) and for land-to-land correlations. Figs 6(g)-(l) illustrate a successful noise correlation between a land station and an OBS, an achievement that has been reported in only a few prior studies (e.g. Corela et al. 2017;Hable et al. 2018). All three station pairs presented in Fig. 6 lack a clear wave package signal in the 3-10 s period band, both in the Rayleigh-wave arrival window (shaded blue) and in the entire lag window. Considering the large interstation distances of 1000-2000 km, this observation is not surprising because the low-velocity (Scholte) wave trains of Fig. 3(c) would appear at later lapse times >1000 s.
From among the 1119 station pairs, we identify 628 station pairs that could contain group velocity values usable for tomography. These values are given in Fig. 8 by the light blue curves. The abrupt truncation of many curves at 0.1 Hz can be explained by the frequent absence of Rayleigh waves in the 3-10 s period band, which in most cases prevents reliable group velocity estimates. Most of the identified 628 station pairs provide low-to moderate-quality group velocity measurements. One reason is the high local noise level of OBSs (that may be due to bad coupling with the ground, currents, etc.) in comparison to land stations. The other limiting factor is the high self-noise of the German DEPAS/GEOMAR OBSs, which comprise 29 of 38 OBSs. The issue is most pronounced when correlating DEPAS/GEOMAR OBSs with each other, the large majority of OBS station pairs. In order to ensure a robust inversion result for this first long-range tomography of its kind, we opt for strict quality control and accept only the highest-quality group velocity curves for further processing. This selection yields 100 group velocity measurements (the round number is coincidental), represented by the dark blue curves in Fig. 8. All group velocity plots represented in the figures represent one of these 100 best station pairs. Fig. 9 presents group velocity maps derived from our 100 accepted measurements in four different periods (12, 14, 17 and 20 s). 85 out of the 100 best station pairs are either land-to-OBS or OBS-to-OBS correlations. At least one INSU OBS is part of 82 per cent of these 85 station pairs, clearly reflecting the superior data quality of INSU OBSs compared to DEPAS/GEOMAR OBSs for the purpose of group velocity calculations.

Tomographic inversion of the group velocity data
The scientific aim is to improve and complete the Rayleigh-wave tomography of Mazzullo et al. (2017), specifically by constraining 3-D crustal and lithospheric structures which were poorly resolved by the longer-period measurements on earthquake-generated seismograms. Mazzullo et al. (2017) used ∼300 earthquakes recorded by 130 seismic stations (mainly RHUM-RUM stations) to measure Rayleigh-wave group velocities in the period range of 16-250 s and phase velocities of 30-300 s. The sensitivity of these periods to the earth's upper ∼40 km is very limited. Our noise-derived group velocities between 3 and 50 s period are mainly sensitive to depths above ∼50 km. Hence the two data sets are complementary and are used in a joint tomographic inversion here.
First, we check if the ray coverage of our 100 retained, highestquality measurements is sufficient for tomography (as an alternative might be to relax our quality control standards). For this purpose, we perform tomographic resolution (checkerboard) tests, as shown in Fig. 10. We forward-predict group velocity curves through a hypothetic crust consisting of squares of alternating high and low velocity, superimposed on the 3-D reference model, which is described below. The length of the squared velocity anomalies is 500 km; the velocity anomalies range from −4 to +4 per cent. We use the same inversion parameters (correlation length 200 km, a priori error 0.1 km s −1 ) as for the subsequent inversion of real data. Fig. 10 shows the test input and the recovered, regionalized output model for a period of 16 s and the 92 wave paths accepted for this period. The checkerboard pattern is recovered well beneath the instrumented area. Recovery outside this area is not expected since noise correlations yield Green's functions and sensitivities between stations only. The good recovery of the checkerboard pattern validates our decision to admit only the very highest quality measurements (100 out of 1119), even though this means discarding over 90 per cent of unique wave paths.  Mazzullo et al. (2017); second column resulted from the joint inversion of the 100 best group velocity curves obtained by this study, with the data set of Mazzullo et al. (2017). Third column represents the difference between the two results (model of Mazzullo et al. (2017) subtracted from the joint inversion). Velocity variations range from -6 per cent (red) to +6 per cent (blue) for 20 and 30 km depth and from -4 per cent (red) to +4 per cent (blue) for 40-80 km depth. The variations refer to a depth-dependent average velocity value given at the bottom right of each panel. The main tectonic structures of the region are indicated in the bathymetry map in panel (p). Plate boundaries (spreading ridges) are indicated by green lines. The joint inversion of both data sets follows the method described by Mazzullo et al. (2017). A locally modified (i.e. slightly smoothed) version of CRUST1.0 (Laske et al. 2013) is used as a priori 3-D crustal model, which is underlain by the spherically symmetric PREM reference model for the mantle (Dziewonski & Anderson 1981), albeit with a modified (smoothed) 220 km discontinuity. CRUST1.0 can be interactively visualized in the SubMachine tomography web portal (Hosseini et al. 2018). The calculation of the S-wave velocity model is based on the a priori model and the weighted sum of B-spline basis functions: W m N m, 2 (z) is the mth non-uniform quadratic B-spline basis function (De Boor 1978), M is the number of B-spline basis functions, and W m is the weighting coefficient. V 0 s (z) is the a priori S-wave velocity reference model.
The transdimensional inversion is composed of two nested loops. The inner loop calculates the optimum model weighting coefficients W m for a given spline basis N m, 2 , by minimizing the data misfit function (χ 2 d ) between measured and modelled velocities using the simulated annealing optimization algorithm (Press 2007). The outer loop determines the optimum spline basis (the shapes and the number of splines M). It uses the golden section search method (Press 2007) to minimize the expression (χ 2 d + χ 2 m )/2 as a function of M, where χ 2 d is the outcome of the inner-loop minimization and χ 2 m is the model variance quantity defined by eq. (B3) in Haned et al. (2016). The procedure of the transdimensional inversion yields the optimal number of splines and their shapes, which reflects the best compromise between data fit and model smoothness. A more detailed method description of the inversion is given by Haned et al. (2016) and Mazzullo et al. (2017). In practice, the inversion is performed in two steps. First, we use a correlation length of 800 km to regionalize our noise data set and the data set of Mazzullo et al. (2017). The inversion of the regionalized data is performed according to eq. (6) with the smoothed CRUST1.0 and smoothed PREM as a priori model. The obtained low-resolution model together with the original CRUST1.0 serves then as starting model for the mantle and the crust, respectively, for the second inversion, that uses a correlation length of 200 km to retrieve a model of higher resolution (presented in Section 5.2). This twostep inversion procedure is chosen to resolve large-scale velocity structures at deeper layers (from long-period data) as well as smallscale structures at shallower depths (from short-period data).

S V -wave velocity model
We obtain a 3-D S V -wave velocity model by inversion of our noisederived, regionalized group velocity data jointly with the phase and group velocities of Mazzullo et al. (2017) according to eq. (6) and using the 3-D reference model described earlier. Fig. 11 compares the tomography results of Mazzullo et al. (2017) (left column) with those of our joint inversion (middle column) for depths of 20, 30, 40, 60 and 80 km. The last column of Fig. 11 highlights the differences between the two models by plotting the velocity variations of Mazzullo et al. (2017) subtracted from our joint inversion model. Each panel shows S V -wave velocity variation (in per cent) from an average, absolute velocity value at this depth, which is given at the bottom right of each panel. For easier comparison, this average velocity value is chosen to be identical for the two tomography models Figure 13. Five panels show the same geographical section through La Réunion and Mauritius (location is given by the red line R1-R2 on the maps) of five different velocity models, from the surface to 180 km depth: (a) our joint inversion of noise and seismicity data; (b) the reference or starting model for the inversion of (a), a laterally smoothed version of CRUST1.0 (Laske et al. 2013) underlain by PREM (Dziewonski & Anderson 1981); (c) the difference "(a) minus (b)", reflecting the information contained in the noise and seismicity data; (d) the result of Mazzullo et al. (2017), who used the same reference model; and (e) the difference between our joint inversion and the model of Mazzullo et al. (2017), i.e. "(a) minus (d)", which reflects the information added by our noise data set. All cross sections are plotted in the same colour scale against the same, constant reference velocity of 4.31 km s −1 . Topography and bathymetry along the profiles are given by the red curve above the cross sections. at a given depth. Red and yellow colour shades indicate that seismic velocity is estimated to be lower than the average velocity at this depth, whereas blue shades indicate higher-than-average velocities. Fig. 11(p) shows the bathymetry of the studied region together with the main tectonic structures.
The two tomography models differ mainly at crustal and lithospheric depths (∼20-30 km, Figs 11a-f), which are constrained primarily by our new noise data. The panels in Fig. 11 show the following, major differences: (i) Slower crustal velocities at spreading ridges. Yellow and red shades beneath the Central Indian (CIR) and Southwest Indian Ridges (SWIR) in Fig. 11(c) indicate that the noise-derived data sense slower structure at 20 km depth than the earthquake-derived data of Mazzullo et al. (2017), which have good sampling coverage beneath the ridges, but not much sensitivity to these shallow depths. Our noise data set has very good sensitivity since these two midocean ridges were actually instrumented by RHUM-RUM OBSs. As regions of mantle upwelling, decompression melting, and very thin lithosphere, mid-ocean ridges generally appear as localized bands of slow seismic anomalies in seismic tomography models. This is also true for the models of Mazzullo et al. (2017) at depths >30 km, where a velocity gradient perpendicularly away from a ridge is clearly evident (red to yellow to blue). Such a gradient is weak or absent at 20 km in the model of Mazzullo et al. (2017) (Fig. 11a), but is evident in the joint model (Fig. 11b) and in the difference plot (Fig. 11c). Note that below 40 km, the differences between the two tomographies are marginal as expected, given that our noise data set has negligible sensitivity to those depths. Below 40 km, the slowness of the spreading ridges is rendered by the earthquake data of Mazzullo et al. (2017); at shallower depths, the same is accomplished (only) by the noise-derived data. The significant information gain for depths <40 km is reflected in Fig. 12, where dispersion curves and their corresponding S V -wave velocity profiles are compared for a location on the CIR, east of Rodrigues (20.5 • S, 68.5 • E). Our noise-derived group velocities (red in Fig. 12b) cover the period range <30 s, which is not covered by the earthquakederived data of Mazzullo et al. (2017) (blue and cyan) and which is sensitive to shallowest structure. The vertical velocity profile inverted jointly from both data sets requires significantly slower S V -wave velocity in the upper 35 km (black in Fig. 12a) compared to the model of Mazzullo et al. (2017) (orange), and moderately faster velocity between 35 and 100 km. This indication of very slow structure shallowly beneath a mid-ocean ridge segment is plausible, especially beneath this plume-influenced CIR segment.
(ii) Slower Madagascar Plateau. South of Madagascar at latitudes of ∼27 • S and 30 • -35 • S, two velocity anomalies that were already intensely slow in the inversion of Mazzullo et al. (2017) are rendered even slower by the joint inversion at 20 km. These anomalies coincide with the thickened oceanic crust of the Madagascar Plateau (see also Fig. 1a), which is attributed to flood basalts from the plume head of the Marion hotspot (Storey et al. 1995). The crustal structure of Madagascar itself is not modified by our noise data, which basically do not sample beneath the continent, see Fig. 10. Madagascar is underlain by truly continental, 20-35km-thick crust (Rindraharisaona et al. 2017), which is built into the prior crustal model as very slow structure.
(iii) Slower Mascarene Plateau. The Mascarene Plateau extends from Mauritius to the Seychelles and is characterized by very slow velocity anomalies at 20 km depth in Mazzullo et al. (2017) (Fig. 11a). It is rendered even slightly slower by our joint inversion (Figs 11b and c). The Mascarene Plateau is thought to represent part of the Réunion hotspot track, that is its oceanic crust would have been magmatically thickened by mantle plume activity.
(iv) Slow Rodrigues Ridge. Marked by a thickened, east-west striking ridge between Mauritius and the CIR, Rodrigues Ridge showed no slow-velocity anomaly at 20 and 30 km depth in the model of Mazzullo et al. (2017), but it does in the joint inversion (Figs 11b and e). Rodrigues Ridge has long been hypothesized to represent a manifestation of hotspot-ridge interaction: the surface record of a leaky asthenospheric flow channel that transports plume heat and material to the spreading ridge (Morgan 1978). Shallow slow anomalies would be expected here from recent or ongoing volcanism and from crustal thickening (Morgan 1978;Dyment et al. 2007).
(v) Faster old oceanic lithosphere. The high-velocity anomaly that extends beneath much of the Mascarene ocean basin between Madagascar and La Réunion at 20 and 30 km depth is rendered even faster by the joint tomography compared to Mazzullo et al. (2017). This region hosts some of the oldest (hence fastest) lithosphere in the Indian Ocean, and its seismic signature seems to have been underestimated by Mazzullo et al. (2017). More generally, the joint tomography indicates faster oceanic lithosphere in almost all sampled regions that are not sites of recent volcanism or ancient crustal thickening.
(vi) Slower crust and faster mantle beneath the hotspot islands of La Réunion and Mauritius. At 20 km, these two islands show up as circular low-velocity anomalies of ∼100 km diameter, immersed in the seismically fast surroundings of Mascarene basin crust (Fig. 11b). These details are absent in the model of Mazzullo et al. (2017, Fig. 11a), and Fig. 11(c) renders this difference very crisply. At 30 km, the situation is inverted in that the islands now appear as very fast dots in a moderately fast environment, which again is particularly clear in the difference plot Fig. 11(f). By comparison the model of Mazzullo et al. (2017) makes a blurred impression. At 80 km, the localized fast anomaly beneath La Réunion give way to a spatially broader, low-velocity anomaly (Fig. 11n), which can probably be attributed to high-temperature upwelling through the Réunion mantle plume (Mazzullo et al. 2017).

La Réunion and Mauritius
The situation of the hotspot islands is clarified by cross sections through La Réunion and Mauritius in Fig. 13. The joint tomography model in Fig. 13(a) shows the crust as shallow, very slow layer (red), underlain by very fast lithosphere (dark blue), and slow asthenosphere (yellow). The asthenosphere is weak or absent beneath the old lithosphere of the Mascarene Basin and beneath Madagascar; further east where it is well developed, asthenosphere is underlain by faster mesosphere from about 160 km down. Compared to our simple starting model in Fig. 13(b), the noise and earthquake data introduce important changes, most obviously the presence of an asthenosphere, a much faster lithosphere, and significant undulations on the Moho and on the lithosphereasthenosphere boundary. Differences to the result of Mazzullo et al. (2017) (Fig. 13e) are limited to the upper 50 km, which reflects the relatively shallow penetration of our noise data. However, these shallow differences are substantial and are concentrated beneath the La Réunion and Mauritius hotspots.
Primary structural features contributed by the new noise data, which are absent in both the reference model and the model of Mazzullo et al. (2017), are downward undulations of Moho and lithosphere-asthenosphere boundary beneath La Réunion and Mauritius, anticorrelated with the topography of the two volcano islands rising from the deep Mascarene Basin (red profile line in Fig. 13). The Moho is depressed by 5 km beneath La Réunion and by 10 km beneath Mauritius. In the difference plot of Fig. 13(c), this locally thickened crust shows up as two very slow (red) lenses beneath the hotspots, where mantle in the reference model was replaced by crust. The same two lenses in Fig. 13(e) characterize the difference between our result and that of Mazzullo et al. (2017).
This tomographically observed crustal thickening beneath the active hotspot of La Réunion and its predecessor on the hotspot track (Mauritius) is broadly consistent with receiver functions indicating Moho depths of ∼12 km (Fontaine et al. 2015) for La Réunion (using one permanent station), and ∼10-20 km (Fontaine et al. 2015;Singh et al. 2016) for Mauritius (using a permanent station and a network of stations across the island). We concur with these authors that crustal thickening very likely reflects the magmatic underplating due to hotspot activity.
Relative to the common starting model (Fig. 13b), both our joint model (Fig. 13a) and the model of Mazzullo et al. (2017, Fig . 13d) indicate a much faster lithosphere beneath the Mascarene Basin, which includes the La Réunion/Mauritius area. However, the joint model is the only one to resolve localized thickening of the lithosphere under the two hotspots, which spatially mirrors the Moho thickening. If lithosphere southwest of the two islands in Fig. 13(a) may be considered as 'background' Mascarene Basin lithosphere, then the bottom of this layer is depressed by about 10 km beneath La Réunion and also beneath Mauritius. This lithospheric thickening is presumably the fast seismic signature of depleted and dehydrated mantle from which the hotspots sourced and are sourcing the melts for the observed crustal thickening above.
Just northeast of Mauritius, around a longitude of 58.5 • E in Fig. 13(a), the lithosphere in our joint model thins abruptly, accompanied by a transition from seismically very fast (dark blue) to only moderately fast (light blue) velocities. At the surface, this step change coincides with the transition from older seafloor produced by the palaeo-spreading ridge of the Mascarene Basin to significantly younger lithosphere produced by the (still-active) CIR. This contrast in lithospheric thickness is also present in the model of Mazzullo et al. (2017) but is more pronounced in our model, and more sharply localized to 58.5 • . The imaged structures are consistent with receiver function results by Fontaine et al. (2015), who found much thicker lithosphere beneath La Réunion (70 km) and Mauritius (50 km) than beneath Rodrigues Island (25 km).
The asthenosphere is slowest beneath the hotspot area, presumably reflecting ongoing, hot plume upwelling. Even though these asthenospheric depths must be mainly constrained by the earthquake data of Mazzullo et al. (2017), the joint inversion shows the slowest velocities more narrowly focused beneath La Réunion than the model of Mazzullo et al. (2017), which is presumably a benefit of our model's proper accounting for the localized lithospheric thickening overhead.

Rodrigues ridge
When Morgan (1978) first hypothesized the aseismic Rodrigues Ridge to be a manifestation of hotspot-ridge interaction, he called it 'a second type of hotspot island', highlighting that the same process of hot, asthenospheric melting would have thickened the oceanic crust around both types of volcanic edifices. Since our joint tomography sharply resolves crustal and lithospheric thickening beneath La Réunion and Mauritius, the same finding might be anticipated for Rodrigues Ridge. This is however not the case, or to a much lesser extent. Fig. 14 cuts perpendicularly through Rodrigues Ridge. An arrow in the bathymetry maps points to the western tip of this narrow, east-west striking ridge, which runs from near Mauritius almost to the CIR. Our joint model in Fig. 14(a) shows a localized downward undulation of the lithosphere beneath the topographic high of Rodrigues Ridge, analogous to the observed lithospheric thickening beneath La Réunion and Mauritius in Fig. 13(a). But in contrast to those hotspots, the lithosphere is only weakly expressed beneath Rodrigues and the crust does not appear thickened. Hence there is only weak evidence for lithospheric depletion under Rodrigues Ridge, and even less for crustal underplating. The difference between our joint inversion and that of Mazzullo et al. (2017, Fig. 14d) is that a lithospheric anomaly is localized under Rodrigues in the first place, and that the asthenosphere rises to shallower levels in the wider Rodrigues area, which is predicted by the hypothesis of Morgan (1978). This rise of asthenosphere is also seen in Fig. 14(c), which plots the difference between our model and the almost horizontally layered starting model of Fig. 14(b). The noise data constrain a less strongly expressed crust than the starting model (light blue 'correction' layer in Fig. 14c) and a pronounced asthenosphere that rises to its shallowest levels beneath Rodrigues Ridge. Fig. 15 shows cross sections along the length of Rodrigues Ridge. Our joint model (Fig. 15a) and the model of Mazzullo et al. (2017, Fig. 15d) agree on abrupt lithospheric thinning and weakening just east of Mauritius, and on gradual eastward thinning of the lithosphere towards the CIR. While this thinning is monotonous in the model of Mazzullo et al. (2017), our model shows some topography on the lithosphere-asthenosphere boundary, which seems anticorrelated with surface topography (see red profile line). Hence it may indicate lithospheric depletion in the places where surface topography would suggest thickened crust. Our model, however, does not image a corresponding, thickened crustal root. It merely retains the gradual eastward thinning of crust that was already present in the starting model (Fig. 15b). Hence there is no clear evidence of crustal thickening (nor thinning) under Rodrigues Ridge, and only weak evidence for lithospheric depletion-in contrast to clear signals on both counts beneath La Réunion and Mauritius.

D I S C U S S I O N
We present the first crustal and lithospheric tomography of S V -wave velocity of the western Indian Ocean between Madagascar in the west, and the Central and the Southwest Indian ridges in the east and south, respectively; a region that is roughly centred on the hotspot of La Réunion. A new data set of 100 group velocity curves between 3 and 50 s was inverted jointly with the Rayleigh-wave phase and group measurements obtained by Mazzullo et al. (2017) from earthquake sources and the same RHUM-RUM stations used here.
Interstation distances of up to 2000 km far exceed those measured and inverted in previous noise correlation studies in the oceans.
From our total data set of 1119 station pairs, we identify 628 station pairs that might contain group velocity measurements of suitably high confidence to be used in tomography. In the end, we decide to use only 100 measurements corresponding to our best group velocity curves to ensure the most robust tomography possible. The reason for this limited number is the high self-noise of the DEPAS OBS type (Güralp CMG-40T-OBS sensors), which hampers reliable group velocity estimation especially on DEPASto-DEPAS station pairs, which amount to 406 out of 1119 station pairs. INSU OBSs are not affected by such high self-noise and yield results of comparable quality as land stations.
Our inversion of noise-derived group velocities jointly with the longer-period data of Mazzullo et al. (2017) detects pronounced crustal and lithospheric anomalies compared to the reference model (CRUST1.0 plus PREM), most of which reflect known tectonic structures (Section 5.2). Compared to the earthquake-generated data of Mazzullo et al. (2017), our addition of noise-derived group velocities act to render crustal and lithospheric velocity anomalies more pronounced, both in the fast and the slow directions. We find localised crustal and lithospheric thickening beneath the two hotspot-generated islands of La Réunion and Mauritius, which had not been picked up by Mazzullo et al. (2017), but is a priori expected, and independently confirmed by receiver function studies (Fontaine et al. 2015;Singh et al. 2016). In future work, group velocities of the 628 station pairs identified as potentially usable could be used to investigate whether they can further refine the model of Fig. 11.
There is not much prior imaging work to compare to. Ma & Dalton (2017) investigated the lithosphere of Africa and the Indian Ocean with combined earthquake and ambient noise tomography, but limited to the sparse, permanent land and island stations and without the benefit of data from RHUM-RUM stations. Hence a reasonable comparison must be limited to large-scale anomalies in light. They derived phase velocity maps at a period of 30 s, which might allow for a rough comparison with our tomography models at 20-40 km depth, but a detailed comparison of phase velocity maps and a 3-D S-wave velocity model is not straightforward. In agreement with our results, Ma & Dalton (2017) found a slowvelocity anomaly beneath the CIR and a high-velocity signature between Madagascar and La Réunion (Mascarene Basin).
An interesting feature of Figs 11 and 13(a) is the high-velocity anomaly at ∼30-70 km beneath La Réunion. Similar velocity signatures have been reported beneath at least two other hotspot islands. Villagómez et al. (2007) discovered a high-velocity lid beneath the Galapagos archipelago at a depth of 40-70 km, above a slow-velocity plume signature in the asthenosphere. Schlömer et al. (2017) detected a circular high-velocity anomaly at a depth of 50 km beneath the Tristan da Cunha archipelago. The strikingly consistent depths of these observations suggest an identical mechanism for the high-velocity anomaly under La Réunion. Villagómez et al. (2007) suggested that the hot mantle plume caused a remelting of lithosphere, accompanied by depletion and dehydration, which would appear as high-velocity anomaly. Schlömer et al. (2017) speculated that the high-velocity body could be highly depleted plume material that has accreted to the lower lithosphere. The high-velocity anomaly beneath La Réunion had already been detected by Mazzullo et al. (2017), but due to the sparse data coverage in their model at shallow layers, it appears not as localised anomaly, but was blurred by the fast-velocity signature of the cold oceanic lithosphere. This reflects the important information gain due to the joint inversion.

C O N C L U S I O N S
Group velocities between 3 and 50 s period are calculated from OBS-to-OBS, land-to-OBS, and land-to-land noise crosscorrelations. The 100 very best measurements are inverted for 3-D S-wave velocity structure of crust and lithosphere, jointly with the earthquake-generated data set (periods >16 s) of Mazzullo et al. (2017). Our relatively shorter-period data are ideal for adding constraints on the the shallowest (i.e. crustal and lithospheric) depths of the S V -wave velocity model of Mazzullo et al. (2017), who used phase and group velocities with periods of 30-300 and 16-250 s, respectively. The joint inversion provides the first tomographic S Vwave velocity image of the crust and the uppermost mantle of the western Indian Ocean between Madagascar and the three spreading ridges, centred on the hotspot island of La Réunion. The tomography model is made available as electronic supplement to this article.
We demonstrate that high-quality group velocity estimates can be obtained from OBS-to-OBS noise correlations with very large interstation distances of ∼2000 km, while prior studies had reported successful OBS group velocity estimates from distances of at most a few hundred kilometres (e.g. Yao et al. 2011;Zha et al. 2014;Ball et al. 2016;Corela et al. 2017;Ryberg et al. 2017). Our results mean that the noise-correlation method is applicable even for relatively sparsely instrumented areas, which includes most oceanic regions. In addition, we show that land-to-OBS correlations provide highquality group velocity measurements, even though land stations and OBSs sample in different crustal conditions (e.g. Corela et al. 2017;Hable et al. 2018).
We newly image several slow-velocity signatures around 20-30 km depth, not visible in the model of Mazzullo et al. (2017). These signatures can clearly be associated with known tectonic structures of thickened crust and/or volcanic activity (La Réunion, Mauritius, Rodrigues Ridge, Central Indian Ridge and Madagascar Plateau).
The islands of La Réunion and Mauritius are characterized by thickened, slow crust probably indicating magmatic underplating due to hotspot activity. These increased crustal thicknesses are accompanied by undulations of the fast lithosphere extending 10 km deeper than the surrounding area down to a depth of ∼70 km. These high-velocity signatures beneath La Réunion and Mauritius are consistent with tomographic results for the same depths beneath some other hotspot islands (Galapagos, Tristan da Cunha) and can presumably be attributed to highly depleted and dehydrated material, either of remelted lithosphere or of underplated plume material. In contrast to La Réunion and Mauritius, crustal and lithospheric thickening beneath the Rodrigues Ridge can not be observed or is much less pronounced.

A C K N O W L E D G E M E N T S
We thank Céline Davy, Eric Delcher, Fabrice Fontaine and John-Robert Scholz for their contributions to installing and maintaining the temporary RHUM-RUM island seismometers; and Valérie Ferrazzini for providing seismological data from the monitoring network of Observatoire Volcanologique du Piton de la Fournaise on La Réunion island. We acknowledge the GEOSCOPE network (https://doi.org/10.18715/GEOSCOPE.G) and Mauritius Meteorological Services for installing and maintaining permanent stations RER and MRIV, respectively. We thank the participants and crew of cruises R/V Marion Dufresne MD192 and R/V Meteor M101.
This study arose as part of the RHUM-RUM project (www.rh um-rum.net), funded by Deutsche Forschungsgemeinschaft in Germany (grants SI1538/2-1 and SI1538/4-1) and by Agence Nationale de la Recherche in France (project ANR-11-BS56-0013). K.S. received additional funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme