WINTERC-G: mapping the upper mantle thermochemical heterogeneity from coupled geophysical–petrological inversion of seismic waveforms, heat flow, surface elevation and gravity satellite data

S U M M A R Y We present a new global thermochemical model of the lithosphere and underlying upper mantle constrained by state of the art seismic waveform inversion, satellite gravity (geoid and gravity anomalies and gradiometric measurements from ESA’s GOCE mission), surface elevation and heat flow data: WINTERC-G. The model is based upon an integrated geophysical–petrological approach where seismic velocities and density in the mantle are computed within a thermodynamically self-consistent framework, allowing for a direct parametrization in terms of the temperature and composition variables. The complementary sensitivities of the data sets allow us to constrain the geometry of the lithosphere–asthenosphere boundary, to separate thermal and compositional anomalies in the mantle, and to obtain a proxy for dynamic surface topography. At long spatial wavelengths, our model is generally consistent with previous seismic (or seismically derived) global models and earlier integrated studies incorporating surface wave data at lower lateral resolution. At finer scales, the temperature, composition and density distributions in WINTERC-G offer a new state of the art image at a high resolution globally (225 km average interknot spacing). Our model shows that the deepest lithosphere– asthenosphere boundary is associated with cratons and, also, some tectonically active areas (Andes, Persian Gulf). Among cratons we identify considerable differences in temperature and composition. The North American and Siberian Cratons are thick (>260 km) and compositionally refractory, whereas the Sino-Korean, Aldan and Tanzanian Cratons have a thinner, fertile lithosphere, similar to younger continental lithosphere elsewhere. WINTERC-G shows progressive thickening of oceanic lithosphere with age, but with significant regional differences: the lithospheric mantle beneath the Atlantic and Indian Oceans is, on average, colder, more fertile and denser than that beneath the Pacific Ocean. Our results suggest that the composition, temperature and density of the oceanic mantle lithosphere are related to the spreading rate for the rates up to 50–60 mm yr–1: the lower spreading rate, the higher the mantle fertility and density, and the lower the temperature. At greater spreading rates, the relationship disappears. The 1-D radial average of WINTERC-G displays a mantle geothermal gradient of 0.55–0.6 K km–1 and a potential temperature of 1300–1320 ◦C for depths >200 km. At the top of the mantle transition zone the amplitude of the maximum lateral temperature variations (cratons versus hotspots) is about 120 K. The isostatic residual topography values, a proxy for dynamic topography, are large (>1 km) mostly in active subduction settings. The residual isostatic bathymetry from WINTERC-G is remarkably similar to the pattern independently determined based on oceanic crustal data compilations. The amplitude of the continental residual topography is relatively large and positive (>600 m) in the East European Craton, Greenland, and the Andes and Himalayas. By contrast, central Asia, most of Antarctica, southern South America and, to a lesser extent, central Africa are characterized by negative residual topography values (>–400 m). Our results show that a substantial part of the topography signal


I N T RO D U C T I O N
Geochemical and petrological laboratory studies and mineral physics allow us to infer the mechanical, chemical and electrical properties of rock aggregates at the high temperature and pressure conditions that occur within the Earth (e.g. Xu et al. 2008;Connolly 2009;Stixrude & Lithgow-Bertelloni 2011;Kuskov et al. 2014). Direct rock sampling of the Earth provides crucial constraints on the crust and mantle composition and temperature. Unfortunately, such sampling is limited to approximately the upper 12 km of the crust, exhumed deep sections and mantle xenoliths, sparsely distributed over the surface of the Earth. It thus needs to be complemented by indirect geophysical observations collected at and above the surface. Established methods of seismic tomography, topography, and gravity data analysis constrain distributions of seismic velocity and density at depth, all depending on temperature and composition of subsurface rocks. Thermodynamic and petrological links between seismic velocities, density, electrical conductivity, viscosity, melt, water, temperature, pressure and composition within the Earth can be modelled accurately using methods of computational petrology and mineral physics experiments. The growth of very large terrestrial and satellite geophysical data sets over the last years, together with the advancement of petrological and geophysical modelling techniques, now present an opportunity for global, thermochemical 3-D imaging of the lithosphere and underlying upper mantle with unprecedented resolution. However, modelling and interpretation of multiple data sets provide a multifaceted image of the true thermochemical structure of the Earth that needs to be appropriately and consistently integrated. A simple combination of gravity, geodynamics, petrological and seismic models alone is insufficient due to the non-uniqueness and different sensitivities of these models, and the internal consistency relationships that must connect all the intermediate parameters describing the Earth.
The main goal of the work presented here is to combine in an integrative manner global seismic data, terrestrial and satellite gravity data, computational petrology and mineral physics to image the thermochemical heterogeneities in the lithosphere and underlying upper mantle at a global scale. We present WINTERC-G, a Waveform tomography and Gravity (geoid and gravity anomalies and gradiometric measurements from ESA's GOCE mission) INversion for the TEmpeRature and Composition of the lithosphere and upper mantle at global scale. In a first step, we invert surface-wave, Rayleigh and Love dispersion curves from a high resolution global data set measured using waveform inversion, along with surface heat flow and elevation (isostasy), using a point-wise, non-linear, gradient-search inversion over a triangular grid with an average 225-km-lateral interknot spacing. In a second step we use a fully parallelized spherical harmonic formalism to invert satellite gravity field data in order to refine the initial crustal density and mantle composition distributions from the step 1. WINTERC-G is based upon an integrated geophysical-petrological approach where all relevant mantle rock physical properties modelled (seismic velocities and density) are computed within a thermodynamically self-consistent framework allowing for a direct parametrization in terms of the temperature and composition of the lithosphere-upper mantle.

S TAT E O F T H E A RT
The lithosphere is the rigid outermost layer of the Earth and the fundamental mechanical unit of the plate tectonics theory (e.g. Turcotte & Schubert 2002). Yet, there is no general agreement as to how to define the lower limit of the lithosphere globally (e.g. Eaton et al. 2009;Rychert et al. 2010;Vinnik et al. 2016). The various physical parameters/proxies used to define it lead, in some cases, to different estimates of its thickness around the world. Furthermore, the base of the lithosphere, the so called lithosphere-asthenosphere boundary (LAB), is commonly thought of as a first-order boundary or structural discontinuity (at least mechanically, by definition) although there are no strong a priori physical arguments to consider it a global sharp boundary according to the various physical properties commonly used to define it (e.g. seismic velocities, density, electrical conductivity, etc.). Over the last few years, significant scientific efforts have been made to characterize the thermal and compositional structure of the Earth's lithosphere and underlying mantle through geophysical, geochemical and geological data interpretation, mineral physics experiments and geodynamic numerical modelling at both local and global scales. The thermal structure of oceanic lithosphere is relatively simple, compared to continental lithosphere, although the thermal mechanism responsible for the oceanic lithospheric flattening at ages >80 Ma and the interaction with hot plumes are still unclear. The global chemistry of mid oceanic ridge basalts has been interpreted either in terms of large scale chemical (e.g. Niu & O'Hara 2008) or temperature (e.g. Dalton et al. 2014) upper mantle variations. In continental areas, observations from lithospheric mantle xenoliths and peridotitic massifs show that cratonic terranes are, on average, more refractory than younger and active terrains (e.g. Canil 2004;Griffin et al. 2009) and vertically stratified (e.g. Carlson et al. 2005). However, geophysical studies combining seismic, topography and geoid data suggest that lithospheric compositions are, on average, more fertile than reported in petrological databases, especially in thick cratons (e.g. Cammarano et al. 2011). Refractory mantle is depleted in incompatible elements that tend to be removed by melting processes (e.g. Al 2 O 3 , CaO) and enriched in elements that remain in the solid residue (MgO). The situation is reversed in fertile mantle and this affects the abundance and composition of the stable mineral phases (for a given temperature and pressure) as well as the bulk density and seismic velocities. Refractory upper mantle, for example, is relatively rich in olivine and contains little or no clinopyroxene resulting in lower bulk density values compared to fertile mantle under similar temperature and pressure conditions.
Seismic tomography provides a global 3-D image of the Earth's interior as maps of seismic velocity anomalies. Travel times of body (shear and compressional) waves, surface wave speeds, and sometimes normal modes are commonly inverted for the seismic velocity structure of the mantle (e.g. Romanowicz 2003;Rawlinson et al. 2010). Waveform tomography is a method that extracts detailed and accurate information on the Earth's seismic velocity structure from waveforms of both surface and body waves, presently from many hundreds of thousands of seismograms. While computational (spectral-element) full waveform propagation modelling is providing increasing precision and resolution, albeit at a high computational cost (e.g. Komatitsch & Vilotte 1998;Lekic & Romanowicz 2011;Zhu et al. 2012;Fichtner et al. 2013;Rickers et al. 2013;French & Romanowicz 2014;Capdeville & Métivier 2018), asymptotic approaches allow modelling very large data sets and reducing the impact of errors (e.g. event mislocations) by exploiting the data redundancy; they provide increasingly fine lateral resolution (≈300-150 km) globally (e.g. Schaeffer & Lebedev 2013) and at continental scales (e.g. Celli et al. 2000a,b;Fishwick 2010;Meier et al. 2016;Lebedev et al. 2018).
Tomography models are routinely used to infer mantle temperatures in virtue of the well-known dependence of elastic moduli on the mantle geotherm (e.g. Goes & van der Lee 2002;Shapiro & Ritzwoller 2004;McKenzie et al. 2005;Priestley et al. 2006;Priestley & McKenzie 2013;Simmons et al. 2010;Cammarano et al. 2011;Khan et al. 2011;Steinberger 2016;Cammarano & Guerri 2017). A commonly taken approach is to transform seismic velocity anomalies into temperatures, accounting for possible compositional effects in a more or less elaborate manner (e.g. Priestley & McKenzie 2013;Cammarano & Guerri 2017;Steinberger & Becker 2018). However, differences among alternative seismic models translate into temperature differences of 200-600 K,typically larger than chemical composition uncertainties. Seismic data are also sensitive to the density distribution within the Earth and, even though such dependence is generally not weak (e.g. Dahlen & Tromp 1998;Blom et al. 2017), models based on seismic tomography observations usually do not account explicitly for it as a model parameter.
Gravity anomalies (Bouguer, free air) are often used to constrain crustal geometry and density (e.g. Ebbing et al. 2006). Geoid anomalies are affected by mass/density anomalies over a wider range of depths than gravity anomalies (e.g. Bowin 2000;Ricard et al. 2006). Wavelength filtering of the geoid has been applied to study the Earth at different scales, from the crust (e.g. Sandwell & McKenzie 1989;Doin & Fleitout 2000) to the deep mantle (e.g. Hager 1984;Ricard et al. 1984;Forte & Mitrovica 2001). However, splitting the total gravity field via stripping off the effect of density layers is not straightforward, in particular due to: (i) the poor knowledge of the crustal structure both in terms of geometry and petrology (i.e. density) and (ii) the complexity of the Earth's gravity signal on the global scale, with contributions of similar amplitude but different frequency content from the crust, upper and lower mantle (e.g. Sebera et al. 2017). Gravity field measurements have proven to be helpful in determining the Earth's thermochemical field in virtue of density's relatively stronger dependence on rock composition compared to seismic velocities. Consequently, a number of previous studies have used seismic tomography and gravity data (or models based upon them) to infer the upper mantle structure (e.g. Forte & Perry 2000;Goes et al. 2000;Deschamps et al. 2001;Perry et al. 2003;Trampert et al. 2004;Cammarano & Romanowicz 2007;Simmons et al. 2010;Cammarano et al. 2011;Mosca et al. 2012, Kaban et al. 2015Afonso et al. 2019). Such studies generally consider either simplified empirical relationships between shear wave velocity (Vs) and density anomalies (scaling factors) that neglect thermodynamic self-consistency (link of seismic velocities and density of mantle rocks-through temperature, pressure and bulk composition), or a simplified a priori regionalization of the compositional domains (i.e. oceans, continental lithosphere of different age) that can bias the results. Another common limitation arises from the large uncertainty introduced in gravity field and surface elevation (isostasy) modelling by assuming sparsely constrained crustal models based on controlled-source seismic data (e.g. CRUST1.0 and earlier versions, Laske et al. 2013) specially in poorly sampled regions like Africa and south America (e.g. Szwillus et al. 2019).
Satellite gravity data are a unique source of information on the density structure of the Earth due to its global and relatively uniform coverage (lateral resolution of 80-100 km currently), which complements terrestrial gravimetric measurements. The data accumulated over the decades of satellite missions have constrained several spherical harmonic models of the Earth's gravity field with different resolutions (cf. http://icgem.gfz-potsdam.de/home) that are regularly used for modelling purposes at global and regional scales.
The GOCE satellite mission, launched in 2009 by the European Space Agency (ESA), included, for the first time, an on-board threeaxis gradiometer able to measure the Earth's gravity gradients at the satellite height (e.g. Pail et al. 2011). This new type of gradiometric satellite measurement is more sensitive to spatial structure and directional properties of the Earth's density distribution than classical observations on gravitational intensity (gravity and geoid anomalies). Pioneering studies have showed very promising results in imaging the Earth's crustal (e.g. Reguzzoni et al. 2013) and mantle (Panet et al. 2014;Fullea et al. 2015; density structure using this new type of data, although its full potential is yet to be fully exploited. Gravity-related observables sense subsurface mass anomalies but their inversion alone for the density distribution within the Earth is an ill-posed and highly non-unique problem that requires regularization and smoothing (a priori assumptions), often requiring additional and independent constraints. Furthermore, crustal and mantle density models obtained from conversion of tomography velocity models usually fail to explain the observed gravity field data, predicting larger amplitudes than expected (e.g. Cammarano et al. 2011). This is partially due to the limitations of density-velocity conversions (inaccurate/inconsistent petrological and/or thermal modelling) or to the lack of a consistent joint modelling/inversion of seismic tomography and gravity data. In addition, the observed gravity data reflects the Earth's approximate state of lithostatic equilibrium (isostasy), with different layers contributing to the total signal at different wavelengths and with various amplitudes. This circumstance further complicates the task of separating the gravity contributions from the crust, upper and lower mantle (e.g. Sebera et al. 2017;Afonso et al. 2019).
The Earth's surface elevation is the result of various processes acting at different timescales. It is well known from the analysis of the Earth's gravity field that most of the surface elevation is isostatically compensated within the outermost mechanically competent layer: the lithosphere (e.g. Ricard et al. 2006). Therefore, surface topography is the expression of the pressure balance at depth below the highly viscous lithosphere. The concept of dynamic topography stems from the transient (tens of Ma) surface deformation induced by mantle flow. As opposed to quasi-static isostatic elevation, dynamic topography depends upon moving density anomalies and viscosity stratification (e.g. Flament et al. 2013). A proxy for the dynamic topography is the so called residual isostatic topography or elevation: the difference between the observed elevation and an isostatically compensated topography according to some predefined lithospheric model. Such residual topography, disregarding density and geometry uncertainties inherent in any lithospheric model, is then attributed to density variations beneath the lithospheric thermal boundary layer.
In oceans, the isostatic contribution from the thin crust (6-10 km) is small and most of the topography comes from the relatively homogeneous mantle. Therefore, residual isostatic elevation in oceans can be measured comparing the observed bathymetry with the WINTERC-G: upper mantle thermochemical model 149 predictions from an age-dependent lithospheric plate or half-space cooling model after correcting for the sediment load (e.g. Hoggard et al. 2017;Rowley 2019). Continental lithosphere is more complex than its oceanic counterpart, and its surface topography reflects a considerably longer history and laterally diverse density structure. In particular, the contribution from the continental crust (20-60 km thick) is very relevant (e.g. Panasyuk & Hager 2000;Kaban et al. 2003) but remains largely unconstrained in many areas (e.g. Laske et al. 2013;Szwillus et al. 2019).
Geodynamic modelling quantifies the effects of mantle convection on surface tectonics (vertical and lateral flow) based on mantle density anomalies and rheology. First-order tectonic and topographic features can be reproduced by geodynamic approaches assuming a linear conversion/scaling factor between seismic velocity, density and temperature, and a radial viscosity profile with linear rheology, but the predicted flow directions locally fail to explain present-day deformation, as measured with GPS, and seismic anisotropy fabric in the mantle (e.g. Boschi et al. 2010). Furthermore, despite some long wavelength agreement, there are significant regional differences between dynamic topography estimated from mantle flow and the various estimates of residual isostatic topography (e.g. Flament et al. 2013). Other works based on recent high resolution surface wave tomography show a better match at smaller spatial scales in several regions (Steinberger 2016).
Converting seismic velocities or densities into the ultimately relevant temperature and composition conditions within the Earth is not straightforward due to the non-uniqueness of the problem and the different sensitivities of the multiple observables generally involved. Joint inversions of seismic and gravity data parametrized using simplified empirical relationships between shear wave velocity (Vs) and density anomalies have been implemented with a specific focus on the upper mantle (e.g. Forte & Perry 2000;Deschamps et al. 2001;Perry et al. 2003;Moulik & Ekström 2016). The main limitation of assuming scaling factors is due to the neglect of the thermodynamic link between seismic velocities and density of mantle rocks-through temperature, pressure and bulk composition variables (i.e. thermodynamic self-consistency). In order to capture the inherent non-linearity connecting the properties of subsurface rocks, integrated geophysical and petrological approaches built within a self-consistent thermodynamic framework have been developed for both forward (e.g. Afonso et al. 2008;Cobden et al. 2009;Fullea et al. 2009) and inverse, non-linear probabilistic modelling (Khan et al. 2008(Khan et al. , 2011Afonso et al. 2013aAfonso et al. ,b, 2016. A clear strength of such non-linear probabilistic approaches is their ability to produce rubust quantitative estimates of model resolution, nonuniqueness and uncertainty. However, the relatively small amount of parameters that Bayesian probabilistic inversions can recover due to the computational limitations restricts its application to regional mantle models of hundreds of km extent (e.g. Afonso et al. 2016), 1-D global studies (e.g. Khan et al. 2008) or global studies based on tectonic regionalization schemes (Khan et al. 2011).

Surface wave phase velocity curves from waveform-inversion measurements
WINTERC-G is constrained, first, by phase-velocity dispersion curves derived from global, fundamental mode phase-velocity maps (see Section 4.1). Our seismic data set (Celli 2020;Celli et al. 2020b) consists of over 1.17 million Rayleigh-and 300 000 Love-wave fundamental mode phase-velocity curves, spanning a maximum total period range of 10-460. The maximum period range is available in some gridpoints, and in this study we only use the period range common to most gridpoints (25-330 and 33-330 s for Rayleigh and Love waves, respectively). We have cut-off some of the shortest and longest periods to assure uniform data coverage and to avoid comparatively noisier data. The curves are obtained by the Automated Multimode Inversion (AMI, Lebedev et al. 2005) of the waveforms of earthquake recordings, coming from 6242 stations and 25 496 events worldwide (Appendix B, Fig. B2 and Fig. B3). The original purpose of the waveform inversions is to constrain S-wave velocity models directly (Lebedev & van der Hilst 2008;Schaeffer & Lebedev 2015). Phase velocities of the modes are measured, as a by-product, at the end of the waveform inversion of each seismogram, within frequency bands where they are constrained by this particular waveform. They provide a concise summary of the structural information contained in seismic waveforms and are convenient to combine with data of other types in joint inversions and integrated modelling (e.g. Khan et al. 2009;Fullea et al. 2012;Golos et al. 2018).
We used all the earthquakes from the Harvard Centroid Moment Tensor Solutions catalogue Ekström et al. 2012) since 1994, also including some selected data from 1990, with source-receiver distances between 500 and 18 000 km and with magnitudes above a distance-dependent threshold (Schaeffer & Lebedev 2013). Most of the high-frequency measurements are obtained from short paths, whereas the lowest-frequency measurements come predominantly from long paths.
The strength of the multimode waveform inversion is that both the fundamental and higher mode (surface wave and body wave) contributions to the waveform are modelled. Accurate fundamentalmode measurements can thus be made even when the fundamental mode interferes with higher modes. The low misfit thresholds and the anticycle-skip precautions used by AMI (Lebedev et al. 2005) ensure that the errors of the measurement of the phase of the signal are negligible. The main sources of errors in the data set are the uncertainties in the hypocentre locations and origin times of the events and unaccounted-for fundamental-mode diffraction.

Surface heat flow
Surface heat flow (SHF) data comes from the International Heat flow commission (https://engineering.und.edu/research/global-heat -f low-database/data.html). SHF data are sparsely distributed, with relatively good coverage in North America and parts of Europe and poor data sampling in the rest of the continents and oceans. In order to interpolate the measurements onto our triangular model grid (see Section 4.1), we use an average of the measurements within a search radius of 200 km around each knot of the grid. We assign an uncertainty value to each gridpoint based on either the standard deviation of the values used in the averaging or 30% of the SHF value in case of a unique measurement within the 200 km radius. In gridpoints where no measurements are available within the averaging radius we assign a standard SHF value of 50 mW m -2 with a high uncertainty to downweight the SHF data in the inversion.

Elevation
The elevation data, that is topography (including bedrock and ice) and bathymetry, comes from the ETOPO2v2 2-min gridded Global Data Base (Smith & Sandwell 1994;Smith & Sandwell 1997). The input elevation uncertainty in the inversion is computed from the data dispersion over a 2 • radius circle which roughly corresponds with the lateral resolution of the seismic triangular grid in our model (see Section 4.1).

Gravity field data
Gravity field data used in this work includes geoid and free air gravity anomaly and the gravity-gradient data at the satellite height (255 km approx.). Free air gravity and geoid anomalies come from the XGM2016 spherical harmonic global earth model (Pail et al. 2018). The six independent gravity gradient tensor components used in this study were measured by the on-board gradiometer in ESA's GOCE satellite mission (Boumann et al. 2016). GOCE gravity gradients are referred to the Local North Oriented reference Frame (LNOF). LNOF is a local right-handed local Cartesian system with its X-axis pointing north, its Y-axis pointing west and its Z-axis pointing radially outwards, defined with respect to spherical coordinates.
In the model presented here we consider the mass distribution from the surface of the Earth down to 400 km depth. Consequently, the gravity field data must be filtered out to exclude the most likely contribution from deeper masses outside our model domain (depth >400 km). There is no unique way to separate the contribution of Earth masses at different depths (e.g. Bowin 2000). Harmonic degrees 2-3 of the geoid anomaly field (up to 70 % of the degree 2-10 signal) are thought to be contributed, to a large extent, by density variations at the core-mantle boundary (Bowin 2000) or, at least, deeper that 350 km (Deschamps et al. 2001). Therefore, we remove harmonic degrees 2-3 from the gravity field data in our inversions. In Appendix C, we analyse the effect of removing harmonic degrees 2-9 from the total gravity signal as proposed by other authors.

Surface wave dispersion maps
We first invert the fundamental mode phase velocities for phasevelocity maps at each period, independently. In order to obtain a dense sampling of the frequency range of the measurements, we inverted for maps at 98 different periods in the 20-360 s range, with a logarithmic sample-interval increase with period that was designed to balance the structural sensitivity at different periods . At each period, all phase-velocity measurements are inverted together in a large linear system (Paige & Saunders 1982;Lebedev & van der Hilst 2008) for the 2-D distribution of phase velocity and its 2ψ and 4ψ azimuthal anisotropy. Each map is parametrized on a triangular grid (Wang & Dahlen 1995) with an average 225 km interknot spacing. The average over all the phase velocity measurements at the period is used as the reference. For the purposes of this study, we invert for 2ψ and 4ψ azimuthal anisotropy to allow the model to explain anisotropic heterogeneities, so that they do not map into isotropic ones, but use only the isotropicaverage phase velocities in the joint inversion.
In order to minimize the impact of errors in the measurements on phase-velocity maps, we exploit the redundancy of the data set and select only the most mutually consistent data through an a posteriori outlier rejection procedure. At each period, we use the first-iteration phase-velocity map to compute synthetic data by multiplication of the sensitivity matrix and the model vector. We then compare the synthetic and real data and discard the measurements with the largest misfits. In this way we kept the 70% most mutually consistent part of the initial data set and computed the final phase-velocity maps with these 'denoised' data. The resulting set of 98 phase velocity maps allows us to extract phase velocity dispersion curves at each node of the triangular model grid: a data set of 12 252 dispersion curve pairs (Love and Rayleigh at every point), covering the surface of Earth with an approximately even node spacing of 225 km, on average. The relative uncertainties assumed for the phase velocities are 0.2% for periods 20-50 s, 0.2-0.5% for periods 50-200 s and 0.5% for periods >200 s. We note that the phase velocity maps used here as input data for our integrated inversion arise from a linearized inversion procedure subjected to regularization (see Appendix B).
Phase-velocity curves for both Rayleigh and Love waves at a single location depend on, and constrain, the vertical profiles of seismic velocities (Vp, Vs), density, radial anisotropy and seismic attenuation (Q), although phase velocities alone would not be sufficient to resolve trade-offs between all of these parameters. In this study we invert, in a first step, Rayleigh and Love dispersion curves, along with the surface heat flow and isostatic elevation, directly for temperature and composition in the mantle using petrological relationships and thermodynamic databases (see Section 5).

Integrated geophysical-petrological modelling
WINTERC-G global model is based on upon the integrated geophysical-petrological approach LitMod (Afonso et al. 2008;Fullea et al. 2009). The model is parametrized from the beginning in terms of the mantle temperature and composition. Integrated modelling of geophysical-petrological data sets (e.g. gravity, seismic, electromagnetic, heat flow, mantle xenolith composition, etc.) in either its forward or inverse version requires solving a number of equations describing the various physical (e.g. heat transport, Maxwell's equations or seismic wave propagation) and chemical (thermodynamic mineral phase equilibria) processes that link the thermochemical conditions within the Earth to those geophysicalpetrological observables. An essential ingredient of the integrated approach is to determine, within a self-consistent thermodynamic framework, the so called secondary physical parameters in the mantle (e.g. density, elastic moduli) as a function of the fundamental or primary parameters: pressure, temperature and bulk mineralogical composition (e.g. Connolly 2005;Khan et al. 2007;Afonso et al. 2008;Fullea et al. 2011;Kuskov et al. 2014;Khan 2016). In the following, we describe the main ingredients of our methodological approach.

The lithosphere-asthenosphere boundary (LAB)
The LAB is one of the fundamental discontinuities within the upper mantle and a commonly used term in integrated modelling. It has been characterized according to different geophysical and geochemical parameters: seismic velocities, seismic and electrical anisotropy, temperature, composition, electrical resistivity (e.g. Afonso et al. 2008;Eaton et al. 2009;Rychert & Shearer 2009;Yuan & Romanowicz 2010;Bartzsch et al. 2011;Fullea et al. 2011Fullea et al. , 2012Roux et al. 2011). From a thermal perspective, the LAB divides the cold and rigid outermost layer of the Earth (lithosphere) where heat conduction is dominant from the warmer and rheologically weaker underlying sublithospheric or asthenospheric mantle.
An issue in estimating the thickness of the lithosphere from seismic observations is to decide on how to define the LAB. As such, the lithosphere is a dynamic concept defined in terms of rheological variations (rigid outermost lid-plate) and thermal regime (conductive versus convective heat transport). The primary constraint for temperature in our inversion is the surface waveform data (mostly sensitive to Vs variations) and, to a lesser extent, surface heat flow and elevation. A common approach in geophysical studies is to consider the magnitude and vertical gradient of Vs as a proxy to the LAB (e.g. Rychert & Shearer 2009), although there is no straightforward connection between seismic observations, and temperature and viscosity. Furthermore, the choice of a threshold velocity, or velocity gradient, to define the LAB remains controversial (e.g. Bartzsch et al. 2011). Here we take the approach of defining the lithosphere thermally as a predefined isotherm. Our thermal definition is similar to that outlined in Priestley et al. (2006), also used to define the oceanic plate thermal model (Crosby et al. 2006), except in that we explicitly parametrize a thermal buffer or transitional layer of variable thickness defined by a superadiabatic gradient (see Sections 4.2.2 and 5.1).

The geotherm
We compute the steady-state lithospheric geotherm by solving the 1-D heat conduction transfer equation, considering a P-T-dependent thermal conductivity in the lithospheric mantle (Afonso et al. 2008;Fullea et al. 2009). The Dirichlet boundary conditions imposed are fixed temperature at the Earth's surface and at the base of the lithosphere. Hence, the base of the lithosphere in WINTERC-G is defined by a particular isotherm, in our case 1300 • C. We note that the inclusion of the heat conduction equation in the forward problem imposes a level of (physical) regularization in our inversion regarding the seismic velocity and density gradients with depth.
In order to account for non-steady-state thermal contributions in young oceans (age <70 Ma) we add an ad hoc transient term (T trans ) to the steady-state solution: where T trans is a transient temperature perturbation, κ is the thermal diffusivity (here assumed to be 0.8 × 10 −6 m 2 s -1 ), z is the depth, t is the age of the oceanic lithosphere and Z LAB is the steady state LAB depth. The expression in eq. (1) corresponds to the general solution for the 1-D conduction equation in the case of a slab of thickness Z LAB with the initial condition of T trans = T trans and the boundary conditions of fixed T trans = 0 • C at z = 0 and z = Z LAB (Carslaw & Jaeger 1959).
In the sublithospheric upper mantle, the low viscosity permits convective processes and vigorous mantle stirring, and hence the geotherm is close to the adiabatic temperature gradient, generally assumed to be in the range 0.4-0.6 K km -1 (e.g. Katsura et al. 2010). Between the lithosphere and sublithospheric mantle, we parametrize a 'transition' region (a buffer or boundary layer) of variable thickness (typically 5-50 km) and characterized by a superadiabatic gradient (i.e. heat transfer is controlled by both conduction and convection mechanisms, see Fullea et al. (2009Fullea et al. ( , 2012 for details). In our model we compute the sublithospheric mantle geotherm from the bottom of the transition/buffer zone (located at the base of the lithosphere) assuming a reference or prior adiabatic gradient. Departures of the geothermal gradient (thermal anomalies) from the reference adiabatic gradient are allowed as required by the data within some admissible bounds that depend on the tectonic setting (e.g. subduction zones, see Section 5.1). Synthetic surface heat flow values are computed from the temperature gradient at the surface of the model and the crustal thermal conductivity in each model column.

Thermodynamic framework and compositional space
The Earth's mantle compositional space is usually defined within the major oxide system CFMAS (CaO-FeO-MgO-Al 2 O 3 -SiO 2 ). The CFMAS compositional space, which accounts for >98 wt% of the Earth's mantle, can be treated as four independent variables (i.e. the weight amount of each of the major oxides linked by the restriction that the sum of all of them has to be 100 %) or as a single-parameter basalt-harzburgite mixture with either a chemically equilibrated bulk composition (e.g. Khan 2016) or as a mechanical mixture of the basaltic and harzburgitic end-members (Xu et al. 2008). The major oxides are accommodated in the four main upper mantle mineral phases: olivine, pyroxenes and an Al-bearing phase (plagioclase, spinel or garnet, depending on the pressure). Other secondary phases (e.g. zircon, phlogopite, amphibole, etc.) are present as well but representing generally <5 % of the total assemblage (e.g. Pearson et al. 2003). Under the assumption of thermodynamic equilibrium (T > 500 • C), stable mineral assemblages in the mantle are determined using a Gibbs free energy minimization approach as described by Connolly (2005Connolly ( , 2009. The physical properties of interest in this study (density, seismic velocities) are dependent upon the modal distribution of the main mineral phases and their individual compositions. To solve for the equilibrium mineralogy here we use the thermodynamically self-consistent data base of Stixrude & Lithgow-Bertelloni (2011). The bulk density and seismic velocities in the mantle are obtained from the constituent elastic moduli and density of each of the stable end-member minerals (e.g. Connolly & Kerrick 2002;Afonso et al. 2008).
From a chemical composition standpoint, the lithospheric mantle is thought to be primarily peridotitic, with olivine being the volumetrically dominant mineral phase (>40%). Mantle peridotite composition is generally characterized according to fertility, that is enrichment in the basaltic component that relates to melt extraction. A typical assumption is to consider the magnesium number, the ratio between MgO and FeO (Mg# = MgO/[MgO + FeO]), as a chemical indicator. The rationale for this is that: i) Mg# reflects processes of melt extraction and metasomatic refertilization in peridotites: Mg# increases quasi-linearly with melt extraction (e.g. Herzberg 2004), and ii) Mg# in the melt residue strongly controls the bulk rock physical properties. Both i) and ii) above hold valid even if incompatible elements (CaO and Al 2 O 3 ) are reintroduced via metasomatic refertilization. Therefore, Mg# can be regarded as an overall chemical indicator of melt extraction-depletion/metasomatismrefertilization in spite of the fact that FeO can be reintroduced in the rock due to metasomatism, thus altering the Mg# value (e.g. Griffin et al. 2009).
Al 2 O 3 has been shown to be a strong compositional indicator, mainly through its control on modal garnet (Afonso et al. 2013a(Afonso et al. , 2013b. In a similar way as Mg#, Al 2 O 3 content can be used as a proxy to mantle depletion/refertilization by melting/metasomatism. Global peridotite petrological databases covering continental and oceanic xenoliths as well as orogenic massifs and ophiolites show a significant statistical correlation between CaO-MgO and Al 2 O 3 oxides, thus reflecting the melt depletion pattern, regardless of the tectonic environment of the sample. In contrast, FeO and SiO 2 are not always correlated with either CaO or Al 2 O 3 , in virtue of the interplay between mineral chemistry, modes and bulk composition (Afonso et al. 2013a, and references therein).
Within WINTERC-G we define two chemically distinct mantle domains: the lithospheric mantle (with its base defined according to temperature, see section 4.2.2), and the sublithospheric mantle. The bulk composition in both the lithospheric and the sublithospheric mantle layers, parametrized based on a priori information from petrological data bases, is a model parameter inverted for in WINTERC-G (Section 5). In contrast to the mantle, where thermodynamic equilibrium is prevalent, vast portions of the continental crust are thermodynamically metastable. This is because equilibration processes are essentially temperature activated and the temperature in the crust is usually too low to trigger them. Consequently, the mineralogical assemblage of crustal rocks is mostly decoupled from the in situ pressure and temperature conditions, reflecting instead the conditions present at the moment of rock formation. Therefore, the crust is not defined based on thermodynamic equilibrium in our model: we adopt an ad hoc parametrization in terms of the relevant physical properties given the constraining geophysical data sets used in this work (i.e., seismic velocities, density, see Section 5.1).

Seismic velocities and dispersion curves
The seismic velocities derived from the thermodynamic equilibrium are anharmonic and isotropic. Therefore, in order to make a sensible comparison against the measured seismic data (surface wave dispersion curves, in our case), corrections for attenuation (anelasticity), partial melting and radial anisotropy must be applied. The synthetic phase velocity dispersion curves used in our global inversion are computed using a version of the MINEOS modes code (Masters, http://geodynamics.org/cig/software/mineos) adapted for the travelling wave decomposition, appropriate for surface waves (Nolet 2008). Below 660 km depth, the reference model AK135 is assumed for all the relevant parameters (Kennett et al. 1995). In the mantle transition zone (410 km < depth < 660 km) only the density, Vp and attenuation parameters are taken from AK135, with the S-wave velocities at the 410 and 660 km discontinuities being inversion variables inverted for (see Section 5.1). In order to minimize the impact of uncertainties in the attenuation (Q) profile on the accuracy of the Vs-phase velocity relationship, our seismic velocity profiles have a reference period of 50 s, approximately in the middle (in a logarithmic sense) of the frequency range used in this study (Liu et al. 1976;Lebedev & van der Hilst 2008).

Anelasticity
Anelasticity effects in observed seismic velocities are of primary importance, particularly at high temperatures (e.g. Karato & Wu 1993;Sobolev et al. 1996;Afonso et al. 2010). In this work, we treat anelasticity as a posteriori, pressure-temperature-dependent correction to the anharmonic velocities computed from thermodynamics (e.g. Minster & Anderson 1981;Karato 1993;Afonso et al. 2005;Fullea et al. 2012). The anelastic correction applied to highfrequency anharmonic velocities is based on the expressions (e.g. Minster & Anderson 1981;Karato & Wu 1993): where Q P = (9/4)Q S is assumed (this is equivalent to assuming an infinite quality factor for the bulk modulus, i.e. Q K −1 ∼ 0). V P0 and V S0 are the unrelaxed high frequency anharmonic velocities at a given temperature (T), pressure (P) and bulk composition, A = 750 μm α s −α for the empirically determined exponent α = 0.26 (Jackson et al. 2002), E = 424 kJ mol -1 , R is the universal gas constant, d is the grain size, V * the activation volume, and T 0 (50 s in our case, see Section 4.2.4) the reference oscillation period (Jackson et al. 2002;Faul & Jackson 2005). For further discussion on the effects of other parameters (i.e. grain size, activation volume) on the anelasticity correction the interested reader is referred to laboratory (e.g. Faul & Jackson 2005;Jackson & Faul 2010) and modelling (Fullea et al. 2012) studies. Here we use the same attenuation parameters (d, V * and T 0 ) as described in Fullea et al. (2012).

Melt
Melt has a significant impact on both seismic velocities and attenuation. Recent experimental studies indicate a decrease in the Vs and Q S values of up to 50% and an order of magnitude, respectively, for a melt fraction of 4% (Chantel et al. 2016). This demonstrates the importance of taking partial melt into account, even though the melt fraction is typically much lower in the mantle, with the excess melt escaping upwards. In this paper we consider the mantle-peridotitic nominally anhydrous solidus and liquidus from Katz et al. (2003, and references therein). The effects of melt on V S and V P are computed according to the experimental model of Chantel et al. (2016). In addition, laboratory studies suggest a progressive, Vs-decreasing effect of anelasticity below the solidus temperature (e.g. Yamauchi & Takei 2016;Takei 2017). Small changes in the amount of melt have a large effect on seismic velocities and attenuation. This is a classic example of ill-posedness in geophysical inversion. To mitigate ill-posedness, we include a near-solidus, pre-melt contribution to linearly smooth out the effect of melt on seismic velocities over a temperature range in the neighbourhood of the solidus (see Appendix D).

Radial anisotropy
Radial anisotropy is defined as the difference between the vertically and horizontally polarized V S . Ignoring it when modelling thermochemical lithospheric structure has non negligible effects (e.g. Fullea et al. 2012). In order to determine radial anisotropy, both Rayleigh and Love dispersion curves have to be inverted simultaneously (e.g. Gung et al. 2003;Lebedev et al. 2006;Montagner 2007;Fullea et al. 2012;Vozar et al. 2014;Ravenna et al. 2018). Radial anisotropy, R an , is here defined as:

153
where V SV and V SH are the velocities of the vertically and horizontally polarized, attenuated S waves, respectively, and V S is the Voigt isotropic average. V P is assumed to be isotropic.

Isostatic elevation
The predicted surface elevation is computed in WINTERC-G according to local isostasy by integrating the crustal and lithospheric mantle densities. We, therefore, do not account for flexural effects related to the elastic thickness of the lithosphere. The density field is vertically discretized in three crustal layers (plus water and ice if applicable) and a vertical grid with interknot spacing of 2 km in the mantle. The density integration is only over the lithosphere depth range (crust and lithospheric mantle plus sea water and ice), and therefore we do not explicitly include sublithospheric loads as part of the isostatic equilibrium. Isostasy requires that the pressure at the base of every vertical lithospheric column be the same when integrated down to a certain depth, known as the compensation level. Although such condition is never completely met within the Earth, where convection currents occur, the relatively low viscosity of the shallow sublithospheric mantle allows lateral pressure gradients to be relaxed by flow over short timescales. In contrast, lateral pressure (and density) gradients associated with either temperature, topography or compositional variations can exist within the lithosphere over much longer timescales. This creates a mechanical and temporal decoupling between the long-term rigid lithosphere and the much less viscous sublithospheric mantle. According to local lithospheric isostasy, and taking the compensation level at the top of the transition zone (z bot ), the pressure at the compensation level should be constant and equal to some reference column pressure, P ref : We consider a fictitious reference column with average and constant density ρ bot and thickness z bot -0 , therefore P ref = (z bot -0 )g ρ bot , where g is the average gravitational attraction over the model column. For an arbitrary model column with observed surface elevation, E obs , the pressure at the compensation level P(E obs ) will be equal to P ref only in the case of local isostasy, that is, when the integrated lithospheric density is such that the predicted model elevation is E calc = E obs and hence P bot (E obs ) = (z bot -0 ) g ρ bot . When P bot (E obs ) differs from P ref we can estimate the predicted column elevation with respect to the reference column and the observed elevation: (E calc -E obs )g ρ bot . = P ref -P bot (E obs ). Therefore: 0 is a global calibration parameter depending on the reference column and the thermodynamic data base adopted (see Section 4.2.3). Here we consider three reference lithospheric-upper mantle columns to estimate a representative 0 value globally: (i) a standard Mid Oceanic Ridge (MOR), (ii) a thick and chemically refractory craton and (iii) a 100-km-thick Phanerozoic continental lithosphere with no topography and a 30-km-thick crust (see Afonso et al. 2008 for details on the thermochemical parametrization of the MOR and cratonic reference columns).
Using eq. (4) with appropriate parameters from the reference column, 0 is obtained as where ρ RCOL and E RCOL are, respectively, the average density and average elevation of the reference column, and ρ w is the density of sea water if E RCOL < 0 (ρ w = 0 if E RCOL > 0). Note that ρ RCOL and ρ bot depend on the assumed geotherm, as well as on the particular thermodynamic data base adopted (see Section 4.2.3).

Gravity field
In order to compute the synthetic gravity field in WINTERC-G, we consider the Earth to be composed of a number of mass-density layers that are separated by non-intersecting, smooth and closed internal discontinuities. The three top layers have uniform mass density distributions corresponding to the ocean water (density = 1030 kg m -3 ), ice (density = 950 kg m -3 ) and crust (an average crustal density). The layers in the mantle are separated by discontinuities at 20, 36,56,80,110,150,200,260,330 and 400 km depth, respectively, and are characterized by a laterally and radially varying mass density within each layer. The top of the first mantle layer is defined by the Moho discontinuity. The gravitational potential generated by such a layered model is given by the sum of the gravitational potential contributions of the individual layers in virtue of the principle of superposition. Hence, it suffices to derive the formula to compute the gravitational potential generated by a 3-D mass-density layer with non-spherical boundaries. We represent both the mass density of each mass layer and its two non-spherical boundaries in terms of spherical harmonics so that the external gravitational potential outside the Earth can be expressed in spherical harmonics. In order to use the fast algorithm for spherical harmonic analysis and synthesis derived by Martinec (1991) the field under consideration must be given in a regular grid in geographical coordinates rather than in the equidistant triangular grid in which the surface wave data are initially given (Section 4.1). The theoretical details of the gravity field spherical harmonic calculation are given in Appendix A.

W I N T E RC -G : I N V E R S I O N S C H E M E
The primary goal of this work is to produce a comprehensive, highresolution global model of temperature and composition in the lithosphere and underlying mantle by combining terrestrial data (seismic surface waves, surface heat flow and elevation) and satellite gravity data in a consistent manner. The complexity and the size (nearly 400 000 variables) of the task at hand requires a careful analysis of the various constraining data sensitivities in order to simplify the multiple forward problems involved to the maximum possible level. The inversion scheme applied to build WINTERC-G consists of two major steps, described in Sections 5.1 and 5.3. This division is based upon the dimensionality and sensitivity to thermal and compositional heterogeneities of the constraining data sets. In step 1, we invert data sets mostly sensitive to temperature variations that can be modelled in 1-D. In step 2, using the prior temperature field from step 1, we invert the 3-D gravity field data, focusing on the crust and the mantle compositional field.

Step 1: Integrated geophysical inversion of surface-wave, surface heat flow and elevation data
In the first step, we invert the fundamental mode Rayleigh (φ Ray i ) and Love (φ Love j ) surface wave dispersion curves. Our data set comprises 12 252 dispersion curve pairs over a triangular grid with an average 225 km interknot spacing (see Section 4.1). At every gridpoint, we also invert for surface heat flow (SHF) and elevation (E), weighted by their respective uncertainties (see Sections 3.2 154 J. Fullea et al. 3 Density(kg/m ) Vs(km/s) Al O (wt%)  3). In this step the inversion is 1-D, attending to the sensitivity of the constraining data sets and, hence, there is no explicit lateral regularization. The 1-D, laterally independent nature of the physical forward problems involved in this modelling step allows for a natural parallelization of the inversion distributed on different lithospheric columns that can be treated separately. The data vector, d obs , at every triangular grid node is given by: where N Ray and N Love are the number of periods for Rayleigh and Love waves for each dispersion curve, respectively. The maximum number of periods considered here is 73. In some model nodes this number is lower, particularly for Love waves. This is the case where the lack of seismic coverage does not allow the surface wave inversion to recover the shortest periods in the corresponding fundamental mode phase-velocity maps (Section 4.1 and Appendix B). The parameter space at this stage includes crust and mantle variables within the model vector, m: where ρ c i and V s c i are the crustal densities and S-wave velocities for a three-layered crust, z c i is the depth of the base of each crustal layer; z L AB is the depth of the LAB as defined in Sections 4.2.1-4.2.2, T sublit1 , T sublit2 and T bot are the sublithospheric temperatures This can be regarded as a petrological regularization of the compositional parameter space. We assume the amount of FeO oxide constant in both the lithosphere (8.1 wt%) and sublithosphere (8.05 wt%) (Table 1). SiO 2 is simply 100 wt% minus the sum of the other four oxides, determined as described above.
The sublithospheric temperatures are vertically regularized, depending on the lithosphere and thermal buffer thicknesses, in the following way. The temperature at the base of the thermal buffer (Z buff = Z LAB + Z LAB ) is defined as T buff = T a + 100 • C (Fig. 1). We now define prior/reference values for T sublit1 , T sublit2 and T bot for each lithospheric column based on a reference adiabatic gradient (γ ref = 0.5 K km -1 in our case, see Table 1): The sublithospheric temperatures are allowed to deviate from their reference values by up to 100 K, except in areas of strong sublithospheric seismic anomalies of likely thermal origin (subduction zones and hotspots). In the latter case, departures of up to 500 K from the corresponding sublithospheric reference temperature are allowed. The choice of the sublithospheric temperature regularization is based on data misfits within the inversion process.
The Moho depth is a free parameter, allowed to vary from the reference/prior value (CRUST1.0, Laske et al. 2013) within the uncertainties estimated by Szwillus et al. (2019) based on geostatistical interpolation of seismic data from the U.S. Geological Survey [USGS] global Seismic Catalog [GSC] database (Mooney 2015). Since the uncertainties estimated by Szwillus et al. (2019) are rather large in areas with no seismic coverage (10-15 km) and our step 1 inversion scheme is explicitly 1-D (no lateral regularization), we set error floors of 3 km for oceanic areas and 7 km for continental areas (including continental margins with bathymetry shallower than 1 km) in order to homogenize the inversion. Crustal P-wave velocities are computed from the V S values from the inversion and the V P /V S ratio from the reference model. The complete list of fixed physical parameters in the inversion is given in Table 1.
In this step, we apply a nonlinear, least square Lavenberg-Marquardt inversion scheme where the misfit function is defined as: where g(m) is the nonlinear forward operator that maps the model vector (m) into the observational space (d obs ). In our case, g(m) corresponds to the integrated geophysical-petrological modelling approach described in Section 4.2. The first term of S(m) in eq. (10) is a quadratic form corresponding to the square of the weighted data misfit 2-norm including data uncertainties. C −1 D and C −1 M represent the inverse observational and model covariance matrices, respectively, here assumed to be diagonal. The diagonal elements of C D are defined by the (squared) data uncertainties plus the modelization uncertainties, and are in practice used to weight the contribution of the different data sets in d obs .
The second term of S(m) in eq. (10) regularizes the inversion problem. C M is a diagonal matrix of squared weighting factors for the respective elements of the reference (and regularizing) model vector, m ref For the crustal inversion parameters (velocity, density and thickness), radial anisotropy, lithospheric thickness and mantle composition the regularization term is explicit. The geotherm represents an additional regularization term: thermal steady state in the lithosphere (heat conduction domain) and adiabatic gradient(s) in the sublithosphere (heat convection domain). Furthermore, the linearized pre-melt anelastic parametrization in the vicinity of the solidus described in Appendix D adds further smoothing to the solution. The regularization term prevents the inversion from falling into nonphysical solutions outside of the domain of the joint geophysical-petrological forward problem solver.
In general, finding an optimal balance between data fit and regularization is a complex, problem-dependent task that has to be adapted to the particular inversion problem one is dealing with. Our parameter space in this inversion step is rather heterogeneous (composition, temperature, thickness, seismic velocities, and density), pointing to a common joint-inversion problem of selecting the most suitable C M coefficients in the regularization term for the differing data types. To circumvent this problem, we (i) adopt a bimodal crustal reference model with oceanic versus continental crust and (ii) invert, also, for C M variances by running several Lavenberg-Marquardt inversions for each gridpoint (during these second level inversions, we also adapt the reference model based on the isostatic elevation misfit). We adopt a trade-off curve strategy where we begin from a high level of homogeneous regularization (i.e. constant and high-value C M coefficients) and progressively reduce it for the different parameter types guided by the value of m-m ref after each inversion. For instance, if for a given inversion parameter after the inversion the associated m-m ref element increases considerably (especially if outside physically meaningful bounds) as a consequence of reducing its parameter variance in C M , then, in the next iteration, the regularization coefficient for that term is rescaled and kept constant thereafter. The final set of C M coefficients for each model column is chosen so that the preferred solution corresponds to the lowest regularization term compatible with physically meaningful model parameters.

Step 1: sensitivity analysis
We select a number of lithospheric columns representative of the global tectonic variability [i.e. thick craton, rejuvenated Phanerozoic lithosphere, old ocean, mid oceanic ridge (MOR), hotspot] to analyse the uncertainty/sensitivity in our model inversion. Considering Gaussian statistics for the data and model spaces and local linearization, the model posterior covariance matrix, C M , can be approximated as: where J is the Jacobian matrix of the partial derivatives of data elements with respect to the model parameters, evaluated at the solution model point, m post (Tarantola 2005): The comparison of the posterior (post-inversion) and prior (preinversion) model variance and covariances allow us to: (i) estimate how consistently the model parameters have been recovered by the input data, (ii) estimate parameter correlations and (iii) assess the influence of regularization imposed in the misfit function. If the elements of C M are not reduced in relation to C M. then the corresponding model parameters are being controlled by the reference model through regularization, and the inversion has not significantly refined our prior knowledge based on input data. The corresponding model standard deviations, given by the square root of the diagonal elements of C M , are representative of the inversion uncertainties, subject to the choice of the reference model uncertainties in C M .
A useful entity that can be derived from C M is the correlation matrix, Corr, with elements (Tarantola 2005): Model element pairs with indices i and j and Corr ij values close to 1 or −1 are strongly correlated or anticorrelated, respectively. In such cases, the input data cannot constrain the two parameters independently but, in the limit of Corr ij = 1, only a linear combination of them.
Analysing the variance change from C M to C M , it is clear that the largest posterior variance reductions correspond to the LAB and Moho depths, which we set with relatively wide prior model standard deviations (square root of variances, Fig. 2a). The LAB depth,  in particular, is by far the best resolved parameter in this inversion step, since it indirectly controls (via temperature distribution within the lithosphere and adiabatic sublithopheric gradients) the values of many parameters throughout the whole inversion column. The posterior standard deviation for the LAB is generally <5 km except for most cratonic areas, and some active regions (Tibet, Andes).
In the case of the Moho depth the largest uncertainties (>4 km) are located in the South American cratons, the East African Rift, Greenland and Southeast Asia (Fig. 3). The thickness of the thermal buffer beneath the LAB is only weakly constrained by our data sets, except in some areas of thin lithosphere (<80 km) where the LAB is located near the peridotite nominally anhydrous solidus. In cratonic settings (generally z LAB > 150 km), the inversion is sensitive to the lithospheric mantle composition, whereas sublithospheric compositions are not very well resolved, i.e. the posterior variance reduction is small (Fig. 2). The sublithospheric temperatures are better constrained at shallow (near the LAB) rather than deep levels (top of the transition zone); the posterior standard deviations for the temperature in the shallowest sublithospheric node are <20 and <30 K for 72 and 87% of the model columns, respectively, whereas for the temperatures at 400 km depth the standard deviations are <30 and <50K for 63 and 76% of the model columns, respectively. By contrast, the uncertainties for the sublithospheric temperatures are larger, in comparison, beneath some cratons (South America and South Africa, Fig. 3) and craton border regions (e.g. around the North American Craton or north of the East Antarctic Craton, Fig. 3). For thin lithospheres, the sublithospheric layer is thick and has more influence on the predicted surface wave velocities. Similarly, sensitivity to the lithospheric mantle composition is greater where the lithosphere is thick. The sensitivity to crustal parameters (layer thickness, velocities and densities) is, obviously, greater in the relatively thick continental crust compared to the thin oceanic crust. Radial anisotropy and the transition zone velocities (especially at 410 km) are generally well constrained by the joint inversion of both Rayleigh and Love surface waves (Fig. 2). The constraints on the mantle composition at this inversion step are relatively weak. Mantle composition significantly affects density and, therefore, the isostatic mass balance at the lithospheric scale and the predicted surface elevation, but has only a mild effect on anharmonic seismic velocities, which are controlled primarily by temperature variations (e.g. Goes & van der Lee 2002;Schutt & Lesher 2006). Furthermore, the effect of density on surface wave phase velocities is not very large. As a consequence, the low sensitivity to the mantle composition in this inversion step arises from the dominant weight of the surface-wave seismic data, extending over a range of periods and sensitive to seismic velocity variations, compared to the surface elevation data (a single measurement).
The model posterior correlation matrices show strong anticorrelation between the radial anisotropy at different depths suggesting trade-offs and that the inversion is not able to independently constrain the anisotropy value in all the vertical nodes in our discretization (Fig. 2c). Similarly, for columns where the input data constrain the sublithospheric geotherm (oceans, relatively thin continental lithosphere), there are correlations among the vertical temperature nodes, between the shallower nodes and the LAB depth, and between the thermal sublithospheric buffer and the LAB and Moho depths. The Moho depth trades-off with the crustal velocities, especially with the lower crustal velocity in the case of continental areas. The mantle composition anti-correlates with the crustal densities and LAB depth, and tends to correlate with the crustal thickness and seismic velocities. For the sublithospheric composition, correlations are weaker and negative with respect to the crustal and lithospheric thicknesses. Interestingly, the correlations among the mantle composition and sublithospheric temperatures are very weak, in general. The lithospheric and crustal thicknesses are correlated and the strength of the correlation is, for the most part, greater in oceanic than in continental areas (Fig. 2c).
A complementary (albeit computationally expensive) approach is to estimate the model parameter uncertainties by constructing projections of the multi-dimensional parameter space, for instance taking two parameters (e.g. m1 and m2), as introduced by Bartzsch et al. (2011) and Lebedev et al. (2013). If we consider a grid of values for m1 and m2, we can run inversions for every pair of fixed values m1, m2, with the rest of the model parameters free to vary, and evaluate the misfit function. The resulting 2D plot will represent the projection of the total model space onto the m1-m2 plane and illustrate the resolution and possible trade-offs between m1 and m2 parameters. For example, m1 and m2 could correspond to the lithospheric and crustal thicknesses, respectively (Fig. 4). We have run tests for the columns representative of different tectonic settings mentioned above (thick craton, rejuvenated Phanerozoic lithosphere, old ocean, MOR, hotspot). The results confirm the strong sensitivity to the lithospheric (and crustal, to a lesser extent) thickness and the comparatively weak sensitivity to sublithospheric temperatures (and thermal buffer), except for areas characterized by strong seismic anomalies (e.g. hotspots). The mantle composition is partially constrained (particularly in thick cratonic areas), whereas there is only weak sensitivity to sublithospheric composition.

Step 2: Integrated geophysical inversion of satellite gravity field data
In the second step, we invert gravity field satellite data using the outcome of step 1 inversion as prior information. Gravity data are sensitive to both crust and mantle lateral density variations. In our modelling approach, the mantle density field is computed based on the thermal and compositional fields according to a thermodynamic equilibrium formalism (see Section 4.2.3). As illustrated by the sensitivity analysis carried out in step 1 (Section 5.2), surface-wave seismic data are sensitive primarily to the mantle temperature distribution. The lithospheric thickness parameter, in particular, controls synthetic, surface wave phase velocities, surface heat flow and, to some degree, the isostatic elevation. However, the data sensitivity to mantle chemical heterogeneities in the step 1 inversion is lower, especially in the sublithospheric mantle.
Our inversion strategy in step 2 builds on the well-known fact that mantle composition has a relatively minor impact on mantle seismic velocities but a considerable one on the density (e.g. Priestley et al. 2006;Cammarano et al. 2011). In Fig. 5, we show the density and seismic velocity variations with temperature and composition,computed based on thermodynamic equilibrium (Section 4.2.3) according to the compositional parametrization described in section 5.1. Assuming linear behaviour (i.e. excluding Al 2 O 3 values <2 wt% in the compositional field) we can estimate the required temperature (T) and composition (C) variations to obtain a given density variation (say, 15 kg m -3 ): T = ρ(∂ρ/∂ T ) −1 and C = ρ(∂ρ/∂C) −1 (say, 160 K and 1 wt%, respectively). Then, we can estimate what would be the impact on the predicted seismic velocities for such temperature and compositional variations: say, 1.2% and 0.1-0.6% in Vs, respectively). Therefore, for the conditions in our example we can see that, for the same density variation, the associated thermally induced variation in Vs is about 2-12 times larger than the compositional induced Vs variation. The former implies that the mantle density modifications required in order to match gravity satellite data (disregarding crustal density variations), could be obtained either by: (i) modifying considerably the initial thermal field from step 1 (therefore increasing the surface-wave tomography data misfits); or (ii) via a smaller, in comparison, change of the compositional field (with a small impact on the associated seismic velocities).
A natural way of parametrizing the inversion of gravity field data in step 2 is thus to use the average crustal density and the temperature distributions from step 1 as prior information and invert for the mantle composition and crustal densities. In doing so we are changing the fit to the seismic and elevation data achieved in step 1 but considerably less so than if we were to invert the gravity data for both composition/crust density and temperature or temperature alone. In Section 6.2.1 we investigate further the seismic data misfit variation from step 1 to step 2, to validate this inversion strategy.
To invert the gravity data, we adopt a conceptually simple, 1-D driven, gradient-search, iterative inversion scheme where the forward gravity problem is fully solved in 3-D following a rigorous spherical harmonic formalism (Section 4.3 and Appendix A). The spherical harmonic algorithm used here requires a regular grid in geographical coordinates. Therefore, for the gravity field computation we consider a laterally regular geographical coordinates grid with interknot spacing of 0.5 • . Density values in the geographical grid are interpolated using the three nearest neighbouring values from the triangular grid weighted by the inverse of its distance to a given geographical point in the regular grid. The data vector in the regular grid is: where N obs ( = 259 200) is the number of points in the laterally regular grid in geographical coordinates with interknot spacing of 0.5 • , N is the geoid anomaly, F A is the free air gravity anomaly and g ij is the gravity-gradient tensor component corresponding to: z, vertical, N, North, and E, East directions, defined with respect to spherical coordinates. The parameter space in step 2 is defined by a model vector, m: where ρ c is the average crustal density and C Lit , C Sublit are the bulk amount of Al 2 O 3 in the lithosphere and sublithosphere, respectively, where the remaining major oxides are parametrized as described in Section 5.1. We use the output crustal density from step 1 as the    Vs (km/s) initial value in step 2 inversion. Mantle densities are derived based on the output temperature field from step 1 (kept fixed) and the bulk mantle composition, C Lit , C Sublit , as described in Section 4.2.3. The synthetic gravity field, d syn , is determined by solving the forward gravity field problem: where g 3 D is the 3-D forward gravity operator in spherical harmonics (Appendix A). We wish to find an iterative update in the model vector, m+δm, that minimizes the misfit S (m) = d s yn − d obs in a least squares sense. We use: where S(m) min and S(m) max define the range of the minimum and maximum global misfit in each inversion iteration and J M are coefficients inversely proportional to the absolute value of the local 1-D Jacobian (∂ g 3D /∂m). In each iteration, we update the J M coefficients reducing their value when S(m) changes sign, to avoid oscillations in the inversion. There are different possible ways of linking the model vector update δm = [δρ c k , δC Lit,k , δC Sublit,k ] t and S(m). The model update for the mantle composition, δC Lit, k , δC Sublit,k , is computed based on the geoid anomaly misfits (d syn = N). The choice of the geoid anomaly to constrain mantle structure is natural, since the geoid is sensitive to deeper mass anomalies compared to gravity or gravity gradient signals (e.g., Bowin 2000). In the case of the crustal density update, δρ c k , we carried out several tests (summarized in Table 3) and concluded that the overall lowest gravity field misfit (geoid, gravity and all six gravity gradient components) is achieved when δρ c k are obtained from the vertical-vertical gravity gradient component at satellite height in each iteration (d syn = g zz k ). The use of the vertical-horizontal and horizontal-horizontal gravity gradient components to derive δρ c k is not suited for our point-wise iterative inversion since their sensitivity kernels show maximum amplitudes shifted from the mass anomaly point (up to 1.1 • and 1.8 • , respectively, see Martinec 2014).
In step 2 the inversion is laterally regularized both explicitly and implicitly. For the explicit regularization we apply Gaussian filters with widths of 800, 1300 and 2000 km to the fixed sublithospheric temperatures taken from step 1, T sublit1 , T sublit2 and T bot , respectively (Section 5.1). We also apply an explicit regularization to the prior mantle compositional fields C lit , C sublit from step 1 inversion: a Gaussian filter of 800 km width. The implicit regularization arises from the limits we impose on the mantle composition range during the step 2 inversion: 6 wt% > C lit >1 wt%, 5.2 wt% >C sublit >4 wt%. In this way we avoid extreme compositional values linked to high frequency, high amplitude gravity signals that generally reflect short wavelength crustal density structures (e.g., large igneous provinces, subduction trenches) rather than mantle features. By limiting the compositional range in the mantle we prevent such crustal signals to be leaked into the mantle structure, thus implicitly regularizing (smoothing) the gravity data inversion in step 2.

Step 2: sensitivity analysis
The posterior covariance and correlation matrices, C M and Corr ij , for the step 2 gravity field data inversion can be derived in a way similar to that for the step 1. In step 2, the prior model uncertainties for the crustal density and the mantle composition parameters are given by the square root of the corresponding posterior variances from step 1 inversion. Note that in step 2 our inversion variables are the average crustal density and the whole upper mantle composition. Therefore, we determine the average crustal and mantle composition prior uncertainties by averaging over the three crustal layers and over the lithospheric and sublithospheric mantle layers, respectively. The full-resolution gravity-inversion covariance matrix is computationally very expensive, as it would require calculating and inverting a large Jacobian matrix. In order to make the problem tractable, we downscale the model space lateral resolution from 0.5 • to 20 • yet keeping the observations at the full resolution. Therefore, the Jacobian matrix: has dimensions of i = 1, N obs , where N obs ( = 259 200) is the number of points in the laterally regular full grid in geographical coordinates with interknot spacing of 0.5 • , and of j = 1, N var , where N var ( = 162 × 2) is the number of points in the laterally regular grid in geographical coordinates with coarser interknot spacing of 20 • , multiplied by 2 variables per column: crustal density and mantle composition. Fig. 6 displays the square roots of the diagonal elements of the posterior covariance matrices for the crustal density and mantle composition after inversion step 2. The posterior standard deviation (proxy for uncertainty) for the crustal density does not change much over oceans. This is to be expected, as the crustal thickness in oceans is fairly small and, hence, relatively large changes in crustal density produce only a modest gravity effect. In other words, gravity data are not very sensitive to odensity variations in ceanic crust. By contrast, in continental areas the crust is thicker and we observe reductions in the posterior standard deviation for the crustal density of 40-70%, especially in deep Moho areas (thickest crust). In the case of the posterior standard deviation for the mantle composition, the maximum sensitivity is achieved in oceanic domains where we get reductions of 50-60% after step 2 inversion. In continental areas the posterior standard deviation reduction is less, 20-40%, in particular in thick cratonic areas. The correlation values among parameters of the same type (e.g. crustal density or mantle composition from different gridpoints) are small, generally <0.05 in absolute value. However, there is a significant anticorrelation between the crustal density and the mantle composition, especially in continental areas, where the absolute values for correlation reach 0.8-0.6.

6.1
Step 1-surface waves, heat flow and elevation

Data misfit
The relative surface-wave misfits (calculated minus observed Rayleigh-and Love-wave phase velocities, divided by the observed phase velocities) are, for the most part, low, ranging from standard deviation values of 0.26-0.13% in the case of Rayleigh waves, and 0.47-0.16 % for Love waves (Figs 7 and 12). The poorer misfits for Love waves are mainly due to the larger errors in the Love phasevelocity curves, because of the comparatively worse and sparser data coverage producing the curves (fewer periods sampled by fewer seismic paths) compared to Rayleigh waves, particularly at short periods (<40 s).
In terms of the spatial pattern, at short periods (<40 s) the largest misfits for Rayleigh waves (>0.5%) are located in the central Pacific Ocean, south of the Hawaii hotspot, whereas in the case of Love waves the most outstanding mismatch (>1% misfit) is located north of the Louisville Pacific hotspot. Such anomalously high values disappear from the misfits at longer periods and do not have any apparent effect in our inverted model. Other areas exhibiting moderately high misfits at all periods include the Scotia Plate, northwestern Africa, the East African Ridge, the eastern Mediterranean Sea, the Java-Sumatra subduction zone and the East Pacific Rise (Fig. 7). The broad general features of anisotropy in our model are consistent with published 1-D reference and 3-D models (e.g. Dziewonski & Anderson 1981;Visser et al. 2008a;Chang et al. 2015), with a few % positive (V SH > V SV ) anisotropy, on average, in the top 200 km, decreasing with depth. Fine features of the distribution of anisotropy are beyond the scope of this paper and will be discussed in detail in future publications.
Synthetic surface elevation values are computed in WINTERC-G under the hypothesis of local isostatic balance. Therefore, elevation data misfits can be regarded as residual isostatic topography. Local isostasy is generally valid for relatively long topographic wavelengths but may not be apropriate in some tectonic settings. For that reason, surface elevation input data is assigned a relatively low weight in our inversion (step 1), compared to the surface-wave data. A negative isostatic residual (observed minus predicted elevation) in our model means either that our average lithospheric density is too low or that a negative dynamic contribution is required to explain the actual topography (or a combination of the two possible causes), and vice versa for positive isostatic residual elevation. After step 1 inversion, continental areas in WINTERC-G are, in general, isostatically compensated (±100 m) except for Antarctica, many active margins (e.g. Pacific North American margin, Alpine-Himalayan belt), and some localized, small-scale, scattered areas. In the oceans, the most conspicuous positive isostatic residuals correlate with large igneous provinces (LIP) and ocean islands. Negative isostatic residuals are present at some MOR segments, subduction zones and other locations (Louisville Pacific hotspot, mid Atlantic, the Philippine Sea Plate and Southern Ocean).

Thermal structure
The temperature distribution according to our parametrization is described by the crustal thickness, the depth of the thermal LAB (defined by the 1300 • C isotherm), the thickness of the thermal buffer, and the sublithospheric geothermal gradient (Sections 4.2.1, 4.2.2 and 5.1). The posterior Moho depth is allowed to depart from the prior value (from the CRUST1.0 model, Laske et al. 2013) within reasonable uncertainty bounds (see Section 5.1). The largest crustal thickness changes in WINTERC-G with respect to CRUST1.0 correspond to poorly sampled areas with sparse controlled source seismic coverage. For instance, we obtain a thinner crust in west Antarctica, central South America and central-eastern Africa, and a thicker crust in east Antarctica, eastern South America and western Africa (Fig. 8).
As the sensitivity analysis demonstrates (see Section 5.2), the LAB depth is the best resolved parameter in our model (Section 4.2.1). The LAB depth map derived from the inversion of surfacewave, elevation and heat-flow data in step 1 (Fig. 8) is characterized by deep LAB (>150 km), and, hence, cold lithosphere beneath most cratons and also some active areas (Andes, Persian Gulf). Among the Archean terranes the thickest (>260 km) and coldest lithosphere is found in the North American, Baltic and Western Australian cratons. Other cratons show moderately thick lithosphere (>180 km): Siberian, Tarim, Yangtze, Kalahari, Congo, West African, Amazonian and Greenland. Finally, a third group of cratons presents a comparatively thinner lithosphere (Indian, East Antarctica, Sao Paulo cratons) or even a lithospheric thinning (Sino-Korean, Aldan and Tanzanian cratons). Lithospheric thinning is observed in the Pacific North American and Chilean Antarctic margins, Atlas Mountains (northwest Africa), Altai-Hangay Mountains (northwest Mongolia), Oman Sea, Borneo and eastern Asian margin. In the oceans, the lithospheric thickness generally increases with age, from low values (60-70 km) around the MORs to 100-130 km in the oldest oceanic basins (Fig. 8). The thickness of the thermal buffer connecting the conducting lithosphere and the convective sublithosphere is poorly resolved, in general. Our results suggest a thickness of 5-15 km in oceanic and thinned continental areas. In the rest of the continental lithosphere, the thickness of the thermal buffer is 15-35 km, with values of >35 km generally found in the lithosphere surrounding thick cratons (steep LAB gradients) and in subduction zones.
According to WINTERC-G, in the sublithosphere the mantle remains in general cold beneath cratons and some adjacent areas (e.g. eastern American, south western African, south Australian margins). At depths >200-300 km, most cratonic cores are still 60-120 K colder than the ambient mantle, with the temperature contrast decreasing with depth. High temperature sublithospheric areas in our model tend to coincide with known hotspots. The largest thermal anomalies are located southeast of the Louisville hotspot, beneath the Tasman Sea, Iceland hotspot, Azores and Bouvet triple junctions, North America Pacific margin, and the East African Rift (Fig. 9). The high-temperature anomalies with respect to ambient mantle beneath hotspots range from 100-150 K at 200 km depth to 20-40 K at 400 km depth in WINTERC-G. In marine areas, our results suggest that temperatures are overall higher in the Pacific Ocean than in the Atlantic, Southern and Indian oceans. The amplitudes of the sublithospheric lateral thermal variations range from 200 K at 330 km to 120 K at 400 km.
The predicted surface heat flow in WINTERC-G mimics the lithospheric thickness map (Fig. 10). The cratonic continental cores are characterized by values of 45-55 mW m -2 , with the coldest geotherms beneath the greatest LAB depths (North America, Baltica and West Australia cratons). Younger continental terranes typically show 55-70 mW m -2 , with >70 mW m -2 in areas of shallow LAB (e.g. western Mediterranean and Pacific North American margins, East African Rift). In the oceans, there is a clear trend of decreasing heat flow from the MORs (in excess of 115 mW m -2 in some segments, in particular in the Pacific and Southern oceans) towards the older lithosphere (dropping below 35 mW m -2 in west Atlantic American margin).

Chemical structure
Our main compositional parameter in the mantle is Al 2 O 3 , with CaO and MgO being coupled to it based on global petrological data bases (Section 5.1, eqs 8 and 9). Our prior/reference model in the lithosphere corresponds to Al 2 O 3 = 3.6 wt% (and CaO = 3.1 wt%, MgO = 39 wt%) with fixed FeO = 8.1wt %, hence Mg# = 89.6. In the sublithosphere the prior/reference is Al 2 O 3 = 4 .5 wt% (and CaO = 3.9wt %, MgO = 37.9wt %) with fixed FeO = 8.05 wt %, corresponding to Mg# = 89.3 (Tables 1 and 2). As we define a unique compositional variable for the whole lithosphere in each column model, in terranes where a vertical chemical stratification is expected our inverted composition will only represent the average for the entire lithosphere, and similarly for the sublithospheric upper mantle.
The posterior lithospheric mantle composition tends to be refractory in the cratonic continental cores, especially in North America (Superior, Kaminak and North Atlantic cratons), West Australia (Yilgarn and Gawler Cratons), West African Craton and East Antarctica. The rest of the cratons show moderately refractory lithosphere and in some cases even a slightly fertile composition (with respect to the reference value), like the Sino-Korean, Aldan and Tanzanian cratons. The lithosphere from our inversion is also refractory in the Bay of Bengal, eastern Mediterranean, Izu-Bonin and Mariana arcs. Most of the eastern Asian and western North America margins, Central eastern Africa and Arabia show fertile compositions. Beneath the oceans, MORs are, in general, slightly refractory. In contrast, there are large patches of fertile lithosphere in the mid-Atlantic, Pacific (Hawaii, southeast of the Louisville hotspot), the Tasman Sea, the Philippine Sea Plate and the Kuriles Trench. The sublithospheric mantle composition is poorly resolved at this inversion stage, compared, for example, to the lithospheric composition and sublithospheric temperatures in areas of thin or moderately thick lithosphere.

Step 2, Gravity field: chemical structure
In the second step, we invert satellite gravity field data for the crustal density and mantle composition using the outcome from step 1 as prior information which, in the case of the temperature field, is kept fixed.

Data misfit
The gravity data (geoid, free air, gravity gradients) misfits are generally small (see Table 3), except for some localized areas in subduction zones (e.g. western Pacific, Java-Sumatra) or active tectonic areas (e.g. Tibet, the Andes, the eastern Mediterranean basin, Fig. 11). The relative surface-wave misfits from step 1 deteriorate slightly after the inversion of gravity data in step 2. In particular, standard deviation values increase by 0.05-0.2 % in the case of Rayleigh waves, and 0.06-0.13% for Love waves with respect to the misfits in step 1 (Fig. 12). That amounts to final standard deviation values of 0.16-0.47 % in the case of Rayleigh waves, and 0.25-0.58 % for Love waves after step 2 inversion. There are three causes for the increased surface wave misfit in step 2: i) the change in crustal density, which only affects the shortest periods (<50 s); ii) the lateral regularization applied to the input temperature and compositional fields from step 1, which affects smoothly all periods, and iii) the change in mantle composition (affecting density and seismic velocities), which increases the misfit moderately, chiefly at the intermediate and long periods (50-150 s).  Vs at the 660 km depth 5569.9 m s -1 * The three values separated by/correspond to the upper/mid/lower crustal layers. * * The prior for the Moho depth is given by Crust1.0 model (Laske et al. 2013). * * * The prior for the Moho depth is given by Crust1.0 model (Laske et al. 2013); the prior for the upper and mid crustal discontinuities is given by E+(Z Crust1.0 -E)/3 and E + 2(Z Crust1.0 -E)/3, respectively, where E is the bathymetry; for crustal thicknesses of <10 km only two crustal layers are used in the inversion. # The prior sublithospheric temperatures are vertically regularized and depend upon the thermal lithospheric thickness and the prior sublithospheric adiabatic gradient (see section 5.1 for further details and Table 1). Table 3. Residuals for the gravity field data. RMS of gravity field misfits for different inversion schemes in step 2, see text for further details. δρ c is the crustal density variation, δC mant = [δC Lit , δC Sublit ] is the mantle composition variation in the lithospheric and sublithospheric mantle. N F A, g i j , are the geoid, free air anomaly and gravity gradient tensor components i j, respectively. As the average crustal density and lithospheric mantle density (via composition) are modified to match the gravity field in step 2, the isostatic balance (i.e. isostatic residual elevation) is changed accordingly. Most subduction zones appear consistently as short-wavelength, high-amplitude, negative isostatic residuals (isostatic elevations > observed elevation). Other areas of moderately high, negative isostatic residuals are the south Australian margin, Altai Mountains in central Asia, Verkhoyansk Mountains in Eastern Russia, the Horn of Africa, the Barents and Kara Seas in the Artic, and most of Antarctica. The opposite situation, that is positive isostatic residual elevation (isostatic elevations < observed elevation) is found in the Andes, Tibet, the Iceland hotspot, the East African Rift and various LIPs and ocean islands, particularly in the Atlantic and Indian oceans. Cratons with a significant thermal root (except for most of East Antarctica) show moderately positive isostatic residual, with the East European and Greenland Cratons having the largest amplitudes among them (Fig. 13).

Crustal density
The main changes in the posterior crustal density after the step 2 inversion span across the short wavelength field and are related to margins, subduction zones and LIPs. The average crustal density in continental areas in WINTERC-G is within the 2800-2900 kg m -3 range, and its lateral distribution remains almost unchanged from step 1 to step 2 (Fig. 14). In the oceans, the crust is much thinner than in continents generally and the lateral density variations required to match gravity data are, consequently, larger. The lowest oceanic densities are located beneath LIPs and ocean islands, and, to a lesser extent, passive continental margins and most MORs (except the East Pacific rise).

Mantle composition and density
The posterior mantle composition in WINTERC-G after adding gravity field data in step 2 inversion differs only moderately from the prior/initial composition distribution from the step 1 inversion: <1 wt% in Al 2 O 3 , corresponding to approximately 0.45 in Mg# in general (Fig. 14). As a result of adding the gravity field data into the model, the final composition of the mantle is more refractory (i.e. its density decreases) than the prior/initial composition in some subduction zones (e.g. Izu-Bonin and Mariana, Caribbean, Java-Sumatra), passive margins (south Australian, Atlantic North American, Barents and Kara Seas, south Atlantic African and Indian) and eastern Antarctica. By contrast, the final composition of the mantle is more fertile (i.e. its density increases), predominantly around most MORs, the Philippine Sea Plate, the East African Rift and the Tasman Sea (Fig. 14). Within the lithosphere, the mantle composition is not substantially changed when gravity data is added in step 2. In other words, the correction to the lithospheric composition required to match the gravity-field data is small compared to the amplitude of the lateral variations in composition constrained by surface-wave, elevation and heat flow data in step 1. However, the situation is different in the sublithospheric mantle, where the posterior mantle composition is relatively poorly constrained by the input data from step 1 inversion, and remains close to the prior/reference model in most areas. In the sublithospheric mantle, the satellite gravity data used in step 2 inversion constrain the density/compositional field leading to a distinct long wavelength chemical pattern with ±0.5 wt% in Al 2 O 3 , with respect to the initial/reference composition: fertile sublithosphere in the Atlantic and Indian oceans, South America, Eastern Europe and the Sunda Plate; refractory sublithosphere in the Pacific Ocean, west Australia, east Antarctica, Atlantic North American margin and Tibetan Plateau (Fig. 14).

Review of previous studies
The global lithosphere and uppermost mantle model presented in this work is jointly constrained by state of the art surface wave tomography, surface elevation and heat flow, and satellite gravity field data. Seismic tomography models are regularly used to infer mantle temperatures as the elastic moduli depend on the mantle geotherm (e.g. Goes  ). Yet, separating thermal and comparatively weaker compositional effects in seismic data is far from easy (e.g., Afonso et al. 2013a, b). A number of previous studies have combined seismic tomography data (or models) with surface elevation (isostasy) and gravity field data to infer the upper mantle structure. The rationale is that the combination of gravity observations and isostasy together with seismically inferred temperatures helps constraining the upper mantle composition (e.g. Deschamps et al. 2001;Simmons et al. 2010;Cammarano et al. 2011;Afonso et al. 2019). Linearized scaling factors between seismic velocities and densities are typically used to couple seismic and gravity data, sometimes assuming constraints from mineral physics. While these approaches are useful and straightforward, they tend to ignore petrological non-linear effects, including phase transitions, and thermal considerations (lithospheric versus Isostatic residual elevation Figure 13. WINTERC-G: residual isostatic elevation (observed minus calculated) after step 2 inversion. Negative residual elevation (blue) means that the isostatic elevation derived from WINTERC-G density model is above the observed elevation and vice versa. Negative residual elevation could indicate a density deficit within the lithosphere or a negative dynamic topography (sublithospheric) contribution required to balance out the surface elevation.
sublithospheric geotherm and the effect of temperature on the seismic velocity-density relationship). In addition, the discrepancies among alternative tomographic models based upon different data or regularization schemes translate into discrepancies in temperature models. Density anomalies estimated by the conversion from seismic velocity anomalies tend to over-predict the amplitude of the gravity field signals (e.g. Faccena et al. 2004;Cammarano et al. 2011). Inversion of surface wave data for mantle structure using a thermodynamically driven approach has been applied, so far, to a few selected columns globally or to a number of tectonically regionalized terrains (Khan et al. 2009(Khan et al. , 2011. WINTERC-G is built within an integrated geophysical-petrological thermodynamically consistent framework, and is parametrically defined in terms of upper mantle thermochemical structure. The complementary sensitivity of our input data sets allows us to constrain the LAB depth, separate mantle temperature and composition anomalies, and estimate dynamic versus isostatic contributions to surface elevation in a consistent manner. The thermal lithospheric thickness in WINTERC-G is in general agreement with previous seismic (or seismically derived) models (Priestley et al. 2006;Priestley & McKenzie 2013;Simmons et al. 2010;Cammarano et al. 2011;Cammarano & Guerri 2017;Steinberger 2016;Dalton et al. 2017;Afonso et al. 2019) and earlier thermochemical models incorporating surface wave data at lower lateral resolution (Khan et al. 2009(Khan et al. , 2011. In terms of upper mantle composition and density, our model offers a new state of the art image at a high resolution globally (225 km average interknot spacing). WINTERC-G shows rather different chemical patterns in the lithospheric and in the sublithospheric mantle, broadly in line with earlier work results (Khan et al. 2009(Khan et al. , 2011. A commonly taken approach to infer the mantle geotherm is to transform seismic velocity anomalies (generally Vs as their amplitudes are less sensitive to model regularization and damping than Vp) into temperatures, accounting for possible compositional effects in a more or less sophisticated manner. Steinberger & Becker (2018) converted 5 seismic Vs tomography models into temperatures and calculated lithospheric thickness maps assuming a simple thermal half-space cooling model calibrated in oceans and including ad hoc corrections for possible compositional effects. These authors found a thick lithosphere (up to 250 km) in many cratons and an overall good correlation with independent elastic thickness estimates, suggesting a connection between the strength and seismically derived mantle temperatures. Priestley et al. (2006) parametrized Vs in terms of temperature using an empirical approach calibrated with thermobarometric data and temperature estimates from an oceanic plate cooling model; the temperature field converted from their surface wave Rayleigh tomography model was subsequently mapped into thermal lithospheric thickness. In subsequent work, Priestley & McKenzie (2013) adopted an experimentally derived model for Vs, seismic attenuation and viscosity, where the viscosity is used to calculate the Maxwell relaxation time and the dimensionless frequency (McCarthy et al. 2011). The lithospheric Vs model derived by Priestley & McKenzie (2013) includes an order of magnitude more data with respect to the earlier work in Priestley et al. (2006), resulting in a higher lateral resolution (about 250 km). Cammarano & Guerri (2017) converted 11 different global seismic tomography Vs models to upper mantle temperature maps using a thermodynamically driven formalism and the data base of Stixrude & Lithgow-Bertelloni (2005. These authors considered laterally constant end member compositions (pyrolite and refractory Archean), and additional constraints from regionally extrapolated surface heat flow values (Davies 2013) and ocean lithosphere cooling models. Cammarano & Guerri (2017) reported disparities in absolute Vs values and its vertical gradient (equivalent to temperature differences of 200-600 K) larger than Vs variations of chemically origin (equivalent to temperature anomalies of 100 K). At 180 km depth cold regions correspond to cratons, with overall larger lateral temperature variations compared to oceans (Cammarano & Guerri 2017). Dalton et al. (2017)  (2010) to compute attenuation and Vs from temperature explicitly, excluding possible melt related effects. Cratons appear as regions with low attenuation and thermally thick lithosphere, particularly in the N. American, Amazonia and NW Africa Cratons, Baltic and Siberian Shields and the northern Gawler Craton (Dalton et al. 2017). Deschamps et al. (2001) used a surface and body wave tomography model along with geoid anomaly data to estimate a scaling factor connecting relative density and Vs anomalies in the mantle. These authors computed separate averages for ocean and continents, finding in general smaller absolute values than other scaling studies, and differences between oceans and continents up to a depth of 260 km; both oceanic and continental lithospheric mantle showed positive scaling factors in the topmost mantle going to negative or around zero values at different depths, i.e. 220 km in continents and 140 km in oceans (Deschamps et al. 2001). Simmons et al. (2010) presented a global whole mantle model, GyPSuM, based on a joint linearized inversion of P and S wave arrival times, free air gravity anomaly, tectonic plate models, residual crustal isostatic topography, and ellipticity of the core-mantle boundary. GyPSuM was built assuming scaling ratios for density, Vs and Vp bounded by empirical mineral physics considerations. The volumetric distribution of velocities and densities in GyPSuM implicitly considers that most of the 3-D variation is coming from temperature rather than compositional differences, which in the upper mantle section of GyPSuM are only important under cratons. According to Simmons et al. (2010) the fastest and coldest cratons are North America, East Europe, Siberia, North Africa and West Australia. In terms of density in GyPSuM the largest negative density non-thermal contributions are located beneath the North American, West African and East European cratons. Cammarano et al. (2011) derived temperatures and densities for 4 shear wave tomography models and computed the corresponding synthetic surface topography and geoid anomalies (including harmonic degrees < 10) for 4 different compositional models (pyrolite, Tecton, Proton and Archon as per Griffin et al. 2009). They found that the discrepancies among tomography models led to larger variations than the uncertainties coming from modelling the elastic and anelastic Vs values from mineral physics. Although Cammarano et al. (2011) did not, in general, match the observed geoid and surface elevation within reasonable bounds, they were able to show that the combination of gravity field data together with seismically inferred temperatures helped in constraining the upper mantle composition. Afonso et al. (2019) presented a global model inverting gravity field satellite data and surface elevation for crust and upper mantle density and temperature using as prior information data from 6 global tomography models along with the crustal model CRUST1.0 (Laske et al. 2013). In particular, Afonso et al. (2019) used a hybrid lithospheric model biased towards LITHO1.0 (Pasyanos et al. 2014) and SL2013sv (Schaeffer & Lebedev 2013) over continents, whereas in oceans they considered a plate cooling model constrained by bathymetry, surface heat flow, petrological and mineral physics data (Grose & Afonso 2013). The results derived from Afonso et al. (2019) are in general consistent with SL2013sv (Schaeffer & Lebedev 2013) and SEMUCB-WM1 (French & Romanowicz 2014) with thick stable continental lithosphere being denser than oceans (especially MORs).
More comparable to our current work are studies inverting surface wave tomography data directly for temperature and composition. Khan et al. (2009Khan et al. ( , 2011 inverted global surface wave data to map the temperature and composition of the Earth's mantle using a thermodynamic formalism similar to the one used in this work. These authors inverted, within a Bayesian probabilistic framework, the Rayleigh and Love dispersion curves (fundamental and higher modes) extracted from the 5 • lateral resolution phase velocity maps of Vissers et al. (2008b). Khan et al. (2009) inverted for the mantle thermochemical structure in 12 1-D columns in selected locations, whereas Khan et al. (2011) inverted averaged dispersion curves in 27 regions following the tectonic regionalization model GTR1 from Jordan (1981). Khan et al. (2009) concluded that, in general, there is a gradient of decreasing fertility with depth in the upper mantle (represented as the basal fraction). A relatively fertile lithosphere is found in columns sampling subduction zones (north South America), plumes (Iceland, East African rift), and in the East European craton (the most fertile column and the only one with an increasing fertility gradient with depth); in contrast, the most refractory lithosphere is located beneath the East Pacific Rise (Khan et al. 2009). Regarding temperature, Khan et al. (2009) found that cratons and continental areas are colder than oceans (up to 100 K). A similar conclusion was derived by Khan et al. (2011): continental roots are colder than oceanic mantle down to 350 km depth. In addition, Khan et al. (2011) inferred that: i) most cratons are colder than the surrounding mantle down to at least 250 km; ii) oceans are increasingly colder with age, and iii) there is a reversal of the temperature pattern atop of the transition zone (400 km), where continental mantle turns warmer than oceanic mantle. Our results agree with Khan et al. (2011) in i) and ii) but not in iii) at least in the thickest cratons, as we will discuss in the next section.

Stable continental cores: cratons
Cratons are the building blocks of continents and, as such, they preserve traits of tectonic activity over billions of years. It is well known that most cratons appear as high velocity regions in seismic tomography images (e.g. Woodhouse & Dziewonski 1984;Gung et al. 2003;Schaeffer & Lebedev 2013). Cratons are also characterized by refractory compositions at least in their shallow part, according to available xenolith samples (e.g. Carlson et al. 2005;Griffin et al. 2009). Our inversion results confirm that most cratons are thick, cold and relatively refractory compared to the global average composition; however, there are considerable differences among them. According to WINTERC-G, the sublithospheric mantle beneath the thickest cratons on Earth (e.g. East European, North American, West Australian) remains cold compared to the surrounding ambient mantle down to the top of the transition zone (Fig. 9). Among the thick cratons, the North American and Siberian cratons show a comparatively more refractory lithospheric mantle composition and, therefore, are on average less dense than the other thick cratons at lithospheric depths. The most fertile and densest sublithospheric mantle in our model globally is located under the East European Craton. In contrast, other thick cratons seem to be partially underlain by refractory sublithospheric mantle, e.g. the eastern and southwestern margins of the North American and West Australian cratons, respectively ( Fig. 14 and Fig. F1). At the other side of the spectrum, we find thinned cratons, with no cold/fast keel, that are underlain by warm and fertile sublithospheric mantle (Sino-Korean, Aldan and Tanzanian Cratons.). This group of cratons is defined by a fertile lithospheric mantle and, from a structural point of view, they do not differ significantly from younger continental lithosphere elsewhere ( Fig. 14 and Fig. F1).
Chemically, we can identify three groups of cratons based on their lithospheric composition (Fig. 14): i) refractory (Al 2 O 3 < 2.5 wt%, or Mg# > 90.1): NAm., EAn, WAf., Sib. and Tar. Cratons; ii) intermediate (2.5 wt%, < Al 2 O 3 < 4 wt% or 89.2 < Mg#< 90.1): EEu., WAus., Arab., Kal., Con., Gre. and SP. Cratons and iii) fertile (Al2O3 > 4 wt % or Mg# < 89.2): Ind., SK, Ald. and Tan. Cratons. The EEur. and Amz. and SP Cratons have areas that fall within both the fertile and intermediate groups. As for the sublithospheric mantle, there are, broadly, two end-member compositional groups: refractory (Tar., WAus., NAm., EAn. and Con. and fertile (EEur., Sib., And., SK, Arab. and Amz). The rest of cratons range somewhere between those two end members. Our chemical model over continents is broadly consistent with the regionalized integrated Bayesian inversion of Khan et al. (2011) except for the Ind. and WAf. Cratons, where WINTERC-G shows comparatively less refractory and less fertile lithospheric compositions respectively. Interestingly, in an earlier work Khan et al. (2009) found that the most fertile mantle region sampled in their study and the only column location with an increasing fertility gradient with depth was located in the EEur. Craton, in line with our results.
Thermobarometry data from mantle xenoliths record the palaeogeotherm and composition at the time of their inclusion in the host magma, which does not have to coincide with the present day situation in general. In Fig. 15 we compare our model thermal results against thermobarometric data in four cratonic locations (Garber et al. 2018, and references therein): Udachnaya (Sib.), Slave (Nam.), Kaapvaal (Kal.) and Adelaida folded belt (WAus.). In the cases of Siberia and Slave we obtain excellent agreement between WINTERC-G and xenolith thermobarometric geotherms up to 150 km depth. Below 150 km the xenoliths record a hotter geotherm likely reflecting the tectonothermal event originating them (Figs 15a and b). Regardless of the nature of such magmatic events, our model shows that the (thermal) integrity of the two cratons has been preserved till present, i.e. a thermal relaxation rather than a lithospheric erosion process followed the magmatic event recorded in these xenoliths. In Kaapvaal we observe the opposite situation: below 100 km the present-day geotherm from WINTERC-G is hotter (LAB depth of 180 km) than the palaeo-geotherm recorded by the xenoliths (LAB depths of 200-260 km, Fig. 15c). We interpret this feature in relation to the deep African superplume thinning the lithosphere after the timing recorded in the xenoliths, 90-120 Ma. Finally, the xenoliths from Adelaida folded belt show a considerably colder geotherm compared to our model, which matches perfectly with the present-day geotherm in the adjacent Gawler craton (Fig. 15d). The implication is that, at the time recorded by the xenoliths, the Adelaida folded belt was part of the Gawler craton.

Upper mantle compositional heterogeneity
The long wavelength lithospheric chemical pattern we obtain in WINTERC-G is correlated with the thermal structure. In continental areas WINTERC-G shows wide chemical variability for relatively thin lithospheres (<140 km) whereas for thicker lithospheres there is a tendency for fertility to decrease with increasing thickness (Fig. 16). The former pattern is in agreement with observations from mantle xenoliths and peridotitic massifs showing that cratonic terranes are on average more refractory than younger and active terrains (e.g. Canil 2004;Griffin et al. 2009). In the oceans the lithospheric composition predicted by WINTERC-G is most refractory beneath MORs, as expected from the melting processes that originate the oceanic crust, and as reflected also in the mostly infertile compositions reported for MOR abyssal peridotites (e.g., Warren 2016). The lithospheric mantle composition becomes more fertile as the oceanic lithosphere ages and thickens likely due to thermal accretion of relatively fertile sublithospheric mantle at the base of the lithosphere. The lithospheric mantle fertility in oceans according to our results seems to peak for a lithospheric thickness of about 80-90 km which corresponds to a 40-80 Ma old lithosphere.
In the sublithosphere the mantle composition distribution from our inversions is rather different from that in the lithosphere above. WINTERC-G model shows that around the Pacific ridge the sublithospheric composition is averagely more refractory than in any other ridge; conversely the eastern South Indian and the mid-Atlantic ridges (especially nearby the Romanche transform) are characterized by relatively fertile compositions (Fig. 14). It is remarkable that our results, entirely derived from geophysical observations with no direct prior petrological constraints (e.g. from xenoliths), coincide partially with geochemical analysis of abyssal peridotites. According to the mineral modes, the most infertile ridge segment sampled by abyssal peridotites is the east Pacific rise, whereas the most fertile one are the south Indian and mid-Atlantic ridges (Warren 2016, and references therein). Under the continental blocks our results suggest that there are considerable differences among the sublithosphere under cratonic cores. Whereas the composition is fertile beneath the Eastern European and South American cratons, underneath other cratons we found a relatively refractory composition: east Antarctica, Western Australia, North America, Congo and Tibetan Plateau (Fig. 14).
While mantle temperature represents approximately the presentday physical state, the compositional field may reflect not only the current state of affairs but also traits of longer timescale processes yet not as long as the ones responsible for lithospheric chemical variations. Such sublithospheric processes may have left a chemical imprint in the upper mantle, further conditioning its dynamics (e.g. Yu et al. 2018;Munch et al. 2020;Yan et al. 2020). The mantle is commonly thought to be chemically heterogeneous at different scales (e.g. Anderson 2007). The continuous influx of subducted material (basalt, harzburgite and peridotite that fundamentally make up slabs) over billions of years is certainly a major source of heterogeneity (e.g. van Keken et al. 2002). The composition of abyssal peridotites collected from MORs also indicate mantle compositional heterogeneities (e.g. Warren 2016, and references therein), whereas the composition of mid oceanic ridge basalts (MORB) has been interpreted in terms of large scale chemical (e.g. Niu & O'Hara 2008) and temperature (e.g. Dalton et al. 2014) variations.
The pyrolite mantle compositional model represents a hypothetical fertile mixture of basaltic and dunitic components (in a 1 to 3-4 ratio, respectively) that upon melting is able to generate a typical basaltic magma leaving a refractory dunitic peridotite. Standard pyrolite compositions are derived from major element considerations under a number of assumptions about melting and magma parenting (Anderson 2007). When the subducted basaltic crust reaches depths >50 km, it transforms into eclogite. The parametrization in WINTERC-G is petrologically regularized in a way that the compositional space is described only by variations in Al 2 O 3 (or Mg# equivalently). Such parametrization follows a fertility trend observed in natural rocks that can be qualitatively regarded in terms of variable amounts of a basaltic/eclogitic component. In our model, Al 2 O 3 -rich compositions are more basaltic/eclogitic and representative of fertile mantle, whereas Al 2 O 3 -poor compositions are more dunitic thus reflecting a higher proportion of the refractory residue left after mantle melting. We do not account for possible departures of this scheme such as an additional pyroxenite (third) chemical endmember which, in any case, would be volumetrically small < 6% (e.g. Hirschmann & Stolper 1996;Warren 2016). The lithospheric mantle compositions we obtain are, on average, more fertile than the values reported in petrological data bases, especially in cratons. Such discrepancy has also been observed by Cammarano et al. (2011) combining seismic, topography and geoid data. A plausible explanation is that our model is parametrized with a unique lithospheric mantle layer. Therefore, any possible vertical chemical stratification in cratons will be averaged out. Petrological data from mantle xenolith suites frequently report that the upper part of some cratons is more refractory and texturally different from the lower part (e.g. Carlson et al. 2005;Griffin et al. 2009). The absolute compositional values in our model have to be regarded with some caution; they are dependent, among other things, on the calibration constant used to determine the absolute surface elevations (Section 4.2.8). However, although the average value of the compositions determined here could be shifted depending on the parameters used to calibrate surface elevation, the relative variations should be relatively robust.

Mid oceanic ridges (MORs) and ridge push
Mid oceanic ridges are one of the main features of mantle convection and plate tectonics. At MORs new oceanic crust is created by decompression melting of adiabatically ascending mantle in response to plate separation. Studies addressing the thermal structure beneath oceans from global tomography models introduce, in general, in one way or another, prior information from either the half-space or the plate cooling models ( oceans and instead assume lithospheric thermal steady state everywhere with an extra transient contributions in young oceans (<70 Ma, Section 4.2.2). A simple inspection of WINTERC-G compositional maps suggests that the mantle beneath the Atlantic and Indian oceans is, on average, colder and more fertile than that in the Pacific Ocean ( Fig. 9 and Fig. 14). That translates into low mantle densities in the Pacific (fast spreading ocean) with respect to the other major oceans (slow spreading, Appendix F). Furthermore, looking at intra-ocean variations along MORs in WINTERC-G we also see a trend that connects temperature, composition and density with oceanic spreading rate. In Fig. 16 we show the temperature, composition and density along an area encompassing the lithosphere with age <15 Ma along MORs for all oceans. We take the oceanic lithospheric age and spreading rate from a grid based on magnetic anomalies (Müller et al. 2008). We bin the mantle density, temperature and composition in 10 mm yr -1 spreading rate intervals and compute the dispersion for each bin. Our results show a correlation between spreading rate and mantle structure. In particular, for spreading rates of < 50-60mm yr -1 , mantle fertility and density decrease and temperature increases for increasing rate. At > 50-60mm yr -1 the mantle structure becomes less clearly correlated with spreading rate, and flattens more or less uniformly (especially density). The global MORB data base (PetDB, https://search.earthchem .org/) shows that the mantle source for slow and deep MORs is a fertile and dense peridotite whereas for fast and shallow MORs the peridotitic source is more refractory and less dense (Niu & O'Hara 2008). Our density model, constrained by surface waveform, gravity field, surface heat flow and elevation data integrated within a thermodynamic framework, is compatible with PetDB petrological data base, and suggests that the dynamics of MORs, i.e. mantle upwelling, is likely connected to composition and temperature (ultimately density and viscosity) in the upper mantle. This would allow us to tentatively relate ridge push, as an expression of the buoyancy contrasts across MORs, and spreading rate for slow spreading ridges (<55 mm yr -1 ); in contrast, for fast spreading ridges slab pull would play a major role as a far-field force not related to the local ridge density column. Our inversion results suggest that upper mantle temperature variations at depths >200 km across different ridge segments are modest (10-30 K) and that chemical changes (0.2 wt% in Al 2 O 3 , or 0.09 Mg# according to our petrological parametrization) would play a significant role in the inferred sublithospheric mantle density variations (3-7 kg m -3 , Fig. 16). Morphological differences between slow and fast spreading ridges have long been known. Fast spreading MORs exhibit in general shallow axial depths and smooth topography, whereas low spreading MORs typically show axial rift valleys and rough topography (e.g. Menard 1967;Small 1994). The (full) spreading rate varies across MORs from 170 mm yr -1 at the East Pacific Rise to <10 mm yr -1 at the Gakkel Ridge in the Arctic Ocean (e.g., Edmonds et al. 2003). One of the fundamental parameters describing MORs is the axial depth: in slow MORs (<55 mm yr -1 ) it is anticorrelated with spreading rate with a significant scatter, while in fast MORs (>80 mm yr -1 ) it is spreading rate independent and comparatively less variable (Small 1994;Rowley 2019). Petrological analysis of MORBs characterizes the original melting conditions and mantle source composition (e.g. Niu & O'Hara 2008, and references therein). MORB melts are not primary magmas but secondary products of crustal level fractional crystallization processes which have to be corrected for in order to extract a reliable mantle chemical signal (e.g. O'Hara & Herzberg 2002). The composition of locationaveraged MORB samples correlates with axial depth in MORs at global scale (Klein & Langmuir 1987). This has been interpreted in terms of depth and degree of melting variations in a chemically homogeneous mantle. In that case the compositional differences among MORB samples translate into temperature difference up to 250 K (Klein & Langmuir 1987;Langmuir et al. 1992;Dalton et al. 2014). Other authors have argued that such lateral temperature variations arise from neglecting mantle heterogeneity (e.g. Shen & Forsyth 1995;Presnall et al. 2002) and biases in the fractionation correction, thus liming the thermal change in MORs to <50 K (Niu & O'Hara 2008) in line with our results.

Average 1-D model of the Earth
Unlike commonly used 1-D Earth reference models, for example AK135 (Kennett et al.1995), PREM (Dziewonski & Anderson 1981), the 1-D global average of WINTERC-G does not show a Vs jump or change in slope at around 200 km depth (Fig. 17). Instead, our results show that it is possible to explain global surface wave data (along with the other constraining data sets considered here) with an average velocity structure compatible with a mantle geothermal gradient of 0.55-0.6 K km -1 and a potential temperature of 1300-1320 • C (for depths >200 km), close to previously estimated values in the literature (e.g. McKenzie & Bickle 1988;Fang & Niu 2003;Herzberg et al. 2007;Katsura et al. 2010). According to WINTERC-G, the amplitude of the maximum lateral temperature variations (cratons versus hotspots) in the sublithosphere at the top of the transition zone (400 km depth) is about 120 K. This agrees well with other seismic observations suggesting maximum lateral temperature variations of at most ∼120-150 K at the 410-km olivine to wadsleyite transition (e.g. Chambers et al. 2005, andreferences therein, Kuskov et al. 2014;Munch et al. 2020). The global average used in Schaeffer & Lebedev (2013) also shows a Vs gradient in the 100-200 km depth range. In contrast, in WINTERC-G the average velocities decrease strongly in the first ∼80 km and then increase steadily down to the top of the transition zone. The strong negative Vs gradient up to 80 km is due to the lithospheric conductive geothermal gradient corresponding to the average lithospheric thickness value in oceans. At greater depths the moderate positive Vs gradient arises from the average adiabatic thermal gradient. The average density distribution in WINTERC-G mimics the velocity average although the density decrease in the upper 80 km is less steep than in the case of the velocities (Fig. 17).
The thermal thickness of the lithosphere (i.e. depth of the 1300 • C isotherm in WINTERC-G) controls the vertical velocity structure in our model. Both pressure and temperature have a first order impact in mantle seismic velocities. Disregarding phase transitions, a pressure increase produces an increase in velocities whereas a temperature increase decreases seismic velocities. However, while the pressure gradient with depth does not change much laterally, the geotherm varies dramatically across the LAB. In this way, the thermal buffer (bounded atop by the LAB as per our definition) where the transition between conductive and adiabatic thermal regimes takes place, corresponds naturally with a relative low velocity zone. The thickness and amplitude of such low velocity zone depend on the lithospheric thickness: narrow and steep in thin and hot lithospheres, wide and gradual in thick and cold lithospheres. The former would justify why, regardless of any water or melt related effects, seismic receiver function techniques image a clear LAB signal in thin continental areas or oceans (e.g. Rychert et al. 2005;Kawakatsu et al. 2009;Hansen et al. 2015) but sometimes struggle to do so in thick, cratonic terranes (e.g. Kind et al. 2012). The mechanical lithosphere or stagnant lid is defined by a threshold viscosity value below which the mantle is not mechanically competent. Such mechanical boundary is located in the vicinity of our thermally defined lithosphere but its actual depth will be dictated by pressure (depth) and other minor mantle components such as water and melt. For instance, taking the thermal lithosphere as reference, the mechanical lithosphere will lie deeper in cratons than in thinned lithospheres because for a fixed temperature the mantle viscosity increases with pressure. Therefore, our temperature-based LAB will likely underestimate the mechanical lithosphere in thick cratons, and vice versa in younger, warmer and thinner lithospheres.

Dynamic topography
The Earth's surface elevation is the result of various processes acting at different timescales. It is well know from analysis of the Earth's gravity field that most of the surface elevation is isostatically compensated within the outermost mechanically competent layer: the lithosphere (e.g. Ricard et al. 2006). In other words, lateral density variations within the crust and lithospheric mantle dictate, to a large extent, the long wavelength surface elevation. In this light, the surface elevation can be regarded as the expression of the pressure balance at depth below the highly viscous lithosphere. The concept of dynamic topography stems from the transient (tens of Ma) surface deformation induced by mantle flow. As opposed to isostatic elevation that is in quasi-static equilibrium, dynamic topography depends upon moving density anomalies and viscosity stratification (e.g. Flament et al. 2013). A proxy to dynamic topography is the so called residual isostatic topography or elevation: the difference between the observed elevation and an isostatically compensated topography according to some predefined lithospheric model. Such residual topography, disregarding density and layer geometry uncertainties inherent to any lithospheric model, would originate from density variations underlying the lithospheric thermal boundary layer.
The combination of complementary data sets used in this work allows us to estimate the residual isostatic topography directly. Unlike on other studies, here both the crustal and lithospheric geometry, as well as their density structure, are jointly determined using the same constraints, minimizing, in this way, their uncertainties. Furthermore, in this work we do not consider any explicit agelithospheric depth oceanic model constrained by corrected bathymetric and magnetic age data. Instead, we constrain the mantle temperature, composition and associated density from our input global geophysical data sets. Considering this, the similarity between our predicted residual isostatic bathymetry (Fig. 13) and the estimates based on extensive oceanic crustal data sets (Hoggard et al. 2017;Rowley 2019) is remarkable. The most striking difference with our results is perhaps in the Argentinian Basin, off the SE South American margin, one of the world largest negative isostatic anomalies in amplitude and lateral extension (Hoggard et al. 2017;Rowley 2019); in contrast, in our model the residual anomaly in that location is only mildly negative. This can be understood if we consider that the thermal lithosphere in WINTERC-G is 20-30 km thicker (and therefore colder and denser) than its counterpart of the same age in the conjugate South African Atlantic margin. Therefore, the comparatively deeper bathymetry in the Argentinian basin arises as a natural consequence of isostasy rather than a dynamic contribution in WINTERC-G. The North Atlantic North American margin in our model is in a situation similar to that in the Argentinian Basin. Our results show that part of the topography signal that has been previously identified as residual (or dynamic) can be explained isostatically by lithospheric density variations compatible with our constraining seismic and satellite gravity data sets.
The RMS amplitude of the residual isostatic anomaly in our model is 264 m. This is comparable but lower than the values previously reported in the literature, 390-880 m (Flament et al. 2013;Hoggard et al. 2017;Rowley 2019). We note that since our inversion uses, in step 1, surface elevation under local isostasy as a constraint, our estimate of the residual isostatic anomaly would be biased towards lower values (see Appendix E).
Compared to oceans, the predictions of the residual topography onshore from earlier studies are more diverse, due to the different crustal models, data and methodologies used in those works (e.g. Panasyuk & Hager 2000;Kaban et al. 2003;Afonso et al. 2019). Over continental areas, the largest positive residual topography predicted in WINTERC-G (>500 m) is located in the East European craton, Greenland, the Andes and the Himalayas. The residual topography is moderately positive in Brazil, India, East African Rift, West Australian, Siberian, South and Northwestern African cratons according to our results. In contrast, central Asia, most of Antarctica, southern South America and, to a lesser extent, central Africa are characterized by negative residual topography values (Fig. 13).
In the oceanic lithosphere the bathymetry (and surface heat flow) is well explained isostatically by means of cooling models in which the thermally defined oceanic lithosphere thickens as it moves away from MORs. The simple half-space cooling model where the lithospheric depth increases with the square root of oceanic age (e.g. Turcotte & Schubert 2002) explains well the structure of oceanic lithosphere younger than 80 Ma if we put aside distortions induced by hotspots. For older lithospheres, the half-space cooling model overpredicts the bathymetry (or the lithospheric thickness). Oceanic lithosphere of >80 Ma age is better described by an asymptotic plate cooling model (e.g. Stein & Stein 1992). Regardless of the cooling model/mechanism considered, a crucial aspect in determining the residual or anomalous bathymetry over oceans is the data used to constrain the age-lithospheric depth relationship. A number of recent studies focused on the anomalous bathymetry of oceans incorporate large global data sets with extensive lateral coverage. Hoggard et al. (2017) compiled nearly 2000 oceanic controlled source seismic lines ranging over >300 000 km to determine waterloaded basement depths corrected for sedimentary and crustal loading. These authors combined the basement depths with the magnetic crustal age grid in Müller et al. (2008) to calculate a best-fitting agelithospheric depth ocean model from which residual elevation is estimated. Rowley (2019) used an approach similar to that of Hoggard et al. (2017) to correct for the sediment load using measured wet bulk densities from about 1200 deep-sea drilling sites taken from the International Ocean Discovery Programme (IODP). Rowley (2019) derived an age-lithospheric depth relationship for oceans combining sediment-load-corrected depths, MORs axial ridge depths and about 90 000 globally distributed magnetic reversal picks explicitly dated. The residual isostatic topography models of Hoggard et al. (2017) and Rowley (2019) show mutually similar patterns, also similar to the values given by WINTERC-G: positive residuals in regions affected by hotspot activity (e.g. Iceland, Azores, Canaries, Afar, Crozet, Kerguelen, Hawaii) and negative residuals along the eastern North American, southern Australian and eastern South American margins.
Continental lithosphere is, in comparison, more complex than oceanic lithosphere. Therefore, continental surface topography reflects a much longer history and a laterally diverse density structure, compared to oceans. In continents, residual topography is usually determined assuming a crustal (e.g. Steinberger et al. 2001) or a lithospheric density model (e.g. Panasyuk & Hager 2000, Kaban et al. 2003 and integrating its isostatic response over depth. Crustal compilation models based on controlled source seismics like CRUST2.0 or CRUST5.1 (Mooney et al. 1998;Bassin et al. 2000) have been typically considered (e.g. Panasyuk & Hager 2000, Kaban et al. 2003. As for the lithospheric mantle, different data sets and models are taken into account. Panasyuk & Hager (2000) assumed the tectonic regionalization GTR1 from Jordan (1981) assigning a density contrast of +0.2% to the continental tectosphere (with respect to oceans) with its base set at constant depths of 100, 180 and 200 km in Phanerozoic orogenic zones, Phanerozoic platforms and Precambrian shields and platform, respectively. Kaban et al. (2003) derived lithospheric mantle densities from a temperature model based on surface heat flow and crustal radiogenic measurements (Artemieva & Mooney 2001) in stable continental platforms; in other continental areas not covered by thermal data or likely out of thermal equilibrium as well as oceans these authors converted seismic velocity anomalies from S20 tomography model (Ekström & Dziewonski 1998) to densities. Some common features among the estimates of Panasyuk & Hager (2000) and Kaban et al. (2003) include negative residuals in eastern North America, eastern Europe, western South America and positive residuals in western North America, East African Rift, Asia and SW India.
In continental areas, our predicted residual isostatic topography shares fewer similarities with previous estimates, largely based on CRUST1.0 crustal model and earlier versions of it (cf. Flament et al. 2013, and references therein). Even in those cases where the sign of our predicted residual topography matches the values from previous studies, our results show lower amplitudes: for example positive residuals in western North America, Brazil and the East African Rift, and negative residuals in central and northern Africa, Bay of Bengal and Kara Sea. In other areas we find conspicuous differences between WINTERC-G predicted values and some earlier works. For instance, in eastern Europe and Greenland, and, to a lesser extent, northern and western Australia and Siberia, our residual topography is positive (400-700 m, Fig. 13) rather than negative (e.g. Panasyuk & Hager 2000;Kaban et al. 2003). Conversely, our predicted residuals are small and around zero in eastern North America, rather than strongly negative, and negative instead of positive in east and west Antarctica (Flament et al. 2013, and references therein). On the other hand, a recent global lithospheric model constrained by elevation and gravity data complemented by seismic tomography prior information (LithoRef18 Afonso et al. 2019) predicts a similar pattern of positive residual topography over continental platforms (especially East European, North and South American and Australian cratons) and negative residuals over parts of central Asia, western Europe, western North America and central Africa. As in our study, Afonso et al. (2019) also used CRUST1.0 (Laske et al. 2013) as the prior model in their inversion and their results show crustal thinning with respect to CRUST1.0 in some cratons, in particular in Africa and South America, with a collocated increase in the average crustal density. In this work, we use CRUST1.0 model as prior/reference information for the Moho depth, allowing for variations within the uncertainty bounds statistically defined by Szwillus et al. (2019). Densities within the crust are not derived from CRUST1.0 but constrained entirely by data in our inversion: elevation and surface waveform data in step 1, and satellite gravity gradient data in step 2. The crustal densities in WINTERC-G are, in general, comparable to those in LithoRef18 but our correction to the crustal thicknesses in CRUST1.0 in most continental areas is reversed with respect to that in LithoRef18; for example we obtain crustal thickening beneath most cratons (especially in Africa), and thinning in central Africa, central South America, east Asia and east Australia. By contrast, the asthenospheric densities in LithoRef18 show values and a spatial pattern similar to that from our average sublithospheric densities.
There are different possible causes for the discrepancies on the residual isostatic topography reported over continents here and elsewhere. In the first place, in contrast to the oceanic case, the comparatively thicker continental crust plays a significant role in the isostatic balance. Hence, different values of Moho depth and/or average crustal thickness would impact the estimates of residual topography. The average crustal densities in WINTERC-G model are in general agreement with previous estimates based on seismic and gravity data (e.g. Afonso et al. 2019), with values around 2850 kg m -3 in most cratonic and continental platforms except in the north and south Americas where densities are about 50 kg m -3 lower (Appendix F). It would, in principle, be possible to modify the average continental crustal density to obtain full isostatic equilibrium in some areas (i.e. zero residual topography). For instance, a reduction of 20-50 kg m -3 would account for 400-700 m of positive residual topography over the East European, West African or Indian cratons; alternatively, the lithospheric mantle composition could be changed (0.5-1 wt% Al 2 O 3 , or 0.22-0.45 in Mg#) to obtain a flat residual topography over cratons, albeit the corresponding surface wave data misfit increase introduced would be in this case larger than in the case of a crustal densities modification. In the two cases (change in crustal density or lithospheric mantle composition) the sublithospheric mantle composition (and therefore density) would have to be subsequently adjusted to match again the gravity field data, further increasing the surface wave data misfit (see Appendix E).

C O N C L U S I O N S
In this work we present a new global lithospheric and upper mantle thermochemical model constrained by state of the art waveform inversion, satellite gravity, surface elevation and heat flow data. WINTERC-G is based upon an integrated geophysical-petrological approach where mantle seismic velocities and density are computed within a thermodynamically self-consistent framework fully accounting for petrological non-linear effects (e.g. phase transitions). Our model incorporates thermal modelling adding in this way a physically driven regularization to the inversion of seismic, gravity and elevation data. The complementary sensitivities of our data sets allow us to constrain the geometry of the LAB, to separate thermal and compositional anomalies in the mantle, and to estimate a proxy for the dynamic surface elevation contribution. WINTERC-G shows similar long wavelength temperature patterns as those present in previous seismic (or seismically derived) global models and earlier integrated studies incorporating surface wave data at lower lateral resolution. The upper mantle composition and density distributions in WINTERC-G offer a new state of the art image at a high resolution globally (225 km average interknot spacing).
The fundamental parameter in our model is the depth of the thermal LAB, defined as the 1300 • C isotherm, as its value is the best resolved one by our constraining data sets. According to WINTERC-G the lithosphere is thick (>150 km), and cold in most cratons and some active areas. We identify considerable differences in cratonic temperatures and compositions. Based on their thermal thickness we can categorize cratons into four groups: In terms of lithospheric mantle composition, we can set three groups of cratons: (i) Refractory (Al 2 O 3 , < 2.5 wt%, or Mg# > 90.1): NAm., EAn, WAf., Sib. and Tar. Cratons.
The EEur. and Amz. and SP Cratons have areas that fall within both the fertile and intermediate groups.
According to WINTERC-G the sublithospheric mantle beneath the group of thickest cratons (depth > 200 km) remains about 60-120 K colder than the surrounding ambient mantle down to the top of the transition zone. In contrast, the sublithosphere under other cratons, or part of them, is relatively warm: SK, Ald., Tan. Ind., Gre., SP and Amz. In general, the amplitudes of the sublithospheric lateral thermal variations range from 200 K at 330 km to 120 K at 400 km. The 1D average profile of WINTERC-G displays a mantle geothermal gradient of 0.55-0.6 K km -1 and a potential temperature of 1300-1320 • C for depths >200 km.
The rest of cratons range somewhere between those two compositional end members. The most fertile and densest sublithospheric mantle at global scale in our model in located under the EEur. Craton. In contrast, other thick cratons seem to be partially underlain by refractory sublithospheric mantle, for example the eastern and southwestern margins of the NAm. And WAus. Cratons, respectively.
We obtain lithospheric thinning in the Pacific North American and Chilean Antarctic margins, Atlas Mountains (northwest Africa), Altai-Hangay Mountains (northwest Mongolia), Oman Sea, Borneo and eastern Asian margin. Hotspots are in general characterized by relatively high sublithospheric temperatures, 30-130 K warmer than the ambient mantle. The largest thermal anomalies are located southeast of the Louisville hotspot, beneath the Tasman Sea, Iceland hotspot, Azores and Bouvet triple junctions, North America Pacific margin and the East African Rift.
WINTERC-G shows progressive thickening of oceanic lithosphere with age, from low values of 60-70 km around MORs to 100-130 km in the oldest oceanic lithosphere. The oceanic lithosphere beneath MORs is slightly more refractory than the global average. In contrast, we find fertile lithospheric mantle zones in the mid-Atlantic and Pacific Oceans (Hawaii, southeast of the Louisville hotspot), the Tasman Sea, the Philippine Sea Plate and the Kuriles Trench. We observe significant regional differences among oceans: the mantle lithosphere beneath the Atlantic and Indian Oceans is, on average, colder, more fertile and denser than that beneath the Pacific Ocean. Our model suggests that along MORs there is a global correlation between spreading rate and mantle structure. In particular, for spreading rates of < 50-60 mm yr -1 , mantle fertility and density decrease and temperature increases for increasing rate. At >50-60 mm yr -1 the mantle structure becomes less clearly correlated with spreading rate. In this light, ridge push, as an expression of the buoyancy contrasts across MORs, could control spreading rate for slow spreading ridges (<55 mm yr -1 ), whereas for fast spreading ridges slab pull would play a major role as a far-field force not related to the local ridge density column.
Mantle geophysical properties (density and seismic velocity) are determined based on a thermodynamic formalism in WINTERC-G. However, this is not the case for crustal velocities and densities. In contrast to the mantle, where thermodynamic equilibrium is prevalent, vast portions of the crust are thermodynamically metastable, with their mineralogical assemblage and physical properties reflecting instead the conditions present at the moment of rock formation. In addition, the petrological and lithological complexity of the (continental) crust is higher than in the mantle. Therefore, a fundamental methodological future development will be to define a comprehensive lithological parametrization of the crust suitable for inversion of large scale geophysical data sets. In a similar fashion, although included to account for the longest surface wave dispersion curve periods, the velocities at the 410 and 660 km discontinuities are not thermodynamically parametrized in the current version of WINTERC-G. This has a non-straightforward impact in both the predicted residual isostatic topography and in how the gravity signal should be filtered prior to the inversion (e.g. Afonso et al. 2019;Davis et al. 2019). Future work should address this issue further extending the model deeper to at least include the transition zone in a consistent manner.

A C K N O W L E D G E M E N T S
We thank Jörg Ebbing, Roger Haagmans and the members of the 3-D Earth Consortium for numerous discussions and valuable feedback. We would like to thank the editor Juan Carlos Afonso, Bernhard Steinberger and an anonymous reviewer for their careful reading of the original version of the paper and for many constructive comments and suggestions. JF is supported by an Atracción de Talento senior fellowship (2018-T1/AMB/11493) funded by Comunidad Autonoma de Madrid (Spain). This study has been done in the framework of the project '3D Earth-A Dynamic Living Planet' funded by European Space Agency (ESA) as a Support to Science Element (STSE) and has received funding from Science Foundation Ireland (SFI) grant iTHERC to JF (16/ERCD/4303) and SFI grants to SL 13/CDA/2192, 16/IA/4598, cofunded by the Geological Survey of Ireland and the Marine Institute and 13/RC/2092, cofunded under the European Regional Development Fund. The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme H2020-MSCA-IF-2014 under the Marie Skłodowska-Curie REA grant agreement n • 657357 . We thank the operators of the permanent and temporary seismic networks around the world for collecting and sharing seismic data and acknowledge the Incorporated Research Institutions for Seismology (IRIS; http://www.iris.edu), the GE-OFON Global Seismic Network (https://geof on.gfz-potsdam.de) and Observatories and Research Facilities for European Seismology (http://www.orfeus-eu.org) for providing the data used in this study. The authors thank ESA for the provision of GOCE satellite gravity data. The figures were generated with the Generic Mapping Tools (GMT, http://gmt.soest.hawaii.edu). The model presented in this work are available upon request to the corresponding author.
where the asterisk denotes the complex conjugate. In our case for the linear density distribution in eq. (A1) and integrating in with respect to the radius we obtain: In view of the last expression, it is convenient to express the boundary topographies, a( ) and b( ) as where R a and R b are the mean radii of a( ) and b( ), and s( ) and t( ), are undulations of of a( ) and b( ) with respect to the mean radii. We can express the power of the boundary radii in terms of spherical harmonics (Martinec et al. 1989;. For an integer n ≥ 1, the nth power of the boundary a( ) can be written as a power series of s( )/R a by means of the binominal theorem: [a ( ) ] n = R n a n k = 0 n k A similar expression holds for [b( )] n . If we now introduce the ratios p a = R a /R and p b = R b /R we can derive the final expression for the scaled potential coefficients: In a particular case, when both bounding topographies are spherical, that is s( ) = t ( )= 0 for any , the only non-zero contributions to the integrand in the previous equation are two terms for k = 0. Therefore: The synthetic gravity field (geoid, gravity anomalies and gravity gradients) can be computed from V jm (σ jm ) as described in Fullea et al. (2015) and . The WINTERC-G model has been computed using three different levels of parallelization. In the spherical harmonic formalism the gravity field contribution of the different layers are parallelized. For the layers where either the upper or the lower bounding surfaces are non-spherical the calculation of the binomial expansion terms in σ jm is paralellized. Finally, the column independent thermodynamic calculations required to evaluate the mantle density from temperature and composition are also parallelized.

A P P E N D I X B : S E I S M I C T O M O G R A P H Y
In this appendix we show a comparison of Rayleigh-wave, phase-velocity maps (Fig. B1) and the relative data sampling for the input Rayleigh (Fig. B2) and Love (Fig. B3) phase-velocity maps.

A P P E N D I X C : G R AV I T Y F I E L D DATA F I LT E R I N G
A common assumption in lithospheric modelling studies at regional scales is that the geoid anomaly should be filtered retaining only wavelengths corresponding to harmonic degree and order ≥10 to exclude deep signals (e.g. Featherstone 1997;Chase et al. 2002;Fernàndez et al. 2010;Fullea et al. 2010Fullea et al. , 2014. This assumption is valid at scales where long wavelengths are naturally filtered by the very regional nature of the model. At the global scale some authors have also proposed a similar wavelength filtering to image the upper mantle density structure. The rationale for such filtering is based upon geoid kernels computed assuming some rheological stratification (e.g. Forte & Peltier 1991;Deschamps et al. 2001), and modelling of gravity field and elevation data (e.g. Afonso et al. 2019  (contributing up to 70% of the 2-10 signal) are clearly related to core-mantle boundary and lower mantle density anomalies, part of the degree 4-9 geoid field, which contains the signal of subduction processes, mid oceanic ridges, and mantle plumes, is also coming from density anomalies within the upper mantle and the lithosphere (e.g. Crough & Jurdy 1980;Hager 1984;Bowin 2000). This fact is also apparent if we examine the power spectra of the phase-velocity maps inverted in step 1 in this work and the resulting temperature maps (i.e. without using gravity data). Hence, removing harmonic degrees 4-9 from the input gravity data would artificially damp long wavelength signals stemming from the upper mantle within our model domain. At the same time part of the harmonic degrees 4-9 gravity signal has its origin likely below 400 km, in particular within the mantle transition zone in the depth range 400-660 km, according to estimated 1-D geoid kernels (e.g. Deschamps et al. 2001). In this work, we remove only degrees 2-3 from the gravity field data in our inversions (n4 model thereafter). For a comparison, we have conducted an alternative step 2 gravity field data inversion for crustal density and mantle composition/density removing degrees 2-9 from the input gravity field data (n10 model thereafter, Fig. C1). We can consider n4 and n10 as two end-member models: n4 contains a spurious harmonic degree 4-9 signal mainly from the mantle transition zone, whereas n10 is missing the fraction of the harmonic degrees 4-9 signal that originates in the upper 400 km.

s, North Atlantic
In the inverted crustal density the differences between models n10 and n4 are only relevant in oceans where, in general, they are in the range ±30 kg m -3 with relatively short wavelengths (harmonic degrees 15-30, Fig. C2). For the mantle composition the differences between n10 and n4 models (± 0.3 Al 2 O 3 wt% or 0.15 Mg#) show comparatively longer wavelengths (harmonic degrees 4-9) and a clear correlation with plate tectonic features (Fig. C2). With respect to model n4, model n10 tends to overestimate mantle fertility and density in most thick cratonic areas and the west Pacific Ocean, and to underestimate those fields in the Pacific subduction margins, the Alpine-Himalayan belt, and some MOR sections in the North Atlantic and Indian oceans. In terms of density, at lithospheric levels the differences n10-n4 range from  Figure B2. Relative data sampling for Rayleigh wave phase-velocity maps. Sensitivity is shown as the sums of the columns of the sensitivity matrix, which include the number of paths touching each knot of the model grid and the path weights (Lebedev & van der Hilst 2008). Maps are shown for six sample periods. Column sums values are in percentage of the maximum at each period. Areas with no coverage are shown in grey. The total number of paths is reported at the top right of each plot. Plate boundaries are shown in red.
-10 kg m -3 (e.g. Sunda Plate, Tibet) to up to 10 kg m -3 (southwestern Australian margin, western Pacific Ocean). In the sublithosphere, the spatial pattern of the density differences among n10 and n4 models is similar to that in the lithosphere but the amplitudes are comparatively smaller (±5 kg m -3 in general, Fig. C2). If we consider a subduction zone we can understand better the end-member nature of both n4 and n10 models. Slabs subducting across the upper mantle are colder than the ambient mantle and, therefore, denser. It is clear that n10 model is missing part of the upper mantle high density slab signal by removing harmonics 4-9 from the input data. Such data filtering would explain why model n10 tends to underestimate mantle density along the Pacific subduction belt with respect to n4 (Fig. C2). However, if a slab crosses our model's bottom boundary (400 km depth) dipping into the transition zone and deeper, a part of its associated deep gravity signal reflected in degrees 4-9 will be spuriously propagated as artificially high upper mantle densities into our n4 model. In view of our analysis, at least in subduction zones, we can regard n10 and n4 models as the lower and upper bounds for the upper mantle density respectively.

A P P E N D I X D : M A N T L E M E LT I N G
In this work melt and pre-melt effects on mantle seismic velocities and attenuation are parametrized using three auxiliary parameters: a pseudo-melt fraction, , a critical temperature T c , and the threshold melt fraction, c . Pre-melt effects are introduced within a buffer range when T c < T < T sol :  Figure B3. Relative data sampling for Love wave phase-velocity maps. Sensitivity is shown as the sums of the columns of the sensitivity matrix, which include the number of paths touching each knot of the model grid and the path weights (Lebedev & van der Hilst 2008). Maps are shown for six sample periods. Column sums values are in percentage of the maximum at each period. Areas with no coverage are shown in grey. The total number of paths is reported at the top right of each plot. Plate boundaries are shown in red.
Al O mantle-n4 where T sol is the solidus temperature for a given pressure. The pseudo melt fraction varies linearly from zero at T = T c to c , at T = T sol within the buffer range. Here we assume T c = 0.98 * T sol and c = 0.7 %. We note that the gradient so defined does not represent actual melt but serves as a smoothing parameter (Bonadio et al. 2018), reflecting pre-melt effects on the rock-aggregate anelastic behaviour When the temperature reaches the solidus the pseudo-melt fraction is equal to the threshold melt fraction c and afterwards is defined as c plus the usual melt fraction per se: where T liq is the liquidus temperature for a given pressure. In this work, we use the mantle-peridotitic nominally anhydrous solidus and the liquidus from Katz et al. (2003) and references therein, and we limit the maximum amount of pseudo melt to 2%. Figure D1 shows the pseudo melt distribution in WINTERC-G model at different depths. Melt and pre-melt effects start to be important at depths of 78-80 km, peak at 82-84 km depth, and vanish at depths >90 km. Our model predicts significant melting beneath mid oceanic ridges (especially in the Pacific Ocean), East Africa Rift, Asian Pacific margin and Sunda Plate.

A P P E N D I X E : I S O S TAT I C R E S I D UA L T O P O G R A P H Y
Our model is biased towards a local isostasy situation as we are asking for this (weighted) condition in the inversion (step 1). After step 1 inversion we obtain around zero topography residuals for most continental regions, whereas the pattern in oceans shows bathymetric residuals comparable to those from earlier studies (e.g, Hoggard et al. 2017;Rowley 2019). After adding gravity field data in step 2 inversion the residual pattern in oceans remains similar to the one from step 1 (with reduced amplitudes in many areas), whereas for continents a residual topography pattern emerges as a result of the modified crustal density and mantle composition (Fig. 13). Unlike oceans, where the bathymetry is largely dictated by the thermal structure, in continents there is a trade-off between crustal structure (density and thickness) and mantle composition due to the relatively larger contribution from the thick crust ( cf. Fig. 2c).
In this appendix we present an alternative model where we enforce local isostatic equilibrium in step 2 inversion of gravity field data. We do so by decoupling the lithospheric and sub lithospheric mantle composition/density. The rationale for it is separating model parameters with (lithosphere) or without (sublithosphere) effects on the isostatic balance. Therefore, simultaneously modifying continental crustal density, lithospheric and sublithospheric mantle composition we can jointly invert for surface topography (local isostasy) and gravity field data.
To keep the models comparable we impose the same threshold bounds applied in the original WINTERC-G model (see Section 5.3). The alternative isostatic model (Fig. E1) shows residual isostatic topography around zero in general, and large negative gravity field residuals (calculated-observed) beneath most of the continental areas characterized by positive residual topography (observed-calculated) in the original model (e.g. East European craton, west Australia, Himalayas, Fig. 13). The alternative isostatic model (Fig. E2) is characterized by lateral chemical variation in both the lithospheric and sublithospheric mantle stronger than those in the original model (Fig. 14). In particular, in the regions where the negative gravity field residuals are large in the alternative isostatic model (e.g. East European craton) the sublithospheric mantle composition is at the upper boundary value set in the inversion (Fig. E2C). Therefore, to match the observed gravity field in those areas preserving the isostatic elevation, the inversion would tend to further increase the sublithospheric mantle density by making the composition even more fertile. If we now consider 'dynamic' sublithospheric topography restricted to the upper mantle (400 km), E dyn , this would imply a negative contribution in areas of high sublithospheric density (e.g. in East European craton). In other words, in such high density sublithospheric regions the predicted isostatic topography, E isos , must be increased (negative isostatic residual) so that the observed topography, E obs , is matched: E obs = E isos + E dyn . In order to obtain an increase in E isos the lithospheric density must be reduced, but if we want to simultaneously match the gravity field data we must then further increase the sublithospheric density. Therefore, the resulting model, adjusted for upper mantle 'dynamic' topography, would contain even larger lateral and vertical compositional gradients across the mantle than the alternative isostatic model (Fig. E2), which would be difficult to reconcile with the surface wave tomography data even if we allow the thermal structure to vary reasonably.

A P P E N D I X F : M A N T L E D E N S I T Y
The final 3-D distribution of the mantle density (and seismic velocities) in WINTERC-G is dictated by two inversion parameters: temperature and bulk composition. Within the upper 150 km, the density distribution mimics the (thermal) lithospheric thickness and, to a lesser extent, the lithospheric compositional field (Fig. F1). At about 50 km depth, old oceans, cratons and stable continental platforms are relatively dense, whereas MORs, young oceans, continental rifts and thinned lithosphere appear as low density zones, with a maximum density contrast of 220-240 kg m -3 . At greater depths, the high density areas are progressively restricted to cratonic cores and thickened lithosphere zones. At Tasman Sea, whereas in other areas (e.g. most of the Pacific Plate, North-western Atlantic, North America Atlantic and west Australian) the low mantle densities are primarily related to refractory composition (Fig. 9, Fig. 14 and Fig. F1).