Joint estimate of the coseismic 2011 Tohoku earthquake fault slip and post-seismic viscoelastic relaxation by GRACE data inversion

SUMMARY Satellite-derived gravity data offer a novel perspective for understanding the physics of megathrust earthquakes at subduction zones. Nonetheless, their temporal resolution and observational errors make it difﬁcult to discern the different phases of the seismic cycle, as the elastostatic deformation (coseismic) and the stress relaxation by viscous ﬂow (post-seismic). To overcome these difﬁculties and to take advantage of the physical constraints on the temporal evolution and on the spatial pattern of the earthquake-induced gravity disturbances, we have jointly estimated the fault slip of the 2011 Tohoku earthquake and the rheological stratiﬁcation by means of a Bayesian inversion of GRACE data time-series and within the framework of spherically symmetric self-gravitating compressible viscoelastic earth models. This approach, in addition to improve the exploitation of satellite-derived gravity data, allows us (i) to constrain the fault slip taking advantage of information from both the co- and post-seismic signatures and (ii) to investigate the trade-off between the fault slip and the shallow rheological stratiﬁcation. In this respect, it can be used to improve the modelling of crustal displacements from GPS data, even if their higher accuracy and temporal resolution allow to discriminate well the coseismic signature from the others.


I N T RO D U C T I O N
Solid Earth mass rearrangement and ocean water redistribution caused by great earthquakes are made visible by their co-and postseismic signatures on the Earth's gravity field, nowadays detectable by the Gravity Recovery And Climate Experiment (GRACE) and Gravity and Ocean Circulation Explorer (GOCE) satellites (De Linage et al. 2009;Matsuo & Heki 2011;Cambiotti & Sabadini 2013;Han et al. 2014;Broerse et al. 2015;Fuchs et al. 2016). These signatures result from earthquake-induced elastostatic deformation (coseismic) and stress relaxation by viscous flow (post-seismic, Tanaka et al. 2006;Sabadini et al. 2016), and are superimposed on the background gravity field, which varies through time due to hydrology, continental ice variations, residual ocean circulation and solid Earth processes other than the earthquake (De Linage et al. 2009;Matsuo & Heki 2011).
The monthly temporal resolution and the observational errors of GRACE data make it difficult to distinguish the co-and post-seismic signatures only on the basis of their temporal evolutions, especially when we have to estimate at the same time the characteristic times controlling stress relaxation. These relaxation times, indeed, vary with the wavelength and so depend on the geographical location (De Linage et al. 2009;Broerse et al. 2015). Furthermore, these signatures have also to be distinguished from the time-dependent background gravity field (from now on abbreviated to TBG). In particular, differently from annual and semiannual periodic signals which can be removed from the 15 yr of GRACE data, additional trends in the TBG (due to non-periodic hydrological, oceanic and ice mass re-arrangements; Didova et al. 2016) can affect the estimate of the post-seismic signature (De Linage et al. 2009).
These difficulties compromise the estimate of the earthquake signatures and their later comparison with a physical model could be not meaningful. The data time-series analysis, indeed, neglects the physical relation between the spatial pattern and the temporal evolution of the post-seismic signature, as well as the relation between the co-and post-seismic signatures themselves, which are the response of the Earth to the same forcing: the earthquake. In particular, both the co-and post-seismic signatures depend on the coseismic slip and are characterized by different spatial patterns, where only the post-seismic one depends on the rheological stratification. After-slip also causes gravity changes which involve both elastostatic deformation and the following viscoelastic relaxation and, so, its signature depends both on the evolution in time of the after-slip distribution and on the rheological stratification. In light of this and similar to what has already been done using inland and seafloor GPS data time-series after the 2011 Tohoku earthquake (Yamagiwa et al. 2015;Tomita et al. 2017), the only way to discriminate the after-slip contribution from the viscoelastic response to the main shock shall take into account both spatial and temporal patterns of the observed gravity changes. Table 1. Seismic moment, moment magnitude and average focal mechanism (rake, dip and strike angles) of the fault slip obtained by inversion of filtered and unfiltered GRACE data. For each filter we also report the equivalent Gaussian radius, from the comparison of the isotropic part of the anisotropic DDK filters with the Gaussian ones (Kusche et al. 2009 In order to overcome these difficulties due to the intrinsic nature and trade-offs embedded within the earthquake process in which an inaccurate estimate of the coseismic deformation would affect the estimate of the post-seismic deformation and vice versa, we hereinafter define a rigorous method for modelling GRACE data. It consists in fitting a physico-mathematical model of both the coand post-seismic signatures to the satellite-derived gravity data and apply it to the case of the 11 March 2011 Tohoku earthquake. In other words, by means of a fully Bayesian approach and avoiding any preliminary data time-series analysis, we jointly estimate the fault slip and the rheological stratification controlling the earthquake signatures, as well as a simplified TBG model. The choice of using only GRACE data, without including additional data sets as those from seismic waves and crust displacements (Fuchs et al. 2016;Zhou et al. 2018), aims at discussing the advantages of the method presented here in a case where it is actually difficult to recognize the coseismic signature by a preliminary data time-series analysis. Furthermore, it aims at establishing what we can learn from this geodetic technique and by the perspective of improved gravity data that the next generation of gravity missions can make available in the next future (NGGM; Silvestrin et al. 2012;Pail et al. 2015;). On the other hand, this choice prevents us from estimating the after-slip distribution following the main shock. Indeed, in order to catch its evolution in time, satellite-derived gravity data should be able to detect trends on short timescales, say of a few months, and this is not yet the case, at least for those from GRACE space mission. We expect, however, that this limitation does not affect our conclusions. Indeed, for the case of the 2011 Tohoku earthquake, it has been shown that there is a prevalence of viscoelastic relaxation, rather than after-slip (Sun et al. 2014;Iinuma et al. 2016;Tomita et al. 2017).
The data analysis, as well as their modelling and the details about the inversion theory are presented in Section 2 and are mainly a combination of previously developed and well established strategies. Then, we present the results of the joint inversion of both filtered and unfiltered GRACE data and their discussion in Sections 3 and 4.

Gravity data
We consider the RL05 GRACE Level-2 monthly data products and their error structure generated by GeoForschungsZentrum (GFZ, Dahle et al. 2013) up to spherical harmonic (SH) degree and order L = 90, from April 2002 to June 2017, but for the 2011 March product, which we have excluded from the analysis since the earthquake occurred during that month. In particular, we consider the whole covariance matrix describing the spatial correlation of each monthly product, thus assuming no correlation between different times. As recommended, we replace the C 20 Stokes coefficients by the ones from Satellite Laser Ranging (Cheng & Tapley 2004).
In order to focus on the earthquake signature in the epicentral area and its vicinity, we localize the gravity data in a spherical cap of radius ϑ = 6 • centred at the epicentre of the centroid-momenttensor (CMT) solution Ekström et al. 2012). This is done using the Slepian functions bandlimited to L = 90 and keeping only the first N = 23 Slepian coefficients, with N = (L + 1) 2 (1 − cos ϑ)/2 being the spherical Shannon number (Simons et al. 2006). We thus obtain N = 23 time-series of Slepian coefficients, as well as their covariance matrix by error propagation.
We consider both unfiltered GRACE data, but for the spatial localization, and filtered ones using the anisotropic DDK decorrelation filters (Kusche 2007;Kusche et al. 2009). The DDK filters are applied before the spatial localization and after the removal of the time average of the time-series before the earthquake, from April 2002 to February 2011. These filters have been specifically designed to reduce the peculiar short wavelength noises of the GRACE data (the anisotropic north-south stripes caused by the polar orbit of the twin satellites) through a weighted spatial average. This kind of filtering is herein preferred in order to avoid the excessive damping of the earthquake gravity signal produced by the isotropic Gaussian filtering. Furthermore, the spatial resolution of the DDK filters can be compared to isotropic Gaussian filters of smaller and smaller radii (Kusche et al. 2009), from 240 km for the strongest DDK3 filter to 135 km for the weakest DDK8 filter (see Table 1), where the filters weaken from 240 to 135 km by including shorter wavelengths, or higher harmonics, so increasing the amplitude of the recovered gravity signal.
According to the data treatment just outlined, the observed ith Slepian coefficient at each given time t, say y obs i (t), reads where X m (t) is the Stokes coefficient of SH degree and order m (after the removal of the time average of the time-series before the earthquake) and the coefficients W i m account for the spatial localization, the conversion into gravity disturbance and the DDK filtering Here, GM is the Earth's gravity constant, r is the equatorial radius, G i m are the SH coefficients of th ith Slepian function (Simons et al. 2006) and D m are the DDK filter coefficients (Kusche et al. 2009).
In the case of unfiltered GRACE data, the DDK filter coefficients are replaced by the Kronecker delta, D m = δ .
From eq. (1) and according to error propagation (Tarantola 2005), the covariance between the ith and i th Slepian coefficients, say Y ii (t), reads where C m, m (t) is the covariance between the Stokes coefficients X m (t) and X m (t).

Gravity modelling
We model the co-and post-seismic gravity signatures by means of spherically symmetric self-gravitating compressible viscoelastic (Maxwell) Earth models (Tanaka et al. 2006;Cambiotti et al. 2011b;Cambiotti & Sabadini 2015;Sabadini et al. 2016) based on the Preliminary Reference Earth Model (PREM; Dziewonski & Anderson 1981). We also account for the loading and gravitational effects of the ocean water redistribution including a global ocean layer (Cambiotti et al. 2011a). The modelled Stokes coefficients are then converted into Slepian coefficients according to eq. (1), applying or not the DDK filtering depending on whether they are compared with filtered or unfiltered GRACE data. The fault slip is imposed on the slab interface of the Kamchatka-Kuril-Japan subduction zone (Hayes et al. 2012) and parametrized using bicubic splines defined over patches of about 80 × 40 km 2 (Yabuki & Matsuúra 1992;Cambiotti et al. 2017). Taken collectively, as shown in Fig. 1, they cover the slab interface for a length of about 800 km and down to 120 km depth (width of about 370 km). By definition, the slip can reach the top edge of the selected fault surface, which is the trench, and is set to zero at the internal (left, right and bottom) edges.
The rheological stratification is characterized by an elastic lithosphere (the shallowest layer) and three viscoelastic layers for which we use the Maxwell rheology and we simply assume constant viscosities: the asthenosphere (down to 220 km depth), the upper (from 220 to 670 km depths) and lower (below 670 km depth) mantle. The bulk and shear moduli and the initial density, instead, vary with depth according to the PREM.
As rheological model parameters, we consider the lithospheric thickness, H, and the asthenospheric viscosity, η A . The lower/upper mantle viscosity ratio, instead, is set to 30 (Ricard et al. 1993) and the upper mantle/asthenospheric viscosity ratio, R, is set to 100. The possibility of estimating also the latter rheological model parameter is discussed in the Appendix. We also note that elastic or viscoelastic layers with viscosities higher than 10 20 Pa s (corresponding to a Maxwell time of about 50 yr using a shear modulus for the lithospheric mantle of 66 GPa) can be hardly distinguished from each other using only the 6 yr and 3 months of post-seismic signature covered by GRACE after the March 2011. In this respect, the lithospheric thickness should be regarded as a parameter indicating the top of the asthenosphere where, instead, stress relaxation can take place on short timescales. Stress relaxation within the upper and lower mantle, instead, hardly takes place in the short time here considered.
In order to take into account any geophysical process other than the earthquake and the aliasing error of the S 2 tidal wave (Ray & Luthcke 2006;De Linage et al. 2009), we also fit a static value, a linear trend and annual, semiannual and 161-d periodic functions to each time-series of Slepian coefficients. Different from the physicomathematical model of the earthquake signatures, this TBG model describes the main temporal evolutions that we have to expect into the GRACE data, but omits any description of the spatial patterns.
Within this framework, the modelled Slepian coefficient timeseries of the gravity disturbance reads where y i is the ith Slepian coefficient time-series (with i = 1, ···, N), a i and b i are the Slepian coefficients describing the static gravity field and the linear trends, c ik ane d ik are the Slepian coefficients describing annual, semiannual and 161-d periodic signals (with k = 1, 2, 3 and ω k being the respective angular frequencies). Furthermore, s j is the jth fault slip coefficient (with j = 1, ···, S and S being the number of bicubic splines used for the fault slip parametrization) and K ij is the ith Slepian coefficient of the time dependent gravity disturbance caused by the unitary jth fault slip coefficient. We note that the latter factor, K ij , depends on the rheological model parameters v, as the lithospheric thickness and the asthenospheric viscosity, and can be obtained implementing the spherically symmetric selfgravitating compressible viscoelastic Earth model (Tanaka et al. 2006;Cambiotti et al. 2011b;Cambiotti & Sabadini 2015;Sabadini et al. 2016). Different from the direct and joint method here presented, the common approach considers the GRACE data time-series in the spatial domain, that is, sampling the gravity field over a regular grid with a not too small step (often every 0.5 • or 1 • in longitudes and latitudes, although smaller than the GRACE spatial resolution, especially after spatial filtering; De Linage et al. 2009;Broerse et al. 2015). Then, it fits to each time-series a TBG model, as in eq. (4), and a simple model of the earthquake signature as, for instance, the coseismic jump and a function describing viscoelastic relaxation Here the index i now identifies the geographical point (rather than the Slepian coefficient), t 0 is the earthquake time, H is the Heaviside function, f i and p i are the co-and post-seismic gravity disturbances and τ i is the effective relaxation time at this location. The drawback of the common approach is that the earthquake signature is estimated for each geographic point without exploiting the specific spatial patterns expected on the basis of a viscoelastic earth model (and often omitting the spatial correlation from the data covariance matrix of each monthly GRACE product), meaning that the co-and post-seismic signatures are discriminated only on the basis of their temporal evolutions. Moreover, different from the coseismic signature which can be described as a step-like discontinuity in time, the temporal evolution of the post-seismic signature in eq. (5) approximates the much more complex relaxation spectrum of a viscoelastic Earth model and, so, the later comparison The CMT focal mechanism Ekström et al. 2012) is also shown.
with the physico-mathematical model is often limited to the postseismic gravity disturbance cumulated during the whole time window spanned by the GRACE data after the earthquake, rather than performed for each monthly GRACE product as in eq. (4). In the end, the fact that the characteristic relaxation times τ i are estimated for each geographic point (once again not exploiting any constraints on their spatial pattern) makes specific prior information on them necessary (Broerse et al. 2015).
The alternative approach based on the empirical orthogonal function (EOF) analysis (Chao & Liau 2019) still does not account for the post-seismic signature. Furthermore, it simply decomposes the data time-series (after a preliminary least square estimation) in different spatiotemporal signals and, so, it is not clear whether the EOF recognized as the coseismic signature fully represents it or, instead, some fraction of it leaks into the other spatiotemporal signals. In light of this, although the EOF analysis is an effective approach for assessing the presence of the coseismic signature into GRACE data, its results cannot yet be used in an inverse method aimed at constraining the fault slip, as well as the rheological stratification.
In light of this, the direct and joint method presented here makes it simpler to recognize the co-and post-seismic signatures into the GRACE data time-series by taking advantage of the physical constraints to which they must obey. At the same time, by definition, it links the observations to the model parameters that we are estimating through the choice of a specific physico-mathematical model of the earthquake signatures. This means that the fitting to the GRACE data depends on the realism of the adopted model and, so, the estimate of the model parameters can be biased by modelling errors (the assumption of a flat bathymetry and of a global ocean layer in the framework of spherically symmetric Earth's models can lead to errors of a few tens of percentages for the coseismic geoid anomaly of the 2011 Tohoku earthquake truncated at SH degree 40; Broerse et al. 2014). On the other hand, we note that this disadvantage characterizes also the common approach, when the estimated earthquake signatures from eq. (5) are used for constraining the model parameters because of the inevitable choice of an Earth's model.

Inverse problem
In order to define the posteriori probability density function (PDF), we rely on a previously developed fully Bayesian approach for GPS data inversion (Fukuda & Johnson 2008;Cambiotti et al. 2017). This approach introduces two hyperparameters, α 2 and β 2 , that weigh information from observations and from prior constraints and that shall be jointly estimated with the model parameters. In particular, α 2 accounts for modelling errors or biases in the estimate of the observational errors from the data processing (Yabuki & Matsuúra 1992;Cambiotti et al. 2017).
As it concerns the prior constraints, we only require that the fault slip is smooth to some degree in order to make the inverse problem overdetermined and avoid implausible results. The measure of the smoothness is defined as the average of the second-order derivatives of the slip over the fault surface (Yabuki & Matsuúra 1992) and how much the slip is smooth is determined by the estimate of β 2 . Different from previous geodetic data inversions (Ozawa et al. 2012;Zhou et al. 2018), we do not consider any prior constraint about the slip direction (like positivity constraints). Furthermore, our inverse method differs from the Akaike Bayesian Information Criteria (ABIC, Yabuki & Matsuúra 1992) because we average the posteriori PDF also for the two hyperparameters (Fukuda & Johnson 2008;Cambiotti et al. 2017) rather than fixing them at their most likely values.
In the end, we do not consider any prior information about the rheological parameters and hyperparameters and, so, their prior (or homogeneous) PDF (Tarantola 2005) simply reads where v = (H, log η A , log R) is the array collecting the rheological model parameters. Note that we use the logarithmic scale for η A and R because these parameters can span a few orders of magnitude about which we know very little.   Maximum and minimum gravity disturbances of the co-and post-seismic signatures and of the linear trends, and maximum amplitudes of the annual, semiannual and 161 day periodic signals obtained by inversion of filtered and unfiltered GRACE data. The post-seismic gravity disturbance has been calculated in June 2017, that is, 6 yr and 3 months after the 11 March 2011 Tohoku earthquake. For the sake of comparison, the gravity disturbance of the linear trend has been calculated for the same amount of time. For each filter we also report the equivalent Gaussian radius, from the comparison of the isotropic part of the anisotropic DDK filters with the Gaussian ones (Kusche et al. 2009 Following Yabuki & Matsuúra (1992) and Fukuda & Johnson (2008), we thus define the posteriori PDF for the model and hyperparameters given the GRACE data as follows: with y obs and y being the arrays collecting the Slepian coefficient time-series from GRACE data and the modelled ones according to eq. (4), which we rewrite in the following compact form: Here m is the array collecting the TBG model parameters (i.e. static value, linear trend and annual, semiannual and 161-d periodic signals for each Slepian coefficient time-series) and s is the array collecting the fault slip coefficients (to be multiplied by the bicubic splines in order to obtain the fault slip). Furthermore, Y is the data covariance matrix and L is the matrix defining the smoothing criterion on the fault slip (Yabuki & Matsuúra 1992;Cambiotti et al. 2017). In the end, M and K are the matrices describing the linear relations between the data and the TBG model parameters and the fault slip coefficients, respectively, while P and S are the number of data and of fault slip coefficients. As it is for the factors K ij entering eq. (4), the matrix K also depends on the rheological model parameters and is obtained implementing the spherically symmetric self-gravitating compressible viscoelastic earth model. Strategies for investigating this kind of PDF are discussed in previous works (Yabuki & Matsuúra 1992;Cambiotti et al. 2017).

R E S U LT S
Satellite-derived gravity data can recover only the long wavelength gravity field (up to SH degree and order L = 90 for the RL05 GRACE Level-2 data) and, so, they miss a significant part of the gravity disturbance caused by earthquakes, being characterized also by shorter wavelengths. Furthermore, the spatial filtering of the GRACE data (usually applied to reduce short wavelength noises; Wahr et al. 2006) damps the earthquake signatures, thus limiting further the information from the observations that we can use for constraining the earthquake source and the rheological stratification. In light of this, we first investigate the impact of the spatial filters on our results considering the anisotropic DDK decorrelation filters.
For the sake of comparison and in order to investigate the possibility of not applying any filter, we also consider the original (unfiltered) GRACE data. Fig. 1 shows the fault slip estimated by inversion of filtered and unfiltered GRACE data (see also Fig. S1 for the in or whole range of DDK filters). It clearly describes a thrust earthquake, with the largest slip occurring towards the trench and decreasing by increasing the depth. The total seismic moment and the average focal mechanism of each inversion are listed in Table 1. We note that a weaker filtering yields a smaller seismic moment and maximum slip. In particular, for the strongest (DDK3-5) filters, the fault slip has two maxima, more northern and southern with respect to the centroid-momenttensor (CMT) solution Ekström et al. 2012), and involves almost the whole fault along strike, going to zero at the northern and southern edges as prescribed by the definition of the bicubic splines used for its parametrization. For the weakest (DDK6-8) filters, the fault slip still involves almost the whole fault along strike, although slightly favouring the northern maximum. Only when we consider unfiltered GRACE data, we obtain a fault slip more concentrated along strike and with one maximum closer to the CMT solution.
For the strongest filters, the estimated moment magnitude is in agreement with previous results from inversions of GRACE data (Wang et al. 2012;Cambiotti & Sabadini 2012;Zhou et al. 2018) or from point-like seismic source models as the CMT solutions (Cambiotti & Sabadini 2013;Ekström et al. 2012), about M W = 9.1. For the weakest filters or the unfiltered case, instead, the estimate better agrees with previous results from inversions of teleseismic, strong motion, tsunami and GPS data and based on finite fault models (Lay et al. 2011;Ozawa et al. 2012;Zhou et al. 2018), about M W = 9.0. Nevertheless, the estimated fault slip is more spread over the fault and with a smaller maximum slip than those obtained from other kinds of data, which can be as high as 80 m from seafloor GPS data (Sun et al. 2014;Tomita et al. 2017): this is expected from gravity data inversion that shows the tendency to widen the region where the slip is distributed compared to teleseismic or GPS data due to lack of information at short wavelengths.
Differently from the estimate of the fault slip, which becomes more regular and better localized around the epicentre by weakening or avoiding the spatial filtering, the estimate of the rheological stratification is quite sensitive to whether the filters are applied or not. Fig. 2 shows the marginal PDF for the lithospheric thickness, H, and the asthenospheric viscosity, η A , obtained by inversion of filtered and unfiltered GRACE data (see Fig. S2 for whole range of DDK filters). From the strongest to the weakest filters, the marginal PDFs are similar to each other, except for a slight reduction of H (from 122 to 103 km) and a slight increase of η A (from 2.7 to 3.0 × 10 18 Pa s). In contrast, for the unfiltered case, the marginal PDF becomes wider (thus increasing the uncertainties on the estimated rheological stratification) and shifted towards a thinner lithosphere, H = 65 km, and a higher asthenospheric viscosity, η A = 5.8 × 10 18 Pa s. This drastic change on the rheological parameters estimated by inversion of filtered and unfiltered GRACE data can be understood by considering the respective co-and post-seismic gravity disturbances and linear trends of the TBG model shown in Fig. 3 (see Fig. S3 for the amplitudes of the periodic signals of the TBG model and Table 2 for whole range of DDK filters). In this perspective we remind that these signatures have been estimated by fitting a physico-mathemetical model of the earthquake-induced deformations and of a simplified TBG model to GRACE data (see Section 2). In particular, the co-and post-seismic signatures have not been discriminated only on the basis of their temporal evolutions as commonly done by a preliminary data time-series analysis. Rather, they have been modelled varying both the fault slip and the rheological stratification and, so, they have been discerned by taking into account the physical relation between their spatial patterns and temporal evolutions. In other words, the present framework discards any couple of co-and post-seismic signatures that cannot be obtained by the same physico-mathematical model and translates the problem of discerning them into the problem of finding a fault slip and a rheological stratification that can simultaneously explain the GRACE data.
We first note that the coseismic signatures shown in Fig. 3 are mainly characterized by a bipolar pattern, with the negative pole greater, in amplitude, than the positive one (De Linage et al. 2009;Cambiotti et al. 2011a), especially for the strongest filtering. The post-seismic signatures, instead, are mainly characterized by a positive pole (Han et al. 2014;Broerse et al. 2015), or polar pattern. Moreover, we can also estimate the amount of damping of the coand post-seismic signatures introduced by the spatial filtering: about the 50 and 30 per cent for the weakest DDK8 filter and up to 80 and 60 per cent for the strongest DDK3 filter. As it concerns the TBG model, the gravity disturbances range from a few to several μGal from the strongest to the weakest filters. In contrast, for the unfiltered case, they are one order of magnitude greater, about several tens of μGal, and their spatial patterns are characterized by north-south stripes which closely resemble those affecting GRACE data. This is due to the fact that the TBG model neglects any physical relation between spatial patterns and temporal evolutions of its components and this large degree of freedom allows it to be fitted to the observational errors as well, especially at short wavelengths where they exceed the physical signatures. In light of this, we can ascribe the drastic change of the marginal PDF for the rheological parameters estimated from the unfiltered GRACE data to the wrong estimate of the TBG model and, in particular, of its linear trends. This issue necessarily biases the part of the signal that should be attributed to the post-seismic signature, thus altering the information on the rheological stratification that we obtain from the observations.
On the other hand, despite the difficulties in constraining the TBG model from unfiltered GRACE data, it is noteworthy that the estimate of the fault slip is not much affected by the trade-off between the TBG model and the rheological stratification. In particular, the fault slip estimated from the unfiltered GRACE data is still physically sound (Fig. 1c) and it seems to benefit from having kept the whole earthquake signature, without filtering it. Indeed, having kept information from observations up to the maximum SH degree of the RL05 GRACE Level-2 data (L = 90), the slip distribution is characterized by only one and more pronounced maximum close to the trench and, different from the inversion of filtered GRACE data, it is slightly more localized even along strike.
After this sensitivity analysis, we conclude that the most reliable strategy for using satellite-derived gravity data for seismological purpose consists in applying a spatial filter. In this way, we compensate for the large degree of freedom of the TBG model and we avoid that its estimate biases that of the other model parameters, especially the rheological ones. On the other hand, the filter must be weak in order to retain as much information from observations on the fault slip as possible, which leads us to opt for the weakest DDK8 filter. Furthermore, always in the perspective of limiting the influence of the TBG model on the solution of the inverse problem and of avoiding regions mainly dominated by the TBG rather than by the earthquake signatures, the circular cap in which we spatially localize the GRACE data (see Section 2.1) should not be taken too large. In light of this, we have chosen a circular cap of radius ϑ = 6 • where we expect most part of the earthquake signature, thus excluding even more distant areas dominated mainly by the TBG and the GRACE observational errors.

D I S C U S S I O N
For the inversion of the DDK8 filtered GRACE data, Fig. 4 compares the time-series of the modelled and observed gravity disturbances, at two representative geographical locations. The TBG model has been removed from both time-series and compared, instead, with the residuals and the observational uncertainties, which have been scaled by the factor α = 2.4 in order to account for the estimated modelling errors. Here α 2 = 5.9 is the hyperparameter that weights information from observations, jointly estimated from GRACE data inversion (see Section 2.3).
Although the earthquake signature is above the scaled observational uncertainties, we note that it is difficult to discriminate between the co-and post-seismic signatures only on the basis of their temporal evolutions, especially when the relaxation time controlling the latter has to be estimated at the same time. This difficulty, however, is solved by our approach because we discriminate between the two signatures taking into account also their spatial patterns, prescribed by the physico-mathematical model that we are using. In contrast, the TBG model is comparable with the scaled observational uncertainties, indicating that it must be included in the modelling, although it can be poorly constrained, at least for the geographical area here considered.
Looking at these time-series and as anticipated in the Introduction, it would have been difficult to discriminate also the time dependent after slip from the viscoelastic response to the main shock, as well as to infer a more detailed rheological stratification, only using the monthly GRACE data. For this reason, we have decided to limit our attention to the coseismic slip and to the steady-state (Maxwell) rheology and postpone the refinement of the present modelling to transient (Burgers) rheologies and after-slip to next studies where additional data sets could also be considered, like inland and seafloor GPS observations (Sun et al. 2014;Broerse et al. 2015). At the same time, however, we note that seafloor GPS data indicate that the maximum after-slip in the first 9 months after the 2011 Tohoku earthquake does not exceed 1 m (Iinuma et al. 2016) and that there is a prevalence of viscoelastic relaxation, rather than  Fig. 3b), that is at (144.5 • E, 36.9 • N) and (139.9 • E, 38.6 • N), respectively. The TBG model has been removed from both time-series, which thus represent only the co-and post-seismic signatures. (c, d) TBG model and residuals (black and grey lines). The lightgrey area represents the observational uncertainties scaled by the factor α = 2.4, with α 2 = 5.9 being the hyperparameter estimated by GRACE data inversion.
after-slip (Sun et al. 2014), at least in the primary rupture area (Tomita et al. 2017). In this respect, our estimate of the Maxwell rheology should not be biased by the omission of after-slip in the present modelling. We expect, instead, that transient rheologies can increase the trade-off between coseismic slip and rheological parameters, but we do not further investigate this issue because the increase of the number of rheological parameters would make it difficult to retrieve more solid and physically sound results from the only GRACE data here considered. Let us now discuss in more detail the estimate of the model parameters. From Fig. 1(b), we first note that the fault slip decreases by increasing the depth and has its maximum of about 15 m close to the trench. The small value of the maximum fault slip is compensated by its wide extension along strike, that is a consequence of a poor resolving power of the GRACE data due to its spatial resolution. The total seismic moment, indeed, is still consistent with a megathrust of moment magnitude 9. There is a small right-lateral slip (up to 5 m) towards the northwest corner of the fault surface which we definitely think to be an artifact. In general, the fault slip goes to zero between 40 and 60 km depth, which is consistent with the typical locking depth at subduction zones (Trubienko et al. 2013;Herman et al. 2018).
In light of this, we can conclude that the present approach makes it possible to constrain the fault slip (at least its distribution along dip) by exploiting information from observations about the spatial and temporal patterns of both the co-and post-seismic gravity signatures. Previous efforts, aimed at characterizing the 2011 Tohoku earthquake using only the coseismic signature, have been successful when supplemented by GPS data and additional prior information (Fuchs et al. 2016;Zhou et al. 2018), like positivity constraints, or resorting to simplifying assumptions, like point-like seismic sources (Cambiotti & Sabadini 2013) and homogeneous finite faults (Cambiotti & Sabadini 2012;Wang et al. 2012). In addition to the optimization on the information from the GRACE data (i.e. the exploitation of the post-seismic signature for constraining the coseismic fault slip beyond the rheological stratification), another advantage of the present framework is that it does not depend on the possibility of recognizing the earthquake signatures on the basis of their temporal evolution only. In light of this, it limits the impact of any leakage of the post-seismic signature into the coseismic one that a preliminary time-series analysis could introduce. Nevertheless, we note that transient (Burgers) post-seismic signatures in the first few months after the earthquake, if not included in the modelling as in the case of the steady-state (Maxwell) one adopted here, can still be attributed to the coseismic signature.
From the marginal PDF for the lithospheric thickness and the asthenospheric viscosity shown in Fig. 2(b), we note that the two rheological model parameters are anticorrelated. This is physically sound because a thick lithosphere implies a small post-seismic signature and, so, the asthenospheric viscosity has to decrease in order to match the amplitude of the observed post-seismic signature in a given period of time. The mean and 95 per cent confidence intervals for the lithospheric thickness and the asthenospheric viscosity are H = 103.2 +15.6 −20.4 km and η A = 3.0 +2.2 −1.1 × 10 18 Pa s, respectively. The estimate of the asthenospheric viscosity is consistent with the range of values usually assumed or inferred in post-and interseismic studies (Han et al. 2014;Diao et al. 2014;Mavrommatis et al. 2014), while that of the lithospheric thickness is larger, although consistent with subduction zone environments (Trubienko et al. 2013;Herman et al. 2018). In this respect, we can argue that GRACE data do not resolve small spatial scale features of subduction zones, like the low viscosity mantle wedge beneath the overriding plate, and we remark, at the same time, that most studies fix a priori the lithospheric thickness, with a few exceptions (Diao et al. 2014). In the end, we note that the estimated uncertainties are too optimistic, especially when compared with the wide range of values appearing in the literature. The small uncertainties are the result of an under parametrization which certainly does not cover all the rheological stratification scenarios.
The rheological stratification, despite its simplistic description adopted here, is able to account for a wide range of different spatial patterns and temporal evolutions of the post-seismic signature. This allows us to quantify the trade-off between the fault slip and the rheological stratification. In order to investigate this issue further, we focus on the seismic moment as the main parameter describing the fault slip and we show in Fig. 5 how its estimate changes varying the rheological stratification, that is, solving the inverse problem given different lithospheric thicknesses and asthenospheric viscosities. We note that the seismic moment increases by a factor of 2-3 by increasing the asthenospheric viscosity by one order of magnitude, from 10 18 to 10 19 Pa s. The dependence on the lithospheric thickness is not so straightforward. Nonetheless, it is also responsible for variations of about 50 per cent with respect to the mean seismic moment of 3.69 × 10 22 N m. This sensitivity of the seismic moment obviously reflects that of the fault slip on the rheological stratification and corroborates the effectiveness of the present approach for modelling satellite-derived gravity data. Indeed, fixing (rather than jointly estimating) the rheology can have large

C O N C L U S I O N S
We have presented a new method for exploiting satellite-derived gravity data, like those from the GRACE space mission, for seismological purposes and applied it to the case of the 2011 Tohoku earthquake in order to jointly constrain the fault slip and the rheological stratification. The high observational error, which can be mitigated only after spatial filtering at the cost of reducing the recovered gravity signals, and the monthly temporal resolution of GRACE data makes it difficult to discriminate the co-and postseismic signatures only on the basis of their temporal patterns. These main difficulties are resolved by directly fitting to the GRACE data the physico-mathematical modelling of the earthquake signatures. In this way, indeed, we exploit physical constraints on the spatiotemporal patterns of the earthquake signatures that we are looking for into the data time-series.
This strategy simplifies also the issue of discerning the earthquake signatures from those caused by other geophysical processes, such as hydrology and ocean circulation, and thus allows to obtain physically sound results even implementing a weak DDK8 spatial filtering, comparable to an isotropic Gaussian filter of a radius of only 135 km. In principle, improving the modelling of the other geophysical processes affecting the TBG in the surrounding of the Kamchatka-Kuril-Japan subduction zone, it would be even possible to deal with unfiltered GRACE data. Nevertheless, at present, the TBG model does not take into account any constraint on the spatial patterns of its components and, so, when fitted to the GRACE data, it is biased by the high observational errors and this biases also the estimate of the rheological stratification on which the modelling of the post-seismic signature depends.
Despite the advantages of the present method and different from the inversion of seismic wave, tsunami and GPS data, the temporal and spatial resolutions of GRACE data and their high observational errors do not allow to resolve the fault slip distribution, especially along strike, at the short spatial scales of the actual rupture of the 2011 Tohoku earthquake. This make important to have at disposal improved gravity data from NGGM, both in their accuracy and spatial and temporal resolutions (Silvestrin et al. 2012;Pail et al. 2015;, in order to fully exploit the potential of the gravitational effects from mass re-adjustment for seismological purposes. The new method aims at jointly estimating the fault slip and the rheological stratification from inversion of satellite-derived gravity data. This task is mandatory for a full understanding of the earthquake cycle, where viscoelastic relaxation plays a crucial role in characterizing the post-seismic phase, as well as the interseismic and pre-seismic ones Panet et al. 2018). This novel perspective, once extended to all the phases of the earthquake cycle and based on improved gravity data from NGGM, will supplement ongoing researches mainly based on crust displacements observed by GPS (Trubienko et al. 2013;Mavrommatis et al. 2014;Tomita et al. 2017).

A C K N O W L E D G E M E N T S
This work is supported by the ESA (European Space Agency) grant 'Gravitational Seismology', EO Science For Society, ESA ITT AO/1-9101/17/I-NB. Postseismic GRACE and GPS observations indicate a rheology contrast above and below the Sumatra slab, J. geophys. Res., 120, 5343-5361.

S U P P O RT I N G I N F O R M AT I O N
Supplementary data are available at GJ I online. The top and bottom edges of the fault are at 8.5 (the average depth of the trench) and 120 km depth, and the grey lines are given every 20 km depth. The CMT focal mechanism Ekström et al. 2012) is also shown. Figure S2. Marginal PDF of the logarithm of the asthospheric viscosity, log η A (in Pa s), and the lithospheric thickness, H, obtained by inversion of (a) DDK4, (b) DDK5, (c) DDK6 and (d) DDK7 filtered GRACE data. The PDFs have been scaled so that their maximum value is 1. Figure S3. Amplitudes of the modelled gravity disturbances of the (a,b,c) annual, (d,e,f) semiannual and (g,h,i) 161-d periodic signals of the TBG model obtained by inversion of (a,d,g) DDK3 and (b,e,h) DDK8 filtered and (c,f,i) unfiltered GRACE data. The dash-dotted circle indicates the circular cap of ϑ = 6 centred at the CMT epicentre Ekström et al. 2012) in which we spatially localize the GRACE data using 23 Slepian functions bandlimited to spherical harmonic degree L = 90 (Simons et al. 2006). The straight dash-dotted line indicates the strike of 202 of the CMT solution and the solid contour represents the fault surface. Figure S4. Marginal PDF for (a) the lithospheric thickness, H, and the upper mantle/asthenospheric viscosity ratio, R, and for (b) the asthenospheric viscosity, η A , and the upper mantle/asthenospheric viscosity ratio, R, obtained by inversion of DDK8 filtered GRACE data. The PDFs have been scaled so that their maximum value is 1.
Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper.

A P P E N D I X : U P P E R M A N T L E V I S C O S I T Y
For the sake of completeness, we have verified whether our approach can or cannot constrain the upper mantle viscosity, which corresponds to the upper mantle/asthenospheric viscosity ratio, R, for the parametrization adopted here. Fig. S4 shows the marginal PDFs for η A and R and for H and R obtained by inversion of DDK8 filtered GRACE data. We first note that for R 30, both PDFs depend slightly on R. This is due to the fact that stress relaxation into the mantle becomes slower by increasing R and, so, negligible on the 6 yr and 3 months after the earthquake monitored by GRACE.
In this respect, the value of R = 100 that we used in the main text is representative of a wide range of upper mantle viscosities, which also includes values generally estimated by studies about Glacial Isostatic Adjustment (GIA; Tosi et al. 2005;Caron et al. 2018), from 10 20 to 10 21 Pa s. Nevertheless the largest probabilities occur for smaller R, close to the limit where the asthenosphere and the upper mantle have the same viscosity and for thinner lithospheres, up to H = 43 km. In particular, the maximum likelihood is attained for R = 1.5 and η A = 6 × 10 18 Pa s, implying an upper mantle viscosity of just 10 19 Pa s. We reject this scenario for consistency with GIA studies and we restrict the validity of our results to R 30, meaning that, as expected, we have to use some prior information about the deep rheological stratification in (co-and) post-seismic studies.