-
PDF
- Split View
-
Views
-
Cite
Cite
Atsushi Nakao, Tatsu Kuwatani, Shin-ichi Ito, Hiromichi Nagao, Adjoint-based data assimilation for reconstruction of thermal convection in a highly viscous fluid from surface velocity and temperature snapshots, Geophysical Journal International, Volume 236, Issue 1, January 2024, Pages 379–394, https://doi.org/10.1093/gji/ggad417
- Share Icon Share
SUMMARY
It is a general problem in geoscience to estimate the time-series of velocity and temperature fields for a fluid based on limited observations, such as the flow velocity at the fluid surface and/or a temperature snapshot after flow. In this study, an adjoint-based data assimilation method (also known as 4-D variational data assimilation) was used to reconstruct the thermal convection in a highly viscous fluid (e.g. Earth’s mantle) to investigate which observations constrain the thermal convection and how accurately the convection can be reconstructed for different wavelengths. The data assimilated to the adjoint-based model were generated synthetically from forward models with convecting cells of different length-scales. Based on the surface velocity and temperature snapshot, our simulations successfully reconstructed thermal convection over 50 Myr in the case that the wavelength of the convective cells is sufficiently large. We obtained two main results from this parametric study. (1) When we only considered instantaneous thermal structure fitting in the cost function, the convection reconstruction tended to fail. However, there are some cases where the laminar thermal convection can be reconstructed by assimilating only the velocity along the fluid surface. (2) There is a limit to the reconstruction of thermal convection in the case that the convecting cells are small (∼1000 km for a 50 Myr reconstruction). We propose that (1) is related to the balance of forces due to the thermal buoyancy and viscous stress around the thermal anomalies and (2) is related to how information is preserved (i.e. how the previous thermal structure is maintained in the observable state throughout the convection process). The results enable the use of geological records to estimate time-series of velocity and temperature in Earth’s deep interior, even though the records may only contain information from shallow parts of Earth.
1 INTRODUCTION
It is a general problem in geoscience to constrain flow processes based on limited observations along a fluid surface and/or at a point in time after fluid flow. For example, in situ observations are often difficult during high-temperature or high-pressure experiments that mimic conditions in Earth’s interior, and it is necessary to infer the dynamic processes based on quenched samples (Luhmann et al. 2017; Oyanagi et al. 2021, 2022). Other examples include those related to the flow processes of lavas, landslides, and glaciers, for which observations are limited due to the spatial scale and randomness of these phenomena.
A time-series of mantle convection can be used to examine various geophysical and geological phenomena, including earthquakes and volcanism, but observations of mantle convection are limited and difficult to obtain. Geochronological data based on isotopic and palaeomagnetic methods can provide information on the age and velocity of tectonic plates (Seton et al. 2012), but provide little direct information on the flow in Earth’s deep interior. In contrast, geophysical studies, such as those that use seismic and electromagnetic methods, provide information on the present-day state of Earth’s interior (Li et al. 2008; Tada et al. 2016; Morishige & Kuwatani 2020; Iwamori et al. 2021), but do not provide information on the past state. It is therefore necessary to combine these limited observations with fluid dynamics theory to constrain the dynamics or state of the past or future mantle.
4-D variational (4D-Var) data assimilation (also known as the adjoint method) is a powerful tool for solving unknown parameters, such as the past temperature field of a target fluid. The 4D-Var method incorporates data observed at all time steps to constrain the unknown parameters at the initial state. Therefore, the method is suitable for estimating past mantle states, as observations are abundant for the present-day but sparse back in geological time. Previous studies have developed the 4D-Var method to reconstruct mantle convection based on the present-day mantle temperature estimated from seismic tomography and observed plate velocities (Bunge et al. 2003; Ismail-Zadeh et al. 2004; Liu & Gurnis 2008; Colli et al. 2020). These methods can provide insights into global and regional palaeo-mantle dynamics (Zhou & Liu 2017; Ghelichkhan et al. 2021), and they have been used to estimate unknown physical parameters such as the rheological parameters of mantle minerals (Worthen et al. 2014; Ratnaswamy et al. 2015; Li et al. 2017).
However, almost all previous 4D-Var simulations used to reconstruct palaeo-mantle dynamics have assumed forced mantle convection, although the driving force is generally considered to be thermal buoyancy within the mantle. In particular, tectonic plates are considered to be dragged by negative buoyancy (Forthys & Uyeda 1975; King 2007). In detail, previous 4D-Var simulations have not used the velocity of tectonic plates estimated from palaeomagnetic data to evaluate the misfit between the model and observations, but instead incorporated the surface velocity as an upper boundary condition of the forward conservation equations of mass and momentum. To our knowledge, Li et al. (2017) is the sole study that simulated time-dependent mantle thermal convection using a free-slip condition on the model surface. This was achieved by introducing a hyperparameter that controls the balance between the temperature and velocity fields. Based on forced convection, previous studies have investigated how far back in geological time the 4D-Var methods can reconstruct mantle flow and how the accuracy of the plate velocity data affects the reconstruction (Vynnytska & Bunge 2015; Colli et al. 2020). Consequently, the limitations of the inverse problem under conditions of free thermal convection remain unknown.
The aims of this study were to clarify [Topic 1] what data can be used to effectively reconstruct free thermal convection of a highly viscous fluid, and [Topic 2] what factors determine the limits of the reconstruction for the flow process. To investigate these topics, we constructed a 4D-Var model for free convection without external forces, where both the surface velocity and temperature are considered with a cost function. We then undertook a parametric study by varying the balance of assimilation between the surface velocity and temperature for Topic 1, and by varying the length-scales of the thermal convection cells in the target fluid for Topic 2.
The remainder of this paper is organized as follows. Section 2 describes the theoretical background and algorithms of the 4D-Var model. Section 3 describes the reference models used to generate the synthetic data used in the 4D-Var simulations. Section 4 describes the results of the 4D-Var simulations that assimilated the synthetic data generated in Section 3. Section 5 discusses mainly Topics 1 and 2, and future research perspectives. Finally, the conclusions are presented in Section 6.
2 METHODS
2.1 Forward equations
We considered a highly viscous, incompressible fluid in which laminar flow occurs. The fluid convects in a 2-D rectangular domain with a horizontal domain x ∈ [x0, x1] and a vertical domain z ∈ [z0, z1], where x = (x, z)⊤ denotes the Cartesian coordinates (Fig. 1). The fluid motion develops as a function of time t ∈ [t0, t1]. The driving force of the fluid is thermal buoyancy due to cooling along the upper boundary and heating along the lower boundary and, as such, the fluid flows spontaneously. The fluid contains chemical components, but the chemical composition does not affect the fluid flow. As such, we considered three governing equations: equations of motion, energy conservation and chemical transport. The forward governing equations are written in a non-dimensional form.
![Schematic illustration of the (a) forward and (b) adjoint parts of the 4D-Var simulation. Green characters are related to the equations of motion, red characters to the equations of energy conservation and blue characters to the equations of material transport. ψ = stream function; u = (u, w)⊤ = (∂ψ/∂z, −∂ψ/∂x)⊤ = velocity; T = temperature; Ci = concentration of chemical component i; φ = adjoint stream function; τ = adjoint temperature; Γi = adjoint concentration of chemical component i; x ∈ [x0, x1] = horizontal distance; z ∈ [z0, z1] = vertical distance and t ∈ [t0, t1] = time.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/gji/236/1/10.1093_gji_ggad417/1/m_ggad417fig1.jpeg?Expires=1748157057&Signature=ejkKcdHoGQfkZ2BjTmkJDS~MzZGaDs3zcCk4h~Nz1YlDGNl4y1-kggcKLBJRt5DkU~cySUG~4jNmIleT1Aro5nBhnFIlgEpxR36MEX4UdJ-PM6mCwHswtPuNf7Drx0w7YOv5QYl2sbQgH9zTk108qKw5T4phzDdgmLidaip-fUA5hDhtltPTyGGyZO6AjQc0TaHgIHnttxEaaTxZsiFt~nEnb~3sR8Wf3PgtjxXJONYOIeShdDC-KeWC184KTIRMJ8VJW2WY1oN1E14pAo07f7cOFoKTn1DZR0uHbGq99eBiC38ds72-anu7ZSzOWh7RUuyBiYbbsY9VGfm4G5nlXA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Schematic illustration of the (a) forward and (b) adjoint parts of the 4D-Var simulation. Green characters are related to the equations of motion, red characters to the equations of energy conservation and blue characters to the equations of material transport. ψ = stream function; u = (u, w)⊤ = (∂ψ/∂z, −∂ψ/∂x)⊤ = velocity; T = temperature; Ci = concentration of chemical component i; φ = adjoint stream function; τ = adjoint temperature; Γi = adjoint concentration of chemical component i; x ∈ [x0, x1] = horizontal distance; z ∈ [z0, z1] = vertical distance and t ∈ [t0, t1] = time.
2.1.1 Equation of motion
The equation of motion of a highly viscous fluid flowing due to thermal buoyancy, which combines the conservation equations of mass and momentum, can be written as follows (Turcotte & Schubert 2002):
where η is the constant effective viscosity of the fluid, ψ(x, t) is the stream function, Ra is the thermal Rayleigh number (i.e. a non-dimensional number representing the intensity of thermal convection) and T(x, t) is the fluid temperature. The stream function ψ is defined as follows:
where u(x, t) is the velocity field of the fluid. The nabla, Laplacian and biharmonic operators are defined as follows:
and
respectively. The thermal Rayleigh number Ra is defined as follows:
where ρ0 is the reference fluid density, α is the thermal expansivity, ΔT(T1 − T0 > 0) is the temperature difference between the lower and upper boundaries of the fluid, g is acceleration due to gravity, h is the fluid thickness, η0 is the reference viscosity and κ is the thermal diffusivity. The parameters that make up Ra are constant. Eq. (1) is valid for a 2-D domain, the Boussinesq approximation, and a highly viscous fluid with an infinite Prandtl number and constant viscosity.
A free slip condition was imposed on all four boundaries of the forward equation of motion (eq. 1):
These conditions can be rewritten using the stream function as follows:
when the reference of the potential ψ is set along the boundaries.
2.1.2 Energy conservation
We considered heat transport due to advection and conduction such that the energy conservation equation can be written as
where k is the thermal conductivity.
The fluid is cooled at T0 along the upper boundary, heated at T1 along the lower boundary, and insulated along the side boundaries, such that the boundary conditions of eq. (11) can be written as follows:
2.1.3 Chemical transport
Advection of chemical components in the fluid can be written as follows:
where Ci(x, t) denotes the concentration of component i. The components are assumed to have no effect on the fluid motion and passively follow the thermal convection (i.e. the chemical components are tracer markers). The tracers are useful in evaluating the accuracy of the reconstructed velocity field.
2.2 Cost function
We now address the problem of how to reconstruct fluid convection in the forward model on the basis of partially observed fluid motion, temperature and composition. To do this, we define a cost function that quantifies the misfit between the observed data and corresponding modelled variable values as follows:
where
and wu, wT and |$w_{C_i}$| are the weights of the data assimilation, and uobs(x, t), Tobs(x, t) and |$C^{\rm obs}_i({\bf x}, t)$| are the non-dimensional observed flow velocity, temperature, and concentration of chemical component i used for the data assimilation, respectively.
To reproduce the velocity and temperature in the forward model that are consistent with the observations, we minimized J subject to the forward governing equations (i.e. eqs 1, 11 and 15). In this case, the following Lagrangian function can be defined using the Lagrangian multipliers φ, τ and Γi:
Hereafter, we refer to φ(x, t) as the adjoint stream function, τ(x, t) as the adjoint temperature and Γi(x, t) as the adjoint concentration of the ith chemical component.
2.3 Adjoint equations
Using eq. (18) and a coefficient comparison, we derived the following non-dimensional adjoint equations.
• Adjoint equation of motion
where δ( · ) is the delta function.
• Adjoint equation of energy conservation
• Adjoint equation of compositional transport
• Gradient of the cost function with respect to the forward variable values at the initial condition
• Values for the adjoint variables at the initial condition (t = t1) of the backward calculation
• Adjoint boundary condition for eq. (19)
• Adjoint boundary condition for eq. (20)
We can derive similar adjoint equations related to motion and energy from those of previous models (Bunge et al. 2003; Price & Davies 2018) when using (1) 2-D coordinates, (2) the adjoint stream function φ instead of the adjoint velocity Φ = (∂φ/∂z, −∂φ/∂x)⊤ in the previous models (Bunge et al. 2003; Price & Davies 2018), and (3) removing the pressure-gradient term in the previous models in the same way as for the forward equations (Turcotte & Schubert 2002).
The unique features of the adjoint equations in this study are as follows: (1) the use of forward and adjoint stream functions ψ and φ; (2) the force term |$\frac{\partial }{\partial z}w_u (u-u^{\rm obs}) \delta (z-z_0)$| in eq. (19), which is associated with the free convection setting and (3) variables Ci and Γi, which are associated with reconstruction of the chemical component i.
2.4 Algorithm for data assimilation
Eqs (22) and (23) show that if τ(x, t0) and Γi(x, t0) are known, we can correct the initial conditions T(x, t0) and Ci(x, t0) of the forward model such that J becomes smaller. In contrast, eqs (24) and (25) show that τ(x, t1) and Γi(x, t1) are trivially given. As such, we derived the adjoint variables at t0 by solving the adjoint eqs (19), (20) and (21) backwards in time starting from t1 to update the initial conditions of the forward model.
We constructed a data assimilation algorithm as shown in Fig. 2. [Processs 1] Set the random initial conditions T(x, t0) and Ci(x, t0). [Process 2] Solve the forward eqs (1), (11) and (15) until T(x, t1) and Ci(x, t1) are obtained. [Process 3] Start solving the adjoint eqs (19), (20) and (21 backwards in time starting from t1 to t0. [Process 4] Obtain τ(x, t0) and Γi(x, t0), and update T(x, t0) and Ci(x, t0). [Process 5] Solve the forward equations again from the updated initial conditions. These steps are repeated until J becomes sufficiently small.
![Flowchart of the 4D-Var simulation. ψ = stream function; u = horizontal velocity; T = temperature; Ci = concentration of chemical component i; φ = adjoint stream function; τ = adjoint temperature; Γi = adjoint concentration of chemical component i and t ∈ [t0, t1] = time. Details are provided in Section 2.4.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/gji/236/1/10.1093_gji_ggad417/1/m_ggad417fig2.jpeg?Expires=1748157057&Signature=dwhiM5c~IPaQg2TYkugskZwzRtwF5OPOD8LBpS2gRZ30N2kLWbvv-q6n~CPtLN~bQtsqkxy2wuUi7oKCEB8u5novVtXW8egUj8~W~P4w2GrYC~~U5oFoxkSqINWJUlrvyZbMhxv7lhd-1tSrm-1GJzb-X2QPkjkYeCCW9EC~OlEGFTYy8jgvtA7WTONXzK9Q-0tASBUwO34E7sItHlqU9xSVKbTyr2F91N6PlW6wCZuBE0EMZ-IASQKr01iofP-292eIQ5ucYuGlF7kUfZ-US-G65uQuy5L7RZopoq5h4V5KS02gdblYjc-gFHQZZL6PKfrcO-gjOtPRoTt2TC~4xw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Flowchart of the 4D-Var simulation. ψ = stream function; u = horizontal velocity; T = temperature; Ci = concentration of chemical component i; φ = adjoint stream function; τ = adjoint temperature; Γi = adjoint concentration of chemical component i and t ∈ [t0, t1] = time. Details are provided in Section 2.4.
2.5 Parameter values and dimensionalization
Although we have described the governing equations for the general fluid in Sections 2.1 and 2.3, we now consider the fluid to be Earth’s mantle by substituting the parameter values listed in Table 1, which yield Ra = 2.34 × 106. Using these parameters, the variables in the governing equations are converted to dimensional values as shown in Table 2. Furthermore, as shown in Fig. 1, we considered a geological spatiotemporal scale such that x0 = 0 and x1 = 3 (0–6000 km horizontally in the dimensional form), z0 = 0 and z1 = 1 (0 to 2000 km vertically), t0 = 0, and t1h2/κ = 50 Myr. The temperatures of the upper and lower boundaries were set to T0ΔT = 273 K (= 0 °C) and T1ΔT = 3273 K (3000 °C), respectively. We used a constant viscosity of η = 1 (1022 Pa s in the dimensional form) and a constant thermal conductivity of k = 1 (4.68 W m−1 K−1 in the dimensional form).
Physical parameters used to calculate the thermal Rayleigh number and/or to dimensionalize the simulation results.
Symbol . | Explanation . | Value . | Unit . |
---|---|---|---|
C0 | Reference concentration of chemical components | 1 | kg kg−1 |
cp | Isobaric specific heat | 1.2 × 103 | J kg−1 K−1 |
g | Gravitational acceleration | 10 | m s−2 |
h | Thickness of the convective layer | 2 × 106 | m |
ΔT | Temperature difference between upper and lower boundaries | 3 × 103 | K |
α | Thermal expansivity | 2.5 × 10−5 | K−1 |
η0 | Reference viscosity | 1022 | Pa s |
κ | Thermal diffusivity | 10−6 | m2 s−1 |
ρ0 | Reference density | 3.9 × 103 | kg m−3 |
Symbol . | Explanation . | Value . | Unit . |
---|---|---|---|
C0 | Reference concentration of chemical components | 1 | kg kg−1 |
cp | Isobaric specific heat | 1.2 × 103 | J kg−1 K−1 |
g | Gravitational acceleration | 10 | m s−2 |
h | Thickness of the convective layer | 2 × 106 | m |
ΔT | Temperature difference between upper and lower boundaries | 3 × 103 | K |
α | Thermal expansivity | 2.5 × 10−5 | K−1 |
η0 | Reference viscosity | 1022 | Pa s |
κ | Thermal diffusivity | 10−6 | m2 s−1 |
ρ0 | Reference density | 3.9 × 103 | kg m−3 |
Physical parameters used to calculate the thermal Rayleigh number and/or to dimensionalize the simulation results.
Symbol . | Explanation . | Value . | Unit . |
---|---|---|---|
C0 | Reference concentration of chemical components | 1 | kg kg−1 |
cp | Isobaric specific heat | 1.2 × 103 | J kg−1 K−1 |
g | Gravitational acceleration | 10 | m s−2 |
h | Thickness of the convective layer | 2 × 106 | m |
ΔT | Temperature difference between upper and lower boundaries | 3 × 103 | K |
α | Thermal expansivity | 2.5 × 10−5 | K−1 |
η0 | Reference viscosity | 1022 | Pa s |
κ | Thermal diffusivity | 10−6 | m2 s−1 |
ρ0 | Reference density | 3.9 × 103 | kg m−3 |
Symbol . | Explanation . | Value . | Unit . |
---|---|---|---|
C0 | Reference concentration of chemical components | 1 | kg kg−1 |
cp | Isobaric specific heat | 1.2 × 103 | J kg−1 K−1 |
g | Gravitational acceleration | 10 | m s−2 |
h | Thickness of the convective layer | 2 × 106 | m |
ΔT | Temperature difference between upper and lower boundaries | 3 × 103 | K |
α | Thermal expansivity | 2.5 × 10−5 | K−1 |
η0 | Reference viscosity | 1022 | Pa s |
κ | Thermal diffusivity | 10−6 | m2 s−1 |
ρ0 | Reference density | 3.9 × 103 | kg m−3 |
Non-dimensional physical variables used in the governing equations and the dimensionalization. Physical parameters used for the dimensionalization are listed in Table 1.
Symbol . | Explanation . | Dimensional value . | Unit of dimensional value . |
---|---|---|---|
Ci | Concentration of ith chemical component | Ci × C0 | kg kg−1 |
k | Thermal conductivity | k × ρ0cpκ | W m−1 K−1 |
T | Temperature | T × ΔT | K |
u = (u, w)⊤ | Velocity field | u × κ/h | m s−1 |
Γi | Adjoint concentration of component i | – | – |
η | Viscosity | η × η0 | Pa s |
τ | Adjoint temperature | – | – |
φ | Adjoint stream function | – | – |
ψ | Stream function | ψ × κ | m2 s−1 |
x = (x, z)⊤ | Cartesian coordinates | x × h | m |
t | Time | t × h2/κ | s |
Symbol . | Explanation . | Dimensional value . | Unit of dimensional value . |
---|---|---|---|
Ci | Concentration of ith chemical component | Ci × C0 | kg kg−1 |
k | Thermal conductivity | k × ρ0cpκ | W m−1 K−1 |
T | Temperature | T × ΔT | K |
u = (u, w)⊤ | Velocity field | u × κ/h | m s−1 |
Γi | Adjoint concentration of component i | – | – |
η | Viscosity | η × η0 | Pa s |
τ | Adjoint temperature | – | – |
φ | Adjoint stream function | – | – |
ψ | Stream function | ψ × κ | m2 s−1 |
x = (x, z)⊤ | Cartesian coordinates | x × h | m |
t | Time | t × h2/κ | s |
Non-dimensional physical variables used in the governing equations and the dimensionalization. Physical parameters used for the dimensionalization are listed in Table 1.
Symbol . | Explanation . | Dimensional value . | Unit of dimensional value . |
---|---|---|---|
Ci | Concentration of ith chemical component | Ci × C0 | kg kg−1 |
k | Thermal conductivity | k × ρ0cpκ | W m−1 K−1 |
T | Temperature | T × ΔT | K |
u = (u, w)⊤ | Velocity field | u × κ/h | m s−1 |
Γi | Adjoint concentration of component i | – | – |
η | Viscosity | η × η0 | Pa s |
τ | Adjoint temperature | – | – |
φ | Adjoint stream function | – | – |
ψ | Stream function | ψ × κ | m2 s−1 |
x = (x, z)⊤ | Cartesian coordinates | x × h | m |
t | Time | t × h2/κ | s |
Symbol . | Explanation . | Dimensional value . | Unit of dimensional value . |
---|---|---|---|
Ci | Concentration of ith chemical component | Ci × C0 | kg kg−1 |
k | Thermal conductivity | k × ρ0cpκ | W m−1 K−1 |
T | Temperature | T × ΔT | K |
u = (u, w)⊤ | Velocity field | u × κ/h | m s−1 |
Γi | Adjoint concentration of component i | – | – |
η | Viscosity | η × η0 | Pa s |
τ | Adjoint temperature | – | – |
φ | Adjoint stream function | – | – |
ψ | Stream function | ψ × κ | m2 s−1 |
x = (x, z)⊤ | Cartesian coordinates | x × h | m |
t | Time | t × h2/κ | s |
2.6 Numerical schemes
The forward and adjoint equations in Sections 2.1 and 2.3 are discretized in space and time with a finite volume method and a staggered grid (Fig. 1). The grid is rectangular and contains 121 uniform nodes along the horizontal axis (120 cells; 50 km interval) and 41 uniform nodes along the vertical axis (40 cells; 50 km interval). The variables were also discretized along the time axis, as well as into 5001 steps with uniform 104-yr intervals. It was confirmed in advance that the discretization satisfies the Courant–Friedrichs–Lewy condition at each time step (i.e. |$u \Delta x/ \Delta t \approx w \Delta z/ \Delta t \lessapprox 0.05$|) in the case of the parameter sets in Table 1 and Section 2.5.
We used the same solvers as in Nakao et al. (2016). The discretized forward and adjoint equations of motion (eqs 1 and 19) were solved with a modified Cholesky decomposition algorithm, which is a direct solver for linear systems. The direct solver is overengineered under a constant viscosity, but is equipped for future studies given the complex rheology of mantle rocks. The discretized forward and adjoint equations for energy conservation (eqs 11 and 20) were solved with a partially upwind finite volume scheme (Clauser & Kiesner 1987). The transport of the chemical components (eqs 15 and 21) was solved with a marker-in-cell method with 43 200 markers. Although the tracers can record multiple components, for simplicity we only considered a single component (i.e. i = 1).
To update the temperature and composition at t = t0 in the forward model, we applied a simple gradient descent method:
where index n indicates the number of iteration loops. We empirically set αT = 5 × 10−3 and αC = 5 × 10−2 to |$w_{C_i}/w_T = 5\times 10^{-15}$|. In the first iteration loop (n = 1), a temperature structure T(x, t0) = (T0 + T1)/2 + 0.0167c and a composition structure Ci(x, t0) = (1 + c)/2 are obtained using a random perturbation c ∈ [ − 1, 1].
3 DATA AND REFERENCE MODELS
We use the synthetic velocity, temperature and compositional data from the reference forward simulations as the observed data for the 4D-Var simulations. The synthetic data are generated from reference forward simulations R1, R2 and R3, which have different temperature structures with horizontal wavelengths of 2h, h and h/2 at the initial condition t = t0, respectively (Figs 3 a–c), and the composition is initially horizontally stratified in each reference run (e.g. Figs 3d–f). The settings of the reference models are the same as in the forward part of the adjoint model (Fig. 1a). The fluid temperature Tobs and composition |$C_i^{\rm obs}$| of the entire model domain (x0 ≤ x ≤ x1 and z0 ≤ z ≤ z1) are sampled from the reference model at the last time step t = t1 (4th row of Figs 3a–f), which is analogous to seismic and electromagnetic observations in a geophysical context. The sampling resolution is 50 km on both the horizontal and vertical axes, which is comparable to the resolution of current seismic and electromagnetic observations of the upper mantle with a dense local observation network (Tada et al. 2016), although the chemical composition is recorded by the markers and has a higher resolution than the grid. The horizontal component of the fluid velocity uobs was sampled along z = z0 in the three reference models throughout the model time (t0 ≤ t ≤ t1), which is analogous to plate motion estimated from geological records, such as palaeomagnetic data.

Snapshots of the three reference models at four time steps. (a)–(c) Temperature (coloured contours) and velocity (arrows) fields for runs R1, R2 and R3. (d)–(f) Chemical composition for runs R1, R2 and R3. t0 denotes the time at the initial condition, and t1 denotes the time at which the temperature structure can be observed.
As an example, we now describe the flow processes of run R2, in which the convective cells have a horizontal length-scale of h (∼2000 km; Fig. 3b). Three cold (∼1000 °C) and three hot (∼2000 °C) anomalies were placed along the upper and lower boundaries, respectively, under the initial condition (1st row of Fig. 3b). The cold/hot anomalies descend/ascend due to their negative/positive buoyancy (2nd row of Fig. 3b). Near the upper boundary of the model (0 km depth), the currents start to converge horizontally toward the three cold anomalies and diverge horizontally away from the three hot anomalies, due to the negative buoyancy of the cold anomalies. In contrast, near the lower boundary of the model (2000 km depth), the currents simultaneously converge horizontally toward the hot anomalies and diverge horizontally away from the cold anomalies, due to the positive buoyancy of the hot anomalies. The thermal anomalies reach the opposite side of their initial positions with a decreasing vertical velocity (3rd row of Fig. 3b). Finally, the cold/hot anomalies spread horizontally, due to the transformation of the flows from the vertical to horizontal directions around the lower/upper boundaries (4th row of Fig. 3b). Similar flow processes occurred in runs R1 (Fig. 3a) and R3 (Fig. 3c), but the horizontal lengths of the convective cells differ with the initial thermal structures.
Following these fluid flow processes, the chemical component is transported (Figs 3d–f). The chemical composition was assumed to not alter any physical properties, including the fluid density and viscosity, and thus the chemical markers move passively and follow the thermal convection (i.e. non-compositional convection). In each run, layered structures with compositions of C1 = 0 or 1 are set as the initial condition (1st row of Fig. 3), and synclines and anticlines are generated along the tracks of the cold and hot anomalies, respectively (2nd row of Fig. 3), and stirring progresses with increasing model time (3rd and 4th rows of Fig. 3). The chemical heterogeneity at the initial condition (e.g. C1 = 0 or 1) remains on a small scale in the later stages, but the chemical structure is averaged (i.e. degraded) in space during sampling in the 4D-Var simulations. The shapes of the deformed layers are almost the same as the outlines of the plumes, indicating that the dominant mechanism of heat transport is advection.
The thermal heterogeneities placed in the initial conditions for each reference run control the wavelengths of the thermal structures, although the horizontal wavelength of thermal cells tends to develop under a given Ra, based on the thermal boundary layer theory (Turcotte & Schubert 2002; Ribe 2007). With sufficient time, the thermal heterogeneity set for the initial conditions will be homogenized and the run will converge into a ‘self-consistent’ steady state as shown in Fig. A3 (Appendix B). However, we applied our 4D-Var method to the transitional conditions that cause excessive thermal diffusion and stirring, to investigate the ability of our 4D-Var method to retrospectively reproduce the diffusion and stirring processes.
4 RESULTS
Nine 4D-Var simulations listed in Table 3 were performed by assimilating the synthetic data taken from the three reference models (Section 3). Section 4.1 describes in detail how the thermal convection of the reference model is reconstructed in one of the 4D-Var simulations by assimilating the surface velocity, thermal snapshot, and compositional snapshot. Section 4.2 describes how the results change when the surface velocity or the temperature snapshot is excluded from the assimilated data. Section 4.3 describes how the accuracy of the reconstruction changes with the structure of the convection (i.e. the size of the convective cells in the three reference models).
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | Figure . |
---|---|---|---|---|---|
R1 | Reference | Long wavelength | – | – | 3a, 3d, 8d |
R2 | Reference | Medium wavelength | – | – | 3b, 3e, 5a, 5d, 7d |
R3 | Reference | Short wavelength | – | – | 3c, 3f, 9d |
1uT | 4D-Var | Random noise at n = 1 | u and T of R1 | 10−12 | 8a |
2uT | 4D-Var | Random noise at n = 1 | u and T of R2 | 10−12 | 4, 5b, 5e, 6a, 7a |
3uT | 4D-Var | Random noise at n = 1 | u and T of R3 | 10−12 | 9a |
1u | 4D-Var | Random noise at n = 1 | u of R1 | +∞ | 8b |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 6b, 7b |
3u | 4D-Var | Random noise at n = 1 | u of R3 | +∞ | 9a |
1T | 4D-Var | Random noise at n = 1 | T of R1 | 0 | 8c |
2T | 4D-Var | Random noise at n = 1 | T of R2 | 0 | 6c, 7c |
3T | 4D-Var | Random noise at n = 1 | T of R3 | 0 | 9c |
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | Figure . |
---|---|---|---|---|---|
R1 | Reference | Long wavelength | – | – | 3a, 3d, 8d |
R2 | Reference | Medium wavelength | – | – | 3b, 3e, 5a, 5d, 7d |
R3 | Reference | Short wavelength | – | – | 3c, 3f, 9d |
1uT | 4D-Var | Random noise at n = 1 | u and T of R1 | 10−12 | 8a |
2uT | 4D-Var | Random noise at n = 1 | u and T of R2 | 10−12 | 4, 5b, 5e, 6a, 7a |
3uT | 4D-Var | Random noise at n = 1 | u and T of R3 | 10−12 | 9a |
1u | 4D-Var | Random noise at n = 1 | u of R1 | +∞ | 8b |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 6b, 7b |
3u | 4D-Var | Random noise at n = 1 | u of R3 | +∞ | 9a |
1T | 4D-Var | Random noise at n = 1 | T of R1 | 0 | 8c |
2T | 4D-Var | Random noise at n = 1 | T of R2 | 0 | 6c, 7c |
3T | 4D-Var | Random noise at n = 1 | T of R3 | 0 | 9c |
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | Figure . |
---|---|---|---|---|---|
R1 | Reference | Long wavelength | – | – | 3a, 3d, 8d |
R2 | Reference | Medium wavelength | – | – | 3b, 3e, 5a, 5d, 7d |
R3 | Reference | Short wavelength | – | – | 3c, 3f, 9d |
1uT | 4D-Var | Random noise at n = 1 | u and T of R1 | 10−12 | 8a |
2uT | 4D-Var | Random noise at n = 1 | u and T of R2 | 10−12 | 4, 5b, 5e, 6a, 7a |
3uT | 4D-Var | Random noise at n = 1 | u and T of R3 | 10−12 | 9a |
1u | 4D-Var | Random noise at n = 1 | u of R1 | +∞ | 8b |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 6b, 7b |
3u | 4D-Var | Random noise at n = 1 | u of R3 | +∞ | 9a |
1T | 4D-Var | Random noise at n = 1 | T of R1 | 0 | 8c |
2T | 4D-Var | Random noise at n = 1 | T of R2 | 0 | 6c, 7c |
3T | 4D-Var | Random noise at n = 1 | T of R3 | 0 | 9c |
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | Figure . |
---|---|---|---|---|---|
R1 | Reference | Long wavelength | – | – | 3a, 3d, 8d |
R2 | Reference | Medium wavelength | – | – | 3b, 3e, 5a, 5d, 7d |
R3 | Reference | Short wavelength | – | – | 3c, 3f, 9d |
1uT | 4D-Var | Random noise at n = 1 | u and T of R1 | 10−12 | 8a |
2uT | 4D-Var | Random noise at n = 1 | u and T of R2 | 10−12 | 4, 5b, 5e, 6a, 7a |
3uT | 4D-Var | Random noise at n = 1 | u and T of R3 | 10−12 | 9a |
1u | 4D-Var | Random noise at n = 1 | u of R1 | +∞ | 8b |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 6b, 7b |
3u | 4D-Var | Random noise at n = 1 | u of R3 | +∞ | 9a |
1T | 4D-Var | Random noise at n = 1 | T of R1 | 0 | 8c |
2T | 4D-Var | Random noise at n = 1 | T of R2 | 0 | 6c, 7c |
3T | 4D-Var | Random noise at n = 1 | T of R3 | 0 | 9c |
4.1 Reconstruction of the temperature, velocity and chemical composition fields
First, we describe the results of run 2uT, which is a 4D-Var simulation that assimilated the thermal and compositional snapshots at t = t1 and the velocity along z = z0 from run R2. The weighting ratio of the assimilation of the surface velocity to the temperature snapshot is set to wu/wT = 10−12, as we empirically confirmed that the simulation results change significantly around this value. The weighting factor of the composition is set to be sufficiently small (|$w_{C_i}/w_T = 5\times 10^{-15}$|) so that the composition does not affect the reconstruction of the flow and thermal structures.
Fig. 4 shows the thermal and compositional structures at the initial condition, T(x, t0) and |$C_1{(\bf x}, t_0)$|, of run 2uT that are updated as the iteration loops n progressed. Starting from a nearly uniform temperature and chemical composition at the initial condition of n = 1, cold/hot anomalies and a layered chemical structure appear as n increased.

Results of run 2uT at four different iteration steps n. (a) Optimized temperature (coloured contours) and velocity (arrows) fields at the initial condition (t = t0). (b) Errors of the optimized temperature (coloured contours) and velocity (arrows) fields at the initial condition (t = t0) between the 4D-Var model (runs 2uT) and its reference model (run R2). (c) Optimized chemical composition at the initial condition (t = t0).
Fig. 5(b) shows the temporal evolution of the temperature and velocity fields of the forward part of run 2uT after 600 iterations. The evolution is similar to that of its reference run (Fig. 5a). Three cold (∼1000 °C) and three hot (∼2000 °C) anomalies occur along the upper and lower boundaries at the initial condition, respectively (1st row of Fig. 5b), move vertically (2nd row of Fig. 5b), reach the opposite side of the initial positions (3rd row of Fig. 5b), and move horizontally along the opposite boundary (4th row of Fig. 5b). As such, the 4D-Var model reproduces the temperature and flow fields of run R2 throughout the model time. As shown in Fig. 5(c), the errors on the temperature and velocity fields are small in the last time step (4th row of Fig. 5c), but somewhat larger errors occur around the edges of the temperature anomalies in the first time step (1st row of Fig. 5c). This highlights the limited ability of the inverse method to reconstruct the extensive thermal diffusion processes resulting from the temperature discontinuity set as part of the initial conditions.

Snapshots of runs R2 and 2uT (iteration loop n = 601) at four time steps. (a) Temperature (coloured contours) and velocity (arrows) fields for run R2. (b) Temperature and velocity for run A2uT. (c) Temperature errors (coloured contours) and velocity errors (arrows) between runs R2 and A2uT. (d) Chemical composition of run R2. Dots indicate markers and colours indicate the value of the chemical component. (e) Chemical composition of run A2uT. t0 denotes the time at the initial condition, and t1 denotes the time at which the temperature structure can be observed.
Fig. 5(e) shows the reconstructed evolution of the chemical composition in run 2uT. The compositional structure is well reproduced at t = t1 (4th row of 5e), as the run is optimized for the composition at t = t1. In contrast, the layered structure is partially reproduced at t = t0, and there are large errors and a poor resolution for the composition around the six plume tracks (1st row of Fig. 5e). This is probably due to the large velocity errors around the plumes. In addition, the fine-scale structure of the composition (4th row of Fig. 5d), which is associated with significant deformation around and within the plumes, was lost when sampling from run R2, which also caused the low resolution around the plume tracks in run 2uT.
Figs 6(i) and (ii) shows the time-dependent root-mean-square errors (RMSEs) of the temperature and velocity fields between the 4D-Var and reference models, defined as follows:
and
respectively, where j is the sample number and N is the total number of samples. Tj, |$T_j^{\rm obs}$|, uj and |${\bf u}_j^{\rm obs}$| are sampled at the centres of all cells (i.e. N = 120 × 40; Fig. 1a). TE and uE of run 2uT tend to decrease as the iteration loops n increase (Figs 6a-i and a-ii). TE of run 2uT is high in the early stages and low in the later stages (Fig. 6a-i), consistent with the optimization based on the cost function, which includes the temperature errors at t = t1. In contrast, uE becomes large at ∼15 Myr, when the average flow velocity increases (Fig. 6a-ii).

Errors between the reference and 4D-Var runs. (a) Runs R2–2uT; (b) runs R2–2u; (c) runs R2–2T. (i) The top three plots show the temperature root-mean-square-error (RMSE) TE defined by eq. (31) between run R2 and each 4D-Var run on the vertical axes versus the iteration step n on the horizontal axes. Four different coloured lines correspond to different time steps t. (ii) The middle three plots similarly show the velocity RMSE uE defined by eq. (32). (iii) The bottom three plots show the cost function defined by eq. (16) normalized to that at the initial guess (n = 1) on the vertical axes versus the iteration step n on the horizontal axes.
Fig. 6(a-iii) shows the cost function normalized to that of the initial guess (n = 1) for run 2uT. The cost function tends to decrease as the iteration progresses, indicating that the optimization is successful.
4.2 Effects of the lack of observations
We now describe how a lack of observations (i.e. temperature snapshot or surface velocity) affects the results of the 4D-Var simulations by comparing runs 2uT, 2u and 2T (Fig. 7).

Snapshots of runs (a) 2uT, (b) 2u, (c) 2T (iteration loop n = 601) and (d) R2 at four time steps. Coloured contours indicate the temperature and arrows indicate the velocity. t0 denotes the time at the initial condition, and t1 denotes the time at which the temperature structure can be observed.
Run 2u (Fig. 7b), in which the thermal structure is excluded from the assimilated data, still reproduces the thermal convection of the reference run (Fig. 7d). TE(t0) of run 2u is small, although a somewhat larger TE(t1) is obtained in run 2u (Figs 6a-i and b-i). The reason why the thermal structure of the initial stages is successfully estimated without using any thermal information is discussed in Section 5.1.
In run 2T (Fig. 7c), in which the surface velocity is excluded from the assimilated data, the thermal convection of the reference run is not reproduced (Fig. 7d). However, the thermal and velocity structures in the last time step in run 2T (4th row of Fig. 7c) are consistent with run R2 (4th row of Fig. 7d), and the average temperature error TE(t1) is small (Fig. 6c-i). This is because the data assimilation is optimized to reduce the temperature error between the reference and 4D-Var models at t = t1. Although the three cold and hot anomalies are reproduced in run 2T (1st row of Fig. 7c), and the temperatures of the anomalies are consistent with the reference model (i.e. 1000 and ∼2000 °C), the anomaly shapes are vertically extended as compared with the reference model (1st row of Fig. 7d). In addition, the flow velocity is much higher than in the reference model (Fig. 7c). As shown by the red lines in Figs 6(c-i) and (c-ii), TE(t0) and uE(t0) remain large even as the iteration progresses. As such, we obtained a local solution in run 2T, which suggests there are limitations to reconstructing a time-series of fluid motion from observations of only the last time step.
4.3 Effects of the convecting cell size
We now investigate how accurately the thermal convection is reconstructed depending on the length-scales of the thermal and flow structures (i.e. runs R1, R2 and R3).
Fig. 8 shows the results of a reconstruction of the thermal convection of run R1, which has a longer horizontal wavelength than described in the previous sections. The 4D-Var simulation run 1uT, in which both the last temperature snapshot and surface velocity sampled from run R1 are assimilated, reproduces the thermal convection of the reference model throughout the model time (Figs 8a and d). Similar to the medium wavelength case, thermal convection with a long wavelength structure is also reproduced when the temperature snapshot is excluded from the assimilation (run 1u; Fig. 8b), but is not reproduced when the surface velocity is excluded from the assimilation, particularly in the initial stages (run 1T; Fig. 8c).

Snapshots of runs (a) 1uT, (b) 1u, (c) 1T (iteration loop n = 501) and (d) R1 at four time steps. Coloured contours indicate the temperature and arrows indicate the velocity. t0 denotes the time at the initial condition, and t1 denotes the time at which the temperature structure can be observed.
Fig. 9 shows a reconstruction of thermal convection of run R3, which has a shorter horizontal wavelength. In this case, the 4D-Var simulation fails to reproduce the thermal convection under any assimilation weight ratio, wu/wT (runs 3uT, 3u and 3T; Figs 9a–c). Focusing on run 3uT, the thermal and flow structures are well reproduced in the later stages (3rd and 4th rows of Fig. 9a), but not in the early stages (1st and 2nd rows of Fig. 9a). Although the six cold and hot anomalies are estimated to exist, their shapes are distorted as compared with those in the reference model. In Section 5.2, we discuss why the convection with a short-wavelength structure could not be reconstructed.

Snapshots of runs (a) 3uT, (b) 3u, (c) 3T (iteration loop n = 801) and (d) R3 at four time steps. Coloured contours indicate the temperature and arrows indicate the velocity. t0 denotes the time at the initial condition, and t1 denotes the time at which the temperature structure can be observed.
5 DISCUSSION
5.1 Temperature estimation from the surface velocity
Section 4.2 shows cases for which the thermal convection of the laminar flow is reproduced without information regarding the temperature, using only the velocity observed at the upper boundary (i.e. runs 1u and 2u). The temperature is accurately estimated under such settings when only the correct viscosity is given (Appendix A). We attribute this to the force balance between the viscous stress and thermal buoyancy around the thermal anomalies, as described in the following.
Given that our model is 2-D and incompressible, an upwelling flow towards the upper boundary is converted to horizontal diverging flows along the upper boundary (e.g. a mid-ocean ridge and backarc basin in a geological context), whereas a downwelling flow away from the upper boundary causes horizontally converging flows along the upper boundary (e.g. a subduction zone and continental collisional belt). Therefore, the surface horizontal flow velocity reflects the shallow vertical flow velocity. The vertical velocity of a body relative to the surrounding fluid produces a viscous stress, which is balanced by the buoyancy force of the upwelling or downwelling body. In our model, the buoyancy is generated solely by thermal expansion and not by composition. The analytical solution for the ascending velocity wP of a spherical body (e.g. a hot mantle plume) with a radius of rP and excess temperature of ΔTP can be expressed as a dimensional equation:
(Turcotte & Schubert 2002). As such, the temperature of the inner part of the fluid may be constrained, in some cases, by only the velocity at the upper boundary.
Eq. (33) also explains the trend described in Appendix A (i.e. overestimating the fluid viscosity η0 leads to overestimation of the excess temperature ΔTP and anomaly size rP; Figs A1a and c). Similarly, overestimating the thermal expansivity α would result in underestimation of ΔTP and rP.
5.2 Limitations of the inverse time problem
Section 4.3 revealed a limitation in reconstructing the thermal structure of a fluid with small convecting cells. Reconstruction of thermal convection with a horizontal wavelength of 1000 km failed (runs 3uT, 3u and 3T), whereas that with a horizontal wavelength of 2000 km was successful (runs 2uT and 2u). The reason for this is complex, but may be related to how the thermal heterogeneity at the initial condition is maintained during the model time, td = (t1 − t0)h2/κ = 50 Myr. More specifically, a very small thermal structure at the initial condition may dissipate due to stirring and diffusion processes before being observed as upwelling or downwelling along the surface and/or before being observed as temperature anomalies in the last stage.
Given the parameter values in Table 1, the length-scale of thermal conduction during td is about |$\sqrt{\kappa t_d } = 40$| (km), which is comparable to the thickness of oceanic plates. The length-scale of thermal advection is about |$\bar{u} t_d =$| 30 (mm yr−1) × 50 (Myr) = 1500 (km), where |$\bar{u}$| is the representative flow velocity. The length-scale of heat advection during td can be expressed as |$\sqrt{\bar{u} h t_d} = \sqrt{\rm 30\,\,(mm\,yr^{-1}) \times 2000\,\,(km) \times 50\,\,(Myr)}$| = 1732 (km), if we substitute |$\bar{u} h$| for the thermal diffusivity κ, similar to how the numerator of a Péclet number (i.e. a dimensionless number) is defined as the ratio of advective to diffusive transport (i.e. |${\rm Pe} = \bar{u} h / \kappa$|). Therefore, advection is a more effective heat transport mechanism than conduction, and the length-scale of heat advection |$\bar{u} t_d$| may have acted as a threshold at which the reconstructions before td are successful or not (i.e. 2000 or 1000 km).
It should also be noted that the 4D-Var inversion is successful because advection is still the dominant mechanism of heat transport in a given system, despite the thermal discontinuity set in the initial state. The diffusion process generally leads to a loss of information and, accordingly, the inverse problem is not able to provide a unique solution, as shown by the large errors at the edges of the temperature anomalies in the early stages of the 4D-Var experiments (Fig. 5c).
5.3 Future studies of palaeo-mantle dynamics
This study undertook data assimilation using synthetic data as a first-order approximation. Future research should assimilate real data to reconstruct plate motions and mantle convection, and will need to consider more complex components, such as the following.
Temperature- and pressure-dependent viscosity: In this study, the viscosity of the fluid surface was the same as that in the fluid interior, and the thermal structure of the palaeo-mantle could be estimated based on the classical relationship between the buoyancy of the downwelling/upwelling body and surface motion (Turcotte & Schubert 2002). However, when the surface of the target fluid is covered by a cold and hard lid, small-scale upwelling and downwelling would not appear as surface motion; for example in the case of sublithospheric small-scale convection under an oceanic plate (Eilon et al. 2022) or, in an extreme case, the stagnant lid above the Martian mantle (Reese et al. 1998). Furthermore, the plate motion could be partially decoupled from the deep mantle due to the low viscosity of the asthenosphere associated with the combined effects of temperature and pressure (along with hydration and/or partial melting; Karato & Jung 1998; Hirth & Kohlstedt 2003; Kawakatsu et al. 2009). Therefore, additional work is needed to estimate the temperatures of the downwelling and upwelling bodies under realistic rheological conditions. The introduction of an experimentally constrained rheology may lead to more reliable temperature constraints for large-scale thermal anomalies, because of the difference in temperature dependence of the density and viscosity of mantle minerals; that is the density is almost inversely proportional to temperature, whereas the viscosity follows the Arrhenius law (Hirth & Kohlstedt 2003).
Compositional convection: This study only considered pure thermal convection, in which the fluid motion is driven by its thermal buoyancy. However, when the density of the target fluid depends on its chemical composition, such as water content (Jacobsen et al. 2008), as well as its temperature, the fluid may be driven by the spatial heterogeneity of the concentration of the chemical component. This could lead to a trade-off between the temperature and compositional estimates of the inner part of the target fluid. However, once an accurate temperature field is obtained based on the surface heat flux and geological information, the spatial and time-dependent distribution of a buoyant chemical component could be estimated, as in the inverse forward simulations of Nakao et al. (2016, 2018). Nakao et al. (2016, 2018) showed that water in and from subducting oceanic plates significantly affects the evolution of a subduction zone, including the plate velocity, island arc stress state, backarc spreading, and plate stagnation at a depth of 660 km.
Applications to geological data: Our results suggest that it is difficult to reconstruct palaeo-mantle dynamics only from a present-day snapshot of the mantle thermal and compositional structure (e.g. Fig. 7c). The assimilation of additional data, such as those that contain time-series of the thermal and chemical structure, will increase the resolution and accuracy of the reconstruction of thermal convection. Volcanic and metamorphic rocks can record the time, temperature, pressure, composition and strain rate along Lagrangian tracers of their ascent, and will be useful in constraining the thermal convection in the palaeo-mantle (Iwamori 2003; Kuwatani et al. 2018a, b). Our future research will update the adjoint model of the marker-in-cell methods to include such geological data.
3-D spherical coordinates: The 2-D coordinates used in our model are appropriate where a tectonic plate is wide enough along its ridge and trench, such that toroidal flows are negligible. Cartesian coordinates are also appropriate when focusing on short-term and small-scale tectonics. However, a spherical shell domain will be required when reconstructing long-term and global-scale convection based on real data (Bunge et al. 2003; Price & Davies 2018).
Uncertainty quantification: The inversion of the chemical tracers in our models suggests that the flow reconstruction is poor in some areas, such as around the tracks of the downwelling and upwelling bodies (Fig. 5e). A future study will quantitatively assess this uncertainty using a second-order adjoint method (Ito et al. 2016, 2021, 2023), which will help identify which data are useful for better constraining the palaeo-mantle dynamics.
6 CONCLUSIONS
We undertook a synthetic study using adjoint-based data assimilation to reconstruct time-series of temperature and velocity fields in a highly viscous fluid undergoing free thermal convection. We obtained the following results. (1) Reconstruction of the thermal convection tends to fail when the snapshot information is assimilated for the last stage of the flow process, whereas there are some cases where the reconstruction was successful by only assimilating the velocity along the fluid surface when an accurate viscosity was used. This is likely related to the balance of forces due to the thermal buoyancy and viscous stress around the thermal anomalies. (2) There is a limit on the reconstruction of thermal convection when the convecting cells are small. This may be due to how information is preserved during the stirring and diffusion processes. Our results suggest that it is critical to use geological data to estimate time-series of velocity and temperature in Earth’s deep interior. To overcome the loss of information due to stirring and diffusion, our future studies will assimilate Lagrangian data that partially record past temperatures.
ACKNOWLEDGEMENTS
We are grateful to Lorenzo Colli and an anonymous reviewer for their constructive comments, and Gael Choblet and Louise Alexander for editorial handling. We thank Tomoeki Nakakuki for providing the numerical code to solve the forward mantle convection. Simulation results were visualized using Generic Mapping Tools (Wessel et al. 2019). Part of the numerical simulations were performed with the JAMSTEC Earth Simulator (ES4). We thank Stallard Scientific Editing for English language editing. This research was supported by the Japan Science and Technology Agency (CREST; grants JPMJCR1761 and JPMJCR1763), MEXT Project for Seismology towards Research Innovation with Data of Earthquakes (STAR-E; grant JPJ010217), JSPS KAKENHI grants (19H05662, 20K21785, 20K04119, 21H04750, 22K03542, 22K14131, 23H00466 and 23KK0181) and Joint Research Programs 2021-B-01 and 2022-B-06 of the Earthquake Research Institute, University of Tokyo.
DATA AVAILABILITY
Original data of numerical results can be obtained by contacting the corresponding author.
References
APPENDIX A: EFFECTS OF THE ASSUMED VISCOSITY IN THE 4D-VAR MODELS
Section 4.2 showed a case in which the fluid temperature was successfully estimated throughout a model run by only assimilating the surface velocity (run 2u). Based on this result, this section investigates the accuracy of the temperature estimations under such conditions depending on the viscosity assumed in the 4D-Var models.
Fig. A1 a shows a case with a lower constant viscosity (run 2uLV; 5 × 1021 Pa s; Table A1) as compared with the reference model (run R2; 1022 Pa s; Fig. A1d). Although the convection cells are consistently reconstructed, the estimated temperature anomalies are smaller and insignificant. In contrast, in the case with a higher viscosity (run 2uHV; 2 × 1022 Pa s; Fig. A1c; Table A1), the convection cells are consistently reconstructed, and the estimated temperature anomalies are larger as compared with the reference model. The relationship between the fluid viscosity and estimated temperature is discussed in Section 5.1.

Snapshots of runs (a) 2uLV, (b) 2u, (c) 2uHV (iteration loop n = 601) and (d) R2 at four time steps. Coloured contours indicate the temperature and arrows indicate the velocity. t0 denotes the time at the initial condition, and t1 denotes the time at which the temperature structure can be observed.

Snapshots of a self-consistent case with a thermal Rayleigh number of 2.34 × 105. (a) Temperature and velocity fields of run RSC5. (b) Chemical composition of run RSC5. (c) Temperature and velocity fields of run SC5. (d) Chemical composition of run SC5.

Snapshots of a self-consistent case with a thermal Rayleigh number of 2.34 × 106. (a) Temperature and velocity fields of run RSC6. (b) Chemical composition of run RSC6. (c) Temperature and velocity fields of run SC6. (d) Chemical composition of run SC6.

Snapshots of a self-consistent case with a thermal Rayleigh number of 2.34 × 107. (a) Temperature and velocity fields of run RSC7. (b) Chemical composition of run RSC7. (c) Temperature and velocity fields of run SC7. (d) Chemical composition of run SC7.
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | η . | Ra . | Figure . |
---|---|---|---|---|---|---|---|
R2 | Reference | Medium wavelength | – | – | 1 | 2.34 × 106 | A1 d |
RSC5 | Reference | Self-consistent | – | – | 10 | 2.34 × 105 | A2a, b |
RSC6 | Reference | Self-consistent | – | – | 1 | 2.34 × 106 | A3a, b |
RSC7 | Reference | Self-consistent | – | – | 0.1 | 2.34 × 107 | A4a, b |
2uLV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 0.5 | 4.68 × 106 | A1a |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 1 | 2.34 × 106 | A1b |
2uHV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 2 | 1.17 × 106 | A1c |
SC5 | 4D-Var | Random noise at n = 1 | u and T of RSC5 | 10−12 | 10 | 2.34 × 105 | A2c, d |
SC6 | 4D-Var | Random noise at n = 1 | u and T of RSC6 | 10−12 | 1 | 2.34 × 106 | A3c, d |
SC7 | 4D-Var | Random noise at n = 1 | u and T of RSC7 | 10−12 | 0.1 | 2.34 × 107 | A4c, d |
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | η . | Ra . | Figure . |
---|---|---|---|---|---|---|---|
R2 | Reference | Medium wavelength | – | – | 1 | 2.34 × 106 | A1 d |
RSC5 | Reference | Self-consistent | – | – | 10 | 2.34 × 105 | A2a, b |
RSC6 | Reference | Self-consistent | – | – | 1 | 2.34 × 106 | A3a, b |
RSC7 | Reference | Self-consistent | – | – | 0.1 | 2.34 × 107 | A4a, b |
2uLV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 0.5 | 4.68 × 106 | A1a |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 1 | 2.34 × 106 | A1b |
2uHV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 2 | 1.17 × 106 | A1c |
SC5 | 4D-Var | Random noise at n = 1 | u and T of RSC5 | 10−12 | 10 | 2.34 × 105 | A2c, d |
SC6 | 4D-Var | Random noise at n = 1 | u and T of RSC6 | 10−12 | 1 | 2.34 × 106 | A3c, d |
SC7 | 4D-Var | Random noise at n = 1 | u and T of RSC7 | 10−12 | 0.1 | 2.34 × 107 | A4c, d |
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | η . | Ra . | Figure . |
---|---|---|---|---|---|---|---|
R2 | Reference | Medium wavelength | – | – | 1 | 2.34 × 106 | A1 d |
RSC5 | Reference | Self-consistent | – | – | 10 | 2.34 × 105 | A2a, b |
RSC6 | Reference | Self-consistent | – | – | 1 | 2.34 × 106 | A3a, b |
RSC7 | Reference | Self-consistent | – | – | 0.1 | 2.34 × 107 | A4a, b |
2uLV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 0.5 | 4.68 × 106 | A1a |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 1 | 2.34 × 106 | A1b |
2uHV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 2 | 1.17 × 106 | A1c |
SC5 | 4D-Var | Random noise at n = 1 | u and T of RSC5 | 10−12 | 10 | 2.34 × 105 | A2c, d |
SC6 | 4D-Var | Random noise at n = 1 | u and T of RSC6 | 10−12 | 1 | 2.34 × 106 | A3c, d |
SC7 | 4D-Var | Random noise at n = 1 | u and T of RSC7 | 10−12 | 0.1 | 2.34 × 107 | A4c, d |
Run ID . | Model type . | Initial condition . | Data assimilated . | wu/wT . | η . | Ra . | Figure . |
---|---|---|---|---|---|---|---|
R2 | Reference | Medium wavelength | – | – | 1 | 2.34 × 106 | A1 d |
RSC5 | Reference | Self-consistent | – | – | 10 | 2.34 × 105 | A2a, b |
RSC6 | Reference | Self-consistent | – | – | 1 | 2.34 × 106 | A3a, b |
RSC7 | Reference | Self-consistent | – | – | 0.1 | 2.34 × 107 | A4a, b |
2uLV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 0.5 | 4.68 × 106 | A1a |
2u | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 1 | 2.34 × 106 | A1b |
2uHV | 4D-Var | Random noise at n = 1 | u of R2 | +∞ | 2 | 1.17 × 106 | A1c |
SC5 | 4D-Var | Random noise at n = 1 | u and T of RSC5 | 10−12 | 10 | 2.34 × 105 | A2c, d |
SC6 | 4D-Var | Random noise at n = 1 | u and T of RSC6 | 10−12 | 1 | 2.34 × 106 | A3c, d |
SC7 | 4D-Var | Random noise at n = 1 | u and T of RSC7 | 10−12 | 0.1 | 2.34 × 107 | A4c, d |
APPENDIX B: RECONSTRUCTION OF SELF-CONSISTENT THERMAL STRUCTURES
We placed several thermal anomalies as part of the initial conditions in the reference runs to investigate the ability of our 4D-Var method to retrospectively reproduce the diffusion and stirring processes. Consequently, natural processes in the actual mantle might be unlikely to generate the large gaps in the temperature fields and some of the horizontal wavelengths of the convecting cells that we placed in the reference runs.
In this section, we describe how our 4D-Var method reproduces ‘self-consistent’ cases, where the thermal and flow structures naturally develop under given thermal Rayleigh numbers, Ra, when sufficient time has elapsed. We performed three reference runs with different Ra values for 5 Bsyr in advance, and continued the computation for another 50 Myr (Table A1). Three corresponding 4D-Var experiments were conducted using the surface velocity for the 50 Myr period and temperature snapshot at the last step sampled from the three self-consistent reference experiments.
Figs A2(a) and (b) show the 50 Myr evolution of reference run RSC5 starting from a self-consistent thermal structure with Ra = 2.34 × 105 (1/10 of that in the main text). In this case, the thermal structure reaches a steady state with a horizontally periodic structure with a wavelength of 2h and very low flow velocities. Figs A2(c) and (d) show the results of 4D-Var run SC5, in which both the surface velocity for 50 Myr and temperature snapshot at the last step sampled from reference run RSC5 have been assimilated, which reproduces the steady state in reference run RSC5.
Figs A3(a) and (b) show reference run RSC6 with Ra = 2.34 × 106 (same as that in the main text). Similar to the former case, the thermal structure reaches a steady state with a horizontal wavelength of |$\sim \, 2h$|, but in this case the flow velocities are larger and the thermal boundary layers are thinner as compared with run RSC5. Figs A3(c) and (d) show the results of 4D-Var run SC6, in which data sampled from reference run RSC6 have been assimilated, which reproduces the steady state in reference run RSC6.
Figs A4(a) and (b) show reference run RSC7 with Ra = 2.34 × 107 (10 times larger than that in the main text). A higher Ra leads to more chaotic convection, and this case does not reach a steady state, with the thermal boundary layers being intermittently peeled off. Figs A4(c) and (d) show the results of 4D-Var run SC7, in which data from reference run RSC7 have been assimilated. The 4D-Var run reproduces the thermal and flow structures in the later stages of the reference run (30–50 Myr), but not in the earlier stages (0–15 Myr). The partial failure of the reconstruction might be related to the loss of information due to the large advection velocities, as discussed in Section 5.2.