Efficient and practical Newton solvers for non-linear Stokes systems in geodynamic problems

After discretization, the resulting non-linear system is most commonly solved using a Picard fixed-point iteration. However, it is well understood that Newton's method, when augmented by strategies to ensure convergence even from points far from the solution, is more efficient and accurate than a Picard solver. In this paper, we evaluate how a straightforward Newton method must be modified to allow for the kinds of rheologies common in geodynamics. Specifically, we show that the Newton step is not actually well posed for strain-rate-weakening models without modifications to the Newton matrix. We derive modifications that guarantee well posedness and that also allow for efficient solution strategies by ensuring that the top left block of the Newton matrix is symmetric and positive definite. We demonstrate the applicability and relevance of these modifications with a sequence of benchmarks and a test case of realistic complexity.

A simple and frequently used way to solve such non-linear problems is to use Picard iterations, a particular form of fixed-point iterations (Kelly 1995). In it, one computes the viscosity and density as a function of the previous iteration's strain rate and pressure, solves for a new velocity and pressure field, and then repeats the process. The Picard iteration owes its popularity to the fact that it is relatively easy to implement in codes that only support linear rheologies because it only requires the repeated solution of linear problems. It is also often globally convergent, that is, with sufficiently many iterations it is possible to approximate the solution of the non-linear problem regardless of the choice of initial guess. Consequently, it is the method that is likely used in the majority of mantle convection papers that actually iteratively resolve the non-linearity in each time step; most papers do not explicitly state so, but van Keken et al. (2008), Tosi et al. (2015), Buiter et al. (2016) and Glerum et al. (2018) are some examples.
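To make the structure of this fixed-point iteration concrete, the following sketch applies it to a scalar analogue of the non-linear Stokes problem: find the strain rate e satisfying 2η(e)e = τ for a power-law, strain-rate-weakening viscosity η(e) = η_0 e^{1/n − 1}. The model, symbols and numbers are purely illustrative and not taken from any of the cited studies.

```python
# Picard fixed-point iteration for a scalar analogue of the non-linear
# Stokes problem: find the strain rate e with 2*eta(e)*e = tau, where
# eta(e) = eta0 * e**(1/n - 1) is a power-law viscosity (illustrative only).

def picard_strain_rate(tau, eta0, n, e0=1.0, tol=1e-12, max_iter=200):
    """Iterate e_{k+1} = tau / (2*eta(e_k)) until the update stagnates."""
    e = e0
    for k in range(1, max_iter + 1):
        eta = eta0 * e ** (1.0 / n - 1.0)   # viscosity frozen at old iterate
        e_new = tau / (2.0 * eta)           # solve the linearized problem
        if abs(e_new - e) < tol * abs(e_new):
            return e_new, k
        e = e_new
    return e, max_iter

e, iters = picard_strain_rate(tau=1.0, eta0=1.0, n=3.0)
residual = 2.0 * e ** (1.0 / 3.0 - 1.0) * e - 1.0  # residual of 2*eta(e)*e - tau
```

For n = 3 the error contracts only by a factor of about 1 − 1/n per step (here the exact solution is e = (τ/(2η_0))^n = 0.125, reached only after several dozen iterations), which illustrates the linear, and potentially slow, convergence of the Picard iteration.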
On the other hand, Picard iterations are often slow to converge, requiring dozens or hundreds of iterations for strongly non-linear problems, something we also observe in our results in Section 3. This slow convergence may make the solution of non-linear problems to high accuracy prohibitively expensive. Consequently, commonly used approaches to cope with the high computational cost are, for example, limiting the allowed number of Picard iterations per time step (e.g. Lemiale et al. 2008), combining Picard iterations with small time steps to ensure good starting guesses (e.g. Ruh et al. 2013), or other mostly ad hoc approaches. In practice, however, many studies do not adequately document the exact algorithm used and how this affects the accuracy of the solutions of the equations considered.
Here, we address the slow convergence of non-linear solvers by replacing the Picard solver by a Newton solver (Kelly 1995). Previous applications of Newton's method to geodynamics problems can be found in Kaus et al. (2015), May et al. (2015), Rudi et al. (2015) and Spiegelman et al. (2016). Newton's method promises quadratic convergence towards the solution, compared to the linear convergence of the Picard iteration, when the initial guess is close enough to the solution of the non-linear problem and therefore offers the prospect of vastly faster solution procedures. On the other hand, the implementation of Newton's method is substantially more involved than a Picard iteration. Furthermore, requiring an initial guess that is close enough to the exact solution is often impractical, and may require running a number of initial Picard iterations before starting the Newton iteration.
In this paper, we present the details of an improvement on the Newton method for the non-linear Stokes problem, and discuss an implementation of this improved Newton solver along with recommendations on how to use it. Specifically, and going beyond what is available in the literature, we will show that a naive application of Newton's method may break both the symmetry and the positive definiteness of the elliptic part of the (linearized) Jacobian of the Stokes operator. While the lack of symmetry is annoying from a practical perspective because it makes the solution of the linear system associated with each Newton step more complicated, a lack of positive definiteness implies that the Newton step is ill-posed and may not have a solution. We will analyse both of these issues in detail and propose modifications to the Newton equations that retain the symmetry and restore the positive definiteness. We will also consider whether there are special classes of material models where these modifications are not necessary. Unfortunately, as we will show, many rheologies that have been used extensively in the literature do not fall into these classes; our methods are therefore strict improvements over the current state of the art and will allow solving problems that were not previously solvable with an unmodified Newton method.
While there are previous reports on using a Newton method for Stokes problems in geodynamics applications (see e.g. May et al. 2015; Rudi et al. 2015; Kaus et al. 2015; Spiegelman et al. 2016), we will provide a more in-depth discussion of the mathematical properties of the operators and linear systems associated with each Newton step. We will underpin our claims with numerical experiments and demonstrate that the approach advocated herein is, indeed, more efficient and robust than previous approaches. In particular, we will show that our implementation of the Newton solver significantly decreases computational time for realistic problems, with greatly improved accuracy. Our implementation is available as part of the ASPECT code (Kronbichler et al. 2012; Heister et al. 2017), an open-source geodynamics community code.
The layout of the remainder of this paper is as follows: we will first describe the mathematical formulation of the non-linear Stokes problem we consider here, its discretization, and linearization in Section 2. This section also contains our main results on how the Newton method has to be modified ('stabilized') in order to make it well posed, as well as a discussion of practical aspects of how this method can be embedded in efficient non-linear and linear solvers. We then show how the above works in practice in Section 3, first using three artificial test cases and then using a realistic application of modelling subduction. We conclude in Section 4.

The model
Let us begin by concisely stating the equations we want to solve herein. We are concerned with modelling convection in the Earth's mantle, a process that is typically described by a coupled system of differential equations. Under commonly used assumptions (see e.g. Schubert et al. 2001), typical models include a Stokes-like, compressible fluid flow system for the velocity u and pressure p defined in the volume Ω ⊂ R^d (where the space dimension d = 2 or 3) under consideration,

−∇ · [2η (ε(u) − (1/3)(∇ · u) I)] + ∇p = ρg in Ω, (1)

−∇ · (ρu) = 0 in Ω, (2)

where η is the viscosity, ρ the density, g the gravity vector, ε( · ) denotes the symmetric gradient operator defined by ε(v) = (1/2)(∇v + ∇v^T) and I is the d × d identity matrix. (The sign in eq. (2) is chosen in this way because −∇ · is the adjoint operator to the gradient in the first equation, leading to a symmetric system if the density is constant, as shown below.) While these equations describe a compressible model, we will assume for the purposes of this paper that the fluid is in fact incompressible, that is, that ∇ · (ρu) = ρ∇ · u = 0. We do so because we can illustrate all difficulties associated with the Newton method using this simplification already, and because many of the approximations used in geodynamics (e.g. the Boussinesq approximation) also assume incompressibility. In addition to this simplification, we have to scale the equations to ensure that we can numerically compare the residuals of the two equations and consequently have a basis for numerically stable algorithms. Consequently, we multiply the second equation by a constant s_p = η_0/L, where η_0 is a 'reference viscosity' and L a length scale of the domain we are solving the equations in. (See Kronbichler et al. 2012 for a more detailed discussion.)
In order to retain the symmetry between the divergence in the second equation and the gradient in the first, we also replace the pressure by a scaled version, p̂ = (1/s_p) p. The properly scaled, incompressible equations then read as follows:

−∇ · (2η ε(u)) + s_p ∇p̂ = ρg in Ω, (3)

−s_p ∇ · u = 0 in Ω. (4)

It is this form of the equations we will attempt to solve, using u, p̂ as the primary variables. Of course, the physical pressure can be recovered as p = s_p p̂ after the system has been solved. In geodynamic models, the fluid flow model is coupled to an equation for the temperature T, and possibly other equations that describe the transport of chemical compositions:

ρC_p (∂T/∂t + u · ∇T) − ∇ · (k∇T) = ρH + 2η ε(u) : ε(u) + α_T T (u · ∇p) + ρT ΔS (∂X/∂t + u · ∇X). (5)

Here, C_p is the specific heat, α_T is the thermal expansion coefficient, k the thermal conductivity, H is the internal heat production and ΔS and X are related to the entropic effects of phase changes. All coefficients that appear in these equations typically depend on the pressure, temperature, chemical composition and, in the case of the viscosity, the strain rate ε(u). Even though the entire system is coupled in non-linear ways, in this paper we will only concern ourselves with the first set of these equations, (3) and (4), and how they can efficiently be solved through a Newton scheme. In principle, one may want to solve the entire system with a Newton scheme, given that the velocity appears in eq. (5), the temperature in eqs (3) and (4) via the temperature dependence of the viscosity and density, and more generally all coefficients may depend on pressure and temperature. While this is beyond the scope of the current paper, being able to apply a Newton method to the Stokes subsystem is clearly a necessary ingredient to the larger goal. Consequently, the efficient solution of non-linear Stokes problems is of interest in itself. As we will show, this alone is not trivial, and will therefore serve as a worthwhile target for the investigations in this paper.
In fact, the incompressible formulation already poses all of the mathematical difficulties we will encounter in deriving well-posed Newton schemes. In other words, it serves as a good model problem to illustrate and understand both difficulties and solutions related to the linearization. The incorporation of compressible terms (i.e. solving eqs 1 and 2) would then only complicate the exposition of our methods. At the same time, we point out that our methods immediately carry over to compressible models, albeit with significantly more cumbersome formulae; we will investigate this generalization in future work.

Discretization
We convert eqs (3) and (4) above into a finite-dimensional system by utilizing the finite-element method for discretization. To this end, we seek approximations

u_h(x) = Σ_j U_j φ_j^u(x), (6)

p̂_h(x) = Σ_j P_j φ_j^p(x), (7)

where φ_j^u and φ_j^p are the finite-element basis functions for the velocity and pressure, respectively. The expansion coefficients U_j, P_j are found by solving the discrete weak form of the equations. Discretization of the incompressible system then leads to a non-linear system in X = (U, P),

Q(X) X = b(X), (8)

where the matrix Q and right-hand side b have an internal substructure. For our incompressible formulation, this substructure has the form

Q(X) = ( A(X)  B^T ; B  0 ), b(X) = ( f(X) ; 0 ). (9)

Here, the matrix and right-hand side blocks are defined as

A_ij = (ε(φ_i^u), 2η ε(φ_j^u)), B_ij = −s_p (φ_i^p, ∇ · φ_j^u), f_i = (φ_i^u, ρg), (10)

where as usual we denote (α, β) = ∫_Ω α(x) β(x) dx. Because the viscosity η may depend on the pressure and strain rate, and the density ρ on the pressure, the system is in general non-linear in the coefficients U_j, P_j as both A = A(X) and f = f(X). (The coefficients η, ρ may of course also depend on the temperature or other factors, but we consider these fixed for the purposes of the current paper.) Much of the content of this paper is concerned with the question of how to solve the non-linear system (8) in practice, that is, how a naive application of the standard Newton iteration needs to be adapted to make it practical and efficient.

Newton linearization
In order to resolve the non-linearity in eq. (8), let us introduce the residual

r(X) = Q(X) X − b(X). (11)

In Newton iteration k + 1, starting with the previous guess X_k, we then need to solve

J_k δX_k = −r_k, (12)

where r_k = r(X_k) and J_k = ∇_X r(X_k). This system has the internal substructure

( J_uu  J_up ; J_pu  0 ) ( δU_k ; δP_k ) = −( r_u(X_k) ; r_p(X_k) ). (13)

After solving for δX_k, we can compute X_{k+1} = X_k + α_k δX_k, where α_k is a step length parameter that can be determined, for example, using a line search (Kelly 1995; Nocedal & Wright 1999).
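To illustrate the update-plus-line-search structure on a small example (a scalar toy problem rather than the full Stokes system), the sketch below applies a damped Newton iteration to the residual r(e) = 2η_0 e^{1/n} − τ from the power-law model used earlier; the backtracking loop halves the step length until the residual decreases. All values are illustrative only.

```python
# Damped Newton iteration with backtracking line search:
# solve J(x_k) dx = -r(x_k), then x_{k+1} = x_k + alpha_k * dx.
# Toy scalar residual r(e) = 2*eta0*e**(1/n) - tau (illustrative only).

def newton_line_search(r, J, x0, tol=1e-12, max_iter=50):
    x = x0
    for k in range(max_iter):
        rk = r(x)
        if abs(rk) < tol:
            return x, k
        dx = -rk / J(x)                      # Newton update direction
        alpha = 1.0
        while abs(r(x + alpha * dx)) >= abs(rk) and alpha > 1e-8:
            alpha *= 0.5                     # backtrack until residual drops
        x = x + alpha * dx
    return x, max_iter

tau, eta0, n = 1.0, 1.0, 3.0
r = lambda e: 2.0 * eta0 * e ** (1.0 / n) - tau
J = lambda e: (2.0 * eta0 / n) * e ** (1.0 / n - 1.0)
e, iters = newton_line_search(r, J, x0=0.3)  # converges to e = 0.125
```

In contrast to the dozens of Picard iterations needed for the same problem, the quadratically convergent Newton iteration reaches machine-precision residuals in a handful of steps.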
There are a number of approaches to determining the entries of the matrix J k and to solving the resulting linear system. For example, in the geodynamics community alone, May et al. (2015) and Kaus et al. (2015) make use of a Jacobian-free Newton-Krylov framework (see Knoll & Keyes 2004), which essentially computes a finite-difference approximation of J by evaluating r at different values of its argument, and integrates this directly into the solver so that the full Jacobian matrix is never built. On the other hand, Rudi et al. (2015) and Spiegelman et al. (2016) use the same approach as we will take here and compute derivatives analytically or semi-analytically, except that Rudi et al. (2015) implemented this in a Jacobian-free manner.
Regardless of how exactly these derivatives are computed, the blocks of the linear system for the Newton updates will have to have the following form (again omitting dependencies on quantities we consider frozen, such as the temperature):

(J_uu)_ij = (ε(φ_i^u), 2η ε(φ_j^u)) + (ε(φ_i^u), 2 [∂η/∂ε : ε(φ_j^u)] ε(u_k)), (14)

(J_up)_ij = −s_p (∇ · φ_i^u, φ_j^p) + (ε(φ_i^u), 2 [∂η/∂p̂ φ_j^p] ε(u_k)), (15)

(J_pu)_ij = −s_p (φ_i^p, ∇ · φ_j^u). (16)

It is easy to see that, as expected, the Newton system (13) reverts to the simple Stokes problem if the viscosity does not depend on strain rate or pressure, that is, if the system is linear.
As we will show below, while eq. (12) (and its block structure 13) is the correct linearization of the (discretized) original, non-linear system (3) and (4), it turns out that this does not necessarily lead to a well-posed problem. This is not uncommon in optimization, where a function f(x) may have a well-defined minimizer, but the Hessian matrix H_k = ∇²f(x_k) at early iterates may be singular or have negative eigenvalues; consequently, the linear system H_k δx_k = −∇f(x_k) may not have a solution δx_k, or the solution may not be a direction of descent. There are standard techniques described in the optimization literature for these cases (see e.g. the section on 'Hessian modification' methods in Nocedal & Wright 1999) that we will adapt in the following sections, though we will work at the level of the partial differential equations that give rise to the Newton matrix, rather than at the algebraic level of the matrix we wish to modify. Furthermore, the linear system we obtain in each Newton step may be difficult to solve for practical reasons if it is not symmetric.
We will therefore discuss the practical implications of Newton linearization in Sections 2.4 and 2.5 below, along with remedies to the problems we identify. It is important to stress that the modifications we propose only change the matrix J k in eq. (12) but not the right-hand side. As a consequence, we can hope that the iterates X k still converge to the correct solution X of eq. (8), and this is indeed the case in our numerical experiments as we observe that r k → 0 as the iterations proceed. In other words, we replace an exact (though potentially ill-posed) Newton iteration by an approximate (and well-posed) Newton iteration, but we continue to solve the original physical problem.

Restoring symmetry of J uu
Even for incompressible models, given the form of the individual blocks in eqs (14)-(16), the Newton system (13) is in general not symmetric. This is despite the fact that the matrix Q in the non-linear model (8) and in particular A in eq. (9) are of course symmetric, as shown in eq. (10).
On the other hand, symmetry of matrices is an important property from a practical perspective because it allows for the construction of efficient solvers and pre-conditioners. As a consequence, we advocate replacing eq. (12) by an approximation. This of course yields a different Newton update δX_k and may destroy the quadratic convergence order of the Newton method. On the other hand, we retain our ability to construct efficient solvers and pre-conditioners; in practice, one does not often run a large number of Newton iterations in each time step, and consequently a reduction from quadratic to possibly only superlinear convergence order may be acceptable. As pointed out above, we do not modify the right-hand side of the Newton update equation and consequently converge to the solution of the original non-linear problem.
Specifically, then, we advocate for the following approximation of eq. (14):

(J_uu)_ij ≈ (ε(φ_i^u), 2η ε(φ_j^u)) + (ε(φ_i^u), [∂η/∂ε : ε(φ_j^u)] ε(u_k)) + (ε(φ_j^u), [∂η/∂ε : ε(φ_i^u)] ε(u_k)). (17)

This approximation ensures that the top left block in eq. (13) is indeed symmetric, and as we will see below, this and the modification discussed in the next section will then allow for the construction of efficient, multigrid-based pre-conditioners and the use of the Conjugate Gradient method. Indeed, the modification simply symmetrizes the second term in eq. (14). In order to analyse the effect of the underlying approximation, it is useful to rewrite the original term in eq. (14) in sum notation:

(J_uu)_ij = Σ_{mnpq} ∫_Ω ε(φ_i^u)_mn [2η (I ⊗ I)_mnpq + E_mnpq] ε(φ_j^u)_pq dx, where E_mnpq = 2 ε(u_k)_mn (∂η/∂ε)_pq.

Clearly, the matrix J_uu is symmetric if the tensor E is symmetric, that is, E_mnpq = E_pqmn, but this is not always the case. (By its definition, we already have E_mnpq = E_nmpq = E_mnqp.) The modification we propose is equivalent to explicitly symmetrizing this tensor, that is, replacing E_mnpq by E^sym_mnpq = (1/2)(E_mnpq + E_pqmn), and replacing the matrix in eq. (14) by

(J_uu)_ij = Σ_{mnpq} ∫_Ω ε(φ_i^u)_mn [2η (I ⊗ I)_mnpq + E^sym_mnpq] ε(φ_j^u)_pq dx.

It is instructive to consider whether there are cases in which the tensor E is already symmetric, and replacing it by its symmetrized version consequently does not change anything. Specifically, this is the case if the viscosity η(ε(u)) can be written as a scalar function of the square of the strain rate, that is, η(ε(u)) = f(‖ε(u)‖²) where ‖ε‖² = Σ_ij ε_ij². In this case, the chain rule implies that

∂η(ε(u))/∂ε = 2 f′(‖ε(u)‖²) ε(u).

We then have that E(ε(u))_mnpq = 4 f′(‖ε(u)‖²) ε(u)_mn ε(u)_pq, which satisfies the desired symmetry condition. Furthermore, for incompressible materials, we have that trace ε(u) = div u = 0, and in that case, the second invariant of the strain rate can be simplified to I_2(ε(u)) = (1/2)[(trace ε(u))² − trace(ε(u)²)] = −(1/2) trace(ε(u)²) = −(1/2)‖ε(u)‖². A viscosity that depends only on the second invariant then also satisfies the criteria for cases where the explicit symmetrization does not actually change anything.
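The symmetry claim can be verified numerically. The short check below assumes the index ordering E_mnpq = 2 ε(u)_mn (∂η/∂ε)_pq used above, with made-up 2-D tensors: if ∂η/∂ε is parallel to ε(u), as for η = f(‖ε(u)‖²), then E already satisfies E_mnpq = E_pqmn, while for a generic derivative direction only the explicitly symmetrized tensor does.

```python
import numpy as np

# Numerical check of the symmetry condition E_mnpq = E_pqmn for
# E = 2 * eps (outer) d_eta, with d_eta = d(eta)/d(eps).  Tensors are made up.

rng = np.random.default_rng(0)
eps = rng.standard_normal((2, 2))
eps = 0.5 * (eps + eps.T)                        # a symmetric strain rate

def E_tensor(eps, d_eta):
    # E_mnpq = 2 * eps_mn * d_eta_pq
    return 2.0 * np.einsum('mn,pq->mnpq', eps, d_eta)

def symmetrize(E):
    # E^sym_mnpq = 1/2 * (E_mnpq + E_pqmn)
    return 0.5 * (E + np.transpose(E, (2, 3, 0, 1)))

# Case 1: eta = f(||eps||^2), so d_eta = 2*f'*eps is parallel to eps,
# and E is symmetric without any modification.
fprime = -0.3                                    # an arbitrary f'(||eps||^2)
E1 = E_tensor(eps, 2.0 * fprime * eps)

# Case 2: a generic (symmetric) derivative direction breaks E_mnpq = E_pqmn;
# the explicit symmetrization restores it.
d_eta = rng.standard_normal((2, 2))
d_eta = 0.5 * (d_eta + d_eta.T)
E2 = E_tensor(eps, d_eta)
E2s = symmetrize(E2)
```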
Indeed, many incompressible material models define the viscosity only in terms of the second invariant of the strain rate, see, for example, Schellart & Moresi (2013). [We note that the geodynamics literature uses varying definitions for the second invariant. In contrast to the one used above, some papers use the definition I_2(ε(u)) = [(1/2) ε(u) : ε(u)]^{1/2} = [(1/2)‖ε(u)‖²]^{1/2}; see, for example, Gerya (2010, p. 56) or May et al. (2015). However, even with this convention the second invariant is a function of the square of the norm of the strain rate, and the conclusion above about material models that are functions of only the second invariant of the strain rate remains valid.] We end by pointing out that the entire Jacobian remains non-symmetric since, in general, J_up ≠ (J_pu)^T because of the added term due to the derivative of the viscosity with regard to the pressure (see eqs 15 and 16). We will come back to this in Section 2.6.2.

Restoring well posedness of the Newton step
The Stokes-like system (13) that arises from Newton linearization can only be well posed if the top left block is invertible. However, it turns out that this is not always the case, as we will see shortly. It is important to realize, however, that a lack of well posedness of the Newton step is not equivalent to a lack of well posedness of the original, non-linear problem from which it arises. Indeed, it is easy to conceive of situations where a Newton method applied to finding solutions of 1-D equations f(x) = 0 fails because one of the intermediate iterates x_k happens to land at a location where f′(x_k) = 0 and the next iteration fails because there is no δx_k so that f′(x_k) δx_k = −f(x_k). In multiple dimensions, and in particular in the case of the infinite-dimensional operator from which the top left matrix block J_uu is derived, the situation is clearly more complex, but not much more complicated to understand.
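The 1-D failure mode is easy to reproduce; here is a minimal illustration (not from the paper) for f(x) = x² − 1, whose roots x = ±1 are perfectly well defined even though the Newton step is undefined at x = 0, where f′ vanishes.

```python
# Newton's method for f(x) = x**2 - 1: well-posed roots, but an
# ill-posed Newton step wherever f'(x_k) = 0.

def newton_step(f, fprime, x):
    d = fprime(x)
    if d == 0.0:
        raise ZeroDivisionError("f'(x_k) = 0: no Newton direction exists")
    return x - f(x) / d

f = lambda x: x * x - 1.0
fp = lambda x: 2.0 * x

x = 2.0
for _ in range(6):
    x = newton_step(f, fp, x)   # converges quadratically towards x = 1

try:
    newton_step(f, fp, 0.0)     # an iterate that landed at a stationary point
    step_defined = True
except ZeroDivisionError:
    step_defined = False
```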
To this end, recall that after the symmetrization discussed in the previous section, the matrix J_uu has entries

(J_uu)_ij = Σ_{mnpq} ∫_Ω ε(φ_i^u)_mn H_mnpq ε(φ_j^u)_pq dx, with H = 2η(ε(u)) I ⊗ I + E^sym(ε(u)),

where the rank-4 tensor (I ⊗ I)_ijkl = δ_ik δ_jl maps a symmetric rank-2 tensor onto itself. A sufficient (though not necessary) condition for the matrix J_uu to be invertible (i.e. to have no zero eigenvalues) is if the corresponding differential operator, −∇ · [Hε( · )], is elliptic. This is the case if and only if the tensor H (as a map from rank-2 symmetric tensors to rank-2 symmetric tensors) has only positive eigenvalues, that is, if ε : (Hε) > 0 for all symmetric, non-zero rank-2 tensors ε. (We provide a bit more mathematical background for this connection between the coefficient H and the ellipticity of the corresponding differential equation in Appendix A.) Informally, for a strain hardening material model, ∂η(ε(u))/∂ε is positive, and then so is 2η(ε(u)) I ⊗ I + E^sym(ε(u)) because E^sym is a positive correction to the already positive-definite tensor 2η I ⊗ I. In other words, H would then be positive as would the differential operator, and J_uu would be an invertible matrix. The same would be true if the material model is strain weakening and if the amount of weakening is 'small enough' because then the 'small correction' E^sym does not offset the positive definiteness of 2η I ⊗ I. That said, we will need to be more formal with these arguments as we are dealing with tensors instead of scalars; the remainder of the section is therefore devoted to formalizing these arguments and providing a solution to the problem.
Specifically, given the definitions above, the tensor H can be written as

H = 2η I ⊗ I + a ⊗ b + b ⊗ a, with a = ε(u), b = ∂η(ε(u), p)/∂ε,

that is, H is a rank-2 update of a multiple of the identity operator. The first of these three terms has all eigenvalues equal to 2η, and the other two terms then lead to a perturbation of two of these eigenvalues corresponding to eigendirections that are spanned by ε(u) and ∂η(ε(u), p)/∂ε. As mentioned above, unless a material's strain weakening rate is sufficiently small, these perturbations may be strong enough to make one or both of the perturbed eigenvalues negative, and in this case the Newton step fails to be well posed.
To avoid this, we introduce a tensor

H^spd = 2η I ⊗ I + α E^sym,

where 0 < α ≤ 1 is chosen in such a way that H^spd is positive definite. Using this modified form H^spd at every quadrature point at which we perform the integration of the bilinear form for the Newton matrix, we then build the matrix J_uu used in the iteration. As before, since we do not change the right-hand side of the Newton update equation, we converge to the solution of the original non-linear problem. Clearly, if α = 0, then H^spd is the identity operator times 2η and has positive eigenvalues. Because the eigenvalues depend continuously on α, there must be an α > 0 so that H^spd is indeed positive definite. Ideally, to retain the convergence rate of Newton's method, we would like to choose α = 1. We therefore propose the following choice: we want to choose α so that (i) we have α = 1 if H is already positive definite, and (ii) α is as large as possible so that H^spd is positive definite. In practice, however, we will also choose α small enough to avoid the case where one of the eigenvalues of H^spd is positive but very small compared to 2η, to avoid the numerical difficulties resulting from trying to solve a linear problem with a poorly conditioned matrix J_uu.
It turns out that we can use the rank-2 update form of H and H^spd to explicitly compute the value of α. Let us abbreviate E^sym = a ⊗ b + b ⊗ a, where a = ε(u) and b = ∂η(ε(u), p)/∂ε. Then, it is clear that the (non-trivial) eigenvectors of E^sym must lie in the plane spanned by a, b, that is, have the form

v = cos(θ) a/‖a‖ + sin(θ) b/‖b‖.

The two non-trivial eigenvalues of E^sym are then the extremal values of the Rayleigh quotient v · (E^sym v)/(v · v). Thus, the eigenvalues of E^sym are given by

λ_± = a : b ± ‖a‖ ‖b‖.

The only eigenvalue of H^spd we have to worry about becoming negative is therefore the one associated with the (possibly) negative λ_−, namely 2η + α(a : b − ‖a‖ ‖b‖). This implies that we can choose the damping factor α as follows to ensure positive semidefiniteness:

α = 1 if a : b − ‖a‖ ‖b‖ ≥ −2η, and α = 2η/(‖a‖ ‖b‖ − a : b) otherwise. (18)
In practice, we would like to stay well away from a zero eigenvalue and instead choose α as follows:

α = 1 if ‖a‖ ‖b‖ − a : b ≤ c_safety 2η, and α = c_safety 2η/(‖a‖ ‖b‖ − a : b) otherwise, (19)

where 0 ≤ c_safety < 1 is a safety factor that ensures that the smaller eigenvalue of H^spd is at least (1 − c_safety) 2η and thus bounded away from zero. This computation is easily performed at every quadrature point during the assembly of J_uu. This procedure then guarantees that the resulting matrix is symmetric and positive definite, implying that the Newton direction is well defined. It is again instructive to consider whether there are cases where we can always choose α = 1, that is, use the unmodified Newton step (possibly up to the symmetrization discussed in the previous section). The simplest case is if a : b = ‖a‖ ‖b‖, because in that case the definition of α in eq. (18) always ends up in the first branch, regardless of the size of η(ε(u)). Given the definition of a, b, this is specifically the case if

∂η(ε(u))/∂ε = γ ε(u) for some scalar γ ≥ 0.

Similarly to the discussion in the previous section, this is the case if η(ε(u)) = f(‖ε(u)‖²) and if f′ ≥ 0, that is, for a strain-hardening material. It is not difficult to show that this extends to the case where the viscosity is given by a non-decreasing function η(ε(u)) = f(‖Pε(u)‖²), where P is an orthogonal projection applied to the strain rate; an example is the operator that extracts the deviatoric component of the strain rate.
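Since α only requires a : b, ‖a‖ and ‖b‖, its computation at a quadrature point is cheap. The following sketch (with made-up tensors) implements the safety-factor choice just described, using the fact that the smallest non-trivial eigenvalue of a ⊗ b + b ⊗ a is a : b − ‖a‖‖b‖:

```python
import numpy as np

# Stabilization factor alpha for H^spd = 2*eta*(I x I) + alpha*(a x b + b x a),
# chosen so that the smallest eigenvalue 2*eta + alpha*(a:b - |a||b|) stays
# at least (1 - c_safety)*2*eta.  Inputs are made up for illustration.

def spd_alpha(eta, a, b, c_safety=0.9):
    ab = float(np.tensordot(a, b))           # a : b
    norm_prod = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_prod - ab <= c_safety * 2.0 * eta:
        return 1.0                           # H is already (safely) SPD
    return c_safety * 2.0 * eta / (norm_prod - ab)

a = np.array([[1.0, 0.5], [0.5, -1.0]])      # a deviatoric strain rate

# hardening-like case: b is a positive multiple of a -> no damping needed
alpha_hardening = spd_alpha(eta=1.0, a=a, b=0.7 * a)

# strongly weakening case: b anti-parallel and large -> alpha < 1
alpha_weakening = spd_alpha(eta=1.0, a=a, b=-3.0 * a)
```

For the weakening case above one gets α = 0.9 · 2/15 = 0.12, so the smallest eigenvalue of H^spd equals exactly (1 − 0.9) · 2η.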
A more interesting case is where the material exhibits strain weakening. In that case, intuitively, the conditions in eq. (18) imply that we can only choose α = 1 if the material 'weakens slowly enough'. Let us, for example, consider the class of materials for which η(ε(u)) = η_0 [I_2(ε(u))]^{1/n − 1}. Such laws are typically used to describe either diffusion (n = 1) or dislocation creep (n > 1); see Karato (2012). Indeed, we show in Appendix B that in these cases one has to always choose α < 1 if n exceeds a certain threshold.
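The threshold can be made plausible by a short consistency check (using the convention I_2 = [(1/2) ε : ε]^{1/2}; the precise constants in Appendix B may differ). For this power law, b = ∂η/∂ε is anti-parallel to a = ε(u) when n > 1, and a short calculation gives (‖a‖‖b‖ − a : b)/(2η) = (n − 1)/n, independently of the strain rate, so the safety criterion permits α = 1 only for n ≤ 1/(1 − c_safety), for example n ≤ 10 when c_safety = 0.9:

```python
import numpy as np

# For eta = eta0 * I2**(1/n - 1) with I2 = sqrt(eps:eps / 2), compute the
# ratio (|a||b| - a:b) / (2*eta) with a = eps and b = d(eta)/d(eps).
# Analytically this equals (n-1)/n, whatever the strain rate.  Illustrative.

def weakening_gap_ratio(n, eps, eta0=1.0):
    s = float(np.tensordot(eps, eps))        # eps : eps
    eta = eta0 * np.sqrt(0.5 * s) ** (1.0 / n - 1.0)
    # chain rule for eta = f(s), with f(s) = eta0 * (s/2)**((1-n)/(2n)):
    dfds = (eta0 * (1.0 - n) / (2.0 * n)
            * (0.5 * s) ** ((1.0 - n) / (2.0 * n) - 1.0) * 0.5)
    a, b = eps, 2.0 * dfds * eps
    gap = np.linalg.norm(a) * np.linalg.norm(b) - float(np.tensordot(a, b))
    return gap / (2.0 * eta)

eps = np.array([[0.8, 0.3], [0.3, -0.8]])    # made-up deviatoric strain rate
r5 = weakening_gap_ratio(5.0, eps)    # (n-1)/n = 0.8  -> alpha = 1 still allowed
r20 = weakening_gap_ratio(20.0, eps)  # (n-1)/n = 0.95 -> exceeds c_safety = 0.9
```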

Algorithms for the solution of the non-linear problem
The discussions of the previous sections show that a naive application of Newton's method may lead to matrices that are neither symmetric nor positive definite. Indeed, in some cases the equations for the Newton update may not be well posed at all (see e.g. the discussion in Appendix B), even if the original, non-linear model has all of these properties. The remedies outlined above restore symmetry, positive definiteness and well posedness, and consequently lend themselves to a practical implementation. On the other hand, the resulting equations for the update are different from the ones obtained by linearizing the residual, and consequently we may not be able to expect quadratic convergence of the resulting non-linear iteration. Indeed, this is what we will observe in the experiments we show in Section 3. Regardless, the modifications have to be incorporated into an actual algorithm to solve the non-linear problem. The algorithm we propose for this, which is also the one implemented in the ASPECT code (Kronbichler et al. 2012; Heister et al. 2017), is therefore outlined below. As for many other non-linear problems, it is not easy to universally achieve convergence, and the resulting algorithm is therefore complicated.

Non-linear iteration
As with many other non-linear problems, it is not generally possible to solve the non-linear Stokes equation we consider here using only a Newton iteration. Rather, we use the following sequence to solve the non-linear Stokes problem in each time step: (i) We always use one initial Picard step. That is, we solve the original Stokes equations in which we 'freeze' all coefficients using values for the strain rate and pressure extrapolated from previous time steps; this corresponds to solving Q(X̃) X_1 = b(X̃) (in analogy to eq. 8), where X̃ is the extrapolated solution. This allows us, in particular, to enforce the correct boundary conditions on all boundaries where the velocity is prescribed.
(ii) We then solve N_DC ≥ 0 steps using the Picard method written in defect correction (DC) form. This corresponds to eq. (12) if one were to omit all terms that contain derivatives of η in the definition of the blocks in eqs (14)-(16). Equivalently, this corresponds to solving an update form of eq. (8),

Q(X_k) δX_k = −r(X_k).

It is well known that the Picard iteration is more stable than a pure Newton method and often converges even in cases where Newton's method does not. It therefore allows us to compute an iterate close enough to the exact solution from which we can then successfully start the Newton iteration. (For this second set of iterations, we use the DC form because the updates δX_k then have a zero velocity on all boundaries where the velocity is prescribed.)

(iii) We continue with full Newton steps, that is, we attempt to solve the unmodified Newton equations stated in eq. (12) with blocks defined as in eqs (14)-(16). We know that these equations will eventually lead to quadratic convergence, but they may not be symmetric, positive definite, or even solvable. Consequently, the linear solvers we will discuss in the next subsection may fail to converge.
(iv) If the linear solver failed in one of the previous, unmodified Newton steps, we continue with Newton-like steps that modify the matrix blocks as shown in eqs (14)-(16) by the methods of Sections 2.4 and 2.5. By construction, the resulting linear system is then guaranteed to be invertible, and indeed our linear solvers always succeed in our experiments.
These iterations are terminated once the non-linear residual r k has been reduced by a user-defined factor compared to the starting non-linear residual at the beginning of each time step. We use a line search (see Kelly 1995) to determine an acceptable step length for all Newton-type steps to further globalize convergence.
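The staged strategy above can be summarized in code. The sketch below walks stages (i)-(iv) through the scalar model problem 2η(e)e = τ with η(e) = η_0 e^{1/n − 1}; it is a didactic outline of the control flow, not the ASPECT implementation, and all parameter values are made up.

```python
# Staged non-linear solver sketch: (i) one initial Picard step,
# (ii) N_DC defect-correction Picard steps (no d(eta)/de terms),
# (iii) full Newton steps, (iv) damping the derivative term when it would
# spoil the positivity of the 'Jacobian'.  Scalar model problem, illustrative.

def solve_staged(tau, eta0, n, e_guess=1.0, N_DC=3, c_safety=0.9,
                 reduction=1e-10, max_iter=100):
    eta = lambda e: eta0 * e ** (1.0 / n - 1.0)
    r = lambda e: 2.0 * eta(e) * e - tau
    e = tau / (2.0 * eta(e_guess))           # (i) initial Picard step
    r0 = abs(r(e))
    for k in range(max_iter):
        rk = r(e)
        if abs(rk) <= reduction * r0:
            return e, k                      # residual sufficiently reduced
        if k < N_DC:
            J = 2.0 * eta(e)                 # (ii) DC Picard: drop derivatives
        else:
            d_term = 2.0 * eta0 * (1.0 / n - 1.0) * e ** (1.0 / n - 1.0)
            alpha = 1.0                      # (iii) full Newton if safe
            if -d_term > c_safety * 2.0 * eta(e):
                alpha = c_safety * 2.0 * eta(e) / (-d_term)   # (iv) stabilize
            J = 2.0 * eta(e) + alpha * d_term
        de = -rk / J
        step = 1.0                           # line search on all update steps
        while abs(r(e + step * de)) >= abs(rk) and step > 1e-8:
            step *= 0.5
        e += step * de
    return e, max_iter

e, iters = solve_staged(tau=1.0, eta0=1.0, n=3.0)
```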
In addition to the outline above, we have tried a method suggested to us by Riad Hassani (private communication, 2017) in which the switchover between the Picard DC iteration as defined above in (ii) (corresponding to using a Newton matrix in which we have dropped all terms involving derivatives of the coefficients) and Newton iterations (i.e. the same blocks but including the derivative terms) is done gradually, by scaling the derivatives in overall iteration k by a factor c_k between zero and one. We will in the rest of this paper refer to this as the residual scaling method (RSM). The initial N_DC iterations can then be interpreted as using c_k = 0, after which we choose

c_k = max(0, 1 − ‖r_k‖/‖r_{N_DC}‖),

where r_k is the current non-linear residual and r_{N_DC} the residual in the first iteration after switching to the Newton or Newton-like method. This choice guarantees that c_k ≈ 1 once Newton's method has reduced the residual significantly, that is, once we are close to the solution.
This variation often allows us to choose N DC smaller, that is, to try a method with a faster convergence rate earlier in the process. On the other hand, it sometimes requires more Newton-type iterations. Using this variation leads to somewhat mixed improvements over the strategy outlined above, as will be shown in our numerical results below.
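The only new ingredient of the RSM is the scaling factor itself; a minimal sketch, assuming the form c_k = max(0, 1 − ‖r_k‖/‖r_{N_DC}‖) implied by the requirements stated above (c_k = 0 at the switchover, c_k → 1 as the residual drops):

```python
# Residual scaling method (RSM) blending factor between Picard (c = 0)
# and Newton (c = 1).  The exact formula is an assumption consistent with
# the properties described in the text.

def rsm_factor(r_k, r_NDC):
    return max(0.0, 1.0 - abs(r_k) / abs(r_NDC))

# as the residual drops from its switchover value, c_k approaches 1
factors = [rsm_factor(r, 1.0) for r in (1.0, 0.5, 1e-2, 1e-6)]
```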

Linear solvers
Regardless of whether we solve the Picard or any of the Newton-type problems above, we always end up having to solve a linear system with the same block structure as eq. (13) in each non-linear step. This problem may or may not be symmetric, and the top left block J_uu may or may not be positive definite. However, regardless of these details, we use variations of the solvers discussed in Kronbichler et al. (2012) and Heister et al. (2017) to solve the linear problem.
More specifically, we use F-GMRES as the outer solver, with the following block-triangular matrix as a pre-conditioner: P = [J̃_uu, J_up; 0, S̃], where a tilde indicates an approximation of the matrix under the tilde, and S = J_pu (J_uu)^{−1} J_up is the Schur complement of the system. Specifically, motivated by the discussions in Kronbichler et al. (2012) and Heister et al. (2017), we use the following approximations for each of these blocks: (i) (J̃_uu)^{−1}: we approximate this matrix using either one multigrid cycle or a full solve with an approximation J̃_uu of J_uu that is constructed in a similar way as discussed in Kronbichler et al. (2012). In addition, because both the multigrid method and the Conjugate Gradient method used here require J̃_uu to be symmetric and positive definite, we always apply the modifications of Sections 2.4 and 2.5 to J̃_uu, even if they are not applied to J_uu itself.
(ii) S̃^{−1}: this block is an approximation to the inverse of the Schur complement S = J_pu (J_uu)^{−1} J_up. As for the original Stokes problem, the appropriate approximation is the mass matrix M_p on the pressure space, scaled by the inverse of the viscosity; the inversion of M_p is performed with a Conjugate Gradient solve. The approximation S̃^{−1} = M_p^{−1} is known to be good if J_pu = (J_up)^T, see Silvester & Wathen (1994). This is, however, not the case if the viscosity depends on the pressure, given the additional term in eq. (15). The difference between the two matrices is nevertheless small if the viscosity does not strongly depend on the pressure. This is, in fact, a commonly made assumption, at least for deep Earth mantle models, though it may not be valid for crustal models that employ pressure-dependent plasticity models. It is conceivable that one can construct a better approximation for S̃^{−1} (leading to fewer outer F-GMRES iterations) by also incorporating the viscosity derivative terms, but we did not pursue this direction as it is tangential to the purpose of this paper.
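The action of such a block-triangular pre-conditioner can be sketched as follows, with dense solves standing in for the multigrid cycle and the CG solves used in practice; the matrices below are random stand-ins, not an actual Stokes discretization:

```python
import numpy as np

def apply_block_preconditioner(J_uu, J_up, M_p, f_u, f_p):
    """Solve P [u; p] = [f_u; f_p] for the block-triangular
    pre-conditioner P = [[J_uu, J_up], [0, -M_p]], where the
    viscosity-scaled pressure mass matrix M_p stands in for the Schur
    complement. Dense solves replace the multigrid/CG solves used in
    practice."""
    p = np.linalg.solve(-M_p, f_p)             # pressure block first
    u = np.linalg.solve(J_uu, f_u - J_up @ p)  # then back-substitute
    return u, p

# Tiny random stand-in system (not an actual Stokes discretization)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
J_uu = A @ A.T + 4 * np.eye(4)   # symmetric positive definite
J_up = rng.standard_normal((4, 2))
M_p = np.eye(2)
u, p = apply_block_preconditioner(J_uu, J_up, M_p, np.ones(4), np.ones(2))
```

Because the pre-conditioner is block upper-triangular, its application requires only one solve per block, with no iteration between the two.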
It is, in general, not necessary to solve the linear systems in the first few non-linear iterations with high accuracy. Rather, without significant loss of non-linear solver performance, one can solve with a loose tolerance and terminate F-GMRES substantially earlier. Consequently, we have implemented both choice 1 and choice 2 of Eisenstat & Walker (1996) as stopping criteria for the linear solver, where for choice 2 we followed Kelly (1995) in using γ = 0.9 and α = 2; for the definition of these symbols see the original paper. We noticed for some of the problems that the difference between the two approaches was significant, with choice 1 allowing for a much looser tolerance. Eisenstat & Walker (1996) state that choice 1 represents a direct relation between the Newton right-hand side F and its local linear model at the previous non-linear iteration, whereas choice 2 is only an approximation of this. We have therefore chosen choice 1 for this paper.
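For reference, the two Eisenstat & Walker forcing-term choices can be sketched as follows; the residual norms are passed in as scalars, we use the norm-difference variant of choice 1, and the safeguards of the original paper are omitted:

```python
def ew_choice1(F_new, F_old, lin_model_norm):
    """Eisenstat-Walker choice 1 (norm-difference variant):
    eta_k = | ||F_k|| - ||F_{k-1} + J_{k-1} s_{k-1}|| | / ||F_{k-1}||,
    where lin_model_norm = ||F_{k-1} + J_{k-1} s_{k-1}|| is the norm of
    the linear model residual left over from the previous step."""
    return abs(F_new - lin_model_norm) / F_old

def ew_choice2(F_new, F_old, gamma=0.9, alpha=2.0):
    """Eisenstat-Walker choice 2:
    eta_k = gamma (||F_k|| / ||F_{k-1}||)^alpha,
    with gamma = 0.9 and alpha = 2 as used in the text."""
    return gamma * (F_new / F_old) ** alpha
```

Either value eta_k is then used as the relative tolerance for the next F-GMRES solve.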

Computation of derivatives
Implementations of Newton solvers require concrete implementations of the formulae for the derivatives ∂η(ε(u))/∂ε and ∂η(ε(u))/∂p. These can be computed either using simple finite-differencing approaches or analytically. Fortunately, even for relatively complicated material models, exact formulae for these derivatives can be derived with modest effort. Examples for the material models we consider in our numerical results below are provided in Appendix B.
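As a sanity check of the analytic formulae, one can compare them against finite differences. The following sketch does this for a power-law viscosity of the type considered in Appendix B1; the symmetric tensor ε and the parameter values are arbitrary illustrative choices:

```python
import numpy as np

def viscosity(eps, eta0=1.0, n=3.0):
    """Power-law viscosity eta = eta0 (eps:eps / 2)^(1/(2n) - 1/2)."""
    return eta0 * (0.5 * np.sum(eps * eps)) ** (0.5 / n - 0.5)

def d_viscosity_d_eps(eps, eta0=1.0, n=3.0):
    """Analytic derivative d eta/d eps = eta (1/n - 1) eps / ||eps||^2."""
    return viscosity(eps, eta0, n) * (1.0 / n - 1.0) * eps / np.sum(eps * eps)

# Central finite differences, component by component
eps = np.array([[1.0, 0.3], [0.3, -1.0]])  # symmetric, trace-free
h = 1e-6
fd = np.zeros_like(eps)
for i in range(2):
    for j in range(2):
        de = np.zeros_like(eps)
        de[i, j] = h
        fd[i, j] = (viscosity(eps + de) - viscosity(eps - de)) / (2 * h)
```

The two approaches agree to the accuracy of the finite-difference stencil, which is why the analytic route is preferable whenever the formulae are available.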

NUMERICAL EXPERIMENTS USING COMMON BENCHMARKS
In this section, let us illustrate the performance of the methods laid out above, using several benchmarks that vary both in which specific elements of the solver they test and in the difficulty they present to solvers. In particular, we will assess whether and how fast different variations of our algorithms converge. This includes ensuring that the non-linear residual can be reduced to any small value desired. Furthermore, we will investigate optimal values and relative trade-offs for a variety of parameters that affect the non-linear solver scheme, as discussed in Section 2.6.
The benchmarks we describe here have all been used for similar purposes in the literature. Details of all of our experiments are, sometimes in a simplified form, also part of the ASPECT test suite. All codes necessary to run these experiments are available among the benchmarks included in ASPECT releases starting from version 2.1. The ASPECT repository can be found at https://github.com/geodynamics/aspect.

Non-linear channel flow
The simplest non-linear Stokes flow one can think of is probably a generalization of incompressible Poiseuille flow to include a strain rate-dependent viscosity. In it, one forces a fluid through a pipe or channel where the velocity is zero at the pipe sides, and in- and outflow velocities are prescribed in such a way that the result is a flow field parallel to the pipe axis and constant in the along-pipe direction. The across-pipe variation of the velocity field can then be computed easily once a rheology law is chosen, leading to an analytically known flow field from which the in- and outflow boundary conditions can also be drawn, either via prescribed velocities or prescribed tractions.
A visualization of the solution can be found, for example, in Turcotte & Schubert (2002) and Gerya (2010).

Setup
We use the 2-D benchmark setup of Gerya (2010, section 16.4). In it, the viscosity is chosen in accordance with a power-law approach as η(ε̇) = C ε̇_II^{1/n − 1}, using the definition of the second invariant ε̇_II found in Gerya (2010, p. 59, equation 4.14). Here, C is a prefactor, and n is a stress exponent that allows for easy tuning of the non-linearity of the problem. The model geometry we use here is a box of 10 000 m × 8000 m, subdivided into 16 × 16 cells; we use quadratic finite elements for the velocity.

Results
Figs 1 and 2 show results for a number of methods and settings when the in- and outflow boundary conditions are prescribed through either tractions or velocity values. The latter turns out to generally be a more difficult problem to solve, but all methods eventually converge to a residual whose size is related to the tolerance with which we solve the linear systems. Fig. 1 shows that for this problem, when boundary values are given as tractions, line search is neither necessary nor useful, and similarly it is not necessary to run many initial Picard iterations to get close enough to the solution for the Newton method to start working. In addition, the Newton matrix modifications of Section 2.5 (right two panels of Fig. 1) actually destroy the quadratic convergence rate of Newton's method and result in only linear convergence, as speculated at the beginning of Section 2.6, though with a substantially better linear rate than Picard iterations.
On the other hand, Fig. 2 shows that for the more complicated problem where the flow is driven by prescribed velocity boundary conditions, either a line search method or sufficiently many initial Picard iterations are necessary to achieve convergence. (Fig. 2 shows the convergence history of the non-linear channel flow benchmark for several methods for a rheology with n = 3, where in- and outflow are described by prescribing the velocity; its panels are arranged as in Fig. 1.) Alternatively, the matrix modifications also yield a convergent scheme. The RSM in conjunction with the matrix modifications appears to be the most robust method, though not always the fastest. Indeed, as explained in Appendix B1, a stress exponent of n = 3 causes the matrix modifications to always scale down the derivative terms in the Newton matrix, resulting in an effect similar to that of the RSM.
In all cases, a pure Picard iteration always converges linearly, though at a rate that is not competitive with well-designed Newton iterations.

Spiegelman et al. benchmark
The Spiegelman et al. benchmark (see Spiegelman et al. 2016) is an extended form of the brick benchmark of Lemiale et al. (2008) and focuses on solving for the behaviour of a material with plastic rheology under compression.

Setup
The benchmark specifies two layers, see Fig. 3. The lower layer, which includes a regularized weak seed, has a constant viscosity. The upper layer has a viscosity given by the harmonic mean η_eff = η_1 η_p / (η_1 + η_p). Here, η_1 is the background viscosity of the upper layer, and the plastic viscosity takes the form η_p = (A + B (p_lith + α p')) / (2 ε̇_II), where p_lith is the depth-dependent lithostatic pressure and p' = p − p_lith is the dynamic component of the total pressure. η_1 can have three different values: 10^23, 10^24 and 5 × 10^24 Pa s. For von Mises plasticity, one would choose A = C, B = 0, where C is the cohesion of the material; in this case, the viscosity is strain rate but not pressure dependent. For a depth-dependent von Mises-type model, one would choose A = C cos(φ), B = sin(φ) and α = 0, where φ is the friction angle; in this case, the viscosity depends on the static, lithostatic pressure but not on the dynamic pressure component. Finally, Drucker-Prager plasticity fits this formula with A = C cos(φ), B = sin(φ) and α = 1, where the viscosity now depends on both the strain rate and the total pressure p = p_lith + p'. We will only consider the von Mises and Drucker-Prager cases of the Spiegelman benchmark because these are the most interesting ones. The benchmark is completed by prescribing an inbound velocity (2.5, 5 or 12.5 mm yr^−1) on the two sides of the geometry, requiring tangential flow at the bottom, and no stress at the top, allowing material to leave the domain. The three options for the inbound velocity and the three options for the reference background viscosity together form a set of nine test cases whose difficulty increases with velocity and reference background viscosity. In the original paper, an unstructured grid of stable Taylor-Hood elements was used. Based on the edge lengths visible in their fig. 1, the results shown there should correspond to a uniform ASPECT mesh that has been refined globally approximately 7 or 8 times (i.e. 512 × 128 or 1024 × 256 cells).
This does not, however, account for details of the unstructured mesh used in Spiegelman et al. (2016).
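The effective viscosity of the upper layer can be sketched as follows, assuming the plastic-viscosity form η_p = (A + B(p_lith + α p'))/(2 ε̇_II) inferred from the three special cases described above; the parameter values are illustrative only:

```python
def effective_viscosity(eta1, A, B, alpha, p_lith, p_dyn, eps_II):
    """Harmonic mean eta_eff = eta1 eta_p / (eta1 + eta_p) of the
    background viscosity eta1 and the plastic viscosity
        eta_p = (A + B (p_lith + alpha p_dyn)) / (2 eps_II),
    a form inferred from the von Mises / Drucker-Prager limits
    described in the text (not a verbatim reproduction)."""
    eta_p = (A + B * (p_lith + alpha * p_dyn)) / (2.0 * eps_II)
    return eta1 * eta_p / (eta1 + eta_p)

# Von Mises limit: A = C (cohesion), B = 0, so pressure independent
vm = effective_viscosity(1e23, A=1e8, B=0.0, alpha=0.0,
                         p_lith=1e9, p_dyn=1e7, eps_II=1e-15)
```

The harmonic mean ensures that the effective viscosity never exceeds either of its two components, so the plastic branch acts as a cap on the background viscosity.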

Results
We compare the results of our Newton implementation to the results of Spiegelman et al. (2016) and the pre-existing Picard solver in ASPECT. The von Mises case results are quite similar to those of Spiegelman et al. (2016); consequently, we will focus on the more difficult Drucker-Prager case. We have exhaustively explored the space of parameters affecting the non-linear solver (see Section 2.6) at different mesh resolutions. In general, we found that higher mesh resolution (more refinement steps) made the problem more difficult to solve. As expected, the benchmarks also become more difficult as the prescribed velocity u_0 at the boundary is increased, leading to a larger strain rate and more pronounced non-linearity. This is visible in Fig. 3, where we also show the viscosity and the α factor necessary to keep the matrix positive definite. This factor drops to approximately one-half in the vicinity of the shear band, a result consistent with the theoretical considerations discussed in Appendix B3.
Figs 4 and 5 show results obtained for four and eight global mesh refinement steps, corresponding to meshes with 64 × 16 and 1024 × 256 cells, respectively. A comparison shows that the problem is indeed more difficult to solve on the finer meshes. Without enforcing the symmetry and positive definiteness of the top left block of the Jacobian, the linear solver converges quickly, but also crashes easily in a number of configurations because the matrix lacks the necessary structural properties; this is particularly the case for u 0 = 12.5 mm yr −1 . On the other hand, enforcing these properties on the matrix leads to some loss in speed of convergence of the non-linear iterations (because the computed search direction is no longer second order accurate), though we then also no longer encountered any linear solver failures.
We noticed that the runs without the matrix modifications are very sensitive to changes in many of the solver parameters. In the following, let us discuss a number of settings that we have found useful when working with the unmodified matrix, though for many cases the linear solver still fails even with these settings. In particular, using a few line search iterations may help reduce the number of non-linear iterations; however, allowing the step length parameter to decrease too much generally leads to overly small steps and slow overall convergence. A good default value for the maximum number of line search step length reductions appears to be 5 or 10. We also found that performing 5-10 Picard iterations before switching to Newton iterations is a good choice. Setting the relative linear tolerance (LT) of the GMRES solver rather strictly typically results in fewer outer iterations, but at the price of more inner iterations that then increase the run time per outer iteration. A good compromise for this parameter is to require that the linear residual is reduced to a factor of 0.1 or 0.01 of its initial value. Determining the linear solver tolerance automatically, using the Eisenstat & Walker criteria discussed above, usually requires one or two more outer iterations, but greatly decreases the chance that the linear solver will fail. In contrast, enforcing J_uu to be symmetric and positive definite yields a very different behaviour. As stated above, the linear solver then always converges. Furthermore, the sensitivity of the convergence history to all the parameters above is greatly reduced. This has the advantage that a very loose LT can be used without a significant penalty in the number of iterations; that said, and as is apparent from the figures, modifying the matrix may require tens more non-linear iterations to converge.
We have obtained the best performance by combining the two methods: we do not enforce the symmetry and positive definiteness of the matrix in the first non-linear iterations until (if) the linear solver fails; after this, we continue with the matrix modifications. This strategy combines the fast convergence of the unmodified method with the stability and robustness of the modified one.

Tosi et al. benchmark
The Tosi benchmark of Tosi et al. (2015) is designed as a community benchmark for mantle flow based on non-linear rheologies featuring a temperature, pressure and strain rate-dependent viscosity. Here, we specifically consider case 4 from Tosi et al. (2015), which seeks the steady state (at a large, unspecified end time) of a time-dependent problem. Unlike the two previous benchmarks, this benchmark has a temperature field that is coupled to the viscosity and therefore evolves over time.

Setup
The benchmark is posed in a 2-D unit square with free-slip boundaries on all sides and an initial temperature given by T(x, z) = (1 − z) + A cos(πx) sin(πz). The viscosity is chosen as the harmonic mean η(T, z, ε̇) = 2 (η_lin^{−1} + η_plast^{−1})^{−1}, where the two components of the viscosity are defined as a linear but depth- and temperature-dependent viscosity as well as a plastic yield criterion, respectively: η_lin(T, z) = exp(−γ_T T + γ_z z), η_plast(ε̇) = η* + σ_Y / √(ε̇ : ε̇). Here, η* is the constant effective viscosity at high strain rate, and σ_Y the yield stress. Numeric values for all of these constants can be found in the original paper.
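A sketch of this viscosity law, combining a linearized-Arrhenius branch and a plastic branch via the harmonic mean, can be written as follows; all parameter values are placeholders, since the actual values are given in the original paper:

```python
import numpy as np

def tosi_viscosity(T, z, eps_II, gamma_T=1.0, gamma_z=1.0,
                   eta_star=1e-3, sigma_Y=1.0):
    """Harmonic mean of a linearized-Arrhenius viscosity and a plastic
    branch, in the spirit of Tosi et al. (2015) case 4. All parameter
    values here are placeholders for the values in the original paper.
    eps_II is the square root of eps:eps."""
    eta_lin = np.exp(-gamma_T * T + gamma_z * z)
    eta_plast = eta_star + sigma_Y / eps_II
    return 2.0 / (1.0 / eta_lin + 1.0 / eta_plast)
```

At high strain rate the plastic branch approaches the constant η*, which is what makes η* the "effective viscosity at high strain rate" mentioned above.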

Results
The original paper does not contain convergence plots. Consequently, we can only compare between the methods available in our reference implementation. Specifically, these are:
(i) a method whereby we solve the advection equation, then the Stokes equation with frozen coefficients, and then iterate these two steps until we have reached convergence for the current time step. We will refer to this scheme as 'Picard' even though this stretches the term (strictly speaking, a 'Picard' iteration for the coupled system would solve both the Stokes and advection problems linearized around the previous solution; in our implementation, the Stokes system is linearized around the already computed advection solution). In ASPECT, this scheme is called 'iterated advection and Stokes'.
(ii) A method where we first solve the advection equation and then do one Newton step on the non-linear Stokes system; again, these two parts are iterated out in each time step. We will refer to this method as 'Newton'; in ASPECT, it is called 'iterated advection and Newton Stokes'.
Fig. 6 shows results for this benchmark, where the horizontal axis indicates the number of the non-linear iteration performed. Each spike corresponds to a time step starting at a large residual that is gradually decreased. A method that converges quickly shows a steeper decrease, requires fewer non-linear iterations, and can consequently fit more time steps (spikes) into the same number of non-linear iterations. Because the computational effort is largely confined to building and solving the linear systems, the horizontal axis also corresponds closely to the elapsed wall time.
The different panels of the figure can be summarized as follows: (i) the Newton method converges much faster than Picard iterations. For example, after the initial few time steps, the Newton method (with and without matrix stabilization) only requires two non-linear iterations per time step, whereas the Picard iteration requires six. This also translates to a speed-up in wall time of around the same factor. (ii) Matrix stabilization is not necessary for this benchmark and in fact leads to a slight but not substantial degradation of performance. (iii) Convergence behaviour can differ substantially between time steps (for both Picard and Newton). (iv) For this experiment, the DC form of the Picard iteration is slightly, but not substantially, faster than the original Picard iteration.

A 3-D subduction test case
In order to verify that our implementation is applicable not only to academic benchmarks but also to settings that occur in geodynamic modelling, let us consider a 3-D simulation of oceanic plate subduction. The model is inspired by the geodynamic setting of the Caribbean region, and simulates a slab that subducts while the subducting plate moves obliquely to the trench, causing the slab to be dragged laterally through the mantle, a motion called slab dragging (Spakman et al. 2018).

Setup
We situate our test case in a Cartesian box with dimensions of 3000 km in width and length and 1000 km in depth (see Fig. 7). The viscoplastic rheology includes dislocation creep, diffusion creep and plasticity. The effective viscosity combines these components, with each creep component given by η_x = (ν_x/2) A_x^{−1/n_x} ε̇_II^{(1−n_x)/n_x} exp((E_x + P V_x)/(n_x R T)). Here, a symbol of the form □_x stands for the corresponding property of either diffusion (if x = diff) or dislocation creep (if x = disl), ν_x is a constant factor which can be used to scale the rheology, R is the gas constant, T is temperature and P is pressure. The A_x are pre-factors, the n_x are stress exponents, the E_x are activation energies and the V_x are activation volumes. The values of all of these parameters used for the simulations are listed in Appendix C. The model consists of two layers, the upper and lower mantle (above and below 660 km, see Fig. 7), that differ only in strength through a 100-fold increase in η_diff and η_disl, achieved by choosing ν_diff and ν_disl larger by a factor of 100 in the lower mantle. This means that the lithosphere is fully defined by temperature. This thermal lithosphere is divided into two regions: a U-shaped region representing an oceanic plate that surrounds a region representing a Large Igneous Province (LIP), both modelled by a plate model (Fowler 2005) of thickness 95 km with the ridge far outside the domain. The slab is also 95 km thick and is divided into three segments in which an analytic temperature field is prescribed following McKenzie (1970). The first segment is 200 km long, with a dip angle relative to the surface that starts at 20° and smoothly steepens to 30°. The second segment is 150 km long, over which the dip angle smoothly increases to 70°. The third, straight segment is 50 km long and has a constant dip angle of 70°. We describe the fault zones between the oceanic plate and the LIP as thin, vertical regions with an elevated initial temperature.
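A single creep component of this rheology can be sketched as follows. The creep-law form below is the standard one and is our reconstruction, the harmonic averaging of the two components is likewise an assumption, and the plastic branch is omitted:

```python
import numpy as np

R = 8.314  # gas constant, J / (mol K)

def creep_viscosity(eps_II, T, P, A, E, V, n, nu=1.0):
    """One creep component (diffusion: n = 1, dislocation: n > 1), in
    the standard creep-law form
        eta_x = (nu/2) A^(-1/n) eps_II^((1-n)/n) exp((E + P V)/(n R T)).
    This is a reconstruction; the actual parameter values used in the
    text are listed in Appendix C of the paper."""
    return (0.5 * nu * A ** (-1.0 / n) * eps_II ** ((1.0 - n) / n)
            * np.exp((E + P * V) / (n * R * T)))

def composite_viscosity(eta_diff, eta_disl):
    """Harmonic average of the two creep components (an assumption;
    the plastic branch of the rheology is omitted here)."""
    return 1.0 / (1.0 / eta_diff + 1.0 / eta_disl)
```

The 100-fold lower-mantle strength increase described above then corresponds simply to passing nu = 100 to both components below 660 km.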
The U-shaped lithosphere has a prescribed boundary velocity of 1 cm yr^−1 in each horizontal component (for the direction, see the arrows in Fig. 7), and zero velocity in the vertical direction. The LIP has a prescribed boundary velocity of zero in all directions. Below the lithosphere we use open boundary conditions. The top is a free surface (Rose et al. 2017) and the bottom has a zero velocity boundary condition.
This model is discretized on a mesh that has a total of 153 046 cells, resulting in 3 938 115 velocity and 172 097 pressure unknowns (in addition to another 1 312 705 unknowns each for the temperature and a compositional field). All results shown below were obtained on the Dutch national cluster Cartesius. Each node is equipped with Intel Xeon E5-2690 v3 ('Haswell') processors and has 24 cores; we used 10 nodes and 20 MPI processes per node. The model is run with a Courant-Friedrichs-Lewy number of 0.1, and the time step size is limited to grow by at most 25 per cent from one time step to the next. The time step sizes computed by all non-linear solver methods used below are essentially identical.
The experiments in previous sections show that it is in general not necessary to solve the linear systems in DC schemes (i.e. the DC version of Picard iterations, as well as Newton iterations) particularly accurately. As a consequence, we only use a linear solver tolerance for these methods that requires a reduction of the linear residual in the F-GMRES solve by a factor of 0.1. On the other hand, the initial Picard iteration solves for the solution, not an update, and consequently requires a substantially larger reduction of the linear residual; we use a factor of 10 −6 (the default value of ASPECT).

Results
We have performed this experiment using our implementation of the Newton method, as well as the Picard iteration and its DC variation. In the following, let us provide two perspectives on this comparison.
First, Fig. 9 shows how all three methods reduce the non-linear residual in each time step, for two selected periods of the simulation (from time steps 75 to 100, and 275 to 300). In the early phases of the simulation, all methods quickly converge the non-linear residual to the desired tolerance of 10^−6, though even here the Newton method requires fewer iterations. Interestingly, starting around time step 275, corresponding to about 3.92 Myr of model time, the problem appears to become substantially more difficult to solve. This time corresponds to the break-off of the deeper part of the slab (the necking in Fig. 8), which then rapidly sinks; the increased velocity implies a larger strain rate and consequently a stronger non-linearity. Indeed, as Fig. 9 shows, both variations of the Picard iteration are then no longer able to converge the residual to the desired tolerance, even though we allow up to 250 iterations per time step. Any results of these simulations must then necessarily be treated as suspect. On the other hand, the Newton iteration continues to converge rapidly. These data therefore underline the stability and robustness of the Newton method, and illustrate that it can solve problems that are otherwise not solvable with reasonable effort. A second perspective is shown in Fig. 10, where we plot the wall time necessary to solve the problem up to a given time step. The figure demonstrates that our implementation of Newton's method is approximately one-third faster than the other two methods for the time steps shown. The difference becomes particularly notable after time step 170, where the subducting slab starts to thin under its own gravity. We did not include data beyond time step 275 because, as discussed above, the Picard variants no longer converge after this point.
Consequently, even though the curves would substantially diverge after this time step due to the far larger number of non-linear iterations taken by the Picard variants, the solutions would no longer be comparable.
These models mostly use ASPECT's default parameter values. It is possible that we could further optimize any of the methods shown by changing these values. Furthermore, we have so far not put much effort into optimizing the efficiency of the Newton solver code itself. That said, both of these issues are outside the scope of this paper.

CONCLUSIONS
Newton's method is generally considered the best method to solve non-linear systems, but it is also known to be difficult to make work in practice. In this contribution, we have demonstrated that a naive application of Newton linearization may lead to a system for the updates δ X k that may be ill posed even if the original non-linear problem is well posed. We have shown how one can modify the Newton system to guarantee a stable solution of the update equations. Specifically, we have modified the partial differential operators that give rise to the Newton matrix so that the matrix is symmetric and positive definite. Importantly, however, we do not modify the right-hand side of the Newton update equation, and consequently the iterates of the modified Newton method still converge to the solution of the original non-linear problem. We have also described the globalization strategies necessary to guarantee that the Newton method actually converges in nearly all cases, and demonstrated our methods using standard geodynamic benchmarks and a real application.
The results we have shown demonstrate that with this combination of methods, Newton's method (with globalization approaches and stabilization of the matrix) really is the better choice: it converges more rapidly, in fewer iterations, is robust, and takes less computational time. Furthermore, it can be applied to problems that are known to be very non-linear and difficult to solve, such as the Spiegelman et al. (2016) benchmark. Finally, we have applied this method to a rheologically and geodynamically complex, 3-D case of oceanic subduction that is sufficiently non-linear that the typical Picard iteration no longer converges in a reasonable number of steps. The Newton method we have discussed here not only converges, but does so in relatively few steps and with substantial savings in wall clock time.
There are, of course, cases where a simple Picard solver would have been sufficient. In those cases, however, it is worth pointing out that using a Newton method is not substantially more expensive than a Picard method: the assembly of the Newton matrix is marginally more complicated because of the terms involving the derivatives of the viscosity, but other than that the solution procedure is the same between the two methods because of the structurally very similar matrices involved. Consequently, it is reasonable to advocate for always using a Newton method instead of Picard iterations.

APPENDIX A: THE CONNECTION BETWEEN ELLIPTIC OPERATORS, WELL POSEDNESS OF THE NEWTON UPDATE EQUATION AND EIGENVALUES OF COEFFICIENTS
The discussion in Section 2.5 made use of the fact that a positive definite coefficient H_spd in the definition of the J_uu block of the matrix implies that the underlying differential operator is elliptic, and that consequently the J_uu block is invertible. Since this connection is not obvious unless one works daily with partial differential equations, let us discuss this step in slightly more detail in this appendix.
To this end, let us first note that we call an operator A acting on functions u, v ∈ V (where V is a function space) bounded and coercive if there are constants c > 0, C < ∞ so that the following two conditions are satisfied: ⟨u, Au⟩ ≥ c‖u‖_V² (coercivity) and ⟨u, Av⟩ ≤ C‖u‖_V‖v‖_V (boundedness). Here, ⟨·, ·⟩ is the inner (or duality) product in V. It is well known from the theory of partial differential equations that such operators lead to unique solutions of the equation Au = f, see Brenner & Scott (2002).
In the context of second-order partial differential operators, such as the one that governs the top left block of the Stokes problem, an operator of the form Au = −div[Hε(u)] acting on a velocity field u ∈ V = H¹₀(Ω)^d is said to be elliptic if there exists c > 0 so that τ : (Hτ) ≥ c‖τ‖² for all symmetric tensors τ. (A1) Here, H is the rank-4 tensor that maps strain rates to stresses. For such operators, we have by integration by parts that ⟨u, Av⟩ = (ε(u), Hε(v)) = ∫_Ω ε(u) : [Hε(v)]. It is then easy to see that ellipticity implies coercivity by recognizing that ‖v‖_V = ‖v‖_{H¹₀(Ω)^d} = (‖v‖² + ‖∇v‖²)^{1/2} and knowing that c₂‖v‖_V ≤ ‖ε(v)‖ ≤ c₃‖v‖_V for some constants c₂ > 0, c₃ < ∞; together, these give ⟨v, Av⟩ ≥ c‖ε(v)‖² ≥ c c₂²‖v‖_V². In other words, if the coefficient H inside the differential operator satisfies condition (A1), then the associated operator is well posed and invertible. This condition carries over to the case where we let V be a finite-dimensional subspace of H¹₀(Ω)^d, for example, the set of all finite-element functions defined on a mesh.
The important realization is now that the constant c in eq. (A1) equals the smallest eigenvalue of H, where we consider H as an operator that maps symmetric tensors to symmetric tensors. To see this, assume that H had a negative or zero eigenvalue λ. Then, we can choose σ as the corresponding eigenvector and obtain that σ : (Hσ) = σ : (λσ) = λ‖σ‖² ≤ 0, in violation of eq. (A1). Consequently, eq. (A1) can only be satisfied if all eigenvalues of H are positive.
This argument proves that if all eigenvalues of H are positive, then Au = −div [H ε(u)] is elliptic and consequently coercive, and as a result the top left block J uu of the matrix is positive definite and invertible.
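The eigenvalue argument can be illustrated numerically. The sketch below builds the rank-one-update form H = 2ηI + α(a ⊗ b + b ⊗ a) discussed in Section 2.5 on flattened tensors and computes its smallest eigenvalue; the specific tensors are arbitrary illustrative choices:

```python
import numpy as np

def smallest_eigenvalue_H(eta, a, b, alpha=1.0):
    """Smallest eigenvalue of H = 2 eta I + alpha (a (x) b + b (x) a),
    the rank-one-update structure of Section 2.5, with symmetric
    tensors flattened to vectors so that H becomes an ordinary
    symmetric matrix."""
    a, b = a.ravel(), b.ravel()
    H = 2.0 * eta * np.eye(a.size) + alpha * (np.outer(a, b) + np.outer(b, a))
    return np.linalg.eigvalsh(H).min()

# Strain-rate-weakening case: b anti-parallel to a drives the smallest
# eigenvalue negative; scaling the derivative term down restores
# positive definiteness.
a = np.array([[1.0, 0.2], [0.2, -1.0]])
b = -2.0 * a / np.sum(a * a)   # mimics d eta/d eps for strong weakening
lam_full = smallest_eigenvalue_H(1.0, a, b)            # alpha = 1
lam_scaled = smallest_eigenvalue_H(1.0, a, b, alpha=0.4)
```

With the full derivative term the smallest eigenvalue is negative (the operator is not elliptic), while the scaled-down version is positive definite, mirroring the role of the α factor in the text.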
It may be of interest to note that the operator A may be invertible even if it is not elliptic; that is, it is only sufficient but not necessary that H have only positive eigenvalues. However, it is much more difficult to specify the exact conditions that have to hold for H to ensure that A is invertible, and we will not attempt to do so here. This is particularly true if H = H(x) is spatially variable, with eigenvalues that may also differ from one location to another.

APPENDIX B: A LOOK AT SOME COMMON RHEOLOGIES
The results of Sections 2.4 and 2.5 were obtained for general rheologies in which the viscosity is a function of the strain rate ε(u) and possibly the pressure p. However, if we know the specific form of this dependence, we can say more about whether or not it is necessary to symmetrize the matrix, and/or whether it is necessary to use a scaling factor α that is less than one. In the following, we will consider some common rheologies from this perspective.

B1 Power-law rheology
Some of the simplest rheologies are of power-law type, where the viscosity is defined as η(ε(u)) = η₀ ε̇_II^{1/n − 1} = η₀ ((1/2) ε(u) : ε(u))^{1/(2n) − 1/2} with n ≥ 1. The form on the right shows that the viscosity is a function of the square of the norm of the strain rate. This implies that the matrix is automatically symmetric and does not have to be explicitly symmetrized with the procedure of Section 2.4. Furthermore, we can compute the derivative of the viscosity with respect to the strain rate and obtain ∂η(ε(u))/∂ε = η₀ (1/(2n) − 1/2) ((1/2) ε(u) : ε(u))^{1/(2n) − 3/2} ε(u) = η(ε(u)) (1/n − 1) ε(u)/‖ε(u)‖².
Identifying a = ε(u) and b = ∂η(ε(u))/∂ε as in Section 2.5, we see that the important term in the scaling factor definition (18) is

2(‖a‖‖b‖ − a : b) = 4η(ε(u))(1 − 1/n).

It is clear that this term grows with n toward a value of 4η and will eventually exceed its limit c_safety 2η as defined in eq. (18). In particular, even if we choose c_safety = 1 (i.e. allow the smallest eigenvalue of H_spd to be equal to zero), the condition will only be satisfied for n ≤ 2, i.e. if the strain weakening is not too pronounced. For smaller values of c_safety, we can only choose α = 1 if n is even smaller than that; for example, with c_safety = 0.9, the condition is satisfied only if n ≤ 1.82. It is interesting to note that the condition is independent of the flow field: what value of α one has to choose is entirely decided by n and c_safety, and α will be the same at every quadrature point at which we integrate the bilinear form. Indeed, for this class of material models, we need to choose

α = min(½ c_safety n/(n − 1), 1).
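This choice of α can be checked numerically. The following sketch uses illustrative values of η, n and c_safety, a hypothetical strain rate flattened to a vector, and assumes the stabilized block has the form 2ηI + α(a b̂ᵀ + b̂ aᵀ) with b̂ = 2 ∂η/∂ε, consistent with the term above: α = 1 yields an indefinite matrix for n = 3, while the formula for α restores a smallest eigenvalue of (1 − c_safety)·2η.

```python
import numpy as np

eta, n, c_safety = 1.0, 3.0, 0.9        # illustrative values

a = np.array([1.0, 2.0, 2.0])           # strain rate, flattened to a vector
b_hat = 2.0 * eta * (1.0 / n - 1.0) * a / (a @ a)   # 2 * d(eta)/d(eps)

def newton_block(alpha):
    """Assumed stabilized velocity block 2*eta*I + alpha*(a b^T + b a^T)."""
    return 2.0 * eta * np.eye(3) + alpha * (np.outer(a, b_hat) + np.outer(b_hat, a))

# With alpha = 1 and n = 3, strain weakening makes the block indefinite...
assert np.linalg.eigvalsh(newton_block(1.0))[0] < 0
# ...while the stabilized choice keeps it positive definite:
alpha = min(0.5 * c_safety * n / (n - 1.0), 1.0)
lam_min = np.linalg.eigvalsh(newton_block(alpha))[0]
print(alpha, lam_min)                   # alpha = 0.675, lam_min = (1 - c_safety)*2*eta
```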

B2 Drucker-Prager rheology
For Drucker-Prager rheologies, the viscosity in 2-D is typically given by

η(ε(u)) = (C cos φ + p sin φ) / (2 √(½ ε(u) : ε(u))).

Here, C is the cohesion and φ the friction angle. As for the power-law case, the viscosity only depends on ‖ε(u)‖², and we know that the resulting matrix will always be symmetric. By comparing the formula for η with the one from the power-law rheology above, we see that, up to a different (and possibly pressure dependent) pre-factor, the Drucker-Prager law corresponds to a power law with n = ∞. Thus, we expect that we will have to choose α < 1 in eq. (18). Indeed, we can compute that

∂η(ε(u))/∂ε = −η(ε(u)) ε(u)/‖ε(u)‖²,

which matches the corresponding formula for the power law with n = ∞. Thus,

2(‖a‖‖b‖ − a : b) = 4η(ε(u)),

which is of course never less than 2η and consequently always violates the necessary condition in eq. (18) to choose α = 1. In other words, we can expect that the original Newton method will always lead to an ill-posed equation for the Newton update. On the other hand, eq. (18) tells us that for any c_safety < 1, the choice α = ½ c_safety will always lead to a well-posed equation.

It is interesting to note that for both the power-law rheology with large n and the Drucker-Prager rheology, one always needs to choose α < 1. This implies that the equations that define our stabilized Newton update are never the derivative of the residual, and we can consequently not expect quadratic convergence. As the calculations above show, this has, in fact, nothing to do with the concrete test case or setup: the choice of α does not depend on the value of the strain rate or other solution variables at a given point, but is the same throughout the entire domain.
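The strain-rate independence of this bound can also be observed numerically. The sketch below uses an illustrative viscosity and, as above, the assumption b̂ = 2 ∂η/∂ε = −2η ε/‖ε‖² for the Drucker-Prager case: the destabilizing term equals 4η for every strain rate, and α = ½ c_safety leaves a smallest eigenvalue of (1 − c_safety)·2η.

```python
import numpy as np

rng = np.random.default_rng(1)
eta, c_safety = 5.0, 0.9                # illustrative viscosity and safety factor

for _ in range(100):
    a = rng.standard_normal(3)          # random strain rate, flattened
    b_hat = -2.0 * eta * a / (a @ a)    # 2 * d(eta)/d(eps) for Drucker-Prager
    term = np.linalg.norm(a) * np.linalg.norm(b_hat) - a @ b_hat
    assert abs(term - 4.0 * eta) < 1e-9    # always 4*eta, independent of eps

alpha = 0.5 * c_safety
H = 2.0 * eta * np.eye(3) + alpha * (np.outer(a, b_hat) + np.outer(b_hat, a))
print(np.linalg.eigvalsh(H)[0])         # (1 - c_safety)*2*eta, here 1.0
```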

B3 The rheology of the Spiegelman et al. benchmark
To demonstrate that α does not need to be constant throughout the domain and may, in fact, depend on the flow field, we need to consider a rheology in which ∂η/∂ε : ε(u) is not a fixed multiple of the viscosity, as it was in the last two cases. This is indeed the case for the rheology of the benchmark by Spiegelman et al. discussed in Section 3.2. There, the viscosity is, up to a factor of 2, given by the harmonic average of a linear rheology with constant viscosity and the Drucker-Prager model considered above.
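To see that (∂η/∂ε : ε)/η is no longer constant for such a composite rheology, consider a minimal sketch that assumes the plain harmonic average η = (η₁⁻¹ + η_DP⁻¹)⁻¹ of a constant viscosity η₁ and a Drucker-Prager-type viscosity η_DP ∝ 1/‖ε‖ (the constants are illustrative and the omitted pre-factors do not affect the point): the ratio sweeps from near 0 toward −1 as the strain rate grows, so the required α varies with the flow field.

```python
import numpy as np

eta1 = 10.0                              # linear (constant) viscosity, illustrative
k = 2.0                                  # stands in for (C cos(phi) + p sin(phi))/2

def eta(e):
    """Harmonic average of a linear and a Drucker-Prager-type viscosity."""
    eta_dp = k / np.linalg.norm(e)
    return 1.0 / (1.0 / eta1 + 1.0 / eta_dp)

ratios = []
for scale in [0.01, 1.0, 100.0]:
    e = scale * np.array([1.0, -0.5, 0.3])
    h = 1e-7 * scale
    # Finite-difference d(eta)/d(eps), contracted with eps and divided by eta:
    grad = np.array([
        (eta(e + h * np.eye(3)[i]) - eta(e - h * np.eye(3)[i])) / (2 * h)
        for i in range(3)
    ])
    ratios.append((grad @ e) / eta(e))

print(ratios)   # ratio changes with the strain rate, unlike the previous cases
```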