Modular and flexible spectral-element waveform modelling in two and three dimensions


Early finite-difference methods brought an increase in solution accuracy relative to computational cost (Madariaga 1976; Virieux 1984, 1986). Physical sophistication was further developed by the proper treatment of boundary conditions and material discontinuities (Robertsson 1996; Ohminato & Chouet 1997; Moczo et al. 2002), the incorporation of general anisotropy, and formulations in terms of spherical coordinates for axisymmetric (Chaljub & Tarantola 1997) and fully 3-D domains (Igel & Weber 1995). Other work focused on increasing the combined spatial and temporal accuracy using 'optimal' operators (Geller & Takeuchi 1998), and applications via natural neighbours to more general, unstructured domains (Braun & Sambridge 1995). While finite-difference methods are highly efficient for simulating wave propagation on rectilinear meshes, several factors limit their usefulness for domains with complex geometries, complicated material structure, or more advanced boundary conditions. For studies in areas with strong topography, along with applications in medical imaging and non-destructive testing, this limitation can become especially severe. Recent work shows promising extensions of finite-difference methods to more complex domains via curvilinear coordinate transforms (Fornberg 1988; Hestholm 1999; de la Puente et al. 2014; Petersson & Sjögreen 2015; Shragge 2016), but these extensions still require a continuous and smooth surface, and are associated with a higher computational cost. Furthermore, the explicit implementation of boundary conditions becomes increasingly difficult as the geometric complexity of the domain grows. Finally, increasing the spatial accuracy of finite-difference operators requires a growth in stencil size, which can degrade parallel scaling in larger applications demanding distributed-memory parallelism.
A common response to these limitations is the use of finite-element and finite-volume methods, which elegantly include the effects of boundary topography, and do not suffer from severe communication overheads as the spatial approximation order is increased. This is fortunate, as low-order variants of either method are too inefficient to be useful (Marfurt 1984). Recent work on arbitrarily high-order finite-volume methods has shown promising results, but the computational cost for a given solution accuracy seems to favour discontinuous Galerkin methods over finite-volume schemes (Käser et al. 2007; Wilcox et al. 2010; Tago et al. 2012). A particular advantage of both these schemes is their efficient and arbitrarily high-order discretization on unstructured simplicial meshes (tetrahedra, triangles). While discontinuous Galerkin methods are more flexible than their continuous counterparts across elemental boundaries, they are computationally more expensive, with additional work needed to compute the flux terms, while also requiring a more restrictive CFL condition, as a recent study by Ferroni et al. (2017) shows.
Thus, in applications with complex bounded domains which can be meshed with hypercubic elements (quadrilaterals, hexahedra), the spectral-element method has emerged as the most efficient numerical technique for solving wave propagation problems (Seriani & Priolo 1994; Faccioli et al. 1996, 1997; Komatitsch & Tromp 1999; Capdeville et al. 2003; Chaljub et al. 2003; Fichtner & Igel 2008; Peter et al. 2011; Nissen-Meyer et al. 2014). A high-order variant of classical finite-element methods, the spectral-element method combines high-order interpolating polynomials on each element with a specific quadrature rule which results in a globally diagonal mass matrix. This feature, coupled with the fact that the elementwise operations can be formulated as dense matrix-matrix products, makes the method especially suitable for the current generation of SIMD (single instruction, multiple data) computing architectures. This is in contrast to classical finite-element approaches, in which the global mass matrix is not diagonal, and the time evolution of the wavefield may involve the solution of a sparse linear system at each time step.
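The practical pay-off of a diagonal mass matrix can be made concrete: with a lumped mass, recovering the acceleration reduces to an elementwise division rather than a sparse linear solve. A minimal sketch under assumed names (not taken from any particular code base):

```cpp
#include <cstddef>
#include <vector>

// With a diagonal (lumped) mass matrix M, solving M a = f_ext - f_int for the
// acceleration a is a pointwise division -- no sparse linear solve per step.
std::vector<double> ComputeAcceleration(const std::vector<double>& massDiag,
                                        const std::vector<double>& internalForce,
                                        const std::vector<double>& externalForce) {
  std::vector<double> accel(massDiag.size());
  for (std::size_t i = 0; i < massDiag.size(); ++i)
    accel[i] = (externalForce[i] - internalForce[i]) / massDiag[i];
  return accel;
}
```

A non-diagonal mass matrix would turn each such update into the solution of a sparse linear system, which is precisely the cost the GLL quadrature rule avoids.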
Improvements in the quality and quantity of seismological data, as well as increasing computational power, make it desirable and possible to model wave propagation physics in more detail. This trend, coupled with the variety of appropriate spatial discretizations and the ever-expanding network of applications where waveform modelling is relevant, risks increasing the complexity of wave propagation codes to the point where they become unmaintainable. In this paper we describe a series of theoretical and practical developments useful for designing software for full-waveform modelling and inversion based on the spectral-element method, with a focus on time-domain approaches. The results are inspired by popular finite-element packages such as FENICS (Logg et al. 2012), DEAL.II (Bangerth et al. 2007), GRINS (Bauman & Stogner 2016) and DUNE (Dedner et al. 2010), in that many details of the underlying numerical methods are abstracted away. The developments allow us to make additional generalizations regarding the implementation of the wave equation for a wide variety of rheologies. We discuss the use of distributed data structures provided by PETSC (Balay et al. 1997, 2017) and DMPLEX (Knepley & Karpeev 2009; Lange et al. 2016), and demonstrate how such structures allow for scaling to problems involving hundreds of millions of unknowns, discretized on general, unstructured meshes. Finally, we present several benchmark results on both 2-D and 3-D simplicial and hypercubic meshes, along with several other proof-of-concept examples, demonstrating the variety of scales and domains to which our SEM implementation can be applied. While several high-quality spectral-element implementations focused on FWI currently exist (Komatitsch & Tromp 2002a; Cupillard et al. 2012), none encapsulates the flexibility offered by the abstraction presented herein.
Section 2 reviews the strong and weak forms of the elastic and acoustic wave equations, and illustrates how they can both be written in a similar functional form. Piece by piece, this form is then related to a general continuous Galerkin finite-element discretization. Following this, we see how the functional notation can also lend itself to a concise representation of wave propagation through more complex rheologies, such as viscoelasticity, and coupled acoustic/elastic media. Section 3 relates these abstractions in a one-to-one manner to concepts from software engineering, showing how they lead naturally to 'function decorators'. We then show some simple examples of how such decorators can be implemented with C++ template mixins. Following this is a demonstration of how the individual decorators can be used to automatically construct functionals in complex unstructured domains, using the unstructured mesh abstractions in DMPLEX. Finally, we comment on some preliminary performance and scaling statistics, and follow with a short discussion on the benefits of flexibility.

AN ABSTRACT FORMULATION OF THE SEISMIC WAVE EQUATION
In this section, we explain the motivation behind selecting the finite-element method as a spatial discretization. We then show how the mathematical details of the method can be abstracted and implemented efficiently on a computer. Finally, we show how the details of a particular wave equation can themselves be generalized, leading to efficient implementations of coupled domains and viscoelasticity.

The weak form
Consider the wave equation for a simple elastic solid in an n-dimensional domain G (n = 1, 2 or 3) and time interval (0, T] with T > 0 (Aki & Richards 2002):

$$\rho\, \partial_t^2 u = \nabla \cdot \sigma + f. \qquad (1)$$

Here ρ is density, u is displacement, σ is stress and f represents an external forcing term. The time and space dependency of these last three terms is taken as implicit to simplify notation. A simple way to relate stress to displacement in a purely elastic medium is via Hooke's law

$$\sigma = C : \nabla u, \qquad (2)$$

where C is a fourth-order tensor characterizing the stiffness of the medium, and the symbol : denotes a contraction over adjacent indices. Taken together, eqs (1) and (2) define the strong form of the linear elastic wave equation. Boundary conditions (such as the free-surface condition), along with quiescent initial conditions, must be included explicitly in order to obtain a unique solution. Boundary and initial conditions take the form

$$\sigma \cdot \hat{n}\,\big|_{\partial G} = \bar{\tau}(x), \qquad u(x, 0) = \bar{u}(x), \qquad \partial_t u(x, 0) = \bar{v}(x), \qquad (3)$$

respectively, where τ̄(x), ū(x) and v̄(x) represent spatially varying distributions of traction, displacement and velocity, respectively, and n̂ is the outward-pointing normal vector on ∂G (the boundary of the domain).
Replacing the spatial and temporal derivatives by discrete pointwise approximations forms the backbone of finite-difference methods. In contrast, finite-element methods work with the weak, or variational, formulation of the governing equations (Brenner & Scott 2007; Quarteroni et al. 2010). Here a major advantage arises from approximating the wavefield by suitably chosen basis functions, which leads to a natural representation of the solution and its derivatives on the whole domain, that is, not only on a discrete set of points, and which enables us to compute integrals over volumes and surfaces.
To obtain the weak form of eq. (1) we first multiply with a smooth function w (referred to as a test function) and integrate over G, which gives for any t ∈ (0, T]:

$$\int_G \rho\, w \cdot \partial_t^2 u \; \mathrm{d}^n x = \int_G w \cdot (\nabla \cdot \sigma) \; \mathrm{d}^n x + \int_G w \cdot f \; \mathrm{d}^n x, \qquad (4)$$

where the test function w does not depend on time t. Applying the divergence theorem to the first term on the right in eq. (4) gives

$$\int_G \rho\, w \cdot \partial_t^2 u \; \mathrm{d}^n x = \int_{\partial G} w \cdot (\sigma \cdot \hat{n}) \; \mathrm{d}^{n-1} x - \int_G \nabla w : \sigma \; \mathrm{d}^n x + \int_G w \cdot f \; \mathrm{d}^n x. \qquad (5)$$

Note the explicit appearance of the boundary integral in eq. (5). This term vanishes if the free-surface boundary condition (σ · n̂|∂G = 0) from (3) is inserted, and eq. (5) reduces to

$$\int_G \rho\, w \cdot \partial_t^2 u \; \mathrm{d}^n x = - \int_G \nabla w : \sigma \; \mathrm{d}^n x + \int_G w \cdot f \; \mathrm{d}^n x. \qquad (6)$$

The free-surface condition is now implicitly included without any special effort. This inclusion is a major advantage over finite-difference methods, especially when modelling wave propagation through regions with strong topography, considering sea-bottom bathymetry, or dealing with applications in engineering where the domains may exhibit an arbitrary level of geometric complexity. Of course, we may also specify any other boundary condition we like, with the caveat that the integral over ∂G may become non-zero. This is the case, for example, along the interface between coupled fluid and solid domains, or any internal discontinuity where the normal stresses are continuous. In seismology, it is often desired to solve the acoustic (scalar) version of the wave equation in addition to the elastic (vector) version. This is useful in modelling wave propagation through fluid or coupled fluid-solid systems (Komatitsch et al. 2000; Komatitsch & Tromp 2002b; Nissen-Meyer et al. 2008; Korta Martiartu et al. 2017), and also as a computationally efficient approximation to the elastic wave equation (Pratt et al. 1998; Prieux et al. 2013; Cance & Capdeville 2015). We can derive the weak form of the acoustic wave equation in a similar way to that demonstrated above.
Starting from the strong form

$$\frac{1}{\rho c^2}\, \partial_t^2 u = \nabla \cdot \left( \frac{1}{\rho} \nabla u \right) + f, \qquad (7)$$

where c is the speed of sound and u is a scalar displacement potential (Chaljub & Valette 2004), we multiply with the (scalar) test function w, integrate, and apply the divergence theorem to obtain:

$$\int_G \frac{1}{\rho c^2}\, w\, \partial_t^2 u \; \mathrm{d}^n x = \int_{\partial G} \frac{1}{\rho}\, w\, \nabla u \cdot \hat{n} \; \mathrm{d}^{n-1} x - \int_G \frac{1}{\rho}\, \nabla w \cdot \nabla u \; \mathrm{d}^n x + \int_G w\, f \; \mathrm{d}^n x. \qquad (8)$$

At this stage, it is instructive to note the similarity between eqs (4) and (8). Each includes four distinct terms: a term involving a volume integral of density and acceleration (the mass term), a term involving a surface integral of the normal component of stresses (the boundary term), a term involving the volume integral of internal forces (the stiffness term), and a term involving a volume integral over external forces (the forcing term). All terms also involve the dot product with either the test functions w or their gradients ∇w, and act on either the wavefield u or its gradient ∇u. It is understood that in the case of the scalar wave equation, the vector quantities simply reduce to their scalar counterparts.
With these similarities in mind, and using the notation $(a, b)_G = \int_G a \cdot b \; \mathrm{d}^n x$ and $\langle a, b \rangle_{\partial G} = \int_{\partial G} a \cdot b \; \mathrm{d}^{n-1} x$, we can write eqs (5) and (8) in a more compact form. The weak forms now read: for all t ∈ (0, T] and all w ∈ V^n (resp. w ∈ V), find u (resp. u) such that

$$(\rho\, \partial_t^2 u, w)_G = \langle \sigma \cdot \hat{n}, w \rangle_{\partial G} - (C : \nabla u, \nabla w)_G + (f, w)_G \qquad (9)$$

or

$$\left( \tfrac{1}{\rho c^2}\, \partial_t^2 u, w \right)_G = \left\langle \tfrac{1}{\rho} \nabla u \cdot \hat{n}, w \right\rangle_{\partial G} - \left( \tfrac{1}{\rho} \nabla u, \nabla w \right)_G + (f, w)_G, \qquad (10)$$

where we have used the elastic constitutive relationship given in eq. (2), and where V denotes a suitably chosen Galerkin space that will be defined in the next section. In this form, the similarities between the two equations are quite apparent. The only significant difference comes in the way material coefficients are multiplied into each term, and the dimensions of the dynamic fields and test functions. We can use these similarities to add one additional layer of abstraction. Here we will use the concept of a 'mathematical model' to describe a set of equations. Each model encompasses a specific term in eqs (9) and (10), and includes a description of the relevant material parameters, the choice of test-function space, and the associated mathematical operations. We denote such a model by the symbol I and define it as

$$\mathcal{I}_I(P), \qquad I \in \{A, E\}, \qquad P \in \{M, B, S, F\}, \qquad (11)$$

where A and E refer to the acoustic and elastic wave equations, and M, B, S, F refer to the mass, boundary, stiffness and forcing terms, respectively. In this notation, with the sign of each term absorbed into its definition, we can represent the acoustic or elastic wave equation with the simple expression

$$\mathcal{I}_I(M)(u) + \mathcal{I}_I(B)(u) + \mathcal{I}_I(S)(u) + \mathcal{I}_I(F)(u) = 0, \qquad (12)$$

or even more compactly as

$$\sum_{P \in \{M, B, S, F\}} \mathcal{I}_I(P)(u) = 0. \qquad (13)$$

In essence, we have compressed all the information that does not change through time into I. Such a model is passed the time-varying field u as an argument, P = P(u), on which it acts in much the same way as a function. As before, we allow u to reduce to its scalar equivalent in the case of the acoustic, or any other scalar, wave equation.

Galerkin approximation
In order to solve eq. (13) numerically we need to choose a proper finite-dimensional approximation V_h ≈ V. For simplicity and efficiency we consider the Ritz-Galerkin approach, also commonly known as the Galerkin method (Zienkiewicz et al. 2005). Here, the test functions w and the solution u are approximated by a set of functions φ chosen from the same space. Let B be a basis of V_h and let N denote its size. We can form a discrete approximation of any continuous function g, which could represent a component of the displacement u or the stress σ, by expanding in this basis with coefficients F_j:

$$g(x) \approx \sum_{j=1}^{N} F_j\, \phi_j(x). \qquad (14)$$

The derivatives of g can then be approximated by the derivatives of φ_j:

$$\nabla g(x) \approx \sum_{j=1}^{N} F_j\, \nabla \phi_j(x). \qquad (15)$$

The accuracy of this approximation will be limited by how well V_h approximates the original function space V.
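As a concrete instance of eq. (14), the sketch below expands a function in the degree-2 Lagrange basis on the nodes {-1, 0, 1}; the function names and the fixed size are illustrative assumptions, not part of the text:

```cpp
#include <array>

// phi_j of the degree-2 Lagrange basis on the nodes {-1, 0, 1}.
double Lagrange(int j, double x) {
  const double pts[3] = {-1.0, 0.0, 1.0};
  double value = 1.0;
  for (int m = 0; m < 3; ++m)
    if (m != j) value *= (x - pts[m]) / (pts[j] - pts[m]);
  return value;
}

// Discrete approximation g_h(x) = sum_j F_j phi_j(x), as in eq. (14).
double Expand(const std::array<double, 3>& F, double x) {
  double g = 0.0;
  for (int j = 0; j < 3; ++j) g += F[j] * Lagrange(j, x);
  return g;
}
```

Sampling g(x) = x^2 at the three nodes gives F = {1, 0, 1}, and the expansion then reproduces x^2 exactly everywhere, since the target function already lies in V_h.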
In addition to the accurate evaluation of these fields and their derivatives, we must also evaluate integrals like (u, w)_G and ⟨∇u · n̂, w⟩_∂G. To do this, we turn to numerical quadrature. Here, we combine a set of M integration weights W_i with g(x) and ∇g(x) evaluated on a set of integration points x_i, resulting in

$$\int_G g(x) \; \mathrm{d}^n x \approx \sum_{i=1}^{M} W_i^G\, g(x_i) \qquad (16)$$

for volume integrals, and

$$\int_{\partial G} g(x) \; \mathrm{d}^{n-1} x \approx \sum_{i=1}^{M} W_i^{\partial G}\, g(x_i) \qquad (17)$$

for surface integrals. Note that the only difference between these two terms is the superscript on W. This allows us to write both n- and (n − 1)-dimensional integrals in the same form; in reality, however, the weights W^∂G are only defined on ∂G. To work the basis B into the wave equation, we can insert any choice of the basis B as an additional argument into eq. (13), which encodes all operations introduced in this section:

$$\sum_{P \in \{M, B, S, F\}} \mathcal{I}_I(P, \mathcal{B})(u) = 0. \qquad (18)$$

Consider the powerful generalizations obtained when writing the wave equation in this manner. In addition to generalizing over the acoustic and elastic wave equations, we have not yet specified exactly which function space B describes. In fact, we are able to use any function space, as long as eqs (14)-(17) are well defined. Additionally, we have not yet specified the spatial dimension: the same form holds in 1, 2 and 3 dimensions. All that is required is that the integrals in eqs (16) and (17) exist for a given dimension n. For example, in 2-D these equations refer to surface and line integrals, while in 3-D they refer to volume and surface integrals, respectively. Most importantly, we can separate all physical quantities (encoded in P) from the finite-element discretization (parametrized by B).
We can now analyse the steps needed to compute I_I(S, B) (the stiffness term). First, we use B to approximate the displacement field u with some finite-dimensional basis, for example polynomials, via eq. (14), resulting in a discretized approximation of u. We then use additional functionality internal to B to compute the gradient of the discretized displacement via eq. (15). This gradient, equivalent to the elastic or acoustic strain, is now passed on to I_I. For I = A (the acoustic wave equation), this action involves multiplication by the inverse density ρ⁻¹, while for I = E (the elastic wave equation) it involves a contraction with the elastic tensor C. If we consider the strain and stress as being represented in Voigt notation [see Babuška & Cara (1991), for example], then the above operations simplify to a scalar-vector or matrix-vector product, respectively. The result of either operation is equivalent to the acoustic or elastic stresses. We then rely on B again to first compute the gradient of the test functions ∇w, and finally perform the numerical integration via the quadrature rule from eq. (16). This completes the computation of the stiffness term. Eq. (18) describes the semi-discrete weak form of the wave equation and does not yet include the discretization in time. Again, this can be seen as another layer, independent of the physics and the spatial discretization. Commonly used explicit time-marching schemes are, for instance, the second-order Newmark central-difference scheme (Hughes 2000; Peter et al. 2011), second-order leap-frog (Ferroni et al. 2017), fourth-order symplectic schemes (Nissen-Meyer et al. 2008) or high-order Runge-Kutta methods (Wilcox et al. 2010).
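The independence of the time-stepping layer can be illustrated with a second-order leap-frog update, one of the schemes listed above: it operates on plain arrays and needs no knowledge of the physics or the basis. Variable names here are assumptions for illustration:

```cpp
#include <cstddef>
#include <vector>

// One leap-frog step: u^{n+1} = 2 u^n - u^{n-1} + dt^2 a^n, where the
// acceleration a^n = M^{-1}(f_ext - f_int) is available explicitly because
// the spectral-element mass matrix is diagonal.
void LeapfrogStep(std::vector<double>& uNew, const std::vector<double>& u,
                  const std::vector<double>& uOld,
                  const std::vector<double>& accel, double dt) {
  for (std::size_t i = 0; i < u.size(); ++i)
    uNew[i] = 2.0 * u[i] - uOld[i] + dt * dt * accel[i];
}
```

Swapping this routine for a Newmark or Runge-Kutta variant leaves the spatial discretization untouched, which is exactly the layering the model notation expresses.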

Finite-elements
In seismology, we are often concerned with accurately representing different discontinuities within the Earth. These can include several phase-transition boundaries within the mantle, the core-mantle boundary, or, on a much smaller scale, the ocean bottom. Across these discontinuities the model parameters and strain field are not differentiable, and for this reason a smooth polynomial basis over the entire domain G is usually not appropriate. To mitigate this problem we turn to the finite-element method (FEM), where the domain G is subdivided into a set of non-overlapping elements, each enclosing the volume G_e. The basis B then consists of continuous functions on G which are restricted to piecewise polynomials defined on the individual elements. Within each element, we approximate functions and their derivatives as in eqs (14) and (15).
While using the FEM, for reasons of efficiency, we can take advantage of pre-tabulated integration weights and basis-function derivatives defined on a reference element, which are often given in an element-specific reference coordinate system ξ. To transform these integrals and derivatives to the physical coordinates x, we require an element-specific Jacobian ∂x/∂ξ, which defines the mapping from reference to physical coordinates. With this transform, eqs (14)-(17) become

$$g(x(\xi)) \approx \sum_{j=1}^{N} F_j\, \phi_j(\xi), \qquad (19)$$

$$\nabla g(x(\xi)) \approx \sum_{j=1}^{N} F_j \left( \frac{\partial \xi}{\partial x} \right)^{\mathsf{T}} \nabla_\xi\, \phi_j(\xi), \qquad (20)$$

$$\int_{G_e} g(x) \; \mathrm{d}^n x \approx \sum_{i=1}^{M} W_i^G\, g(x(\xi_i)) \left| \det \frac{\partial x}{\partial \xi} \right|_{\xi_i} \qquad (21)$$

and

$$\int_{\partial G_e} g(x) \; \mathrm{d}^{n-1} x \approx \sum_{i=1}^{M} W_i^{\partial G}\, g(x(\xi_i)) \left| \det \frac{\partial x}{\partial \xi} \right|_{\xi_i}, \qquad (22)$$

respectively. To assemble the integrals in eqs (21) and (22) over the whole domain G, one simply needs to sum contributions from each element. With this in mind, the only difference to eqs (14)-(17) is the dependence on the reference coordinate system ξ, along with the determinant of the Jacobian matrix and its inverse, which describe the local mapping from reference to physical coordinates in an n-dimensional manifold. To calculate these additional quantities, all that is required is the definition of the mapping from the reference element to the physical domain. Let us now define E as a model containing this data, along with instructions on how to compute the Jacobian for a given element type. Then, in the finite-element discretization of the weak form, we can rewrite B as B(E), and eq. (13) becomes

$$\sum_{P \in \{M, B, S, F\}} \mathcal{I}_I(P, \mathcal{B}(\mathcal{E}))(u) = 0. \qquad (23)$$

The additional layer introduced by E allows us to specify the reference element type independently of the basis B. For instance, we can use linear or quadratic functions to define the geometry of the reference element. Of course, there are some restrictions implied by the dimensionality of the problem and the details of B, but with these restrictions eq. (23) is valid whether the domain is partitioned into simplex-type or hypercube-type meshes, or something more exotic. The choice of these functionals depends on the specific problem at hand.
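The role of the model E can be sketched for the simplest 2-D case, a 4-node quadrilateral with a bilinear mapping; the struct name and layout below are illustrative assumptions:

```cpp
#include <array>

// Bilinear mapping of a 4-node quadrilateral from reference coordinates
// (xi, eta) in [-1,1]^2 to physical coordinates. DetJacobian supplies the
// |det(dx/dxi)| factor that scales the quadrature weights in eqs (21)-(22).
struct QuadP1 {
  std::array<std::array<double, 2>, 4> vtx;  // physical vertex coordinates

  double DetJacobian(double xi, double eta) const {
    // Derivatives of the four bilinear shape functions N_a(xi, eta).
    const double dNdXi[4]  = {-0.25 * (1 - eta), 0.25 * (1 - eta),
                               0.25 * (1 + eta), -0.25 * (1 + eta)};
    const double dNdEta[4] = {-0.25 * (1 - xi), -0.25 * (1 + xi),
                               0.25 * (1 + xi),  0.25 * (1 - xi)};
    double J[2][2] = {{0.0, 0.0}, {0.0, 0.0}};
    for (int a = 0; a < 4; ++a) {
      J[0][0] += dNdXi[a] * vtx[a][0];  J[0][1] += dNdXi[a] * vtx[a][1];
      J[1][0] += dNdEta[a] * vtx[a][0]; J[1][1] += dNdEta[a] * vtx[a][1];
    }
    return J[0][0] * J[1][1] - J[0][1] * J[1][0];
  }
};
```

For an undeformed unit square the determinant is 0.25 at every (ξ, η); a skewed element would make it vary across the element, which is why the determinant is evaluated per integration point.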
For most of the results presented in the next sections, we follow the classical spectral-element discretization commonly used in seismology, and choose the tensorized GLL basis as our discretization model (Komatitsch & Tromp 2002b; Fichtner et al. 2009; Cupillard et al. 2012). This also restricts our choice of finite elements to those of a hypercubical character. It is important to note that this choice of basis prevents us from integrating the discrete mass term with full accuracy, as the GLL quadrature rule is only exact for polynomials up to order 2n − 1. As we use the same polynomials of order n to represent both the dynamic field variables and the test functions, the mass term includes polynomials of degree 2n. Given a field with a fixed spatial complexity (e.g. a band-limited wavefield), this inaccuracy can always be mitigated for a given domain and model through mesh refinement and/or increasing the order of the polynomial approximation. In Section 3.3, we show examples validating the accuracy of the spectral-element method for select media with constant material parameters, through benchmark results comparing numerically generated and semi-analytic solutions. Once material parameters are allowed to vary within each element, either linearly or otherwise, the order of the corresponding polynomial approximations increases in turn. While this suggests that the accuracy of the simulated scalar displacement potential given in eq. (10) is degraded to a higher degree than that of the elastic wave eq. (9), due to the product of the inverses of material coefficients, this can be remedied with a change of parameterization. For instance, by parameterizing eq. (10) with compressibility ((ρc²)⁻¹) and inverse density, the accuracy of the discrete solutions to (10) and (9) becomes identical. A thorough study of the convergence properties of spectral-element methods is beyond the scope of this paper; a recent and comprehensive study can be found in Ferroni et al. (2017).
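The exactness limit of the GLL rule can be checked directly for the lowest-order case, n = 2, whose rule on [-1, 1] has nodes {-1, 0, 1} and weights {1/3, 4/3, 1/3}:

```cpp
// 3-point GLL rule on [-1, 1]. For polynomial order n = 2 it is exact up to
// degree 2n - 1 = 3, so the degree-2n integrand of the mass term is
// slightly under-integrated.
double Gll3(double (*f)(double)) {
  const double x[3] = {-1.0, 0.0, 1.0};
  const double w[3] = {1.0 / 3.0, 4.0 / 3.0, 1.0 / 3.0};
  double sum = 0.0;
  for (int i = 0; i < 3; ++i) sum += w[i] * f(x[i]);
  return sum;
}
```

The rule integrates x^3 exactly (both sides give 0), but for x^4 it returns 2/3 where the true integral is 2/5: precisely the degree-2n under-integration of the mass term described above.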

Complex rheologies
While the abstractions introduced so far allow us to construct a general finite-element discretization of the seismic wave equation, their utility extends to the mathematical models describing wave propagation physics as well. Consider a move beyond linear elasticity into viscoelastic and weakly non-linear regimes. Many of the effects these rheologies have on the wavefield can be expressed as modifications to the elastic or acoustic stresses, which are contained within the term I_I(S, B(E)). Consider the viscoelastic wave equation. In this case, the stiffness term in eq. (9) depends on the entire strain history:

$$\mathcal{I}_E(S)(u)(t) = \left( \int_{-\infty}^{t} \dot{C}(t - t') : \nabla u(t') \; \mathrm{d} t', \; \nabla w \right)_G. \qquad (24)$$

It is common to approximate this time dependence by a discrete Laplace transform (or discrete relaxation spectrum), which can be equivalently represented as generalized Zener or Maxwell bodies (Emmerich & Korn 1987; Robertsson et al. 1994; Moczo & Kristek 2005; van Driel & Nissen-Meyer 2014). Using this discrete formulation, we can rewrite the stiffness term in (24) as

$$\mathcal{I}_E(S)(u)(t) = \left( C_U : \nabla u(t) - \sum_{j=1}^{N} M_j(t), \; \nabla w \right)_G, \qquad (25)$$

where the M_j(t) are memory variables which encode the strain history of the Maxwell bodies, and C_U refers to the 'unrelaxed' elastic tensor, the coefficients of which represent the instantaneous elastic response to a strain impulse. Note the following similarities between eq. (25) and the purely elastic stiffness as defined in eq. (9), written in model notation as I_E(S, B(E)). The operations provided by B(E) are identical. The gradients of the displacement u and test functions w maintain the same form. On the element level E, we require the same mapping to physical coordinates, which has already been provided. Furthermore, we retain the contraction of a fourth-order tensor with the displacement gradient ∇u, although the elastic tensor is modified to take on an unrelaxed state through the subtraction of the memory variables.
We can encode these additional instructions in a new model A(·):

$$\sum_{P \in \{M, B, S, F\}} \mathcal{I}_E(A(P), \mathcal{B}(\mathcal{E}))(u) = 0. \qquad (26)$$

Assuming a free-surface condition on the boundary of G, the application of A only modifies the term I_E(S, B(E)). Thus we can consider it as an identity operator over the other models, and eq. (26) simplifies to

$$\sum_{P \in \{M, B, F\}} \mathcal{I}_E(P, \mathcal{B}(\mathcal{E}))(u) + \mathcal{I}_E(A(S), \mathcal{B}(\mathcal{E}))(u) = 0. \qquad (27)$$

As we did with B, we now explore the steps required to compute I_E(A(S), B(E)). First, we use the conventional stress term S to compute C_U : ∇u(t), the first operation in eq. (25). The polynomial approximation provided by B is required, which in turn requires the Jacobian matrix provided by E. Once this term is computed, we use the instructions on the A layer to subtract the sum of the j memory variables M_j(t). Once this is complete, we return to the conventional stress computation S and its arguments to compute the contraction with ∇w and the resulting integral. Finally, we use the instructions on A to update the differential equation controlling the time evolution of the memory variables.
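For concreteness, a common form of the memory-variable evolution equation in the generalized-Maxwell discretization (Emmerich & Korn 1987) is first-order in time; the precise coefficients depend on the chosen relaxation spectrum, so the expression below is a representative sketch rather than this paper's own definition:

```latex
\dot{M}_j(t) = -\omega_j \, M_j(t) + \omega_j \, \Delta C_j : \nabla u(t),
\qquad j = 1, \dots, N,
```

where ω_j is the relaxation frequency and ΔC_j the modulus defect of the j-th Maxwell body. Each M_j relaxes toward the instantaneous strain contribution, so the sum in eq. (25) accumulates the strain history without storing past wavefields.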
Note that we have yet to specify any anisotropic symmetries (handled by the functional P), the specifics of the polynomial interpolation (handled by the functional B), or the element type (handled by the functional E). Nevertheless, we have a compact representation of the resulting weak-form equation which can be used with any valid combination of P, B and E. This form is valid for any modification to elastic or acoustic stresses. Such is also the case when applying gravitational corrections in the Cowling approximation (Cowling 1941; Chaljub et al. 2007), or simulating wave propagation in weakly non-linear/hyperelastic materials (Rivière et al. 2013). We simply require some modifier F which takes the physics defined in the wave equation as an argument, and implements modifications for all those terms P for which F(P) is not an identity transform.
Another illustrative example involves the coupling between acoustic and elastic rheologies, relevant when simulating wave propagation in combined fluid-solid domains. In this case, all elements along the coupling boundaries must compute additional quantities to ensure the continuity of traction between the two domains (Komatitsch & Tromp 2002a; Chaljub & Valette 2004; Nissen-Meyer et al. 2008). We can bring the (time-independent) coupling effects into the model notation by defining C_FS (fluid to solid) and C_SF (solid to fluid), and applying them to the coupled wave equation in a manner similar to eq. (26). Since these act only on the boundary terms, the application of these functionals to the wave equation results in identity transforms for all other terms. Inserting these identity transforms where appropriate, we obtain the simplified result

$$\sum_{P \in \{M, S, F\}} \mathcal{I}_A(P, \mathcal{B}(\mathcal{E}))(u) + \mathcal{I}_A(C_{SF}(B), \mathcal{B}(\mathcal{E}))(u) + \sum_{P \in \{M, S, F\}} \mathcal{I}_E(P, \mathcal{B}(\mathcal{E}))(u) + \mathcal{I}_E(C_{FS}(B), \mathcal{B}(\mathcal{E}))(u) = 0. \qquad (28)$$

We can include attenuation in the solid regions by including A in eq. (28):

$$\sum_{P \in \{M, S, F\}} \mathcal{I}_A(P, \mathcal{B}(\mathcal{E}))(u) + \mathcal{I}_A(C_{SF}(B), \mathcal{B}(\mathcal{E}))(u) + \sum_{P \in \{M, F\}} \mathcal{I}_E(P, \mathcal{B}(\mathcal{E}))(u) + \mathcal{I}_E(A(S), \mathcal{B}(\mathcal{E}))(u) + \mathcal{I}_E(C_{FS}(B), \mathcal{B}(\mathcal{E}))(u) = 0, \qquad (29)$$

and follow a similar process for including corrections for self-gravitation as well. A complete numerical model of wave propagation can be built up from a set of functionally orthogonal models. In an abstract sense, the utility of these models seems to be limited to generating concise representations of the discretized wave equation, generalized over rheology. In the next section we will explore their practical utility as tools for writing maintainable and efficient numerical modelling software.

IMPLEMENTATION IN MODELLING SOFTWARE
The above abstractions have allowed us to split the physics, spatial discretization and finite-element shape mappings into three classes of logically orthogonal components. The components within each of these classes can, with some restrictions (e.g. those induced by the spatial dimension), be interchanged independently from changes in any other class. To be of practical use, these abstractions must be implemented in computer code, and it must be ensured that their use does not negatively affect performance. To achieve this goal, we turn to the C++ programming language, and make extensive use of the 'template mixin' paradigm.

Models as template mixins
As a simple introduction, consider a concrete instantiation of the model B(E). We can write a one-to-one correspondence between abstract model notation and C++ template mixin notation as

B(E) → TensorGll<QuadP1>.

Here, following the classical spectral-element discretization commonly used in seismology, we have chosen the tensorized GLL basis as our discretization model B, and restricted the support of this discretization to a 2-D 4-node quadrilateral element with a bilinear (P1) shape mapping, which takes the place of E. In practice, each component of the model (i.e. TensorGll) is a separate C++ class. Alternatively, we could use the same GLL basis as B, but change the support to a 3-D 8-node hexahedral element, and write this in mixin notation as

B(E) → TensorGll<HexP1>.

Here we get a sense for the origin of the term 'mixin'. Given a collection of possible discretization schemes, and a collection of elemental restrictions, we 'mix in' the two collections together to create a concrete instance of the abstract model B(E). The concrete instantiations of all the abstract models can be referred to in this way.
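The mixin pattern itself can be sketched in a few lines of C++; only the names TensorGll and QuadP1 come from the text, and the class contents are invented for illustration:

```cpp
// An element model supplies geometric facts; contents are illustrative.
struct QuadP1 {
  static constexpr int NumDim = 2;
  static constexpr int NumVtx = 4;
};

// The discretization model is "mixed in" on top of whatever element it is
// instantiated with, inheriting its properties via the template parameter.
template <typename Element>
struct TensorGll : Element {
  static constexpr int PolyOrder = 4;
  // Degrees of freedom of a tensorized GLL basis on this element.
  static constexpr int NumDof() {
    int dof = 1;
    for (int d = 0; d < Element::NumDim; ++d) dof *= (PolyOrder + 1);
    return dof;
  }
};

using Discretization = TensorGll<QuadP1>;  // a concrete instance of B(E)
```

Because TensorGll queries Element::NumDim rather than hard-coding it, swapping QuadP1 for a hexahedral element changes the degree-of-freedom count without touching the basis code.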
As was the case in Section 2, the efficiency of this notation grows and becomes more obvious as the rheologies and discretizations become more complex. Consider a toy implementation of eq. (20), which computes the gradient of some discrete function f, given in Listing 1.
An exploration of this listing gives insight into how these abstract models are used in practice. First, we note that this function is a component of the 'Basis' model B. In Listing 1 this is specified by prefixing the name of the model the function belongs to (Basis) to the function name itself (ComputeGradient). At this stage, the element E is still abstract and waiting to be 'mixed in', which is evidenced by the template <typename Element> preceding the function definition. Internal to the function, we first allocate a 2-D array to hold the gradient of F, which is of the size (NumDof, NumDim), or the number of degrees of freedom on this element and the number of dimensions of the problem, respectively. Here we already take advantage of the abstract nature of the model notation: NumDof is provided by the current discretization model, but NumDim is currently unknown, and only provided once the complete model is specified. Now, we loop through each dimension and fill up Gradient by following eq. (20) more or less verbatim. We multiply and sum F at the current integration point by the gradient in reference coordinates of the test functions at all the integration points. To transform this into physical coordinates, we rely on E to provide us with the shape mapping DepsDx. Although the true implementation of such functions can be made more sophisticated than the example in Listing 1, the concept here is what is important: through the C++ template mixin paradigm we are able to realize the benefits of the abstract and compact model notation introduced in Section 2.
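A runnable sketch in the spirit of Listing 1, with an invented derivative table and a toy 1-D element standing in for E (all names and sizes here are illustrative assumptions):

```cpp
#include <array>

// Basis::ComputeGradient computes the gradient of a discrete function F at
// each integration point via eq. (20): a weighted sum over basis-function
// derivatives in reference coordinates, mapped to physical coordinates by E.
template <typename Element>
struct Basis {
  static constexpr int NumDof = 3;

  static std::array<double, NumDof> ComputeGradient(
      const std::array<double, NumDof>& F) {
    // DerivRef[j][i]: d(phi_j)/d(xi) of the degree-2 Lagrange basis on
    // {-1, 0, 1}, evaluated at integration point i.
    const double DerivRef[NumDof][NumDof] = {
        {-1.5, -0.5, 0.5}, {2.0, 0.0, -2.0}, {-0.5, 0.5, 1.5}};
    std::array<double, NumDof> gradient{};
    for (int i = 0; i < NumDof; ++i) {
      for (int j = 0; j < NumDof; ++j)  // sum over basis functions
        gradient[i] += F[j] * DerivRef[j][i];
      gradient[i] *= Element::DepsDx();  // map to physical coordinates
    }
    return gradient;
  }
};

// Toy element standing in for E: a 1-D segment [0, 1], so d(xi)/dx = 2.
struct Segment {
  static double DepsDx() { return 2.0; }
};
```

Sampling g(x) = x at the mapped nodes x = {0, 0.5, 1} gives F = {0, 0.5, 1}, and the computed gradient is 1 at every integration point, as expected for a linear function.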
The remaining models can be implemented in the same way. Consider the acoustic and isotropic/anisotropic elastic stiffness terms I_I(S, B(E)), both implemented in Listing 2. Here we are mixing two models: the discretization (Basis) and a particular model of computing stresses (Physics).
If we unpack the information compressed in these models, we see that Listing 2 is written in a similar fashion to the continuous weak form given in eqs (9) and (10). Indeed, such a function remains unchanged across dimensions, discretization models, and element types (as long, of course, as these are consistent with each other). Examples of concrete instantiations of I_I(S, B(E)) are I_I(S, B(E)) → Acoustic<TensorGll<QuadP1>> and I_I(S, B(E)) → Elastic<TensorGll<HexP1>>, among many others, including instantiations built on the EnrichedLagrange discretization model, which is based on mixed-order Lagrange polynomials suitable for high-order mass lumping on simplicial elements (Chin-Joe-Kong et al. 1999; Cohen et al. 2001; Zhebel et al. 2014; Mulder & Shamasundar 2016).
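The layering of Physics over Basis can be sketched in the same toy style; ToyBasis and the constitutive details below are placeholders chosen only to keep the example self-contained:

```cpp
#include <array>
#include <cstddef>

// Stand-in for a concrete basis/element pair such as TensorGll<QuadP1>.
struct ToyBasis {
  static constexpr std::size_t NumDof = 2;
  std::array<double, NumDof>
  ComputeGradient(const std::array<double, NumDof>& u) const {
    return {u[1] - u[0], u[1] - u[0]};  // toy constant gradient
  }
};

// The Physics model mixed over the Basis model: the stress computation only
// assumes that its base provides ComputeGradient, and is therefore
// independent of dimension and element type.
template <typename Basis>
struct Acoustic : Basis {
  double bulkModulus = 2.0;  // illustrative material parameter
  std::array<double, Basis::NumDof>
  ComputeStress(const std::array<double, Basis::NumDof>& u) const {
    auto grad = this->ComputeGradient(u);   // provided by the B(E) layer
    for (auto& g : grad) g *= bulkModulus;  // constitutive relationship
    return grad;
  }
};

// I_I(S, B(E)) -> Acoustic<TensorGll<QuadP1>>, in miniature.
using AcousticQuad = Acoustic<ToyBasis>;
```

Swapping ToyBasis for any other class providing ComputeGradient changes the discretization without touching the physics layer, which is the point of the design.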
The implementation of more complex rheologies follows in a similar manner. Taking an implementation of viscoelasticity using memory variables, as described in Section 2.4, we can write a simple implementation of I_E(A(S), B(E)), or eq. (27), as expressed in Listing 3. The syntax used here can again be related to the notation introduced in Section 2. The template parameter specifies the discretized wave equation I_E(S, B(E)), and the resulting model takes the wavefield u as an argument. The Attenuation class performs the job of A, specifically timestepping the state of the memory variables and modifying the constitutive relationship accordingly. As in Listing 2, Listing 3 is valid for all consistent dimensions, discretization models and element types, and is also valid regardless of the exact form of the elastic stiffness computation (i.e. isotropic or anisotropic). Here we note that all these functions could be made even simpler by explicitly building the finite-element stiffness matrix, and performing a single matrix-vector product to compute the internal forces. Our reasons for not taking this approach are twofold. First, there are significant efficiency gains in this 'on-the-fly' computation of the stiffness matrix for all wave equation types, although these gains are restricted to hypercube-type elements with a tensorized basis (Deville et al. 2002). Secondly, and more generally, this approach allows us to save costs in the computation of complex constitutive relationships which modify the stress and/or strain; using the explicit stiffness matrix would require the recomputation of these terms.
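The decorator idea can be sketched as follows. The single-mechanism relaxation update below is schematic, chosen only to make the state-carrying behaviour visible; it is not the exact scheme of eq. (27):

```cpp
// A minimal "discretized wave equation" stand-in: all it offers is an
// elastic stress computation (scalar, for brevity).
struct ToyElastic {
  double mu = 1.0;
  double ComputeStress(double strain) const { return 2.0 * mu * strain; }
};

// The decorator A: it wraps an arbitrary wave equation, timesteps a memory
// variable, and modifies the constitutive relationship accordingly.
template <typename WaveEquation>
struct Attenuation : WaveEquation {
  double memory = 0.0;      // memory-variable state
  double relaxation = 0.5;  // illustrative relaxation coefficient

  double ComputeStress(double strain) {
    double elastic = WaveEquation::ComputeStress(strain);  // unmodified physics
    memory += relaxation * (elastic - memory);             // step memory variable
    return elastic - memory;                               // attenuated stress
  }
};
```

Because the wrapped wave equation is a template parameter, the same decorator applies unchanged whether the inner physics is isotropic or anisotropic, in two or three dimensions.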
In general, the implementation of the discretized wave equation with C++ template mixins allows an almost one-to-one correspondence with the hierarchical model notation introduced in Section 2. This can be expressed with an analogy to building blocks, as illustrated in Fig. 1. Here we use the term 'Decorator' to represent any modification to the linear elastic rheology that can be written in a form similar to eq. (26).

Constructing mixins in complex domains
The above approach lends itself well to complex coupled domains, as each individual element in the domain can simply be constructed from an arbitrary model/mixin. Consider, for example, the four-element domain illustrated in Fig. 2. Each element is alternately solid or fluid, the top domain boundary is of type Dirichlet, and the other three domain boundaries are meant to absorb energy. In mixin notation, each element can be written as a chain of its physics and boundary mixins. With the equivalence of the mixin notation and the model notation, this explicit characterization ensures that the proper equations will be solved on each element. The question remains how to construct the mixins automatically for domains of arbitrary complexity. Domains with simple fluid-solid boundaries are found in many seismological applications; coupling between the mantle and outer core, and oceanic exploration surveys, are common examples. In these cases it is relatively simple to mark specific facets (edges in 2-D, faces in 3-D) in the mesh as coupling boundaries. However, this is disadvantageous, as such boundaries need to be re-computed whenever the mesh discretization changes. More difficulties are encountered in examples where each individual element may vary between fluid and solid. This may be the case, for example, in a saturated porous medium; the domain illustrated in Fig. 2(a) gives a simple example of such a pathological case. To handle such complex domains we build on the functionality provided by DMPLEX, a library for handling unstructured meshes contained within PETSC (Knepley & Karpeev 2009; Lange et al. 2016). DMPLEX represents a computational mesh as a directed acyclic graph (DAG), and allows us to trivially construct complete mixins for elements in a domain of arbitrary geometric and physical complexity.
The mesh outlined in Fig. 2(a) contains five distinct physical components. On the elemental level, we consider elements that are responsible for simulating either the acoustic (blue) or elastic (brown) wave equation. In the fluid elements we consider a formulation of the acoustic wave equation based on a scalar displacement potential (eq. 8), in order to maintain an explicit time-stepping scheme (Chaljub & Valette 2004). Along the left, bottom, and right edges we consider absorbing-style boundary conditions (green) (Clayton & Engquist 1977; Komatitsch et al. 2000). Finally, along the top boundary we consider a homogeneous Dirichlet-type rigid boundary (red). The right side of Fig. 2(a) also demonstrates another way to look at the mesh, here focusing on the topological entities. Following the same order as above, we proceed from entities of dimension two (faces F_N), to entities of dimension one (edges E_N), to those of dimension zero (vertices V_N). As in Fig. 2(a), the mesh entities are coloured by the corresponding physics they introduce. Here we can consider the acoustic and elastic terms as forming the element physics, with the boundary terms serving to mix in with the element physics. An additional complication is introduced by the Dirichlet boundaries: since these operate in a point-wise fashion, they must also be associated with the vertices of each edge they decorate, in contrast to the absorbing boundary terms, which only consider surface integrals over an element facet.
Of course, several mixins are absent from Fig. 2(a). These are the coupling mixins which we expect along E_3, E_5, E_6 and E_8. As well, since each model in eq. (33) is explicitly constructed once per element, it is essential that all mixins are known at the element (F_N) level. Fig. 2(b) recasts the topology outlined in Fig. 2(a) as a Hasse diagram, using functionality provided by DMPLEX. We can perform the required communication in two stages. First, we traverse the upward and downward closure of each edge (Fig. 2c), and investigate the labels residing on each connected entity. As an example, for E_0 the upward closure visits V_0 and V_1, while the downward closure visits F_0. Any labels on these entities are then added to the set of labels on the edge. Once these sets are constructed, we simply traverse one level in the upward closure of each element (F_N), and add all labels associated with the connected edges to the set of elemental physics (Fig. 2d). At this stage we have all the information necessary to construct the complete model for each element.
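A toy sketch of this two-stage label propagation, on an invented mesh of two faces sharing one edge, might look as follows (the connectivity tables and label names are made up for the example; the real implementation queries DMPLEX for the closures):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Labels = std::map<std::string, std::set<std::string>>;

Labels PropagateLabels() {
  // Closure relations of the toy mesh: F0 and F1 share edge E1.
  std::map<std::string, std::vector<std::string>> edgeVertices{
      {"E0", {"V0", "V1"}}, {"E1", {"V1", "V2"}}, {"E2", {"V2", "V3"}}};
  std::map<std::string, std::vector<std::string>> edgeFaces{
      {"E0", {"F0"}}, {"E1", {"F0", "F1"}}, {"E2", {"F1"}}};
  std::map<std::string, std::vector<std::string>> faceEdges{
      {"F0", {"E0", "E1"}}, {"F1", {"E1", "E2"}}};

  // Initial labels: element physics on faces, a point-wise condition on V0.
  Labels labels{{"F0", {"acoustic"}}, {"F1", {"elastic"}}, {"V0", {"dirichlet"}}};

  // Stage 1: each edge gathers the labels of every entity in its closure.
  for (const auto& [edge, verts] : edgeVertices) {
    for (const auto& v : verts)
      labels[edge].insert(labels[v].begin(), labels[v].end());
    for (const auto& f : edgeFaces[edge])
      labels[edge].insert(labels[f].begin(), labels[f].end());
  }
  // Stage 2: each face gathers the labels of its edges, completing the model.
  for (const auto& [face, edges] : faceEdges)
    for (const auto& e : edges)
      labels[face].insert(labels[e].begin(), labels[e].end());
  return labels;
}
```

After propagation, the shared edge carries both physics labels, identifying it as a fluid-solid coupling interface, and each face knows every mixin it must instantiate.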

Examples
Using Hasse diagrams is an attractive way of determining which decorators to add to the element physics, as the procedure is local to each element and agnostic to both element type and dimension. For simplicial elements, the number of edges and vertices associated with each element changes, but the Hasse diagram can be drawn in a similar way. As well, in three dimensions another stratum is added to the bottom of the diagram, indicating element volumes, but the discussion of closure traversal still holds. Taken together, these factors allow us to flexibly and efficiently discretize complex domains. An example of this flexibility can be seen when considering a medium in which each element randomly simulates either the acoustic or elastic wave equation, with the separate solutions coupled along element edges. This may, for example, represent a type of fluid-filled porous medium, but it is more appropriate to consider this example as a complex end-member in the class of coupled simulations through unstructured domains. A snapshot of wave propagation through one realization of such a domain can be seen in Fig. 3. Note that two distinct equations are being solved here: the isotropic elastic wave equation, and an acoustic wave equation based on the scalar displacement potential, the weak forms of which are given in eqs (4) and (8), respectively. On the model generation side, all that was required was the labelling of individual elements as either acoustic or elastic. The coupling boundaries were found automatically at runtime by traversing the Hasse diagram provided by DMPLEX, as outlined in Section 3.2.
The benchmarks outlined in Fig. 4 present further examples of the benefits of both the model/mixin design and the generality of the Hasse diagrams, in this case through the discretization of the domain with four distinct element types. We simulate coupled acoustic/elastic wave propagation in 2-D and 3-D on quadrilateral, triangular, hexahedral, and tetrahedral meshes. The 2-D or 3-D domain is identical in all cases, as are all models/mixins above the level specifying the polynomial basis. The functions computing the propagation in the acoustic and elastic subdomains, as well as the functions computing the coupling terms, are identical. The Hasse diagram provided by DMPLEX was used to automatically determine the coupling edges. The only differences are at the bottom of the mixin chain, with, for example, TensorGll<QuadP1>/TensorGll<HexP1> computing the spatial operations for the hypercube-type elements, and EnrichedLagrange<TriP1>/EnrichedLagrange<TetP1> computing the spatial operations for the simplicial elements. A detailed exploration of the associated convergence properties is beyond the scope of this paper.

D I S C U S S I O N
The concepts outlined in Sections 2 and 3 have been implemented within the SALVUS software package. The next paragraphs will discuss several examples from this implementation and cover issues such as performance and scaling.

Performance and scaling
Abstractions in software often negatively affect performance. This point is especially relevant when considering full-waveform inversion, which has its place amongst the largest PDE-constrained optimization problems (Afanasiev et al. 2016; Bozdag et al. 2016). While the decomposition of the wave equation into models/mixins allows for a very concise representation of increasingly complex terms, this approach is practically useless unless the resulting software is performance-competitive. Considering that a realistic problem may include millions of elements, tens of thousands of time steps, thousands of sources, and hundreds of iterations, it is likely that some of the functions described in the above Listings will be called 10^19 times over the course of an inversion.
Rather than being detrimental, we believe that the software abstractions outlined above in fact allow for improved performance. One of the reasons is technical, and has to do with the implementation via C++ templates. During the compilation stage, a separate copy of the entire mixin chain is generated for each complete and meaningful wave propagation model, meaning that there is no runtime cost associated with determining the physics being simulated on each individual element. Furthermore, since the compiler is able to see all code paths during compilation, most of the function calls are 'inlined'; in these instances, the overhead associated with the function call itself is completely eliminated. The inlining of all the individual components in Listing 3 allows the viscoelastic stiffness to be calculated without deferring to any other function. This can be contrasted with more classical object-oriented approaches involving virtual function lookups, where each function call would trigger a virtual method table lookup, stalling the process until the appropriate delegate is found. This process would need to be repeated for each layer of the mixin chain, which can grow to be quite complex. Another reason for the efficiency of this design stems from the power of the abstractions themselves. Since each individual model is simple, care can be taken to ensure that each component is fully optimized. For example, the functions computing the wavefield gradient for each element type in eqs (9) and (10) are completely independent from the rest of the code. Since these functions are called from any combination of mixins, speedups in these functions have a wide-ranging impact.
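The contrast between the two dispatch styles can be made concrete with a schematic example (illustrative names and physics, not the real code). The mixin version resolves every call at compile time, while the virtual version pays a per-layer table lookup at run time, even though both compute the same value:

```cpp
// Mixin layering: every call is resolved statically, so the compiler can
// inline the whole chain.
struct BaseTerm {
  double Force(double u) const { return 2.0 * u; }
};

template <typename Term>
struct Damped : Term {
  double Force(double u) const { return Term::Force(u) - 0.1 * u; }
};

// The classical object-oriented alternative: each layer dispatches through
// a virtual method table, and the lookup repeats per layer per call.
struct VTerm {
  virtual double Force(double u) const = 0;
  virtual ~VTerm() = default;
};
struct VBase : VTerm {
  double Force(double u) const override { return 2.0 * u; }
};
struct VDamped : VTerm {
  const VTerm& inner;
  explicit VDamped(const VTerm& i) : inner(i) {}
  double Force(double u) const override { return inner.Force(u) - 0.1 * u; }
};
```

With deeper chains (basis, physics, attenuation, boundary terms) the virtual version accumulates one indirection per layer per degree of freedom per time step, which is precisely the cost the template design avoids.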
To demonstrate that the abstractions minimally impact runtime efficiency, we performed a simple single-thread benchmark. The test machine was powered by an Intel Core i7-6850K (Broadwell) processor with a base operating frequency of 3.60 GHz. The peak theoretical double-precision floating-point performance of the entire chip is reported as 345.6 GFLOPS, and with six compute cores we expect a maximum per-core performance of 57.6 GFLOPS (Intel 2018). Our benchmark simulation consisted of a 3-D hexahedral mesh with 20 736 4th-order isotropic elastic elements, and was run for 300 time steps. We measured the cumulative time spent in the computation of the elastic internal forces, as outlined in Listing 2. Carefully counting the number of floating-point operations within the strain, stress, and stiffness functions, we recorded a performance of 10.57 GFLOPS for double-precision calculations.

[Figure caption: Receivers measuring pressure were placed 5 m above the interface, and those measuring displacement were placed 5 m below (black dots). V_p and V_s in the solid were 5800 and 4000 m s^-1, respectively, while V_p in the fluid was 4000 m s^-1; ρ was 2600 kg m^-3. Panels on the right show normalized comparisons between numerical solutions, computed using the methods described in this paper, and semi-analytical solutions generated with Gar6more2D (Diaz & Ezziani 2010a) and Gar6more3D (Diaz & Ezziani 2010b). The scalar pressure (p) is plotted in the acoustic medium, along with the vector displacement (u_x, u_y, u_z) in the elastic medium. Snapshots of the simulation volume are shown in panels on the left: in the fluid domain (blue) the colour scale qualitatively shows the scalar displacement potential, while in the solid domain the x-component of displacement is visualized.]
As this reaches an appreciable fraction of the maximum theoretical performance, and considering the memory-bound character of the computations, we can conclude that a significant amount of compute time is spent performing floating-point operations, rather than resolving function calls within the mixin hierarchy. As further evidence, we can compare the total time spent within the internal forces calculation between single-precision (18.37 s) and double-precision (35.615 s) runs. The single-precision run is almost exactly twice as fast as the double-precision run. Due to memory bandwidth limitations and fixed-width vector registers, this behaviour is expected. As the time taken to resolve the function calls is identical in both cases, we see that the expense of the floating-point operations is by far the dominant factor contributing to the wall time. If performance degradation from the mixin-based design were significant, we would expect the wall-time differential between single- and double-precision runs to narrow, but this is not the case.
In addition to the efficiency of the element-wise operations, parallel scaling performance is extremely important. To assist with parallelism, we can again leverage the expertise of PETSC, which provides a distributed vector API and hides the complexity of broadcasting the degrees of freedom on partition boundaries. The diagonal nature of the spectral-element mass matrix ensures that the communication halo never extends into neighbouring partitions, allowing for very efficient communication patterns which scale with the surface area of each parallel partition. Fig. 5 shows scaling statistics on the Piz Daint supercomputer, an XC50 system composed of 12-core Intel Haswell nodes located at the Swiss National Supercomputing Center. Strong scaling results, where a variable number of compute cores are used to solve a problem of a fixed size, are shown on the left. Here we consider a 3-D mesh with 729 000 elements and a polynomial degree of 4, resulting in 125 degrees of freedom per element. Visco-elastic wave propagation is simulated. The scaling is near optimal until the number of elements per core drops to approximately 1000, where we see slightly greater than 100 per cent parallel efficiency. We attribute this 'super-scaling' to processor cache effects; since less memory is used per core, it is more likely that data will remain in the processor cache throughout the simulation. Eventually, the parallel efficiency drops if there are too few elements per core and the communication overhead starts to dominate the cost of elemental operations. This can be seen from the last data point in Fig. 5(a), which uses 1024 nodes (12 288 cores) and roughly 60 elements per core. Fig. 5(b) presents weak scaling results. In this case, the number of elements per core remains fixed at 2250, while the total problem size is increased. Here we again see greater than 97 per cent parallel efficiency when increasing the total number of elements from 27 000 to 3 456 000.
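For reference, the efficiency figures quoted above follow the usual definitions; the helper below (our illustration, not part of PETSC or SALVUS) makes them explicit:

```cpp
// Strong scaling: fixed problem size, variable core count N.
// E_strong = (T_ref * N_ref) / (T_N * N); values slightly above 1 correspond
// to the cache-driven 'super-scaling' described in the text.
double StrongEfficiency(double t_ref, int n_ref, double t_n, int n) {
  return (t_ref * n_ref) / (t_n * n);
}

// Weak scaling: fixed work per core; ideally the wall time stays constant
// as the problem grows, giving E_weak = T_ref / T_N = 1.
double WeakEfficiency(double t_ref, double t_n) {
  return t_ref / t_n;
}
```

For example, halving the wall time when doubling the core count gives a strong-scaling efficiency of exactly 1; doing slightly better than that (thanks to cache effects) pushes the value above 1.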
To lower the communication costs associated with distributed memory parallelism, we overlap computation and communication during the global finite-element assembly phase. On each core the degrees of freedom shared between neighbouring elements are duplicated, and the stiffness term for each element is computed sequentially. Following this, we begin a broadcast of the ghost nodes. Asynchronously from this broadcast we sequentially assemble the global stiffness vector on each core, and once this is complete halt until the communication has finished. Finally, we finish the parallel global assembly by incorporating the communicated stiffness values on each core. Currently all communication is handled via the Message Passing Interface (MPI) (MPI Forum 2009), and no optimizations for shared memory are made. The explicit use of shared memory is a topic for future research, but as the communication overhead is already small we do not expect a significant increase in performance when considering CPU-only architectures.
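The overlap pattern can be sketched with standard C++ primitives; here std::async stands in for the non-blocking MPI ghost exchange, so this is a schematic of the control flow rather than the actual PETSC/MPI implementation:

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Schematic of overlapped assembly: start the ghost exchange, assemble the
// purely local part while it is in flight, then fold in the received values.
std::vector<double> AssembleWithOverlap(const std::vector<double>& local,
                                        const std::vector<double>& ghost) {
  // 1. Begin the "broadcast" of ghost degrees of freedom (non-blocking).
  auto exchange = std::async(std::launch::async, [&ghost] {
    return ghost;  // stand-in for the communicated ghost contributions
  });
  // 2. Asynchronously from the exchange, assemble the local stiffness vector.
  std::vector<double> assembled = local;
  // 3. Halt until communication has finished, then incorporate ghost values.
  auto received = exchange.get();
  for (std::size_t i = 0; i < received.size(); ++i) assembled[i] += received[i];
  return assembled;
}
```

In the real code step 2 is the sequential per-core assembly described above, and step 3 completes the parallel global assembly once the MPI broadcast returns.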

Outlook
The abstractions introduced in this paper allow for a significant amount of application flexibility. The dimension independence promotes experimentation on small 2-D domains, the results of which can be immediately translated to large 3-D experiments. An example might be experimental design in exploration seismic surveys. Given a model of the subsurface, the optimal spacing of sources and receivers along a seismic line can be determined in 2-D, using computing resources no more powerful than a standard desktop computer. Once the design is set, the same interface can be used to scale the problem to 3-D on massively parallel supercomputers (Fig. 6).

[Fig. 6 caption: Qualitative illustrations of SALVUS running forward simulations at the laboratory, exploration, regional and planetary scale. (a) Elastic simulation of ultrasonic waves in a steel engine support clamp, using a setup similar to that described in Fig. 7(a) and discretized with approximately 10 000 elements; (b) elastic simulation through a 2-D slice of the SEG Overthrust model (Aminzadeh 1996), also discretized with approximately 10 000 elements; (c) elastic simulation of the 2017 Linthal Valley earthquake in Switzerland, using PREM (Dziewoński & Anderson 1981) as a background velocity model, and discretized with 6.5 million elements; (d) 6.7-million-element coupled acoustic/elastic wave propagation through a radially symmetric 1-D model of Mars (Khan et al. 2016), over which is imposed a crustal model with 3-D Moho and surface topography (Smith et al. 1999; Wieczorek & Zuber 2004). In all of these examples, hypercube-type elements and the GLL basis were used. In addition to the different spatial scales simulated, the computing power required for each simulation varied significantly: (a) was run on a desktop workstation with 8 processor cores, (b) on a single core of a laptop, (c) on 1200 processor cores of the Piz Daint supercomputer at the Swiss National Supercomputing Centre (CSCS), and (d) on 2400 cores of the same machine.]
The flexibility also encourages the application of FWI to novel domains. Recently, the method has been applied to problems in ultrasonic medical imaging (Korta Martiartu et al. 2017) and non-destructive testing (Rao et al. 2016; Seidl & Rank 2017). These situations bring realistic domains of significant complexity. In medical imaging, this includes coupling between tissue, which is mostly modelled as a fluid, elastic bones, and the emitting devices themselves. In non-destructive testing applications, complex rheologies including hyperelastic and weakly non-linear wave propagation physics may be essential for accurate modelling. These situations can be investigated with the help of the model/mixin approaches outlined above, without any major changes in software design.

[Fig. 7 caption: Gallery of normalized gradients for different wave propagation physics and scales ranging from millimetres to thousands of kilometres, meant to exhibit the range of problems accessible to a general implementation. All examples consider a single source-receiver pair. Blue hues represent regions where an increase in the value of the relevant parameter will decrease the associated misfit measure; red hues represent the opposite. (a) L2 sensitivity to shear modulus μ on an engine support clamp, using a wavefield generated by a 200 kHz Ricker source; (b) L2 sensitivity to the speed of sound in a synthetic breast phantom (grey-scale image) for ultrasound tomography, using a wavefield generated by a 500 kHz Ricker source; (c) L2 sensitivity to fluid compressibility (purple/green hues) and shear modulus (red/blue hues) in a coupled acoustic/elastic domain, using a 3 Hz Ricker wavelet in the fluid region as a source; (d) Pdiff sensitivity to cross-correlation traveltime shifts on a 2-D slice through the Earth, using PREM (Dziewoński & Anderson 1981) as a background model; the source was a moment tensor with a dominant period of 10 s.]
A preliminary gallery of sensitivity kernels (Tarantola 1984; Tromp et al. 2005; Fichtner et al. 2006), a first step towards FWI, is given in Fig. 7. Additionally, the geometric complexities of these applications may warrant simulation on non-conforming meshes, motivating the implementation of discontinuous Galerkin (DG) style basis functions (Wilcox et al. 2010). As with the complex rheologies, the implementation of these bases need only involve the B model, without changes to the WaveEquation and Element mixins.

C O N C L U S I O N S
Waveform modelling within domains of realistic physical complexity almost always requires fully numerical solutions to the wave equation. While traditionally undertaken within a geophysical context, advances in acquisition hardware, computational power, and method development have led to the application of these solutions to a growing variety of problems both within and external to seismology. This growth in the application space has driven an equivalent growth in the complexity of waveform modelling software, which represents a threat to the maintainability of current and future developments. Given that waveform modelling is one of the primary analytical tools used to investigate the character of the Earth, this is a serious issue.
In this paper, we have introduced a set of conceptual and practical developments which attempt to mitigate the effects of the combinatorial complexity that characterizes modern applications of full-waveform modelling and inversion. We first focussed on a modular interpretation of seismologically relevant wave equations, where the wave propagation physics and the spatial discretization were separated into distinct and functionally orthogonal mathematical models. We then showed how these abstract concepts could be made concrete in modelling software, and gave examples of the variety of applications made accessible by such an approach. It is our hope that this manuscript demonstrates that it is indeed possible and practical to develop maintainable and scalable modelling tools.