A high-order time-parallel scheme for solving wave propagation problems via the direct construction of an approximate time-evolution operator

Our manuscript demonstrates a technique for efficiently solving the classical wave equation, the shallow water equations, and, more generally, equations of the form ∂u/∂t = Lu, where L is a skew-Hermitian differential operator. The idea is to explicitly construct an approximation to the time-evolution operator exp(τL) for a relatively large time-step τ. Recently developed techniques for approximating oscillatory scalar functions by rational functions, and accelerated algorithms for computing functions of discretized differential operators, are exploited. Principal advantages of the proposed method include stability even for large time-steps, the possibility to parallelize in time over many characteristic wavelengths, and large speed-ups over existing methods in situations where simulations over long times are required. Numerical examples involving the 2D rotating shallow water equations and the 2D wave equation in an inhomogeneous medium are presented, and the method is compared to the fourth-order Runge–Kutta (RK4) method and to the use of Chebyshev polynomials. The new method achieves high accuracy over long time intervals, at speeds that are orders of magnitude faster than both RK4 and the use of Chebyshev polynomials.


Problem formulation
We present a technique for solving a class of linear hyperbolic problems of the form (1.1). The basic approach is classical, and involves the construction of a rational approximation to the time-evolution operator exp(τL) of the form (1.2), where the time-step τ is fixed in advance and M scales linearly in τ. Once the time-step τ has been fixed, an approximate solution at times τ, 2τ, 3τ, . . . can be obtained via repeated application of the approximate time-stepping operator, since exp(nτL) = (exp(τL))^n. The computational profile of the method is that it takes a moderate amount of work to construct the initial approximation to exp(τL), but once it has been built, it can be applied very rapidly, even for large τ.
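This precompute-once, apply-many profile can be illustrated with a small Python sketch. A toy dense skew-Hermitian matrix stands in for the discretized operator, and the exact exponential stands in for the paper's rational approximation; everything here is illustrative, not the paper's solver.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Toy skew-Hermitian generator L (stand-in for the discretized operator).
n = 60
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
L = (B - B.conj().T) / 2          # L^H = -L, so the spectrum is purely imaginary
u0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)

tau = 0.7
E = expm(tau * L)                  # built once: the "moderate" upfront work

# Repeated application: u(n*tau) = E^n u0, since exp(n*tau*L) = (exp(tau*L))^n.
u = u0.copy()
for _ in range(5):
    u = E @ u

err = np.linalg.norm(u - expm(5 * tau * L) @ u0) / np.linalg.norm(u0)
print(err)
```

Because L is skew-Hermitian, E is unitary, so the repeated application is stable regardless of how many steps are taken.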
The efficiency of the proposed scheme is enabled by (i) a modified version of the scheme of Damle et al. (2013) for constructing near optimal rational approximations to oscillatory functions such as e^{ix} over arbitrarily long intervals, and by (ii) the development (Martinsson, 2013) of a high-order accurate and stable method for pre-computing approximations to operators of the form (τL − α_m)^{-1}. The near optimality of the rational approximations ensures that the number 2M + 1 of terms needed for a given accuracy is typically much smaller than for standard methods that rely on polynomial or rational approximations of L.
Although our main contribution is in combining the techniques in Damle et al. (2013) and Martinsson (2013) in order to yield an efficient and time-parallel means of applying the operator exponential, we also develop in this paper several key technical advances in each of the component algorithms. First, the sub-optimal rational approximations developed here require substantially fewer poles than in Damle et al. (2013) (and, in fact, are very close to optimal in the L ∞ norm). Secondly, the direct solver in Martinsson (2013) is modified in order to allow body loads.
The proposed scheme has several advantages over typical methods, including the apparent absence of stability constraints on the time step τ in relation to the spatial discretization, the ability to parallelize in the time variable over many characteristic wavelengths (in addition to any spatial parallelization), and great acceleration when integrating equation (1.1) for long times or for multiple initial conditions (e.g., when employing an exponential integrator on a nonlinear evolution equation, cf. Section 5). A drawback of the scheme is that it is more memory intensive than standard techniques. Another potential limitation is that, since the proposed technique relies on a spectral element discretization, the initial condition u(x, 0) needs to be sufficiently smooth in order to achieve spectral accuracy with respect to p-refinement.
The ability to solve (1.1) in a time-parallel manner can be used to construct efficient parallel-in-time schemes for the fully nonlinear evolution equations in the presence of time scale separation (see Haut & Wingate, 2014). This extra source of parallelism can be useful if the speedup due to spatial parallelization saturates. In addition, even when the speedup from spatial parallelization is not saturated, the ability to parallelize with space-time blocks can have efficiency benefits. In fact, it is expected that the cost of communication relative to computation decreases with a space-time decomposition compared to a purely spatial decomposition (heuristically, the 'surface-to-volume' ratio of a domain decomposition decreases in higher dimensions). As for parallelization of the direct solver, the algorithm involves applying, at a number of levels that is logarithmic in the spatial grid size, small dense matrices that can all be applied independently of each other. Since applying many small dense matrices in parallel is amenable to efficient parallelization, we expect good parallel scaling for the direct solver; however, this issue needs to be explored further. We restrict the scope of this paper to settings in which the application of (τL − α_m)^{-1}u_0 can be reduced to the solution of an elliptic-type partial differential equation (PDE) for one of the unknown variables. This situation arises in geophysical fluid applications (among others), including the rotating primitive equations that are used for climate simulations. However, the direct solver presented in Section 2 is quite general, and in principle can be extended to first-order linear systems of hyperbolic PDEs with little modification (though such an extension is speculative and, in particular, has not been tried).

Time discretization
In order to time-discretize (1.1), we fix a time-step τ (the choice of which is discussed shortly), a requested precision 0 < δ < 1, and a 'band-width' Λ ∈ (0, ∞) which specifies the spatial resolution (in effect, the scheme will accurately capture eigenmodes of L whose eigenvalues λ satisfy |λ| ≤ Λ). We then use an improved version of the scheme of Damle et al. (2013) to construct a rational function R_M satisfying (1.3), where P_Λ projects functions onto the subspace spanned by eigenvectors of L whose eigenvalues have modulus at most Λ.
Here the only property of L that we use is that L is skew-Hermitian, and hence has a complete spectral decomposition with a purely imaginary spectrum.
In the absence of spatial discretization errors, the bound (1.4) ensures that the repeated application of R_M(τL) is stable on the entire imaginary axis. It also turns out that the number 2M + 1 of terms needed in the rational approximation in (1.3) is close to optimally small (for the given accuracy δ). It is important to point out that, when the above temporal discretization is coupled with the spatial discretization discussed in Section 2, stability analysis of the time-stepping method requires an error analysis for the direct solver discussed in Section 2.3, which is nontrivial and beyond the scope of this paper. We simply note that the stability of the time-stepping scheme is substantiated via extensive numerical experiments using a broad range of different time steps τ. In none of these numerical experiments have we observed any instabilities, including when we choose large τ (resulting in hundreds of terms in (1.2)) and nonsmooth initial conditions. The scheme described above allows a great deal of freedom in the choice of the time step τ. While classical methods typically require the time step to be a small fraction of the characteristic wavelength, we have the freedom to let τ cover a large number of characteristic wavelengths. Therefore, the scheme is well suited to parallelization in time, since all the inverse operators in the approximation of the operator exponential can be applied independently. In fact, the only constraint on the size of τ is the memory available to store the representations of the inverse operators (as explained in Section 1.3, the memory required for each inverse scales linearly in the number of spatial discretization parameters, up to a logarithmic factor).

[Fig. 1. The grid introduced to discretize (2.12) in Section 2.2. The figure shows a simplified case involving 4 × 4 squares, each holding a 6 × 6 local tensor product grid of Chebyshev nodes.]
The PDE (2.12) is enforced via collocation using spectral differentiation on each small square at all solid ('internal') nodes. At the hollow ('boundary') nodes, continuity of normal fluxes is enforced.
In serial, the efficiency of this approach is relatively insensitive to the number M of terms used, since the number of poles per wavelength in (1.5) is constant (and just above the lower bound imposed by the Nyquist constraint). However, since the M solution operators can be applied independently of one another, this approach makes it possible to parallelize the computation in time over a large number of characteristic wavelengths. In contrast, standard methods for applying the operator exponential are inherently serial in the time variable.
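The structure that makes this time-parallelism possible can be checked numerically: each resolvent solve below is independent of the others, and their weighted sum acts on every eigenmode of L as the scalar rational function evaluated at that eigenvalue. The poles and coefficients here are arbitrary placeholders, not the paper's optimized ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
L = (B - B.conj().T) / 2                  # skew-Hermitian toy operator
u0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
tau = 2.0

# Illustrative poles/coefficients; any poles with nonzero real part work,
# since tau*L has purely imaginary spectrum.
alphas = np.array([1.0 + 2.0j, 1.0 - 2.0j, 0.5 + 0.3j])
bs = np.array([0.2 - 0.1j, 0.2 + 0.1j, -0.4j])

# Each resolvent solve is independent of the others -> parallel over m.
terms = [b * np.linalg.solve(tau * L - a * np.eye(n), u0)
         for a, b in zip(alphas, bs)]
u_rat = np.sum(terms, axis=0)

# Spectral check: L = V diag(i*lam) V^H with -iL Hermitian, so the sum
# equals V diag(r(i*lam*tau)) V^H u0 for the scalar rational function r.
lam, V = np.linalg.eigh(-1j * L)
r = lambda z: sum(b / (z - a) for a, b in zip(alphas, bs))
u_ref = V @ (r(1j * lam * tau) * (V.conj().T @ u0))
print(np.linalg.norm(u_rat - u_ref))
```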

Pre-computation of rational functions of L
The time discretization technique described in Section 1.2 requires us to build explicit approximations to differential operators on the domain Ω such as (τL − α_m)^{-1}. We do this using a variation of the technique described in Martinsson (2013). A variety of different domains can be handled, but for simplicity, suppose that Ω is a rectangle. The idea is to tessellate Ω into a collection of smaller rectangles, and to put down a tensor product grid of Chebyshev nodes on each rectangle, as shown in Fig. 1. Functions are represented via tabulation on the nodes, and L is discretized via standard spectral collocation techniques on each patch. The patches are glued together by enforcing continuity of both function values and normal derivatives. This discretization results in a block sparse coefficient matrix, which can rapidly be inverted via a procedure very similar to the classical nested dissection technique of George (1973). The resulting inverse is dense but 'data-sparse,' which is to say that it has internal structure that allows us to store and apply it efficiently.
In order to describe the computational cost of the direct solver, let N denote the number of nodes in the spatial discretization. For a problem in two dimensions, the 'build stage' of the proposed scheme constructs 2M + 1 matrices {A_m}_{m=−M}^{M}, each approximating an inverse operator (τL − α_m)^{-1}. The build stage has asymptotic cost O(MN^{1.5}), and storing the matrices requires O(MN log N) memory. The cost of applying a matrix A_m is O(N log N). (We remark that the cost of building the matrices {A_m}_{m=−M}^{M} can often be accelerated to optimal O(MN) complexity (Gillman & Martinsson, 2014), but since the pre-factor in the O(MN^{1.5}) bound is quite small, such acceleration would have negligible benefit for the problem sizes under consideration here.) Section 2 describes the inversion procedure in more detail.
We remark that the spatial discretization procedure we use does not explicitly enforce that the discrete operator is exactly skew-Hermitian. The fact that the spatial discretization is done to very high accuracy means that it is in practice very nearly so, and numerical experiments also indicate that the scheme as a whole is stable in every regime where it was tested. However, a rigorous investigation of the numerical stability of the scheme is currently lacking and is the subject of future investigation.

Comparison to existing approaches
The approach of using proper rational approximations for applying matrix exponentials has a long history. In the context of operators with negative spectrum (e.g., for parabolic-type PDEs), many authors have discussed how to compute efficient rational approximations to the decaying exponential e^{−x}, including using Cauchy's integral formula coupled with Talbot quadrature (cf. Schmelzer & Trefethen, 2007b), and optimal rational approximations via the Carathéodory–Fejér method (cf. Schmelzer & Trefethen, 2007b) or the Remez algorithm (Cody et al., 1969). However, such methods are less effective (or not applicable) when applied to approximating oscillatory functions such as e^{ix} over long intervals. For computing functions of parabolic-type linear operators, the approach of combining rational approximations and compressed representations of the solution operators using so-called H-matrices has been proposed in Gavrilyuk et al. (2005).
Common approaches for applying the exponential of skew-Hermitian operators include high-order time-stepping methods, scaling-and-squaring coupled with Padé approximations (cf. Higham, 2005) or Chebyshev polynomials (cf. Bergamaschi & Vianello, 2000), and polynomial or rational Krylov methods (cf. Hochbruck & Lubich, 1997; Güttel, 2013). Krylov methods, in particular, have enjoyed enormous success due to their ability to handle very large problem sizes and their favourable approximation properties (see the review article Hochbruck & Ostermann, 2010). We note that rational Krylov methods also exhibit near optimal approximation properties (see Güttel, 2013). All of these methods iteratively build up rational or polynomial approximations to the operator exponential, and correspondingly approximate the spectrum e^{iω_n τ} of e^{τL} with polynomials or rational functions. Therefore, the near optimality of (1.2) and the speed of applying the inverse operators in (1.5) will generally translate into high efficiency relative to standard methods (or, in the case of rational Krylov methods, comparable efficiency). Comparing the proposed method with these more standard approaches, one major advantage is that the method can be trivially parallelized in time over many characteristic wavelengths. The ability to parallelize in the time variable is of particular relevance to large-scale simulations in geophysical fluid applications, where the speedup from spatial parallelization alone is beginning to saturate.
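For reference, the Chebyshev polynomial approach mentioned above can be sketched for a toy skew-Hermitian matrix via the Jacobi–Anger expansion e^{itA} = J_0(t)I + 2 Σ_{k≥1} i^k J_k(t) T_k(A), where A = −iL is Hermitian and scaled so that its spectrum lies in [−1, 1]. This is an illustration only; the paper's comparisons use a spectral element discretization.

```python
import numpy as np
from scipy.linalg import expm
from scipy.special import jv

rng = np.random.default_rng(2)
n = 40
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
L = (B - B.conj().T) / 2
L /= np.linalg.norm(L, 2) * 1.05   # spectrum of A = -iL strictly inside [-1, 1]
u0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)

t = 3.0
A = -1j * L                         # Hermitian

# Chebyshev series for e^{itA}, with T_k(A)u0 generated by the
# three-term recursion T_{k+1} = 2 A T_k - T_{k-1}.
v_prev, v_curr = u0, A @ u0         # T_0(A)u0 and T_1(A)u0
u = jv(0, t) * v_prev + 2 * 1j * jv(1, t) * v_curr
for k in range(2, 40):
    v_prev, v_curr = v_curr, 2 * (A @ v_curr) - v_prev
    u += 2 * (1j ** k) * jv(k, t) * v_curr

print(np.linalg.norm(u - expm(t * L) @ u0))
```

The Bessel coefficients J_k(t) decay superexponentially once k exceeds |t|, which is why a modest number of terms suffices here; note, however, that every term requires a fresh application of A, so the recursion is inherently serial in k.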
In addition to approaches that rely on polynomial or rational approximations, let us mention two alternative approaches for time-stepping on wave propagation problems. The authors in Beylkin & Sandberg (2005) combine separated representations of multi-dimensional operators, partitioned low rank compressions of matrices, and (near) optimal quadrature nodes for band-limited functions, in order to compute compressed representations of the operator exponential over 1–2 characteristic wavelengths. Along different lines, the authors in Demanet & Ying (2009) use wave atoms to construct compressed representations of the (short time) operator exponential, and in particular can bypass the CFL constraint.

Outline of manuscript
The paper is organized as follows. In Section 2, we briefly describe the direct solver in Martinsson (2013). We then discuss in Section 3 a technique for constructing efficient rational approximations of general functions, and specialize to the case of approximating the exponential e^{ix} and the phi-functions for exponential integrators (Cox & Matthews, 2002). In Section 4, we present applications of the method to both the 2D rotating shallow water equations and the 2D wave equation in an inhomogeneous medium. In particular, we compare the accuracy and efficiency of this approach against fourth-order Runge–Kutta (RK4) and the Chebyshev polynomial method (in our comparisons, we use the same spectral element discretization). Finally, Appendix A contains error bounds for the rational approximations constructed here.

Spectral element discretization
This section describes how to efficiently compute a highly accurate approximation to the inverse operator (L − α)^{-1}, where L is a skew-Hermitian operator. As mentioned in the introduction, we restrict our discussion to environments where application of the inverse can be reformulated as a scalar elliptic problem. This reformulation procedure is illustrated for the classical wave equation and for the shallow water equations in Section 2.1. Section 2.2 describes a high-order multidomain spectral discretization procedure for the elliptic equation. Section 2.3 describes a direct solver for the system of linear equations arising upon discretization.

Reformulation as an elliptic problem
In many situations of practical interest, the task of solving a hyperbolic equation (L − α)u = f , where L is a skew-Hermitian operator, can be reformulated as an associated elliptic problem. In this section, we illustrate the idea via two representative examples. Example 1 is of particular relevance to geophysical fluid applications, which serve as a major motivation of this algorithm.
Example 1 (the shallow water equations) We consider the rotating shallow water equations (2.1) in the square domain x ∈ [0, 1] × [0, 1] with periodic boundary conditions, where v(x, t) denotes the fluid velocity, η(x, t) denotes the perturbed surface elevation, and f is the (constant) Coriolis frequency. For simplicity, we assume that the prescribed initial velocity components v_j(x, 0) ∈ L²([0, 1] × [0, 1]) and the initial surface elevation η(x, 0) ∈ L²([0, 1] × [0, 1]) are continuous (so that pointwise evaluation on the spectral element grid is well defined). This condition can likely be relaxed to piecewise smooth initial conditions, but such an extension is outside the scope of the current paper.
We write system (2.1) in the form (1.1). We note that the method generalizes to nonconstant Coriolis coefficient f and to more general domains and boundary conditions in a transparent manner, and is of particular relevance for a spectral element discretization on the cubed sphere. In fact, a spatial domain that is composed of a union of squares can easily be handled by the direct solver. By smoothly mapping curvilinear patches near the boundary to square patches, in theory more general domains can be handled without much difficulty. Of particular relevance to geophysical fluid applications (which serve as a major motivation), we plan to explore the proposed method for the rotating shallow water equations on the cubed sphere. In this setup, six square patches are mapped to patches on the sphere. The direct solver algorithm remains essentially unchanged, except that the elliptic equation associated with the rotating shallow water equations now contains nonconstant coefficients that reflect the underlying geometry, which is also easily handled by the proposed scheme.
In order to apply the method in this paper, we use the standard fact (cf. Paldor & Sigalov, 2011) that η satisfies the elliptic equation (2.4), in which an operator A_α appears; when f is constant, equation (2.4) reduces to (2.6).

Example 2 (the wave equation) Consider the wave propagation problem (2.7), where κ(x) ≥ κ_0 > 0 is a smooth, uniformly positive function, the initial conditions u(x, 0) and u_t(x, 0) are prescribed square integrable and continuous functions, and periodic boundary conditions are used.
In order to apply the method in this paper, we reformulate (2.7) as a first-order system in both time and space by defining v = u_t, w = u_x and z = u_y, which yields a system of the form (1.1) for (v, w, z). The solution u of the original equation (2.7) can be recovered after the final time step by solving the elliptic equation Δu = w_x + z_y.
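On a periodic grid, this final recovery step amounts to a Poisson solve, which can be sketched with the FFT. This is a minimal illustration with a manufactured solution, not the paper's spectral element solver.

```python
import numpy as np

# Periodic grid on [0,1)^2; u is recovered (up to its mean) from
# w = u_x and z = u_y by solving  Laplacian(u) = w_x + z_y  via the FFT.
n = 64
x = np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
u_true = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y)   # has zero mean
w = 2 * np.pi * np.cos(2 * np.pi * X) * np.cos(4 * np.pi * Y)   # u_x
z = -4 * np.pi * np.sin(2 * np.pi * X) * np.sin(4 * np.pi * Y)  # u_y

k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # angular wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")
rhs_hat = 1j * KX * np.fft.fft2(w) + 1j * KY * np.fft.fft2(z)  # (w_x + z_y)^
denom = -(KX**2 + KY**2)
denom[0, 0] = 1.0                  # avoid 0/0 at the zero mode
u_hat = rhs_hat / denom
u_hat[0, 0] = 0.0                  # fix the undetermined mean to zero
u = np.real(np.fft.ifft2(u_hat))

print(np.max(np.abs(u - u_true)))
```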
To apply the method in this paper, we first solve for w and z in terms of v. Once v is known, w and z can be computed directly via (2.10).

Discretization
In this section, we describe a high-order accurate discretization scheme for elliptic boundary value problems (BVPs) such as (2.6) and (2.11), which arise in the solution of hyperbolic evolution equations. Specifically, we describe the solver for a BVP of the form (2.12), where B is an elliptic differential operator. To keep things simple, we consider only square domains Ω = [0, 1]², but the solver can easily be generalized to other domains. The solver we use is described in detail in Martinsson (2015); our aim here is merely to give a high-level conceptual description.
The PDE (2.12) is discretized using a multidomain spectral collocation method. Specifically, we split the square Ω into a large number of smaller squares (or rectangles), and then put down a tensor product grid of p × p Chebyshev nodes on each small square, see Fig. 1. The parameter p is chosen so that dense computations involving matrices of size p² × p² are cheap (p = 20 is often a good choice). Let {x_j}_{j=1}^{N} denote the total set of nodes. Our approximation to the solution u of (2.12) is then represented by a vector u ∈ C^N, where the jth entry is simply an approximation to the function value at node x_j, so that u_j ≈ u(x_j). The discrete approximation to (2.12) then takes the form (2.13). The jth row of (2.13) is associated with a collocation condition for node x_j. For all j for which x_j is a node in the interior of a small square (filled circles in Fig. 1), we directly enforce (2.12) by replacing all differentiation operators by spectral differentiation operators on the local p × p tensor product grid. For all j for which x_j lies on a boundary between two squares (hollow squares in Fig. 1), we enforce that normal fluxes across the boundary are continuous, where the fluxes from each side of the boundary are evaluated via spectral differentiation on the two patches (corner nodes need special treatment; see Martinsson, 2015).
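The local building block of this discretization, the Chebyshev differentiation matrix on a single patch, can be sketched in a few lines, following the standard construction in Trefethen's Spectral Methods in MATLAB. The paper's solver assembles many such patches and glues them; this is just the one-patch ingredient.

```python
import numpy as np

def cheb(p):
    """Chebyshev differentiation matrix on the p+1 nodes x_j = cos(j*pi/p),
    following Trefethen's classic 'cheb' routine."""
    if p == 0:
        return np.zeros((1, 1)), np.array([1.0])
    x = np.cos(np.pi * np.arange(p + 1) / p)
    c = np.ones(p + 1)
    c[0] = c[-1] = 2.0
    c *= (-1.0) ** np.arange(p + 1)
    X = np.tile(x, (p + 1, 1)).T
    dX = X - X.T
    D = np.outer(c, 1.0 / c) / (dX + np.eye(p + 1))
    D -= np.diag(D.sum(axis=1))    # negative-sum trick for the diagonal
    return D, x

D, x = cheb(16)
# Spectral differentiation is exact (to rounding) on low-degree polynomials.
print(np.max(np.abs(D @ x**3 - 3 * x**2)))
```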

Direct solver
The discrete linear system (2.13) arising from discretization of (2.12) is block-sparse. Since it has the typical sparsity pattern of a matrix discretizing a 2D differential operator, it is possible to compute its LU factorization in O(N^{1.5}) operations using a nested dissection ordering of the nodes (Duff et al., 1986; George, 1973) that minimizes fill-in. Once the LU factors have been computed, the cost of a linear solve is O(N log N). In the numerical computations presented in Section 4, we use a slight variation of the nested dissection algorithm that was introduced in Martinsson (2013) for the case of homogeneous equations. The extension to the situation involving body loads is straightforward, see Martinsson (2015). We note that by exploiting internal structure in the dense sub-matrices that appear in the factors of B as the factorization proceeds, the complexity of both the factorization and the solve stages can often be reduced to optimal O(N) complexity (Gillman & Martinsson, 2014). However, for the problem sizes considered in this manuscript, there would be little practical gain to implementing this more complex algorithm.
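The factor-once, solve-many pattern is the key point. A sketch using SciPy's SuperLU interface (which applies its own fill-reducing ordering, analogous in spirit to nested dissection) on a model 5-point Laplacian:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# 2D Laplacian (5-point stencil) on an m x m interior grid, Dirichlet BCs.
m = 40
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
I = sp.identity(m)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()   # N = m^2 unknowns

lu = splu(A)      # factor ONCE (SuperLU chooses a fill-reducing ordering)

# ... then solve cheaply for many right-hand sides.
rng = np.random.default_rng(3)
for _ in range(3):
    f = rng.standard_normal(m * m)
    u = lu.solve(f)
    print(np.linalg.norm(A @ u - f))
```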

Constructing rational approximations
We now discuss how to construct efficient rational approximations to smooth functions f(x) defined on the real line. For concreteness, we consider approximating the phi-functions that arise for high-order exponential integrators (cf. Schmelzer & Trefethen, 2007a and the review article Hochbruck & Ostermann, 2010). By considering the real and imaginary components separately, we may assume that f(x) is real-valued (it turns out that the poles in the approximation are the same for the real and imaginary components, as explained shortly). The construction proceeds in two steps; the second step is actually a pre-computation and needs only be done once, but is presented last for clarity. First, f(x) is approximated by a sum of translated Gaussians (3.1) (see Section 3.1 for details). Here h is inversely proportional to the bandlimit of f(x), and M controls the interval Λ over which the approximation is valid (roughly |x| ≤ Mh). When f(x) = e^{ix}, the coefficients are explicitly given by b_m = (ψ_h(1)/h)e^{−2πimh}, and the approximation is remarkably accurate (see (3.6) for error bounds). Secondly, using the approach in Damle et al. (2013), a rational approximation (3.2) to ψ_1(x) = (4π)^{−1/2}e^{−x²/4} is constructed over the real line (see Section 3.2 for details). Note that the imaginary parts of the poles in the approximation (3.2) are integers j = 0, ±1, . . . , ±L. For L = 11, we construct μ and coefficients a_j such that the L∞ approximation error δ_2 satisfies δ_2 < 10^{−12} (see Table 1). Finally, combining (3.1) and (3.2), we obtain a rational approximation (3.3) to f(x), with coefficients c_n determined by the b_m and a_j. Importantly, constructing the rational approximation (3.2) to ψ(x) needs only be done once. In particular, once μ and the coefficients a_j are pre-computed, rational approximations to general functions f(x) over arbitrarily long spatial intervals can be obtained with minimal effort, as discussed in Section 3.1. We present μ and the coefficients a_j, j = −11, . . . , 11, in Table 1, which are sufficient to yield an L∞ error δ_1 ≈ 7 × 10^{−13} in (3.2).
Using the reduction algorithm in Haut & Beylkin (2012), we find that the rational approximation constructed for e^{ix} is close to optimal in the L∞ norm, for a given accuracy δ and spatial cutoff Λ. In fact, the construction in this paper uses only 1.2 times more poles than the near optimal rational approximation obtained from Haut & Beylkin (2012) when δ = 10^{−10} and Λ = 56π, which we use in our numerical experiments. We chose Λ = 56π in the numerical experiments to demonstrate that the computation can in theory be parallelized over hundreds of characteristic wavelengths, but this choice for Λ is otherwise arbitrary and can be taken smaller or larger depending on the application. We note that the residues corresponding to the near optimal approximation can be very large and, for this reason, we prefer to use the sub-optimal approximation instead.
As clarified in Sections 3.1 and 3.2, the same poles can be used to approximate multiple functions with the same bandlimit. For example, we can use the same poles to approximate all functions e^{2πitx}, for 0 ≤ t ≤ 1, since all these functions have bandlimit less than or equal to that of e^{2πix}; the dependence on t enters only through the coefficients, which are given explicitly by b_m = (ψ_h(t)/h)e^{−2πimth}. In particular, the poles α_m = h(μ + im) are independent of t and yield uniformly accurate approximations to e^{itx} on the same interval [−Λ, Λ]. This observation enables the efficient computation of multiple operator exponentials e^{s_k L}u_0, for s_k = tk/L, using the same computed solutions (tL − α_m)^{-1}u_0, m = 1, . . . , M. A similar comment applies to the phi-functions from exponential integrators.
Generally, any accurate rational approximation to e^{ix} (or to more general functions) must share its zeros within the interval of interest; in particular, since the rational approximation can be expressed as a quotient of polynomials, it is subject to the Nyquist constraint. However, one advantage of this approximation method is that it allows efficient rational approximations of functions that are spatially localized. In fact, since the approximation (3.1) involves highly localized Gaussians, the subsequent rational approximations are able to represent spatially localized functions as well as highly oscillatory functions using (perhaps a subset of) the same collection of poles. This makes it possible to take advantage of spectral gaps (e.g., from scale separation between fast and slow waves) and possibly bypass the Nyquist constraint under certain circumstances.

Gaussian approximations to a general function
We discuss how to construct the approximation (3.1). To do so, we choose h small enough that the Fourier transform f̂(ξ) is zero (or approximately so) outside the interval [−1/(2h), 1/(2h)]. Then we can expand f̂(ξ)/ψ̂_h(ξ) in a Fourier series (3.4), with coefficients c_m given by (3.5). Transforming (3.4) back to the spatial domain yields the expansion (3.1). Note that the functions ψ_h(x + mh) are tightly localized in space, and truncating the series from −M to M yields accurate approximations for |x| up to roughly Mh. Similarly, for the functions ϕ_1(x) and ϕ_2(x), the coefficients c_m can be obtained numerically using the fact that the Fourier transforms ϕ̂_j are known explicitly; for example, the coefficients c_m for ϕ_1(x) can be computed via discretization of the integral (3.5). In Fig. 2, we plot the error for the phi-functions ϕ_1(x) and ϕ_2(x), where we choose h = 1 and M = 200; note that the choice of h corresponds to the bandlimit of ϕ_j(x). As shown in Fig. 2, the error is smaller than ≈ 3 × 10^{−13} for all −191 ≤ x ≤ 191, and begins to rise at the ends of the interval, close to ±Mh. This behaviour can be understood by using that the support of ϕ̂_j is contained in [−1/2, 1/2]. Since the functions ψ_1(x + m) for m > M decay rapidly away from x = −m, the error from truncation is negligible when |x| ≤ (M − m_0) with m_0 = O(1).
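A scalar sketch of this first step: translated Gaussians can represent an oscillatory function to high accuracy. For simplicity the coefficients below are fit by least squares rather than computed from the Fourier integral (3.5); the spacing h and the unnormalized Gaussian are illustrative choices, not the paper's parameters.

```python
import numpy as np

# Approximate cos(2*pi*x) (bandlimit ~1) by Gaussians translated on a grid
# of spacing h; the achievable error is set by the frequency localization
# of the Gaussian and by truncating the translates to a finite range.
h = 0.2
psi = lambda x: np.exp(-x**2 / (4 * h**2))
centers = h * np.arange(-35, 36)              # m*h, m = -35..35
xs = np.linspace(-5, 5, 2001)
f = np.cos(2 * np.pi * xs)

A = psi(xs[:, None] + centers[None, :])       # basis psi_h(x + m h)
c, *_ = np.linalg.lstsq(A, f, rcond=None)     # fitted stand-in for (3.5)
print(np.max(np.abs(A @ c - f)))
```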
We remark that, for the function e^{ix}, it can be shown (see Appendix) that the approximation error splits into two sums, where c_m is defined in (3.5). The first sum is negligible when h ≲ 1, owing to the tight frequency localization of ψ_h. Similarly, the second sum is negligible when |x| ≤ (M − m_0)h with m_0 = O(1), owing to the tight spatial localization of ψ.

Rational approximation to a Gaussian
We now discuss how to construct the approximation (3.2). To do so, we first use Adamyan–Arov–Krein (AAK) theory (see Damle et al., 2013 for details) to construct a near optimal rational approximation to the Gaussian. For an accuracy of δ ≈ 10^{−13}, 13 poles γ_j are required.
Setting μ = min_j Re(γ_j), we next look for a rational approximation to (4π)^{−1/2}e^{−x²/4} of the form (3.2), where we take L = 11. We find the coefficients a_j by minimizing the L∞ error over points x_n ∈ [−30, 30], chosen to be more sparsely distributed outside the numerical support of e^{−x²/4}; the interval [−30, 30] is found experimentally to yield high accuracy for the approximation over the entire real line. Finding the coefficients a_j, j = −L, . . . , L, that minimize the L∞ error can be cast as a convex optimization problem, and a standard algorithm can be used (we use Mathematica). The resulting approximation error is shown in Fig. 3; the error remains less than ≈ 7 × 10^{−13} for all x ∈ R.
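The minimax fit can be cast as a linear program: minimize t subject to |Σ_j a_j φ_j(x_n) − g(x_n)| ≤ t for all sample points x_n. A small sketch with a polynomial basis and target (the paper fits residues of fixed-pole rational terms in the same way, using Mathematica rather than the SciPy solver used here):

```python
import numpy as np
from scipy.optimize import linprog

# Minimax fit of cos(pi*x) on [-1, 1] by a degree-7 polynomial.
xs = np.linspace(-1, 1, 201)
g = np.cos(np.pi * xs)
Phi = np.vander(xs, 8, increasing=True)       # basis 1, x, ..., x^7

n, d = Phi.shape
# Variables: [a_0, ..., a_7, t];  constraints  +/-(Phi a - g) <= t.
A_ub = np.block([[ Phi, -np.ones((n, 1))],
                 [-Phi, -np.ones((n, 1))]])
b_ub = np.concatenate([g, -g])
cost = np.zeros(d + 1)
cost[-1] = 1.0                                # minimize t alone
res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * d + [(0, None)], method="highs")
print(res.x[-1])                              # achieved minimax error on the grid
```

The equioscillating residual characteristic of minimax fits emerges automatically from the LP solution.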
We display the real number μ and the coefficients a_j, j = −11, . . . , 11, in Table 1. In particular, these numbers are the only parameters that are needed in order to construct rational approximations to general functions on spatial intervals of any size.
In Fig. 4, we show the resulting rational approximations of cos(2πx) and sin(2πx), which use the same 172 complex-conjugate pairs of poles; the L∞ error is seen to be ≈ 10^{−10} over the interval −28 ≤ x ≤ 28.

[Table 1. Coefficients a_j, j = −11, . . . , 11, and the number μ in (3.2).]

Constructing rational approximation of modulus bounded by unity
For our applications, it is important that the approximation to e^{ix} is bounded by unity on the real line. In particular, the Gaussian approximation for e^{ix} constructed in Section 3.1 has absolute value larger than one when |x| ≈ Mh, and this can lead to instability in repeated applications of e^{tL}.
The basic idea is to construct a rational function S(ix) that satisfies S(ix) ≈ 1 for |x| ≲ M_0 h and S(ix) ≈ 0 for |x| ≳ M_0 h. As long as M_0 is slightly less than M, the function S(ix)R_M(ix) accurately approximates e^{ix} for |x| ≲ M_0 h, and decays rapidly to zero for |x| ≳ M_0 h. Therefore, |S(ix)R_M(ix)| ≤ 1 for all x ∈ R, and repeated application of S(tL)R_M(tL)u_0 is stable for all t > 0. In Fig. 5, we plot a rational filter that uses 33 complex-conjugate poles; we see that |S(ix) − 1| ≈ 10^{−10} for −28 ≤ x ≤ 28.
Although the above approach results in a stable method, a slightly modified version can reduce the amount of computation by a factor of 2. This is motivated by the simple observation (3.8). Recalling that the poles from Section 3.2 come in complex-conjugate pairs, only half the matrix inverses need to be pre-computed and applied if (3.8) is used. However, directly using (3.8) can introduce errors in the high-frequency components of the solution. The fix is to eliminate these errors by instead computing S(k_0 Δ)R_M(tL)u_0, where the parameter k_0 is adjusted experimentally so that (1) ‖S(k_0 Δ)u_0 − u_0‖_2 is smaller than the desired approximation accuracy, and (2) decreasing k_0 by a factor of 2 results in the error ‖S(k_0 Δ)u_0 − u_0‖_2 being larger than the desired approximation accuracy. The error and the stability of the time-stepping scheme appear relatively insensitive to the precise choice of k_0, and the above procedure has sufficed for all problems we have examined. We note that the transition region between S(ix) ≈ 1 and S(ix) ≈ 0 can be made arbitrarily small (see Fig. 5), and the rational function S(ix) uses a number of poles that scales logarithmically in the width of the transition region.
We now discuss how to construct the rational filter function S(ix), which we represent in pole–residue form,

(3.9)  S(ix) = Σ_{m=1}^{M_1} d_m / (ix − β_m).

The poles β_m and residues d_m are given explicitly in Tables 2 and 3 for M_1 = 33 (see also Fig. 5).
To do so, we use a representation of the constant function 1 (see Müller & Varnhorn, 2007) that follows from the Poisson summation formula. For h ≪ 1, the right-hand side is negligible, owing to the tight frequency localization of ψ̂_h(ξ). Truncating the sum and using the tight spatial localization of ψ_h(x), we see that the resulting function χ(x) is approximately unity for |x| ≲ M_0 h, and decays rapidly to zero when |x| ≳ M_0 h. It also holds that |χ(x)| ≤ 1 for all x ∈ R. Therefore, using the techniques from Sections 3.1 and 3.2, we construct a sub-optimal rational approximation Q(x) to χ(x).

Table 2: Poles β_j, j = 1, . . . , 33, of the rational filter function S(ix) in (3.9).

The number of poles required to represent the sub-optimal approximation Q(x) can be drastically reduced with the reduction algorithm (Haut & Beylkin, 2012), which produces another proper rational function S(x) that agrees with Q(x) to within a prescribed tolerance while using far fewer poles.

Example 1: the rotating shallow water equations

We apply the algorithm on the spatial domain [0, 1] × [0, 1], using periodic boundary conditions and a constant Coriolis force f = 1. In this case, an exact solution can be computed analytically, since the matrix exponential is diagonalized in the Fourier domain and can be rapidly applied via the fast Fourier transform. Explicit expressions for the eigenvectors r_k^l can be found in Majda (2003). We first compare the accuracy and efficiency of applying e^{nτL}u_0, for τ = 3 and n = 1, . . . , 10, against RK4 and against using Chebyshev polynomials. In particular, the Chebyshev method uses the truncated Chebyshev expansion (4.1) of the exponential, coupled with the standard three-term recursion for applying T_k(ΔtL).
For stability, Δt must be chosen so that the spectrum of the spatially discretized operator ΔtL is O(1) in magnitude (otherwise the terms in the sum (4.1) become very large, which results in catastrophic cancellation).
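For reference, a minimal dense-matrix version of this Chebyshev scheme is sketched below. It is our own illustration, using the Jacobi–Anger coefficients i^k J_k(τ) for L = iH with H Hermitian and ‖H‖ ≤ 1; the paper's implementation instead applies the discretized L matrix-free. Consistent with the restriction above, the required expansion degree grows with τ‖L‖.

```python
import numpy as np
from scipy.special import jv       # Bessel functions J_k
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 40
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (Z + Z.conj().T) / 2
H /= np.linalg.norm(H, 2)          # Hermitian, spectrum in [-1, 1]; L = iH is skew-Hermitian
u0 = rng.standard_normal(n).astype(complex)

tau, K = 3.0, 30                   # step size and expansion degree
# Jacobi-Anger: e^{i*tau*x} = J_0(tau) + 2 * sum_{k>=1} i^k J_k(tau) T_k(x), x in [-1, 1]
T_prev, T_curr = u0, H @ u0
u = jv(0, tau) * T_prev + 2j * jv(1, tau) * T_curr
coef = 1j
for k in range(2, K + 1):
    coef *= 1j
    T_prev, T_curr = T_curr, 2 * (H @ T_curr) - T_prev   # three-term recursion
    u = u + 2 * coef * jv(k, tau) * T_curr

err = np.linalg.norm(u - expm(1j * tau * H) @ u0)
```

Because the coefficients J_k(τ) decay super-exponentially once k exceeds τ, a modest degree K suffices here; for a discretized operator, the spectral radius (and hence the required degree per unit time) is much larger, which is what drives the small Δt in the comparisons below.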
We choose polynomial degrees of 12 and 9 for the high-accuracy and low-accuracy simulations, respectively, which we find experimentally to be a good compromise between the time step size Δt needed for a given accuracy and the number of applications of L. In all the time-stepping schemes, we use the same spectral element discretization and parameter values as described above. All the algorithms are implemented in Octave, including the direct solver described in Section 2. The rational filter parameter k_0 is chosen according to the steps discussed in Section 3.3. For all the following numerical examples, when applying the operator exponential using the rational approximation (1.5), we take M = 160 in (3.3), which results in 376 terms overall in (1.5) (2 × (160 + 11) + 1 + 33 terms, with 33 coming from the rational filter function); the parameter h in (3.3) is taken to be 1/3 for the lower accuracy simulations and 1/5 for the higher accuracy simulations.

4.1.1 First test case for the shallow water equations
We first consider the initial conditions: For these initial conditions, we use 6 × 6 = 36 elements of equal area, and 16 × 16 = 256 Chebyshev quadrature nodes for each element. To assess the accuracy of the method, the exponential e^{nτL}u_0 is applied in the Fourier domain.
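The Fourier-domain reference computation can be illustrated on a 1D toy problem (our sketch, with L = ∂/∂x rather than the shallow water operator): since e^{tL} is diagonal in Fourier space, its symbol is applied mode-by-mode via the FFT.

```python
import numpy as np

N, t = 128, 0.3
x = np.arange(N) / N
u0 = np.exp(np.sin(2 * np.pi * x))      # smooth periodic initial data

# For L = d/dx, the symbol of e^{tL} on Fourier mode k is exp(2*pi*i*k*t)
k = np.fft.fftfreq(N, d=1.0 / N)        # integer wavenumbers
u_t = np.fft.ifft(np.exp(2j * np.pi * k * t) * np.fft.fft(u0)).real

exact = np.exp(np.sin(2 * np.pi * (x + t)))  # exact solution: u(x, t) = u0(x + t)
err = np.max(np.abs(u_t - exact))            # spectral accuracy for smooth data
```

For smooth periodic data the error is at the level of rounding, which is why such a Fourier-diagonal solution serves as the reference against which the time-stepping schemes are measured.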
When applying the operator exponential using the rational approximation (1.5), we take M = 160 and h = 1/5 in (3.3), which results in 376 terms overall in (1.5) (2 × (160 + 11) + 1 + 33 terms, with 33 coming from the rational filter function). We also choose a large time step τ = 3; this choice of parameters results in an L∞ error of 3.4 × 10^{−10} for a single (large) time step.
For this choice of parameters in the spectral element discretization, the cost of applying the solution operator of (2.3), i.e., forming the right-hand side of (2.6), solving (2.6), and evaluating (2.5), is about 4.5 times the cost of applying the forward operator (2.2) directly.
For the three time-stepping methods, the L∞ errors in the approximation of e^{nτL}u_0, n = 1, . . . , 10, are plotted in Fig. 6(a). Similarly, the total computation times (in minutes) of approximating e^{nτL}u_0, n = 1, . . . , 10, are plotted in Fig. 6(b) (this includes the pre-computation time for representing the inverses). From Fig. 6(a), we see that the L∞ errors from all three methods remain < 10^{−8} for n = 1, . . . , 10. From Fig. 6(b), we see that the first time step for the rational approximation method is about half the cost of both RK4 and the Chebyshev polynomial method. However, subsequent time steps for the new method are about 40 times cheaper than both RK4 and the Chebyshev polynomial method (for about the same accuracy).

4.1.2 Second test case: doubling the spatial resolution

Next, we compute e^{τL}u_0, τ = 1.5, with initial conditions in which the bandlimit is doubled in each direction. In each of the time-stepping schemes, we use 12 × 12 = 144 elements of equal area, and 16 × 16 = 256 Chebyshev quadrature nodes for each element. We again use the same parameters for the rational approximation (1.5) as described in Section 4.1.1.
We only examine the error and computation time for one large time step. For the rational approximation method, we report separately the pre-computation time for obtaining data-sparse representations of the 376 inverses in (1.5) and the computation time for applying the approximation in (1.5) once the data-sparse representations are known. The results are summarized in Table 4. The main conclusion to draw from these results is that doubling the spatial resolution does not appreciably change the relative efficiency of the three time-stepping methods (once representations for the inverse operators in (1.5) are pre-computed).

4.1.3 Third test case: applying the operator exponential over a long time interval
We now assess the accuracy of the new method when repeatedly applying e^{τL}, τ = 1, in order to evolve the solution over longer time intervals. In this example, we use the initial conditions

η(x) = exp(−100((x − 1/2)² + (y − 1/2)²)),
v_1(x) = cos(6πx) cos(4πy) − 4 sin(6πx) sin(4πy),
v_2(x) = cos(6πx) cos(6πy).

Fig. 7. Plot of the L∞ error, ‖u_n − e^{nτL}u_0‖_∞, versus the big time step nτ, where τ = 1 and 1 ≤ n ≤ 300. Here u_n denotes the numerical approximation to e^{nτL}u_0, as computed by the rational approximation (1.5) and the direct solver from Section 2.
Note that these initial conditions cannot be expressed as a finite sum of eigenfunctions of L. We use the same spatial discretization parameters as in Section 4.1.2.
In Fig. 7, we show the L∞ error of the computed approximation u_n(x) to u(x, nτ), n = 1, . . . , 300. As expected, the error increases linearly in the number of applications of the exponential. Note that, due to the large step size of τ = 1, the error accumulates slowly in time and the solution can be propagated with high accuracy over a large number of characteristic wavelengths.

4.1.4 Fourth test case: applying the operator exponential with lower accuracy

We now repeat the first example (Section 4.1.1), but this time using lower accuracy for the temporal discretization. In particular, we again use 376 inverses for the rational approximation. However, we reduce the accuracy by choosing h = 1/3 in the rational approximation (3.3), and take a larger single (large) time step τ = 5. This choice of parameters results in an L∞ error of 4.04 × 10^{−6} for a single (large) time step.
The results are summarized in Table 5. From this table, we see that the pre-computation time needed to represent the 376 solution operators in (1.5) is 17.8 min, and the computation time needed to apply the exponential is 0.925 min; the final accuracy in the L∞ norm is 4.0 × 10^{−6}. For the Chebyshev polynomial method, we used degree 9 polynomials and a time step of 0.004; the overall time for the application of the exponential is 23.3 min, and the final accuracy is 3.7 × 10^{−6}. Finally, for RK4 we used a time step of 0.002; RK4 takes an overall time of 5.44 min, with a final accuracy of 1.37 × 10^{−5}.

4.1.5 Fifth test case: applying the operator exponential to a nonsmooth initial condition

We now repeat the third example (Section 4.1.3) for long time simulations, but this time we use an initial condition with a cusp-type singularity. In particular, we use τ = 3, 376 inverses for the rational approximation, the same spectral element grid from Section 4.1.1, and the initial conditions (4.5). Note that the interface variable η now has a cusp-type singularity at (1/2, 1/2).

Fig. 8. Plot of the L∞ error, ‖u_n − e^{nτL}u_0‖_∞, versus the big time step nτ, where τ = 3 and 1 ≤ n ≤ 170. Here the initial condition in Section 4.1.5 is used, which contains a cusp-type singularity, and u_n denotes the numerical approximation to e^{nτL}u_0 obtained from the rational approximation method.

In Fig. 8, we show the L∞ error of the computed approximation u_n(x) to u(x, nτ). As expected, the singularity in η results in essentially no pointwise accuracy (owing to a lack of spatial resolution in the spectral element grid). However, the time-stepping scheme itself remains stable throughout the evolution.
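The stability and the linear error growth seen in Figs 7 and 8 can be mimicked with a scalar model (our illustration, not the paper's scheme): a unit-modulus one-step symbol with an O(ε) phase error stays bounded under repeated application, while its error grows linearly in the step count n.

```python
import numpy as np

theta = np.linspace(-1.0, 1.0, 201)   # model frequencies
eps = 1e-8
# Unit-modulus approximation to e^{i*theta} with a small O(eps) phase error
r = np.exp(1j * (theta + eps * theta**2))

# Error after n steps grows like n * eps, while |r**n| = 1 for all n (no blow-up)
errs = [np.max(np.abs(r**n - np.exp(1j * n * theta))) for n in (1, 10, 100)]
```

Each tenfold increase in n increases the maximum error by roughly a factor of 10, matching the linear accumulation observed in the long-time experiments, and the modulus of the iterated symbol never exceeds unity.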

Example 2: the wave equation in an inhomogeneous medium
4.2.1 First test case: applying the operator exponential with high accuracy

In our second example, we consider the wave propagation problem (2.8), where κ(x) > 0 is a smooth function, the initial conditions u(x, 0) and u_t(x, 0) are prescribed, and periodic boundary conditions are used.
Since the procedure and results are similar to those in Section 4.1, we simply test the efficiency and accuracy of the method over a single time step τ = 1.5. In particular, we compare the accuracy and efficiency of one application of e^{τL}u_0, τ = 1.5, against RK4 and against using Chebyshev polynomials. In our numerical experiments, we use the initial condition

u(x, y, 0) = sin(2πx) sin(2πy) + sin(4πx) sin(4πy),   (4.7)

and u_t(x, y, 0) = 0. We also use

κ(x, y) = ((3 + sin(4πx))/4)^{1/2} ((3 + sin(4πy))/4)^{1/2}.

Finally, in the spatial discretization, we use 12 × 12 = 144 elements with 16 × 16 = 256 points per element (for all three time-stepping methods), and M = 376 poles in (1.5). For these parameters, the time to apply the inverse of (2.9), which involves forming the right-hand side in (2.11), solving for v, and computing (2.10), is about 5.2 times the cost of directly applying the forward operator (2.8).
Unlike Section 4.1, the operator exponential is not diagonalized in the Fourier domain. To assess the accuracy, we use the Chebyshev polynomial method with a small enough step size to yield an estimated error of < 10^{−10}. In particular, we verify that the L∞ residual, ‖u(x, t; Δt) − u(x, t; Δt/2)‖_∞, using numerical approximations to u(x, t) computed with step sizes Δt and Δt/2 and the Chebyshev polynomial method, is < 10^{−10}. We then use u(x, t; Δt/2) as a reference solution.
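The step-halving criterion can be sketched as follows (a toy with RK4 on the scalar model u' = iu standing in for the Chebyshev integrator on the full system): if halving the step changes the answer by less than the target tolerance, the finer run is accepted as the reference, and for a convergent one-step method its true error is smaller still.

```python
import numpy as np

def rk4(f, u0, t_final, dt):
    """Classical RK4 integration of u' = f(u) from t = 0 to t_final."""
    u, n_steps = u0, int(round(t_final / dt))
    for _ in range(n_steps):
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        u = u + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

f = lambda u: 1j * u                  # scalar skew-Hermitian model
t_final, dt = 1.5, 0.01
u_coarse = rk4(f, 1.0 + 0j, t_final, dt)
u_fine = rk4(f, 1.0 + 0j, t_final, dt / 2)

resid = abs(u_coarse - u_fine)        # step-halving residual
true_err = abs(u_fine - np.exp(1.5j)) # exact solution is e^{i t}
```

For a 4th-order method the halved-step error is about 1/16 of the coarse-step error, so the residual overestimates the true error of the finer run by roughly a factor of 15, which is what makes it a safe acceptance test.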
The results are summarized in Table 6. From this table, we see that the pre-computation time needed to represent the 376 solution operators in (1.5) is 113.4 min, and the computation time needed to apply the exponential is 3.7 min; the final accuracy in the L∞ norm is 1.6 × 10^{−9}. For the Chebyshev polynomial method, 575 time steps of size Δt ≈ 0.0026 are taken, for an overall time of 57 min; the final accuracy is 3.5 × 10^{−8}. Finally, for RK4, 7,500 time steps of size Δt = 2 × 10^{−4} are taken, for an overall time of 63.9 min; the final accuracy is 3.5 × 10^{−10}.

Table 7: Comparison of three methods for a lower accuracy computation of the operator exponential e^{tL}u_0, t = 2.5, for system (2.8) and u_0 in (4.7). The comparison uses RK4, Chebyshev polynomials and the rational approximation (1.5); in the spatial discretization of all three comparisons, 12 × 12 = 144 elements and 16 × 16 = 256 Chebyshev quadrature nodes per element are used.

Method | L∞ error | Time (min) | Pre-comp. (min)
Rational approx., M = 376 terms | 1.1 × 10^{−6} | 4.0 | 121
RK4 | 1.4 × 10^{−6} | 16.2 | NA
Cheby. poly., degree 9 | 8.5 × 10^{−6} | 73.8 | NA

4.2.2 Second test case: applying the operator exponential with lower accuracy
We now apply the operator exponential with lower accuracy, using the same initial conditions and spatial parameter values as in Section 4.2.1. In particular, we again use 376 inverses for the rational approximation, but instead choose a larger temporal discretization parameter h = 1/3 in the rational approximation (3.3), as well as a larger single (large) time step τ = 5/2, which results in ≈ 10^{−6} accuracy. We verified the stability of the scheme over many time steps (though we only show the efficiency and accuracy comparison for a single time step).
We assess the accuracy of all three methods against the solution obtained via the Chebyshev polynomial method with degree 12 polynomials and Δt = 0.0025/2. The results are summarized in Table 7. For the rational approximation method, the total pre-computation time for computing a representation of the operator exponential is 121 min, and the time for a single application of the operator exponential is 4.0 min; the final accuracy is 1.1 × 10^{−6}. Similarly, for the Chebyshev polynomial method, the total time for applying the operator exponential using a time step of 0.0025 and degree 9 polynomials is 73.8 min, with a final accuracy of 8.5 × 10^{−6}. Finally, the time for applying the operator exponential using RK4 and a time step of 0.00125 is 16.2 min, with a final accuracy of 1.41 × 10^{−6}.

Generalizations
The manuscript presents an efficient technique for explicitly computing a highly accurate approximation to the operator ϕ(τL) for the case where L is a skew-Hermitian operator and ϕ(t) = e^t, so that ϕ(τL) is the time-evolution operator of the hyperbolic PDE ∂u/∂t = Lu. The technique can be extended to more general functions ϕ. In particular, when using exponential integrators (cf. Cox & Matthews, 2002), it is desirable to apply functions ϕ_j(τL), where ϕ_j(·) are the so-called phi-functions. In Section 3, we presented (near) optimal rational approximations of the first few phi-functions. An important property of these representations is that the same poles can be used to simultaneously apply several of the functions ϕ_j(τL), so that the associated inverse operators need to be pre-computed and applied only once.
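The shared-pole property can be illustrated with a contour-integral discretization (our sketch, not the paper's near-optimal construction): each ϕ_j(A)v is a weighted sum of the same resolvent applications (z_m I − A)^{−1}v, and only the scalar weights ϕ_j(z_m) differ between the phi-functions.

```python
import numpy as np
from scipy.linalg import expm, solve

rng = np.random.default_rng(1)
n = 20
B = rng.standard_normal((n, n))
A = (B - B.T) / 2                    # skew-symmetric: purely imaginary spectrum
A /= np.linalg.norm(A, 2)            # spectrum inside the unit disc
v = rng.standard_normal(n)

# Trapezoid rule on a circle of radius 2 enclosing the spectrum:
# f(A) v = (1 / (2*pi*i)) * contour integral of f(z) (zI - A)^{-1} v dz
M = 64
z = 2.0 * np.exp(2j * np.pi * (np.arange(M) + 0.5) / M)
w = z / M                            # quadrature weights (absorbing dz / (2*pi*i))

# The resolvent solves below are shared by every phi-function
Rv = np.stack([solve(zm * np.eye(n) - A, v.astype(complex)) for zm in z])

phi0 = np.exp(z)                     # phi_0(z) = e^z
phi1 = (np.exp(z) - 1.0) / z         # phi_1(z) = (e^z - 1)/z
u0 = np.tensordot(w * phi0, Rv, axes=1).real
u1 = np.tensordot(w * phi1, Rv, axes=1).real

err0 = np.linalg.norm(u0 - expm(A) @ v)
err1 = np.linalg.norm(u1 - solve(A, expm(A) @ v - v))
```

Since the expensive step is the pre-computation and application of the shifted inverses, reusing them across all ϕ_j means the phi-functions needed by an exponential integrator come at essentially the cost of one.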