Equilibrated flux a posteriori error estimates in $L^2(H^1)$-norms for high-order discretizations of parabolic problems

We consider the a posteriori error analysis of fully discrete approximations of parabolic problems based on conforming $hp$-finite element methods in space and an arbitrary order discontinuous Galerkin method in time. Using an equilibrated flux reconstruction, we present a posteriori error estimates yielding guaranteed upper bounds on the $L^2(H^1)$-norm of the error, without unknown constants and without restrictions on the spatial and temporal meshes. It is known from the literature that the analysis of the efficiency of the estimators represents a significant challenge for $L^2(H^1)$-norm estimates. Here we show that the estimator is bounded by the $L^2(H^1)$-norm of the error plus the temporal jumps under the one-sided parabolic condition $h^2 \lesssim \tau$. This result improves on earlier works that required stronger two-sided hypotheses such as $h \simeq \tau$ or $h^2\simeq \tau$; instead our result now encompasses the practically relevant case for computations and allows for locally refined spatial meshes. The constants in our bounds are robust with respect to the mesh and time-step sizes, the spatial polynomial degrees, and also with respect to refinement and coarsening between time-steps, thereby removing any transition condition.


Introduction
We consider the heat equation where Ω ⊂ R d , 1 ≤ d ≤ 3, is a bounded, connected, polytopal open set with Lipschitz boundary, and T > 0 is the final time. We assume that f ∈ L 2 (0, T ; L 2 (Ω)), and that u 0 ∈ L 2 (Ω). We are interested here in the a posteriori error analysis in the L 2 (H 1 )-norm of fully discrete numerical methods for (1.1). In particular, we consider an arbitrary-order discontinuous Galerkin finite element method (DGFEM) in time, coupled with a conforming hp-FEM in space. A posteriori error estimates should ideally provide guaranteed upper bounds on the error, without unknown constants. Otherwise, if the estimators constitute an upper bound on the error up to an unknown constant, then we say instead that the estimators are reliable. Furthermore, the estimators should be locally efficient, meaning that the local estimators should lie below the error measured in a local space-time neighbourhood, up to a generic constant. Finally, the estimators should ideally be robust, with all constants in the bounds being independent of all discretization parameters. Furthermore, on a practical side it is highly desirable that the estimators be locally computable. We refer the reader to [Verfürth(2013)] for an introduction to these concepts. Our motivation for considering the heat equation (1.1) as a model problem is that the a posteriori error estimates developed in this context serve as a starting point for extensions to diverse applications, for example nonlinear problems (see [Amrein & Wihler(2016), Di Pietro et al.(2015, Dolejší et al.(2013), Kreuzer(2013)]), as well as playing a central role in adaptive algorithms (see [Chen & Feng(2004), Gaspoz et al.(2016), Kreuzer et al.(2012)]). For nonconforming discretization methods in space, we refer to [Ern & Vohralík(2010)] as well as [Georgoulis et al.(2011), Nicaise & Soualem(2005)].
The literature shows that the structure of parabolic problems leads to several outstanding challenges facing the central goals in a posteriori error estimation. In particular, several difficulties arise in the analysis of the efficiency and robustness of the estimators. To explain some of the challenges, first recall that the a posteriori error analysis of parabolic problems admits a range of norms in which to measure the error: for instance, these include the L 2 (H 1 )-norm (see [Picasso(1998), ]), L 2 (L 2 )-norm (see ]), L ∞ (L 2 )-norms and L ∞ (L ∞ )-norms (see [Eriksson & Johnson(1995)]), L ∞ (L 2 )∩L 2 (H 1 )-norms (see [Lakkis & Makridakis(2006), Makridakis & Nochetto(2003), Schötzau & Wihler(2010)] and more recently [Georgoulis et al.(2017)]), and L 2 (H 1 ) ∩ H 1 (H −1 )-norms (see [Bergam et al.(2005), Ern & Vohralík(2010), Gaspoz et al.(2016), Nicaise & Soualem(2005), Repin(2002), Verfürth(2003)]). To our knowledge, efficiency results have so far only been proved in the case of the L 2 (H 1 ) norm and of the L 2 (H 1 ) ∩ H 1 (H −1 ) norm. Although no analysis of efficiency is yet available in the setting of other norms, the optimal order of convergence of the estimators has nonetheless been observed in [Lakkis & Makridakis(2006), Makridakis & Nochetto(2003)] for instance. The efficiency results in the L 2 (H 1 ) norm have been attained under restrictions linking mesh and time-step sizes, whereas in the L 2 (H 1 ) ∩ H 1 (H −1 ) norm, such restrictions have been removed. It is important to observe that these two functional settings admit an inf-sup theory for the continuous problem that establishes an equivalence between appropriate norms of the error and of the residual. However, a difference between these two settings is that for L 2 (H 1 )∩H 1 (H −1 )-estimates, the dual norm on the residual localizes straight-forwardly with respect to time, whereas this is not the case for the L 2 (H 1 )-estimates.
A posteriori error estimators in the L 2 (H 1 )-norm for a class of nonlinear parabolic problem have been studied in ]. In particular,  found that the ratio between the constants in the upper and lower bounds for the error by the estimators depends on 1 + τ h −2 + τ −1 h 2 + |log h|, see [Verfürth(1998), Prop. 4.1], where h denotes the spatial mesh size and τ denotes the time-step size, and thus the efficiency of the estimators is subject to the assumption that τ h 2 . [Picasso(1998)] studied implicit Euler discretizations of the heat equation: under the assumption that τ h, he showed that the spatial estimator can be bounded from above by the L 2 (H 1 )norm of the error plus the temporal jump estimator; in particular, the temporal jump estimator, denoted by ε n K in [Picasso(1998), eq. (2.11)], appears on the right-hand side of the lower bound in [Picasso(1998), eq. (2.24)]. In both [Picasso(1998), ], the two-sided restrictions between the time-step and mesh sizes have the disadvantage of necessarily requiring that the meshes must be quasi-uniform, and thus theoretically prohibiting adaptive refinement.
Starting with [Verfürth(2003)], one approach to removing these two-sided restrictions has been to consider a different functional framework for the a posteriori error analysis, namely by estimating the L 2 (H 1 ) ∩ H 1 (H −1 )-norm of the error. Part of the justification of this approach is to be found in the observation in [Verfürth(2003), p. 198, Par. (5)], showing that the estimators of [Picasso(1998), ] are upper bounds to not only the L 2 (H 1 )-norm of the error, but also the L 2 (H 1 ) ∩ H 1 (H −1 )-norm of the error, up to data oscillation. It was then shown in [Verfürth(2003)] that these estimators are efficient, locally-in-time yet only globally-in-space, with respect to the L 2 (H 1 ) ∩ H 1 (H −1 )-norm of the error, without requiring conditions between mesh and time-step sizes; see also [Bergam et al.(2005)]. Given that the estimators used in both frameworks are the same up to data oscillation, it is of course natural that more general efficiency results are obtainable when including the H 1 (H −1 ) part of the norm, since it allows for the appearance of additional terms on the right-hand side in the efficiency bounds.
Recently, we developed in [Ern et al.(2017b)] a posteriori error estimators, based on equilibrated fluxes, for arbitrary order discretizations of parabolic problems within the L 2 (H 1 ) ∩ H 1 (H −1 )-norm setting, that are guaranteed, locally efficient, and robust. In particular, the analysis does not require any coupling between mesh and time-step sizes, and overcomes the problem of obtaining local-in-space and local-in-time efficiency by considering a natural extension of the L 2 (H 1 ) ∩ H 1 (H −1 )-norm to the time-nonconforming approximation space. The estimators are robust not only with respect to the mesh and time-step sizes, but also with respect to the polynomial degrees in space and time, and also with respect to mesh coarsening and refinement, thereby removing the so-called transition conditions previously encountered in [Verfürth(2003)]. These results are built upon the analysis for elliptic problems in [Braess et al.(2009), Ern & Vohralík(2010, Ern & Vohralík(2015), Ern & Vohralík(2016)]. We refer to [Dolejší et al.(2016)] for numerical experiments showing the performance of these estimators in practice.
In this work, we present a posteriori error estimates for the L 2 (H 1 )-norm of the error, which are based on the same locally computable equilibrated flux as in [Ern et al.(2017b)], thereby showing that the same methodology can be used in the L 2 (H 1 )-norm estimates as for the L 2 (H 1 ) ∩ H 1 (H −1 )-norm. Our main contributions, presented in Theorem 5.1 in section 5 below, include guaranteed upper bounds for the L 2 (H 1 )-norm of the error, and local-in-space-and-time lower bounds for the spatial estimator under the one-sided condition h 2 τ . We therefore remove the need for the two-sided conditions encountered previously, and we note that the assumptions in [Picasso(1998),  were stronger than our assumption. We emphasize that the regime where h 2 τ is of practical interest in computations, since implicit methods offer the possibility for large time-steps. This condition connecting spatial and temporal resolutions is apparently related to the localisation of the dual norm: the error in the L 2 (H 1 )-norm is connected to the dual norm of the residual for a space of test functions in L 2 (H 1 ) ∩ H 1 (H −1 ) featuring regularity across both time and space, as shown in Theorem 2.1 below. Therefore the dual norm of the residual is not trivially localized in time. In comparison, for estimates of the error in the L 2 (H 1 ) ∩ H 1 (H −1 )-norm, the corresponding dual norm of the residual does localize in time because the test space there is L 2 (H 1 ), see, e.g., [Ern et al.(2017b), Theorem 2.1]. It is still unclear whether the condition h 2 τ is really necessary or just technical. It is, however, worth noting that the recent adaptive algorithm from [Gaspoz et al.(2016)] guarantees a uniform lower-bound for the time-step sizes and subordinates the spatial approximation to the temporal indicators; therefore our assumption is not necessarily restrictive in an adaptive context. Our lower bound is similar to [Picasso(1998)] in at least one respect, namely that the right-hand side of our lower bound includes the temporal jump estimator, since it does not appear possible to show in general that this estimator is locally bounded from above by the L 2 (H 1 )-norm of the error. Furthermore, we show that the constant of the lower bound is robust with respect to the spatial polynomial degree, and is also robust with respect to refinement and coarsening of the meshes, thereby allowing us to remove the so-called transition conditions. We also show that our results imply local-in-space and local-in-time efficiency when considered in the framework of the augmented norms that were proposed in [Akrivis et al.(2009), Makridakis & Nochetto(2006), Schötzau & Wihler(2010)].
Our analysis rests upon the following key ingredients. First, in section 2, we present the inf-sup identity which relates the L 2 (H 1 )-norm of the error to an appropriate dual norm of the residual on test functions in a subspace of L 2 (H 1 ) ∩ H 1 (H −1 ). After setting the notation for the class of finite element methods in section 3, we recall the construction of the equilibrated flux from [Ern et al.(2017b)] in section 4. We state the main results in section 5. In section 6 we use the inf-sup framework to prove the guaranteed upper bounds and the proof of the lower bounds is the subject of section 7. It is based on the combination of two key ideas. The first is to take advantage of the semidiscreteness in time of the test functions appearing in the fundamental efficiency result of [Ern et al.(2017b), Lemma 8.2] in order to gain control over a negative norm on the time derivatives of the test functions; see Lemma 7.2 below. The second idea is to appeal to a specific pointwise-in-time identity for the discontinuous Galerkin time-stepping method, see Lemma 7.3 below. Thus, we employ the definition of the numerical scheme for proving the lower bounds, which is somewhat unusual for a posteriori error analysis.
The combination of these two ideas then yields the lower bounds stated in section 5 under the relaxed hypothesis that h 2 τ only.
Throughout this paper, the notation a b means that a ≤ Cb, with a generic constant C that depends possibly on the shape-regularity of the spatial meshes and the space dimension d, but is otherwise independent of the mesh-size, time-step size, as well as the spatial and temporal polynomial degrees, or on refinement and coarsening between time-steps.
The starting point of the analysis is the weak formulation of problem (1.1) where the time derivative has been cast onto a test function, using integration by parts in time. In particular, the solution space X and test space Y T are defined by (2.1) The spaces X and Y T are equipped with the norms where ·, · denotes the duality pairing between H −1 (Ω) and H 1 0 (Ω). Then, problem (1.1) admits the following weak formulation: find u ∈ X such that (2.4) The well-posedness of (2.4) is well-known and can be shown by Galerkin's method, see for instance the textbook by [Wloka(1987)]. Note that in this weak formulation, the initial condition u(0) = u 0 is expressed as a natural condition, appearing in (2.4), rather than as an essential condition imposed by the choice of solution space.
The following result states an inf-sup stability result for the bilinear form B X . This inf-sup stability result has the interesting and important property of taking the form of an identity, which is advantageous for the sharpness of a posteriori error analysis, and shows that the choice of norms for the spaces X and Y T in (2.2) above are optimal. We refer the reader to [Tantardini & Veeser(2016)] for further results on the inf-sup theory of parabolic problems.
Proof. The arguments in the proof of [Ern et al.(2017b), Theorem 2.1] can be used to show the following inf-sup identity: for any ϕ ∈ Y T , we have This problem can simply be seen as a backward-in-time parabolic problem with final time condition ϕ * (T ) = 0. Hence, we and completes the proof of (2.5).
In order to estimate the error between the solution u of (1.1) and its approximation, we define the residual functional R where v ∈ X and ϕ ∈ Y T , and where the equality follows simply from (2.4). The dual norm of the residual R (2.8) Theorem 2.1 implies the following equivalence between the error and dual norm of the residual: Remark 2.1. Problem (1.1) admits an alternative weak formulation where the test space is X and the trial space is L 2 (0, T ; H 1 0 (Ω)) ∩ H 1 (0, T ; H −1 (Ω)). We refer the reader to [Ern et al.(2017b)] and the references therein for further details. The solution of the problem remains the same for the two weak formulations, although each weak formulation is tied to a different inf-sup condition that relates the norm of the error and of the residual, exactly in the way (2.5) and (2.6) interplay.

3 Finite element approximation
The time interval (0, T ) is partitioned into sub-intervals I n := (t n−1 , t n ), with 1 ≤ n ≤ N , where it is assumed that [0, T ] = N n=1 I n , and that {t n } N n=0 is strictly increasing with t 0 = 0 and t N = T . For each interval I n , we let τ n := t n − t n−1 denote the local time-step size. No special assumptions are made about the relative sizes of the time-steps to each other. A temporal polynomial degree q n ≥ 0 is associated with each time-step I n , and we gather all the polynomial degrees in the vector q = (q n ) N n=1 . For a general vector space V , we shall write Q qn (I n ; V ) to denote the space of V -valued univariate polynomials of degree at most q n over the time-step interval I n .

Meshes
For each 0 ≤ n ≤ N , let T n denote a matching simplicial mesh of the domain Ω, where we assume shape-regularity of the meshes uniformly with respect to n. We consider here only matching simplicial meshes for simplicity, although we indicate that mixed simplicialparallelepipedal meshes, possibly containing hanging nodes, can also be treated: see [Dolejší et al.(2016)] for instance. The mesh T 0 will be used to approximate the initial datum u 0 . For each element K ∈ T n , let h K := diam K denote the diameter of K. We associate a local spatial polynomial degree p K ≥ 1 with each K ∈ T n , and we gather all spatial polynomial degrees in the vector p n = (p K ) K∈T n . In order to keep our notation sufficiently simple, the dependence of the local spatial polynomial degrees p K on the time-step is kept implicit, although we bear in mind that the polynomial degrees may change between time-steps.

Approximation spaces
Given a general matching simplicial mesh T and given a vector of polynomial degrees where P p K (K) denotes the space of polynomials of total degree at most p K on K. To shorten the notation, let V n := V h (T n , p n ) for each 0 ≤ n ≤ N . Let Π h u 0 ∈ V 0 denote an approximation to the initial datum u 0 , a typical choice being the L 2 -orthogonal projection of u 0 onto V 0 . Given the collection of time intervals {I n } N n=1 , the vector q of temporal polynomial degrees, and the hp-finite element spaces {V n } N n=0 , the finite element space V hτ is defined by Functions in V hτ are generally discontinuous with respect to the time-variable at the temporal partition points. We take them to be left-continuous: for all 1 ≤ n ≤ N , we define v hτ (t n ) as the trace at t n of the restriction v hτ | In . Moreover, functions in V hτ also have a well-defined value at t 0 = 0. For all 0 ≤ n < N , we denote the right-limit of v hτ ∈ V hτ at t n by v hτ (t + n ). Then, the temporal jump operators · n are defined by v hτ n :

Refinement and coarsening
Similarly to other works, e.g., [Verfürth(2003), p. 196], we assume that we have at our disposal a common refinement mesh T n of T n−1 and T n for each 1 ≤ n ≤ N , as well as associated polynomial degrees It is assumed that the shape-regularity of T n is equivalent up to uniform constants to those of T n−1 and T n , and that every element K ∈ T n is wholly contained in a single element K ∈ T n−1 and a single element K ∈ T n . We emphasize that we do not require any assumptions on the relative coarsening or refinement between successive spaces V n−1 and V n . In particular, we do not need the transition condition from [Verfürth(2003), p. 196, 201], which requires a uniform bound on the ratio of element sizes between T n and T n . In practice, we expect that most adaptive algorithms will obtain each mesh T n from an initial coarse mesh, with coarsening/refinements between two successive meshes, using a standard algorithm such as newest vertex bisection. We refer the reader to [Nochetto et al.(2009)] for a discussion in this context. To derive our results, we note that we do not need to restrict ourselves to any particular refinement algorithm.

Numerical method
The numerical scheme consists of finding u hτ ∈ V hτ such that u hτ (0) = Π h u 0 , and such that for all test functions v hτ ∈ Q qn (I n ; V n ) and for each time-step interval I n , n = 1, . . . , N .
Here the time derivative ∂ t u hτ is understood as the piecewise time-derivative on each time-step interval I n . The numerical solution u hτ ∈ V hτ can thus be obtained by solving the fully discrete problem (3.4) on each successive time-step. At each time-step, this requires solving a linear system that is symmetric only in the case q n = 0; this can be performed efficiently in practice for arbitrary orders following [Smears(2017)]. Note further that the initial condition u hτ (0) = Π h u 0 does not guarantee that the right-limit u hτ (0 + ) should equal Π h u 0 .

Reconstruction operator
For each time-step interval I n and each nonnegative integer q, let L n q denote the polynomial on I n obtained by mapping the q-th Legendre polynomial under an affine transformation of (−1, 1) to I n . It follows that L n q (t n ) = 1 for all q ≥ 0, and L n q (t n−1 ) = (−1) q , 8 and that the mapped Legendre polynomials {L n q } q≥0 are L 2 -orthogonal on I n , and satisfy In |L n q | 2 dt = τn 2q+1 for all q ≥ 0, see for instance [Schwab(1998), Appendix C]. Following [Makridakis & Nochetto(2006)] (see also [Smears(2017), Remark 2.3]), we introduce the reconstruction operator I defined on V hτ by It is clear that I is a linear operator on V hτ . Furthermore, the definition ensures that Iv hτ | In (t n ) = v hτ (t n ), and that Iv hτ | In (t + n−1 ) = v hτ (t n−1 ) for all 1 ≤ n ≤ N . This implies that Iv hτ is continuous with respect to the temporal variable at the interval partition points {t n } N −1 n=0 and hence Iv hτ ∈ H 1 (0, T ; H 1 0 (Ω)). Furthermore, Iv hτ | In ∈ Q qn+1 I n ; V n for any v hτ ∈ V hτ , where we recall that V n−1 + V n ⊂ V n . It is wellknown from [Ern & Schieweck(2016), Makridakis & Nochetto(2006), Smears(2017)] that we may rewrite the numerical scheme (3.4) as ( 3.6) Note also that Iu hτ (0) = Π h u 0 .

Construction of the equilibrated flux
The a posteriori error estimates presented in this paper are based on a discrete and locally computable H(div)-conforming flux σ hτ that satisfies the key equilibration property where Iu hτ is defined in section 3.5, and f hτ ≈ f is an approximation of the data that is defined in (4.4) below. We call σ hτ an equilibrated flux. The construction of σ hτ given here is exactly the same as in [Ern et al.(2017b)]. This has the practical benefit that a single construction of the equilibrated flux can be used for both a posteriori error estimates in the L 2 (H 1 ) ∩ H 1 (H −1 )-norm and also in the L 2 (H 1 )-norm.

Local mixed finite element spaces
For each 1 ≤ n ≤ N , let V n denote the set of vertices of the mesh T n , where we distinguish the set of interior vertices V n int and the set of boundary vertices V n ext . For each a ∈ V n , let ψ a denote the hat function associated with a, and let ω a denote the interior of the support of ψ a , with associated diameter h ωa . Furthermore, let T a denote the restriction of the mesh T n to ω a . Recalling that the common refinement spaces V n were obtained with a vector of polynomial degrees p n = (p K ) K∈ T n , we associate with each a ∈ V n the fixed polynomial degree For a polynomial degree p ≥ 0, let the piecewise polynomial (discontinuous) spaces P p ( T a ) and RT N p ( T a ) be defined by where RT N p ( K) := P p ( K) + P p ( K)x denotes the Raviart-Thomas-Nédélec space of order p on the simplex K. It is important to notice that whereas the patch ω a is subordinate to the elements of the mesh T n around the vertex a ∈ V n , the spaces P p ( T a ) and RT N p ( T a ) are subordinate to the submesh elements in T a ; of course, in the absence of coarsening, this distinction vanishes. We now introduce the local spatial mixed finite element space V a h , defined by We then define the space-time mixed finite element space where we recall that Q qn (I n ; V a h ) denotes the space of V a h -valued univariate polynomials of degree at most q n over the time-step interval I n .

Data approximation
Our a posteriori error estimates given in section 5 involve certain approximations of the source term f appearing in (1.1). It is helpful to define these approximations here. For each 1 ≤ n ≤ N and for each a ∈ V n , let Π a,n hτ be the L 2 ψa -orthogonal projection from L 2 (I n ; L 2 ψa (ω a )) onto Q qn (I n ; P pa−1 ( T a )), where L 2 ψa (ω a ) is the space of measurable functions v on ω a such that ωa ψ a |v| 2 dx < ∞. In other words, the projection operator Π a,n hτ is defined by In (ψ a Π a,n hτ v, q hτ ) ωa dt = In (ψ a v, q hτ ) ωa dt for all q hτ ∈ Q qn (I n ; P pa−1 ( T a )). We adopt the convention that Π a,n hτ v is extended by zero from ω a × I n to Ω × (0, T ) for all v ∈ L 2 (I n ; L 2 ψa (ω a )). Then, we define f hτ by See [Ern et al.(2017b)] for further remarks concerning the approximation properties of f hτ . In particular, it is shown there that f hτ is a data approximation that is at least of the same order as the one used in the numerical scheme (3.4).
For each time-step interval I n and for each vertex a ∈ V, let the space V a,n hτ be defined by (4.3). Let g a,n hτ and τ a,n hτ be defined by (4.5). Let σ a,n hτ ∈ V a,n hτ be defined by Then, after extending σ a,n hτ by zero from ω a × I n to Ω × (0, T ) for each a ∈ V and for each 1 ≤ n ≤ N , we define σ hτ := N n=1 a∈V n σ a,n hτ . (4.8) Note that σ a,n hτ ∈ V a,n hτ is well-defined for all a ∈ V n : in particular, for interior vertices a ∈ V n int , we use (4.6) to guarantee the compatibility of the datum g a,n hτ with the constraint ∇·σ a,n hτ = g a,n hτ . The following key result is quoted from [Ern et al.(2017b)].
Moreover, for the purpose of implementation, it is known that on each patch of the mesh and at each time-step, the solution of the minimization problem (4.7) decouples into q n +1 independent spatial mixed finite element linear systems, which helps to reduce the cost of computing the flux σ hτ .

Main results
We introduce the following a posteriori error estimators and data oscillation terms: where, K ∈ T n , 1 ≤ n ≤ N , the equilibrated flux σ hτ is prescribed in Definition 4.1, and where the data approximation f hτ is defined in section 4.2. The total estimator for the error is defined by The flux estimator η n F,K and the temporal jump estimator η n J,K are the two main estimators. In particular, the flux estimator η n F,K measures the lack of H(div)-conformity of ∇u hτ , and the temporal jump estimator η n J,K measures the lack of temporal conformity of u hτ . Indeed, η n J,K is related to the jump u hτ n−1 , since it was shown in [Schötzau & Wihler(2010), Ern et al.(2017b)] that η n J,K can be equivalently rewritten as η n J,K = τn(qn+1) Given that η n F,K and η n J,K respectively measure the lack of spatial and temporal conformity of the approximate solution, it is common in the literature to call η n F,K the spatial estimator and η n J,K the temporal estimator. However, such terminology must not be interpreted as stating that these estimators bound the errors due respectively to the spatial and temporal discretization. In practice, these estimators can be computed by quadrature on the common refinement mesh T n . Note that it is possible to split η n J,K into further components, for instance to quantify the effect of coarsening, although this is not strictly necessary for the purposes of the upper and lower bounds on the error to be given below, which is why η n J,K is given in its current form.
Theorem 5.1 (X-norm a posteriori error estimate). Let u ∈ X be the weak solution of (1.1), and let u hτ ∈ V hτ denote the solution of the numerical scheme (3.4). Let η X be defined by (5.2). Then, we have the following X-norm a posteriori error estimate: If K ∈ T n , 1 ≤ n ≤ N , is an element such that h 2 ωa ≤ γ a τ n for each a ∈ V K , with V K the set of vertices of the element K, with some constant γ a > 0, where h ωa denotes the diameter of the patch ω a , then we have the local lower bound for the flux estimator η n where the local data ocillation η a,n osc is defined by Furthermore, under the hypothesis that there exists γ > 0 such that h 2 ωa ≤ γ τ n for every a ∈ V n and every 1 ≤ n ≤ N , then we have the global lower bound The constants C γa,qn in (5.5) and C γ,qn in (5.7) satisfy C γ,qn (q n + 1) 1 2 + γ(q n + 1) 5 2 , and may depend on the shape regularity of T n and T n and on the dimension d, but otherwise do not depend on the mesh-size, time-step size, spatial polynomial degrees, or on refinement and coarsening between time-steps.
The proof of Theorem 5.1 is given in several stages throughout the following sections. In the first stage, we give the proof of the upper bound (5.4) immediately after the helpful data oscillation estimate of Lemma 6.2 below in section 6. In the second stage, we show the lower bounds (5.5) and (5.7) in section 7.
Remark 5.1 (Bounds for the jump estimator). In the local lower bound (5.5), we have In ∇(u hτ − Iu hτ ) 2 ωa dt = K⊂ωa [η n J,K ] 2 , see also (5.3), where the sum is over all elements K of T n contained in ω a . Similarly, in the global lower bound (5.7), the term u hτ − Iu hτ 2 X = N n=1 K∈T n [η n J,K ] 2 appears. Thus our result here is comparable to those in [Picasso(1998)] where the jump estimator also appears on the right-hand side of the local lower bounds. The reason for the appearance of this term can be essentially traced back to the lack of Galerkin orthogonality for the temporal reconstruction Iu hτ , see (3.6). Though a priori error analysis shows that u hτ − Iu hτ X converges with the same order as u − u hτ X if u is assumed to have some smoothness, a difficulty is that if the jump estimator u hτ − Iu hτ X becomes too large compared to u − u hτ X , then the meaning of the lower bound (5.7) becomes less clear, and similarly for (5.5). However, we note that [Ern et al.(2017b), Theorem 5.1] have shown that the (time-local but space-global) jump estimators are bounded from above by the (time-local spaceglobal) L 2 (H 1 ) ∩ H 1 (H −1 )-norm of the error in Iu hτ , i.e., In ∇(u hτ − Iu hτ ) 2 dt ≤ 8 In ∇(u − Iu hτ ) 2 + ∂ t (u − Iu hτ ) 2 H −1 (Ω) dt, up to possible data oscillation.
Remark 5.2 (Comparison with L 2 (H 1 ) ∩ H 1 (H −1 )-norm estimators). As pointed out by the remark in [Verfürth(2003), p. 198, Par. (5)] concerning the equivalence of residualbased estimators for both L 2 (H 1 ) and L 2 (H 1 ) ∩ H 1 (H −1 ) norms, it is important to observe that in the absence of data oscillation, the estimator η X defined above in (5.2) is equivalent (up to the factor 3+ [Ern et al.(2017b), Eq. (5.10b)]. However, an important difference between these estimators concerns the data oscillation. Indeed, it is known since [Verfürth(2003)] that L 2 (H 1 ) ∩ H 1 (H −1 ) estimators generally contain a data oscillation term that can be of same temporal order as the error. By comparison, the data oscillation term (5.1c) features an additional half-order with respect to the time-step size. Therefore we expect that the X-norm estimator given above may be of special use in situations with significant data oscillation in time.
Theorem 5.1 is our main result on a posteriori error estimation of u − u hτ X . Due to the challenge of bounding u hτ −Iu hτ X , several authors have also considered various augmented norms and error measures. We refer in particular to [Akrivis et al.(2009), Makridakis & Nochetto(2006), Schötzau & Wihler(2010)] where the norm of the error includes simultaneously norms for u − u hτ and u − Iu hτ . This can be motivated by a priori error analysis, where it can be shown that u hτ − Iu hτ X converges with same order as u − u hτ X when u is sufficiently smooth. For instance, we can define the error measure (5.8) in an analoguous manner to the norms in, for instance, [Makridakis & Nochetto(2006), eq. (34)] and [Schötzau & Wihler(2010), eq. (28)] without the L ∞ (L 2 )-norm terms. The choice in (5.8) is only one of many possibilities; for instance we could equally well consider u−u hτ X + u hτ −Iu hτ X . The interest of this approach is that the bounds (5.4), (5.5) and (5.7) immediately yield a global upper bound and local-in-time and local-in-space efficiency with respect to this error measure, see Corollary 5.2 below. However, it is important to note that it does not appear possible to show in general an equivalence between E X and u − u hτ X , see Remark 5.1.
Corollary 5.2. Let E X be defined by (5.8). Then, we have the guaranteed upper bound E X ≤ 2 η X , (5.9) If K ∈ T n , 1 ≤ n ≤ N , is an element such that h 2 ωa ≤ γ a τ n for each a ∈ V K with some constant γ a > 0, where h ωa denotes the diameter of the patch ω a , then we have the local efficiency bound where the local error measures E a,n X , a ∈ V n , are defined by Furthermore, under the hypothesis that there exists γ > 0 such that h 2 ωa ≤ γ τ n for every a ∈ V n and every 1 ≤ n ≤ N , then we have the global efficiency bound (5.12) 6 Proof of the guaranteed upper bound (5.4) We will make use of the following preparatory lemmas.
Proof of the upper bound (5.4). Recall from (2.9) on the equivalence of norms and residuals that u − u hτ X = R X (u hτ ) [Y T ] , so we turn our attention to bounding R X (u hτ ), ϕ for an arbitrary test function ϕ ∈ Y T . By adding and subtracting T 0 (∂ t Iu hτ + ∇ · σ hτ , ϕ) dt and recalling the flux equilibration identity (4.1), we get where we have used integration by parts with respect to time for the time derivative ∂ t Iu hτ , noting that Iu hτ (0) = Π h u 0 and that ϕ(T ) = 0, and also where we have used integration by parts over Ω for the flux σ hτ ∈ L 2 (0, T ; H(div, Ω)). Employing the shorthand notation ϕ 2 Y (In) := In ∂ t ϕ 2 H −1 (Ω) + ∇ϕ 2 dt, we then use Lemma 6.2 and the Cauchy-Schwarz inequality to bound In (6.4) We then combine (6.3) and (6.4) with the Cauchy-Schwarz inequality to find that R X (u hτ ), ϕ ≤ η X ϕ Y T ; since ϕ ∈ Y T was arbitrary, we obtain u − u hτ X ≤ η X as a result of (2.9), thereby completing the proof of (5.4).
7 Proof of the bounds (5.5) and (5.7) We start by observing that σ hτ | K×In = a∈V K σ a,n hτ | K×In , and thus In [η n F,K ] 2 dt = In a∈V K (σ a,n hτ +ψ a ∇u hτ ) 2 K dt ≤ |V K | a∈V K In σ a,n hτ +ψ a ∇u hτ 2 K dt, (7.1) where we recall that V K stands for the vertices of the element K and |V K | stands for its cardinality. We shall now bound the right-hand side of (7.1). For each 1 ≤ n ≤ N and each a ∈ V n , we introduce the patch residual functional R a,n hτ : L 2 (I n , H 1 0 (ω a )) → R defined by R a,n hτ , ϕ = In Π a,n hτ f − ∂ t Iu hτ , ϕ ωa − ∇u hτ , ∇ϕ ωa dt ∀ ϕ ∈ L 2 (I n ; H 1 0 (ω a )).
( 7.2) The essential result that forms the starting point for our analysis is the following abstract efficiency result first shown in [Ern et al.(2017b), Lemma 8.2], which is an application of a more general underlying key result in [Ern et al(2017a), Theorem 1.2] concerning the existence of polynomial-degree robust liftings of piecewise polynomial data into discrete subspaces of H(div), which itself is based on the fundamental results of [Costabel & McIntosh(2010), Braess et al.(2009)].
Lemma 7.1 (Space-time stability bound). Let σ a,n hτ denote the patch-wise flux reconstructions of Definition 4.1, and let R a,n hτ denote the local patch residual defined by ( where Q qn (I n ; H 1 0 (ω a )) denotes the space of H 1 0 (ω a )-valued univariate polynomials of degree at most q n on I n . In particular, the constant in (7.3) does not depend on the mesh-size, time-step size, spatial and temporal polynomial degrees, or on refinement and coarsening between time-steps.
As explained above in the introduction, our analysis of the efficiency of the equilibrated flux estimator η n F,K relies on two original ideas. We now detail the first one, which is based on the key observation that the set of test functions appearing in (7.3) are polynomials with respect to the time variable. Hence, in order to obtain estimates on the efficiency of the estimators with respect to the X-norm of the error, we shall show that the set of test functions appearing in (7.3) can be restricted to functions vanishing at the end-points of the time interval and thereby lying in the test space Y T through a bubble-in-time argument, provided that h 2 ωa τ n . We start by defining the space H 1 † (ω a ) through where the supremum is taken among all test functions v ∈ H 1 † (ω a ) such that ∇v ωa = 1. The motivation for working with the space H 1 † (ω a ) is that the ψ a -weighted mean value of the function u − Iu hτ possesses special properties derived from the numerical scheme; in particular, see Lemma 7.3 and the discussion surrounding (7.13) below.
Lemma 7.2 (Stability with test functions vanishing at both endpoints of I n ). Let a ∈ V n , 1 ≤ n ≤ N , and suppose that there exists a constant γ a > 0 such that the patch diameter h ωa and τ n satisfy h 2 ωa ≤ γ a τ n . Then, In σ a,n hτ + ψ a ∇u hτ 2 ωa dt 1 2 ≤ C γa,qn sup ϕ∈Q qn+2 (In;H 1 0 (ωa)) ∩H 1 0 (In;H 1 0 (ωa)) R a,n hτ , ϕ where H 1 0 (I n ; H 1 0 (ω a )) denotes the space of functions in H 1 (I n ; H 1 0 (ω a )) that vanish at both endpoints t n−1 and t n of the time interval I n . In particular, the constant C γa,qn in (7.5) satisfies C γa,qn (q n +1) 1 2 +γ a (q n +1) 5 2 , and may depend on the shape regularity of T n and T n and on the space dimension d, but otherwise does not depend on the meshsize, time-step size, spatial polynomial degrees, or on refinement and coarsening between time-steps.
Proof. The starting point for the proof is Lemma 7.1. Keeping in mind the righthand side of (7.3), for each ϕ ∈ Q qn (I n ; H 1 0 (ω a )), we shall construct a new function ϕ * ∈ Q qn+2 (I n ; H 1 0 (ω a )) defined by It follows from the fact that L n q (t n−1 ) = (−1) q and that L n q (t n ) = 1 for all q ≥ 0 that ϕ * (t + n−1 ) = ϕ * (t n ) = 0 and hence ϕ * ∈ H 1 0 (I n ; H 1 0 (ω a )). Recalling that the functions Π a,n hτ f , ∂ t Iu hτ , and ∇u hτ appearing in (7.2) are polynomials of degree at most q n in time, it also follows from the orthogonality of the Legendre polynomials that R a,n hτ , ϕ * = R a,n hτ , ϕ .
It is then seen that we shall obtain (7.5) as a result of (7.3) provided that we can bound In ∂ t ϕ * where the constant is independent of all other quantities since ϕ ∈ Q qn (I n ; H 1 0 (ω a )) is discrete with respect to time. Note in particular that the inverse inequality is valid even though Q qn (I n ; H 1 0 (ω a )) is itself an infinite dimensional space, see Remark 7.1 below. Therefore, we find from (7.6) and (7.7) that In ∇ϕ * 2 ωa dt (q n + 1) In ∇ϕ 2 ωa dt. (7.8) To bound In ∂ t ϕ * 2 [H 1 † (ωa)] dt, we recall that ϕ * (t) ∈ H 1 0 (ω a ) for all t ∈ I n , and therefore satisfies the Poincaré inequality ϕ * (t) ωa h ωa ∇ϕ * (t) ωa for all t ∈ I n . Furthermore, we also have a similar Poincaré inequality for all test functions v ∈ H 1 † (ω a ). Therefore, we find that ϕ * (t) [H 1 † (ωa)] h 2 ωa ∇ϕ * (t) ωa , for all t ∈ I n . Thus, we obtain, using an inverse inequality in time (see Remark 7.1 below for details), In ∂ t ϕ * where the constant C γa,qn (q n + 1) 1 2 + γ a (q n + 1) 5 2 . The bound (7.5) then follows from (7.9) and the identity R a,n hτ , ϕ = R a,n hτ , ϕ * given above.
Remark 7.1 (Inverse inequality). The proof of the inverse inequalities appearing above in (7.7) can be found simply by expanding the function ϕ in any orthogonal basis {ψ k } ∞ k=1 of H 1 0 (ω a ) as ϕ(t) = ∞ k=1 c k (t)ψ k , where the coefficient functions c k are real-valued polynomials of degree at most q n , for all k ≥ 1, and then by applying coefficient-wise well known inverse inequalities for real-valued functions, see [Schwab(1998), p. 148].
Lemma 7.2 constitutes the first step towards the local lower bound (5.5). In particular, we see that the test functions in (7.5) are bounded in the H 1 (I n ; [H 1 † (ω a )] ) norm. In order to exploit this property, we use a second key idea for our analysis, which is to employ the following special property of the time-discretization scheme. Together, these two ingredients allow us to obtain the lower bounds assuming only that h 2 τ , rather than the stronger requirements used in [Picasso(1998), ].
Proof. Since a ∈ V n int , it follows that φψ a ∈ Q qn (I n ; V n ) for any polynomial φ in time of degree at most q n over I n . Therefore, the numerical scheme (3.6) implies that, for any real-valued polynomial φ in time of degree at most q n , In φ [(f, ψ a ) − (∂ t Iu hτ , ψ a ) − (∇u hτ , ∇ψ a )] dt = 0.
Furthermore, the definition of Π a,n hτ implies that In φ(f, ψ a )dt = In φ(Π a,n hτ f, ψ a )dt for any real-valued polynomial φ in time of degree at most q n . Since the function t → (∂ t Iu hτ (t), ψ a ) + (∇u hτ (t), ∇ψ a ) − (Π a,n hτ f (t), ψ a ) is a real-valued polynomial of degree at most q n over I n , it follows that it vanishes everywhere in I n . We therefore obtain (7.10).
We now give the proof of the bounds (5.5) and (5.7) under the hypothesis stated in Theorem 5.1.