A linear implicit Euler method for the finite element discretization of a controlled stochastic heat equation

We consider a numerical approximation of a linear quadratic control problem constrained by the stochastic heat equation with non-homogeneous Neumann boundary conditions. This involves a combination of distributed and boundary control, as well as both distributed and boundary noise. We apply the finite element method for the spatial discretization and the linear implicit Euler method for the temporal discretization. Due to the low regularity induced by the boundary noise, convergence orders above 1/2 in space and 1/4 in time cannot be expected. We prove such optimal convergence orders for our full discretization when the distributed noise and the initial condition are sufficiently smooth. Under less smooth conditions, the convergence order is further decreased. Our results only assume that the related (deterministic) differential Riccati equation can be approximated with a certain convergence order, which is easy to achieve in practice. We confirm these theoretical results through a numerical experiment in a two dimensional domain.


Introduction
This paper is devoted to a numerical scheme for a linear quadratic control problem constrained by the stochastic heat equation with non-homogeneous Neumann boundary conditions. We prove optimal convergence orders for a full discretization which combines a linear implicit Euler method in time and a finite element discretization in space.
For time-dependent heat distributions considered in a bounded domain, noise terms in the sense of random heating or cooling phenomenas arise due to imperfect insulation and other uncertain environmental effects. In engineering applications, this might lead to undesired behavior. To keep a desired heat profile, it is therefore necessary to regulate the system. This task can be formulated as a linear quadratic control problem constrained by the stochastic heat equation, where controls and additive noise terms are located inside the domain as well as on the boundary. Here, we treat the case of noise terms defined by Q-Wiener processes. In stochastic control theory, it is well known that the concept of mild solutions is useful to include non-homogeneous boundary conditions, see [7,9,10,14,34]. In this context, we also refer to related deterministic control problems, see [3,4] and the references therein. Typically, optimal controls as solutions of stochastic linear quadratic control problems are characterized by a feedback law, see [1,2,6,8,16]. These feedback laws often involve the solution to a suitable operator-valued differential Riccati equation. In this paper, the Riccati equation is deterministic resulting from the fact that only additive noise terms are included. As a consequence, the optimal heat distribution fulfills a system of a linear stochastic partial differential equation (SPDE), referred to as the controlled stochastic heat equation, which is coupled to the operator-valued differential Riccati equation. The main obstacle is that both the controlled stochastic heat equation as well as the Riccati equation can not be solved explicitly. For that reason, we analyze a numerical approximation of the system describing the optimal heat distribution.
For the spatial discretization, we use the finite element method as introduced in [29], where only parabolic equations with homogeneous boundary conditions are considered. The case of non-homogeneous boundary conditions is studied in, e.g., [19]. Here, we need a generalization of this theory since the system includes Q-Wiener processes. Numerical simulations for Q-Wiener processes with values in Hilbert spaces as well as for some specific SPDEs are demonstrated in [23].
Temporal discretization of SPDEs has become an active research area within the last years. Equations driven by additive noise terms are considered in [31], and [17,22,28] also consider the case of multiplicative noise terms. These papers have in common that the linear implicit Euler method is used for the temporal discretization. This is essentially the usual implicit Euler method but with the noise terms treated explicitly, since treating them implicitly makes no sense. The stochastic part of the equation is therefore treated explicitly, and the deterministic part implicitly. We follow the same approach in this paper. The error analyses are mainly based on the fact that the underlying equation involves a closed operator generating an analytic semigroup, such that fractional powers of this closed operator are well defined.
The shortcoming of the papers mentioned above is that they only consider equations with homogeneous boundary conditions We will extend these results by including instead non-homogeneous Neumann boundary conditions. Because this leads to a less regular solution, the convergence order is decreased. However, the theory of fractional powers to closed operators can still be applied, and we use this to prove optimal convergence orders under the assumption that the associated Riccati equation can be well approximated. We make such an assumption mainly because there is a lack of temporal error analyses applicable to the current situation, and providing such a proof is out of the scope of this paper. We refer to [20] for related results on deterministic linear quadratic control problems and their corresponding Riccati equations.
In order to illustrate our theoretical results, we implement our method in MATLAB and perform a numerical experiment which shows the expected convergence orders. We also confirm that achieving the assumed convergence orders for the approximation of the Riccati equation is straightforward in practice.
The paper is organized as follows. In Section 2, we introduce the linear quadratic control problem constrained by the stochastic heat equation. We state the optimal controls and derive the resulting system describing the optimal heat distribution. Section 3 is devoted to the numerical scheme of the controlled stochastic heat equation and the Riccati equation. We also state the main result concerning the convergence order. In order to prepare for the proof of this theorem, we derive several auxiliary results on continuity, consistency and stability in Section 4. The proof of the main result then follows in Section 5. Finally, in Section 6, we discuss the implementation and illustrate the theoretical results through a numerical experiment.

A linear quadratic control problem constrained by the stochastic heat equation
Throughout this paper, let (Ω, F , P) be a complete probability space endowed with a filtration (F t ) t≥0 satisfying F t = s>t F s for all t ≥ 0 and F 0 contains all sets of F with P-measure 0. We use E to denote the expectation with respect to this probability space. Moreover, we assume that D ⊂ R n for n ≥ 1 is either a bounded domain with sufficiently smooth boundary ∂D or a bounded and convex domain. First, we introduce some basic notations and we state properties of operators frequently used in the remaining part. For s ≥ 0, let H s (D) denote the usual Sobolev space. We set H = L 2 (D) and let I denote the identity operator on H. The characterization of the domain results from existence and uniqueness results of the corresponding elliptic problem, see [13]. The operator A is the infinitesimal generator of an analytic semigroup e At t≥0 of contractions such that for λ > 0, fractional powers of λ − A denoted by (λ − A) α with α ∈ R are well defined. For more details in a more general framework, we refer to [25,30], but we have also collected the main properties which we need in Section 4.
For α ∈ R, the space D((λ − A) α ) equipped with the inner product becomes a Hilbert space. The corresponding norm is denoted by · α . In general, the domain of (λ − A) α for α ∈ (0, 1) can be expressed explicitly by interpolation of the spaces H and D(A), see [21]. In case that D is bounded with sufficiently smooth boundary, we have where we refer to [12]. We set H b = L 2 (∂D) and introduce the Neumann operator N : where λ > 0. If D is bounded with sufficiently smooth boundary, the result N ∈ L H b ; H 3/2 (D) was proven in [21]. In this case, we can therefore conclude that N ∈ L(H b ; D((λ − A) α )) for α ∈ (0, 3/4), which means that the operator (λ − A) α N is linear and bounded by the closed graph theorem. If D is instead bounded and convex, then D has a Lipschitz boundary and satisfies the cone property, see [13]. We therefore again obtain N ∈ L(H b ; D((λ − A) α )) for α ∈ (0, 3/4), see [18]. Next, we introduce the controlled stochastic heat equation with non-homogeneous Neumann boundary conditions as an evolution equation. Here, we include distributed and boundary controls as well as distributed and boundary noise. Let U contain all F t -adapted processes (u(t)) t∈[0,T ] with values in an arbitrary Hilbert spaceŪ satis- We consider the following controlled system in H for t ∈ [0, T ] and λ > 0: where (u(t)) t∈[0,T ] and (v(t)) t∈[0,T ] represent the distributed and the boundary controls. We assume that u ∈ U , B ∈ L(Ū ; H), and v ∈ V . The processes (W (t)) t≥0 and (W b (t)) t≥0 are independent and F t -adapted Q-Wiener processes with values in H and H b , respectively. The corresponding covariance operators are denoted by Q ∈ L(H) and Q b ∈ L(H b ). We make the following assumptions: The results shown in this section also holds for an F 0 -measurable initial value ξ ∈ L 2 (Ω; H). We make the additional regularity requirement due to the main result stated in the following section.

Definition 1.
A predictable process (y(t)) t∈[0,T ] with values in H is a mild solution of system (2) if sup and for all t ∈ [0, T ] and P-a.s.
For an existence and uniqueness result of a mild solution to system (2), we refer to [2]. Next, we introduce the cost functional J : where C ∈ L(H; Z) represents an observation operator mapping H into an arbitrary Hilbert space Z. The operators R ∈ L(H) and R b ∈ L(H b ) are given scaling factors for the costs of the controls and are assumed to be invertible. The aim is to find controls u ∈ U and v ∈ V such that The controls u ∈ U and v ∈ V are called optimal controls. In [1,2,6,8,9,16], similar control problems are considered with the result that the optimal controls satisfy a feedback law. We follow the same approach here, and therefore introduce the following Riccati equation in L(H): where H(t) = (λ − A) 1−α P(t), G = (λ − A) α N with α ∈ (1/2, 3/4). We make the following definition, where Σ(H) denotes the space of all symmetric operators on H and C([0, T ]; Σ(H)) is endowed with the topology of uniform convergence: Definition 2. The process (P(t)) t∈[0,T ] is a mild solution of (3) if • P(t)y ∈ D((λ − A) 1−α ) for every y ∈ H and all t ∈ [0, T ), • lim t→0 t 1−α (λ − A) 1−α P(t)y = 0 for every y ∈ H, and for all t ∈ [0, T ] and every y ∈ H In [4, Part IV], existence and uniqueness results of a mild solution to system (3) are shown for some special cases. The ideas of these proofs are easily adapted to the current situation, and therefore there exists a unique mild solution of system (3).
The optimal controls u ∈ U and v ∈ V satisfy a.e. on [0, T ] and P-a.s.
Plugging in these formulas in (2) results in the following controlled system in H:

A linear implicit Euler method for the finite element discretization
In this section, we introduce a fully discrete scheme for system (5). We denote by T h a triangulation of the domain D with meshwidth h ∈ (0, 1]. Let Y h ⊂ Y = D((λ − A) 1/2 ) be the set of continuous functions that are piecewise linear over T h . We introduce the for every y ∈ H and every z ∈ Y h . Then we have the following basic estimate for a constant K > 0 and every y ∈ D((λ − A) ρ/2 ) with ρ ∈ [0, 2], see [29]. Moreover, let R h : Y → Y h be the Y -projection given by for every y ∈ Y and every z ∈ Y h . We have the following relation between the L 2projection P h and the Y -projection R h : for every y ∈ D(A), see [22,Lemma 3.1]. We consider the following semi-discrete version of system (5) in Y h : As a consequence of inequality (6), we get for a constant K > 0 and every where C h = CP h . By definition, the operator A h is again the infinitesimal generator of an analytic semigroup (e A h t ) t≥0 on Y h such that fractional powers of λ − A h with λ > 0 are well defined. We can therefore introduce the solutions of system (8) and (10) in a mild sense analogously to Definitions 1 and 2. We note that the mild solution of system (10) coincides again with the weak solution according to Remark 2.
Next, let t 0 , t 1 , ..., t M be a partition of the time interval [0, T ] such that 0 = t 0 < t 1 < ... < T M = T . We assume that t m − t m−1 = ∆t for each m = 1, ..., M with ∆t ∈ (0, 1]. Applying a linear implicit Euler method to system (8) gives us the following fully discrete system in Y h for m = 1, ..., M : where . The operator P m h ∈ L(Y h ) results from a time discretization of system (10). We make the following assumption.
where c > 0 is a constant.
Remark 3. Note that we can at least write formally . Hence, the Assumption 3 provides especially the convergence rate for For some convergence results, we refer to [20]. Here, we will verify the convergence rates by a numerical experiment in Section 6.
We are now in a position to state the main result of the paper: be the mild solution of system (5) and let y m h satisfy the fully discrete system (11) for m = 0, 1, ..., M − 1. If Assumptions 1-3 are fulfilled, then there exists a constant c > 0 such that for sufficiently small ε > 0, The proof of this theorem will be provided in Section 5. In order to prepare, we will first collect and derive a number of lemmata in the following section.

Auxiliary results
We start by collecting some well-known properties of fractional powers of operators. For a proof, see e.g. [25,30]. (i) for α ≤ 0, the operator (λ−A) α is linear and bounded and for α > 0, the operator (λ − A) α is linear and closed; (vi) the operator (λ − A) α e At is linear and bounded for α > 0 and t > 0. Moreover, we have for every y ∈ H

Continuity of mild solutions to the controlled system
Next, we show some useful properties of the exact mild solution y to the controlled system (5). In the following, we use c > 0 as a generic constant, which may take different values at different points.
Proof. By definition, we get where , and .

Continuity of mild solutions to the Riccati equation
We also need similar continuity properties of the mild solution P to the Riccati equation (3) and the transformed version H = (λ − A) 1−α P. In the following, we use c > 0 as a generic constant that may changes from time to time.
Proof. Let y ∈ H. We set for all t ∈ [0, T ] Note that the operators J (t) and K(t) are linear and bounded. By definition, we get P(τ 2 )y − P(τ 1 )y H ≤ I 1 + I 2 , where I 1 = Recall that the operators H(t), G, R −1 b and C are linear and bounded for all t ∈ [0, T ]. Similarly to the above, we obtain Substituting the inequalities (23) and (24) in (22) yields the result.
Lemma 5. Let (H(t)) t∈[0,T ] be given by for α ∈ (1/2, 3/4), where (P(t)) t∈[0,T ] is the mild solution of system (3). Then there exists a constant c > 0 such that for all τ 1 , τ 2 ∈ [0, T ) with τ 1 < τ 2 and all γ ∈ (0, α), Proof. Let y ∈ H. We set for all t ∈ [0, T ] Note that the operators J (t) and K(t) are linear and bounded. By definition, we get where Recall that the semigroup (e At ) t≥0 is a contraction and that the operators P(t), B and R −1 are linear and bounded for all t ∈ [0, T ]. Lemma 1 (i) and (iv) -(vii) show that Since the operators H(t), G, R −1 b and C are linear and bounded for all t ∈ [0, T ], a very similar argument leads to Substituting the inequalities (26) and (27) in (25) yields the result.

Discretized solution operators
Finally, we collect some results that compare the spatially discretized solution operator e A h t to the exact solution operator e At , and to the fully discretized time stepping operator S h,∆t .

Proof of Theorem 1
After all the preparation in the previous section, we can now prove the main result.
Proof of Theorem 1. The mild solution of system (5) can be rewritten P-a.s.
By a change of variables, we obtain .
By a change of variables, we get .

Numerical experiments
In order to illustrate the proposed method and the bounds given in Theorem 1 we have implemented the algorithm in MATLAB 1 and performed a number of numerical experiments on a two-dimensional linear quadratic control problem with noise. We ran all the experiments on one node of the Mechthild computing cluster at the Max Planck Institute Magdeburg. Such a node consists of two Intel Xeon Skylake Silver 4110 processors with 8 cores/CPU, a clockrate of 2.1 GHz and 384 GB RAM.

Implementation
Let {φ h k } N h k=1 be the standard finite element basis of Y h , consisting of the piecewise linear so-called hat-functions. These take the value 1 at the k-th node of T h and are zero at all other nodes. Then for y h ∈ Y h we have y h = N h k=1 y k φ h k for some coefficients {y k } N h k=1 . Similarly, let the distributed noise P h G δW m with G = I and the boundary noise B b h δW m b be represented by the coefficient vectors δW m and δW m b , respectively. Using these representations in (11) and testing with φ h j shows that (11) is equivalent to for j, k = 1, . . . , N h . To simplify this, we introduce the mass matrix M , the stiffness matrix A, the distributed and boundary input matrices B and B b , the output matrix C and the weighting matrices R, R b and Q, satisfying Here, {φ U i }, {φ V i } and {φ Z i } denote orthonormal bases for the input and output spaces U ,V and Z, respectively. We omit the dependency on h to reduce notational clutter.
The matrices given above were all generated using the FreeFEM++ library 2 , see [15], and then imported to MATLAB. With these at hand, we can first rewrite (10) as the matrix-valued equation where P denotes the matrix representation of P h satisfying see, e.g., [24]. Further denote the coefficients at time t m by P m . Then we can rewrite (58) as an equation for the coefficients y m k as where S h,∆t = (M − ∆tA) −1 . Note the similarity to (11), with M taking on the role of the identity operator. Since (59) is matrix-valued, numerically approximating its solution for reasonably fine spatial discretizations is unfeasible unless we can utilize features such as lowrank structure. For this reason, we assume that the input operators are of the form R nu ∋ u → u 1 Ψ 1 + · · · + u nu Ψ nu with Ψ j ∈ H and n u ≪ N h . Similarly, we assume that the output operator C : H → R nz with n z ≪ N h . This means that B, B b and C are rectangular matrices, which typically leads to a solution P of low numerical rank. See e.g. [27] for supporting theory in the operator-valued setting. In order to approximate the solution, we apply the Strang splitting scheme [26] available in the MATLAB package DREsplit 3 . We note that Strang splitting is a second-order method, which means that we get a more accurate approximation in time than what we need according to Assumption 3. It is, however, essentially as cheap to apply as the corresponding first-order scheme, which is why we use it.
Generating the noise can be done in many ways. Since we only consider rectangular domains in our experiments, we compute samples of the distributed noise using FFT techniques as outlined in [23,Chapter 10]. In particular, we assume that the eigenvalues λ j,k and corresponding eigenvectors ϕ j,k of the covariance operator Q are given by λ j,k = (j 2 + k 2 ) −β−ǫ and ϕ j,k (x 1 , x 2 ) = cos(jπx 1 ) cos(kπx 2 ) with β = 1 and ǫ = 10 −4 . Then the increments δW m = W (t m ) − W (t m−1 ) are given by where ξ n j are the i.i.d. increments of N(0,1) Gaussian distribution [23]. This leads to noise satisfying Assumption 2. We note that the sum should actually go to infinity, and the truncation to (N + 1) 2 terms represents a discretization. We use N = N h in our experiments, which means that the truncation does not affect the convergence order [32].
A similar procedure could conceivably be followed for the boundary noise. However, we found it simpler to express the one-dimensional noise δW b,m on each of the edges as √ ∆t N k=0 λ k cos(kπx) with x ∈ [0, 1] and λ k = k −β−ǫ . Then the map N can be explicitly constructed by using the observation that the function satisfies d dx1 ρ = 0 at x 1 = 0 and x 1 = 1, d dx2 ρ = 0 at x 2 = 1 and d dx2 ρ = cos(kπx 1 ) at x 2 = 0. Further, it satisfies λρ = ∆ρ in the interior of the domain. The constructions for the other parts of the boundary are similar. Summing up the four parts gives then the solution of (1). We then computed the Ritz projections of these functions in FreeFem++ by solving R h ρ, φ Y = ρ, φ Y for φ ∈ Y h . Finally, the resulting coefficient vectors were multiplied with λM − A.
The latter construction was also used for the boundary input operator, by computing N applied to the constant function 1 on the boundary. This requires no further calculations, since it corresponds to the first eigenvector.

Test problem
For simplicity, we consider the problem on the unit square D = [0, 1] 2 . We let the distributed control operator B : R nu → L 2 (D) be defined by tapering off exponentially as we move away radially from p j . The locations of these points are illustrated in Figure 1. We note that B ∈ L(R nu , L 2 (D)). For this example, we picked n u = 9. For the boundary control, we consider a single boundary condition ∂ ∂ν y(t, x) = v with v ∈ R. As output we take the operator Cy = 10 2 D y(x)χ T1 (x) + · · · + y(x)χ Tn z (x) dx, where χ S denotes the characteristic function of the set S and T j denote different areas, illustrated in Figure 1. Thus we attempt to control the mean value of the solution in these areas. We note that C ∈ L(H, R nz ). Here, n z = 3. Finally, we use a diffusion coefficient of 10 −2 , λ = 1, and the scaling factors R = 10 −2 and R b = 25. The latter was chosen such that the distributed and boundary controls influence the solution to a similar extent.

Results
We first verify Assumption 3 by computing the errors is shown in Figure 2 (left), and shows clear second-order temporal convergence, as expected. We then choose ∆t = 2 9 and take h = 2 j , j = 1, . . . , 6, with the reference solution P ref h having the same ∆t and h = 2 7 . The result is shown in Figure 2 (right) and also demonstrates second-order spatial convergence except for the first few coarse discretizations.
Next, we check Theorem 1. By choosing h = ∆t 2 , the expected error is O ∆t 1/4 and there is only one parameter to adjust. We therefore choose h = 2 −2j and ∆t = 2 −j for j = 1, . . . , 6, and compute a reference solution with j = 7. We start by computing the noise for the finest discretization first. Then for each coarser discretization, we add up the temporal increments and compute the L 2 -projection onto the coarser space. In this way we use the same noise for all the discretizations of each of the 100 sample paths. The resulting errors measured at t = T are shown in Figure 3, both for the controlled system and for the corresponding uncontrolled system where b = v = 0. We can observe that they decrease with a rate which is decidedly less than 1/2 and close to 1/4. Since our theoretical bound is for the worst-case situation, this is fully in line with Theorem 1.

Conclusions
We have proved convergence with optimal orders of a numerical scheme for an optimal control problem with both distributed and boundary control, as well as distributed and boundary Q-Wiener noise. Due to the irregularity of the noise, we can expect at most order 1/4 in time and order 1/2 in space. A numerical experiment confirms that this bound is optimal.