On the numerical solution of a T-Sylvester type matrix equation arising in the control of stochastic partial differential equations

We outline the derivation of a nonlinear system of equations whose solution gives the entries of an m × N matrix K, given the eigenvalues of a matrix D, a diagonal N × N matrix A and an N × m matrix B. These matrices are related through the matrix equation D = 2A + BK + K^tB^t, which is sometimes called a T-Sylvester equation. The need to prescribe the eigenvalues of the matrix D is motivated by the control of the surface roughness of certain nonlinear SPDEs (e.g., the stochastic Kuramoto-Sivashinsky equation) using nontrivial controls. We implement the methodology to solve the nonlinear system numerically for various test cases, including matrices related to the control of the stochastic Kuramoto-Sivashinsky equation and randomly generated matrices. We study the effect of increasing the dimensions of the system and of changing the size of the matrices B and K (which corresponds to using more or fewer controls), and find good convergence of the solutions.


Introduction
We present an algorithm for finding the entries of an m × N matrix K related to the matrices D, B and A by

D = 2A + BK + K^tB^t, (1.1)

where t denotes the matrix transpose, the N × N matrix D has prescribed eigenvalues {µ_i}_{1≤i≤N}, A is a given diagonal N × N matrix and B is a given N × m matrix, with m < N. This is a problem that arises when applying linear, point-actuated feedback controls to linear stochastic partial differential equations (SPDEs), as will be shown below.
The motivation for developing this algorithm is an application in control theory of nonlinear SPDEs such as the stochastic Kuramoto-Sivashinsky (sKS) equation,

u_t = −νu_xxxx − u_xx − uu_x + σξ(x,t), (1.2)

which models the interface of systems that are part of surface roughening processes, such as ion-sputtering processes or fluid flows influenced by thermal noise. The equation is considered in the periodic domain x ∈ (0, 2π), ν is an instability parameter, ξ(x,t) is a Gaussian noise with mean zero and a given covariance, and σ > 0 is its strength. We note that when ν < 1, there are positive eigenvalues associated with the eigenfunctions sin(kx), cos(kx) of the linear operator −ν∂_xxxx − ∂_xx for k < 1/√ν.
In surface roughening processes, it is of particular interest to monitor and control the surface roughness of the interface, which is measured by the L²-norm of its fluctuations around a steady state ū (usually ū(x,t) = 0). The control of the sKS equation using nonlinear distributed feedback controls, i.e., controls that are applied everywhere in the physical domain and depend nonlinearly on the solution at every time step, was studied by Lou, Christofides and collaborators (see, e.g., Lou (2005)). There, the authors essentially add controls that are proportional to νu_xxxx + u_xx + uu_x and modify the dynamics so that the SPDE becomes linear. Recently, Gomes et al. (2017) focussed on the case where the feedback controls are linear and point-actuated. In this case, the authors use a splitting method, which allows them to separate the SPDE into a nonlinear deterministic PDE (1.3) with random coefficients for a component v, coupled to a linear SPDE

w_t = −νw_xxxx − w_xx + σξ(x,t). (1.4)

In this paper, we focus on equation (1.4). This is because, with the appropriate choice of controls applied to the deterministic equation (1.3), one can guarantee that v = 0, and therefore we can control the surface roughness of the solution u to (1.2) by controlling the surface roughness of w. The discretisation of this problem leads to the necessity of solving equation (1.1), as will be shown in Section 2.

This paper is organised as follows. In Section 2 we outline the derivation of equation (1.1), give an overview of the relevant literature, and explain why the existing static output feedback control and Sylvester equation solvers are not sufficient to solve the problem. Section 3 is concerned with the derivation of the system of nonlinear equations for the entries of K, and in Section 4 we solve this system of equations using standard optimisation techniques, both for randomly generated matrices and for the linear sKS equation.
We close with conclusions and further work motivated by our algorithm in Section 5.
We consider the controlled linear sKS equation

w_t = −νw_xxxx − w_xx + Σ_{i=1}^m b_i(x) f_i(t) + σξ(x,t) (2.1)

in the domain x ∈ [0, 2π], with periodic boundary conditions and initial condition w(x, 0) = w₀(x). Furthermore, we assume that the solution w has mean zero. Here, ν is the instability parameter, m is the number of controls, b_i(x), i = 1, …, m, are the point-actuator functions (we consider b_i(x) = δ(x − x_i), where δ(·) is the Dirac delta function) and f_i(t) are the control amplitudes, which we choose to be a linear function of the solution w (more precisely, of its Fourier coefficients, as will become clear below). When we write the linear sKS equation in Fourier space, the linear operator −ν∂_xxxx − ∂_xx is represented by the matrix A = diag(a), where a_{2k−1} = a_{2k} = −νk⁴ + k² for k = 1, 2, …, and the control actuators are represented by the matrix B, whose i-th column contains the Fourier coefficients of b_i(x). We truncate the system at N Fourier modes; noting that {sin(jx), cos(jx)}_{j∈ℕ} is a basis of L²_p(0, 2π), the space of periodic functions of zero mean in (0, 2π), we expect that, for N large enough, the discretised system is a good approximation to the continuous one.
Denoting by w = [w_1, …, w_N]^t and Ξ = [ξ_1, …, ξ_N]^t the vectors of Fourier coefficients of w and ξ respectively, the controlled system can be discretised in the form w_t = Aw + BF + σΞ, where we define the controls as a linear function of the Fourier coefficients of w, F = Kw. In summary, we need to solve the linear stochastic differential equation

w_t = (A + BK)w + σΞ. (2.2)

We note that, due to the nature of the matrix B, this system is fully coupled, which is the source of the difficulty in solving this problem.
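As an illustration of this discretisation, the truncated matrices A and B can be assembled as follows. This is a Python sketch; the helper name sks_matrices and the 1/π normalisation of the actuator coefficients are our assumptions, not taken from the text.

```python
import numpy as np

def sks_matrices(N, nu, actuator_positions):
    """Build the truncated linear sKS operator A = diag(a), with
    a_{2k-1} = a_{2k} = -nu*k^4 + k^2, and the actuator matrix B whose
    i-th column holds the (sin, cos) Fourier coefficients of delta(x - x_i).
    The 1/pi normalisation of B is an assumption of this sketch."""
    assert N % 2 == 0
    k = np.arange(1, N // 2 + 1)
    a = np.repeat(-nu * k**4 + k**2, 2)      # a_1 = a_2, a_3 = a_4, ...
    A = np.diag(a)
    B = np.zeros((N, len(actuator_positions)))
    for i, xi in enumerate(actuator_positions):
        B[0::2, i] = np.sin(k * xi) / np.pi  # sin(jx) coefficients of delta(x - x_i)
        B[1::2, i] = np.cos(k * xi) / np.pi  # cos(jx) coefficients of delta(x - x_i)
    return A, B

A, B = sks_matrices(8, 0.5, [np.pi / 3, 5 * np.pi / 3])
# With nu = 0.5 only the k = 1 pair of modes is unstable: a_1 = a_2 = 0.5
print(np.diag(A))
```

Note that B has no zero columns for generic actuator positions, which is what couples all the Fourier modes in the closed-loop system.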
As noted above, the goal of Gomes et al. (2017) is to control the expected value of the surface roughness of the solution w to the linear sKS equation. The surface roughness is defined as

r(t) = ( (1/2π) ∫₀^{2π} (w(x,t) − w̄(t))² dx )^{1/2},
where w̄(t) = (1/2π) ∫₀^{2π} w(x,t) dx (which is zero in our case). This quantity is known to saturate for large times (Barabasi and Stanley, 1995), and we wish to control its saturated value. We can control the long-time behaviour of the surface roughness by controlling the expected value of its square. Considering the Fourier series representation of w, and denoting the expected value of a function by ⟨·⟩, we can write this quantity (up to the normalisation of the basis functions) as Σ_{k=1}^N ⟨w_k(t)²⟩, which, in turn, is the trace of the covariance matrix of the solution w, Q(t), defined as Q_ij(t) = ⟨w_i(t)w_j(t)⟩.

For a decoupled system (i.e., if B and K are diagonal matrices), the solution of this problem is straightforward (see Gomes et al. (2017) or Hu et al. (2008)). However, when the matrix B is full, the closed-loop system is not decoupled, and therefore standard ways of obtaining the second moments of the coefficients are not applicable here. Instead (see Gomes et al. (2017) for more details on this derivation), we use Theorem 4 in Jimenez (2015) to express the covariance matrix Q(t) of the solution w to w_t = Cw + σΞ in terms of F₁ and H₁, the (1, 1) and (1, 3) blocks of the matrix e^{Mt}, respectively (the matrix M is built from C, the N × N identity matrix I and appropriately sized zero blocks). We compute e^{Mt}, conclude that F₁ = e^{Ct}, and from there obtain an expression for Q(t).

The problem now is that, due to the nature of the matrix B, C is not normal, i.e., it does not commute with its transpose. However, we are not interested in controlling the full matrix Q(t), but only its trace. Making use of the linearity and continuity of the trace, as well as identities of the type tr(C² + 2CC^t + (C^t)²) = tr((C + C^t)²), we obtain

tr(Q(t)) = σ²( tr(I)t + tr(C + C^t) t²/2 + tr((C + C^t)²) t³/3! + tr((C + C^t)³) t⁴/4! + tr((C + C^t)⁴) t⁵/5! + ⋯ ). (2.5)
We assume that C + C^t is invertible, so that we can multiply by I = (C + C^t)^{-1}(C + C^t) and add and subtract adequate terms to obtain

tr(Q(t)) = −σ² tr((C + C^t)^{-1}) + σ² tr( (C + C^t)^{-1} Σ_{n≥0} (C + C^t)^n t^n/n! ). (2.6)

Moreover, if we guarantee that all of the eigenvalues of D = C + C^t are negative, then, letting t → ∞, we obtain the saturated value of the squared surface roughness,

r_d² = −σ² tr(D^{-1}).

REMARK 2.1 We note that Equation (2.2) is a multidimensional Ornstein-Uhlenbeck process, and therefore we could use an alternative method for computing its covariance matrix by applying the Itô isometry, obtaining

Q(t) = ∫₀^t e^{Cs} σσ^t e^{C^t s} ds. (2.7)

Using the fact that σ is scalar and the dominated convergence theorem would give us the Taylor expansion (2.5) directly. Furthermore, taking the limit t → ∞ in (2.7), it follows that Q = lim_{t→∞} Q(t) solves the Lyapunov equation CQ + QC^t + σ²I = 0, from where we can also deduce the expression for r_d².

Hence, we need to find a matrix K so that the trace of the matrix D^{-1} is prescribed. This can be achieved by controlling the eigenvalues of D. Using the definition of D, we observe that we wish to solve the matrix problem (1.1). As far as we are aware, there is no known solution to this equation. The usual pole placement (Kautsky et al., 1985; Yang & Orsi, 2006) or LQR (Zabczyk, 1992) algorithms well known in control theory can only either stabilise the matrix C, i.e., make all of its eigenvalues negative, or prescribe the eigenvalues of C. These are not useful here, since prescribing the eigenvalues of C gives us, at best, a loose estimate on the eigenvalues of D.
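The Itô-isometry representation (2.7) and the stationary Lyapunov equation of Remark 2.1 can be cross-checked numerically. A minimal Python sketch on a small stable, non-normal test matrix (the choice of C, σ and the quadrature parameters is ours, and the simple Riemann sum limits the attainable accuracy):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(0)
N, sigma = 4, 0.7
C = -2.0 * np.eye(N) + 0.3 * rng.standard_normal((N, N))   # stable, non-normal

# Stationary covariance from the Lyapunov equation C Q + Q C^t + sigma^2 I = 0
Q_lyap = solve_continuous_lyapunov(C, -sigma**2 * np.eye(N))

# The same object from (2.7): Q = sigma^2 * int_0^inf e^{Cs} e^{C^t s} ds
s_grid, ds = np.linspace(0.0, 30.0, 3001, retstep=True)
Q_quad = sigma**2 * sum(expm(C * s) @ expm(C.T * s) for s in s_grid) * ds

print(np.max(np.abs(Q_lyap - Q_quad)))   # small, limited by the crude quadrature
```

The agreement of the two computations is exactly the content of the remark: both are the stationary covariance of the Ornstein-Uhlenbeck process (2.2).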
Another possibility would be to use algorithms for static output feedback pole placement for symmetric systems, where the matrix K is constrained to be of the form SB^t for a symmetric m × m matrix S. These algorithms are designed to prescribe the eigenvalues of C = A + BK; since C is then symmetric, this also prescribes the eigenvalues of D = 2(A + BSB^t). Algorithms for the numerical solution of this problem were proposed in Mahony et al. (1993) and Yang & Orsi (2006); our methodology is more general in that it does not require C to be symmetric. Mahony et al. (1993) reformulate the problem as a least squares problem and derive two gradient flow solutions, based on the solution of ODEs for the matrix S and a gradient flow Θ. The equilibrium solutions of this system of ODEs (which are shown to exist) define a minimiser of the least squares problem. In Yang & Orsi (2006), the authors use an iterative algorithm based on alternating projection ideas. Their method applies to the general problem, and considers several variants, namely classic pole placement (where the eigenvalues of C are prescribed), stabilisation (where the only requirement is that the eigenvalues of C have negative real part), relaxed pole placement (where the eigenvalues are required to lie in a disc with prescribed centre and radius), and hybrid problems (where the eigenvalues are required to belong to some prescribed region of the complex plane). This methodology is not guaranteed to find a solution, but numerical experiments show that it is effective in practice.
There are many other control-related matrix equation problems which have been posed and solved. One example is the Sylvester equation (Sylvester, 1884), where we need to find the matrix X solving

AX + XB = C (2.8)

for given matrices A, B and C. A further example is the famous Lyapunov equation,

AX + XA^t = C, (2.9)

for which an algorithm was given by Bartels & Stewart (1972). There are well known sufficient conditions for the existence of solutions of these equations. None of these, however, can be adapted to solve equation (1.1).

An equation of the form

AX + X^tB = C (2.10)

is sometimes called a T-Sylvester equation. The solution of this equation with C = 0 was recently studied in De Terán et al. (2013), where the authors describe how to find the general solution for arbitrary complex matrices A ∈ C^{m×n} and B ∈ C^{n×m}. They show that the solution can be obtained in terms of the matrix pencil A + λB* and matrices related to the pencil's Kronecker canonical form, and they give a complete description of the solution as long as these matrices are known. However, the development of numerical methods for finding the general solution of AX + X*B = 0, and for determining when AX + X*B = C is consistent and computing its general solution, is still an open problem.

Braden (1998) gives necessary and sufficient conditions for the equation

A^tX ± X^tA = B (2.11)

to admit solutions, and gives the general solution for X, where B is a given arbitrary square matrix. In that work, A need not be square or invertible, and therefore the method only uses pseudo-inverses (i.e., matrices G such that AGA = A). The author proves that Equation (2.11) has solutions if and only if B^t = ±B and (1 − P₁^t)B(1 − P₁) = 0, where P₁ is a projection operator, and gives a general solution in terms of B, P₁, A, G and arbitrary matrices satisfying certain symmetry requirements.
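For context, the classical Sylvester equation of this family is routinely solved by the Bartels-Stewart algorithm; a minimal sketch using SciPy's implementation (random test matrices, generically solvable since the spectra of A and −B are disjoint):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# Bartels-Stewart solves A X + X B = C via Schur decompositions of A and B
X = solve_sylvester(A, B, C)
print(np.allclose(A @ X + X @ B, C))     # True
```

No analogous off-the-shelf solver exists for (1.1), which is precisely the gap this paper addresses.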
Equation (1.1) differs from (2.10) and (2.11) in that it has an extra term, the matrix D, whose entries are not specified; the only requirement is that it has certain eigenvalues. It is, therefore, an additional unknown in the problem, which makes our problem more general. If one decides to fix D such that D − 2A is symmetric (or antisymmetric) and satisfies the second condition of (Braden, 1998, Thm. 1), then one can use the general solution derived there. However, this would require the solution of yet another matrix equation for D.

The nonlinear system of equations for the entries of K
In this section, we will derive a system of nonlinear equations for the entries of the m × N matrix K, given the N × N diagonal matrix A = Diag((a_i)_{1≤i≤N}), the eigenvalues {µ_i}_{1≤i≤N} of the matrix D, and the entries of the N × m matrix B. We first write this system of equations, distinguishing the cases m = 1 and m > 1, and then proceed to the derivation of the equations.
In what follows, we use the notation det J (M) for the diagonal minor of the matrix M corresponding to the entries in the subset J ⊆ {1, · · · , N}, i.e., the determinant of the submatrix of M formed from the rows and columns corresponding to the index set J.
For the case m = 1, we have to solve the system of nonlinear equations (3.1). Here, for convenience, we have denoted the entries of the column vector B by b_q and the entries of the row vector K by k_q. For the case m ≥ 2, we have the system of equations (3.2).

The key idea behind the derivation
We wish to prescribe only the eigenvalues of D, and therefore we need only the information provided by the characteristic polynomial of D, χ_D(t). We have that χ_D(t) = Π_{i=1}^N (t − µ_i). In general, the characteristic polynomial of a matrix M can be expressed in terms of the sums σ_l over all diagonal minors of size l:

χ_M(t) = Σ_{l=0}^N (−1)^l σ_l t^{N−l}, with σ₀ = 1.

Therefore, we need to calculate the diagonal minors of 2A + BK + K^tB^t. For the individual subcases this is done in the following subsections. Once the diagonal minors are calculated, matching the coefficients of the two expressions for χ_D gives the set of N equations

Σ_{J: |J|=l} det_J(2A + BK + K^tB^t) = e_l(µ₁, …, µ_N), l = 1, …, N,

where det_J denotes the diagonal minor of the matrix formed from the rows and columns corresponding to the index set J, and e_l denotes the l-th elementary symmetric polynomial of the prescribed eigenvalues.
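The identity behind this step, expressing the characteristic polynomial through sums of diagonal minors, can be verified numerically; a small Python sketch (the helper name sigma_l is ours):

```python
import numpy as np
from itertools import combinations

def sigma_l(M, l):
    """Sum of all l x l diagonal (principal) minors of M; sigma_0 = 1."""
    n = M.shape[0]
    return sum(np.linalg.det(M[np.ix_(J, J)]) for J in combinations(range(n), l))

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))

# det(tI - M) = t^4 - sigma_1 t^3 + sigma_2 t^2 - sigma_3 t + sigma_4
coeffs = [(-1)**l * sigma_l(M, l) for l in range(5)]
print(np.allclose(coeffs, np.poly(M)))   # np.poly gives the same coefficients; True
```

For l = 1 this reduces to the trace, which is why the |J| = 1 equation of the resulting system prescribes exactly the trace of D.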

The key Lemmas
The nonlinear system of equations in terms of the entries of K is derived using two key lemmas; for each lemma, there is a special formula for the m = 1 case. The matrix determinant lemma can be found in Press et al. (1972)[p. 73] and is applied to the Google matrix for a more specialised problem in Ding & Zhou (2007).

LEMMA 3.1 (MATRIX DETERMINANT LEMMA) For an N × N invertible matrix A, an N × m matrix B and an m × N matrix K,

det(A + BK) = det(A) det(I_m + KA^{-1}B). (3.10)

In the case m = 1, the above lemma can be written in terms of column vectors u and v, without the m × m determinant:

det(A + uv^t) = det(A)(1 + v^tA^{-1}u). (3.11)

If A is not invertible, then we have the following version:

det(A + uv^t) = det(A) + v^t adj(A) u, (3.12)

where adj(A) denotes the adjugate of A. We have a slightly complicated inverse to calculate in our particular application, and so we employ the Woodbury identity (Woodbury, 1950).
LEMMA 3.2 (WOODBURY MATRIX IDENTITY) For an N × N invertible matrix A, an N × m matrix B and an m × N matrix K, we have the following identity:

(A + BK)^{-1} = A^{-1} − A^{-1}B(I_m + KA^{-1}B)^{-1}KA^{-1}, (3.13)

as long as I_m + KA^{-1}B is invertible.
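Both Lemma 3.1 and Lemma 3.2 are easy to sanity-check numerically; a Python sketch with random matrices (the dimensions and the diagonal choice of A mirror our setting but are otherwise arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 5, 2
A = np.diag(rng.uniform(1.0, 2.0, N))   # invertible (here diagonal, as in our setting)
B = rng.standard_normal((N, m))
K = rng.standard_normal((m, N))

Ainv = np.linalg.inv(A)
small = np.eye(m) + K @ Ainv @ B        # the m x m matrix required to be invertible

# Lemma 3.1: det(A + BK) = det(A) det(I_m + K A^{-1} B)
print(np.isclose(np.linalg.det(A + B @ K), np.linalg.det(A) * np.linalg.det(small)))

# Lemma 3.2: (A + BK)^{-1} = A^{-1} - A^{-1} B (I_m + K A^{-1} B)^{-1} K A^{-1}
woodbury = Ainv - Ainv @ B @ np.linalg.inv(small) @ K @ Ainv
print(np.allclose(woodbury, np.linalg.inv(A + B @ K)))
```

Both identities reduce N × N computations to m × m ones, which is what makes the derivation below tractable for small m.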
There is a specialisation to rank-one updates of inverse matrices, called the Sherman-Morrison formula (Sherman & Morrison, 1950); Hager (1989) notes that this formula appeared earlier in the literature.

COROLLARY 3.2 (SHERMAN-MORRISON FORMULA) Let A be an N × N invertible matrix and u and v be N × 1 vectors, and suppose that 1 + v^tA^{-1}u ≠ 0. Then we have the following expression for the rank-one perturbation of the inverse of the matrix A:

(A + uv^t)^{-1} = A^{-1} − (A^{-1}uv^tA^{-1}) / (1 + v^tA^{-1}u). (3.14)

We extend these lemmas appropriately for the purposes of our problem.
Proof. We apply the Matrix Determinant Lemma (Lemma 3.1) to det(2A + BK + K^tB^t), factorising out 2A + BK:

det(2A + BK + K^tB^t) = det(2A + BK) det(I_m + B^t(2A + BK)^{-1}K^t). (3.16)

We then apply the Matrix Determinant Lemma (Lemma 3.1) to the first determinant, to take out a factor of det(2A):

det(2A + BK) = det(2A) det(I_m + K(2A)^{-1}B). (3.17)

We now have the first two factors, and we need to use the Woodbury Matrix Identity (Lemma 3.2) to expand the inverse in the third determinant around 2A:

(2A + BK)^{-1} = (2A)^{-1} − (2A)^{-1}B(I_m + K(2A)^{-1}B)^{-1}K(2A)^{-1}. (3.18)

We then combine (3.18) with (3.17) to obtain the required expression.
In the rank one case, we may use the Sherman Morrison formula and obtain a simple corollary.
Proof. This identity follows from repeated application of Lemma 3.1, as follows:

det(2A + uv^t + vu^t) = det(2A + uv^t) (1 + u^t(2A + uv^t)^{-1}v). (3.20)

We then apply the Sherman-Morrison formula to the last bracket. We note that v^t(2A)^{-1}u = u^t(2A)^{-1}v, since (2A)^{-1} is symmetric and the transpose of a scalar is the same scalar. Multiplying through by the second factor, we obtain the expression

det(2A + uv^t + vu^t) = det(2A) ( (1 + v^t(2A)^{-1}u)² − (u^t(2A)^{-1}u)(v^t(2A)^{-1}v) ).

Derivation for the case m = 1

We use vector notation for B and K, denoting them by u and v^t respectively, to emphasise that we are dealing with a column vector and a row vector. In this case, we may rewrite (1.1) as

D = 2A + uv^t + vu^t.

To calculate det_J(2A + uv^t + vu^t), we use Corollary 3.3, with all matrix products taken with indices restricted to the set J. Multiplying out the resulting expression and writing it in component form, then summing over all index sets of size k to obtain σ_k, with the identification B = u and K = v^t, yields the expression (3.1).
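The rank-two determinant identity used in this derivation, det(2A + uv^t + vu^t) = det(2A)((1 + v^t(2A)^{-1}u)² − (u^t(2A)^{-1}u)(v^t(2A)^{-1}v)), as we read Corollary 3.3, can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
A2 = np.diag(rng.uniform(1.0, 3.0, N))   # plays the role of 2A (diagonal, symmetric)
u = rng.standard_normal(N)               # u = B, a column vector
v = rng.standard_normal(N)               # v^t = K, a row vector

Ainv = np.linalg.inv(A2)
alpha = v @ Ainv @ u                     # equals u^t (2A)^{-1} v by symmetry
lhs = np.linalg.det(A2 + np.outer(u, v) + np.outer(v, u))
rhs = np.linalg.det(A2) * ((1 + alpha)**2 - (u @ Ainv @ u) * (v @ Ainv @ v))
print(np.isclose(lhs, rhs))              # True
```

Because 2A is diagonal, the same identity applies verbatim to every principal submatrix indexed by J, which is exactly how it enters the computation of det_J.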

The case m ≥ 2
To derive the nonlinear system (3.2), we need to calculate det_J(2A + BK + K^tB^t), and so we use the Symmetric Extension of the Matrix Determinant Lemma (Lemma 3.3), where the indices of the matrix multiplications inside the resulting determinants run over j ∈ J.
Recalling the definition of η^{(J)}_{ij}, we realise that the second determinant is precisely ∆^{(m,J)} and that the first two terms in the third determinant are δ_{ij} + η^{(J)}_{ji}, when written in index form. It is also evident from the definitions of β

Implementation and numerical results
In order to solve the matrix equation (1.1), we need to solve a nonlinear system of equations for the entries of the m × N matrix K. When m = 1, system (3.1) consists of N equations for N unknowns, in which case we can use a standard Newton solver in MATLAB. For m > 1, system (3.2) has N equations for mN unknowns and is therefore underdetermined. In this case the system may have no solutions, one solution, multiple solutions, or even infinitely many; we note that in the m = 1 case it is also possible that no solution exists. We use MATLAB's nonlinear solver to solve the desired system of equations; for under- and overdetermined systems, MATLAB uses the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963). This algorithm solves nonlinear least squares problems by interpolating between the Gauss-Newton and gradient descent methods. It is more robust than Gauss-Newton by itself, so that in most cases a solution will be found even if the starting point is far from a minimum. However, the solution found may be a local minimum, and not necessarily a global one.

We tested the efficiency of the algorithm described here for both m = 1 and m > 1, for various values of N and different matrices A and B. We define two types of error for this problem. The first, E₁, measures how far the obtained eigenvalues µ_k are from the prescribed ones µ̄_k. E₁ is small if we find a solution to the system of nonlinear equations. However, since solutions may not exist, we also consider a second type of error, E₂, which measures how far the obtained trace is from the desired trace Σ_{k=1}^N µ̄_k. We expect E₂ to be small independently of whether the system has an exact solution, since one of the N equations (the one for |J| = 1) corresponds precisely to E₂ being zero.
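The structure of this computation can be illustrated with an open-source analogue of MATLAB's solver. The sketch below uses SciPy's Levenberg-Marquardt to match the eigenvalues of D = 2A + BK + K^tB^t directly; this is a simplified stand-in for solving system (3.1), not our actual implementation, and the target eigenvalues are constructed from a known K so that a solution certainly exists:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(4)
N, m = 4, 1
A = np.diag(rng.standard_normal(N))
B = rng.standard_normal((N, m))

# Build an attainable target: prescribe the spectrum of D(K_true) for some K_true
K_true = 0.2 * rng.standard_normal((m, N))
D_true = 2 * A + B @ K_true + K_true.T @ B.T
mu_target = np.linalg.eigvalsh(D_true)          # prescribed eigenvalues, sorted

def residual(k_flat):
    K = k_flat.reshape(m, N)
    D = 2 * A + B @ K + K.T @ B.T               # D is symmetric by construction
    return np.linalg.eigvalsh(D) - mu_target    # eigvalsh returns sorted values

# Levenberg-Marquardt, the method MATLAB also uses for such systems
sol = least_squares(residual, np.zeros(m * N), method='lm')
print(np.max(np.abs(residual(sol.x))))          # residual of the recovered K
```

In this toy formulation the maximum absolute residual plays the role of E₁; the recovered K need not equal K_true, since any K producing the same spectrum of D is an equally valid solution.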

Control of the stochastic Kuramoto-Sivashinsky equation
We first apply our algorithm to our motivating example, the linear sKS equation. In this application, we wish to keep the negative eigenvalues and move the positive ones to a value that yields the prescribed trace for the matrix D. For the purpose of this example, we choose (4.1). We used ν = 0.5, for which the matrix A has two positive entries. In this case it makes sense to use m = 2, and we chose x₁ = π/3 and x₂ = 5π/3. The convergence results for the sKS equation are summarised in Table 1, where we report the running time, number of iterations, function evaluation count and the errors E₁ and E₂ as a function of N, the size of the system. We note that N is even, due to the nature of the eigenvalues of the original problem.

Control of randomly generated matrices
To test the validity of our algorithm in more general cases, we generate a vector a with N randomly distributed entries, following a Gaussian law with mean zero and variance 1. We define the initial eigenvalues µ⁰ = 2a and change 1, 2, …, 5 of its elements to obtain the µ that we require. We then use m = 1, 2, …, 5, respectively, to obtain the matrix K that gives us the desired eigenvalues. The randomly generated matrix B and vector a used in Tables 2-9 are specified below.
We then extract, for each N, the first N entries of a and the first N rows of B to generate the matrices A and B, respectively; in the case of the matrix B, we extract its first m columns as well. Finally, the initial guess for the matrix K is the zero matrix with ones in the positions k_jj, j = 1, …, m. We limit the function count to 10000.
For the case m = 1, we tried two different values for the desired eigenvalue: µ₁ = 1.5 and µ₁ = 0.5. Our results for m = 1 are presented in Tables 2-3. We observe that when we chose other values for the desired µ₁, e.g. µ₁ = 2, the algorithm does not provide a good solution, although the results for this value of µ₁ improved with increasing N. Nevertheless, values closer to the initial value of µ₁ yield better results. This is in agreement with the fact that a solution to this problem might not always exist; conditions for the existence of solutions are the object of current work.

Table 2: Results of the numerical method for the solution of the general case with m = 1, when we chose to change µ₁ to 1.5.
Another interesting test for our method is the case where we are willing to use more than one control to change only one eigenvalue. We tested our algorithm for N = 10, m = 2, 3, 4, 5, and µ₁ = 1.5, 0.5.
We observe that for the case m = 5 there is no point in using N = 4, since there would not be 5 eigenvalues to change. It appears that, as we increase the number of eigenvalues to change (and/or the number of controls used), the method benefits from larger values of N. However, increasing N much beyond 10 makes the convergence of the nonlinear solver much slower. The results obtained are, however, satisfactory for the application that motivated our work.
The final test case uses fewer controls than the number of eigenvalues we wish to change. It is known in control theory that, depending on the properties of A and B, it is even possible to shift the whole spectrum with only one scalar control (for example, when A is an N × N Jordan block and B = e_N, the N-th unit basis vector, the pair (A, B) is controllable by the Hautus criterion, so that by the pole placement theorem the eigenvalues of A + BK can be arbitrarily assigned by choosing a suitable K). Therefore, we investigate whether this is possible with our algorithm. We choose the last case presented above, i.e., we change the first 5 eigenvalues of 2A to the values prescribed before, and as an example we use m = 3 controls. Our results are presented in Table 9 below.
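The single-control example in parentheses can be made concrete with Ackermann's formula, the textbook construction of K for a controllable single-input pair; a Python sketch (the helper name ackermann is ours, and note this places the eigenvalues of C = A + BK, not of D):

```python
import numpy as np

def ackermann(A, b, poles):
    """Single-input pole placement: with F = e_N^t Ctrb(A,b)^{-1} phi(A),
    where phi is the desired characteristic polynomial, A - b F has the
    prescribed poles; we return k = -F so that A + b k^t has them."""
    N = A.shape[0]
    ctrb = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(N)])
    phi = np.poly(poles)                        # coefficients of prod (s - p_i)
    phiA = sum(c * np.linalg.matrix_power(A, N - i) for i, c in enumerate(phi))
    e_last = np.zeros(N); e_last[-1] = 1.0
    return -e_last @ np.linalg.solve(ctrb, phiA)

# A single N x N Jordan block with b = e_N is controllable (Hautus criterion),
# so one scalar control can place the whole spectrum.
N = 4
A = np.eye(N) + np.diag(np.ones(N - 1), 1)      # Jordan block with eigenvalue 1
b = np.zeros(N); b[-1] = 1.0
poles = np.array([-1.0, -2.0, -3.0, -4.0])
k = ackermann(A, b, poles)
shifted = np.sort(np.linalg.eigvals(A + np.outer(b, k)).real)
print(np.allclose(shifted, np.sort(poles)))     # True
```

This shifts the spectrum of C = A + BK; as discussed in Section 2, prescribing the eigenvalues of C gives only a loose estimate on those of D, which is why our algorithm targets D directly.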

Analysis, Conclusions and Further Work
We have outlined a method for finding a matrix K such that the matrix D = 2A + BK + K^tB^t has prescribed eigenvalues, given an N × N diagonal matrix A and an N × m full matrix B. The motivation was a particular application in the linear control theory of SPDEs. We demonstrate the performance of the algorithm by showing its good convergence in the two metrics used to measure the error. We note that, although we were only interested in the case where the matrix A is diagonal, the method presented here is expected to work for full matrices A, as long as these matrices are symmetric (see Corollary 3.3).
We present the results of the application of our method to two different examples, for various values of N and small values of m, which were sufficient for the application we considered. In the sKS example, we observe good convergence of the algorithm in both error measures, at least for small N. This is explained by the fact that the matrix B has a regular structure and A has repeated eigenvalues. We observe that the algorithm converges faster when the distance between the diagonal entries of 2A and the required eigenvalues is small; difficulties may arise when the desired perturbation is too large, an observation that still needs to be quantified.
We focus mostly on using the same number of controls as eigenvalues to change, due to the application that motivated us: there, we already use the same number of controls as unstable modes (i.e., the eigenvalues we need to change) for equation (1.3), and so there is no need to use fewer controls. However, we have also tested the algorithm for cases where we use more or fewer controls than the number of eigenvalues we wish to prescribe, and again found good convergence, in agreement with other pole placement algorithms.
The performance of the proposed algorithm improves with increasing N, in the sense that the running time decreases with increasing N for the randomly generated matrices A and B. The maximum running time for the sKS example is large, but for the purposes of the application this was sufficient, as it was a small addition to the already long computational time.
Finally, there are a number of questions related to the analysis presented here. Existence and uniqueness of solutions of the system of nonlinear equations is an interesting and not straightforward problem to study. In fact, it is easy to establish conditions on the desired eigenvalues and the entries of the matrix A for existence of solutions in the case N = 2, m = 1, but this problem becomes considerably harder as we increase N or m. Our numerical experiments suggest that solutions exist for diagonal matrices A and m at least as large as the number of eigenvalues we wish to change.
Another interesting question is related to the initial guess for the matrix K. Here, motivated by the sKS example, where we wish to change the already stable eigenvalues as little as possible, we chose the initial guess to be [I_m 0], where I_m is the m × m identity matrix and 0 stands for the zero matrix of size m × (N − m). Numerical experiments started from the zero matrix, or from a matrix whose entries are all equal to one, produced similar results, with the latter case presenting slower convergence.
Finally, it would be interesting to establish the connection with similar work done on perturbations of the Google matrix in Ding & Zhou (2007), and to check whether our algorithm may be simplified in that case to provide updated eigenvalues. We shall examine these and related issues in future studies.