Solving inverse problems in physics by optimizing a discrete loss: Fast and accurate learning without neural networks

Abstract In recent years, advances in computing hardware and computational methods have prompted a wealth of activities for solving inverse problems in physics. These problems are often described by systems of partial differential equations (PDEs). The advent of machine learning has reinvigorated the interest in solving inverse problems using neural networks (NNs). In these efforts, the solution of the PDEs is expressed as NNs trained through the minimization of a loss function involving the PDE. Here, we show how to accelerate this approach by five orders of magnitude by deploying, instead of NNs, conventional PDE approximations. The framework of optimizing a discrete loss (ODIL) minimizes a cost function for discrete approximations of the PDEs using gradient-based and Newton’s methods. The framework relies on grid-based discretizations of PDEs and inherits their accuracy, convergence, and conservation properties. The implementation of the method is facilitated by adopting machine-learning tools for automatic differentiation. We also propose a multigrid technique to accelerate the convergence of gradient-based optimizers. We present applications to PDE-constrained optimization, optical flow, system identification, and data assimilation. We compare ODIL with the popular method of physics-informed neural networks and show that it outperforms it by several orders of magnitude in computational speed while having better accuracy and convergence rates. We evaluate ODIL on inverse problems involving linear and nonlinear PDEs including the Navier–Stokes equations for flow reconstruction problems. ODIL bridges numerical methods and machine learning and presents a powerful tool for solving challenging, inverse problems across scientific domains.


Main
Classical numerical solutions for partial differential equations (PDE) rely on their discretization, for example by finite difference [10] or finite element techniques [23], given initial and boundary conditions. However, these techniques may not be readily extended for problems where either the equations have missing parameters or no sufficient data is available to form a correct initial-value problem. Such problems are encountered in various fields of science and engineering and they are handled by various methods such as PDE-constrained optimization [6], data assimilation [11], optical flow in computer vision [4], and system identification [12]. There is a strong need to develop efficient methods for solving such equations efficiently.
One approach for solving such problems represents the solution using neural networks. This approach was introduced by Lagaris et al. [9]. At the time, the method did not offer significant advantages for solving ODEs and PDEs in particular due to its computational cost. Neural networks were also applied for back-tracking of an N -body system [16]. Twenty years later, the method was revived by Raissi et al. [17] who used modern machine learning methods and software (such as deep neural networks, automatic differentiation, and TensorFlow) to carry out its operations. The method has been popularized by the term physics-informed neural network (PINN).
PINNs cannot match standard numerical methods for solving classical well-posed problems involving PDEs but they were positioned as a universal and convenient tool for solving ill-posed and inverse problems. However, to the best of our knowledge, there is no baseline for comparison and assessment of the capabilities of these methods. While the authors admit that conventional numerical solvers are more efficient for classical problems [17], a direct comparison with such solvers reveals important drawbacks of the neural network approach.
In PINNs, the cost of evaluating the solution at one point is proportional to the number of weights of the neural network, while for conventional grid-based methods the cost is constant. Furthermore, as each weight of the neural network affects the solution at all points, the method is not consistent with the locality of the original differential problem. This is a crucial issue as in PINNs, even for linear problems, the solution is approximated by nonlinear neural networks. Consequently, the gradient of the residual with respect to the weights at one point is a dense vector and the Hessian of the loss function is a dense matrix. This makes efficient optimization methods such as Newton's method unsuitable. In contrast, for linear problems Newton's method would find the solution exactly after one step and for nonlinear problems would converge quadratically and typically require few iterations to reach the desired accuracy. The supposed convenience of representation and ease of formalism by PINN comes with a cost. While PINN evaluates the differential operator exactly on a set of collocation points, it makes no allowance for physically and numerically motivated adjustments that speed up convergence and ensure discrete conservation such as deferred correction, upwind schemes, and discrete conservation laws as represented in the finite volume method. Furthermore, decades of development and analysis that went into conventional solvers enables understanding, prediction, and control of their convergence and stability properties. Such information is not available with neural networks and recent works are actually aiming to remedy this situation [14] Nevertheless, the PINN framework is actively studied as a tool to solve ill-posed and inverse problems [7].
We introduce the Optimizing a Discrete Loss (ODIL) framework that combines discrete formulations of PDEs with modern machine learning tools to extend their scope to ill-posed and inverse problems. The idea of solving the discrete equations as a minimization problem is known as the discretize-then-differentiate approach in the context of PDE-constrained optimization [6], has been formulated for linear problems as the penalty method [19], and is related to the 4D-VAR problem in data assimilation [11]. This procedure has two key aspects. Firstly, the discretization itself, that defines the accuracy, stability and consistency of the method. Secondly, the optimization algorithm to solve the discrete problem. Our method uses efficient computational tools that are available for both of these aspects. The discretization properties are inherited from the classical numerical methods building upon the advances in this field over multiple decades. Since the sparsity of the problem is preserved, the optimization algorithm can use a Hessian and achieve a quadratic converge rate, which remains out of reach for gradient-based training of neural networks. Our use of automatic differentiation to compute the Hessian, makes the implementation as convenient as applying gradient-based methods.

Wave equation. Accuracy and cost of PINN
In this section, we apply ODIL to the solution of the initial-value problem for the wave equation u tt = u xx . We compare the results of ODIL with those of PINNS in terms of accuracy and computational cost. Both methods reduce the problem to a minimization of a loss function. ODIL represents the solution on a uniform grid and constructs the loss from a second-order finite volume discretization of the equation, while PINN represents the solution as a fully connected neural network and evaluates the residuals of the original differential equation exactly on a finite set of collocation points [17]. The initial conditions for u and u t and Dirichlet conditions on the boundaries are specified from an exact solution u(x, t) = 1 10 5 k=1 (cos (x − t + 0.5)πk + cos (x + t + 0.5)πk) in a rectangular domain x ∈ (−1, 1), t ∈ (0, 1). For a given number of parameters N , ODIL represents the solution on a √ N × √ N grid, while PINN consists of two equally sized hidden layers with tanh activation functions and N being the total number of weights. The number of collocation points for PINN is fixed and amounts to 8192 points inside the domain and 768 points for the initial and boundary conditions. Figure 1 shows examples of the solutions obtained using both methods as well as measurements of their accuracy and cost depending on the number of parameters. The measurements for PINN include 25 samples for the random initial weights and positions of the collocation points. In the case of ODIL, solutions from two optimization methods L-BFGS-B and Newton coincide. The execution time is a product of the number of optimization epochs required to reach a 150% of the error obtained after 80000 epochs and an average execution time over the last 100 epochs. The measurements are performed on Intel Xeon E5-2690 v3 CPUs. Both methods demonstrate similar accuracy. The error of ODIL scales as 1/N corresponding to a second-order accurate discretization. The error of PINN is significantly scattered, although the median error also scales as 1/N . The execution times, however, are drastically different. With N = 20000 parameters, optimization of PINN using L-BFGS-B takes 15 hours, optimization of ODIL using L-BFGS-B takes 40 minutes, and Newton with a direct linear solver [20] takes only 0.5 seconds, which is 100 000 times faster than PINNs.

Lid-driven cavity. Forward problem
The lid-driven cavity problem is a standard test case [5] for numerical methods for the steady-state Navier-Stokes equations in two dimensions: u x +v y = 0, uu x +vu y = −p x +1/Re(u xx +u yy ), uv x +vv y = −p y +1/Re(v xx +v yy ), where u(x, y) and v(x, y) are the two velocity components and p(x, y) is the pressure. The problem is solved in a unit domain with no-slip boundary conditions. The upper boundary is moving to the right at a unit velocity while the other boundaries are stagnant. We use a finite volume discretization on a uniform Cartesian grid with 128×128 cells based on the SIMPLE method [15,3] with the Rhie-Chow interpolation [18] to prevent oscillations in the pressure field and the deferred correction approach that treats high-order discretization explicitly and low-order discretization implicitly to obtain an operator with a compact stencil. Figure 2 shows the streamlines for values of Re between 100 and 3200, as well as a convergence history of various optimization algorithms. The L 2 error is computed relative to the solution of the discrete problem. The number of iterations it takes for L-BFGS-B to reach an error of 10 −3 at Re = 400 is in the order of 10000 which is comparable to the number of iterations typically needed for training PINNs for flow problems [17]. Adam has only reached an error of 0.05 even after 50 000 iterations. Conversely, Newton's method takes about 10 iterations to reach an error of 10 −6 , and the number of iterations does not significantly change between Re = 100 and 400.

Lid-driven cavity. Flow reconstruction
Here we solve the same equations as in the previous case, but instead of the no-slip we only impose free-slip boundary conditions and additionally require that the velocity field equals a reference velocity field in a finite set of points. The loss function is a sum of three types of terms: residuals of equations, terms to impose known velocity values, and a regularization term k 2 reg u xx 2 + u yy 2 + v xx 2 + v yy 2 with k reg = 10 −4 . The problem is solved on a 128 × 128 grid and the reference velocity field is obtained from the forward problem. Figure 3 shows the results of reconstruction from 32 points. Surprisingly, this small number of points is sufficient to reproduce features of the flow.  u(X) as the best reconstruction error given K points, where u(X) is the velocity field reconstructed from points X and the minimum is taken over all sets of K points. Then we define a measure of flow complexity for a given accuracy ε > 0 as the minimal number of points required to achieve that accuracy K min (ε) = {K | E(K) < ε}.

Measure of flow complexity
To illustrate the measure, we consider four types of flow: uniform velocity, Couette flow (linear profile), Poiseuille flow (parabolic profile), and the flow in a lid-driven cavity. The reference velocity fields in the first three cases are chosen once at an arbitrary orientation and such that the maximum velocity magnitude is unity. The problem is solved on a 64 × 64 grid, the velocity at the boundaries is extrapolated from the inside with second order. The error is defined through the L 1 norm. The loss function includes the same regularization term as in with k reg = 10 −3 . Figure 4 shows the values of E(K) and K min (0.05). As expected, the uniform flow takes 1 point to reconstruct while adding higher order terms to the velocity increases the number of points. Finally, 29 points are required to reach an error of 0.05 for the lid-driven cavity flow.

Velocity from tracer
We consider the evolution of a tracer field governed by the advection equation in two dimensions. The problem is to find a velocity field u(x, y) given that the tracer field c(x, t) satisfies the advection equation ∂c ∂t + u · ∇α = 0 in a unit domain, and the initial c(x, 0) = c 0 (x) and final c(x, 1) = c 1 (x) profiles are known. The discrete problem is solved in space and time on a 64 × 64 × 64 grid. The loss function includes a discretization of the equation with a first-order upwind scheme, terms to impose known initial and final profiles of the tracer, and regularization terms 10 −3 ∇ 2 u 2 and u t 2 to prioritize velocity fields that are smooth and stationary. The results of inference are shown in Figure 5. The inferred velocity field stretches the initial tracer representing a circle to match the final profile.
Inferring the velocity field from tracers is an example of the optical flow problem [4,13] which is important in computer vision. Furthermore, reconstructing the velocity field from a concentration field can assist experimental measurements.

Discussion
We present a framework (ODIL) for computing the solutions of PDEs by casting their discretization as an optimization problem and applying optimization techniques that are widely available in machine learning software libraries. The concept of casting the PDE as a loss function has similarities with similarities with neural network formulations such as PINNs. However the fact that we use the discrete approximation of the equations allows for ODIL to be orders of magnitude more efficient in terms of computational cost and accuracy compared to the PINNs for which complex flow problems "remain elusive" [21]. The present results suggest that ODIL can be a suitable candidate for solving large scale problems such as weather prediction, complex biological flows, and other scientific and industrial simulations.

Methods
In the following, we formulate the ODIL framework for the one-dimensional wave equation u tt = u xx discretized with finite differences. Instead of solving this equation by marching in time from known initial conditions, we rewrite the problem as a minimization of the loss function where the unknown parameter is the solution itself, a discrete field u n i on a Cartesian grid. The loss function includes the residual of the discrete equation in grid points Ω h and imposes boundary and initial conditions u = g in grid points Γ h . To solve this problem with a gradient-based method, such as Adam [8] or L-BFGS-B [22], we only need to compute the gradient of the loss which is also a discrete field ∂L/∂u. To apply Newton's method, we assume that the loss function is a sum of quadratic terms such as L(u) = F where F s and G s denote F [u s ] and G[u s ]. A minimum of this function provides the solution u s+1 at the next iteration and satisfies a linear system (∂F s /∂u) T (∂F s /∂u) + (∂G s /∂u) T (∂G s /∂u) (u s+1 − u s ) + (∂F s /∂u) T F s + (∂G s /∂u) T G s = 0.
Cases with terms other than F and G are handled similarly. We further assume that F [u] and G[u] at each grid point depend only on the value of u in the neighboring points. This makes the derivatives ∂F /∂u and ∂G/∂u sparse matrices. To implement this procedure, we use automatic differentiation in TensorFlow [1] and solve the linear system with either a direct [20] or multigrid sparse linear solver [2].

Code availability
The results are obtained using software available at https://github.com/cselab/odil.