Robust data-driven discovery of a seismic wave equation

Although our physical observations can often be described by derived physical laws, such as the wave equation, in many cases we observe data that do not match known laws or have not yet been described physically. Recently, therefore, a branch of machine learning has been devoted to the discovery of physical laws from data. We test such discovery algorithms, with our own flavor of implementation, D-WE, in discovering the wave equation from observed spatial-temporal wavefields. D-WE first pretrains a neural network (NN) in a supervised fashion to establish the mapping between the spatial-temporal locations (x, y, z, t) and the observed displacement wavefield u(x, y, z, t). The trained NN serves to generate meta-data and to provide the time and spatial derivatives of the wavefield (e.g., u_tt and u_xx) by automatic differentiation. Then, a preliminary library of potential terms for the wave equation is optimized from an overcomplete library by using a genetic algorithm. We then use a physics-informed information criterion to evaluate the precision and parsimony of the potential equations in the preliminary library and determine the best structure of the wave equation. Finally, we train the physics-informed neural network to identify the corresponding coefficients of each functional term. Examples in discovering the 2D acoustic wave equation validate the feasibility and effectiveness of D-WE. We also verify the robustness of this method by testing it on noisy and sparsely acquired wavefield data.


Introduction
Seismic wave equations play a critical role in seismology to constrain and define wave propagation in a specific medium. Having an accurate equation contributes to our understanding of wave propagation. Seismologists have previously followed basic physical laws to propose numerous wave equations [1,2,3,4,5]. These equations provide the central ingredient to describe and model seismic wave propagation in the Earth's interior. However, we cannot ignore that many of these equations are derived based on approximations and assumptions. Moreover, wave propagation in certain media may involve elusive and complex mechanisms [6,7,8,9]. Hence, the resulting governing equations might not accurately describe the wavefield, such as in attenuating media, as they inherently depend on the underlying assumptions [10]. In this case, a natural solution is to seek a new, more accurate mathematical equation based on existing knowledge and physical principles to replace the original wave equation. Admittedly, this process can be difficult and time consuming. We therefore ask ourselves: can we discover a sufficiently precise wave equation directly from observed data without relying on physical laws?
Benefiting from recent advances in machine learning (ML) and data-processing capabilities, an answer to this question may be on the horizon. Recently, data-driven discovery methods have been developed to identify the underlying partial differential equations (PDEs) of physical problems. Specifically, data-driven discovery methods are based on constructing a library composed of candidate functional terms, and then using various optimization algorithms to select the most appropriate combination of candidate terms, generating the general form of the equations. In terms of library construction approaches, current methods are mainly divided into closed and expandable library methods.
Closed library methods, which are the most widely used, first build an overcomplete library and then use sparse regression methods to extract the dominant candidate function terms. Lasso [11], sequential threshold ridge regression [12], and SINDy [13] are the leading representative sparse regression methods for discovering PDEs from a closed library. Although sparse regression methods remain highly efficient, they are limited to a complete candidate library given beforehand. It is difficult to provide an overcomplete candidate library that is guaranteed to contain the true PDE terms, especially in a setting in which we lack prior knowledge. Meanwhile, the growing size of candidate libraries makes sparsifying them increasingly difficult, which means that we may discover a wrong PDE. Furthermore, in the process of constructing a candidate library, the derivatives of the observation data are often obtained by using finite differences, polynomial interpolation, and other methods. This is not robust for data that are acquired on irregular grids and may be exposed to noise.
In contrast, expandable library methods have a stronger ability to identify PDEs with complicated structures than the closed library methods. This is because expandable library methods do not require a pre-determined overcomplete library. They only require a randomly generated incomplete initial library, which evolves to produce unlimited combinations through the cross-over and mutation operations of genetic algorithms (GA) [14]. However, processing noisy and sparse data still remains a challenge. With the rapid development of ML, some researchers [15,16] utilized NN functional representations to calculate derivatives through automatic differentiation. Compared to conventional numerical methods, the calculation of derivatives provided by an NN is more stable and robust to noisy data [16]. Meanwhile, NNs can operate in mesh-free environments when computing derivatives via automatic differentiation. Automatic differentiation is relatively costly, but for discovering an equation, we believe the accuracy is worth the cost.
In this work, we adapt a new data-driven method, named D-WE, to discover the wave equation, which combines a neural network (NN) and a GA. In D-WE, a fully connected deep NN is first trained, where the input to the network is the spatial-temporal locations (e.g., x, y, z, t), and the corresponding output is the observed pressure wavefield u(x, y, z, t). After training the network, we can produce meta-data and compute time and spatial derivatives of u(x, y, z, t). Subsequently, a digital coding is created to define the form of the underlying wave equation, including the left-hand side (LHS) and right-hand side (RHS). This encoding corresponds to the genomes for combinations of candidate function terms. Cross-over and mutation are employed to expand the diversity of the library, that is, to increase the search scope of candidate wave equations. To determine the proper wave equation from the numerous candidate equations, we use the physics-informed information criterion (PIC) [16], which simultaneously evaluates parsimony and precision, resulting in an equation with physical interpretability. After discovering the structure of the wave equation, a physics-informed neural network (PINN) [17], which is initialized by the trained network, is trained to identify the corresponding coefficients for every term in the discovered wave equation. We present the discovery of the 2D acoustic wave equation to demonstrate the potential of D-WE, and its robustness in handling noisy and sparse data.

Problem description
In this work, we consider the general form of a seismic wave equation:

u_T = f(Θ(u), ξ),    (1)

with

Θ(u) = [u, u_x, u_y, u_z, u_xx, u_yy, u_zz, ...],  ξ = [ξ_i]_{i=1,...,n},    (2)

where u_T denotes different orders of derivatives of the displacement u with respect to time t, e.g., first (u_t) or second (u_tt); Θ(u) refers to the candidate library composed of potential function terms, in which the subscripts represent different orders of derivatives in space; [ξ_i]_{i=1,...,n} denotes the vector of coefficients, with size n, of the candidates in the library; and f(·) is a function parameterizing a wave equation with possible contributing terms.
For specific displacement wavefield data, denoted as u(x_i, y_j, z_k, t_l) and sampled on N_x × N_y × N_z × N_t spatial-temporal points, equation (1) can be evaluated at every sample to form the linear system

u_T = Θ(u) · ξ.    (3)

We can see that the linear system in (3) has N_x · N_y · N_z · N_t equations and n unknown coefficients. In most cases, (3) is an over-determined system. Fortunately, a parsimonious form exists for most seismic wave equations; thus, just a few function terms have nonzero coefficients. The objective of data-driven discovery of a wave equation is to identify the closed form of f(·), that is, to find the linear combination of function terms in Θ(u) and the corresponding vector of coefficients [ξ_i]_{i=1,...,n}. Actually, these two problems are mutually dependent. On the one hand, once we identify which of the coefficients [ξ_i]_{i=1,...,n} are nonzero, the structure of the correct equation is also determined. On the other hand, if we discover the structure of the equation, then obtaining the nonzero coefficients becomes a simple problem of solving a linear system.
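As a concrete illustration of this formulation, the following sketch (our own toy example, not the paper's code) builds the linear system (3) for a 1-D plane wave whose derivatives are known analytically, and recovers the coefficients by least squares:

```python
import numpy as np

# Our own toy illustration: build the overdetermined linear system
# u_T = Theta(u) @ xi of equation (3) for a 1-D plane wave
# u = sin(k(x - v t)), which satisfies u_tt = v^2 u_xx, and solve it.
v, k = 2.0, 3.0
x = np.linspace(0.0, 1.0, 50)
t = np.linspace(0.0, 1.0, 40)
X, T = np.meshgrid(x, t, indexing="ij")
phase = k * (X - v * T)

u    = np.sin(phase)
u_x  = k * np.cos(phase)
u_xx = -(k**2) * np.sin(phase)
u_tt = -((k * v) ** 2) * np.sin(phase)          # the LHS vector u_T

# Candidate library Theta(u): each column is one potential RHS term.
# (u itself is omitted here because it is collinear with u_xx for a pure sine.)
Theta = np.column_stack([u_x.ravel(), u_xx.ravel(), (u * u_x).ravel()])
xi, *_ = np.linalg.lstsq(Theta, u_tt.ravel(), rcond=None)
# Only the u_xx column should be active, with coefficient v^2 = 4.
```

In this idealized noise-free setting, the least-squares solve alone recovers the sparse coefficient vector; the machinery of the following sections exists precisely because real data are noisy and the candidate library is not known in advance.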

The neural network
As mentioned, the problem of discovering the seismic wave equation is given by equation (3). However, how can we obtain the derivative terms on the LHS and RHS when only the displacement wavefield is known? A feasible option is to use finite differences and other numerical methods to calculate the time and spatial derivatives. Although numerical methods are efficient, the observations need to be on a regular grid and have a high signal-to-noise ratio for accurate numerical derivatives, which is usually difficult to guarantee, especially for field data.
In contrast, NNs have proven their robustness in representing noisy data [16]. Hence, an alternative solution is to use automatic differentiation of the NN to calculate the derivatives during backpropagation.
For this purpose, we only need to train a deep fully connected backpropagation NN (shown in Figure 1) using the following loss function:

L(θ) = (1 / (N_x N_y N_z N_t)) Σ_{i,j,k,l} |u_NN(x_i, y_j, z_k, t_l; θ) − u(x_i, y_j, z_k, t_l)|²,

where θ denotes the NN trainable parameters, and the corresponding inputs to the network are the spatial-temporal locations (x_i, y_j, z_k, t_l), to approximate the displacement wavefield u(x_i, y_j, z_k, t_l) and its derivatives. In real applications, the wavefield values (i.e., the labels) are given by the recorded data. However, for this analysis of the approach, we will simulate the data.
Since the inputs to the network are mutually independent position coordinates, the typical requirement that observed data must be collected on a regular grid is not needed here. Another advantage of this implementation is that the trained NN can be further used to predict the wavefield at spatial locations where we do not have observations. The predicted wavefields are referred to as meta-data, which further expand the available data volume, thereby assisting in the discovery of the wave equation. Furthermore, this trained NN will serve as the initialization for the PINN, significantly reducing the training cost required to accurately determine the equation structure using the PIC and to identify the coefficients.
We will elaborate on this in detail later on.
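To illustrate why a smooth functional representation frees us from the grid, the following toy sketch stands in for the trained NN with a small sine-feature model (hypothetical parameters, our own construction): its derivatives are available in closed form at any off-grid point, playing the role that automatic differentiation plays for the NN.

```python
import numpy as np

# Conceptual stand-in for automatic differentiation: a smooth sine-feature
# model u_hat(x, t) plays the role of the trained NN, and its derivatives
# are available analytically at ANY (x, t) -- no regular grid is needed.
rng = np.random.default_rng(0)
a  = rng.normal(size=8)          # amplitudes (hypothetical parameters)
wx = rng.normal(size=8)          # spatial frequencies
wt = rng.normal(size=8)          # temporal frequencies
b  = rng.normal(size=8)          # phases

def u_hat(x, t):
    return np.sum(a * np.sin(wx * x + wt * t + b))

def u_hat_xx(x, t):              # exact second spatial derivative (chain rule)
    return np.sum(-a * wx**2 * np.sin(wx * x + wt * t + b))

# Check the analytic derivative against a central finite difference
# at an arbitrary off-grid point:
x0, t0, h = 0.37, 1.21, 1e-4
fd = (u_hat(x0 + h, t0) - 2 * u_hat(x0, t0) + u_hat(x0 - h, t0)) / h**2
```

For a real NN the same closed-form role is played by reverse-mode automatic differentiation applied twice to the network outputs.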

Genetic algorithm
A GA is a type of optimization algorithm inspired by the process of natural selection. Since directly solving equation (3) is a nondeterministic polynomial time (NP) hard problem with unlimited combinations [16], we utilize a GA to search for an optimal preliminary set of candidate functional terms among the unlimited combinations, which converts the problem to a finite-dimensional one. The GA does this by creating a population of candidate solutions, each of which represents a different set of functional terms. The fitness of each candidate is evaluated based on its ability to accurately model the wave behavior of the proposed equation. In our method, the GA uses a series of operations, including translation, cross-over, mutation, and selection, to evolve the population of candidate solutions over multiple generations.
The workflow of GA is presented in Figure 2. In the following, we illustrate the steps in detail.
Firstly, we introduce a principle of translation to digitize the structure of a potential seismic wave equation into a corresponding genome. Specifically, we first use numbers, defined as genes, to represent the different orders of derivatives, e.g.,

Gene: 0, 1, 2, 3.

Here, the number 0 represents the displacement/pressure wavefield u, and the numbers 1, 2, and 3 are used to encode the first-, second-, and third-order spatial derivatives of the displacement/pressure wavefield, respectively. Also, we use the numbers 1 and 2 to represent the first- and second-order time derivatives of the displacement/pressure wavefield, respectively. Then, we combine genes to form gene modules, which are utilized to define function terms. For example, the function terms in the LHS are represented as

Gene module: [1] or [2],

while the function terms in the RHS have the form

Gene module: [g_1, g_2, ...], with g_i ∈ {0, 1, 2, 3}.

Note that for most wave equations, the LHS is expressed as a first- or second-order time derivative; hence, we only consider these two cases here. Meanwhile, we emphasize that we account for the symmetry of the wave equation in its RHS. For example, the gene module [2], which is different from the gene 2, represents not only the second-order spatial derivative of the displacement/pressure wavefield with respect to the spatial variable x, but also the second-order spatial derivatives with respect to the spatial variables y and z; the spatial derivatives with respect to these three spatial variables are combined through addition, i.e., [2] denotes u_xx + u_yy + u_zz. Additionally, when a gene module has multiple genes, it represents the multiplication of the corresponding genes, that is, [0, 3] denotes uu_xxx + uu_yyy + uu_zzz. The combination of gene modules is regarded as the genome, which includes the LHS and RHS terms of the potential wave equation. We assume the LHS of the wave equation only includes derivatives with respect to t, and the RHS consists of derivatives with respect to the spatial variables x, y, and z, which applies to many PDEs. Hence, in the case of the LHS given by the first-order time derivative, we can use the following digitization to translate the corresponding wave equation:

Genome: {[1]; [2], [0, 3]} → u_t = (u_xx + u_yy + u_zz) + (uu_xxx + uu_yyy + uu_zzz).    (8)

Figure 3: An illustration of the process of cross-over and mutation.
When the LHS of the equation is given by the second-order time derivative, we can represent the wave equation as follows:

Genome: {[2]; [2], [0, 3]} → u_tt = (u_xx + u_yy + u_zz) + (uu_xxx + uu_yyy + uu_zzz).    (9)

These genomes are shown as examples; similar digitizations and encodings can be obtained analogously. We note that the gene modules here are connected by addition. Moreover, we have not digitized the coefficients, such as velocity and density, which are commonly present in the wave equation. Our current focus is discovering the wave equation without prior knowledge of the specific values of these coefficients. The values of the coefficients will be determined using the PINN after discovering the equation's structure, as will be introduced in Section 2.5. In equations 8 and 9, we have set the coefficients of the function terms u_xx + u_yy + u_zz and uu_xxx + uu_yyy + uu_zzz to 1 for the sake of illustration. However, the coefficients can take on arbitrary values, and this does not impact our ability to discover the equation's structure.
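The translation principle above can be sketched in code; the helper below is our own illustrative implementation of the decoding step, symmetrized over x, y, and z as described in the text:

```python
# A minimal, hypothetical translator from the digital genome encoding to a
# human-readable equation, following the conventions in the text: gene g is
# the g-th spatial derivative (0 = u itself), a module multiplies its genes,
# and each RHS module is symmetrized over x, y, z by addition.
def gene_to_term(gene, axis):
    return "u" if gene == 0 else "u_" + axis * gene   # e.g. gene 2, axis 'x' -> 'u_xx'

def module_to_terms(module):
    # symmetrize over the three spatial axes; multiply genes within a module
    return " + ".join("*".join(gene_to_term(g, ax) for g in module)
                      for ax in ("x", "y", "z"))

def translate(lhs_order, modules):
    lhs = "u_" + "t" * lhs_order                      # LHS time-derivative gene
    rhs = " + ".join(module_to_terms(m) for m in modules)
    return lhs + " = " + rhs
```

For instance, `translate(2, [[2]])` decodes the genome of the acoustic-like form to `u_tt = u_xx + u_yy + u_zz`, and `translate(1, [[0, 3]])` decodes to `u_t = u*u_xxx + u*u_yyy + u*u_zzz`.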
Subsequently, cross-over and mutation are conducted with certain probabilities to obtain the next generation of candidates. Cross-over swaps parts of the gene modules of two genomes to generate their children (see Figure 3a). Following the cross-over, mutation produces new genes through add, delete, and order (changing the derivative order) operations (see Figure 3b). It should be emphasized that cross-over and mutation are only applied to the RHS of the equation, whereas for the LHS only the order of the time derivative is searched. This is reasonable for most wave equations.
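A minimal sketch of these operators, assuming a genome's RHS is stored as a list of gene modules (our own toy representation, not the paper's code):

```python
import random

# Hypothetical sketch of the GA operators acting on the RHS of a genome,
# where the RHS is a list of gene modules, e.g. [[2], [0, 3]].
def crossover(parent_a, parent_b, rng):
    # swap a random suffix of gene modules between the two parents
    cut = rng.randrange(1, min(len(parent_a), len(parent_b)) + 1)
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(genome, rng, max_order=3):
    g = [m[:] for m in genome]                 # deep-ish copy of the modules
    op = rng.choice(["add", "delete", "order"])
    if op == "add":                            # append a new random module
        g.append([rng.randrange(0, max_order + 1)])
    elif op == "delete" and len(g) > 1:        # drop a module, keep at least one
        g.pop(rng.randrange(len(g)))
    else:                                      # perturb the order of one gene
        m = g[rng.randrange(len(g))]
        i = rng.randrange(len(m))
        m[i] = rng.randrange(0, max_order + 1)
    return g
```

In an actual run these operators would be applied with tuned probabilities each generation; the probabilities and module bounds here are placeholders.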
After mutation, we need to measure the quality of each genome and then perform the selection process. The measurement index is computed by a fitness function as follows:

Fitness = (1/N) Σ (equ_L − Σ_i ξ_i · equ_R^i)² + ϵ · len(genome),    (10)

where N denotes the number of observation samples, equ_L denotes the LHS function term of the candidate wave equation, equ_R^i represents the i-th function term in the RHS, and the corresponding coefficients ξ_i are calculated using singular value decomposition (SVD). It is worth emphasizing that in this section and the next, we utilize the SVD method to calculate the coefficients corresponding to each function term. Although it may not achieve absolute precision, it can be relied upon. To avoid redundancy in the discovered equation, we use an l_0 penalty on the number of terms in the discovered equation. Here, len(genome) denotes the length of the genome, and ϵ is a hyperparameter. In general, as ϵ increases, the equation becomes more concise. Conversely, as ϵ decreases, the equation exhibits a more complex structure.
Once we obtain the fitness of all potential candidate equations, we can select the genomes that better describe the wave propagation system. In our case, the best half of the children are selected as the next generation of parents, and all other genomes are replaced by new random genomes. The process of cross-over, mutation, and selection is repeated for each new generation. When a predefined number of iterations is reached, the preliminary library with a few terms in the last generation is retained. For this preliminary library, the combinations of all candidate function terms are countable, which makes it feasible to evaluate each combination to further determine the equation. To do this, in the next section, we will use the PIC algorithm [16] to discover the accurate structure of the wave equation from the preliminary library.
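The fitness evaluation can be sketched as follows; `np.linalg.lstsq` performs an SVD-based solve, standing in for the SVD computation described above, and the toy data are hypothetical:

```python
import numpy as np

# Hedged sketch of the fitness in equation (10): coefficients xi come from a
# least-squares (SVD-based) solve, and the l0-style penalty eps * len(genome)
# discourages redundant terms. Matrix R holds the evaluated RHS terms column-wise.
def fitness(equ_L, R, genome_len, eps=1e-6):
    xi, *_ = np.linalg.lstsq(R, equ_L, rcond=None)    # SVD under the hood
    residual = equ_L - R @ xi
    return np.mean(residual**2) + eps * genome_len, xi

# Toy check: data generated by equ_L = 4 * r1; a two-term candidate fits the
# data just as well but pays a larger parsimony penalty.
rng = np.random.default_rng(0)
r1, r2 = rng.normal(size=200), rng.normal(size=200)
equ_L = 4.0 * r1
f_good, _ = fitness(equ_L, np.column_stack([r1]), genome_len=1)
f_over, _ = fitness(equ_L, np.column_stack([r1, r2]), genome_len=2)
```

With noise-free data both candidates reach a near-zero residual, so the penalty term alone ranks the one-term genome first, which is exactly the role of ϵ.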

Physics-informed information criterion
The PIC algorithm involves two types of measurements: redundancy and physical losses. The redundancy loss is used to measure the parsimony of the proposed equation and is based on the idea that the coefficients of redundant terms are unstable when fitted to the observed data on moving windows of a given time step [18]. Therefore, we can utilize this technique, namely the moving horizon, to calculate the average variation in the coefficients of each combination to obtain the redundancy loss. As shown in Figure 4, the smooth wavefield snapshots generated by the NN at different times are divided into N_h overlapping horizons T_i, i = 1, ..., N_h, where T_i denotes the wavefield snapshots within a time window of length ∆t inside [t'_min, t'_max], t'_min and t'_max represent the minimum and maximum of the time domain of the generated snapshots, respectively, and ∆t denotes the length of the horizons. For a candidate combination Θ_j (i.e., a potential wave equation), the corresponding vector of coefficients ξ_j^i in horizon T_i can be obtained by solving the equation equ_L^{i,j} − ξ_j^i · equ_R^{i,j} = 0, where equ_L^{i,j} and equ_R^{i,j} are the values of the LHS and RHS terms for the potential wave equation Θ_j in horizon T_i, respectively. For the combination Θ_j, once we obtain all the coefficient vectors in the overlapping horizons T_i, i = 1, ..., N_h, we can calculate the corresponding redundancy loss as follows:

L_redundancy(Θ_j) = (1/N_term) Σ_{k=1}^{N_term} σ_{j,k} / |µ_{j,k}|,    (11)

where N_term denotes the number of terms, and σ_{j,k} and µ_{j,k} represent the standard deviation and mean, respectively, of the N_h different coefficients over the overlapping horizons corresponding to the k-th function term in the candidate combination Θ_j. As can be appreciated in equation (11), accurate terms are stable across the moving horizons, that is, their coefficients have small standard deviations, resulting in a small redundancy loss. In contrast, the coefficients of redundant terms exhibit a large degree of variation over different horizons, due to the need to compensate for errors caused by noise, which can differ between horizons.
The physical loss, which is based on the PINN [17], is used to evaluate the accuracy of the discovered wave equation.
Here, we first need to train a PINN, which maintains the same architecture as the NN shown in Figure 1 and is initialized by the NN parameters θ, while the loss function has the form

L(θ) = λ_d L_data + λ_p L_PDE,    (12)

with the data loss

L_data = (1/N) Σ_{i=1}^{N} |u_PINN(x_i; θ) − u(x_i)|²,    (13)

and the PDE loss

L_PDE = (1/N') Σ_{i=1}^{N'} |equ'_L − ξ' · equ'_R|²,    (14)

where λ_d and λ_p are hyperparameters, which control the contributions of the data and PDE losses to the total loss, respectively.
Here, the data loss is the mean squared error (MSE) between the observed data and the PINN prediction, whereas the PDE loss is obtained by measuring the MSE between the LHS equ'_L and RHS ξ' · equ'_R terms of the potential wave equation, which are calculated on the meta-data (x') generated from the NN. It should be emphasized that the coefficients ξ' are deduced by solving equ'_L − ξ' · equ'_R = 0 during each training process. After training the PINN, the physical loss for the potential wave equation (i.e., the candidate combination Θ_j) can be calculated as follows:

L_physical(Θ_j) = (1/N') Σ |û_PINN − û_NN|²,

where û_PINN and û_NN refer to the normalized outputs of the meta-data predicted by the PINN and the NN, respectively, determined by

û = (u − u_min) / (u_max − u_min),

where u_max and u_min denote the maximum and minimum of the observation data, respectively. The utilization of such a form of physical loss is based on the following fact: when the physical constraints are consistent with the data, the predicted results will exhibit significant improvements (see equation (12)); however, if the physical constraints and the data are not in agreement, the performance of the PINN will decrease. As a result, if the underlying wave equation can effectively describe the wavefield data, the predicted results of the trained PINN are closer to the NN's output, which is relatively accurate. Hence, the physical loss will be very small.
For each candidate combination Θ_j, the PIC is obtained by multiplying the calculated redundancy and physical losses:

PIC(Θ_j) = L_redundancy(Θ_j) × L_physical(Θ_j).

It is worth noting that the PIC is not computed for all possible combinations in the preliminary library, as calculating the physical loss is time-consuming. Since the computational cost of the redundancy loss is low, we first derive all redundancy losses and select the top N_b combinations with the smallest redundancy loss. Following that, we perform the PINN training on the N_b combinations and combine the redundancy and physical losses into the PIC. Afterwards, we take the structure of the wave equation with the smallest PIC as the discovered one.
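The two-stage selection can be sketched as follows; the helper names and toy candidates are hypothetical, and the expensive PINN training is abstracted into a `physical_fn` callback:

```python
import numpy as np

# Sketch of the two-stage PIC selection: the cheap redundancy loss screens all
# combinations first, and the expensive physical loss (PINN training) is
# computed only for the top n_b survivors.
def physical_loss(u_pinn, u_nn, u_min, u_max):
    norm = lambda u: (u - u_min) / (u_max - u_min)   # normalization from the text
    return np.mean((norm(u_pinn) - norm(u_nn)) ** 2)

def pic_select(candidates, redundancy, physical_fn, n_b=2):
    order = np.argsort(redundancy)[:n_b]             # cheap screen first
    pics = {i: redundancy[i] * physical_fn(candidates[i]) for i in order}
    return min(pics, key=pics.get)                   # smallest PIC wins
```

A candidate eliminated in the screening stage never triggers a PINN training run, which is what keeps the overall selection affordable even for libraries with many combinations.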

Identifying coefficients
Although we assume that the general structure of the wave equation has been obtained, we still need to determine the coefficients. Certainly, we can obtain the coefficients by directly solving the typically overdetermined system of linear equations (3). For example, we use the SVD method to calculate the coefficients in equations (10) and (14). However, this solution is not exact, especially for noisy data. In contrast, the PINN provides a reliable framework to identify the coefficients. Hence, we use the PINN, also initialized by the previously trained NN (Figure 1), to obtain the values of the coefficients, while the loss function is reset to

L(θ, ξ) = λ_d L_data + λ_p (1/N') Σ |equ'_L − ξ · equ'_R|².

Here, the coefficients ξ are not an output of the PINN; thus, we define them as additional trainable parameters of the PINN, updated along with the network parameters θ to minimize the loss function. We initialize the values of the trainable coefficients ξ by solving the system of linear equations (3). Although this initialization may not be accurate, it helps the PINN converge faster than starting from random initializations. Once we identify the exact coefficients, we can combine them with the corresponding function terms to obtain the general form of the discovered wave equation.
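As a stand-in for the PINN stage, the sketch below refines a least-squares initialization of ξ by plain gradient descent on the PDE residual, mirroring how ξ is treated as an additional trainable parameter (the data are synthetic, not the paper's):

```python
import numpy as np

# Hedged stand-in for identifying xi with the PINN: plain gradient descent on
# the mean squared PDE residual refines a least-squares warm start, mimicking
# how xi is updated alongside the network weights during PINN training.
rng = np.random.default_rng(0)
u_xx_plus_u_zz = rng.normal(size=500)                        # synthetic RHS term
u_tt = 4.0 * u_xx_plus_u_zz + 0.01 * rng.normal(size=500)    # noisy "data"

xi, *_ = np.linalg.lstsq(u_xx_plus_u_zz[:, None], u_tt, rcond=None)
xi = float(xi[0])                                            # warm start

lr = 0.1
for _ in range(200):                       # minimize mean((u_tt - xi * r)^2)
    grad = -2.0 * np.mean((u_tt - xi * u_xx_plus_u_zz) * u_xx_plus_u_zz)
    xi -= lr * grad
```

In the actual method the gradient also flows through the network parameters θ, so the wavefield representation and the coefficients are refined jointly rather than sequentially as in this simplified sketch.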

Numerical Examples
To verify the feasibility and effectiveness of D-WE, we present an example of discovering the 2D acoustic wave equation:

u_tt = v² (u_xx + u_zz),

where we assume that the body force is absent, and v denotes velocity. We consider wave propagation in a homogeneous medium and utilize finite differences (FD) to generate the dataset. The medium, which we assume has a velocity of 2 km/s, is discretized along 101 grid points in both the x and z directions with a grid spacing of 10 m. We collect 121 snapshots of the pressure wavefield with a time interval of 2 ms from zero to 0.24 seconds. The wavefield is initiated by an isotropic Gaussian function at the center of the model at time zero. In this test, the NN has three hidden layers with 50 neurons in each layer. Following [16], the activation function is set to a sine function. The maximum population size of genomes is 400, and the maximum number of generations is 100. The NN and PINN are trained using the Adam optimizer [19]. The hyperparameters ϵ (equation (10)), λ_d, and λ_p (equation (12)) are set to 10⁻⁶, 1, and 0.01, respectively. We first provide an example to illustrate the process of our method in discovering the acoustic wave equation from observed pressure wavefields. We randomly select a 20% subset of the complete volume of observed pressure wavefields to train the NN and then utilize them to discover the equation. The NN is trained for 30000 iterations. We simultaneously consider cases where the LHS of the equation is given by a first-order and a second-order time derivative. We generate the initial library on the RHS of the equation as {[0, 1, 3], [1, 1]}, corresponding to the form uu_x u_xxx + uu_z u_zzz + u_x² + u_z². By utilizing the GA (as illustrated in Figure 2), the initial library evolves to produce an overcomplete library, including many candidate function terms. In our case, we limit the number of candidate function terms to 400, which constitutes the maximum population size of genomes. We list the optimal
genomes at some generations in Tables 1 and 2, which correspond to the equations with first-order and second-order time derivatives on the LHS, respectively. The first column represents the generation number of the evolution, the second column shows the optimal genome and the corresponding translated form of the potential equation, and the final column gives the corresponding fitness score.
From Tables 1 and 2, we can see that as the evolution progresses, the optimal genome tends to stabilize. For example, when the LHS of the equation is a first-order time derivative, the optimal genome is {[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}, and when the LHS of the equation corresponds to a second-order time derivative, the optimal genome is {[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}. However, if we were to stop here and choose the equation form based on fitness scores alone, it would be {[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}. Certainly, this does not match the accurate form of the acoustic wave equation. Therefore, as stated earlier, we consider the optimal genome from the GA at the maximum number of generations as a preliminary library. The combinations drawn from this preliminary library are countable. We can select arbitrary terms to form a potential structure of the wave equation and then utilize the more accurate PIC metric to determine the exact structure of the equation from all combinations. Tables 3 and 4 display the potential wave equations with the lowest 5 PIC values, corresponding to the first-order and second-order time derivatives on the LHS, respectively. By comparing Tables 3 and 4, we can see that the potential wave equation of the form u_tt = ξ_1 (u_xx + u_zz) has the lowest PIC value and, as a result, is ultimately identified as the discovered equation structure.
Compared with the form of the acoustic wave equation, this demonstrates that the equation structure, which is directly discovered from the observed pressure wavefield, is consistent with the corresponding acoustic wave equation. Once we establish the accurate structure of the equation, we further utilize the PINN to optimize the coefficient ξ_1 for the term (u_xx + u_zz). Ultimately, we obtain a coefficient of 3.989 km²/s², which has a negligible error relative to the true value v² = 4 km²/s². Thus, the discovered general form of the equation is u_tt = 3.989 (u_xx + u_zz). It is worth emphasizing that, apart from the training time of the NN, the entire discovery process is highly efficient, costing only a total of 198 seconds.
We then validate the performance of D-WE for discovering the acoustic wave equation on sparse observations. We randomly select subsets of the complete grid-point measurements of the pressure wavefields, including 60%, 20%, 5%, 1%, and 0.5% of all grid points, and compare the results with the form discovered from the complete pressure wavefield. The results are shown in Table 5, where the error represents the relative error between coefficients, defined as |ξ_i − ξ_i^true| / ξ_i^true in percent. As we can see in the table, D-WE enables the discovery of the structure of the acoustic wave equation even for extremely sparse data (e.g., 0.5% of all grid points). In most cases, the identified coefficients are very close to the true ones, that is, the square of the velocity (equal to 4 in this example); only for extremely sparse data is the accuracy of the coefficients slightly reduced, which is acceptable. We employ FD algorithms to numerically solve the discovered equations derived from 60%, 5%, and 1% of the data volume, respectively, and compare the resulting wavefields with those obtained by simulating the accurate acoustic wave equation, as depicted in Figure 5. As seen in Figure 5b-e, the wavefield snapshots derived from the discovered equations exhibit a remarkably close resemblance to those obtained from the accurate acoustic wave equation (Figure 5a) used in generating the observations. Furthermore, even for extremely sparse data, the discovered equation demonstrates a high degree of accuracy in simulating seismic wave propagation (see Figure 5f, g), with only negligible differences.
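The FD verification step can be sketched with a standard second-order leapfrog scheme for the discovered equation; the Gaussian width and the periodic boundary handling below are our own assumptions, not the paper's setup:

```python
import numpy as np

# Hedged sketch of verifying a discovered equation by finite differences:
# a second-order leapfrog scheme for u_tt = c2 * (u_xx + u_zz) on the grid
# described in the text (101 x 101, dx = 10 m, dt = 2 ms), started from an
# isotropic Gaussian (hypothetical width) with zero initial velocity.
nx = nz = 101
dx, dt = 10.0, 0.002
c2 = 3.989e6                        # discovered coefficient, in m^2/s^2
x = np.arange(nx) * dx
X, Z = np.meshgrid(x, x, indexing="ij")
u_prev = np.exp(-((X - 500.0) ** 2 + (Z - 500.0) ** 2) / (2 * 50.0**2))
u_curr = u_prev.copy()              # zero initial velocity

r = c2 * dt**2 / dx**2              # (c dt/dx)^2, must stay below 0.5 in 2D
for _ in range(50):                 # roll() gives periodic boundaries (a choice)
    lap = (np.roll(u_curr, 1, 0) + np.roll(u_curr, -1, 0)
         + np.roll(u_curr, 1, 1) + np.roll(u_curr, -1, 1) - 4 * u_curr)
    u_next = 2 * u_curr - u_prev + r * lap
    u_prev, u_curr = u_next, u_next if False else (u_curr, u_next)[1]
    u_prev, u_curr = u_curr, u_next
```

With r ≈ 0.16 the scheme is comfortably stable, so any visible mismatch against the reference wavefield reflects the discovered coefficients rather than numerical blow-up.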
Table 1: The evolution process of the preliminary library when the left-hand side of the equation is the first-order time derivative.

Number of generations | Genome and Translation | Fitness

Furthermore, we demonstrate the robustness of D-WE to noisy data, with the results presented in Table 6. Here, Gaussian noise is added to the clean data u to obtain the noisy data ũ = u + η · std(u) · N(0, 1), where N(0, 1) denotes the standard normal distribution with mean 0 and standard deviation 1, and η is the noise level. The results show that D-WE is reasonably robust to high levels of noise. Surprisingly, D-WE still accurately discovers the structure of the equation for data with strong noise (e.g., 300% and 400% noise levels) and limited data. Here, we also numerically solve the discovered equations at noise levels of 25%, 100%, and 300%. Figure 6 presents a comparison between the generated wavefield snapshots and the corresponding ground truth (Figure 6a). We can see that our method yields highly accurate equations for observation data with low noise levels (Figure 6b, c, d). As the noise level rises, the wavefield snapshots simulated by the discovered equations show increased signal leakage compared to the ground truth, but the accuracy is still commendable. For observation data with strong noise, such as in Figure 6h, the wavefront is nearly obscured by noise and its continuity is significantly disrupted. However, even in this case, the discovered equation still yields wavefield snapshots comparable to those of the exact acoustic wave equation, as shown in Figure 6i, j.
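The noise model above is straightforward to reproduce (the synthetic wavefield here is purely for illustration):

```python
import numpy as np

# The noise model from the text: u_noisy = u + eta * std(u) * N(0, 1),
# sketched for a synthetic wavefield at a 100% noise level (eta = 1).
rng = np.random.default_rng(0)
u = np.sin(np.linspace(0, 4 * np.pi, 10000))     # stand-in clean wavefield
eta = 1.0                                        # noise level
u_noisy = u + eta * np.std(u) * rng.standard_normal(u.shape)
```

Because the noise amplitude is tied to std(u), a 300% level (η = 3) injects perturbations three times the signal's own standard deviation, which is why the wavefront in Figure 6h is nearly obscured.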
To consider more realistic observation systems, we place all receivers on the grid points of the model boundary and use the isotropic Gaussian function at the center of the upper surface of the model to initialize the wavefield, as illustrated in Figure 7. In Table 7, we present the results of the discovered wave equation under this observation system with different noise levels. Remarkably, even under such restricted observation conditions, our method is still able to accurately identify the structure of the equation and maintain robustness to a certain level of noise. However, it should be acknowledged that, compared with random observation systems, the restricted boundary observations reduce the robustness of our method in identifying the equation coefficients in the presence of noise. Moreover, when the noise level is high (e.g., a noise level of 75%), our method may discover an incorrect wave equation. This is a predictable outcome given the highly constrained nature of the observation points.
Table 2: The evolution process of the preliminary library when the left-hand side of the equation is the second-order time derivative.

Number of generations | Genome and Translation | Fitness

Table 3: The potential wave equations and the corresponding PIC when the left-hand side of the equation is the first-order time derivative.

Potential wave equation | PIC
Table 4: The potential wave equations and the corresponding PIC when the left-hand side of the equation is the second-order time derivative.

Conclusions
We explored a new methodology to discover wave equations in a data-driven way. This novel implementation, dubbed D-WE, directly discovers a wave equation from spatial-temporal seismic wavefield observations. D-WE consists of two major components: the neural network (NN) and the genetic algorithm (GA). The NN accepts spatial-temporal locations as inputs to approximate the observed displacement or pressure wavefields, which is used for calculating time and spatial derivatives and producing meta-data. The GA, in turn, serves to generate an expandable library of candidate function terms to address the problem of an incomplete initial library. The best wave equation is determined from the candidate library by utilizing a physics-informed information criterion. The corresponding coefficients of each term in the optimal form are identified by a PINN, which is initialized by the NN. Tests on the discovery of the 2D acoustic wave equation demonstrate that D-WE can identify the correct wave equation and is robust to sparse and noisy wavefield data. This conclusion will hopefully pave the way to utilizing this approach for the discovery of more exotic wave equations that describe wave propagation more in line with our observations.

Figure 1 :
Figure 1: Structure of deep fully connected backpropagation NN.

Figure 2 :
Figure 2: The workflow of the genetic algorithm.

Figure 4 :
Figure 4: An illustration of moving horizon.


Figure 5 :
Figure 5: Comparison of wavefield snapshots simulated by the accurate acoustic wave equation and the discovered equations with different data volumes. (a) Ground truth from the accurate acoustic wave equation. (b), (d), and (f) correspond to the discovered equations derived from 60%, 5%, and 1% of the data volume, respectively, and their differences from the ground truth are plotted in (c), (e), and (g), respectively.

Figure 6 :
Figure 6: Comparison of wavefield snapshots simulated by the accurate acoustic wave equation and the discovered equations with different noise levels. (a) Ground truth from the accurate acoustic wave equation. (b), (e), and (h) are the noisy wavefield data with noise levels of 25%, 100%, and 300%, respectively, obtained by adding noise to the ground truth. (c), (f), and (i) are obtained by solving the discovered equations at noise levels of 25%, 100%, and 300%, respectively. (d), (g), and (j) are the corresponding differences from the ground truth.

Figure 7 :
Figure 7: Schematic diagram of observation system, where the triangle represents the receivers, and the star represents the position of the center of the isotropic Gaussian function used to initialize the wavefield.

Table 5 :
Test on discovery of a 2D acoustic wave equation with varying subsets of the total observations.

Table 7 :
Test on discovery of 2D acoustic wave equation from limited observations.