Discontinuous finite element method for efficient three-dimensional elastic wave simulation


 The existing discontinuous Galerkin (DG) finite element method (FEM) for the numerical simulation of elastic wave propagation is primarily implemented in two dimensions. Here, a discontinuous FEM (DFEM) for efficient three-dimensional (3D) elastic wave simulation is presented. First, the velocity–stress equations of 3D elastic waves in isotropic media are transformed into first-order coefficient-changed partial differential equations. A DG discretisation method for wave field values on a unit boundary is then defined using the local Lax–Friedrichs flux format. The equations are first transformed into equivalent integral equations, and subsequently into a spatial semi-discrete ordinary differential equation system using a hierarchical orthogonal basis function. The DFEM is extended to an arbitrary high-order accuracy in the time domain using the exponential integrator technique and the explicit optimal strong-stability-preserving Runge–Kutta method. Finally, an efficient method for selecting the calculation area of the geometry of the current shot record is realised. For the computation, a multi-node parallelism with improved resource utilisation and parallelisation efficiency is implemented. The numerical results show that the proposed method can improve both the accuracy of the simulation and the efficiency of the calculation compared with existing methods.

used a method transforming an undulating surface into a horizontal surface, to obtain a solution for the elastic wave equation. Compared with the FDM, however, the FEM offers higher accuracy and presents lower dispersion and a Courant-Friedrichs-Lewy (CFL) condition at high and mixed orders (Abdulkadir 2015).
Thus far, two efficient high-order FEMs have been widely used in the forward modelling of wave equations: the SEM and the discontinuous Galerkin (DG) FEM (DG-FEM). The SEM is a high-order FEM that incorporates spectral expansion into the FEM. The 3D high-order DG-FEM and SEM perform well when applied to the forward modelling of wave equations. Komatitsch & Vilotte (1998) used the SEM for the numerical simulation of elastic waves and for two-dimensional (2D) and 3D seismic wave propagation on an undulating surface. However, the SEM generally requires quadrilateral and hexahedral grids, which are difficult to split in complex undulating surfaces and media. Moreover, it is difficult for the SEM to adapt to strongly undulating surfaces and complex underground structures.
In the development of the DG-FEM method, Cockburn & Shu (1989), Cockburn et al. (2001Cockburn et al. ( , 2004 combined DG with the Runge-Kutta (RK) discrete-time format and developed the RKDG method. Dumbser & Kaser (2006) used the discrete-time scheme of an arbitrary higher-order derivative in combination with the DG to solve the elastic wave velocity-stress equation. The arbitrary higher-order derivative format is based on the Taylor expansion of the wave field on time variables, and then uses the Cauchy-Kovalewski process to replace the time derivatives with spatial derivatives. Therefore, this method can achieve any higher-order accuracy in time and space. Dumbser & Kaser (2006) subsequently conducted elastic wave numerical simulations in 3D media, accounting for viscoelasticity and anisotropy (De la Puente et al. 2007) and developed a non-deterministic gridsolving method (Hermann et al. 2011). Etienne et al. (2009Etienne et al. ( , 2010 solved the elastic wave equation based on the DG of the central numerical flow (CNF). Because the method is based on the CNF, updating the velocity field is only related to the stress field, and vice versa. As such, it is well suited to the leapfrog time discretisation format. Brossier et al. (2010), Brossier (2011) and Etienne et al. (2009) used a low-order DG to solve 2D-and 3D-time-domain elastic wave equations, respectively. They found that this method can be used in inversion (especially for complex surface conditions). Xue et al. (2014) studied the DG finite element numerical simulation of a 2D undulating surface and obtained high-precision elastic wave simulation results. Their approach includes six steps: (i) the local Lax-Friedrichs numerical flux is used to obtain the finite elements; (ii) the Koornwinder-Dubiner polynomial, with orthogonality on the triangular mesh, is used to derive the semi-discrete space; (iii) the exponential function of the extended matrix is solved and simplified via the RK scheme; (iv) the space and time discretisation schemes are obtained; (v) the source, freesurface conditions and absorbing boundary conditions of complex frequency-shifted non-splitting perfectly-matchedlayer (Xue et al. 2014) are added and (vi) the 2D simulation is performed.
In seismic exploration applications, 3D elastic wave numerical simulations are primarily based on the FDM because the DG-FEM has a high computational cost and incurs a high application barrier. Section 2 of this paper describes the DG discretisation of a 2D elastic wave extended to a 3D case by Xue et al. (2014). The mass matrix in the tetrahedral element is subsequently transformed into an identity matrix based on a hierarchical orthogonal basis function, which does not require inversion in the time-step iteration. The corresponding stiffness matrix is well posed and can be extended to high-order accuracy in a spatial format. In the discrete-time format, the exponential integrator is used to transform the linear non-homogeneous differential equation into a matrix exponential form to avoid solving the inverse of a large matrix (Hochbruck & Ostermann 2010;Al-Mohy & Higham 2011). Subsequently, the explicit optimal strong-stabilitypreserving RK method (SSP-RK) (Gottlieb et al. 2008) is used to simplify and solve the matrix vector multiplication in matrix exponential form. In long time steps and high-speed small meshes, the CFL condition allows the algorithm to adapt effectively to various tetrahedral sizes; it can also be extended to achieve mixed-order accuracy in the time domain.
According to the spatial and temporal locality of the source influence, a calculation method is also implemented in this study to select the calculation area within the scope of the geometry corresponding to the current source. Based on a multi-node approach and its algorithm characteristics, the parallel efficiency of the algorithm is further improved using a fine-grained parallel optimisation method (namely, the optimisation of the stencil calculation model).
Section 3 presents the verification of the effectiveness and parallel efficiency of the method using a simple monoclinalstratum undulating surface model. The simplified SEG Advanced Modelling Program (SEAM) model is used to further verify the effectiveness and parallel efficiency of the method in the complex undulating surface model. Concluding remarks are presented in Section 4.

DG-FEM method for 3D first-order velocity-stress elastic wave equation
The 3D first-order velocity-stress elastic wave equation can be expressed as follows: 99 where the velocity-stress vector is v y , v z , zz , yy , xx , xy , xz , yz v represents the velocity in the direction indicated by the subscript (x, y and z directions, respectively), represents the stress in each of the six directions, as indicated by the sub- is the source vector and A 1 , A 2 , and A 3 are the elastic parameter matrices at the space coordinates ⇀ x =[x, y, z] T , which are In matrices (3), (4) and (5), is the medium density, and and are the Lamé constants. Let the solution region of equation (1) be . The finite element is used to discretise the solution region, which is then decomposed into non-overlapping sub-regions. The experimental function space on the discontinuous finite element is V h = { ∈ L 2 (Ω) : | Ω i ∈ P j (Ω i )}, where P j (Ω i ) is a polynomial of degree j over the Ω i region (element). The dis-continuous finite element method (DFEM) uses an elementby-element smooth approximation of the wave field; the approximation function does not need to be continuous at the element boundary. The following weak form is obtained by multiplying both sides of equation (1) by the weight function i and integrating on the i th element over the Ω i region (element): Using Gauss's divergence theorem, it can be inferred that Using the divergence operator ∇⋅ and gradient operator ∇, equation (6) can be rewritten as: (8) may be discontinuous at the boundary Ω i ; therefore, the integrand function term ⃖ ⃑ F = (⃖ ⃑ n ⋅ ⃖ ⃑ B)u i on Ω i must be replaced by the numerical flow (flux) (8) can be rewritten as: When the high-order DG-FEM is used, the order of the experimental function is high, and the selection of the flux ⌢ ⃖ ⃑ F has little influence on the numerical results (Kosloff & Baysal 1982). Therefore, the local Lax-Friedrichs flux scheme is used to define the wave field value on the boundary of the element, defined as 100 Figure 1. Mapping of an arbitrary tetrahedral element Ω i to a reference element Ω c in the physical space.
where ⃖ ⃑ n = [n 1 , n 2 , n 3 ] T , (u i ) int and (u i ) ext are the wave fields at the boundary of the normal vector ⃖ ⃑ n outside the ele- Thus, according to the condition u i = (u i ) int on the boundary, equation (9) can be rewritten as where the numerical circulation matrix is Q = 1 2 (n 1 A 1 + n 2 A 2 + n 3 A 3 + CI).
After constructing the DG-FEM for the 3D first-order elastic wave equation, a fully discrete discontinuous finite element equation system is obtained by discretisation in space and time.

Spatial discrete equation
In this study, tetrahedral elements with a strong boundaryfitting ability were selected for the discretisation. The arbitrariness of the tetrahedron on the unstructured mesh introduces complexity to the selection of the basis function on the element Ω i .
In addition, an arbitrary tetrahedral element Ω i is mapped to Ω c in figure 1, and the mapping between any tetrahedral element and the reference element in physical space is as follows: Using this mapping relation, the volume-element transformation formula can be obtained as dxdydz A set of hierarchical orthogonal basis functions is then selected on the reference element Ω c for the numerical simulation. The orthogonality of the basis function makes the mass matrix a unit diagonal matrix, which does not need to be inverted in the time-step iteration. The corresponding stiffness matrix is well posed; therefore, the computational complexity of the numerical simulation is significantly reduced.
A set of basis functions of order p includes polynomials from order 0 to q, composed of (q + 1)(q + 2)(q + 3) ∕ 6 shape functions in total. The specific format of the set of basis functions is as follows (Xin & Cai 2012): The m th component u i m (r, t) of the wave field u i can be expressed as follows: where r is the coordinate of the reference element Ω c , where N is the number of polynomial basis functions defined by equation (12). Assuming that the elastic parameters Ω i of the medium in the element Ω i are constant, a system of semi-discrete ordinary differential equations (ODEs) can be obtained by substituting equation (16) into equation (12): The numerical flux matrix of the four triangular surfaces of the element Ω i is Q m (m = {1, 2, 3, 4}). The calculation formulas for the element Ω i mass matrix M and stiffness matrix S x p are and respectively. The formulas to calculate the surface quality matrix of the four triangles of element Ω i are and where r ext is the coordinate of the outer boundary of the reference element Ω c . The degrees of freedom of each element areũ = [ṽ i x ,ṽ i y ,ṽ i z ,̃i zz ,̃i yy ,̃i xx ,̃i xy ,̃i xz ,̃i yz ] T and the dimensions are 9N × 1.ũ i is the undetermined coefficient of the internal wave field of the first element,ũ i 1 ,ũ i 2 ,ũ i 3 ,ũ i 4 are the undetermined coefficients of the wave field between the element Ω i and the four adjacent elements. The hierarchical orthogonal basis function selected in this study can be obtained from M as an identity matrix, which can be omitted from equation (17). The explicit formula of the specific zeroto fourth-order basis functions is shown in Yang et al. (2017); this can be omitted from equation (17).

Discrete-time scheme
The next step is to solve the ODE system by adding the source term to the left of equation (17). To simplify the ODE system, it is assumed that L is the n × n global space discrete operator, f (t) is the source integral vector and U(t) is the vector of the wave field coefficient {u m (t)} n m=1 of dimension n × 1. The parameter t is omitted. Equation (17) can be expressed as Implicit and explicit discrete-time schemes can be used to solve equation (22) of the (super-large) ODE system. When using explicit time schemes such as the RK scheme to directly solve equation (22), the time step decreases with an increase in the order. Moreover, when there are small, high-speed grids, a shorter time step is required owing to the CFL conditions. The implicit time scheme is used to solve equation (22) directly. Considering the global super-large linear operator L, formed by the 3D spatial discrete scheme, the calculation of the inverse of the matrix is time-consuming. Therefore, an exponential integrator technique is used to construct the explicit discrete scheme of equation (22) as it can adopt a long time step and be extended to high-order accuracy. The specific implementation process is described in the following paragraphs.
Theoretically, the analytical solution of equation (22) satisfies the nonlinear integral equation where t 0 is the initial time of the wave field updating and U 0 is the initial wave field value. The time discretisation scheme can be obtained using a Taylor series f (t) expansion at t 0 . However, this method involves solving the inverse of a large matrix. To avoid inversion, the matrix index proposed by Al-Mohy & Higham (2011) has been applied to the exponential integrator method, and the p-order Taylor series is used to expand f (t). Equation (23) can then be transformed into a matrix exponential function: . Furthermore, the final form of equation (23) can be expressed as where the dimensions of the unit matrix I n , U 0 , and L are n × n, (n + p) × 1, and (n + p) × (n + p), respectively. The , the dimension of the identity matrix I p−1 is (p − 1) × (p − 1), the p th unit vector of which the element is not zero is e p and the numerical stability coef- To solve equation (25), the matrix exponential vec- ] must be calculated.
Generally, the Krylov subspace approximation method and the Chebyshev-Laguerre polynomial expansion method are used. The core of the solution is matrix vector multiplication. In this study, equation (25) is used as the solution to the homogeneous differential equation: with the initial conditionŨ 0 = [ U 0 0 −1 e p ] to construct the discrete-time scheme. Gottlieb (2005) studied the time scheme discretisation of linear and nonlinear ODEs formed by the SSP-RK in a semidiscrete space scheme. Linearity was shown to be a special case of nonlinearity in this method. Therefore, equation (26) satisfies the conditions of the SSP-RK and can be extended to any order of accuracy. In this study, the optimal explicit SSP-RK (Gottlieb et al. 2008) is used to solve equation (26), and the lower-order optimal explicit SSP-RK (m, m) is obtained: It is important to note that because the spatial format is compact and explicit, and equation (27) involves a matrix vector multiplication (which is precisely the spatial discretisation process of the DG-FEM), it is not necessary to explicitly provide the global large matrixL. Thus, the wave field can be updated by one unit at a time without forming a whole set of finite element ODEs. This updating of the final-element independent wave field makes large-scale parallelisation easier and more efficient.
Finally, if the source and boundary conditions are considered, the complete equation to be solved can be obtained. See Appendix A for its derivation.

Multi-node parallelisation
It can be deduced from equation (17) that an update of the wave field on the i th unit is only related to the wave field of that unit and the four adjacent units. The solution of equation (27) can be independent of each discretised unit (based on the aforementioned characteristics of this method), thereby reducing the computational burden for computing the whole model for each shot, as done in a traditional calculation. In the implementation of the algorithm, the calculation area of each shot is first determined to reduce the unnecessary waste of computing resources; then, the wave field data of each subcalculation area are rearranged to achieve load balance and efficient communication. The specific steps of the method are described as follows.
(1) The geometric information structure of the entire model is provided as an input, and the points and connected edges of each boundary block in the model are calculated to obtain a single connected region.
(2) The geometrical domain of each shot is calculated. Four vertical cutting planes are selected in the inline and crossline directions, and the four vertical lines in the hexahedron composed of these four cutting planes are taken as the cutting positions. (3) According to the cutting positions of the four vertical lines, the cutting separation algorithm of the selfintersection polygon is used to evaluate each polygon with respect to the four cutting planes. Then, one or more new simply connected polygons are identified. The entire boundary is processed simultaneously to avoid the occurrence of small angles in the polygons.
The new geometry information is acquired and, finally, part of the model is formed. (4) A tetrahedral mesh generator is used to generate tetrahedral mesh elements according to the information on the geometric structure of some models. Finally, these elements are written to a file for the current shot calculation.
Load balancing and efficient data communication between nodes are conducted for some models in each shot. Using the stencil computing mode optimisation method (Holewinski et al. 2012), the program characteristics, memory access, calculation and communication are realised. The method consists of the following steps. (1) Computational optimisation analysis, which eliminates redundant computing and uses a multi-threading programming model to fully exploit the computer multicore architecture.
(2) Communication optimisation analysis for communication and computing coverage, including the reduction of synchronisation.
(3) Optimisation analysis of memory access to eliminate unnecessary operations and reduce the actual memory usage.
Using the method developed in this study, the boundaries of the sub-regions are redrawn, and each process is given an approximately equal workload through efficient asynchronous communication. The ParMetis grid partition tool is used to divide the sub-regions so as to include a similar number of elements for the tetrahedral meshing of some models. Each block is assigned to a process to achieve optimal load balance for parallel computing. After the domain decomposition of the mesh is complete, the grid-element attributes of each sub-region are recalculated. The specific calculation steps are as follows: (1) Each process reads the information on the corresponding sub-region and resets the ID from 1 to that of each unit in the region.
(2) The node of each tetrahedral element in the sub-region is given a number starting from 1. Subsequently, according to the data communication number between adjacent sub-regions, data transmission between them is carried out. Specifically, the proposed method allocates two vectors, sendData and recvData, for each current sub-area block. First, all data are extracted to transmit from the sub-region data to the sendData array are arranged according to the process number (equal to the subarea number) of the domain to be received. The data are sent in the asynchronous transmission mode. Second, the current sub-region receives data asynchronously into the recvData array, according to the sequence of the process number in the first step; the recvData array is then rearranged into the usedData array, which is used by the current sub-region, according to the data storage order of the current sub-region (sortRecvData array). The process is illustrated in figure 2.
According to the load-balancing and communication modes of the computing nodes, the following design for the multi-node parallel DG simulation algorithm flow is proposed.
Algorithm: load balancing, asynchronous communication algorithm of boundary data. Input: mesh vertex data, mesh cell data, mesh attribute (S-wave velocity/P-wave velocity/density) data, algorithm parameter data. Output: wave field data.
while(1) { //shot cycle Initial wave field Generating source vectors for (i=0; i<numStep; ++i) {//Steps of the current shot simulation calculation 1) The source information of the current time point (t) is added. 2) The stress value at the t -1 moment of the connected boundary of the sub-region is asynchronously received and sent.
3) The velocity value of the wave field simulation at time t in the sub-region is calculated. 4) The velocity value of the wave field simulation at time t on the free boundary of the sub-region is calculated. 5) The stress value at the t -1 time of all connected boundaries of the sub-regions of the current node is obtained.
6) The velocity value of the wave field simulation at time t of the connected boundary of the subregion is calculated. 7) The velocity value of the global wave field is updated. 8) The velocity value at time t -1 of the connected boundary of the sub-regions is asynchronously received and sent. 9) The stress value of the wave field simulation at time t in the sub-region is calculated. 10) The stress value of the wave field simulation at time t on the free boundary of the sub-region is calculated. 11) The velocity value at time t -1 of all connected boundaries of the sub-regions of the current node is obtained. 12) The stress value of the connected boundary of the sub-region at the current time t is calculated. 13) The stress value of the global wave field is updated.
} 14) The shot set records are stored.

}
The parallel asynchronous communication algorithm proposed in this study is such that the calculation of the velocity value of the connected boundary at time t only depends on the stress value of the connected boundary at time t -1. Asynchronous transmission is conducted in step (2) of the algorithm; therefore, the calculation of steps (3) and (4) and the transmission of step (2) occur simultaneously. Owing to the intensive computing in steps (3) and (4), when the multi-process of the algorithm is executed to step (5), all the computing processes in the computational domain have completed data reception. Therefore, synchronisation time is not consumed.
Similarly, the calculation of the stress value of the connected boundary at time t only depends on the velocity of the connected boundary at time t -1. Asynchronous transmission is conducted in step (8) of the algorithm; therefore, the calculation of steps (9) and (10) and the transmission of step (8) are conducted simultaneously. Owing to the intensive computing of steps (9) and (10), when the multi-process of the algorithm is executed to step (11), all the computing processes in the computational domain have completed data reception. Therefore, once again, synchronisation time is not consumed. Finally, in the computing operations of the internal matrix and vector, an efficient multi-thread parallelisation is used to further accelerate and improve the computational efficiency of the algorithm.

Numerical simulation experiment
In this study, a single-layer slanted surface-relief model is used to demonstrate the high-precision and parallel efficiency of the method under the DG mixed order. By simulating such a complex model, the promising adaptability and parallel efficiency of the proposed method is showcased.

Single-layer slanted surface-relief model
Comparisons are performed as follows: (i) the result of the DFEM simulation is compared with the analytical solution; (ii) the result of the FDM simulation (Dablain 1986, Crase 1990) of Tesseral (TR) is compared with the analytical solution. The calculation time of the DG is shorter when the DFEM and TR models have the same accuracy, whereas the DG accuracy is higher when the DFEM and TR models have the same calculation time. This indicates that the DG method proposed in this study is more advantageous than the TR. The specific implementation steps are listed next.
First, a single-layer slanted surface-relief model is constructed, as shown in figure 3. The specific parameters are as follows: (i) the length, width, and height of the model are 4000, 4000 and 6000 m, respectively, and the slant angle of the model is approximately 20°; (ii) the dominant frequency of the Ricker wavelet source is set at 25 Hz (Wang 2015); (iii) V p and V s are 3789 and 2185 m s −1 , respectively, and the density is 2319 kg m −3 ; (iv) the source location is at x = 2000, y = 2000, z = 10 m, the receiver range is 2100-4000 m, the spacing of each geophone is 200 m and the receiver operates on three lines.
To compare the performance of this method with the analytical solution and the FDM of TR, the same computing resources are used and the simulation results at x = 2000, y = 2000 m are compared. According to the vertical velocity component, shown in figure 3, the DG solution (blue) and analytical solution (red) of this method in the P3 order match well for both direct and surface waves.  The vertical velocity components obtained with the DG method for different spatial orders under the same grid density are compared, as shown in figures 4 and 5. It is shown that the amplitude and phase of the direct wave are consistent with those of the P2 and P3 order simulation results, and the phase of the surface wave is approximately the same. The amplitude, however, is attenuated. The phase of the P1 order direct wave is consistent with that of the P3 order, while the amplitude is slightly attenuated. However, the phase frequency of the surface wave is clearly different, and the amplitude is also significantly attenuated. This indicates the feasibility of a mixed-order P3P2 or P2P1 when a highorder precision simulation is not sought and the grid remains unchanged.
By comparing and analysing the TR solution (black) and the analytical solution (figures 6, 7 and 8), it is observed that the phase information of the TR direct wave matches well for different grid sizes; however, when the grid size is 2.5 m, the amplitude of the TR direct wave attenuates with an offset. For surface waves, when the grid size is 5 m, the phase and amplitude of the TR surface wave are clearly mismatched. Even for a grid size of 2.5 m, the TR method can only provide the approximate shape of the surface wave. When the grid size is reduced to 1 m, the TR solution approaches the analytical one.
In this study, the calculation efficiency of the DG-FEM and TR FDM are compared using a single-shot simulation of 4 ms sampling and 6 s duration. Table 1 shows that for a grid      9. Simplified SEAM block model. a step of 322 s is adopted to achieve a stable simulation; this takes approximately 3234.962 s. When the mixed P3P2 order is adopted, the time step can be increased to 480 s, and the total time is 2115.158 s. Compared with the single P3 simulation, the time consumption is significantly reduced; however, the accuracy is not improved. The calculation time of the P2 order is 1454.026 s, which is equivalent to that of TR with a 2.5-m grid size. However, figures 2-8 show that the accuracy of the P2 order DG-FEM method is higher. The efficiency can be further improved by using a P2P1 mixed order. This shows that the DG-FEM method can achieve higher simulation accuracy than the TR FDM in the case of a relief surface with a lower or equivalent calculation time.

Example of complex model
Based on the characteristics of the SEAM Foothills and SEG-Y discrete models, an imitation model with 78 horizons/faults and 76 blocks (i.e. a simplified SEAM block model, as shown in figure 9) is constructed using SeisModel (SINOPEC) software. The original SEG-Y model is used to fill the surface of the imitation model with attributes. A tetrahedral model with a side-length velocity ratio of less than 0.2 is established using the topologically consistent tetrahedral division method. The resulting total number of tetrahedrons is approximately 200 million. Figure 10 shows that the network model is characterised by a good body fitting and a variable mesh size.  The following geometrical characteristics are established: (i) the shot point coordinates are Y = 9980, X = 400-14 600 m, the shot interval is 40 m, and the total number of shots is 356; (ii) the coordinates of the three survey lines are Y = 9880, 10 000 and 10 120 m, X = 400-14 600 m, the interval is 20 m and each survey line has 711 receiver points; (iii) according to the arrangement sliding rule, each shot is received on all receiving points of the three lines, that is, the geophone does not move with the shot point; (iv) a Ricker wavelet is selected and its main frequency is set at 20 Hz, with 60 ms delay in the phase. In addition, the P1P2 mixed order is chosen, with a total calculation time of 6 s and a DT (time step) of 200 s. In the complex model, the time comparison of the multishot simulation parallel computing is an important index for measuring parallel efficiency. The process number of the multi-shot simulation is 2400 for a CPU utilisation rate of more than 96% (effectively reaching a full load) in the experiment using 200 CPUs and 2400 cores. The parallel computing times for 1200,1440,1680,1920,2160 and 2400 processes are compared in this hardware condition. As shown in figure 11, the actual acceleration curve (orange) approaches the ideal acceleration curve (light blue) before the CPU reaches its full load. This shows that the parallel strategy can effectively hide the non-computational time and overlap the computation and communication well. Thus, it is concluded that the proposed parallel algorithm is highly efficient.
The migration section is evaluated through static correction, velocity modelling and depth-domain imaging (figures 12-14), from which it is concluded that the migration section is highly consistent with the structure and fault of the model.

Conclusion
In this study, a 3D elastic wave DFEM method based on multi-nodes is proposed. In this method, the wave field values on the boundary of a unit are defined using the local Lax-Friedrichs flux format. The DG method is used to discretise the model space, and the explicit optimal SSP-RK method is used to obtain arbitrary high-order accuracy. In addition, owing to the high accuracy of the simulation fitting, the adaptive tetrahedron splitting method is adopted, which can avoid numerical dispersion and other challenges. It is found that the DG-FEM can effectively simulate the propagation of elastic waves for any complex undulating surface using the adaptive tetrahedron splitting method. More importantly, in the design of the 3D parallel computing algorithm, the effective computational domain, input-output overlap and optimisation of computing and communication are fully considered to ensure load balancing during parallel computing and improve the computational efficiency. The experimental results show that the simulation algorithm presented in this study demonstrates good parallel efficiency and simulation ability when applied to a complex model, therefore the proposed method is effective and practical. deep learning' (grant no. 41930112), the Natural Science Foundation of China (grant no. 41704052), and Science for Earthquake Resilience (grant no. XH18017Y).