Zero-offset data estimation using a CNN for applying 1D full waveform inversion

Full waveform inversion (FWI) in the time domain is limited by its large computing time and memory requirements. Several studies have addressed this problem using machine learning techniques; most of them train networks on seismic data generated from various synthetic models to estimate the subsurface velocity structure directly. In this study, we propose a method that converts a common midpoint (CMP) gather into zero-offset data at the CMP location using a convolutional neural network (CNN) to increase the computational efficiency of FWI. As training data, we use synthetic data generated with the seismic acquisition geometry and the source signature of the field data. Because the proposed method performs FWI on the converted zero-offset data, it is more efficient than FWI on the original multichannel data. However, it is difficult to apply the method to an acquisition geometry or source signature that was not used for training. To verify the proposed method, we apply it to a synthetic model not used for training as well as to the targeted field data, and we confirm that a proper subsurface velocity structure is obtained.


Introduction
Full waveform inversion (FWI) is a technique used to obtain a high-resolution image of the subsurface from observed seismic survey data (Lailly & Bednar 1983; Tarantola 1984; Pratt et al. 1998; Shin & Min 2006; Ray et al. 2016). However, FWI generally suffers from unfavorable issues such as a high computing cost and strong nonlinearity, because it relies on gradient-based optimization and on simulating the full wave equation (Wang & Rao 2006; Virieux & Operto 2009; Fichtner 2010; Gholami & Siahkoohi 2010; Zhang & Lin 2020).
To reduce the nonlinearity between the seismic data and the inverted parameters, Bunks et al. (1995) have proposed a multiscale seismic waveform inversion (Wang & Rao 2020), and Sun & Alkhalifah (2020) have proposed a machine-learned descent optimization algorithm. Despite recent advances in computing technology, FWI still requires a high computing cost (Sun & Fu 2013). To reduce this cost, Gholami et al. (2019) have used half the Nyquist rate to reduce the memory required along the time axis, and Kim et al. (2021) have used a discrete cosine transform (DCT)-based FWI method to reduce the memory along the spatial axes. In general, seismic data lack the low-frequency components needed to deal with media with a strong velocity contrast (Fei et al. 2012). Therefore, Shin & Cha (2008) have performed the waveform inversion in the Laplace domain to restore the missing low-frequency components, and Bozdag et al. (2011) have proposed an envelope inversion method that uses the ultra-low-frequency information in seismic data to recover long-wavelength components.
With the development of computing technology, research on the application of machine learning to FWI has been performed (Röth & Tarantola 1994; Araya-Polo et al. 2018; Moseley et al. 2018; Wu & Lin 2018). Recently, machine learning-based FWI methods have been used to derive model parameters directly from seismic data. For example, Li et al. (2019) and Liu et al. (2021) have directly derived the velocity model from seismic data using deep-learning inversion. Likewise, Wu & McMechan (2019) have proposed FWI with convolutional neural networks (CNNs) to derive the velocity model directly from seismic data. Nevertheless, the derivation of a velocity model from seismic data still presents many challenges. Alternatively, FWI with machine learning has been proposed in which the inversion is performed after preprocessing and reconstruction of the seismic data or model parameters. For example, Sun & Demanet (2020) have performed FWI using low-frequency information extrapolated from seismic data with CNNs, and He & Wang (2021) have carried out reparameterized FWI using deep neural networks. However, estimating the velocity model using machine learning has the problem that it is difficult to account for various seismic acquisition environments, such as the acquisition geometry and source signature (Fabien-Ouellet & Sarkar 2020; He & Wang 2021). Although various methods for deriving a velocity model using machine learning have been proposed, it remains difficult to overcome the computational inefficiency of FWI for large-scale data.
In the conventional processing of seismic reflection data, an image of the subsurface is obtained by stacking data after applying velocity analysis and normal-moveout correction to data sorted by common midpoint (CMP) (Baker 1999; Roberts & Bally 2012). Here, the stacked data are assumed to be the zero-offset data of the normal-incidence reflection for the line velocity at the CMP (Yilmaz 2001). In general, to increase the accuracy of the inversion results, FWI simulates the wave equation for the real field acquisition environment and uses the common shot gathers. If the stacked data are used instead, the computational efficiency can be increased because the problem can be assumed one-dimensional (1D) rather than inverting the full multichannel data (Wang 2016). When performing 1D FWI using stacked data, the accuracy is lower than that of FWI using multichannel data, because the long-offset contributions are not considered and the applicability to subsurface structures that vary strongly in the horizontal direction is limited. However, 1D FWI using stacked data improves the computational efficiency and has the advantage that it can be applied to large-scale multichannel data without a massively parallel computer system.
To successfully apply the stacked data to 1D FWI, two problems must be overcome. First, velocity analysis is usually required to generate stacked data. Since velocity analysis is time-consuming and labor-intensive, a technique for converting the CMP gather to stacked data is required to minimize this manual work. Second, 1D FWI is performed by solving the 1D wave equation, whose geometric spreading differs from that of the observed data; therefore, to perform 1D FWI using stacked data, a geometric spreading correction must be applied to the stacked data.
In this paper, we propose a method that converts multichannel data to zero-offset data at the CMP using a CNN, so that the multichannel data can be successfully applied to 1D FWI. The CNN architecture was designed to act as a feature extractor that converts the CMP gather to zero-offset data at the CMP. To improve the applicability of the proposed method to field data, we trained the CNN using synthetic data generated by considering the seismic acquisition geometry and the source signature estimated from the field data. We predicted the zero-offset data at the CMP by applying the CMP gather to the trained network and performed 1D FWI using the predicted data.
This paper is organized as follows. First, we describe in detail how the proposed method works. Next, we present the application of the trained network to a two-dimensional (2D) synthetic data set and a field data set. Then we confirm that the proposed method can effectively estimate the velocity model from the 2D data set. Finally, we demonstrate that the trained network can be applied to field data obtained in a survey performed in the Ulleung Basin (offshore of eastern Korea) by the Korea Institute of Geoscience and Mineral Resources (KIGAM).

Methodology
We propose an algorithm for converting the CMP gather to zero-offset data in order to perform the 1D FWI. Here, a CNN was used to convert the CMP gather to zero-offset data. In addition, we generated training data using the seismic exploration geometry and the source signature obtained through source estimation to reflect the characteristics of the field data. Label data were generated through 1D modeling using the line-velocity model for the CMP position.
To generate the training data, the Marmousi2 model, the SEG/EAGE salt dome model and the overthrust model were used (Aminzadeh et al. 1994; Martin et al. 2002). Figure 1 depicts the workflow of the proposed method, which consists of preprocessing, source estimation, generation of the CMP gathers, DCT compression, network training and FWI. First, preprocessing was performed to increase the signal-to-noise ratio of the seismic data. Second, to generate the training data, source estimation was performed. Third, the training data were compressed using the DCT to increase the computational efficiency when training the network. Fourth, the designed network architecture was trained using the compressed training data and label data. Finally, 1D FWI was performed using the predicted zero-offset data obtained by applying the trained network to the field data.
In this study, the applicability of the proposed method was analyzed by focusing on the exploration performed in the Ulleung Basin (offshore of eastern Korea) by KIGAM. In addition, the method was verified by applying the proposed method to the synthetic data generated using the same exploration geometry and source signature.
In the acquisition geometry, 1350 sources were excited along the horizontal direction at an interval of 25 m. The recording time was 3 s, and the sampling interval was 1 ms. The multichannel seismic data had 96 channels with a group interval of 12.5 m. The source and streamer depths were 5.0 and 7.0 m, respectively. The source-receiver offset ranged from 231.25 to 1425 m. Figure 2 shows the fold values of the field data. In this study, we converted CMP gathers with 24 folds into zero-offset data, and the 1D FWI was performed using the zero-offset data obtained by applying the field data to the trained network. The source signature was generated by averaging the sources estimated with the deconvolution method over the 1350 field shot gathers (Kim et al. 2013). The training data for network training were generated using the same estimated source signature and acquisition geometry as the field data.
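As an illustration of how this geometry determines the CMP fold, the sketch below bins source-receiver midpoints for the stated parameters (25 m shot interval, 96 channels at 12.5 m group interval, 231.25 m near offset) in NumPy; the variable names, the end-on streamer layout and the shot subset are ours, not from the survey.

```python
import numpy as np

# Acquisition parameters from the survey description (illustrative sketch).
shot_interval = 25.0        # m between source excitations
group_interval = 12.5       # m between receiver groups
n_channels = 96
near_offset = 231.25        # m, nearest source-receiver offset
n_shots = 200               # subset of the 1350 shots, for brevity

midpoints = []
for s in range(n_shots):
    src_x = s * shot_interval
    # Receivers assumed to trail the source at near_offset + k * group_interval.
    rec_x = src_x - (near_offset + np.arange(n_channels) * group_interval)
    midpoints.append((src_x + rec_x) / 2.0)
midpoints = np.concatenate(midpoints)

# CMP bin size is half the group interval (6.25 m).
bin_size = group_interval / 2.0
bins = np.floor(midpoints / bin_size).astype(int)
_, fold = np.unique(bins, return_counts=True)
print("maximum fold:", fold.max())  # -> maximum fold: 24
```

With a half-group-interval bin, this geometry yields a maximum fold of 96 × 12.5 / (2 × 25) = 24, consistent with the 24-fold gathers used for training.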

DCT compression
Due to the size of the input data during network training, we used DCT compression to reduce the computational burden. The DCT is one of the methods commonly used for image compression. We applied the 2D DCT and its inverse as follows (Wang 1984):

$$y(k_1,k_2) = w(k_1)\,w(k_2)\sum_{n_1=0}^{N_t-1}\sum_{n_2=0}^{N_{rec}-1} x(n_1,n_2)\cos\!\left[\frac{\pi(2n_1+1)k_1}{2N_t}\right]\cos\!\left[\frac{\pi(2n_2+1)k_2}{2N_{rec}}\right], \tag{1}$$

where $w(k_1)=\sqrt{1/N_t}$ for $k_1=0$ and $\sqrt{2/N_t}$ otherwise (similarly for $w(k_2)$ with $N_{rec}$), x and y are the input data and transformed data, respectively, and $N_t$ and $N_{rec}$ are the time index size and the number of receivers, respectively. As shown in equation (1), the DCT coefficients are obtained as a sum of cosine functions oscillating at different frequencies. To compress the data using the DCT, we retained only a portion of the coefficients in the transformed DCT domain. Figure 3 shows the data reconstructed back to their original size after compressing the time axis of a 24 × 3001 CMP gather from 3001 to 500 samples. We compared the traces at the receiver position of 681.25 m to examine the differences in detail (figure 4); the reconstructed data and the original data were confirmed to have almost the same amplitude and waveform. In this study, the CMP gathers with 24 folds and 3001 time indexes were compressed to 24 folds and 500 time indexes for use as training data.
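As a minimal numerical sketch of this compression step, the code below builds an orthonormal DCT-II matrix (a direct O(N²) realization of equation (1) along one axis), keeps only the first coefficients of the time axis, and inverts; the sizes are reduced from the paper's 24 × 3001 for brevity, and the smoothed random gather is only a stand-in for real data.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: C @ x gives the DCT of x along its rows."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * m + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] = np.sqrt(1.0 / n)   # the k = 0 row carries the 1/sqrt(N) weight
    return c

rng = np.random.default_rng(0)
nt, nrec, nkeep = 300, 24, 50            # reduced sizes for the sketch
gather = rng.standard_normal((nt, nrec))
# Smooth the data so the energy concentrates in the low DCT coefficients.
kernel = np.ones(25) / 25.0
gather = np.apply_along_axis(lambda t: np.convolve(t, kernel, "same"), 0, gather)

C = dct_matrix(nt)
coeffs = C @ gather                      # forward DCT along the time axis
padded = np.zeros_like(coeffs)
padded[:nkeep, :] = coeffs[:nkeep, :]    # keep only the first nkeep coefficients
reconstructed = C.T @ padded             # inverse DCT (C is orthonormal)

rel_err = np.linalg.norm(reconstructed - gather) / np.linalg.norm(gather)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because the matrix is orthonormal, the inverse transform is simply its transpose; discarding the high-index coefficients before inverting gives the compressed-then-reconstructed data compared in figures 3 and 4.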

Network architecture for training
To convert the CMP gather to zero-offset data, we designed a network architecture with 17 neural network layers, excluding the input layer (figure 5). To extract the features of the zero-offset data from the CMP gather, we constructed a CNN architecture; Table 1 shows its detailed configuration. To extract the features needed to correct the geometric spreading effect and convert the CMP gather to zero-offset data, the architecture includes 2D convolution layers, average pooling layers and activation layers (ReLU). We included dropout and batch normalization layers to avoid overfitting and improve the network performance. We used the root-mean-square error (RMSE) as the loss function, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$

where y is the corresponding label data, ŷ is the predicted data, ȳ is the mean of the corresponding label data and N is the compressed sample size. The time-series data generated by 1D modeling using the line-velocity model at the CMP were used as the label data, and the CMP gather was used as the input data. By generating the label data with the same estimated source as the training data, we expected to be able to correct the geometric spreading effect. Figure 6 presents some feature maps used to convert the CMP gather to zero-offset data for neural network layers 6, 9 and 12 (shown in Table 1).

Figure 5. The proposed CNN architecture for zero-offset data prediction. The blue box on the left is the input layer.
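The RMSE loss can be sketched in a few lines; the helper below is an illustrative NumPy version (the names are ours), operating on one compressed label/prediction pair.

```python
import numpy as np

def rmse(label, pred):
    """Root-mean-square error between a label trace and a predicted trace."""
    label = np.asarray(label, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((label - pred) ** 2))

y = np.array([0.0, 1.0, 2.0, 3.0])       # toy compressed label trace
y_hat = np.array([0.0, 1.0, 2.0, 1.0])   # toy prediction
print(rmse(y, y_hat))  # sqrt(mean([0, 0, 0, 4])) = 1.0
```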

Synthetic training model set.
In this study, to train the designed network architecture, we generated training data from various subsurface structure models. The Marmousi2 model, the SEG/EAGE salt dome model and the overthrust model were used as the synthetic models. The Marmousi2 and overthrust models were modified according to the subsurface shape to generate the training data, and the salt dome model was obtained by modifying 2D sections along 12 lines extracted from the existing SEG/EAGE 3D salt dome model. The number of models used was 15 for the Marmousi2 model, eight for the overthrust model and 12 for the salt dome model. The sizes of the Marmousi2, overthrust and salt dome models were 17.0 × 3.04375 km, 20.0 × 2.5 km and 13.5 × 2.8125 km, respectively, with a grid spacing of 6.25 m in the vertical and horizontal directions.

Generation of trained data and label data.
In this study, we generated the training data using the same acquisition geometry and source signature as the field data obtained in the Ulleung Basin (offshore of eastern Korea).
Here, the input and label data of the training data set were generated by solving the 2D and 1D acoustic wave equations in the finite-difference time domain. Moreover, the 1D modeling was performed using the same source signature and the line-velocity model at the CMP to generate the label data.
The source signature was estimated using the deconvolution method proposed by Kim et al. (2013). Figure 7 shows the time series and Fourier spectra of the estimated source. To generate the input data set, we used 96 receivers at intervals of 25 m in each model, with the same offsets as the field data. The number of sources used in each model was 593 for the Marmousi2 model, 718 for the overthrust model and 461 for the salt dome model. The number of CMP gather and zero-offset data pairs was 8550 for the Marmousi2 model, 5560 for the overthrust model and 5256 for the salt dome model, for a total of 19 366. The sampling interval was 0.5 ms and the recording time was 3 s. We resampled the generated CMP gathers and zero-offset data to the same sampling interval as the field data. We also compressed the input data from 24 × 3001 to 24 × 500 and the label data from 3001 to 500 samples using the DCT to reduce the computational burden during training. By using the same estimated source to generate the input and label data, we expected to be able to correct the geometric spreading effect; furthermore, the same source could then be used when performing 1D FWI with the predicted zero-offset data. Finally, the designed network was trained using the input and label data with the direct wave removed.
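The resampling from the synthetic 0.5 ms interval to the field 1 ms interval amounts to keeping every second sample; a minimal sketch with illustrative names is shown below (a production workflow would apply an anti-alias low-pass filter before decimating, but the test trace here is already band-limited).

```python
import numpy as np

dt_syn, dt_field = 0.5e-3, 1.0e-3        # synthetic and field sampling intervals (s)
step = int(round(dt_field / dt_syn))     # decimation factor: 2

t_max = 3.0                              # recording time (s)
n_syn = int(round(t_max / dt_syn)) + 1   # 6001 samples at 0.5 ms
# Band-limited 10 Hz test trace standing in for a modeled record.
trace = np.sin(2 * np.pi * 10.0 * np.arange(n_syn) * dt_syn)

resampled = trace[::step]                # 3001 samples at 1 ms
print(len(trace), "->", len(resampled))  # 6001 -> 3001
```

The resulting 3001-sample traces match the field time axis used throughout the paper.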

Training strategy
In this study, we performed the network training using the 'Deep Learning Toolbox' in MATLAB (Kim 2017). In addition, Gaussian noise between 5 and 30 dB was randomly added to the training data using the 'awgn' command in MATLAB to improve the robustness of the network. Since the dropout and batch normalization layers included in the designed network already reduce overfitting, a relatively low validation split of 10% of the total data was applied to monitor overfitting, followed by a test split of 3%. All of the hyperparameters of the network are shown in Table 2. We used an initial learning rate of 0.001 for stable training while avoiding overfitting. Since the input data size used for training in this study was very large compared to that of a typical network, we used a mini-batch size of 20, considering our computer performance and the efficiency of training the network. To obtain regularized results while avoiding overfitting, a dropout ratio of 50% was used, following Baldi & Sadowski (2013), together with an L2 regularization value of 0.0005. We used Adam as the optimizer to minimize the RMSE loss function; Adam is a computationally efficient algorithm for first-order gradient-based optimization with low memory requirements (Kingma & Ba 2014). We generally used ReLU activation functions, which are known to facilitate the training of very deep neural networks due to their non-saturating properties (Hahnloser et al. 2000). Finally, we shuffled the data set at every epoch to reduce overfitting and variance. Figure 8 depicts the loss curves for the training and validation sets with respect to each epoch.
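For reference, Adam keeps exponentially decaying averages of the gradient and its square and applies a bias-corrected update; the sketch below implements the update rule of Kingma & Ba (2014) in NumPy on a toy quadratic, using the same initial learning rate of 0.001 (the toy problem and function names are ours).

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, n_steps=5000):
    """Minimize a function given its gradient using the Adam update rule."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)   # first-moment (mean) estimate
    v = np.zeros_like(x)   # second-moment (uncentred variance) estimate
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the warm-up phase
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy problem: minimize ||x - target||^2, whose gradient is 2 (x - target).
target = np.array([1.5, -2.0])
x_min = adam_minimize(lambda x: 2 * (x - target), np.zeros(2))
print(x_min)
```

Because the effective step size is roughly the learning rate regardless of gradient magnitude, Adam behaves stably even when the loss surface is poorly scaled, which is one reason it suits large heterogeneous training data.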

Numerical examples
To verify our proposed method, a numerical test using a 2D synthetic data set was first performed. Second, a field data test was conducted using the field data acquired in the Ulleung Basin (offshore of eastern Korea). We performed the 1D FWI using the zero-offset data in each example. Table 3 shows the hardware specifications of the computer on which the network training and the 1D FWI were executed. In both cases, the conjugate gradient method (Polyak 1969) was used for local optimization, and the direct method (Pica et al. 1990) was used to calculate the step length.
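The local optimization can be illustrated on a small quadratic problem; the snippet below is a generic linear conjugate gradient sketch in NumPy, not the authors' FWI code, and it uses the exact quadratic step length rather than the direct method of Pica et al. (1990).

```python
import numpy as np

def conjugate_gradient(A, b, x0, n_iter=50, tol=1e-10):
    """Linear conjugate gradient for A x = b (A symmetric positive definite)."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x          # residual = negative gradient of 0.5 x'Ax - b'x
    p = r.copy()           # initial search direction is steepest descent
    for _ in range(n_iter):
        alpha = (r @ r) / (p @ A @ p)     # exact step length along p
        x += alpha * p
        r_new = r - alpha * (A @ p)
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)  # conjugacy coefficient
        p = r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(x)  # solves A x = b
```

For an n-dimensional quadratic, CG converges in at most n iterations in exact arithmetic, which is why it is a popular low-memory choice for large inverse problems.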

Numerical test
The proposed method was applied to synthetic data generated for the 2D SEG advanced modeling (SEAM) model, which contains a complex salt dome. The velocity model used for the test was generated by resampling the original 2D SEAM model (figure 9). The size of the model was 5601 × 526 grid points. The 1301 sources were evenly distributed in the horizontal direction at 25-m intervals. The recording time was 3.0 s, and the sampling interval was 0.5 ms. Each source had 96 streamer channels at 12.5-m intervals. First, CMP sorting was performed to apply the trained network to the synthetic data. We used 1278 of the 5112 CMP gathers to match the geometry used in the network training. Figure 10 shows the near-offset data in the CMP gathers and the label data obtained using the line-velocity model at the CMP. Figure 11 depicts the predicted data obtained by applying the trained network to the CMP gathers. When the predicted data and the label data were compared, it was confirmed that the amplitude and phase, including the first-arrival traveltime, were well predicted for the first-arrival waveform. However, the quality of the predicted data was low for the salt dome and the multiple signals. For a more detailed comparison, the time-series data at CMPs of 5, 15 and 30 km were extracted from the predicted data and the label data (figure 12). The first-arrival signal was well predicted, but it was difficult to predict accurate amplitude and waveform information beneath the complex structure. To evaluate the accuracy of the predicted data obtained from the proposed method, we compared the RMSE and the coefficient of determination (R²) between the predicted data and the label data (figure 13).
R² for the evaluation of the predicted data is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2},$$

where ȳ is the mean of the corresponding label data. The findings reveal that the proposed method provided relatively stable results when applied to the CMP gathers for a relatively simple structure; however, it had a low R² value for a complex structure. Figure 14 shows the initial velocity model obtained by applying a smoothing operator to the true model. We performed the 1D FWI using the predicted data shown in figure 11, and the result was obtained in 950 iterations. When performing the inversion calculation, 20 cores were used, and a computing time of 6439 s was required. Figure 15 depicts the inversion results using the predicted data and the label data, and figure 16 shows the summed RMSE curves for all CMPs at each iteration. Additionally, the inversion results for the data at CMPs of 5, 15 and 30 km were compared (figure 17). By comparing the 1D FWI results obtained using the label data and the predicted data, we confirm that the trend of the inversion results obtained using the label data was similar to that obtained using the predicted data.

Figure 16. The RMSE curves for all CMPs at each iteration. The black line indicates the RMSE for 1D FWI using the label data, and the red line indicates the RMSE for 1D FWI using the predicted data in the 2D SEAM model.
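The coefficient of determination used for this evaluation can be computed per CMP with a small NumPy helper (illustrative names, not from the paper):

```python
import numpy as np

def r_squared(label, pred):
    """Coefficient of determination R^2 between label and predicted traces."""
    label = np.asarray(label, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((label - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((label - label.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                     # perfect prediction -> 1.0
print(r_squared(y, np.full(4, y.mean())))  # mean-only prediction -> 0.0
```

R² equals 1 for a perfect prediction, 0 for a prediction no better than the label mean, and can go negative when the prediction is worse than the mean, which makes it a convenient per-CMP quality map alongside the RMSE.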

Field data test
After verifying the proposed method with the numerical test, we applied it to 2D marine seismic data. We used 1327 of the 5308 CMP gathers to match the geometry used in the network training. Figure 18 shows the near-offset data in the CMP gathers and the predicted data obtained by applying the trained network to the CMP gathers. The predicted data show continuity of the seismic events, but abnormal signals in the form of a flat layer appear at positions of 15-27 km (red box). We performed the 1D FWI using the predicted data (figure 18b), and the result was obtained in 50 iterations. A velocity model that increases linearly with depth was used as the initial velocity model for FWI. Figure 19 depicts the initial velocity model and the inversion results using the predicted data, and figure 20 shows the summed RMSE curves for all CMPs at each iteration. When performing the inversion calculation, 20 cores were used and a computing time of 8186 s was required.

Discussion
The proposed method has some weaknesses. First, a large amount of computing time was required to generate the training data. Second, it was difficult to predict complex subsurface structures from the CMP gather. Third, the velocity model obtained through 1D FWI using the predicted data may contain discontinuities along the CMP axis, since the model is assembled from independent 1D time-series inversions. Fourth, the training data were generated considering only 2D geometric spreading, so training data considering 3D geometric spreading are required. The proposed method can be applied to seismic data acquired with the same acquisition geometry and source signature; otherwise, the reliability of the application is expected to be lower.

Conclusions
We propose a zero-offset data conversion method that uses a CNN so that the CMP gather can be applied to 1D FWI. In this paper, a network architecture was designed by focusing on the field data acquired in the Ulleung Basin in the East Sea, and the designed network was verified through synthetic data testing. When applied to the synthetic data, the prediction was difficult for the complex structure model, but the result for the simple structure model was similar to the label data. By comparing the FWI results obtained using the predicted and label data in the synthetic data test, we demonstrated that reasonable results could be obtained when the trained network was applied to data not used for training.
In addition, we predicted the zero-offset data by applying the trained network to field data and performed 1D FWI using the predicted data. The predicted data showed the continuity of seismic events well, but it was confirmed that abnormal signals appeared in certain areas. Importantly, when performing 1D FWI using predicted data on field data, we obtained a velocity model over a wide area with a low computing cost.
In conclusion, using the method proposed in this study, we generated a reasonable velocity model with a low computing cost through 1D FWI using the predicted data. We also expect the generated velocity model to be usable as an initial velocity model for 2D FWI. For the trained network, it is anticipated that higher quality results could be provided by improving the training data generated from various velocity models through transfer learning.