Low-frequency noise suppression for desert seismic data based on a wide inference network

The importance of seismic exploration is widely recognized by geophysicists. Low-frequency noise is common in seismic exploration, especially in desert seismic records. Because this low-frequency noise shares the same frequency band as the effective signals, traditional methods are limited or fail. To overcome the shortcomings of traditional denoising methods, we propose a novel desert seismic data denoising method based on a Wide Inference Network (WIN). During training, the WIN minimizes the error between the prediction and the target through residual learning and thereby obtains a set of optimal parameters, such as weights and biases. In this article, we construct a high-quality training set for desert seismic records, which ensures effective training of the WIN. Each layer of the trained WIN can then automatically extract a set of time–space characteristics without manual adjustment. These characteristics are transmitted layer by layer and are finally used to extract the effective signals. To verify the effectiveness of the WIN, we apply it to both synthetic and real desert seismic records. In addition, we compare the WIN with f − x deconvolution, variational mode decomposition (VMD) and the shearlet transform. The results show that the WIN achieves the best denoising performance in suppressing low-frequency noise and preserving effective signals.


Introduction
The recovery of effective signals plays an important role in seismic exploration. Strong noise seriously affects the extraction of effective signals and makes seismic signal processing extremely difficult (Zhang & Van der Baan 2018a; Oropeza & Sacchi 2011). In desert seismic data, low-frequency noise has extremely complex characteristics (Li et al. 2017). First, it has strong energy, which disturbs the temporal and spatial characteristics of the effective signals. Because of the high noise amplitude, the differences between effective signals and noise in most transform domains are also small. Second, both the effective signals and the noise in desert seismic data are low-frequency. Traditional denoising methods become limited or ineffective when the effective signals and noise occupy the same frequency band. Therefore, noise reduction has become one of the most arduous tasks in desert seismic data processing.
Over the past few decades, many denoising methods have been proposed based on the characteristics of seismic noise. Examples include f − x deconvolution (Abma 1995), the empirical mode decomposition (EMD) filter (Kopsinis & McLaughlin 2009), the curvelet transform (Hennenfent et al. 2010), the singular value decomposition filter (Gan et al. 2015), the time–frequency peak filter (Zhang et al. 2015) and the shearlet transform (Tang et al. 2018; Zhang & Van der Baan 2018b). These traditional methods extract effective signals from noisy seismic records by exploiting the differences between effective signals and noise in various transform domains. They have played a positive role in random noise attenuation in medium and shallow exploration. However, complex surface and near-surface geological conditions cause desert seismic data to be seriously contaminated by various types of noise. Traditional denoising methods rely heavily on model assumptions about signals and noise, which makes it difficult for them to uncover the internal relationships within large seismic data sets. In addition, traditional methods often require parameters to be selected manually during denoising.
In the past few years, convolutional neural network (CNN) models have been applied in various fields. In seismic exploration, successful examples include automatic classification (Yuan et al. 2018; Serfaty et al. 2017; Lin et al. 2018), automatic detection of fracture characteristics (Huang et al. 2017) and automatic recognition (Titos et al. 2018). A CNN does not require features to be selected manually, and a trained CNN also avoids manual parameter adjustment. Compared with conventional methods, CNNs are better suited to processing large data sets and learning hidden relationships in data. In this paper, we concentrate on attenuating low-frequency noise in low-SNR desert seismic data using a Wide Inference Network (WIN). The WIN (Liu & Fang 2017) is a CNN-based denoising network. It has been shown to learn and model complex non-linear relations because it can learn hidden relationships in data. The key to our proposed method is the construction of a high-quality training set, which ensures effective training. We therefore use three representative synthetic seismic event models, the mixed-phase wavelet, the zero-phase wavelet and the Ricker wavelet, to generate training data for the network. This gives the network better generalization to new, unseen noisy records. With our training set, a WIN with five wide convolution layers can accomplish the main task of low-frequency noise attenuation. The WIN not only achieves better low-frequency noise attenuation and signal preservation but also provides a new noise attenuation strategy for desert seismic records. In addition, we compare the WIN with f − x deconvolution, variational mode decomposition (VMD) and the shearlet transform. The results show that the WIN reduces low-frequency noise and preserves effective signals more effectively at low SNRs.

Network structure
The WIN is trained with supervised learning to represent a mapping from noisy to recovered data. Its denoising network structure is shown in figure 1. We introduce the network by describing the structure of each layer, the learning strategy and the regularization methods. First, we briefly describe the concept of the 'width' of a network (Liu & Fang 2017). Width involves three parameters: the depth of the network (L), the number of convolution kernels per layer (K) and the size of the convolution kernels (F). These three parameters are the main parameters for evaluating network performance, since together they determine the non-linear expressive ability of the convolution layers. As shown in figure 1, the WIN contains two types of layer. (1) Conv + BN + ReLU: layers 1–4 each use K = 128 convolution kernels of size F = 7 × 7. Each convolution layer of the WIN therefore has many channels and a large receptive field, which makes it easier to learn hidden relationships in the data. (2) Conv + BN: the last layer uses K = 1 convolution kernel of size F = 7 × 7, so that the output has the same dimensions as the input. Here, Conv denotes a convolution layer, BN denotes batch normalization and ReLU denotes the rectified linear unit activation function.
Figure 1. The network structure diagram of a WIN.
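The short Python sketch below makes the effect of L, K and F concrete. It computes the receptive field and the number of trainable parameters implied by the five-layer configuration described above. The counting conventions are our assumptions and are not details stated for the WIN itself: single-channel input, stride-1 convolutions, one bias per kernel, and a learnable BN scale and shift per channel.

```python
# Hypothetical layer specification following the text: layers 1-4 are
# Conv + BN + ReLU with K = 128 kernels of size 7 x 7, and layer 5 is
# Conv + BN with a single 7 x 7 kernel. Single-channel input is assumed.
LAYERS = [(128, 7)] * 4 + [(1, 7)]  # (number of kernels K, kernel size F)


def receptive_field(layers):
    """Receptive field (in samples/traces) of stacked stride-1 convolutions."""
    rf = 1
    for _, f in layers:
        rf += f - 1  # each F x F layer widens the window by F - 1
    return rf


def parameter_count(layers, in_channels=1, with_bias=True, with_bn=True):
    """Count convolution weights, biases and BN scale/shift parameters."""
    total = 0
    c_in = in_channels
    for k, f in layers:
        total += f * f * c_in * k  # convolution weights W
        if with_bias:
            total += k             # one bias b per kernel
        if with_bn:
            total += 2 * k         # BN scale (gamma) and shift (beta)
        c_in = k                   # output channels feed the next layer
    return total


print(receptive_field(LAYERS))  # 31, i.e. a 31 x 31 window
print(parameter_count(LAYERS))  # 2422531
```

Under these assumptions, five stacked 7 × 7 layers see a 31 × 31 window of the input. Almost all of the roughly 2.4 million parameters sit in the three 128-to-128 layers (layers 2–4).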
2.1.1. Residual learning. Training deep neural networks is prone to vanishing gradients, which fundamentally hinders convergence. Residual learning is an effective way to avoid this problem (He et al. 2016). It also alleviates network degradation, the drop in training accuracy that can occur as networks become deeper. Moreover, a network that uses residual learning is easier to optimize, which can improve accuracy and overall performance. In this paper, we introduce the residual representation by adopting a skip connection (Kim et al. 2016). The skip connection mitigates vanishing gradients during training and performs residual learning at the same time.
Batch normalization (BN) (Ioffe & Szegedy 2015) is performed between each convolution layer and its ReLU layer. This keeps the inputs to each layer on the same order of magnitude, which makes training more stable. During training, if the distribution of the inputs to one layer changes, these changes accumulate and are amplified in the subsequent layers. As a result, the network has to keep adapting to new data distributions. Normalizing each mini-batch of data avoids this problem. Furthermore, BN effectively improves the convergence rate of the network.
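The following NumPy sketch illustrates these two ingredients in isolation. The first is training-mode batch normalization over a mini-batch shaped (N, C, H, W). The second is a global skip connection in which the input is added to the output of a residual branch, in the spirit of Kim et al. (2016). Whether the WIN adds the input at its output in exactly this way is an assumption here. The function names are hypothetical, and the running statistics used at inference time are omitted.

```python
import numpy as np


def batch_norm(z, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for z of shape (N, C, H, W).

    Each channel is normalized with the mean and variance computed over the
    mini-batch and both spatial axes, then scaled by gamma and shifted by beta.
    """
    mean = z.mean(axis=(0, 2, 3), keepdims=True)
    var = z.var(axis=(0, 2, 3), keepdims=True)
    z_hat = (z - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * z_hat + beta.reshape(1, -1, 1, 1)


def relu(z):
    """Rectified linear unit."""
    return np.maximum(z, 0.0)


def residual_output(y, branch):
    """Global skip connection: output = input + branch(input)."""
    return y + branch(y)


rng = np.random.default_rng(0)
z = rng.normal(3.0, 5.0, size=(8, 4, 16, 16))  # mini-batch of 8, 4 channels
bn_out = batch_norm(z, np.ones(4), np.zeros(4))  # per-channel mean 0, variance 1
activated = relu(bn_out)                         # Conv + BN + ReLU ordering
y_out = residual_output(z, lambda v: -0.1 * v)   # toy residual branch
```

With gamma = 1 and beta = 0, every channel of `bn_out` has (up to eps) zero mean and unit variance, whatever the scale of the incoming activations.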

Denoising theory of WIN
The noise in desert areas is highly complex: it is non-stationary and non-Gaussian and exhibits non-uniform phase shifts. Here, we use the letter n to represent low-frequency desert noise. A desert seismic record can then be expressed as
$$y = x + n, \tag{1}$$
where y is the desert seismic record and x is the effective signal.
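The additive model above is also how noisy records are formed for training and testing. As an illustration only, the sketch below scales a noise panel so that the mixture reaches a prescribed SNR, measured as the energy ratio 10 log10(Σx²/Σn²) in dB. The function name `mix_at_snr` is hypothetical, and Gaussian noise stands in for the recorded desert noise used in this paper.

```python
import numpy as np


def mix_at_snr(x, n, snr_db):
    """Return (y, s*n) with y = x + s*n and 10*log10(sum x^2 / sum (s*n)^2) = snr_db."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    p_signal = np.sum(x ** 2)
    p_noise = np.sum(n ** 2)
    # Choose the scale s so that the scaled noise energy is p_signal / 10**(snr_db/10).
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return x + scale * n, scale * n


rng = np.random.default_rng(0)
x = rng.normal(size=(1200, 120))  # stand-in for an effective-signal section
n = rng.normal(size=(1200, 120))  # stand-in for real desert noise
y, n_scaled = mix_at_snr(x, n, -6.0)  # noisy record at SNR = -6 dB
```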
The basic idea of WIN denoising is to extract x from y as accurately as possible through effective network training:
$$\hat{x} = \mathrm{WIN}(y; \Theta), \tag{2}$$
where WIN stands for the wide inference network and $\Theta = \{W, b\}$ denotes the trainable parameters of the convolution layers, namely the weights W and the biases b. To optimize the network represented by equation (2), the loss function is defined as
$$l(\Theta) = \frac{1}{2N}\sum_{i=1}^{N}\left\| \mathrm{WIN}(y_i; \Theta) - x_i \right\|_F^2, \tag{3}$$
where $\{(y_i, x_i)\}_{i=1}^{N}$ denotes N pairs of training samples and l(Θ) is the loss function used to learn the parameters Θ. Because N is very large, it is impractical to compute the gradient of l(Θ) over the entire training set at every step. Therefore, in this paper we adopt mini-batch stochastic gradient descent (SGD) to optimize l(Θ). In each iteration, only a small batch of training samples is used to compute the gradient. The mini-batch gradient thus serves as an estimate of the full gradient, which improves efficiency and reduces computational cost.
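The mini-batch idea can be illustrated without a full network. In the hypothetical sketch below, a two-parameter linear "denoiser" x̂ = w·y + b replaces the WIN, and Θ = {w, b} is fitted by mini-batch SGD on the loss (1/2N)Σ‖w·y_i + b − x_i‖². Each update uses only the gradient of one shuffled batch. The toy model, data and step size are assumptions chosen for illustration, and the result is checked against the closed-form least-squares solution.

```python
import numpy as np


def loss(w, b, Y, X):
    """l(theta) = 1/(2N) * sum_i ||w * y_i + b - x_i||^2 for a toy linear denoiser."""
    R = w * Y + b - X
    return 0.5 * np.sum(R ** 2) / len(Y)


def minibatch_sgd(Y, X, batch_size=32, lr=0.01, epochs=20, seed=0):
    """Fit theta = {w, b} with plain mini-batch SGD (one shuffled pass per epoch)."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(Y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            R = w * Y[idx] + b - X[idx]             # residuals on this mini-batch only
            grad_w = np.sum(R * Y[idx]) / len(idx)  # mini-batch estimate of dl/dw
            grad_b = np.sum(R) / len(idx)           # mini-batch estimate of dl/db
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data: 2000 "records" of 16 samples each, noisy = clean + noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 16))
Y = X + 0.5 * rng.normal(size=X.shape)

w, b = minibatch_sgd(Y, X)

# Closed-form least-squares solution for comparison.
A = np.column_stack([Y.ravel(), np.ones(Y.size)])
(w_ls, b_ls), *_ = np.linalg.lstsq(A, X.ravel(), rcond=None)
print(w, w_ls, loss(w, b, Y, X))
```

Although no single update sees the whole training set, the stochastic estimate settles close to the full-data optimum.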
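To measure the denoising performance of the network, we employ the SNR and the MSE (mean square error) as evaluation metrics, defined as
$$\mathrm{SNR} = 10\log_{10}\frac{\sum_{i=1}^{M} x_i^2}{\sum_{i=1}^{M}(x_i - q_i)^2}, \tag{4}$$
$$\mathrm{MSE} = \frac{1}{M}\sum_{i=1}^{M}(x_i - q_i)^2, \tag{5}$$
where x_i denotes the effective signal, q_i denotes the recovered signal and M denotes the data length.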
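A direct NumPy transcription of these two metrics follows. It assumes the usual definitions SNR = 10 log10(Σx_i²/Σ(x_i − q_i)²) in dB and MSE = (1/M)Σ(x_i − q_i)². For a 2-D record, the sums simply run over all M samples. The function names are our own.

```python
import numpy as np


def snr_db(x, q):
    """Output SNR in dB: 10*log10(sum x_i^2 / sum (x_i - q_i)^2)."""
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - q) ** 2))


def mse(x, q):
    """Mean square error over the M samples: (1/M) * sum (x_i - q_i)^2."""
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.mean((x - q) ** 2)


x = [1.0, 1.0, 1.0, 1.0]  # effective signal
q = [1.0, 1.0, 1.0, 0.0]  # recovered signal with one wrong sample
print(round(snr_db(x, q), 2))  # 6.02
print(mse(x, q))               # 0.25
```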

Training
The quality of the training set directly affects the denoising performance of the network. To achieve a satisfactory and accurate denoising effect, we need to construct a high-quality training set with good generalization ability. However, few labeled seismic samples are currently available, and the types of seismic noise vary between areas. It is therefore particularly important to construct a training set suited to desert seismic records. To this end, we generated 30 different synthetic desert seismic records for the construction of the training set. We chose three representative synthetic seismic event models, the mixed-phase wavelet, the zero-phase wavelet and the Ricker wavelet, to generate these original seismic records. Each record contains various dominant frequencies from 20 to 40 Hz, various apparent velocities from 1000 to 3200 m s⁻¹ and various distances between detectors from 8 to 12 m. The size of each record is 1600 × 200. The noise added to the original records is real desert noise from the desert of the Tarim Basin, China. We also add synthetic surface waves to the original records. All of these choices are intended to give the network the best possible generalization to new, unseen noisy records. Finally, the training set is constructed by taking the noisy data as the input and the corresponding noise-free data as the label. The expressions of the three representative synthetic seismic event models are as follows:
Mixed-phase wavelet:
Zero-phase wavelet:
Ricker wavelet:
$$r(t) = A\left[1 - 2\pi^2 f_d^2 (t - t_d)^2\right]\exp\left[-\pi^2 f_d^2 (t - t_d)^2\right],$$
where A stands for the amplitude, f_d for the dominant frequency and t_d for the initial time, and a and c are two variables. For network training, we employ the publicly available Caffe toolbox.
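Of the three models, only the Ricker wavelet has a standard closed form that we can reproduce with confidence. The sketch below uses it to build one noise-free linear event of the kind described above, with a 30 Hz dominant frequency, an apparent velocity of 2000 m s⁻¹, 10 m trace spacing and a 1600 × 200 record. The 1 ms sampling interval, the 0.2 s start time and the helper names `ricker` and `linear_event_record` are our assumptions. The mixed-phase and zero-phase wavelets, with their variables a and c, are not reproduced.

```python
import numpy as np


def ricker(t, f_d, t_d=0.0, amplitude=1.0):
    """Ricker wavelet A * (1 - 2*(pi*f_d*(t - t_d))**2) * exp(-(pi*f_d*(t - t_d))**2)."""
    arg = (np.pi * f_d * (t - t_d)) ** 2
    return amplitude * (1.0 - 2.0 * arg) * np.exp(-arg)


def linear_event_record(n_samples, n_traces, dt, dx, t0, velocity, f_d, amplitude=1.0):
    """Synthetic shot gather (time samples x traces) with a single linear event.

    Trace j (offset j * dx) receives a Ricker wavelet centred at t0 + j * dx / velocity.
    """
    t = np.arange(n_samples) * dt
    record = np.zeros((n_samples, n_traces))
    for j in range(n_traces):
        record[:, j] = ricker(t, f_d, t0 + j * dx / velocity, amplitude)
    return record


# Hypothetical parameters inside the ranges quoted in the text.
record = linear_event_record(1600, 200, dt=0.001, dx=10.0, t0=0.2,
                             velocity=2000.0, f_d=30.0)
```

A full training record would superpose several such events with different wavelets, frequencies and velocities before real noise and surface waves are added.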
Network training minimizes the loss function (3) with ℓ2 regularization to obtain the network parameters Θ. We use mini-batch SGD with a momentum of 0.9. Table 1 lists all the specific parameter settings used during training. We adopt a step learning-rate policy with an initial learning rate of 0.1. In addition, a weight decay of 10⁻⁴ and a gradient-clipping threshold of 0.1 are used to facilitate network training. The batch size is 128.
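The sketch below shows how these settings interact in a single update. It is modelled loosely on the order of operations in Caffe's SGD solver: global-norm gradient clipping, then the ℓ2 weight-decay term, then the momentum update. The text does not give the decay factor `gamma` or the `stepsize` of the step policy, so those values are hypothetical, as are the function names.

```python
import numpy as np


def step_lr(iteration, base_lr=0.1, gamma=0.1, stepsize=10000):
    """Step policy: base_lr * gamma ** floor(iteration / stepsize).

    gamma and stepsize are hypothetical; the text gives only base_lr = 0.1.
    """
    return base_lr * gamma ** (iteration // stepsize)


def sgd_update(params, grads, history, lr, momentum=0.9, weight_decay=1e-4, clip=0.1):
    """One SGD step: global-norm clipping, L2 weight decay, then momentum.

    params, grads and history are lists of float NumPy arrays of matching shapes;
    params and history are modified in place.
    """
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip / norm if norm > clip else 1.0  # rescale only if the norm exceeds clip
    for p, g, v in zip(params, grads, history):
        d = scale * g + weight_decay * p  # clipped gradient plus L2 penalty gradient
        v *= momentum
        v += lr * d                       # v <- momentum * v + lr * d
        p -= v                            # parameter update


params = [np.array([1.0, 2.0])]
grads = [np.array([3.0, 4.0])]  # gradient norm 5 > 0.1, so it is clipped
history = [np.zeros(2)]
sgd_update(params, grads, history, lr=step_lr(0))
```

With a threshold of 0.1, any gradient whose global norm exceeds 0.1 is rescaled to that norm before the momentum buffer sees it. This bounds the step size early in training, when the learning rate of 0.1 is still large.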

Synthetic desert data examples
To verify the effectiveness and applicability of the proposed WIN for attenuating low-frequency desert noise, we present the following example with a synthetic desert seismic record. We also compare the WIN with f − x deconvolution, VMD and the shearlet transform. Figure 2a shows a synthetic desert seismic record of size 1200 × 120. The dominant frequencies of the effective signals are 25, 30 and 35 Hz. The red box marks three randomly located gaps with missing seismic data. Figure 2c shows the noisy record with SNR = −6 dB, which is contaminated with real low-frequency desert noise (the noise data come from the desert of the Tarim Basin, China, as shown in figure 2b). Figure 2d–f shows the frequency–wavenumber spectra of figure 2a–c, in which the overlapping frequency bands can be clearly observed. Meanwhile, figure 2c and f show that the effective signals are almost submerged in the strong low-frequency noise. Figure 3 shows the denoising results of the four methods for the synthetic desert seismic record. Figure 3b and g show that f − x deconvolution removes most of the low-frequency noise. However, a large amount of noise that shares the frequency band of the effective signal is retained in the result. In addition, figure 3l shows that the effective signals are severely attenuated. For VMD, a large amount of noise still remains in the result, as shown in figure 3c. Figure 3h and m show that VMD discards the effective signals that share the frequency band of the low-frequency noise. Figure 3d, i and n show the denoising results of the shearlet transform. The shearlet transform performs poorly because it cannot suppress the low-frequency noise thoroughly. It is worth mentioning that none of the three traditional denoising methods can recover the missing seismic data.
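The frequency–wavenumber spectra used in these comparisons are, in essence, 2-D Fourier amplitude spectra of the record. A minimal NumPy sketch follows. It assumes a record stored as (time samples, traces) with a uniform sampling interval dt and trace spacing dx. The function name `fk_spectrum` is hypothetical, and tapering and display scaling are omitted. A single plane wave is used as a check because its energy should collapse onto one (f, k) pair.

```python
import numpy as np


def fk_spectrum(record, dt, dx):
    """Amplitude f-k spectrum of a record shaped (time samples, traces).

    Returns frequencies (Hz), wavenumbers (cycles/m) and |FFT2|, with both axes
    shifted so that zero frequency and zero wavenumber are centred.
    """
    nt, nx = record.shape
    spec = np.fft.fftshift(np.abs(np.fft.fft2(record)))
    freqs = np.fft.fftshift(np.fft.fftfreq(nt, d=dt))
    wavenumbers = np.fft.fftshift(np.fft.fftfreq(nx, d=dx))
    return freqs, wavenumbers, spec


# A single dipping plane wave; f0 and k0 are chosen to fall exactly on FFT bins.
nt, nx, dt, dx = 1000, 120, 0.002, 10.0
t = np.arange(nt)[:, None] * dt
xpos = np.arange(nx)[None, :] * dx
f0, k0 = 25.0, 10 / (nx * dx)
plane_wave = np.cos(2 * np.pi * (f0 * t - k0 * xpos))

freqs, wavenumbers, spec = fk_spectrum(plane_wave, dt, dx)
i, j = np.unravel_index(np.argmax(spec), spec.shape)
print(abs(freqs[i]), abs(wavenumbers[j]))  # peak at |f| = 25 Hz and |k| = k0
```

Overlap between signal and noise in this domain, as in figure 2d–f, is what prevents simple f-k masking from separating the two.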
In contrast, figure 3e indicates that the WIN outperforms the other three methods in both noise attenuation and effective signal preservation. Its denoising result is the closest to the noise-free record. Furthermore, the WIN recovers the missing seismic data and makes the effective events clear and continuous, as shown in the red box in figure 3e. Because all the effective signals in our training set are continuous, the trained WIN tends to fill in the missing data when a continuous signal lacks a few traces. Meanwhile, figure 3j and o show that almost no effective signal remains in the difference section.
Figure 3 (panels e–o). (e) Denoised synthetic data after WIN application. Frequency–wavenumber spectra of: (f) the synthetic noise-free desert seismic record, (g) f − x denoised synthetic data, (h) VMD denoised synthetic data, (i) shearlet transform denoised synthetic data, (j) WIN denoised synthetic data and (k) real desert noise. Frequency–wavenumber spectra of the difference sections between predicted and noisy data for: (l) f − x, (m) VMD, (n) shearlet transform and (o) WIN.
We randomly select one trace (the 105th) to further analyze the performance of the four methods. Figures 4 and 5 illustrate the denoising results of the four methods for this trace. Figure 4 shows that f − x deconvolution, VMD, shearlet transform and WIN all suppress noise to some extent. However, the result of the WIN is the closest to the pure signal, whereas f − x deconvolution, VMD and shearlet transform leave various degrees of residual noise. The comparison in figure 5 shows that the spectrum of the WIN result is also the closest to that of the pure signal.
The quality of the denoising results can be evaluated objectively by the output SNR and MSE, given by equations (4) and (5), respectively. The results of the four denoising methods at different input SNRs are shown in Table 2 and figure 6. Compared with f − x deconvolution, VMD and shearlet transform, the WIN always yields a higher output SNR and a lower output MSE. The WIN clearly performs best, especially at low input SNRs.

Real desert data processing
To verify the practicability and reliability of the WIN, we evaluate its denoising performance on real desert seismic records from an area of China. Figure 7 shows two common-shot-point records, of sizes 3000 × 180 and 3000 × 120. Each record contains a large amount of strong low-frequency noise, in which the effective signals are almost submerged and difficult to identify. The four methods are applied to the real desert seismic records in figure 7, and their denoising results are shown in figures 8 and 9. As shown in figures 8a–c and 9a–c, the results of f − x deconvolution, VMD and shearlet transform contain various degrees of residual noise, and the effective signals are still hard to identify. Worse still, figures 8e–g and 9e–g show that the differences between the noisy and recovered data contain a large amount of effective signal. In contrast, figures 8d and h and 9d and h show that the WIN obtains the best denoising result: the effective signals are recovered more effectively and the background becomes cleaner.

Conclusion and discussion
When the effective signal and noise occupy the same frequency band, traditional denoising methods suppress not only the low-frequency noise but also the effective signals. A WIN uses residual learning to minimize the error between the prediction and the target label and thereby obtains a set of optimal weights and biases. The WIN is thus a data-driven denoising method based on machine learning. In this paper, we construct a high-quality training set, which enables the trained WIN to establish an optimal non-linear mapping between the noisy record and the effective signals. The synthetic and field data examples show that, once the model has been trained, a WIN works as an automatic, data-driven denoiser that requires no manual parameter tuning.
Because real desert seismic records are complex and variable, and the low-frequency noise level differs from record to record, the application of the WIN has certain limitations. In future work, the WIN can be further explored and improved in two respects: (1) selecting more and better seismic wave models to improve and enrich the training set and (2) further investigating the WIN structure for desert seismic records, such as the network depth, the number of convolution kernels, the size of the convolution kernels, the regularization method and so on.
Figure 9. Another denoising example of a real desert record. (a) Denoised real desert record after f − x application. (b) Denoised real desert record after VMD application. (c) Denoised real desert record after shearlet transform application. (d) Denoised real desert record after WIN application. (e) Difference section between predicted and noisy data through f − x. (f) Difference section between predicted and noisy data through VMD. (g) Difference section between predicted and noisy data through shearlet transform. (h) Difference section between predicted and noisy data through WIN.