High-resolution seismic impedance inversion integrating the closed-loop convolutional neural network and geostatistics: an application to the thin interbedded reservoir

Seismic impedance inversion is one of the key techniques for quantitative seismic interpretation. Most conventional post-stack seismic impedance inversion approaches are based on the linear theory, whereas the relationship between seismic response and impedance is highly nonlinear. Thus, it is challenging to implement conventional inversion methods to obtain high-resolution impedance for reservoir investigation. Convolutional neural network (CNN), a superior deep neural network, has a strong learning ability, which can learn from data and establish complex nonlinear mapping. However, CNN-based methods are generally heavily dependent on amounts of labeled data. Hence, an alternative seismic inversion approach is proposed that combines the closed-loop CNN and geostatistics. The closed-loop CNN is less dependent on labeled data, characterized by utilizing labeled data and unlabeled data simultaneously to train the neural network. The two subnets represent forward modeling and inversion respectively, constraining each other during the neural network training. Geostatistics can be used to enrich the training data for neural network training, taking into account geological and geophysical prior information. Synthetic data testing reveals that the proposed inversion scheme can obtain more reasonable results benefiting from labeled training data augmentation. The proposed inversion scheme is applied to the field data for identifying thin interbedded reservoir within delta depositional system. The predicted results obtained by the proposed inversion scheme are consistent with well log data and geological settings, offering insights into reservoir characterization and hydrocarbon identification.

getting rid of the influence of the wavelet. Moreover, the impedance can directly indicate subsurface stratigraphic information, which has meaningful geologic information and more accurate geological structure. Thus, high-resolution seismic impedance inversion is an important technique for improving the accuracy of seismic reservoir characterization.
Since Backus & Gilbert (1967, 1968, 1970 established the seismic inversion theory, seismic inversion has been an important research topic aiming to retrieve subsurface physical properties from exploration seismic data (Tarantola 2005;Wang 2016). Generally speaking, seismic inversion can be divided into two categories based on the algorithm, namely geophysics-based inversion and machine-learning-based inversion (Kim & Nakata 2018). The geophysics-based inversion methods are based on parametric equations describing the specific relationship between data and model space. Specifically, they can be mainly divided into two types: inversion methods based on wave equations and inversion methods based on convolution model (Buland & Omre 2003;Virieux & Operto 2009). Full waveform inversion (FWI) is the representative of the inversion methods based on wave equations, which involves all types of wave and expects to take advantage of information as much as possible to obtain more accurate results. However, it is too difficult to describe the rule of seismic wave propagation in the Earth by analytical equations. Thus, the high complexity and the low computational efficiency of FWI make it difficult to be applied in practice. Inversion methods based on the convolution model use the convolution model to depict the relationship between seismic data and impedance. It provides a more efficient way to obtain impedance from seismic data. Hence, lots of scholars have carried out researches on this type of inversion method. The trace integration and the recursive inversion were proposed to obtain impedance directly from seismic data (Lindseth 1979;Ferguson & Margrave 1996). However, since seismic data is band-limited, seismic inversion needs to incorporate more prior geology information to obtain high-resolution and robust inverted results. Researchers have developed many inversion approaches constrained by prior information (Cooke & Schneider 1983;Sacchi 1997;Hansen et al. 2006;Grana et al. 2012;Yuan et al. 2015). These approaches generally establish the initial model based on the prior information and then iteratively improve the model parameters via comparing the forward modeling data and recorded data until the error criterion is satisfied. The inversion approaches mentioned are all based on geophysics models. They generally require the linearization of the forward operator during the inversion. However, there exists a high nonlinearity between seismic data and the model parameters. The linearization might lead to inaccurate predictions, which makes it difficult to obtain high-resolution seismic inversion results (Lu et al. 1996;Zong et al. 2017).
Machine-learning methods are a bunch of data-driven methods that learn from data and make predictions with less dependence on humanity. Machine-learning methods have been widely used in seismic exploration (Zhao et al. 2016;Caté et al. 2017;Liu et al. 2020b;Chen et al. 2021;Liu et al. 2021a). Among various machine-learning methods, artificial neural network (ANN) is a fascinating method for its adaptivity and stronger capability of nonlinear mapping. Since the 1990s, ANN has been applied in seismic inversion for obtaining subsurface properties directly from seismic data, such as elastic attributes, petrophysical properties and so on (Rogers et al. 1992;Röth & Tarantola 1994;Huang et al. 1996;Moya & Irikura 2010). However, limited by the computing devices and optimization algorithms, ANNs, which usually have one single hidden layer, were implemented in the past. Poor generalization ability due to the simple structure of ANN hindered its application in geophysics.
The progress of computing devices and the availability of massive computational resources in recent years have led to great advances in machine-learning techniques. Evolving from classical ANN, various deep neural networks that have more sophisticated architecture attract more and more attention. In particular, CNN, a state-of-the-art deep neural network, is famous for its strong capability of feature extraction and has gained lots of popularity in seismic exploration, including seismic data processing, interpretation and inversion (Zhao et al. 2017;Yuan et al. 2018;Hu et al. 2019;Araya-Polo et al. 2019;Yang & Ma 2019;Wang et al. 2019;Mosser et al. 2020;Di et al. 2020;Lee et al. 2022;Wang et al. 2021Wang et al. , 2022Liu et al. 2021bLiu et al. , 2022. CNN directly learn from input data without extracting features from data manually in advance. Extracting low-level and high-level features hierarchically by convolution kernels, CNN is good at capturing spatial patterns in data and learning complex nonlinear mapping (Das et al. 2019). Theoretically, CNN is an appropriate approach to solve seismic inversion problems that are highly nonlinear. However, there exists a major obstacle. A large amount of labeled training data is needed for training CNN to obtain a reliable predictor. If not, overfitting would occur and generalization of CNN would be affected. In fact, due to the high cost of drilling and well logging, there are always only a limited number of wells in an exploration area. To address this obstacle, several attempts have been made to improve the generalization of CNN and boost CNN-based seismic inversion approaches, mainly including use of data augmentation and applying semi-supervised learning (Das et al. 2019;Das & Mukerji 2020;Wu et al. 2020).
On the basis of Wang et al. (2020), labeled training data augmentation based on geostatistics is introduced to assist the closed-loop CNN for high-resolution seismic impedance inversion. The closed-loop CNN is comprised of two Unets (i.e. a kind of CNN structure), named the forward subnet and inverse subnet, respectively. The closed-loop CNN 551 applied here can exploit unlabeled data to extract features for neural network training to reduce its dependence on labeled training data. Meanwhile, during the training, the forward subnet and inverse subnet can constrain each other so that generalization ability can be improved. Besides, in order to incorporate more prior information into the neural network training and enrich the labeled training data, geology and geophysics theories are utilized to generate synthetic labeled training data that conform to geological law and statistical information. The experimental results from the synthetic and field data demonstrate the superiority of the proposed seismic inversion scheme.

Methodology
CNN-based techniques always require a huge amount of labeled data, and the performance of the CNN relies on the variety of labeled data. Thus, a large amount of various labeled training data can improve the performance of CNN. Besides, the appropriate neural network structure can improve the generalization of CNN. Hence, aiming to introduce the prior geology information to the CNN-based seismic inversion, we propose the post-stack seismic impedance inversion scheme, combining the closed-loop CNN with geostatistics simulation that can not only bring in geology and geophysics knowledge, but can also augment labeled training data. In addition, the closed-loop CNN is an upgraded version of conventional CNN with better generalization ability.

Generate synthetic labeled training data
Unlike other areas, such as image segmentation, image classification and so on, where labeled training data can be easily obtained, only a small amount of labeled data are used for neural network training in seismic inversion. In seismic impedance inversion, labeled data refers to the impedance log at well locations and the corresponding seismic data at the same locations. Due to the limited number of wells in an area, the amount of labeled data usually is lacking. To overcome this difficulty, it is feasible to augment labeled training data by virtual of geology and geophysics knowledge, contributing to the neural network training.
The distribution of subsurface strata conforms to certain rules, such as physics, deposition rules and so on. Thus, synthetic training data should be generated, constrained by the certain geological sedimentary background and rock physics laws in the target area. Hence, synthetic well log data is generated based on the geological scenario of the reservoirs. Meanwhile, their associated synthetic seismic traces are computed.
The approach applied to simulate synthetic log data in this paper consists of two parts, including lithofacies (discrete attributes) generation and elastic or reservoir properties (continuous attributes) generation (Deutsch & Journel 1992; Remy et al. 2009). In the first place, based on statistical analysis, the distribution of lithofacies in the vertical direction and the statistics rule of reservoir properties (e.g. porosity, water saturation, volume of clay etc.) or elastic properties (velocity, density, impedance etc.) can be obtained. Then, by virtue of the sequential indicator simulation (SIS), we can generate lithofacies sequences via acquired lithofacies distribution and a vertical variogram that contains geological information. Subsequently, we populate each lithofacies with reservoir properties via the sequential Gaussian simulation (SGS) based on statistics. The rock physics model can be applied to generate elastic properties from reservoir properties, realizing a geophysical constraint for labeled training data generation. Alternatively, elastic properties can be directly simulated within predefined lithofacies according to the obtained statistics, bypassing reservoir properties generation. The final step is the synthetic seismic response generation based on the generated elastic properties and estimated wavelet from seismic data. The specific workflow of labeled training data generation is shown in figure 1. Figure 2 shows the structure of the closed-loop CNN used in this paper. The neural network is composed of two subnets: a forward subnet and inverse subnet. The forward subnet is trained to approximate seismic forward modeling. Conversely, the inverse subnet is used for approximating the 552 Figure 2. Architecture of the closed-loop CNN, composed of two subnets (U-net).

Closed-loop CNN model
seismic impedance inversion. The forward and inverse subnets both adopt the same U-net structure, which is a kind of end-to-end CNN structure. The U-net is composed of an encoding part and a decoding part. Specifically, impedance and the corresponding seismic trace form the output and input of the inverse subnet, respectively. As to the forward subnet, the impedance is the input and the seismic trace is the output.
The closed-loop CNN implements 1D convolution. So, impedance and seismic data are both sent to CNN for training and predicting trace by trace. Tested by experiments, implement five stacked convolution layers for the encoding and decoding, respectively. Convolution kernel of size 3 × 1 is implemented in all the layers of the neural network (Wang et al. 2020). In the encoding part, the stride of convolution is 2, halving the size of feature maps after each layer. The number of channels in the first layer is set to be 128. After each layer, the number of channels will be doubled. Similarly, in the decoding part, the convolution stride is set to 1. The bilinear interpolation is used for resizing the feature maps, doubling the size of feature maps. Here, the number of channels is halved after each convolution layer. Rectified linear units are used as the activation function through the neural network, giving the CNN stronger nonlinear mapping capability (LeCun et al. 2015). For the neural network, batch normalization not only is implemented to speed up CNN training by normalizing the data, but also acts as a regularization to mitigate overfitting (Ioffe & Szegedy 2015). Furthermore, skipping connections in the U-net concatenates the convolution layers of the encoding part and decoding part, which can merge multiscale features to improve the ability of nonlinear mapping.
The open-loop CNN only requires labeled training data for the training. The object function can be expressed in the least square form: where d denotes input seismic data, which are labeled by impedance log data m. The labeled training data set has N pairs. By iteratively training the inverse subnet, the parameters of the network Θ b are updated via minimizing the object function, obtaining the function F Θ b (•), which represents the seismic inversion. Similarly, the function F Θ f (•), which represents the seismic forward model can be obtained by training the forward subnet, minimizing the following object function: Compared with the open-loop CNN, the closed-loop CNN has a more complex objective function, including  cycle-consistency loss . It is the cycleconsistency loss that can make use of unlabeled seismic data to train the neural network. So, the loss function can be expressed as where The cycle-consistent losses with respect to labeled seismic data and impedance are denoted by L 1 cycle and L 2 cycle , respectively. The cycle-consistent loss with respect to unlabeled data d * is denoted by L 3 cycle . N * is the number of unlabeled seismic traces used for neural network training. In addition, 1 , 2 and 3 are the hyperparameters that are used for adjusting the weight of each part in the loss function.
After taking the partial derivatives of the objective function expressed by equation (3), the Adam optimization algorithm (Kingma & Ba 2014) is applied to optimize the neural network, training the forward and inverse subnets together. Then, the well-trained inverse subnet is used for obtaining inverted impedance.

The workflow of the seismic inversion
The workflow of the proposed high-resolution closed-loop CNN seismic impedance inversion scheme is mainly composed of three parts: (i) combining the geologic and rock physics knowledge, analyze the obtained log data in the target area to get the statistics properties and the spatial characteristics of the impedance. Then, generate the labeled training data based on the SIS and SGS for network training. (ii) Train the closed-loop CNN model with the labeled and unlabeled data. (iii) Apply the well-trained inverse subnet to predict the impedance directly from seismic data. In this research, we implement CNN via Pytorch.

Examples
In this section, we illustrate the performance of the proposed inversion scheme on both synthetic data and real  field data. As to the synthetic data example, the Overthrust model is chosen to prove the efficiency and robustness of the proposed inversion scheme. Furthermore, the proposed inversion scheme is applied to the field data to obtain outstandingly high-resolution inversion results, which are in accordance with the geological understanding and improve the accuracy of thin interbedded reservoir identification.

Synthetic data example
We adopt the Overthrust model to assess the proposed inversion scheme. Figure 3a shows the impedance model. Figure 3b shows the corresponding synthetic seismic data that is generated by convolving the reflectivity obtained from the impedance model with a 40 Hz Ricker wavelet. The seismic data has 801 traces. The number of samples per trace is 374. The sampling interval is 1 ms. Five traces located at CDP 100, 250, 400, 550 and 700 are extracted from the impedance model as the known logging data, which are used as labeled training data, paired with the corresponding seismic data. The positions of the wells are shown in figure 3b (red curves). In the meantime, the rest of seismic data are unlabeled and applied to train the closed-loop 555 Figure 7. Crossplot of impedance against gamma ray colored by lithofacies. CNN. Besides, they will be used to assess the inversion results.
As mentioned earlier, the forward and inverse subnets consist of five convolutional layers, respectively. The initial learning rate is 1 × 10 −4 . Based on the five known wells, more labeled training data are generated with the help of the scheme for generating labeled training data (figure 1). Because we obtain impedance from the Overthrust model without lithology information, augmented labeled training data can be directly obtained via statistics analysis and geostatistics simulation for the elastic property, bypassing lithofacies simulation. Here, a data set of 2000 labeled training data has been generated, which is used for training the closed-loop CNN.
Aiming to validate the capability of the proposed inversion scheme, we compare the results inverted by the proposed inversion scheme with results inverted by other two methods, including the closed-loop CNN-based seismic inversion without label augmentation and the conventional modelbased inversion. Figure 4 parts a-c show results obtained from the proposed inversion scheme, closed-loop CNNbased inversion without label augmentation and the conventional model-based inversion, respectively. Figure 4 parts d-f represent absolute error maps of the corresponding methods. It is observed that all methods can achieve reasonable results. However, the results of the proposed inversion scheme possess higher resolution and can reveal more details, better accounting for the stratigraphic heterogeneity. In contrast, the 556 Figure 9. Some simulations obtained via the geostatistics approach, including lithology and impedance logging data (sand in yellow, shale in gray, impedance is the red curve). results obtained via the inversion method based on closedloop CNN without labeled data augmentation have poor lateral continuity caused by the overfitting of the network due to the lack of labeled training data. Besides, although the results obtained via the conventional model-based inversion method are of lateral continuity, the resolution is lower than results obtained by the other two methods.
To test the robustness of the proposed inversion scheme, Gaussian white noise is added into the synthetic seismic data. Figure 5 parts a-c are the predicted impedance by the proposed inversion scheme using noisy data at 10, 15 and 25 dB, respectively. The results indicate that the proposed inversion scheme can still obtain reasonable inverted impedance even when the input seismic data is noisy. It means that the proposed inversion scheme is robust.

Field data application
In this section, we provide real data experiments demonstrating the performance of the proposed inversion scheme.
The research target herein is the thin interbedded sandstone reservoirs of Cretaceous Qingshankou formation in the Daqingzijing area, Songliao Basin, Northeast China as shown in figure 6a. Figure 6b shows the stratigraphic and litholog- sandstone reservoirs are characterized by their thinness (about 10 m) and a rapid horizontal variation. Meanwhile, as figure 7 shows, although the impedance ranges of sandstone and shale have some overlap, the sandstone reservoirs (yellow) are characterized by higher impedance and moderate lower gamma ray than the shale (gray). The impedance of sandstone is between 11 × 10 6 and 13 × 10 6 msg −1 kg −1 cm −3 . Hence, the high-resolution seismic impedance inversion method is in demand for the thin interbedded sandstone reservoirs.
The data set includes a 3D post-stack seismic and the corresponding well log data. The 3D seismic data set has about 1.9 million traces and covers approximately 310 km 2 . The seismic data have also been processed properly so as to reveal subsurface geologic situation. Figure 8 shows the seismic data. The interval of interest is between the two horizons (pointed by white arrows in figure 8) in the time domain. Based on the seismic data analysis, the dominant frequency of the seismic data is 35 Hz.
In the light of the proposed inversion scheme, 15 wells have been used first to augment the labeled training dataset, which includes 15 true well log data and 3500 simulated well log data plus the corresponding seismic data. Figure 9 shows some simulations of lithology and corresponding impedance logging data obtained by the geostatistics approach mentioned in section 2.1. Different realizations show that sand occur alternately with shale, similar with the real labeled training data. Besides, impedance of sand intervals is generally larger than that of shale interval, which is coincide with the rock physics law in the study area. Then, we train the neural network constructed in the previous using the augmented labeled training dataset, taking ∼50 minutes. During the training, batch size is set to 10. As shown in figure 10, the training loss converges with respect to epochs, meaning the training is effective. Apply the trained closed-loop CNN to directly predict impedance from post-stack seismic data, which costs ∼2.5 hours. Compared with the conventional modelbased inversion method, computational efficiency has been 558 Figure 12. Inverted impedance based on the proposed inversion scheme, inserted the impedance log (black curve). There are two sandstone reservoirs whose thicknesses are 6.5 and 12.5 m, respectively. improved a lot. The training and predicting are both performed on HP Z640 with 1 NVIDIA Quadro P2000 GPU and 20 CPUs. Figure 11 parts a and 11b are the inverted impedance via the proposed inversion scheme and the conventional inversion method, respectively. Comparing the results, it is obvious that impedance inverted based on deep learning method has a higher resolution and can identify thinner layers. Figure 12 demonstrates that the predicted impedance is consistent with the well log data, providing more powerful proof for thin interbedded sandstone reservoir identification. As shown in figure 12, the inverted impedance based on the proposed inversion scheme can identify the sandstone reservoir below 10 m. Figure 13 indicates that impedance (in red) predicted by the proposed inversion scheme can match measured impedance (in black) from a blind well and the interpretation results (sandstones are indicated by yellow blocks), exhibiting the capability of the inversion method. In contrast, impedance (in green) inverted by the conventional model-based inversion method can only reveal the trend of impedance, lacking details. Figure 14 shows the strata slice of predicted impedance within the Qingshankou formation based on the proposed inversion scheme. As illustrated in figure 14, the predicted impedance coincides with sedimentary facies, which is in the fan delta front (indicated by white dotted lines): namely, favorable reservoir sands are distributed in the distributary channels and mouth bars. Besides, in figure 14, there exists high impedance within the area marked by the white circle, which has been confirmed to be volcanic rocks (basalt) by logging interpretation. Thus, the predicted results provided a significant proof for identifying the thin interbedded sandstone reservoir, and have a high resolution both vertically and horizontally. Figure 13. Comparison between the inverted impedance (in red) based on the proposed inversion scheme, the inverted impedance (in green) based on the conventional model-based inversion method and the measured impedance (in black) at well location (H46). Yellow blocks represent sandstone. 559 Figure 14. Slice of the inverted impedance obtained via the proposed inversion scheme along the target reservoir. The lithology column and logging data (GR and AC) in well Q139 illustrate that high impedance within the white circle corresponds to volcanic rocks (basalt).

Conclusions
In this article, a post-stack seismic inversion scheme is established to predict impedance, which not only takes advantage of the closed-loop CNN but also introduces geology and rock physics knowledge into the inversion as constraints. The closed-loop CNN used here can extract information from unlabeled data, which can constrain the neural network training process so as to reduce the network's dependence on labeled data. Nevertheless, in practice, there may be only a few wells (labeled training data) for oil and gas exploration and development. Hence, labeled training data augmentation considering geology information and rock physics is applied to enrich training data types properly. The tests on the synthetic data demonstrate that the proposed inversion scheme can obtain more reasonable results. The application of the proposed inversion scheme on the real field data achieves good results that reveal thin interbedded sandstones (higher impedance) in a delta depositional system with high resolution. Thus, our study proves that a machine-learning-based method combining geology and geophysics knowledge can facilitate quantitative seismic reservoir characterization with higher efficiency and accuracy.