Two-dimensional deep learning inversion of magnetotelluric sounding data

Conventional linear iterative methods for magnetotelluric sounding (MT) suffer from considerable limitations related to difficulties in selecting the initial model and from local optima. On the other hand, conventional intelligent nonlinear methods exhibit slow convergence and low accuracy. In this study, we propose an inversion strategy based on the deep learning (DL) deep belief network (DBN) to realise the instantaneous inversion of MT data. A scaled momentum learning rate is introduced to improve the convergence performance of the restricted Boltzmann machine during the DBN pre-training stage, and a novel activation function (DSoft) is introduced to enhance the global optimisation capability during the DBN fine-tuning stage. To address the difficulty of designing sample data when prior information is lacking, we employ the k-means++ algorithm to cluster the MT field data and use the clustering results as prior information to guide the construction of the sample dataset. Then, based on the proposed DBN, we establish an end-to-end mapping directly from the apparent resistivity to the resistivity model. We implement two groups of experiments to demonstrate the validity of both improvements to the DBN. We consider six types of geoelectric model from the test set to demonstrate the feasibility and effectiveness of the proposed DBN method for MT 2D inversion, which we further compare with the well-known least-square regularisation method for several extended geoelectric models and field data. The qualitative and quantitative analyses show that the DL inversion method is promising as it can accurately delineate subsurface structures and perform rapid inversion.


Introduction
The magnetotelluric sounding method (MT) measures natural geomagnetic and geoelectric field variations on the Earth's surface and infers the distribution of subsurface structures by inversion (Tikhonov 1950; Cagniard 1953). Currently, conventional MT inversion methods include Occam inversion (Siripunvaraporn & Egbert 2000), rapid relaxation inversion (Smith & Booker 1991; Lin et al. 2009), Gauss-Newton inversion (Grayver et al. 2013; Liu & Yin 2016), nonlinear conjugate gradient inversion (Newman & Alumbaugh 2002; Kelbert et al. 2014) and least-square regularisation (LSR) inversion (Lee et al. 2009). These methods have yielded useful results that facilitate practical applications. However, most of these methods are based on traditional linearisation theory; that is, they linearise the nonlinear inverse problem in the vicinity of the initial model and then apply optimisation methods to obtain the extrema. As a result, these methods rely heavily on the initial model and tend to get stuck in local minima. With advances in computer performance and forward algorithms, intelligent nonlinear inversion methods featuring global optimisation have attracted increasing attention. Compared with conventional inversion methods, intelligent nonlinear inversion methods solve nonlinear inversion problems directly and are independent of the initial model and the Jacobian matrix. At present, simulated annealing inversion (Shi & Wang 1998; Sharma 2012), artificial neural network (NN) inversion (El-Qady & Ushijima 2001; Wang et al. 2015), genetic algorithm inversion (Schwarzbach et al. 2005; Wang et al. 2018), ant colony algorithm inversion (Liu et al. 2015), particle swarm optimisation algorithm inversion (Shaw & Srivastava 2007; Shi et al. 2009) and Bayesian inversion (Guo et al. 2011) have been applied in the electromagnetic field. Nevertheless, these intelligent nonlinear inversion methods are insufficient for feature representation. When addressing complex situations (e.g. processing large datasets), limitations in terms of low accuracy and slow convergence remain, which hinder the further development of intelligent nonlinear methods for electromagnetic inversion.
Deep learning (DL) methods represent an evolution of artificial NNs. DL first made breakthroughs in the fields of information extraction, machine translation and image processing. In recent years, DL methods have gained much attention in geophysical communities, particularly in the field of seismic exploration. Popular applications of DL methods in geophysics usually focus on the recognition and classification of geological characteristics, such as lithology prediction (Zhang et al. 2018), noise suppression (Wu et al. 2019; Wu & Xue 2020), fault recognition (Yang 2019) and automatic event picking (Chen 2017). The first promising application of DL methods in seismic tomography (Araya et al. 2018) may usher in a new epoch in geophysical inversion, wherein DL methods will supplement or even substitute traditional inversion methods, not only yielding accurate results but also delivering the distribution of underground properties in real time. In the field of electromagnetic prospecting, Puzyrev (2019) proposed a strategy based on a convolutional neural network (CNN) for controlled source electromagnetic (CSEM; Di et al. 2019) two-dimensional (2D) inversion. Shahriari et al. (2020) constructed a deep neural network (DNN) based on a CNN and a recurrent NN to realise one-dimensional real-time inversion of borehole resistivity measurements for the surrounding subsurface. Liu et al. (2020) carried out 2D DL inversion of electrical resistivity survey data to establish a mapping between apparent resistivity data and the geoelectric model by CNN. All these works manifest the great promise of DL methods and provide new perspectives for electromagnetic inversion.
To the best of our knowledge, applications of DL to MT inversion are relatively rare; therefore, the advantages and limitations of such applications need to be explored. In this work, we develop a 2D DL inversion of MT data based on the deep belief network (DBN) approach. Wang et al. (2019) first introduced the DBN to invert MT data and confirmed its feasibility for a synthetic dataset. However, the network structure of the DBN used was rudimentary and shallow. DL refers to NNs that generally contain a large number of layers (Shanmugamani 2017). A DNN enables the detection and extraction of complex features hidden in a large dataset. Here, we introduce a scaled momentum learning rate and a novel activation function, DSoft, to improve the performance of the DBN on MT inversion. Two sets of experiments were conducted to demonstrate the effectiveness of both improvements to the DBN. Many previous studies (Puzyrev 2019; Wang et al. 2019; Shahriari et al. 2020; Liu et al. 2020) have achieved promising results in electromagnetic modeling using DL methods. Nevertheless, they have mainly focused on the architecture, configuration and evaluation of the performance and application of NNs, and few studies have discussed how to produce a reasonable sample dataset for field data, especially when prior information (e.g. physical properties of rock and ore samples) is lacking. In view of this, we introduce the k-means++ algorithm (Bensaid et al. 1996) to cluster the MT field data and use the clustering results to guide the construction of the sample dataset. Then, the proposed DBN method is trained online and tested on several MT synthetic datasets and offline field data. We also compared the proposed method with a classical linear iterative method, namely the LSR implemented in the open-source software MT2DInvMatlab (Lee et al. 2009). The comprehensive qualitative and quantitative analysis results are promising. The proposed DBN inversion method not only yields accurate inversion results but also allows for instantaneous inversion.

Method
The DBN is a widely used DL model (Qiu 2020) that can extract the essential characteristics and patterns of data and has the ability to converge to the global optimum. As depicted in figure 1, the DBN is a DNN that consists of multiple layers of restricted Boltzmann machines (RBMs) and one layer of a back-propagation (BP) network. The training process of the DBN is composed of two stages: pre-training and fine-tuning. In the first stage, the RBMs are trained in an unsupervised manner, layer by layer from the bottom to the top, with the input of each upper-layer RBM taken from the output of the RBM below it. In the second stage, the DBN parameters are optimised in a supervised manner by minimising the difference between the network output and the target output, using the error back-propagation algorithm.

Restricted Boltzmann machine
The RBM can be viewed as an undirected graphical model, as shown in figure 2. h is the hidden layer: it consists of hidden nodes h_j (0 < j ≤ n) and can be regarded as a feature extractor. v is the visible layer: it is composed of visible nodes v_i (0 < i ≤ m) and represents the input data. W is the connection-weight matrix between the visible and hidden layers.
When the states of the visible nodes are given, the activation probability of each hidden node can be obtained by

$$P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i=1}^{m} v_i w_{ij}\Big),$$

where $\sigma(x) = (1 + e^{-x})^{-1}$ is the sigmoid function, $\theta = \{w_{ij}, a_i, b_j\}$ are the parameters of the RBM, $w_{ij}$ represents the connection weight between the visible node $v_i$ and hidden node $h_j$, $a_i$ represents the bias of the visible node $v_i$ and $b_j$ denotes the bias of the hidden node $h_j$. The RBM is a symmetric network. Therefore, given the states of the hidden nodes, the activation probability of each visible node is as follows:

$$P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_{j=1}^{n} h_j w_{ij}\Big).$$

An iterative method is adopted to train the RBM to determine the optimal parameters $\theta$ that approximate the provided input data. Using the fast-learning algorithm, namely the contrastive divergence algorithm (Hinton et al. 2006), to train the RBM, the updating rule for each parameter can be conveniently obtained:

$$\Delta w_{ij} = \eta\left(\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{recon}}\right),$$
$$\Delta a_i = \eta\left(\langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{recon}}\right), \quad (3)$$
$$\Delta b_j = \eta\left(\langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{recon}}\right),$$

where $\eta$ denotes the learning rate and $\langle \cdot \rangle_{\mathrm{recon}}$ indicates the probability distribution of the model after a one-step reconstruction.
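For concreteness, the CD-1 training step described above can be sketched in NumPy as follows. This is an illustrative implementation for a binary RBM with our own variable and function names; the study itself used MATLAB, so this is a sketch of the technique, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, eta=0.1, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) step for a binary RBM.

    v0 : (batch, m) visible data; W : (m, n) weights;
    a : (m,) visible biases; b : (n,) hidden biases; eta : learning rate.
    Parameters are updated in place; the reconstruction error is returned.
    """
    # Positive phase: hidden activation probabilities given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
    # One-step reconstruction of the visible layer, then its hidden probabilities
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # Updates: learning rate times (<.>_data - <.>_recon), averaged over the batch
    n = v0.shape[0]
    W += eta * (v0.T @ ph0 - pv1.T @ ph1) / n
    a += eta * (v0 - pv1).mean(axis=0)
    b += eta * (ph0 - ph1).mean(axis=0)
    # Reconstruction error, useful for monitoring convergence
    return float(np.mean((v0 - pv1) ** 2))
```

Repeated calls on a batch of training vectors drive the reconstruction error down as the RBM learns the input distribution.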
The choice of the learning rate $\eta$ is fundamental. When $\eta$ is large, the convergence speed is fast, but a very large rate may cause instability of the algorithm. Meanwhile, a small $\eta$ prevents instability but suffers from a low convergence rate. To solve this problem and avoid the algorithm prematurely converging to a local optimum, we introduce the scaled momentum learning rate, which replaces the raw gradient step with the previously accumulated momentum: the 'weighted moving average' of the negative gradient is used as the updating direction. The updating rules of the parameters in equation (3) then change to the following equations:

$$\Delta w_{ij}^{(t)} = \gamma\,\Delta w_{ij}^{(t-1)} + \eta\left(\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{recon}}\right),$$
$$\Delta a_i^{(t)} = \gamma\,\Delta a_i^{(t-1)} + \eta\left(\langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{recon}}\right),$$
$$\Delta b_j^{(t)} = \gamma\,\Delta b_j^{(t-1)} + \eta\left(\langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{recon}}\right),$$

where $\gamma$ signifies the scaled momentum learning rate.
Thus, the actual updating direction of each parameter is not fully determined by the current gradient of the likelihood function but is controlled by the weighted average of previous gradients. When the gradient direction of a parameter is inconsistent with that in the previous period, the updating range of the real parameter becomes smaller. Conversely, if it is consistent, the updating range of the real parameter increases. Generally, in the early stage of the iteration, the gradient directions are mostly the same, and the scaled momentum learning rate accelerates the network towards the optimal solution. In the late stage, the gradient directions are inconsistent and oscillate around the convergence value, and the scaled momentum learning rate slows down the updates to increase the convergence stability.
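The scaled-momentum behaviour can be sketched with a toy example. This illustrative NumPy version applies the update to a simple quadratic objective (the objective and step size are ours, purely for demonstration); the momentum coefficient is scheduled as in the experiments reported later (0.5 for the first 20 epochs, 0.9 afterwards).

```python
import numpy as np

def momentum_step(grad, velocity, theta, epoch, eta=0.1):
    """One scaled-momentum update: the step follows a weighted moving
    average of past gradients rather than the current gradient alone."""
    gamma = 0.5 if epoch <= 20 else 0.9       # schedule used in the experiments
    velocity = gamma * velocity + eta * grad  # accumulate momentum
    return theta + velocity, velocity
```

Applied to maximising $-\theta^2$ (ascent gradient $-2\theta$), the iterate converges to the optimum at 0, faster than plain gradient steps when successive gradients agree.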

BP network
The BP network is a multi-layer feedforward network. Its principle is to continuously optimise the parameters of the entire network by back-propagating errors, with the goal of minimising the sum of the network square errors. Here, we use a three-layer BP network consisting of an input layer, a hidden layer and an output layer. Then, considering the connection weight $w_{ij}$ between the input and hidden layers, the hidden-layer bias $b_j$, the activation function $f_h$ and the input data $\mathbf{v}$, the hidden-layer output $\mathbf{h}$ can be calculated as follows:

$$h_j = f_h\Big(\sum_{i} w_{ij} v_i + b_j\Big).$$

According to the connection weight $w_{jk}$ $(0 < k \le l)$ between the hidden layer and output layer, the output-layer bias $c_k$, the activation function $f_o$ and the input data $\mathbf{h}$, the top-layer output $\mathbf{o}$ can be obtained by the following equation:

$$o_k = f_o\Big(\sum_{j} w_{jk} h_j + c_k\Big).$$

Then, we use the predicted output $\mathbf{o}$ and target output $\mathbf{y}$ to compute the network error energy sum

$$e = \frac{1}{2}\sum_{k=1}^{l} (o_k - y_k)^2,$$

which is the standard error for checking whether the network has converged.
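The forward pass of this three-layer network can be sketched as follows. This is an illustrative NumPy version: the weight names follow the text, but `np.tanh` and the identity output are merely example choices for `f_h` and `f_o`.

```python
import numpy as np

def bp_forward(v, W1, b, W2, c, f_h=np.tanh, f_o=lambda x: x):
    """Forward pass of a three-layer BP network:
    hidden output h = f_h(v W1 + b), top-layer output o = f_o(h W2 + c)."""
    h = f_h(v @ W1 + b)
    o = f_o(h @ W2 + c)
    return h, o

def error_energy(o, y):
    """Network square-error sum used as the convergence check."""
    return 0.5 * float(np.sum((o - y) ** 2))
```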
The activation function f (⋅) is a crucial component of the NN as it helps ensure nonlinearity of the network, which has proven to be key to high performance. Over the years, many activation functions have been proposed in theoretical studies, but only a few are suitable for most applications. The sigmoid and hyperbolic tangent (Tanh), which have been used as activation functions for several decades, have been recently identified to be less competitive for DNNs owing to accepted wisdom that a saturated activation function is prone to cause gradient vanishing (and/or explosion). Currently, the two most popular and successful activation functions in the DL community are the rectifier linear unit (ReLU) (Nair & Hinton 2010) and Swish (Prajit et al. 2017), defined as f (x) = max(0, x) and f (x) = x ⋅ sigmoid(x), respectively. The ReLU has an identity derivative 1 for positive arguments and 0 otherwise, and thus not only makes the network sparse but is also less susceptible to gradient vanishing and exploding problems. Therefore, it has become the standard/default activation function for most tasks. Unlike ReLU, the activation function Swish is smooth and non-monotonic, and similar to ReLU, Swish is unbounded above and bounded below.
It has been demonstrated that Swish consistently matches or surpasses ReLU in many challenging fields.
Inspired by Swish, we propose a novel activation function, 'DSoft', which takes the following form:

$$f(x) = \varphi(x)\,\psi(x) = \frac{x}{1+|x|}\,\ln\!\left(1+e^{x}\right),$$

where $\varphi(x) = x(1 + |x|)^{-1}$ is the Softsign (Turian et al. 2009) activation function and $\psi(x) = \ln(1 + e^{x})$ is the Softplus (Nair & Hinton 2010) activation function. The derivative of DSoft is

$$f'(x) = \frac{\ln\!\left(1+e^{x}\right)}{(1+|x|)^{2}} + \frac{x}{1+|x|}\,\sigma(x),$$

where $\sigma(x) = (1 + e^{-x})^{-1}$ is the sigmoid activation function. The leftmost and middle subgraphs in figure 3 show the curves of DSoft and its derivative, respectively. The rightmost subgraph of figure 3 displays other commonly used activation functions, along with DSoft, for comparison. The proposed DSoft activation function has the properties of being unbounded above, bounded below, non-monotonic and smooth. The characteristic of being unbounded above is preferable because it prevents saturation, which tends to cause slow convergence and gradient vanishing. This property underlies the considerable improvement of ReLU over the sigmoid and Tanh (Nair & Hinton 2010), and it is shared by DSoft. A lower bound may also be a favourable property because it can induce a strong regularisation effect. The property of non-monotonicity preserves small negative inputs as negative outputs, which can improve gradient flow and enhance expressivity (Prajit et al. 2017). Finally, smoothness is an advantageous property resulting in better optimisation and generalisation performance (Prajit et al. 2017). Figure 4 illustrates the output landscapes of four different activation functions, obtained by passing the coordinates of each point into an 11-layer randomly initialised NN and plotting the scalar outputs. There are many sharp transition regions in the output landscape of ReLU because of its non-smooth property. In contrast, the output landscapes of DSoft are quite smooth, similar to those of Swish. A smoother output landscape indicates a smoother loss landscape, which is prone to better optimisation and generalisation.
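As a sketch, the activation and its derivative can be implemented and checked numerically. Note the assumption here: consistent with the definitions and stated properties above (smooth, non-monotonic, bounded below, unbounded above), DSoft is taken as the product of Softsign and Softplus; all function names are ours.

```python
import numpy as np

def softsign(x):
    return x / (1.0 + np.abs(x))

def softplus(x):
    return np.logaddexp(0.0, x)  # log(1 + e^x), numerically stable form

def dsoft(x):
    # DSoft taken as Softsign(x) * Softplus(x): an assumption consistent
    # with the properties stated in the text (see lead-in).
    return softsign(x) * softplus(x)

def dsoft_grad(x):
    # Product rule: phi'(x) * psi(x) + phi(x) * sigmoid(x),
    # with phi'(x) = (1 + |x|)^(-2) and sigmoid = d/dx Softplus.
    sigmoid = 1.0 / (1.0 + np.exp(-x))
    return softplus(x) / (1.0 + np.abs(x)) ** 2 + softsign(x) * sigmoid
```

A quick finite-difference check of `dsoft_grad` against `dsoft` confirms the analytic derivative, and evaluating at large positive and negative arguments illustrates the unbounded-above / bounded-below behaviour.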
Consequently, the updating rules of the network parameters at the fine-tuning stage are as follows:

$$w_{jk} \leftarrow w_{jk} - \eta\,\delta_k h_j, \qquad c_k \leftarrow c_k - \eta\,\delta_k,$$
$$w_{ij} \leftarrow w_{ij} - \eta\,\delta_j v_i, \qquad b_j \leftarrow b_j - \eta\,\delta_j,$$

where the error terms are

$$\delta_k = (o_k - y_k)\, f_o'(\cdot), \qquad \delta_j = f_h'(\cdot)\sum_{k} \delta_k w_{jk}.$$

Neural network modeling
NN modeling aims to acquire the nonlinear mapping relationship between the input and output through training. In this work, we intend to approximate the inverse function of MT using the DBN and map the apparent resistivity to the resistivity model directly. Accordingly, the sample data comprise synthetic apparent resistivity values, obtained by a forward solver based on the finite element method (FEM), and the corresponding true resistivity values. The code developed by Wang et al. (2013) was used as the forward solver to calculate the apparent resistivity for the corresponding resistivity model. All simulations in this work were carried out in the following modeling environment: Windows 10 operating system (64 bit), 16 GB memory and an eight-core 3.00 GHz CPU; MATLAB R2020a was used as the programming platform.

Sample data preparation
The quantity and quality of the sample data are the main factors influencing the generalisation performance of the NN. When the sample data are sufficiently representative, the generalisation performance of the NN is high. However, it is often difficult to design sample data for field data when prior information is lacking (e.g. physical properties of rock and ore samples). Cluster analysis is a data mining technique that is widely used to extract the intrinsic characteristics of unlabeled sample data (Aurélien 2019). In this study, we apply the k-means++ algorithm (Bensaid et al. 1996; Aurélien 2019), a popular clustering algorithm, to cluster the MT field data and obtain the prior distribution information of the apparent resistivity data. Then, the prior information is used to guide the construction of the sample dataset.
The k-means++ algorithm proceeds as follows.
(i) Select a data point uniformly at random from the given dataset of apparent resistivity $X$ as the initial cluster center $c_1$;
(ii) Select the next cluster center $c_i = x' \in X$ with probability
$$P(x') = \frac{D(x')^2}{\sum_{x \in X} D(x)^2},$$
where $D(x) = \min_{r} \lVert x - c_r \rVert$, $r = 1, 2, \ldots, k_{\mathrm{selected}}$, denotes the shortest distance from a data point $x$ to the nearest cluster center that has already been selected;
(iii) Repeat step (ii) until a total of $k$ cluster centers $C = \{c_1, c_2, \ldots, c_k\}$ have been chosen;
(iv) For each $i \in \{1, 2, \ldots, k\}$, assign the remaining data points in $X$ that are closer to $c_i$ than to any $c_j$ ($j \neq i$) to the cluster $C_i$;
(v) For each $i \in \{1, 2, \ldots, k\}$, recalculate $c_i$ by setting it to the center of mass of all points in $C_i$:
$$c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x;$$
(vi) Repeat steps (iv) and (v) until $C$ no longer changes;
(vii) Use the cluster centers as the prior information to construct the sample dataset.
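The steps above can be sketched as follows: a compact, self-contained NumPy illustration of k-means++ seeding followed by Lloyd iterations (not the study's implementation).

```python
import numpy as np

def kmeans_pp(X, k, iters=100, seed=0):
    """k-means++ clustering: seeding per steps (i)-(iii), then Lloyd
    iterations per steps (iv)-(vi)."""
    rng = np.random.default_rng(seed)
    # Seeding: each new center is drawn with probability proportional to
    # D(x)^2, the squared distance to the nearest center already chosen.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min(((X[:, None, :] - np.asarray(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    C = np.asarray(centers, dtype=float)
    # Lloyd iterations: assign each point to its nearest center, then
    # move each center to the mean (center of mass) of its cluster.
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        newC = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
    return C, labels
```

On well-separated data the recovered centers coincide with the cluster means, and the labels partition the points accordingly.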
In this study, an MT field dataset from a survey line in the Bainiuchang mining area, Yunnan Province, China, was used for cluster analysis and inversion research. The field data were acquired using an EH4 electromagnetic imaging system with 51 measurement points at 40 m intervals and 39 frequency acquisition points. The frequency range was [15.8, 100 000] Hz and was distributed logarithmically and equally. As a result, the MT survey line produced 1989 measurements. Figure 5 shows the apparent resistivity curves of two typical MT sites from the survey line. We can observe that the apparent resistivity curves of both sites have good continuity and the curves of TE and TM polarisation modes are consistent in shape, indicating that the quality of the observed data is reliable.
When the k-means++ algorithm is applied to cluster the apparent resistivity data, the optimal number of clusters k must be determined in advance; otherwise, the clustering result may be quite unsatisfactory if k is set to an incorrect value. Here, the performance metric inertia and the 'elbow rule' were used to choose k and evaluate the clustering results (Bensaid et al. 1996; Aurélien 2019). A lower inertia value denotes a better partition. Indeed, a larger number of clusters causes the data points to be closer to their closest cluster centers; consequently, the inertia will be lower. Figure 6 shows the inertia as a function of k. The inertia drops rapidly as k increases up to 6 and then decreases only gradually as k continues to increase. Using the 'elbow rule', k = 6 is determined to be a good choice: it indicates that the field data have been fully clustered, as any lower value would sacrifice substantial structure, while any higher value would not help much. Table 1 provides the clustering results of the field data for k = 6, which function as prior information to guide the construction of the sample dataset. This is meaningful for inverting field data when prior information is lacking, and it also opens the possibility of auto-constructing a sample dataset based on the clustering results in future work. It should be mentioned that although the sample dataset is built based on the clustering results of the field data, it is entirely distinct from the field data. We expect to solve real-life MT applications with simple and regular sample geoelectric models. Although we acknowledge that complex and sophisticated geoelectric models could simulate real scenarios more accurately and better solve inversion problems, the construction of such sample geoelectric models and the forward computation are more time-consuming and computationally expensive. When prior information is lacking, the construction of complex sample models is also difficult to realise. Moreover, it might be preferable to automate the construction and generation of sample data using simple and regular sample models.
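The inertia used by the elbow rule can be computed as in the following sketch: a simple k-means with random initial centers, purely for illustration (function names are ours).

```python
import numpy as np

def inertia(X, k, iters=50, seed=0):
    """Inertia: within-cluster sum of squared distances to the nearest
    cluster center, after a basic k-means run."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else C[j]
                      for j in range(k)])
    return float(((X[:, None, :] - C[None]) ** 2).sum(-1).min(axis=1).sum())
```

Plotting `[inertia(X, k) for k in range(1, k_max + 1)]` against k reproduces the elbow curve: the inertia falls sharply until the true number of clusters is reached and flattens afterwards.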
Therefore, based on the prior information provided by the k-means++ algorithm, we generate sample geoelectric models in an effective way: embedding an anomalous body with different resistivities, sizes and shapes at different locations in a homogeneous medium (surrounding rock). For all types of model, the anomaly resistivity takes values in {200, 500} Ω ⋅ m and the surrounding rock resistivity varies over {1000, 2000, 4000, 6000} Ω ⋅ m. Figure 7 shows the sample models used for training. Other unlisted models have similar characteristics but differ in the location and resistivity of the anomaly and the resistivity of the surrounding rock. Nine types of geoelectric models are included. In this way, 6050 geoelectric models were generated, and 6050 sets of apparent resistivity data were obtained for each of the TE and TM polarisation modes using a 2D forward solver based on the FEM with a unified mesh-generation size. All geoelectric models were uniformly set with 51 measurement points and 39 frequency acquisition points. The frequency range was [15.8, 100 000] Hz, distributed logarithmically and equally. To optimise the training process, we reduced the number of network input and output nodes by dividing each geoelectric model into four subsections. As depicted in figure 8, every subsection has a horizontal length of 800 m and the same vertical depth as the geoelectric model. To avoid any potential negative effect of the partition boundaries, two adjacent subsections have an overlapping area of 400 m in the horizontal direction. In this way, a total of 24 200 groups of sample data composed of apparent resistivity and true resistivity data were acquired in each of the TE and TM polarisation modes. Liu et al. (2021) demonstrated that the training effect was worse when using phase, or a combination of apparent resistivity and phase, as input than when using apparent resistivity alone.
Therefore, in our work, all the simulated apparent resistivity and the corresponding true resistivity values of the subsection are used as input and target output data, respectively. Both the input and target output data are normalised to the range of [−1, 1] because data standardisation tends to improve the performance of the training process (El-Qady & Ushijima 2001). The normalisation technique that we used was the standard min-max curve scaling (Qiu 2020). We randomly divided the 24 200 groups of sample data into training, validation and test sets at a ratio of 7:1:2.
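The subsection windowing, min-max scaling and 7:1:2 split described above can be sketched as follows. These are illustrative NumPy helpers with hypothetical names; the station counts follow the survey geometry in the text (51 stations at 40 m spacing, so 21 stations span 800 m and a 10-station stride leaves a 400 m overlap).

```python
import numpy as np

def split_subsections(section, width=21, stride=10):
    """Divide a (stations x frequencies) section into four overlapping
    subsections: 21 stations = 800 m at 40 m spacing, with a 10-station
    stride so adjacent subsections overlap by 400 m."""
    return [section[s:s + width] for s in range(0, len(section) - width + 1, stride)]

def minmax_scale(X, lo=-1.0, hi=1.0):
    """Standard min-max scaling of each column to [lo, hi]."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return lo + (hi - lo) * (X - xmin) / (xmax - xmin)

def split_7_1_2(n, seed=0):
    """Random 7:1:2 split into training/validation/test index sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_tr, n_va = int(0.7 * n), int(0.1 * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```

For a 51 x 39 section this yields four 21 x 39 subsections, and splitting the 24 200 sample groups gives 16 940 / 2420 / 4840 samples for training, validation and testing.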

Network evaluation
To demonstrate the effectiveness of the proposed DBN inversion method, two experiments were conducted: experiment 1, scaled momentum learning rate (SMLR) performance analysis, and experiment 2, DSoft activation function performance analysis. Because there is no mature, systematic theory for setting NN parameters in the MT field, an 11-layer DBN model (one input layer, nine hidden layers and one output layer) was adopted with reference to previous work (Hinton 2012; Lv et al. 2014) and a combination of experimental and empirical methods. The number of nodes in each hidden layer was set to 200. To avoid any negative effects of occasionality and instability in NN learning, both of the following experiments were repeated 10 times for the fixed MT training and validation sets.
In experiment 1, to verify the role of the SMLR at the DBN pre-training stage, we compared the convergence performance of the RBM with and without the SMLR on the MT data. The SMLR is set to 0.5 when epochs ≤ 20 and to 0.9 when epochs > 20. To ensure a fair comparison, the other relevant hyperparameters were set following Hinton (2012): the batch size and number of epochs were both 100, and the learning rate was 0.1. The evaluation indicator is the reconstruction error $E_{re}$, which denotes the difference between the reconstructed data and the original input data at the pre-training stage; the lower the $E_{re}$ value, the smaller the difference:

$$E_{re} = \frac{1}{TD}\sum_{t=1}^{T}\sum_{d=1}^{D}\left(v_{td} - \hat{v}_{td}\right)^{2},$$

where T is the number of samples, D is the dimension of a sample and $\hat{v}$ denotes the one-step reconstruction. Table 2 shows the convergence results of the RBM with and without the SMLR for both the TE and TM polarisation modes of MT. The reconstruction errors of the RBMs with the SMLR are clearly much lower than those of the RBMs without it, especially for the first RBM, which is the first and most important step as it reconstructs the original input data directly. Further, the reconstruction errors of the RBMs with the SMLR show a steady downward trend, whereas those of the RBMs without the SMLR oscillate for several RBMs. These results indicate that the SMLR not only prevents the RBM from prematurely converging to a local solution but also enhances the convergence stability of the RBM. In experiment 2, to confirm the validity of the DSoft activation function during the DBN fine-tuning stage, we compared DSoft against several successful and widely used activation functions in the DL community: Tanh, ReLU, Leaky ReLU (Maas et al. 2013), Softplus and Swish. The network was optimised using the popular Adam algorithm, following Kingma and Ba (2015). For a fair comparison, the learning rate, number of epochs and batch size were maintained at 0.001, 500 and 100, respectively, for each activation function.
To avoid overfitting, a patience of 50 for early stopping was used, such that the fine-tuning process was stopped if the loss on the validation set failed to decrease (or remained the same) for 50 consecutive epochs. The mean square error $E_{ms}$ is used as the performance indicator:

$$E_{ms} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Y_i\right)^{2},$$

where y represents the predicted values, Y represents the expected values and N denotes the dimension of the output data. $E_{ms}$ measures the estimation error between the predicted and expected values; the lower its value, the higher the prediction accuracy. Table 3 provides concrete proof of the effectiveness and superiority of the DSoft activation function in comparison with other popular activation functions (especially Swish). The average and best error performances on the training and validation sets are used for detailed statistical analysis. The mean and optimal error values over 10 runs achieved by DSoft are the smallest, which implies that DSoft consistently surpasses the other activation functions on both the TE and TM polarisation mode datasets in MT inversion. Figure 9 displays the fine-tuning time graph of the proposed DBN on the training set of the TE polarisation mode using ReLU, Swish and DSoft. As shown in the figure, DSoft has a relatively higher per-epoch time than ReLU and Swish, which can be regarded as a drawback.
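The two evaluation metrics and the early-stopping rule can be sketched as follows. These are illustrative helpers with our own names; the $E_{re}$ form shown reads the metric as a mean over samples and dimensions, which is one plausible interpretation of the description above.

```python
import numpy as np

def reconstruction_error(V, V_recon):
    """E_re: mean squared difference between the original input and its
    one-step reconstruction, over T samples of dimension D."""
    T, D = V.shape
    return float(((V - V_recon) ** 2).sum() / (T * D))

def mean_square_error(y, Y):
    """E_ms between predicted values y and expected values Y."""
    return float(np.mean((np.asarray(y) - np.asarray(Y)) ** 2))

class EarlyStopping:
    """Stop fine-tuning once the validation loss has failed to improve
    for `patience` consecutive epochs (patience = 50 in the text)."""
    def __init__(self, patience=50):
        self.patience, self.best, self.wait = patience, np.inf, 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0  # improvement: reset counter
        else:
            self.wait += 1                      # no improvement this epoch
        return self.wait >= self.patience
```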
In addition, it can be observed in both experiments that the performance of the proposed DBN method in the TE polarisation mode surpasses that in the TM polarisation mode. This may be because the apparent resistivity data of the TE polarisation mode obtained by FEM exhibit greater dispersion and separability than those of the TM polarisation mode (the average apparent resistivity standard deviations of the TE and TM training sets are 0.0318 and 0.0185, respectively).

Examples and results
This section presents a number of inversion results to analyse and discuss the inversion performance of the proposed DBN method on MT data. The agreement in shape, range, boundary and position of the anomalous body, along with the resistivity value, are the principal evaluation criteria. Figure 10 shows the loss curves of the proposed DBN inversion method on the training and validation sets of the TE and TM polarisation modes. In both modes, the training and validation loss curves gradually decline with epochs, which indicates that no overfitting occurred during training. Training the proposed DBN for 1000 epochs took about 2 hours.

Models from test set
Six types of geoelectric models were randomly selected from the test set to examine the accuracy of the proposed DBN inversion. As shown in figure 11, the geoelectric models for inversion, DBN inversion results of TE and TM polarisation modes, and vertical resistivity profiles are placed from the left column to the right column, respectively.
Overall, in both the TE and TM polarisation modes, the proposed DBN inversion method can accurately depict the shape, range, boundary and position of the anomaly, and
invert the resistivity value despite a relatively large difference between the resistivities of the anomaly and surrounding rock, which indicates its promising inversion ability. For a more intuitive visualisation of the inverted resistivity values, the rightmost columns in Figure 11 display the vertical resistivity changes along the anomaly profile (marked by the black truncation line) in the form of curves (without smoothing). The resistivity curves of the inversion results and geoelectric models are basically consistent and show synchronous changes. In addition, in the third and sixth examples, although the inverted resistivity values of the anomaly are moderately higher than the true resistivity values owing to the deep burial depth and high complexity of the anomalous body, the resistivity change trends of inversion results and geoelectric models are completely consistent, and can still accurately depict the characteristics of the low-resistivity anomaly.

Extended models
To further demonstrate and discuss the generalisation ability of the proposed DBN method, we use the classical linear iterative inversion method, namely the LSR implemented in the MT2DInvMatlab software (Lee et al. 2009), as a benchmark for comparison. The synthetic geoelectric models used for the inversion comparison were not seen during DBN training. To ensure a fair comparison, both methods apply identical configurations to generate the corresponding resistivity data.
Three comparison examples are shown in figure 12, wherein the geoelectric models, the inversion results of the proposed DBN method in the TE and TM polarisation modes, and the inversion results of the LSR method in the TE and TM polarisation modes are displayed from the top row to the bottom row, respectively. Both inversion methods clearly detect the existence and recover the position of the anomaly in all three examples. Nonetheless, the proposed DBN method delineates the shape, range and boundary of the anomaly more precisely, and the inverted resistivities of both the anomaly and the surrounding rock are closer to the true values overall. Meanwhile, the inverted low-resistivity anomalies in the LSR results exhibit a noticeably elongated tail shape. In example (II), a spurious high-resistivity anomaly appears in the LSR inversion results, and the inverted resistivity of the surrounding rock is far from the true value. These phenomena do not occur with the proposed DBN method, thereby confirming its superiority. The quantitative comparison results shown in Table 4 also demonstrate the superiority of the proposed DBN method. In addition, we observed that the inversion of a shallow anomaly is better than that of a deep anomaly in the results of both the proposed DBN and the LSR. This may be attributed to the fact that a shallow anomaly has more obvious features in the apparent resistivity data. Thus, improving the inversion of deep anomalies will be the subject of future studies.

Field data
In this part, we apply the proposed DBN method to the inversion of the aforementioned field data and benchmark it against the LSR method. The survey was conducted in the Bainiuchang mining area in Yunnan Province, China. The Bainiuchang deposit is a super-large polymetallic deposit dominated by silver, lead, zinc and copper. The mining area lies in a plateau mountain region in which the terrain is high in the southeast and low in the northwest, and valley erosion landforms are well developed. An EH4 electromagnetic imaging system was used for data acquisition along a 2000-m survey line. Physical-property measurements of rock and ore samples show that the resistivities of lead-zinc ore and disseminated ore are several hundred Ω ⋅ m, whereas the resistivity of the surrounding rock is above 1000 Ω ⋅ m; there is thus a significant electrical contrast between the ore body and the surrounding rock.

Figure 11. Inversion results of six types of geoelectric model randomly selected from the test set. The leftmost column shows the geoelectric models for inversion. The middle two columns display the DBN inversion results in the TE and TM polarisation modes, respectively. The rightmost column shows the vertical resistivity profiles marked by the black truncation lines.

Figure 13 shows the inversion results for the TM-mode field data using the proposed DBN and LSR methods. The two methods yield similar resistivity images: although the detailed resistivity values vary slightly, the trends of resistivity change are quite compatible. Two vertical boreholes, ZK1 and ZK2, were drilled in the survey area at 1662 and 1875 m along the survey line, respectively (see the white dotted lines in figure 13). Figure 14 displays the inversion results at the two boreholes as resistivity curves; the curves of the two methods are consistent in their trends. According to the drilling data, dolomite, marble, skarn and granite are the main lithologies in the survey area. The elevation ranges of the ore-bearing bed in boreholes ZK1 and ZK2 are 1395-1430 and 1450-1480 m, respectively. The lithology of the ore-bearing bed is marble, and the resistivity at the interface between the skarn and granite is 200-500 Ω ⋅ m. The inversion result of the proposed DBN method is quite compatible with the borehole data. Compared with LSR, the proposed DBN depicts the low-resistivity anomaly more finely. Moreover, once the network topology has been trained, the DL method can predict the subsurface resistivity distribution in real time (a fraction of a second, without any iterations), which is highly convenient and practical in field work.

Figure 12. Inversion results of three extended models. Row (a) shows the geoelectric models for inversion. Rows (b) and (c) show the inversion results of the proposed DBN in the TE and TM polarisation modes, respectively. Rows (d) and (e) display the inversion results of LSR in the TE and TM polarisation modes, respectively.
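The real-time behaviour noted above follows from the fact that, once trained, a network inverts data with a single forward pass rather than an iterative search. The sketch below illustrates this with a generic feed-forward pass in NumPy; the layer sizes and random weights are stand-ins, not the paper's trained DBN topology.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: input is a flattened apparent-resistivity
# pseudo-section, output a flattened 2D resistivity model.
sizes = [800, 400, 400, 800]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(app_res):
    """One forward pass: apparent resistivity in, resistivity model out."""
    h = app_res
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)               # hidden layers
    return h @ weights[-1] + biases[-1]      # linear output layer

x = rng.random(800)                          # stand-in pseudo-section
t0 = time.perf_counter()
model = predict(x)
elapsed = time.perf_counter() - t0
print(model.shape, elapsed)                  # a few matrix products, no iterations
```

All the computational cost sits in the offline training stage; the field-work stage reduces to the handful of matrix products timed here.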

Discussion and conclusion
In this study, we implemented a 2D inversion strategy for MT sounding data based on the DBN, a DL algorithm. By verifying its inversion performance on synthetic and field data and comparing it with the classical linear iterative inversion method (LSR), we draw the following conclusions.

Figure 13. Inversion results for the TM-mode field data. The results of the proposed DBN and LSR are shown on the left and on the right, respectively.

Figure 14. Inversion results of the proposed DBN and LSR at the two boreholes. The left panel shows the resistivity curves of ZK1 and the right panel those of ZK2.

First, the proposed DBN is able to establish the nonlinear mapping relationship between the apparent resistivity and the corresponding geoelectric model and to achieve a good inversion response. Although training a DNN can be time-consuming, this process is completed in advance, so the actual inversion is orders of magnitude faster than conventional linear iterative inversion methods. Second, cluster analysis based on the k-means++ algorithm can provide prior information to guide the construction of the sample dataset, and the successful inversion of the field data confirms its effectiveness. This is favourable for applying NNs to invert field data, especially when prior information (e.g. physical properties of rock and ore samples) is lacking; cluster analysis also shows promise for automating the construction of sample datasets in future studies. Third, the introduced SMLR improves the convergence performance of the RBM, and the proposed DSoft activation function consistently outperforms other commonly used activation functions; both improvements enhance the global optimisation ability of the DBN in MT inversion. Fourth, compared with the classical LSR method, the proposed DBN method performs well in delineating the shape, extent and boundary of anomalous bodies and in suppressing false targets. Meanwhile, the proposed DBN method performs better on models from the test set than on the extended models, which suggests that enriching the synthetic geoelectric models by reasonably increasing the number of anomalous bodies may further improve the inversion performance of DNNs. Overall, the proposed DBN method not only yields precise inversion results but is also extremely efficient, indicating that inversion based on DL methods has promising potential.

Nevertheless, inversion based on DL methods still faces challenges. First, DNNs realise inversion via learning, which requires a considerable amount of synthetic data.
Constructing such a dataset can be completed within an acceptable amount of time for 1D and 2D problems, but it may be excessively time-consuming and demanding of computational resources for three-dimensional (3D) problems; applying DNNs to 3D problems therefore requires further research. Second, the input and output dimensionalities of existing DNNs are fixed, so when the dimensionality of the measurements changes, the topology of the DNN must be redesigned. Therefore, although this work shows the promising potential of DNNs in MT inversion, extensive work remains. The above-mentioned limitations of DNNs in MT inversion will be the focus of our future research.
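The k-means++ step credited above for supplying prior information is standard; its defining idea is to seed cluster centres with probability proportional to the squared distance from the nearest centre already chosen. The plain-NumPy sketch below illustrates that seeding and a basic Lloyd iteration; the two-cluster toy data (log-resistivity values) are an assumption for illustration, not the paper's field data or preprocessing.

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance to the nearest chosen centre."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)

def kmeans(X, k, iters=50, seed=0):
    """k-means++ seeding followed by standard Lloyd iterations."""
    rng = np.random.default_rng(seed)
    centres = kmeanspp_init(X, k, rng)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centres, labels

# Illustrative stand-in for log10 apparent-resistivity samples: two groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 0.1, (50, 2)),   # ~10 ohm-m group
               rng.normal(3.0, 0.1, (50, 2))])  # ~1000 ohm-m group
centres, labels = kmeans(X, k=2)
print(np.sort(centres[:, 0]))                   # cluster centres in log10 space
```

The recovered cluster centres then play the role of prior resistivity levels around which synthetic geoelectric models can be generated.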