Classification of tectonic and non-tectonic seismicity based on convolutional neural network

In this paper, convolutional neural networks (CNNs) were used to distinguish between tectonic and non-tectonic seismicity. The proposed CNNs consisted of seven convolutional layers with small kernels and one fully connected layer, which only relied on the acoustic waveform without extracting features manually. For a single station, the accuracy of the model was 0.90, and the event accuracy could reach 0.93. The proposed model was tested using data from January 2019 to August 2019 in China. The event accuracy could reach 0.92, showing that the proposed model could distinguish between tectonic and non-tectonic seismicity.

In the process of seismic monitoring, we also record non-tectonic seismic signals, such as injection-induced earthquakes, landslides, mine seismicity (Wong et al. 1989; Malamud et al. 2004; Long & Ruan 2017) and other non-tectonic sources of seismic emissions. Distinguishing tectonic and non-tectonic seismicity is an important challenge relevant to both seismology and hazard mitigation. For example, by analysing only seismic events, we can investigate the focal mechanisms more accurately, which is of great significance for predicting the distribution of strong aftershocks and studying the stress distribution of structural zones. By identifying non-tectonic events more reliably and accurately, we can detect illegal mining, induced seismicity, blasting and nuclear explosions (Thandu et al. 2015; Yang et al. 2015). If seismic monitoring can quickly and accurately obtain the properties of the acoustic wave, rescue operations can be launched as soon as possible, thereby greatly reducing casualties and property losses.
Although the statistical characteristics of tectonic and non-tectonic events are different, distinguishing between the two types of events accurately is still very difficult (Long & Ruan 2017). The classification of the two types is generally performed manually or in a semi-automated fashion.
The manual approach depends on staff analysis of a large number of seismic records and is extremely challenging or even impossible to complete in real time. The semi-automated method is based on feature extraction and machine learning. Many effective acoustic wave features, such as focal depth, energy ratio, transient spectrum, waveform complexity, and frequency spectrum ratio (Shen & Zheng 1999; Lu & Huang 2010; Ren et al. 2019, 2020), have been proposed to differentiate between tectonic and non-tectonic sources. Recently, machine learning has been widely used in pattern recognition (Raducanu et al. 2010), signal processing, prediction, and evaluation. Machine learning methods have also been used to analyse acoustic waves (Bergen et al. 2019).
For seismic detection, neural network logistic regression and the convolutional neural network (CNN) have been used to detect earthquakes (Mousavi et al. 2016; Wu et al. 2019). For acoustic wave phase picking, PhaseNet has used three-component seismic waveforms as input and generated probability distributions of P arrivals, S arrivals and noise as output (Zhu & Beroza 2019). For distinguishing the properties of seismic events, a CNN has been used to accurately discriminate low-magnitude earthquakes from low-yield anthropogenic sources using spectrograms (Tibi et al. 2019). Feature extraction methods based on empirical mode decomposition and diffusion mapping have been proposed to identify earthquakes and blasts (Bi et al. 2012; Bregman et al. 2020). Both methods are limited as they only focus on the classification between tectonic and blasting energy sources.
In this paper, we propose a binary CNN model to distinguish between tectonic and non-tectonic seismicity using raw waveforms as the input. First, the fixed-length waveform was intercepted according to the onset of the P wave recorded in the official catalogue. Then, we trained the model, which included seven convolutional layers, with velocity data provided by the China Earthquake Network Center (CENC). The Adam optimization algorithm was used to accelerate the training process and improve the performance of the classifier (Kingma & Ba 2014). The trained model could then extract features, which can be used to classify other acoustic waves accurately. For a single station, the accuracy was 0.90. We considered the results of multiple stations comprehensively for a seismic event, for which the event accuracy was 0.93. Furthermore, we tested the model using events that occurred between January 2019 and August 2019, and the test event accuracy reached 0.92. The model we propose could distinguish between tectonic and non-tectonic seismicity automatically and accurately.

DATA
The event waveforms used in this paper were provided by CENC. The distribution of stations is shown in Fig. 1. The information for the seismic events was selected from the official earthquake catalogue. Waveforms were sampled at 100 Hz on three channels corresponding to the three spatial dimensions, namely: BHE (oriented west-east), BHN (oriented north-south) and BHZ (oriented vertically).
Events that occurred between January 2009 and August 2018, with magnitudes ranging between M_L 2.5 and M_L 4.7, were taken into consideration. A total of 3799 tectonic earthquake events and 1043 non-tectonic seismic events, which included ep (blasts), sp (suspected blasts), ss (collapses), se (landslides) and ot (uncertain events), in China (except Yunnan, Sichuan, and Inner Mongolia) were used in our analysis.

Data preparation
Records have overlaps and gaps, caused by the instability of power supply, network delay, instrument failure and other factors; therefore, the recorded waveforms must be prepared for analysis and processing.
The merge method provided by ObsPy (Breckpot & Marzec 2010), which is an open-source project for processing seismological data, was used to remove the overlap in the waveform. The records with gaps were removed.
Pre-processed waveforms recorded at near stations, for which the epicentral distance was less than 200 km, were used to train and test the CNN. For non-tectonic seismicity, the short-period surface wave was relatively obvious at the near stations. The surface wave of tectonic seismicity was usually not obvious at the near stations. The details of the data set used for training and testing the model are shown in Table 1.

Confirming the onset of P wave
The onset of the P wave was recorded in the CENC catalogue, but in the actual scenario we did not know the onset of the P wave before the catalogue was compiled. Therefore, we used the onset recorded in the catalogue to train the model and the onset determined by the short time average/long time average (STA/LTA; Stevenson 1976) and Akaike information criterion (AIC; St-Onge 2010) algorithms to test the model. STA/LTA calculates the energy ratio of the short-term window and the long-term window to determine the interval that contains the onset of the P wave. The AIC function acts on the interval confirmed by STA/LTA to determine the accurate onset of the P wave. In the definition of the AIC detector, M is the start time of the short-term window, L is the end time of the short-term window, x[] represents the amplitude of the signal, and k lies between M and L. The k that minimizes the AIC function is the accurate onset of the P wave.
The process for determining the onset of the P wave with the STA/LTA and AIC algorithms is shown in Fig. 2. The two solid lines represent the interval determined by STA/LTA. The dashed line represents the onset of the P wave, which was determined by the AIC function.
The arrival times of the three-component waveforms may differ slightly. To identify the arrival times of the three components consistently, the P-wave arrival of each component was determined independently.
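The two-stage picking described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the AIC expression used here is the common variance-based form, AIC(k) = k log var(x[1..k]) + (n - k) log var(x[k+1..n]), which is an assumption because the equation itself is not reproduced in this text. The window lengths, the trigger threshold and the `margin` around the trigger are likewise hypothetical parameters.

```python
import numpy as np

def sta_lta(x, n_sta, n_lta):
    """Energy ratio of a short trailing window to a long trailing window."""
    energy = np.asarray(x, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(len(energy))
    end = np.arange(n_lta, len(energy) + 1)        # exclusive window end
    sta = (csum[end] - csum[end - n_sta]) / n_sta
    lta = (csum[end] - csum[end - n_lta]) / n_lta
    ratio[end - 1] = sta / np.maximum(lta, 1e-12)  # zero until the LTA window is full
    return ratio

def aic_pick(x, start, end):
    """Index k in [start, end) that minimises the variance-based AIC (assumed form)."""
    seg = np.asarray(x[start:end], dtype=float)
    n = len(seg)
    best_k, best_val = 2, np.inf
    for k in range(2, n - 2):
        val = (k * np.log(np.var(seg[:k]) + 1e-12)
               + (n - k) * np.log(np.var(seg[k:]) + 1e-12))
        if val < best_val:
            best_k, best_val = k, val
    return start + best_k

def pick_p_onset(x, n_sta, n_lta, threshold, margin):
    """STA/LTA finds a coarse interval; AIC refines the onset inside it."""
    ratio = sta_lta(x, n_sta, n_lta)
    above = np.flatnonzero(ratio > threshold)
    if above.size == 0:
        return None                                # no trigger found
    trigger = int(above[0])
    start = max(trigger - margin, 0)
    stop = min(trigger + margin, len(x))
    return aic_pick(x, start, stop)
```

Applied independently to the BHE, BHN and BHZ components, `pick_p_onset` would give one onset per component, in line with the per-component picking described in the text.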

Identification and windowing of waveform
Owing to the differences in epicentral distance, the length of the valid waveform of an event was not consistent across stations and could vary between tens of seconds and several minutes. Although the length of most valid waveforms was about 80 s (Liu et al. 2003), waveforms from landslides may last a long time. The waveform of the landslide recorded at the Maoxian station, which lasted nearly 100 s, is shown in Fig. 3. Thus, we selected a 100 s waveform after the onset of the P wave to maintain the completeness of the waveform.
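At the 100 Hz sampling rate stated earlier, a 100 s window corresponds to 10 000 samples per component. The following sketch shows the cut for one component. The function name `cut_window` and the zero-padding of records that end early are assumptions made for illustration; the text does not say how short records were handled.

```python
import numpy as np

SAMPLING_RATE_HZ = 100   # from the text
WINDOW_SECONDS = 100     # from the text

def cut_window(component, onset_index,
               rate=SAMPLING_RATE_HZ, seconds=WINDOW_SECONDS):
    """Return the fixed-length window that starts at the P-wave onset."""
    component = np.asarray(component, dtype=float)
    n = rate * seconds
    window = component[onset_index:onset_index + n]
    if len(window) < n:
        # Assumption: pad with zeros when the record ends before 100 s.
        window = np.pad(window, (0, n - len(window)))
    return window
```

Stacking the three windows, for example with `np.stack`, would give an array of shape (3, 10000) for one station record.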

Partitioning of the data set
In our analysis, 19 788 waveform records (including 14 176 waveforms belonging to 3267 tectonic seismic events and 5612 waveforms belonging to 653 non-tectonic events) were selected randomly as the training set. In order to better evaluate the statistical significance of the model performance, the number of tectonic records was nearly equal to the number of non-tectonic ones in the test set. We chose 532 tectonic seismic events and 390 non-tectonic seismic events, with a total of 4493 records, as the test set; none of these were in the training set. The partition of the data set is shown in Table 2, and the geographic distribution of the data set is shown in Fig. 4.

Data augmentation for the records of non-tectonic seismic event
Generally, supervised deep learning classifiers need to be trained with abundant data to prevent overfitting (Shorten & Khoshgoftaar 2019). Because the number of non-tectonic seismic records was smaller than the number of tectonic ones, data augmentation was performed to construct a suitable training data set. We selected the waveforms with a high signal-to-noise ratio (SNR) and injected the noise recorded by the same station. The SNR was defined as the ratio of the sum of the absolute amplitudes in the first 5 s after the onset of the P wave to the sum of the absolute amplitudes from 5 to 10 s before the onset of the P wave. The SNR of the three-component wave was defined as the smallest SNR of the three components. To maintain the waveform characteristics, we only repeatedly added the data from 50-100 s before the onset of the P wave as the noise data to waveforms whose SNR was greater than 3. The number of original non-tectonic records was 5612. The number of waveforms to which noise could be added was 852. The multiples of noise data added to waveforms were three and five. After adding noise, 7316 non-tectonic records were used to train the model.
We used the CENC onset to train the model and the onset determined by the STA/LTA and AIC algorithms to test the model. There was an error of about one second between the CENC onset and the determined onset. Compared with the tectonic records, the smaller number of non-tectonic records may not represent the characteristics of the waveform well. In order to eliminate the influence of the P-wave onset error, a 1 s forward time-shift was applied to the non-tectonic records. After the time-shift, 14 632 non-tectonic records could be used to train the model.
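The SNR definition and the noise injection can be sketched as follows. Two points are assumptions rather than statements from the text: the "multiples" of three and five are read as scale factors applied to the pre-event noise, and the 50 s noise segment is tiled to cover the 100 s window. The function names are hypothetical, and the record counts at the end only check that the reported numbers are mutually consistent.

```python
import numpy as np

RATE = 100  # samples per second, from the text

def snr(component, onset, rate=RATE):
    """Sum of |amplitude| in the 5 s after onset over that from 10 to 5 s before."""
    signal = np.abs(component[onset:onset + 5 * rate]).sum()
    noise = np.abs(component[onset - 10 * rate:onset - 5 * rate]).sum()
    return signal / noise

def three_component_snr(record, onsets):
    """Smallest SNR of the three components, each with its own onset."""
    return min(snr(c, k) for c, k in zip(record, onsets))

def augment(component, onset, multiple, rate=RATE, seconds=100):
    """Add scaled pre-event noise (50-100 s before onset) to the event window."""
    noise = component[onset - 100 * rate:onset - 50 * rate]
    window = component[onset:onset + seconds * rate]
    repeats = int(np.ceil(len(window) / len(noise)))
    tiled = np.tile(noise, repeats)[:len(window)]   # assumption: tile to 100 s
    return window + multiple * tiled

# Consistency of the reported record counts.
n_original, n_high_snr = 5612, 852
n_after_noise = n_original + 2 * n_high_snr   # two noise multiples: 3 and 5
n_after_shift = 2 * n_after_noise             # one shifted copy per record
```

With these figures, `n_after_noise` is 7316 and `n_after_shift` is 14 632, matching the totals quoted in the text.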

Detrending and normalization
Because waveforms may deviate from the proper baseline, the seismic records contained some long-period components. A tenth-degree polynomial fit was applied to the waveforms to remove this variation. The waveform before and after detrending is shown in Fig. 5. After detrending, the waveform was normalized as follows:

$$X' = \frac{X - \mu}{\sigma},$$

where μ and σ are the mean and standard deviation of the waveform X, respectively.
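A minimal sketch of this step is given below, assuming the polynomial is fitted by least squares and then subtracted, and that the normalization is the usual zero-mean, unit-variance scaling implied by μ and σ. `Polynomial.fit` is used because it rescales the time axis internally, which keeps a degree-10 fit over 10 000 samples numerically stable. The function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def detrend_and_normalize(x, degree=10):
    """Remove a polynomial trend, then scale to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    trend = Polynomial.fit(t, x, degree)(t)   # least-squares polynomial trend
    residual = x - trend
    return (residual - residual.mean()) / residual.std()
```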

Generating data labels
The input training data set included records of tectonic and non-tectonic seismicity. The corresponding attribute labels of [1, 0], for a tectonic event, or [0, 1], for a non-tectonic event, were associated with our catalogue. The data processing workflow is shown in Fig. 6.

METHODS

CNN model
We used the three-component waveform as input to train the CNN model (Perol et al. 2018). To build and train the model, we used TensorFlow (Abadi et al. 2016), an end-to-end open-source machine learning platform (https://www.tensorflow.org), and a GTX1650 graphics processing unit (GPU) with a frequency of 8 GHz. We also considered many other models and found that the performance was excellent with both shallower and deeper models; however, the proposed network performed best among all of them. Fig. 7 shows the structure of the CNN model, which included seven convolution layers, three max-pooling layers and one fully connected layer.
As shown in Fig. 7, 32 * [1, 5, 3] in a convolution layer denotes the use of 32 convolution kernels of size 1 * 5. The padding of all convolution layers was 'VALID'. Each of the seven convolution layers was followed by a ReLU activation function. The convolution layers and max-pooling layers used a stride of 2. The number of neurons in the fully connected layer was 128, and dropout of 65 per cent was applied after the fully connected layer. The result of the Softmax function from the output layer was the probability of the input wave belonging to either tectonic or non-tectonic seismicity. A binary cross-entropy loss function (Bengio 2009) and the Adam optimization algorithm with an initial learning rate of 10^-4 were used to train the model with a batch size of 128 samples (64 tectonic samples and 64 non-tectonic samples) selected randomly. The model was trained for 4500 iterations.
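The output and loss computation can be illustrated without a deep learning framework. The sketch below is not the authors' TensorFlow code: it shows a two-class Softmax, the cross-entropy against one-hot labels ([1, 0] for tectonic, [0, 1] for non-tectonic), and one way to draw a balanced batch of 64 + 64 records. Sampling without replacement within a batch and all function names are assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise Softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, onehot):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    return float(-(onehot * np.log(probs + 1e-12)).sum(axis=1).mean())

def balanced_batch(tectonic_idx, non_tectonic_idx, rng, per_class=64):
    """Draw per_class record indices from each class and build one-hot labels."""
    a = rng.choice(tectonic_idx, per_class, replace=False)
    b = rng.choice(non_tectonic_idx, per_class, replace=False)
    labels = np.zeros((2 * per_class, 2))
    labels[:per_class, 0] = 1.0    # [1, 0]: tectonic
    labels[per_class:, 1] = 1.0    # [0, 1]: non-tectonic
    return np.concatenate([a, b]), labels
```

For two mutually exclusive classes, this Softmax cross-entropy is equivalent to the binary cross-entropy named in the text.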

SIMULATION RESULTS

Performance measurements
This section describes the tests conducted to quantify the performance of the CNN model using precision, recall, F1-score (Lo et al. 2010) and accuracy, which are defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN},$$

where TP represents the number of true positive samples (positive samples for which the result of the model is also positive), FP represents the number of false positive samples (negative samples for which the result of the model is positive), FN represents the number of false negative samples (positive samples for which the result of the model is negative), and TN represents the number of true negative samples (negative samples for which the result of the model is also negative) (Zhang et al. 2019).
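The four measures follow directly from the confusion counts TP, FP, FN and TN. The helper below uses the standard definitions of precision, recall, F1-score and accuracy; the function name and the example counts are hypothetical and are not taken from the paper's tables.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts, for illustration only.
example = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

Treating each class in turn as the positive class gives the per-class precision and recall.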

Test of single station records
Records of a single station were used to train and test the CNN model. Fig. 8 shows the variation of training and validation accuracy and loss with iteration.
The test results and the performance measures of the model are given in Tables 3 and 4. From Table 4, the precision and recall of tectonic and non-tectonic seismicity were both above 0.90 and the accuracy was 0.90. Therefore, our trained CNN model could discriminate tectonic and non-tectonic seismic events reliably.

Test of multiple stations records from one event
Because the waveforms of a seismic event are recorded by several monitoring stations, we can distinguish tectonic and non-tectonic seismicity by simultaneously considering the records of the same event from several different stations in order to get a more accurate result. We used the event accuracy, which was the ratio of the number of correctly judged results to the total number of results, to evaluate the performance of the CNN model.
Following the majority voting rule, the event type was determined from the results of all stations with an epicentral distance of less than 200 km. For an event, if the number of records classified as tectonic differed from the number classified as non-tectonic, the result of the majority of records was taken as the final label of the event. If the two numbers were equal, we conducted three tests to determine the event type, in which the result of the nearest station, the median station or the farthest station, respectively, was taken as the label of the event. We tested with the CNN model that had the highest accuracy in the training process. The results of the three tests are shown in Table 5.
From Table 5, the highest event accuracy was obtained by using the criterion of the nearest station. Because the signal recorded by stations located far from the epicentre may have a low SNR, we then tried using the results of several of the nearest stations to determine the event type more reliably, and considered the criterion of the three nearest stations. The event accuracy was 0.93, which was 0.01 higher than that obtained using the nearest station.
Tables 6 and 7 show the test results of the nearest station criterion and the three nearest stations criterion with the model that had the highest test accuracy. All indicators in these tables are labelled with 'event'.
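The voting rules can be written compactly. The sketch below is one reading of the procedure, not the authors' code: the three nearest stations criterion is taken to be a majority vote over the three closest records, and the "median station" is taken to be the middle entry of the distance-sorted list (index `len // 2`), which is an assumption because ties only occur for an even number of records. The function name and the example distances are hypothetical.

```python
from collections import Counter

def classify_event(records, tie_break="nearest", n_nearest=None):
    """Majority vote over station results.

    records: list of (epicentral_distance_km, label) pairs, where label is
    "tectonic" or "non-tectonic".
    """
    ordered = sorted(records)                 # nearest station first
    if n_nearest is not None:
        ordered = ordered[:n_nearest]         # e.g. the three nearest stations
    counts = Counter(label for _, label in ordered)
    n_tec, n_non = counts["tectonic"], counts["non-tectonic"]
    if n_tec != n_non:
        return "tectonic" if n_tec > n_non else "non-tectonic"
    # Tie: fall back to a single station chosen by distance rank.
    rank = {"nearest": 0, "median": len(ordered) // 2, "farthest": -1}[tie_break]
    return ordered[rank][1]

# Hypothetical station results for one event.
votes = [(10, "non-tectonic"), (50, "tectonic"),
         (120, "tectonic"), (180, "non-tectonic")]
label_nearest = classify_event(votes, tie_break="nearest")
label_three = classify_event(votes, n_nearest=3)
```

In this made-up example the four records tie, so the nearest-station rule follows the closest station, whereas the three nearest stations vote two to one.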

INDEPENDENT TEST
To measure the generalization performance of the model, we carried out an independent test. The independent test set consisted of 23 non-tectonic events from January 2019 to August 2019 and 109 tectonic events from January 2019. The details of the independent test set and the results are shown in Table 8.
From Table 8, the average event accuracy was 0.92, which showed that the model could effectively detect the property of seismic events. The classifications of tectonic and non-tectonic seismic records are shown in Fig. 9.
From Fig. 9, waveforms that deviated from the baseline could also be classified accurately. Our model could therefore be applicable to stations whose instruments are not aligned horizontally.

Comparison to semi-automated methods
To evaluate the performance of our model, we compared it with some semi-automated methods. As shown in Table 9, our model achieved the best performance. Its event accuracy was 0.16 and 0.28 higher than those of Bagging and SVM, respectively.

CONCLUSIONS
We have developed a CNN classification approach based upon the waveforms recorded by stations. We used fixed-length waveform windows to train and test the CNN model to distinguish tectonic and non-tectonic seismic events accurately.
The model proposed in this paper could label features of two types of seismic events (tectonic and non-tectonic) and process three-component seismic records automatically. For a single station, the classification accuracy was 0.90. For an event, the event accuracy was 0.93. Furthermore, we used data from January 2019 to August 2019 to test the model, and the test event accuracy reached 0.92. The results showed that the model could accurately distinguish the properties of seismic events in China. This study compared our model with two semi-automated methods (Bagging and SVM). The results showed the superior performance of our model, which only relies on the acoustic wave without extracting features manually.
The proposed model has been deployed in CENC as one of the quick identification auxiliaries for the seismic network staff. In the future, we shall direct our research efforts towards training different models according to different regions.

Figure 2. Process for determining the onset of the P wave.

Figure 3. The waveform recorded by the Maoxian station from the landslide in Maoxian, Sichuan, 2017 July 24.

Figure 4. Geographical location distribution of seismic events in the data set. (a) Distribution of tectonic events in the training set. (b) Distribution of non-tectonic events in the training set. (c) Distribution of tectonic events in the test set. (d) Distribution of non-tectonic events in the test set.

Figure 5. The waveform before and after detrending.
Precision reveals the rate of false detection, whereas recall reveals the rate of missed detection. The F1-score represents the comprehensive performance of precision and recall. Accuracy measures the agreement between the ground truth and the result of the model.

Figure 8. Training, validation accuracy and loss learning curves.

Table 1. Data set for training and testing the binary CNN.

Table 2. Partition of the data set.

Table 3. Test of single station records.

Table 4. Measures values of single station records.

Table 5. Criteria of distinguishing event property.

Table 6. Test result of nearest station criterion.

Table 7. Test result of three nearest stations' criterion.

Table 8. Independent test data and results.

Table 9. Results of our model and semi-automated methods.