Processing automatic seismic event detections: an iterative sorting algorithm improving earthquake hypocentres using interevent cross-correlation

SUMMARY We present an iterative classiﬁcation scheme using interevent cross-correlation to update an existing earthquake catalogue with similar events from a list of automatic seismic event detections. The algorithm automatically produces catalogue quality events, with improved hypocentres and reliable P- and S -arrival time information. Detected events are classiﬁed into four event categories with the purpose of using the top category, with the highest assessed event quality and highest true-to-false ratio, directly for local earthquake tomography without additional manual analysis. The remaining categories have varying proportions of lower quality events, quality being deﬁned primarily by the number of observed phase onsets, and can be viewed as different priority groups for manual inspection to reduce the time spent by a seismic analyst. A list of 3348 event detections from the geothermally active volcanic region around Hengill, southwest Iceland, produced by our migration and stack detector (Wagner et al. 2017), was processed using a reference catalogue of 1108 manually picked events from the same area. P - and S -phase onset times were automatically determined for the detected events using correlation time lags with respect to manually picked phase arrivals from different multiple reference events at the same station. A signiﬁcant improvement of the initial hypocentre estimates was achieved after relocating the detected events using the computed phase onset times. The differential time data set resulting from the correlation was successfully used for a double-difference relocation of the ﬁnal updated catalogue. The routine can potentially be implemented in real-time seismic monitoring environments in combination with a variety of seismic event/phase detectors.


I N T RO D U C T I O N
In times of vast seismic networks and rapidly growing data storage capacity, the need for automatic analysis of seismic data becomes increasingly important. A variety of algorithms have been presented to automate the detection of earthquakes, by detecting either associated seismic phases (e.g. Allen 1982;Bödvarsson et al. 1999), or, more recently, aggregated (stacked) seismic trace attributes (e.g. Drew et al. 2013). Recent efforts to improve automatic picking accuracy (e.g. Küperkoch et al. 2010;Baillard et al. 2014) and the use of hand-picked data to calibrate weighting schemes (Di Stefano et al. 2006) intend to mimic picking by an expert seismologist and provide useful quality measures for the automatically picked onsets.
In a previous study (Wagner et al. 2017), we investigated the benefits of back-propagating and stacking different trace attributes, such as, for example, the ratio of a short-term average over a longterm average (STA/LTA) or the Kurtosis function to detect and locate seismic events without explicitly detecting phase arrivals (see also e.g. Langet et al. 2014;Pesicek et al. 2014;Grigoli et al. 2016). This type of seismic migration-based detection (MBD) uses an a priori velocity model to combine information from phase arrivals, which is the chosen trace attribute, at different observation locations. Compared to conventional phase-detection algorithms, advantages include a lower magnitude of completeness, that is, the earthquake magnitude above which the catalogue is considered to be essentially complete, and an improved true-to-false ratio of detected events (Wagner et al. 2017). However, an MBD on its own does not provide arrival time measurements because the method does not rely on phase picks to detect events. The stacked attribute parameter of an event (sum of attribute traces at the detected event hypocentre) can indicate event signal-to-noise ratio (SNR), depending on the attribute that is being stacked, and is often the intuitive choice for a preliminary quality assessment of detections. However, it can be difficult, and to some extent arbitrary, to determine an Processing automatic seismic event detections 1269 adequate threshold value-for example, by testing different thresholds (Wagner et al. 2017)-that maintains an acceptable true-tofalse ratio without discarding too many good detections. Furthermore, due to the grid-search nature of MBD, the accuracy of estimated hypocentres is strongly influenced by the grid sampling and the velocity model used. For the events to be used in, for example, traveltime tomography, manual examination including phase picking is required.
In regions with repetitive seismic activity, for example, fault zones or areas with anthropogenic seismicity, events can often be grouped into families (clusters) based on the similarity of their observed waveforms. Similar to interstation cross-correlation (e.g. VanDecar & Crosson 1990;Cansi 1995), interevent crosscorrelation (e.g. Poupinet et al. 1984;Schaff & Waldhauser 2005) can be used to measure the similarity of and time delay between two same-phase signals from different earthquakes observed at the same station.
In this study, we use interevent cross-correlation (see Section 2.1) in an iterative scheme to scan a list of MBD automatic event detections (referred to as 'DET') for events that are similar to a set of reference events (referred to as 'REF') from an existing earthquake catalogue. While the use of differential traveltimes by waveform cross-correlation for relocating new events relative to previously known events has been investigated in previous studies (e.g. Lin et al. 2007;Waldhauser & Schaff 2008;Waldhauser 2009), these usually require knowledge about the approximate location of the new events and/or onsets of observed phase arrivals. In our case, the events to be processed (DET) consist of detections, which have not yet undergone quality control or phase analysis-manual or automatic. Whether a detection is a real event or not is unknown to begin with and only crude location estimates with theoretical phase arrival times exist. Faced with this situation, the presented algorithm does not only measure accurate differential traveltimes by cross-correlation, but also estimates absolute P-and S-phase onset times at each station for the detected events relative to manual picks of well-correlated reference events (Section 2.2) for an improved absolute hypocentre location. In a second step, the detected events are divided into four categories (Section 2.3) based on (i) their similarity to the reference events, (ii) the assessed quality (see next) of the newly determined phase arrival times and (iii) how reliable the new hypocentres are when relocated in the previously determined velocity model using the estimated P-and S-arrival times. The different event categories provide an indication of which events to prioritize for manual inspection, with the aim of using the events in the highest quality category directly for local earthquake tomography without manual analysis. Since the primary selection criteria is the similarity to existing reference events, the algorithm naturally prioritizes detected events with similar source mechanisms and originating from roughly the same source region as the reference events. Depending on the investigated area and local seismicity, it is possible to adjust the here established selection criteria to include events of a lower degree of similarity at the expense of hypocentre accuracy. In Section 4, we discuss how this can be used to potentially identify previously unknown types of events, that is, with different source mechanism and/or from outside known seismicity clusters.

M E T H O D O L O G Y
Prior to cross-correlation, the three-component seismic data (see Section 3.1) were filtered with a 10 per cent cosine tapered bandpass filter. After experiments with different frequency bands, a passband from 2-22 Hz was chosen as it produced consistent results for a variety of events. The low cut-off frequency (2 Hz) removes microseismic noise, while all dominant frequencies of observed S-phases are passed. Similarly, the high cut-off frequency (22 Hz) is the lowest possible cut-off to remove high frequency noise but keep all dominant frequencies of observed P phases. For each event (REF and DET), the filtered East and North component seismograms are rotated to radial and transverse components assuming a straight ray path from the event epicentre estimate to each station. The geometry (source locations, velocity structure) in our chosen study area means that most arrivals can be expected to be travelling relatively close to the vertical on arrival at the surface. Therefore, we only use the filtered vertical component for P-wave analysis. For S-waves, both horizontal components (filtered and rotated) were correlated individually and the correlation with the highest value was chosen.

Cross-correlation
Cross-correlation can be used to measure the similarity of two timeseries for different time-shifts or lags between them. Here, we use the concept to determine the time-shifts (differential or relative time) at which the P-or S-phase signals from two earthquakes recorded at the same station are most similar, that is, at what lag their correlation is highest (cf. Schaff & Waldhauser 2005). Because we seek to correlate time segments with relatively short transient signals, that is, the signal length is usually much smaller than the correlation window, we use time domain instead of frequency domain correlation. Other advantages of time domain correlation include, for example, a higher sensitivity at large time lags as shown by Schaff et al. (2004).
The signals from event pairs were cross-correlated station by station. In order to create a complete differential time data set, all REF-REF event pairs were correlated prior to starting the REF-DET correlation analysis.
A time segment (correlation window) starting at the P or S onset T ref for a given REF event was used as a reference signal. For the correlated (searched) REF or DET event, the chosen time segment was centred on the corresponding phase onset T with a length of 2 τ max (see Table 1). For DET events, theoretical P-and S-arrival times were determined by a 3-D velocity model and the source origin time and location. For REF events, manual picks were used at stations where available, and theoretical times otherwise. The shorter reference segment (x) was run along the long segment (y), calculating zero-lag cross-correlation on each step. We used the zero-mean normalized cross-correlation c (signed coherence) as similarity measure, which is defined for time lags τ as with the variances σ 2 x and σ 2 y , and the means μ x and μ y of the timeseries x and y, respectively, within their corresponding correlation windows of length N points. Normalization by the product of variances assures that c is bound between [− 1; 1] and comparable for different pairs of x and y. We refer to the maximum (positive) value of c(τ ) as the correlation coefficient (CC) and the corresponding time lag τ cc , summed with the absolute time difference between the correlation windows, defines the differential time between the two signals. Note that the choice of using the signed coherence implies that signals of opposite polarity will have negative correlation and, therefore, only same-polarity signals, originating from sources with similar radiation patterns, can be detected by this method. The CC 1270 F. Wagner et al. was estimated at subsample precision using quadratic interpolation. As the highest signal frequency used is less than one quarter of the sampling rate, three-point interpolation was adequate.
Due to imperfect information about velocity structure, and uncertainties in estimated source times and locations for the DET events, large time lags may occur between x and y. We limited the maximum searched time lag τ max to the smallest observed (picked) P-to-S time, 0.6 s, to avoid correlation between P and S phases. Ideally, the correlation window should have the same length as the target phase signal (see Section 4.2). In practice, however, this is hard to achieve considering that signal lengths (and signature) vary for different source mechanisms and travelled paths. After experimentation with various possibilities for defining window lengths, we decided to use one single length for all phases identified as P (0.5 s), and a different length for S (1 s).

Estimation of onset times for detected events
For the purpose of using the detected events for local earthquake tomography, we estimated absolute P-and S-phase onset times (not generated by the MBD method). For a given DET event, the P (and S) onset times were estimated at each station as where the summation is over the i = 1..N, the number of REF signals that were well correlated (CC > 0.8) with the DET signal at the given station. The weights w i are defined as the inverse of (1.01 − CC i ), which was found to be an adequate proxy for the inverse of the time-lag uncertainty observed while comparing measured (correlated) time lags with differences in manually picked times. A similar relation was observed by Sgattoni et al. (2016) in synthetic correlation tests with seismic data from Katla volcano, South Iceland. Note that the high weighting factor (approx. 20), between CC = 1 and CC = 0.8, strongly favours time lags with a high CC in the weighted average. The weighting, however, was deemed appropriate after comparing estimated and picked onset times for several events (see example in Fig. 1). We use the standard deviation of the weighted time lags (requires at least two samples) as a quality measure for the computed P and S onset times. After comparing computed onset times with manual picks of the same phases for 371 events, we define P-and S-arrival time estimates as high quality if the standard deviation is below 0.06 s. Only high-quality arrival times were used in the subsequent analysis (Section 2.3). Fig. 2 shows histograms of the P and S residuals of high-quality phase onset estimates w.r.t. the manual picks. The mean P-and S-residual time was −0.007 and 0.01 s, respectively. The root-mean-squared (RMS) residual fit was 0.085 and 0.12 s for P and S, respectively.

Event relocation and classification
After the REF-DET correlations have been computed and the automatic P-and S-phase arrival times have been determined, a series of selection criteria is applied to the DET events. The following criteria and thresholds are steered towards a selection of high-quality events, intended for direct use in, for example, seismic tomography without further manual inspection. If, on the other hand, the focus lies more on counting events and less on event quality, for example, for seismicity studies, the parameters can be adjusted to select more events with, for example, less observed phases (see Section 3.2).
First, the similarity to existing REF events was evaluated. We define that a DET event is sufficiently similar to an REF event if it shows at least four 'good' phase correlations, that is, above a chosen CC threshold (0.8). The choice of a CC threshold to determine if a correlation measurement is useful or not is related to the choice of correlation window length. In general, the CC of two time-series including a coherent signal of duration shorter than the total length of the correlation window and (uncorrelated) noise decreases as the correlation window becomes longer because the ratio of correlated signal to uncorrelated noise energy within the window decreases.
The CC-threshold value should thus be chosen in accordance with the correlation window length used, expected signal length and the purpose it is used for (see Section 4.2). DET events that were sufficiently similar to, that is, had a CC greater than 0.8 with at least one REF event at four or more stations, were selected for further processing. The remaining DET events were categorized as low-similarity events (category C4).
The DET events selected in the first stage were then tested for quality and quantity of available onset times. Theoretically, event location requires a minimum of four arrival times to determine the event origin in time and space. We increased the minimum requirement to seven high-quality arrival time estimates (with an error below 0.06 s, see Section 2.2) to reduce the effect of potential outliers. Events with a sufficient number of arrival times were relocated, remaining events were moved to category C3.
Relocation was performed with a global grid-search algorithm based on PStomo eq (Tryggvason et al. 2002) and using the estimated onset times. The grid-search algorithm finds the optimal hypocentre location in space by minimizing the misfit between the onset time estimates and theoretical times from traveltime fields generated for all stations using the (fixed, not inverted) 3-D P-and S-wave velocity models. The optimal origin time is then determined in accordance with the optimal location. The global search scheme reduces the risk of the location becoming stuck in a local minimum. Events that were relocated on the border of the search grid were added to the C3 event category. We note that the DET events are not located relative to the REF locations, that is the station delay times with the REF events were not used for the inversion at this stage. Although the estimated onset times were derived using relative time information, the averaging of delay times from different events for each station produced a new, independent set of absolute arrival  time data. The source-receiver geometry information is thus not constraint by relative time data but by the velocity model. After relocation, the new hypocentre location and error were evaluated in a third selection step to identify potential mislocations caused by erroneous phase onset estimates. Because the correlation of signals from two events generally decreases with event separation as source mechanisms and path effects start to differ (see e.g. Schaff et al. 2004), high coherence values are less likely at great interevent distances. In our case, a coherence above 0.8 at distances greater than 5 km was sometimes related to inappropriately aligned correlation windows (misidentified phases or poor initial hypocentre location) or phase arrivals overlapping with phases from small undetected events with similar origin time but different location from the target event. To eliminate such errors, we set a straight path maximum distance threshold of 5 km between relocated DET events and the nearest REF event. For DET events with at least one REF neighbour within the maximum distance threshold, the location error was used to indicate whether the location was deemed reliable. The location error was approximated with the square root of the trace of the hypocentre covariance matrix and the maximum error threshold was set to 2 km. DET events failing either the maximum distance or maximum location error criterion were moved to category C2, and events passing both criteria to the top category C1.
The selection procedure thus categorizes all DET events into four classes (see also Table 2): (i) C1 events having high similarity to at least one REF event with sufficient high-quality arrival time observations to be used, for example, local earthquake tomography without further manual analysis; (ii) C2 events with sufficient, but lower quality arrival time observations, that is, potential outliers, have the highest priority for a manual inspection; (iii) C3 events with insufficient phase onset information, for example, due to low event magnitude or noisy conditions, are less useful to derive velocity information from, but perhaps still interesting for seismicity studies; and (iv) C4 events have insufficient similarity to REF events and consist either of unique, unknown events (w.r.t. the existing catalogue) or simply false detections, depending on how complete the reference catalogue is with respect to the local seismicity.

Iterative procedure
Prior to the REF-DET correlation, the existing REF catalogue of manual phase picks was completed by filling in missing phase observations with onset time estimates as described in Section 2.2 but based on correlations with the other REF events. As the REF events were originally located in a 1-D velocity model, they were also relocated using the global grid search (Section 2.2). The 3-D P-and S-velocity model used in this study is the same as the one used for the MBD event detection (see Wagner et al. 2017).
After a first iteration of the entire routine described in the previous Sections-REF-DET correlation, P and S onset time estimation, event relocation and classification-the top category, C1 events were merged with the REF catalogue. The routine was then repeated  with the updated catalogue and the remaining C2-C4 events until no more events were categorized as C1. For the new REF events, only the arrival time estimates with a residual less than 0.2 s compared to theoretical arrival times (determined using the velocity model) were included in further analyses. We consider the rejection threshold of 0.2 s to be conservative, removing only spurious outliers and, therefore, not significantly biasing the earthquake data, hence the following velocity inversion, towards the original model.

Study area, seismic network and data
The study area (Fig. 3) Wagner et al. (2017). The most prominent features of the local seismicity distribution include the north-south and east-west striking lineaments in the south, interpreted as aftershock sequences and triggered seismic activity, respectively, associated with the May 2008 events (Brandsdóttir et al. 2010;Decriem et al. 2010). The activity south of station 'eng' and the much smaller cluster next to 'skd' are presumably triggered by re-injection of waste water from the geothermal production at the Hellisheidi and Nesjavellir power plants, respectively (Hardason et al. 2010). Additional clusters of seismicity covered by the eastern part of the seismic network are most likely related to geothermal activity within the Hengill fissure swarm. Estimated local magnitudes of the reference events ranged from −1 to 2. The magnitudes were calculated using a linear regression of logarithmic S-wave amplitude and epicentral distance (see Wagner et al. 2017) calibrated to the SIL magnitude scale used in the public catalogue (Jakobsdóttir 2008). A list of 3978 events was detected using the migration and stackbased event detection (MBD) algorithm from Wagner et al. (2017) with data from 48 selected days between December, 2010 and August, 2012, of which 29 d overlap with the REF catalogue. The selected days show a wide geographical distribution of previously known events as listed in the SIL catalogue. The STA/LTA trace attribute was used as detection function for the MBD and the searchgrid sampling was refined in two steps for each day. For most days, the final grid sampling was limited to 1 km to reduce runtime. For some days, a 0.5 km sampling, providing a better spatial-and temporal (cf. Wagner et al. 2017)-resolution, was found beneficial due to a higher seismicity rate. The stack detection threshold was set to 3.6 and 2.9 for the coarser and finer grid sampling, respectively, based on the assumed background noise level of the maximum attribute stack curve (Wagner et al. 2017). We removed 630 detections of events that were already in the REF catalogue, resulting in an initial list of 3348 new DET events. The DET events are distributed throughout the entire study area. Highlighting DET events with a high-attribute stack value, that is, presumably high-SNR events, delineates a seismicity pattern matching that of the REF events (Fig. 4).

Results
Prior to running the algorithm, a randomly selected subset of 177 DET events was visually examined as a reference for event quality in the different event categories. Events were judged to be 'real' events when at least four seismic signals, presumably originating from a common source, were clearly observed. We also checked whether the estimated phase onsets were misidentified (as P or S), that is, an event was assessed to be a true event at the location indicated, and potentially useful for tomographic analyses.  The first iteration of REF-DET event correlation and subsequent relocation of DET events with the selection criteria presented in Section 2.3 yielded 300 C1 events (9 per cent of the DET list). The remaining DET events were categorized into 43 C2 (1 per cent), 845 C3 (25 per cent) and 2160 C4 (65 per cent) events; percentages with respect to the initial DET list. The previously examined events were distributed unequally in the different categories, that is, 6 per cent of C1, 40 per cent of C2, 8 per cent of C3 and 4 per cent of C4 had been examined. All (100 per cent) of the examined C1, C2 and C3 events were judged to be 'real' (useful) seismic events. About onefifth of the examined C1 events had one misidentified phase. Those misidentified phases were easily detected, due to a high-traveltime residual (over 0.2 s) at the given station after relocation, and could, therefore, be rejected automatically from further analyses. Events in classes C2 and C3 had increasing proportions of events with misidentified phases and often more than one misidentification per event, which confirms the expected lower event quality compared to C1 events. The examined C4 events consisted of only 9 per cent that were judged to be 'real' events, whereas 68 per cent had less than four clear P and S first arrivals, but were still judged to be seismic events, and 23 per cent of the C4 events did not show any recognizable seismic signal, that is, they were considered false detections.
After repeating the procedure for six iterations, a total of 428 events (13 per cent of the initial DET list) were categorized as C1 and, when added to the REF catalogue, increasing the original number of catalogue events by 39 per cent. Of these, 128 or 30 per cent were added after the first iteration. The number of C2 events increased slightly to 47 (1 per cent), C3 increased to 917 events (27 per cent) and category C4 was reduced to 1956 events (59 per cent); percentages with respect to the initial DET list. The distribution of examined events slightly changed to 15 per cent of C1, 35 per cent of C2, 3 per cent of C3 and 4 per cent of C4. Again, 100 per cent of the examined C1 and C2 events were 'real'. One of the previously identified false detections was now found in category C3, that is, it had sufficient coherence with at least one REF event, which was most likely a C1 event that was added to the REF list after the first iteration. The proportion of 'real' events in C4 decreased Events from the first iteration as the red circles and the following iterations as the blue squares. The initial reference catalogue is shown with the grey dots. New events are located close to the initial event distribution.
slightly from 9 to 7 per cent, indicating that some of the true events were successfully moved to higher categories (C1-C3) after the first iteration. The C1 events are located close to the original REF event clusters (Fig. 5).
The selection parameters presented above (run A in Table 3) were chosen in order to produce a high-quality list of events with redundant phase information. If, on the other hand, the priority is finding as many real events as possible with less importance given to the quality of hypocentre estimates and arrival times, a more relaxed set of selection criteria might be chosen (run B in Table 3). For example, by reducing the number of required phase measurements to four (instead of seven), a total of 572 C1 events (Fig. 6a) were produced compared to 428 C1 events in run A. All of the examined C1 events were still judged to be 'real', although slightly more phases were misidentified. This is likely due to an increasing number of inappropriately aligned correlation windows caused by a higher location error in a previous iteration when using only four arrival times for relocation.
For an even more relaxed event selection, we removed the maximum event separation criteria after relocation (run C in Table 3). The algorithm produced 1001 C1 events, that is, more than twice that of run A. Again, all of the examined C1 events were judged to be 'real'. Although, compared to events from runs A and B, an even higher proportion of misidentified phases was noted and hypocentre locations were generally more spread, including some dubious locations far outside known event clusters (Fig. 6b).
Overall, in each event cluster the number of found C1 events was more or less proportional to the number of reference events in the same cluster, indicating a similar seismicity distribution for the two time periods covered by the initial and final reference catalogue. Fig. 7 shows the logarithmic local magnitude distribution for the final catalogue (run A) compared to the initial reference catalogue. The two curves are nearly parallel indicating a proportionally equal increase of events in each magnitude bin. The b-value of each curve was estimated according to Tinti & Mulargia (1987;as reported in Marzocchi & Sandri 2003). Assuming the same magnitude of completeness of −0.3 for both catalogues the b-value increased   marginally from initially 0.95 to 0.97 for the final catalogue. For runs B and C, a magnitude comparison was considered unreliable due to lower quality hypocentre and phase pick estimates for some events.

Double-difference relocation
After the final iteration of the event classification routine, the updated REF catalogue, that is, the original REF plus additional DET events categorized as C1, were relocated with a double-difference (DD) algorithm. All REF and C1 events were correlated with each other to create a complete differential time data set with the updated (reliable) arrival time information. To ensure a good (absolute) starting location of the event hypocentres for the DD inversion, seven 3-D tomographic iterations of simultaneous (single event) hypocentre and velocity inversion were run using PStomo eq (Tryggvason et al. 2002) with the absolute traveltimes and a 1-D starting velocity model. The resulting 3-D velocity model was then used for seven iterations of DD event relocation without velocity inversion. The DD algorithm follows Wolfe (2002), that is, no distance weighting was applied. The DD locations were solved for using the iterative LSQR solver (Paige & Saunders 1982) as singular value decomposition (see e.g. Menke 1989) proved to be very slow. For an additional speed increase, the events were grouped into (geographical) clusters with a maximum distance of 2 km between events of the same cluster and a maximum number of 800 events per cluster. Each cluster was relocated separately and only differential time measurements with CC above 0.8 were used. The evaluation of the velocity inversion is not within the scope of this work and intended for a future study. Evaluating only the arrival time misfit instead, we note that the RMS residual dropped from 0.14 Downloaded from https://academic.oup.com/gji/article-abstract/219/2/1268/5548776 by Uppsala Universitetsbibliotek, frederic.wagner@geo.uu.se on 06 November 2019 and 0.26 s to 0.04 and 0.05 s for P and S times, respectively. The RMS misfit of the differential times dropped from 0.055 and 0.057 s to 0.039 and 0.032 s for P and S, respectively, during the seven DD iterations.
In general, the relocated events cluster more densely compared to the initial event locations (Fig. 8), and form more linear features, matching the prevalent NS strike and subvertical dip of faults and fissures in the study area. A detailed interpretation of the relocated events is outside the scope of this study, but we note that the observed features are to some extent similar to structures found in other studies. In particular, the induced seismicity cluster in the northwest now shows three subparallel lineaments (see E-W projection at −21.4), which may coincide with injection sites at the Hellisheidi power plant and associated fissures (Hardason et al. 2010). The N-S striking aftershock sequence in the southeast appears to separate in two lineaments after DD relocation (see E-W projection at −21.15 • longitude) and the southern E-W striking lineament now shows an en-echelon pattern in map view, both similar to features observed by Bessason et al. (2012). Note that events that were not included in any event cluster after the grouping (e.g. single events at depths around 10 km) were ignored by the DD relocation and are thus omitted from the relocated events in Fig. 8.

D I S C U S S I O N
Faced with the output of an automatic event-detection algorithm, the seismologist has to separate the 'true' events from 'false' detections. In this study, we explore how waveform correlations with already picked events can be used to automatically categorize the event detections. Though a separation of the detections into only two categories may in practice be done, the interesting parameter is then the success ratio, that is, the ratio of the true and false events in the respective category. Depending on what is most important, the end member result is either 100 per cent true events in one category, or 100 per cent false events in the other. The ideal result, that is, a perfect result in both categories, was not achieved in this study, and may also be very difficult to achieve. Also, the definition of what exactly a 'true' event is may differ depending on context. For example, we may be very confident a true event has been recorded without being able to clearly identify individual phases or pick their onsets. If counting the number of events is the purpose, this may suffice. If, on the other hand, we want an accurate estimate of the origin time and event location (using common location techniques), at least four phase onsets need to be clearly identified.
In this study, we worked with the output of the MBD algorithm of Wagner et al. (2017). With different selection criteria, 3348 automatically detected events were allocated into four categories of varying event quality. The criteria for inclusion in the top category (C1) were rather conservative, as the purpose of these events was to be suitable for further analyses using tomographic methods without the need for manual inspection and quality control. Acceptance for poor event locations or poor phase picks was low, also because the C1 events were ultimately used as reference events after being merged with the REF catalogue. For, for example, local earthquake tomography, events in second category (C2), with the only violated criterion being their large distance to existing reference events, have highest priority for manual inspection. Events in categories C3 and C4 have a lower priority for manual inspection as they contain a larger proportion of low-quality events (fewer correctly identified phase onsets) and false detections, respectively. However, our results show that both categories C3 and C4 may still contain useful events, even if at a lower success rate, and thus manual inspection might be considered, depending on the purpose of the study. Reasons for interesting events remaining in the low-priority categories can be a low similarity with existing REF events, implying an incomplete REF catalogue for the given area and time, low SNR or inappropriately aligned correlation windows-the latter being mostly due to poor initial hypocentre estimates, or, in some cases, error in the velocity model. Based on visual examination of a randomly selected subset of events in each category, a decreasing true-to-false ratio and higher number of low-quality events in terms of the number and accuracy of estimated individual onset times was assessed for increasing category number, suggesting our scheme is useful to evaluate the performance of a seismic event detector. The scheme also appears successful in that only the lowermost category (C4) has a significant number of false detections. Extrapolation of the statistics of events inspected in this study suggests approximately 14 per cent false detections among the DET events.
In the following sections, we discuss the effect of various parameters we used on the selection of events into the different categories.

Reference catalogue
Our scheme is built on waveform correlation with previous events, implying a risk that the scheme could miss unique or unusual events. Tests with a very reduced REF catalogue (20 events), limited to a single cluster, revealed that other clusters not represented in the reduced catalogue were entirely missed by the algorithm. Results were significantly better when the same number of events were taken from several clusters geographically distributed in the area, though still fewer events in total were marked as C1 compared to using the full REF catalogue. This indicates that even though a geographically sound selection of events may represent the general seismicity, it does not guarantee that all event types are represented as, even within a geographically limited area, events may have a very different character, and event characteristics may also change over time. It is intuitive to suggest that the REF catalogue should thus be established with events from different regions as well as different periods in time.
Similarly, the difference in event magnitude can also affect the correlation of two events. Waldhauser & Schaff (2008) mention a linear decrease of CC with increasing magnitude difference (cf. their fig. 7). Our data show a similar decreasing trend including a slight CC increase at magnitude differences >2 before the curve breaks down due to low number event pairs at ML > 2.2 (Fig. 9). The decrease is more pronounced for S-wave correlations, which show generally higher CC compared to P-wave correlations. A similar increase is observed in Waldhauser & Schaff (2008). Note their bin sizes are larger than ours (0.5 compared to 0.2). In our case, the slight increase is a result of individual stations showing high-CC values towards larger magnitude differences. Without further analysis of this phenomenon and assuming that all processed event magnitudes fall well within our chosen frequency filter band (2-22 Hz), we speculate that the numbers for small magnitude differences are dominated by small magnitude event pairs with low SNR, hence low CC compared to correlation of an event pair with a large magnitude difference involving at least one event with a larger magnitude and thus likely a higher SNR. Alternatively, the low-frequency signal of large magnitude events may correlate with low-frequency noise present within the higher frequency signal of a small magnitude event. In either case, the dependence of the CC on an event pairs' magnitude difference means that an ideal reference catalogue should 1276 F. Wagner et al.  also cover the complete magnitude range of the study area and/or individual seismic clusters. In our case, examination of the available MBD results, for example, in the form of a map highlighting detected events with a high stack attribute parameter as shown in Fig. 4, appears to provide a good initial estimate of the general seismicity pattern in the investigated area and for the analysed time period. The similarity in hypocentre and magnitude distribution between the initial and final (enhanced) catalogue (Figs 5 and 7) indicate that our choice of reference catalogue was representative for the local seismicity during the analysed time.
Besides a reduced capability of finding new events, an incomplete REF catalogue may also lead to a reduced number and lower quality of onset time estimates (see also Section 4.3). Hence, the selection criteria might need to be adjusted. For example, the required number of phases can be lowered (here from seven to four) at the expense of allowing events that are more likely to have a greater location error. To compensate for the introduced error, a lower event separation threshold (e.g. 3 km instead of 5 km in this particular case) can to some extent filter events with poor hypocentre locations.

Correlation coefficient threshold
Given a reference catalogue representative of the seismicity in the study area, the CC threshold controls the first selection stage. An appropriate CC threshold should be chosen in accordance with the data at hand. The CC of two signals depends on various factors: event separation and proximity to receiver (see e.g. Schaff & Waldhauser 2005), differences in source mechanisms, SNR, choice of frequency band and correlation window length, and methodology, for example, frequency or time domain correlation. Schaff & Waldhauser (2005), for example, compare real data correlation results for different window lengths to choose the threshold accordingly. We compared delay times from picked phases to those from correlation measurements for various CCs. For the data set investigated and choice of processing parameters, the RMS residual between the picked and correlated delay times was observed to increase for decreasing CC, with a faster increase below 0.8, which was consequently chosen as CC threshold.
For given signal and noise characteristics (e.g. SNR), simple expressions for the expectation of the coherence can be found, that is, by evaluating eq. (1) after adding an uncorrelated noise term to the time-series x and y and their variances. However, because we use short (with respect to the correlation window) transient signals, we conducted some numerical tests to illustrate the effect of the correlation window length on the coherence. Two time-series containing Gaussian noise and an identical sine-wave signal of finite length 10 times shorter than the noise series are correlated at zero time lag using various correlation window lengths. Fig. 10 shows the mean coherence from 500 simulations for different window lengths and SNR. Maximum coherence for each SNR is observed when the correlation window length equals the signal length (1 on x-axis), and increases with increasing SNR as expected. As the window becomes longer than the signal length, correlation rapidly decreases with window length because the noise series increases with the correlation window while the signal length remains the same. The decrease is faster at lower SNR. Although the use of white noise is not a realistic approximation of seismic noise, particularly within the wave coda, the test illustrates the basic relation between coherence and window length, and allows us to estimate the SNR our method is sensitive to given the parameters used. Our picked P-and S-wave phases had dominant frequencies in the range of 8-16 Hz and 4-8 Hz, respectively. Assuming signal lengths of 2-3 periods, the ratio of correlation window (0.5 and 1 s) to signal length ranges from 1.3 to 4. According to Fig. 10, signals with an SNR around 5 should generally correlate above the CC threshold of 0.8, and signals with an SNR as low as 3 can correlate above the threshold if the window to signal length ratio is 1.3 or less.

Uncertainty of estimated phase onset times
The uncertainty of the estimated phase onsets (Section 2.2) is steered by the accuracy of the manual reference picks (picking error) and the mean time lag error of the involved correlations. The averaging of a sufficient number of samples (delay times) should reduce or even remove random, zero-mean error, while systematic error, for example, when phases are picked consistently too late, or when the moving correlation window detects a signal different from the target phase, remains.
We use the standard deviation of the averaged arrival times as an uncertainty measure for the computed phase onset times. It follows that the estimated uncertainty value is sensitive to inconsistencies between the averaged times, for example, individual mispicked phase arrivals, but not to consistent, systematic error.
Relocation of the DET events with the computed onset times, in general, improved hypocentre estimates over the initial estimates from the MBD. Besides the uncertainty of the onset times, the accuracy of the new hypocentre estimates also depends on the velocity model error.

Event separation distance
For the purpose of extending the list of reference events with correlated events, the larger the distance an event is from any previously examined event, the potentially more interesting it is. On the other hand, fewer reference events might have been used to produce the phase onset estimates of distant events, leading to a higher probability that these estimates, and as a consequence the derived hypocentre location, have large error. Having a maximum distance threshold for events being selected should reduce the risk of larger errors in the computed onset estimates and mitigate the propagation of error and misaligned correlation windows to subsequent iterations. At the same time, it may prevent the selection of potentially interesting events outside the previously known event clusters, and thereby limit a possible geographical expansion of the REF catalogue. Hence, an event separation limit can significantly impact the trade-off between quantity and quality of selected events. Tests suggested that for our data set the distance threshold brought little improvement after setting the number of required phase observations to seven (run A, Section 3.2), suggesting that the reference events covered most of the seismic activity in the area and events with at least seven phase observations were sufficiently well located in close proximity. In some scenarios, however, a lower number of phases should be required, for example, in case of a smaller reference catalogue (see Section 4.1). A smaller event separation threshold can then partially compensate for the arrival time error, introduced using a limited REF catalogue, by avoiding selection of events with poor location.

Prioritizing manual inspection using the MBD attribute stack parameter
The attribute stack parameter is an intuitive choice to qualify detections from MBD and select events for further analysis. However, depending on which attribute function has been used to identify the presence of seismic signal the meaning of the stack value can vary. For example, the STA/LTA used in this study is closely related to SNR, but the commonly used Kurtosis function (see e.g. Langet et al. 2014;Cesca & Grigoli 2015) not necessarily so. Similarly, a high stack value-relative to its background level-is a good indication of an event being real, but a low stack value does not necessarily define a false or low-quality detection.
Histograms of the number of events as a function of the event attribute stack parameter for the different event categories (Fig. 11a) show that with increasing category number (decreasing quality) the events have generally a lower attribute stack value. Comparing the proportion of events per bin on the x-axis to the proportion of original DET events per bin for each category (Fig. 11b), reveals a positive relationship between attribute stack and the likelihood that events are real, that is, from a high category (C1 or C2). An example stack threshold (the dotted vertical line in Fig. 11b) is drawn just after the proportion of C1 events to DET events has increased and the proportion of C4 events (low-quality category) has decreased most significantly. The threshold might be used to prioritize manual inspection for events from categories C2-C4 with a stack value above the threshold.
The shape of the curves in Fig. 11 is undoubtedly controlled by how we set the selection criteria for an event being selected (Section 2.3), especially the CC threshold (Section 4.2). We note, though, that the most dramatic rise of the C1 curve is between stack values of 3 and 4, which, assuming the stack value is a good proxy for SNR, is in some agreement with Fig. 10 that indicates that events with an SNR below 3 will, in general, have a poor correlation (below threshold).

Future improvements
Correlation, and thus relative traveltime accuracy may be improved by evaluating all three components for both P and S signals, instead of only the vertical or horizontal components, respectively. However, for this data set we did not observe a systematic improvement while correlating all three components (rotated) and choosing the highest correlation for each P and S signal. Therefore, we chose the computationally less expensive option.
The hypocentre error due to velocity uncertainties may be reduced by updating the velocity model after each iteration using the updated reference catalogue, and thus improving correlation window alignment in the subsequent iterations. A similar approach is applied in the PSIR algorithm (Zefeng Li 2016) where iterative velocity model updates improve the search window alignment for automatic phase picking. Instead of, or in addition to, seismic velocity inversion, a DD relocation using the differential time measurements could be performed after each iteration. Although the DD relocation comes at a higher computational cost than single event relocation, the fewer parameters compared to seismic tomography suggest it may be easier to set up in an automated scheme.
The algorithm presented in this work could be implemented in a real-time monitoring environment as detected events are processed one at a time. As the reference catalogue continues growing, correlation with the entire catalogue might become computationally too expensive and/or time consuming. In that case, each detected event may be correlated with only a selection of reference events. Criteria to choose appropriate reference events might include, for example, event separation in space and/or time as neighbouring events in swarm-like activity are more likely to be well correlated.

C O N C L U S I O N S
The algorithm presented in this work uses interevent crosscorrelation to categorize event detections produced by an automatic seismic event detector into groups of varying likelihood of constituting real events. The algorithm was applied to 3348 event detections generated by a migration and stack detector and a reference catalogue of 1108 manually picked events.
We present different parameter choices to steer the selection process towards higher quality or higher quantity output. The designed event categories were confirmed to contain increasing true-to-false ratio and higher quality of events from the bottom to top category. Events in the top category can be used for other seismic applications Downloaded from https://academic.oup.com/gji/article-abstract/219/2/1268/5548776 by Uppsala Universitetsbibliotek, frederic.wagner@geo.uu.se on 06 November 2019 without further manual analysis, as tested with local earthquake tomography and double difference relocation.
We found that it is not necessary to use a large reference catalogue to achieve good results as long as the reference events represent the general seismicity of the study area in space, time and magnitude range. Newly detected regions of seismicity should be treated with care and if possible checked manually. In order to improve accuracy of automatic arrival time estimation and thus relocation we recommend that each event type (based on waveform similarity) is represented by a sufficient number of events to reduce bias. This number will vary for any given earthquake catalogue and should be determined by trial and error and/or examination of waveform data.
Waveform-similarity-based selection proved to be a useful complement to simple SNR-based selection, as shown here by the MBD attribute stack parameter. The iterative scheme we presented is adaptable to a real-time seismic monitoring environment and we consider it a practical automatic tool in combination with various seismic event/phase detectors, provided initial hypocentre estimates and/or phase arrival times exist. If arrival time picks are available, from manual picking or from an automatic phase detector (e.g. Withers et al. 1998), our calculation of P-and S-phase onset estimates, though not required, might provide additional picks or improve existing ones.