An automatically generated high-resolution earthquake catalogue for the 2016–2017 Central Italy seismic sequence, including P and S phase arrival times


 The 2016–2017 central Italy earthquake sequence began with the first main shock near the town of Amatrice on August 24 (Mw 6.0), and was followed by two subsequent large events near Visso on October 26 (Mw 5.9) and Norcia on October 30 (Mw 6.5), plus a cluster of four events with Mw > 5.0 within few hours on 18 January 2017. The affected area had been monitored before the sequence started by the permanent Italian National Seismic Network (RSNC), and was enhanced during the sequence by temporary stations deployed by the National Institute of Geophysics and Volcanology and the British Geological Survey. By the middle of September, there was a dense network of 155 stations, with a mean separation in the epicentral area of 6–10 km, comparable to the most likely earthquake depth range in the region. This network configuration was kept stable for an entire year, producing 2.5 TB of continuous waveform recordings.
 Here we describe how this data was used to develop a large and comprehensive earthquake catalogue using the Complete Automatic Seismic Processor (CASP) procedure. This procedure detected more than 450 000 events in the year following the first main shock, and determined their phase arrival times through an advanced picker engine (RSNI-Picker2), producing a set of about 7 million P- and 10 million S-wave arrival times. These were then used to locate the events using a non-linear location (NLL) algorithm, a 1-D velocity model calibrated for the area, and station corrections and then to compute their local magnitudes (ML). The procedure was validated by comparison of the derived data for phase picks and earthquake parameters with a handpicked reference catalogue (hereinafter referred to as ‘RefCat’). The automated procedure takes less than 12 hr on an Intel Core-i7 workstation to analyse the primary waveform data and to detect and locate 3000 events on the most seismically active day of the sequence. This proves the concept that the CASP algorithm can provide effectively real-time data for input into daily operational earthquake forecasts,
 The results show that there have been significant improvements compared to RefCat obtained in the same period using manual phase picks. The number of detected and located events is higher (from 84 401 to 450 000), the magnitude of completeness is lower (from ML 1.4 to 0.6), and also the number of phase picks is greater with an average number of 72 picked arrival for a ML = 1.4 compared with 30 phases for RefCat using manual phase picking. These propagate into formal uncertainties of ±0.9 km in epicentral location and ±1.5 km in depth for the enhanced catalogue for the vast majority of the events. Together, these provide a significant improvement in the resolution of fine structures such as local planar structures and clusters, in particular the identification of shallow events occurring in parts of the crust previously thought to be inactive. The lower completeness magnitude provides a rich data set for development and testing of analysis techniques of seismic sequences evolution, including real-time, operational monitoring of b-value, time-dependent hazard evaluation and aftershock forecasting.


I N T RO D U C T I O N
On the 24 August 2016, a M w 6.0 earthquake occurred in Central Italy near the town of Amatrice, starting a seismic sequence characterized by a cascade of moderate extensional earthquakes.Two months later, a M w 5.9 earthquake occurred, on 26 October 2016, near the village of Visso (Fig. 1).This activated the northern edge of the fault system, and was followed 4 d later by the largest earthquake in the sequence, M w 6.5 on 30 October, near the town of Norcia.After a further month, four moderate magnitude earthquakes of 5.0 ≤ M w ≤ 5.5 occurred on the 18 January 2017 near Campotosto, at the southern edge of the Amatrice fault system.
The total length of the normal fault system activated by the 2016-2017 seismic sequence is 70 km.This is a very seismically active section of the regional Central-Northern Apennines fault system, where large historical and instrumental earthquakes with M w ≥ 6.0 have occurred in the past.These include events dated from the 13th century C.E. (Rovida et al. 2016) up to the last 30 yr, as well as the 1997 Colfiorito (Chiaraluce et al. 2003) and 2009 L'Aquila (Valoroso et al. 2013) sequences.The Norcia-Amatrice sequence reactivated the area in between these two earlier sequences.
The permanent RSNC network of seismic stations, operated by the Italian National Institute of Geophysics and Volcanology (INGV) for the surveillance of the Italian seismic activity, has a mean minimum detection magnitude of M L ≈ 0 and a completeness magnitude of M L = 1.4 in the considered period (ISIDe Working Group 2007).Fig. 1 shows the permanent seismic network before the Amatrice earthquake, together with the seismic activity recorded in the 5 yr prior to the Amatrice sequence (ISIDe Working Group 2007).Fig. 1 also shows the location and focal mechanisms of the earlier Colfiorito 1997 sequence to the north and the L'Aquila 2009 sequence to the south (in conventional black and white in Fig. 1), and the two M w > 6.0 events that occurred during the 2016-2017 sequence (red/white in Fig. 1).These four events are the only earthquakes with M w > 6.0 in the last 30 yr.All the focal solutions (TDMT-Time Domain Moment Tensor Catalogue; http://cnt.rm.ingv.it/tdmt)display clear extensional fault movement displayed along NW-trending normal faults, roughly parallel to the strikes and dips of the mapped fault breaks.
Immediately after the Amatrice main shock, the emergency team of National Institute of Geophysics and Volcanology began to install 22 seismic stations to complement the permanent ones.Then, scientists from the British Geological Survey, with the support of the NERC Geophysical Equipment Facility and SEIS-UK, deployed an additional 24 broad-band stations within the next few days, resulting in a dense network of 155 stations, with a mean separation in the epicentral area of 6-10 km.This dense seismic network produced over 2.5 terabytes of data in 1 yr, too much to handle by manual phase picking techniques used in preparing the RefCat from RSNC data by manual phase picking.Accordingly, we used automatic processing of the continuous recordings to generate a more comprehensive earthquake catalogue with better locations and improved detection of events below the previous magnitude of completeness.Such automatic procedures have commonly been used to analyse seismological data from both background seismicity and periods of enhanced activity during seismic sequences (Di Stefano et al. 2006;Diehl et al. 2009;Satriano et al. 2011;Lomax et al. 2012;Valoroso et al. 2013;Spallarossa et al. 2014;Romero et al. 2016;Wollina et al. 2018).They complement alternative continuous waveformbased techniques such as template-matching (Gibbons & Ringdal 2006;Shelly et al. 2007;Peng & Zhao 2009) and deep learning approaches for earthquake phase association (Ross et al. 2018).All of these must guarantee an appropriate level of reliability in derived data such as phase arrival times, locations, origin times and (local) magnitudes comparable to those obtained from manual analyses, and do this for a greater number of events per unit of computing time.Obtaining this reliability can be challenging during seismic sequences, where events frequently overlap in time or occur simultaneously in different parts of a network.There are three major advantages of having a larger number of events and a smaller detection threshold in catalogues extracted from automated processing of waveform data.The first is the better constraint on earthquake frequency-magnitude parameters used in probabilistic seismic hazard analysis or operational earthquake forecasting from the increased bandwidth of data.This increased bandwidth, and the increased number of events, also has the potential to reduce the estimated errors in such parameters.The second is the benefit from from introducing small magnitude events to introduce secondary triggering effects illustrated by forecast models developed for the AVN sequence (Mancini et al. 2019).Extending the inclusion of secondary triggering to smaller magnitudes is now possible using the high-resolution catalogue.Third, uncovering the smaller events from high-resolution data may shed new light on important details of the fault architecture.These advantages demonstrate the importance of decreasing the threshold of detected and characterized events in improving forecasts.Gulia & Wiemer (2019) investigate whether observations of b-value variation in time can lead to alarmbased forecasts but current testing of the method to other sequences reveals some caveats for the effectiveness of potential warnings (Dasher-Cousineau et al. 2020) In this work, we adopted the Complete Automatic Seismic Processor procedure (CASP; from Scafidi et al. 2019), to analyse an entire year of recordings of the 2016-2017 Central Italy seismic sequence in a consistent way.We chose CASP because it proved not only to be fast in processing a large amount of seismological data, but also to detect consistent P-and S-phase arrival times, allowing the accurate location of events.The other techniques mentioned above may be able to detect more events and provide accurate relative locations, but can miss events and/or not always provide the absolute locations calculated here.The core of CASP is an advanced automatic wave arrival time detector and location software, based on a chain of modular procedures constituted by iterative algorithms (named 'RSNI-Picker 2 ', from: Spallarossa et al. 2014;Scafidi et al. 2018).The pragmatic choice of CASP allows us to improve (i) detectability, in terms of number of correctly detected arrivals times or hit rate, (ii) reliability, in terms of minimizing the rate of false or imprecise picks and (iii) accuracy of results (Scafidi et al. 2016).It also allows us to generate catalogues of events quickly (∼12 hr), to the point where operational earthquake forecasts could be made daily.In this paper we prove this concept, and show that the new data from the temporary stations, analysed by the CASP algorithm, can reveal new features of the fault architecture and improved estimates of parameters used in probabilistic seismic hazard analysis and operational earthquake forecasting.
In the Table 1, we list the properties of the RefCat and enhanced catalogues.The methods used in producing the enhanced catalogue are described in more detail in the following section.

P -A N D S -P H A S E P I C K I N G , E A RT H Q UA K E D E T E C T I O N A N D C H A R A C T E R I Z AT I O N
This section describes the elements of the work flow we used to retrieve the enhanced earthquake catalogue obtained by the CASP method on the dense seismic network.The work flow itself is illustrated in the flowchart of Fig. 3.

The seismic network
The area affected by The Amatrice sequence had been regularly monitored before its onset by the stations of the Italian National Seismic Network (INGV Seismological Data Centre 2006) and by additional local and regional seismic networks (respectively the TABOO- Chiaraluce et al. 2014 andRESIICO-Marzorati et al. 2016, networks) operated by INGV.This permanent network was enhanced by the deployment of complementary set of 22 3Cstations from the INGV emergency network of temporary, portable stations within the first 10 d of the sequence (Moretti et al. 2016).An additional 24 broad-band stations that were installed by the 10th of September, by the British Geological Survey (BGS) and School of Geosciences at the University of Edinburgh, in active co-operation with INGV to optimise the enhanced network.The final configuration of the enhanced, dense seismic network contained 155 stations (Fig. 2), bringing the station inter-distance down to 6-10 km.Hypo, Hypo2000, NonLinLoc NonLinLoc Average number of phases (M L 1.4-1.5 events) 30 72  This network configuration operated stably from 24 August 2016 to 31 August 2017, producing a massive data set of continuous waveforms (≈2.5 terabytes).The data set is now freely available, in standard miniseed format, from the ORFEUS (Observatories & Research Facilities for European Seismology) and IRIS (Incorporated Research Institutions for Seismology) web portals (https://www.orfeus-eu.org/,https://www.iris.edu/hq/).

P-and S-phase arrival times and earthquake detection
The CASP software analyses the recorded waveforms directly in the standard miniseed format, organized in daily (24-hr) continuous time windows as retrieved by the standard seedlink archive format.
The first step of the automatic procedure is the event detection based on a standard STA/LTA analysis, empirically calibrated for each station as a function of the site's ambient noise and the characteristics of its installed instruments.The main parameters involved in this calibration are (see Table 2): the low-and high-corner frequencies of the band-pass filter (F Low and F High), the shortand long-term average constants (STA and LTA), the STA/LTA ratio threshold (Level), the window length for the STA/LTA ratio calculation (Dur), and the minimum time-interval permitted between two consecutive triggers (LenMin).These parameters have been set such that the algorithm is very sensitive to the occurrence of candidate events, producing a very large number of triggers.This minimizes the chances of failing to detect 'true positives', at the expense of also producing many 'false positives' associated with non-seismically generated noise.However, the false positives are then removed on the basis of the location residuals and of the quality of locations, in the final step of the CASP procedure, which has been proven to discriminate effectively between true seismic phases and other signals.
For each trigger detected through the STA/LTA analysis, the final trigger time is then determined as the minimum of the Akaike Information Criterion (AIC) function (after Akaike 1974) computed within a signal window around the previous trigger identification.The use of the AIC-based algorithm allows us to overcome some typical errors associated with STA/LTA analysis (Scafidi et al. 2019).
The station trigger data were then input in into the CASP event detection module (see Fig. 3).The detection module is based on a 'coincident' system, where a defined number of data channels must be triggered within a defined time window in order to declare the (potential) beginning of a candidate event.It has proven to have strong advantages in detection of events occurring close in time during a seismic sequence.Event detection was further optimized by splitting the analysis into sub-networks defined by 11 different geographical zones defined by the pattern of seismicity (Fig. 2), allowing the accurate detection of as many events as possible.The zones overlap, but this is necessary to allow us to recognize identical events detected by the different networks, and hence to remove duplicates in the catalogue.Accordingly, we report in Table 3, for each subnetwork, the name, the number of stations included and the parameters controlling the event detection, that is the number of stations which must be triggered within the coincidence window length.
The seismograms of the potential events are then extracted from the continuous recordings and converted in standard Seismic Analysis Code (SAC) format.Each time window has 10 s of pre-trigger time and a total duration of 45 s.Considering the network density, for every triggered event we extracted waveforms from all the stations located within 90 km from the preliminary epicentre of the possible event.Then, the waveform for the candidate event is analysed through the advanced automatic picker and location engine, named RSNI-Picker 2 (RSNI-P2 in Fig. 3; from Spallarossa et al. 2014;Scafidi et al. 2016Scafidi et al. , 2018)).
The RSNI-P2 is based on an iterative procedure for the automatic identification of phase arrival times by calculating AIC functions.Iterations consist of different steps, separately performed for P-and S-phases, where arrival time identification is checked and refined based on a preliminary earthquake location we computed with the NonLinLoc software (Lomax et al. 2000) and a 1-D velocity model of the area (De Luca et al. 2009).
Complex seismic sequences are often characterized by the occurrence of multiple main shocks within a short time scale going from seconds to weeks.The likelihood of events occurring closely spaced in time (even if not necessarily in space) makes the automatic analysis of seismic data, in terms of phase association and event location, a very difficult task.The waveforms for two, or more, seismic events occurring almost at the same time could overlap, leading to one or more events being missed.However, this will only occur if the signals from the most distant station for the earlier event arrive around or later than those from the nearest station for the later event.RSNI-P2 includes a smart search component for such nearly contemporary earthquakes.After a first event is detected and located, the procedure dynamically cuts each waveform to exclude the portions of the signal belonging to the first event.This waveform cut is done by identifying the end of the seismic signals belonging to the earlier event, and by evaluating the event magnitude (i.e. higher magnitude implies longer signals to be cut, and vice versa) and the hypocentral distance from each station (i.e. the starting time of each time window is independently selected).As a consequence, the time windows available for a potential later event are determined.Then, the system starts a new search analysis on these waveforms, to check for and detect any missing earthquake.A similar process is done with the part of the seismograms before the first detected event, looking for other events that occurred just before its time window.
RSNI-P2 is also equipped with a tool for identifying out-ofnetwork events such as teleseisms or regional earthquakes through an appropriate spectral analysis.This algorithm is able to discriminate reliably between phase arrival times caused by non-local events in order to discard them and to have a clean final data set of local earthquakes (Scafidi et al. 2019).
The final results of the CASP procedure is a data set of P-and S-phase arrival times and an earthquake catalogue of origin time, location, depth and magnitude, all linked together.Every phase arrival time is attributed to the earthquake it belongs to, and every earthquake has the list of arrival times that led to its location.This provides an important benchmark for future studies based on these data.Moreover, every single result has its own error estimation, allowing the possibility to discriminate between different quality classes.For each set of arrival times, CASP provides quality factors (qf) that indicate the estimated uncertainty of the automatic detection in seconds, to clearly define the reliability of each datum.The earthquake location quality is an output of the location software, specifically a probabilistic estimation of the error produced within the NonLinLoc code (Lomax et al. 2000).

Probabilistic earthquake location with station corrections
The automatic analysis procedure we used in this study includes an iterative location phase, designed to identify and optimize the P-and/or S-arrival times.The final data set of P-and S-phase arrival times, respectively consisting of 7016 435 P and 10 003 900 S phases.The total number of identified and associated phases with earthquakes is much greater than in RefCat.In the magnitude range 1.4-1.5 the average number of picked arrival times is 72 and 30 for the enhanced catalogue and RefCat, respectively (see Table 1).The significantly higher average number of phases of the enhanced catalogue with respect to the standard one is due to: the higher number of stations, concentrated in the epicentral area, used for the enhanced catalogue, and the iterative search procedure of S phases (see 'P-and S-phase arrival times and earthquake detection' paragraph).Downloaded from https://academic.oup.com/gji/article/225/1/555/6044232 by British Geological Survey Keyworth user on 28 June 2021 The final data set of arrival travel times was then used as input to a final optimized relocation run, producing the final list of seismic events with their hypocentral location, depth, origin time and associated uncertainties.To do this we used the same NonLinLoc procedure adopted for the picking phase, and the same 1-D propagation model, derived from De Luca et al. (2009).The main difference is the introduction of static station correction values, to reduce the discrepancies between the adopted propagation model and the real Earth.
A proxy of the best station corrections can be obtained from the mean residuals obtained for a set of located events.In order to reduce the trade-off between residual reduction and changes in location, a sub-set of events with very stable locations and a redundant number of phases was selected for a benchmark test.Moreover, in order to reduce the possible impact of a non-uniform sampling of the investigated area, events were selected by imposing a regular grid (size 5 km) on the hypocentral volume of the sequence, and by choosing a maximum of 10 events for each grid element showing the highest number of phases.In this way, we avoid oversampling the ray paths in the volumes with the highest number of events and stations.Thus, a sub-set of about 3600 events was selected from the data set of preliminary locations.
The relevant phases were picked by in a recursive procedure, in which the mean station residuals of the previous location iteration were used as station correction for the following iteration.The cumulative root mean square of residuals tended to stabilize after three iterations, and the mean station residuals at this point were adopted as station corrections for the final location.Corrections were used only for stations showing a hit count larger than 50.A representation of the obtained station corrections is shown in Fig. 4. In general, the corrections in the epicentral area defined by the cloud of seismicity in Fig. 1 are rather small, confirming the validity of the adopted 1-D velocity model.Conversely, stations in the Adriatic foreland (on the eastern side of the area studied), show systematically positive station corrections.This is consistent with relatively low velocities being present at the surface and in the upper crustal layers in this area, as proposed independently by Carannante et al. (2013) in their tomographic study.
Obviously this complete procedure of station correction would not be possible in a near real-time application of the method; however a good estimate of station corrections would be available for the permanent part of the network from background seismicity.Corrections for the temporary stations would be obtained with an increasing accuracy during the evolution of the sequence, but, as already stated, their contribution is less critical with respect to stations in the Adriatic foreland, mainly permanent ones.
We show in Fig. 5(a) map view of the 440 697 relocated events making up the final enhanced earthquake catalogue.For comparison, we also report in Fig. 6 the number of events per day of the obtained catalogue (in red) versus those present in the RefCat (ISIDe Working Group 2007).Our procedure significantly increases the number of retrieved events by a factor varying from four to more than five during the sequence, with an overall mean ratio of 5.22.This holds even during the phases of very high event rate associated with the major destructive events-at the beginning of the sequence, during the period 26-30 October 2016 and soon after the 17 January 2017.This confirms the efficiency of the detection and picking algorithms, even for seismic events that are closely spaced in space and time.In a recent work, Zhang et al. (2019) associate and locate about 3300 events for a subset of 5 d of the sequence (14-18 October 2016) using the Rapid Earthquake Association and Location algorithm (REAL).For the same period, our automatic database includes more than 4300 well-located events.
After the final relocation step, we performed a final 'cleaning' of the derived data set for P-and S-wave arrival times to remove redundant entries and to ensure consistency of outputs in the catalogue.As a result, only those phases contributing to the final location are included in the final catalogue containing the dataset of arrival times.In particular, P and S arrival times with residuals greater than 2 s or with a zero weight in the location were discarded.Nevertheless, all the events contained in the final earthquake catalogue have at least six P and S phases.
Prior to the cleaning procedure, some 4 per cent of the locations (18 945 events) had (unphysical) negative depths.Within those events, there were 899 events in the A-class for quality (5 per cent), 2888 events in the B-class (15 per cent), 4782 events in the C-class (25 per cent) and 10 376 events in the D-class (55 per cent).For the definition of the adopted quality classes, see Section 3.3.We decided to keep only the events whose depth is positive in the final catalogue, that is its elevation is below the local topographic height.As a result, we excluded 11 871 events, 88 per cent of those in the C and D classes of poorly located events.The final dataset of arrival times contained entries for 6871 990 P and 9941 649 S phases.
The number of detected and validated P and S arrival times per day for the whole analysed year is shown in Fig. 7(a).The number of S phases is always slightly higher than the P ones; this is due in part to the fact that the search for S phases is always driven by a trial location that is already reliably known from the P-phases.In addition, mainly for small events, very often the signal to noise ratio of S waves (with respect to P coda) on horizontal components is higher than the SNR of P waves (with respect to pre-event noise) on the vertical component.A typical example is reported in Fig. 8, (two stations of event 170 315 000 204, Ml 0.17).Station ED19 has a good P-and S-picking.For the temporary station T1256, automatic  In the most seismically active single day, on the end of October 2016, more than 75 000 P arrival times and 110 000 S arrival times were detected and correctly assigned to more than 3500 events.The distribution of the automatic quality estimation of both P and S phases is also reported in Figs 7(b) and (c).The average value of the estimated errors in the arrival times is ±0.05 s for the P phases and ±0.13 s for the S phases.More than 90 per cent of arrival times had estimated errors below 0.1 and 0.25 s, respectively, for the P and S waves.

Validation of results
In order to evaluate the effectiveness and accuracy of the automated phase picking algorithm, we analysed a subset of the data where we were confident in the phase data obtained by manual phase picking methods.We chose a day with a number of events close to the long-term average and without any particular clustering, so that the space, time and magnitude distribution of this subset can be assumed to be representative of the whole data set.We selected the 7 January from 00:00 UTC to 06:00 UTC and hand-picked phases for all the events recognized by the detection procedure, resulting in 345 events with both a good quality location based on the manual picks, and a corresponding event in the automatic data set.It was then straightforward to compare the phase and catalogue parameters for these two independently generated data sets.
We compare the automatic and manual picks and locations (epicentre and depth) in Fig. 9. Fig. 9(a) shows the time differences for the P phase picks: on the left the whole data set is analysed, while on the right only the best quality picks are plotted.It is quite evident that for P phases most picks are either nearly coincident with the human ones or, in few cases, completely wrong (errors larger than 1s are all reported in the last bin).Fortunately, these large errors are recognized as outliers by the location procedure without any further tuning, and hence are not used for the final comparison.
After removing these events, the distributions of differences in arrival time, epicentral location, and depth are even more peaked Downloaded from https://academic.oup.com/gji/article/225/1/555/6044232 by British Geological Survey Keyworth user on 28 June 2021  (and the number of large errors reduced).The estimated systematic error in the automated phase picks estimated by the difference between the high-confidence manual and automated picks is less than 0.04 s: the level of confidence is 87.0 per cent within this threshold, and 93.3 per cent within 0.08 s.For S picks, the distribution appears broader (Fig. 9b, left-hand panel), and is again skewed towards positive values, implying delayed picking in the automatic system.If we reduce the analysis to the best quality picks (estimated errors below 0.08 s, Fig. 9b, right-hand panel) the distribution appears narrower and more symmetric.In this case, 72.6 per cent of the differences are below the threshold, and 85.2 per cent below twice this value.For both P and S picks the distribution of the systematic error is somewhat asymmetric, with a slight skew to positive values, implying the automatic pick is slightly biased to be later than the human one.These results confirm that the S picks are around twice as uncertain in time as the P picks, but the absolute uncertainty remains small.In most cases, the CASP procedure is able to furnish a reliable and accurate estimate of the phase arrival time, and to discriminate reliable results from outliers.
The locations obtained by the automatic picks in the test period were then compared with those of the representative sample of the manual ones.Fig. 9(c) in the top row shows histograms of the depth difference (automatic minus manual depth), the modulus of difference in depth distance, and the modulus of difference in epicentral for events with estimated horizontal errors below 3 km and vertical errors below 5 km.Some 94.1 per cent of the locations show horizontal distances below the threshold of 3 km, and 95.5 per cent below the 5 km threshold for the depth error.The depth difference appears nearly symmetrical, showing that the automatic procedure does not introduce any systematic bias in the depth estimate.
If we reduce the analysis to the more reliable locations (horizontal and vertical error <1 km), the distributions become still narrower (bottom row of Fig. 9c): 92.1 per cent (for the epicentral distance) and 83.9 per cent (for the depth distance) of the events fall within the defined thresholds.Overall, within the limits stated in the analysis section, these results provides a firm validation of the accuracy of the phase picking method in comparison with manual analysis of the same data for a representative sample of data.

Quality of earthquake locations
In order to classify the quality of the final earthquake locations, we applied the procedure proposed by Michele et al. (2019) to the resulting catalogue.These authors proposed a criterion to assess the location quality, consisting of the combination of the uncertainty estimates, properly normalized, provided by the NonLinLoc location code (Lomax et al. 2000).The procedure quantifies the location quality estimate in terms of a unique numeric normalized value the quality factor which varies between qf = 0 (best quality location) and qf = 1 (worst quality location).Then, the location obtained is assigned to a quality class depending on the qf parameter value according to the following scheme: A-class (0 < qf ≤ 0.25), Bclass (0.25 < qf ≤ 0.50), C-class (0.50 < qf ≤ 0.75) and D-class (0.75 < qf < 1.00).We report in Fig. 10 the distribution of the location parameters, divided into these different quality classes (class: A in red, B in green, C in blue and D in black).The parameter distributions all show increasing dispersion while moving from the best (A) to the worst (D) class, with the single exception of the plots for the number of phases picked, where the distribution narrows with decreasing quality, and the average number of phases identified also decreases.These results are not inconsistent with each other-we would expect better locations and a more variable number of phase picks in good quality data.We end up with a catalogue of earthquake locations, distributed between the quality classes as A-31.8 per cent, B-32.0 per cent, C-18.3 per cent and D-17.9 per cent.The relationship between the main location parameters as function of the magnitude is shown in Fig. 11.The reported quality factors qf are mean values computed in bins of 0.02 of magnitude.The mean quality factor and the number of phases picked is fairly constant in the magnitude range 2.0-3.6 (see Figs 11a and b).Each class has a quality factor that is peaked around a magnitude that increases systematically with increasing quality.This is consistent also with the distribution of the number of events in Fig. 11d), where the higher quality data peaks at a higher magnitude, and the lower quality data peaks at low magnitudes where the signal to noise ratio would be the lowest.In turn, this is consistent with a lower-mean number of phases contributing to the earthquake locations for the low-quality data.The quality factor is quite constant for different earthquake magnitudes for all quality classes (Fig. 11e).4.0.Finally, for the higher values of magnitude, we observe a degree of volatility in the mean qf and in the number of phases (see Fig. 11c).We know that the automatic picks for (the usually small number of) moderate-to-large earthquakes that occur during one seismic sequence strongly depend on factors not considered here, for example instrument clipping or the presence of a minor event within the nucleation phase.Thus it may be advisable even with an automated phase picking method to continue to analyse data for the relatively few largest events manually for the time being, at least until automated techniques have been adapted to account for such effects.

Local magnitude (M L ) computation
The local magnitude (M L ;Richter 1935Richter , 1958) ) is also calculated automatically, using the complex multithread algorithm embedded inside RSNI-P2.Since M L could be strongly biased by the signal processing methods adopted to correct for instrument response, we also adopted an automatic method to select the best pre-deconvolution filter parameters on the basis of signal-to-noise analysis.Specifically, the high-pass corner frequency of the Butterworth pre-deconvolution filter is automatically determined, identifying the lowest frequency (in the 0.5-2.0Hz range), which gives a signal-to-noise ratio greater than a threshold value (e.g. usually 4.0).This approach allows the proper consideration of the relatively low frequencies, around 0.5 Hz, relevant for earthquakes with higher magnitudes (e.g.M L > 3).This is important to avoid M L underestimation for these events.On the other hand, the lower frequencies are discarded in the case of low-magnitude earthquakes recorded by broad-band instruments in order to minimize any bias from microseismic noise.The low-pass corner frequency of the Butterworth pre-deconvolution filter is always selected on the basis of the sampling frequency such that it is always lower than the Nyquist frequency.
After an earthquake is located and seismic signals are filtered with the above procedure, M L is automatically evaluated using the Downloaded from https://academic.oup.com/gji/article/225/1/555/6044232 by British Geological Survey Keyworth user on 28 June 2021 algorithm proposed by Spallarossa et al. (2002).This method consists in generating synthesized Wood-Anderson seismograms from horizontal component digital recordings and applying a calibrated attenuation function for the monitored area (Di Bona 2016).M L is obtained separately for each seismic station and the event magnitude is estimated by averaging the data from all available stations.Only stations with reliable P-or S-phase picks (i.e. with NonLinLoc associated location weight greater than zero) are used.
After this initial M L calculation, a further quality selection is performed.At first, to overcome any biased M L computation due to waveform saturation and/or distortion and possible near-field effects, seismograms recorded by stations close to the epicentre are discarded.The procedure checks the data for each hypocentre-tostation distance, and applies a threshold defined by an empirically calibrated magnitude-distance relation.After extensive testing, this distance threshold has been assumed to span between 5 km for earthquakes with M L = 1.0 and 50 km for earthquakes with M L > 5.0.Secondly, potential bias due to recordings with low signal-to-noise ratio and/or those affected by unknown attenuation effects for large distances are avoided by discarding data from stations far from the  station magnitude values to reduce the influence of potential outliers.Otherwise, the initially computed M L value remains valid.All of these steps are automated, reproducible and consistently applied once the control parameters have been chosen.

R E S U LT S
Fig. 12 shows the spatial distribution of all the events contained in the enhanced earthquake catalogue obtained by the methods described in section 2. The plot is divided into four columns, one for each location class based on the quality factor (qf) defined above (from class A to class B, C and D).The cross sections are drawn across the largest events of the sequence.Section A crosses the south the cluster of four M w > 5 events of January 18, sections B and C cut through the M w 6.0 Amatrice and M w 6.5 Norcia main shock, respectively, and the northernmost section D crosses the location of the M w 5.9 Visso earthquake.
The enhanced catalogue consists of 440 697 events, with-1.0≤ M L ≤ 5.9, and covers the first year of the Central Italy seismic sequence, starting from the 24 August 2016 (the date of the first Amatrice main shock) to the 31 August 2017 (the date of the removal of the BGS temporary seismic stations).
Despite the differences in quality, the general pattern of the seismicity in the different classes is remarkably consistent in both horizontal and vertical projections (Fig. 12).The locations for the two best quality classes (A and B), are absolutely comparable to those based on manual phase picks by Michele et al. (2016), Chiaraluce et al. (2017) and Improta et al. (2019).The general pattern is also clearly resolved by the lower quality data (C and D) at the kilometre scale of the diagram.
In order to quantify the clustering characteristics of the RefCat and of the new catalogue, we subdivided the whole volume interested by the sequence in 242 000 cubic cells with 1 km long edges, and counted the events falling in each cell.For the RefCat, 83 326 events fell in the selected volume and they occupied 12 950 cells (5.3 per cent of the total); 90 per cent of the earthquakes used just 6201 cells (2.6 per cent of the total).If we select just the A-class events (qf < 0.25) of our new catalogue, we obtain 140 122 events in the selected volume and they use 11 369 cells (4.7 per cent); 90 per cent of the events is limited in 4328 cells (1.8 per cent).In conclusion, even taking into account just the best quality locations, we obtain a higher number of events in a source volume more reduced with respect to those of the RefCat.If we also include the B-class events (quality factor < 0.50), we obtain 281 017 events, using 18 671 cells (7.7 per cent); 90 per cent of these are limited to 5717 cells (2.4 per cent).This selection reports a number of events 3.37 times higher than the RefCat and still with a higher clustering factor.It must be remembered that the RefCat is based on manual phase pickings, while the high resolution catalogue is fully automatic.If we analyze the whole new catalogue (434 596 events in the volume), we fill 39 223 cells (16.2 per cent), but the majority (90 per cent) are in 10 839 of these (4.5 per cent).As expected, and as evident in the maps and in the cross-sections, the lower quality events are more dispersed forming a diffuse cloud around the whole area.Nevertheless, even for the whole catalogue the majority of the events are concentrated in a limited volume, indicating significant localization of seismicity-well located small events do not stray too far from the volumes affected by larger ones.
The quantitative contribution of this dispersed seismicity to our knowledge of the process is a matter for debate, mainly due to the low quality of their locations.To investigate their effect, we extracted category D low-quality events (qf > 0.75) falling in cells with a minimum distance of 5 km from each cell populated by events from the A + B catalogue.These events are the most likely to have poor locations and hence are the most likely to be associated with a potentially artificial seismic 'cloud'.We obtain 327 events with this criterion, 0.07 per cent of the whole catalogue.This means that the wide cloud of seismicity recognizable in maps and cross-sections (Fig. 12) of D-quality events is generated by rather few poorlylocated events.Moreover, if we compute the mean quality factor of this sub-set of events, we obtain a value of 0.97 (qf ranges from 0 for the best location to 1 for the more unstable ones).It is evident that this sub-set of events belongs to the tail of our error distribution, and do not add to our understanding of fault architecture or potential seismic zoning in the region.
To investigate what these locations represent in more detail, we visually inspected some of their waveforms.We found a few examples of real earthquakes located far from the sequence source area (the true sparse background seismicity), but mainly the cloud is comprised of mislocated events, often linked to nearly-contemporaneous events not correctly managed by the automatic picking procedure.The cloud is therefore an artefact of poorly located events.This quality check confirms that the quality factor is a good discriminant for the reliability of the location.For applications related to the reconstruction of structure geometries we would thus recommend using only the best-quality events (low qf) while for a purely statistical analysis (i.e.b-value analysis), when a mislocation of some km could be not critical, the whole catalogue may be used.
The enhanced seismicity catalogue provided by the CASP procedure provides a much clearer picture of the activated structures than before.For example, we observe 1) new geological structures, for example a fault located in the footwall of the system in section C not detected in the previous work and 2) shallow (< 5 km of) depth earthquakes that partly fill portions of the shallower crustal volume previously characterized by low seismic activity (e.g.sections B along the Amatrice fault plane (cf.Michele et al. 2016Michele et al. , 2020;;Chiaraluce et al. 2017;Improta et al. 2019).
All the main events with M w > 5.0 are well located by the CASP algorithm, with the single exception of the first Amatrice main shock (M w = 6.0) which has an estimated depth of 0 km.This was most likely a consequence of (i) a lower number of stations (at this stage just the permanent ones were available) and (ii) the low number of strong motion sensors with non-clipped data.In combination, these resulted in a non-optimal location, so we excluded it from the final catalogue.
Our results clearly demonstrate the high performance and robustness of the automatic CASP procedure (Scafidi et al. 2019) in retrieving a large number of well-constrained events, along with better estimates of their location, and a more complete catalogue at low magnitude.To illustrate the improvement in completeness, the frequency-magnitude distributions for the RefCat and for the enhanced catalogue are shown in Fig. 13.The estimated completeness magnitude for the enhanced catalogue is close to M L 0.6, indicated by the upper end of the red line, is almost one magnitude unit better than that of the RefCat catalogue (M L 1.4 indicated by the upper end of the blue line).The b and a value (see Fig. 13) of the G-R relationships and their respective uncertainties are computed using a maximum-likelihood assessment.The b value for the enhanced catalogue is 0.965 and for the RefCat is 1.110.The significant difference in the b value and in the cumulative number of events in the magnitude range 1.4-2.4,apparently complete for both catalogues, in our opinion is related to an overestimation of M L induced by the procedure used in the routine INGV analysis for the Ref-CAT catalogue, particularly for low-magnitude events.Indeed the adoption, in the CASP method, of a high-pass filter adapted to the signal-to-noise ratio allows to reduce the possible bias on magnitude estimates of the microseismic noise recorded by broad-band sensors and superimposed to the earthquake signal.

D I S C U S S I O N
High-resolution earthquake catalogues will allow us to improve future operational earthquake forecasting by extending the retrospective experiments where the focus is to improve our understanding for earthquake triggering mechanisms.One of our main motivations was to prove the concept that CASP would allow us to provide high-resolution earthquake catalogues in near-real time.In practice, CASP is able to detect and locate more than 2.8 events per minute and took less than 12 hr of computation time on a standard workstation equipped with an Intel Core i7-7700 CPU using 4 parallel threads to analyse 1 d of data containing more than 3000 detected events.Most of this computation time is spent by the Non-LinLoc location procedure because several iterations of picking and location for both P and S phases are required to achieve a stable and reliable result.The 12-hr computational time is critical since it will allow effectively real time catalogues to be produced, even for operational aftershocks forecasts updated daily during a large seismic sequence.The improvement in magnitude of completeness threshold of the automatic procedure will benefit the current Operational Earthquake Forecast scheme in Italy (Marzocchi et al. 2014) since the predictive power of short-term clustering models is directly related with the inclusion of small magnitude events enhancing secondary earthquake-to-earthquake triggering.We expect that the benefits from such catalogues will extend to retrospective efforts investigating the temporal evolution of earthquake sequences through the estimation of statistical parameters, such as the b-value of the frequency-magnitude distribution that has been extensively used in a variety of seismotectonic environments to predict hazardous behaviour through b-value decreases (e.g.Tormann et al. 2015 for tracking post-2011 seismicity in Japan).The lower magnitude of completeness is one of the major advantage of the enhanced catalogue, because it provides a broader bandwidth of observations with which to test the hypothesis of a scale-invariant (power-law) distribution of source rupture area, seismic moment or energy inferred from the magnitudes.The broader bandwidth of observations in principle also reduces the uncertainty in a and b (e.g.Main 1996), both critical parameters in both probabilistic seismic hazard analysis (where they are assumed stationary) and operational earthquake forecasting (where they may change with time).However, the results presented here indicate there are some outstanding issues to address in terms of systematic effects magnitude determination and scaling before this potential can be realized.
Despite these results, there is still some work to do in order to increase the resolution power during the busiest time windows (e.g.hours-days) of aftershock activity associated with the occurrence of the largest events.As shown in Fig. 14, the mean magnitude of the events within the CASP catalogue, increases from a background value around M L 0.5 up to M L 2.5 soon after the occurrence of all of the M w > 5 events (indicated by the black vertical lines).This is particularly evident around the end of October 2016 when two large events occur within a short interval time, that is the Visso 26th and Norcia 30th of October shocks.These early aftershock phases are the only times when the detection capability of the CASP method becomes almost comparable to that of the Italian seismic monitoring room.
We have highlighted several important outcomes achieved by using the CASP procedure.Nevertheless, it is also worth remembering that one of the key controls on resolution is the availability of a large number of densely spaced 3-component seismic stations.To exemplify this, Fig. 15 shows the evolution of the mean quality factor qf for the hypocentre locations defined in Section 3.3, as a function of the number of available stations.The number of available stations ramps up while the temporary network was under installation in the early days of the sequence, while it winds down when they are removed towards the end.Overall, there is a clear inverse correlation between the number of stations used in the location and qf.For example, qf decreases from around 0.65-0.45when the temporary network stations have been deployed at the start of the time period shown, and then increases again to 0.6 when the stations are removed towards the end of the time window.This improvement in quality with respect to the number of stations is not a surprise, but Fig. 15 confirms how the impact is significant in this case.
CASP is only one of the innovative techniques currently able to provide very large earthquake catalogues.Template-matching approaches (TM; Shelly et al. 2007;Peng & Zhao 2009 among others) for example, exploit the similarity of earthquake waveforms between closely-spaced events, allowing the detection of earthquakes previously hidden in the noise.This approach has successfully been applied in a number of examples, including induced seismicity environments (Skoumal et al. 2015).A major disadvantage is that the method is insensitive to event templates that are absent from the starting catalogue, such as events occurring in volumes that were previously inactive.Template matching of itself does not produce absolute P-and S-arrival times, and hence does not allow the absolute location of all the newly detected events.Instead, such catalogues often quote relative locations associated with phase shifts identified in the cross-correlation procedure.As a consequence, while it is usually possible to improve the number of detected events by more than a factor of 10, the number of well-located events typically increases only by a factor of 2-5 (Diehl et al. 2017 andRoss et al. 2019), compared to a factor 4-5 here with the CASP procedure (Fig. 6).
More recently, profiting of the dramatic progress made by neural networks in deep machine learning, new approaches have been proposed for both phase detection and picking (Ross et al. 2019;Zhu & Beroza 2019).At the cost of a non-trivial training stage, needed to teach the system how to recognize body waves arrival times, the major benefit is to extract a very large number of probability distributions for the presence of a P wave, S wave and noise in continuous waveform data.By applying a series of filtering and decimation operations, these features are automatically extracted and classified.These new systems seem to be able to work properly on relatively new data, whose characteristics may differ from those in the training set.Once the seismic phases have been detected, there is still the need to generate phase association, a challenging task given the amount of data collected at a large number of closely spaced stations during a seismic sequence, and often requiring tuned with user-defined parameters.
Our approach is able to detect P-and S waves absolute arrival times, allowing the accurate location and magnitude computation for hundreds of thousands of both very small events (down to negative magnitudes) and events occurring in areas and fault portions previously silent.A first-order comparison of our catalogue with those generated by both template matching and deep learning approaches, demonstrates that our approach performs well, at least in terms of detection rate.Our catalogue covers a small area of about 7000 km 2 for 1 yr of seismic activity, and we compare its performance to that of the QTM catalogue (Ross et al. 2019(Ross et al. ), covering 10 yr (2008(Ross et al. -2017) ) of seismic activity for Southern California generated by the template matching approach.Ross et al. (2018) identified about 495 earthquakes per day across the region with an average time of 174 s between events.Our mean detection rate is about 1230 per day, with an average time between events of 70 s.During periods of intense seismicity, with peaks of more than 3000 events per day, the average inter-event time is about 30 s (Fig. 6).These values are comparable to those obtained by Ross et al. (2019) using their deep learning approach for the first 12 hr of the 2016 Bombay Beach sequence; resulting in about 1000 events in 12 hr and an average time between events of about 40 s.In future work,  we will conduct a benchmark test of the different methods on the same data, but this preliminary comparison demonstrates that our approach produces broadly comparable performance to that of other state-of-the-art techniques, acknowledging the relative advantages and disadvantages of each.

C O N C L U S I O N S
The automated phase-picking method provided by the CASP algorithm, applied to data from a high-resolution seismic network, generated significant improvements in the number of events located and the magnitude of completeness in the case of the Norcia-Amatrice earthquake sequence.The procedure is rapid, taking less than 12 hr to analyse the primary waveform data and to detect and locate 3000 events during the most seismically active day of the sequence.This means it can be used to inform decisions that need to be made in near real time on a time scale of around a day or so, and in principle to provide a basis for regularly updated operational forecasting of future events during a seismic sequence.The procedure was validated by comparison of the derived data for phase picks and earthquake parameters with a reference catalogue based on initial manual phase picking.The results confirm a high degree of accuracy in the automated procedure, with an average estimated formal error of 0.04 s in P-phase picks, 0.08 s in S-phase picks, 0.9 km in epicentral location and 1.5 km in depth, with most events having uncertainties well within these ranges.The quality of the data is strongly correlated with respect to the number of available stations and the magnitude of the events.With the exception of periods with many overlapping events early in the aftershock sequences of the largest events, the magnitude of completeness is reduced from 2.5 for the standard Italian catalogue to around 0.6.Together these provide a significant improvement in the resolution of fine structures such as local planar structures and clusters, the identification of shallow events in a previously inactive part of the crust, and potential improvements in estimates of statistical parameters used in probabilistic seismic hazard analysis and operational earthquake forecasting

Figure 1 .
Figure 1.Map of the study area.Black circles represent earthquake epicentre locations in the 5 yr before the Amatrice Main shock (23 August 2011 to the 23 August 2016; from the Italian bulletin).The circle diameter scales with earthquake magnitude in M0.5 steps.The red stars associated with the black/white focal mechanisms shown are the locations of the two events with M w > 6.0 that occurred in the area in the last 30 yr (M w 6.0 Colfiorito 1997 to the north and M w 6.1 L'Aquila 2009 to the south).The red stars with red/white focal mechanisms are the locations of the two events with M w > 6.0 that occurred during the (Central Italy) sequence studied here.Yellow triangles are the permanent seismic stations of the Italian seismic network managed by INGV.

Figure 2 .
Figure 2. Final seismic network configuration (left-hand panel), colour-coded by type, and the epicentral area (coloured in yellow) and overlapping, colourcoded, sub networks defined in the main text (right-hand panel).

Figure 3 .
Figure 3. Flow-chart of the CASP automatic procedure followed in this work.

Figure 4 .
Figure 4. Representation of the static stations corrections for P (left-hand panel) and S waves (right-hand panel) as colour coded circles (grey and white for negative/positive corrections, respectively), containing the station name and the correction value (in seconds); the circle size is scaled with the correction value.

Figure 5 .
Figure 5. Map view of the seismicity distribution comprising all the 440 697 events retrieved by this study by analysing 1 yr of recorded data, with focal mechanisms as indicated.Colour coding is the same as for Fig. 1.

Figure 6 .
Figure 6.Number of events per day in the enhanced catalogue (black) versus RefCat catalogue (red).

Figure 7 .
Figure 7. (a) Number of P (blue) and S (green) arrival times per day; (b and c) incremental distribution of P-and S-arrival times estimated errors, respectively, with cumulative probabilities in yellow.

Figure 8 .
Figure 8. Example of automatic P and S phase picking for two stations.For the temporary station T1256, automatic P-picking was not possible due to the low SNR, nevertheless the S arrival was correctly recognized.

Figure 9 .
Figure 9.Comparison between automatic and manual picks and locations for the selected representative sample of events.(a) Comparison of P picks times (automatic-manual).Left-hand panel: whole data set; right-hand panel: estimated picking error <0.04 s.(b) Comparison of S picking times (automaticmanual).Left-hand panel: whole data set, right-hand panel: estimated picking error <0.08 s.(c) Comparison of automatic and manual locations.Top row: estimated horizontal location error (erh) <3 km and vertical error (erz) <5 km; bottom row: erh <1 km and erz <1 km.Left-hand panel: depth difference (automatic-manual, km); central panel: depth distance (km); right-hand panel: epicentral distance (automatic-manual, km), showing incremental and (in red) cumulative probabilities.

Figure 10 .
Figure 10.Statistical distribution of the location parameters used as input to evaluate the location quality factor.

Figure 11 .
Figure 11.(a) Mean number of phases (brown) and mean quality factor (purple) as a function of magnitude.(b) Mean number of events versus magnitude, divided by quality classes.(c) Mean quality factor (qf) versus magnitude, divided by quality classes.

Figure 12 .
Figure 12.Maps and cross-sections of the events divided by the classes defined in section 3.3.

Figure 13 .
Figure 13.Frequency magnitude distributions for the enhanced catalogue (red) and RefCat (green).The b and a values and their respective uncertainties are computed using a maximum-likelihood assessment

Figure 14 .
Figure 14.Mean local magnitude (M L ) in the enhanced catalogue (black) versus mean local magnitude in the RefCat catalogue (red).Dark and light grey points represent, respectively, the daily maximum and minimum of the local magnitude in the enhanced catalogue; orange and yellow represent the daily maximum and minimum of the local magnitude in the RefCat catalogue.

Figure 15 .
Figure 15.Mean quality factor (black) and number of stations (red) versus time for the enhanced catalogue.

Table 2 .
Parameters involved in the STA/LTA triggering procedure.

Table 3 .
Subnetworks name, configuration and related parameters involved in the STA/LTA triggering procedure.