Sources of uncertainties and artefacts in back-projection results

earthquakes, composed of multiple subevents with varying focal mechanisms, BP tends to image subevents emanating large amplitude coherent waveforms, while missing subevents whose P nodal directions point to the arrays, leading to discrepancies either between BP images from different arrays, or between BP images and other source models. Using the Java event, we investigate the impact of 3-D source-side velocity structures. The 3-D bathymetry together with a water layer can generate strong and long-lasting coda waves, which are mirrored as artefacts far from the true source location. Finally, we use a Wharton Basin outer-rise event to show that the waveﬁelds generated by 3-D near trench structures contain frequency-dependent coda waves, leading to frequency-dependent BP results. In summary, our analyses indicate that depth phases, focal mechanism variations and 3-D source-side structures can affect various aspects of BP results. Thus, we suggest that target-oriented synthetic tests, for example, synthetic tests for subduction earthquakes using more realistic 3-D source-side velocity structures, should be conducted to understand the uncertainties and artefacts before we interpret detailed BP images to infer earthquake rupture kinematics and dynamics.

BP methods can be roughly divided into beamformingstyle methods and frequency-domain methods. Beamforming-style methods include conventional beamforming , Nthroot stacking ), Phase-weighted stacking Tan et al. 2019), Hybrid Back Projection (Yagi et al. 2012), Image Deconvolution Back Projection (Wang et al. 2016), etc. The idea behind beamforming-type methods is straightforward and intuitive: these methods take advantage of reciprocal symmetry between sources and receivers, and determine the location and timing of the HF sources by stacking the back-projected waveforms to the source region. However, beamforming is burdened by the so-called swimming effect [i.e. strong artefacts swimming towards the array (Ishii et al. 2007)], although strategies like the reference window and station (e.g. Zhang & Ge 2010) and crosscorrelation (e.g. Yagi et al. 2012), can lessen this effect. In contrast to the beamforming-style BP, frequency-domain methods, such as MUltiple SIgnal Classification [MUSIC (Meng et al. 2011)] and Compressive Sensing [CS (Yao et al. 2011)], assume non-coherent signals from various sources and noise, and search for the most coherent radiators in the frequency domain. MUSIC and CS are much less affected by the swimming effect, and are considered of higher resolution (Yao et al. 2011;Meng et al. 2012a), although their implementations are usually more sophisticated, requiring much higher computational costs.
While different BP methods produce generally similar results for great earthquakes when similar data sets are used, these results, in many cases, are interpreted inconsistently. For instance, based on their BP results for the 2012 M w 8.6 Wharton basin earthquake sequence, Fan & Shearer (2016a,b) proposed that a large earthquake could dynamically trigger near-source aftershocks, while Yue et al. (2017) contended that the detected early aftershocks could be ascribed to the artefacts produced by the coherent reverberations of seismic waves in the sea water. Similarly, several BP results derived from the North American array (e.g. Koper et al. 2011;Wang & Mori 2011) suggest the frequency-dependent rupture process of the 2011 Tohoku earthquake, while Meng et al. (2012a)'s BP results for synthetic and real data showed that the frequency drift phenomenon in the 1-8 s band could be largely explained as the array-related swimming effect in beamforming style methods and relatively weak P waves. The discrepant interpretations raise the question how to assess the BP results, or whether (and to what extent) BP results are biased or contaminated by potential artefacts from other sources and structures. Therefore, to better understand BP results and use them to infer rupture kinematics and dynamics, we should have a better understanding of the uncertainties of BP results. In general, uncertainties of BP results are contributed from the BP methods and the data. The uncertainty of BP methods can be expressed as the array response function (ARF), which shows to what extent the delta pulse input in the source region can be recovered by BP, as discussed in some studies (e.g. . For this type of uncertainty, it is possible in some cases to obtain the error bars (e.g. Meng et al. 2012b) to show the confidence level, using strategies like bootstrapping. The uncertainty contributed from the data can be divided into two categories. The first category is related to the traveltime error, which is in general caused by the source-side 3-D velocity structure. Such traveltime error can lead to the mislocation of the high frequency radiator.  analysed BP results of relatively small events in Japan subduction zone (i.e. M w 6 + earthquakes) and showed the traveltime error might cause ∼20 km location error. The location error can be either random, if the traveltime error is caused by small scale velocity structure heterogeneities (e.g. , or systematic, if the traveltime error is caused by the lithospheric scale velocity structure variation (e.g. Liu et al. 2017). The second category along with other factors is related to the waveforms complexities that are caused by any source but rupture process. This type of waveform complexity is the main target of this paper, which will be explained in greater details.
Efforts to lessen the uncertainty of BP methods have also been proposed. Meng et al. (2016) proposed the use of aftershocks to calibrate the traveltime error caused by the source-side 3-D velocity structure, and applied it to the 2015 M w 7.8 Gorkha, Nepal, earthquake. Meng et al. (2016) showed that path calibration based on aftershocks could reduce the discrepancies of BP results derived from different arrays. Also, Liu et al. (2017) showed that using a 3-D global P-wave tomography velocity model could reduce the error in traveltime calculation, and thus improve the consistence between different arrays. To mitigate the interference from the depth phases, Yagi et al. (2012) proposed a hybrid BP method, which cross-correlates the teleseismic waveforms with theoretical Green's functions (GFs), and then back-projects them to derive rupture images. Similarly, deconvolution BP proposed by Wang et al. (2016) extracts the rupture evolution of the main event by deconvolving beamforming waveforms of the main shock from a reference event (e.g. an aftershock).
However, these previous studies have not yet fully addressed the cause of waveform complexities, an intrinsic deficiency in BP methods that is originated from the oversimplified GF assumption. Any systematic perturbation of the wavefield stemming from factors other than the simple rupture may generate uncertainties and artefacts in the BP results. For example, besides direct P waves, other following phases (e.g. pP, sP depth phases) exhibit strong and coherent waveforms that may generate artefacts. Although the aforementioned convolution and deconvolution efforts (e.g. Yagi et al. 2012;Wang et al. 2016) are aiming at reducing the contamination from depth phases, they are subject to the accuracy of either theoretical or empirical GFs. In the cases of large earthquakes, the depth range of the rupture is always much larger than that of the aftershocks, and usually composed of multiple fault segments with different geometries and slip directions. Therefore, the improvement based on simple point source GF convolution or deconvolution is limited. Besides, velocity structure complexities (e.g. topography/bathymetry, coexistence of solid and liquid materials, and heterogeneous velocity structures) are known to produce strong and coherent P waves, which could likely be mapped onto source processes. For instance, the long-lasting coda of the 2015 M w 8.2 Illapel earthquake triggered debates on whether its source duration is 100 s (Melgar et al. 2016) or more than 200 s (Lee et al. 2016). Yue et al. (2017) also showed that seismic wave reverberations in the water layer above the sharply varying bathymetry could produce strong BP signals, which could then be misinterpreted as aftershocks or rupture process. In addition, structural complexities could produce frequency-dependent path effects that may also contaminate BP imaging. Indeed, frequency dependent energy radiation from an earthquake is evidenced in a few cases (e.g. Simons et al. 2011;Avouac et al. 2015;Yao et al. 2011), but interpreting discrepancies between BP results at different frequencies as frequency-dependent ruptures could be biased by the frequency-dependent wave propagation effects (e.g. Okamoto et al. 2018;Wu & Irving 2018). Furthermore, it is quite common to see different arrays produce inconsistent BP results, which are usually attributed to different Pwave radiation patterns towards different arrays, and results from different arrays are simply merged to produce a final rupture image (e.g. Meng et al. 2011;Zhang et al. 2017). Recently, Yin & Denolle (2019) investigated the relationship between the array resolvability and the earthquake kinematic rupture models, through a synthetic test approach. This work is inspiring, however, their synthetic calculation is based on a homogeneous full space velocity model, therefore the depth phases and source-side 3-D structure are not incorporated.
In general, an advantage of BP methods is little requirement of prior information of sources and/or structure complexities by intrinsically simplifying or ignoring these complexities, allowing quick imaging of the earthquake rupture process. However, reality is that any complexity precluded by BP is complicating seismic waveforms, and therefore could be back-projected as artefacts onto BP images, bringing difficulties and ambiguities into interpreting BP results. How these complexities influence BP images should be investigated and understood in more details. A way to improve the understanding of BP results is to combine the advantage of BP methods with our knowledge of sources and structures as much as possible. As we stated above, various factors have been recognized as the potential sources of BP signals and some of them have been discussed since the first BP result was published. However, they have not yet been thoroughly investigated in a quantitative way as it is difficult to isolate every origin of seismic wiggles as well as their BP images. To get around these difficulties, we adopt a synthetic test strategy, in which we can test the impact of each influencing factors by back-projecting the corresponding synthetic data. We investigate the uncertainties caused by depth phases, focal mechanism variations and 3-D source-side velocity structures, which can all produce coherent teleseismic P waves (Fig. 1). To test these factors one by one, we design several scenarios with various sources and velocity set-ups to generate synthetic seismograms, which are then backprojected using the MUSIC method (see more method details in the supplement). The BP results are then compared with the known inputs, to understand the sensitivity and impact of these factors. In each test, representative real earthquakes are also analysed, to compare and verify the results from synthetic tests.

B P M E T H O D , E A RT H Q UA K E S A N D A R R AY C O N F I G U R AT I O N S
To minimize the swimming effects in beamforming style BP methods and to reach higher resolution, we adopt a frequency domain BP method, MUltiple SIgnal Classification (MUSIC) method (Meng et al. 2011, Fig. S1). During waveform data processing, we first filter the synthetic data to a specific frequency range (e.g. 0.1-1 Hz) and normalize each trace by its individual standard deviation. Then, we align the first 5 and ∼10 s of P waveforms for synthetic data and real data, respectively, using the Multi-Channel Cross Correlation method (Vandecar & Crosson 1990;Lou et al. 2013), and keep only the data with identical first P polarity and with cross-correlation coefficients greater than 0.8 with a reference waveform. We then discretize the source region into small grids, and back-project the teleseismic P waves to the grids to determine the most likely sources. We apply this procedure to both synthetic and real data.
We test scenarios for four representative regions, based on historical earthquakes and available seismic arrays. We select the Tohoku region, Japan (Fig. 2a) to test the influence of depth phases. This region is surrounded by three large aperture arrays (North American array, European array and Australian array), which provide good azimuthal coverages. For focal mechanism variation test, we select the Kaikoura region in New Zealand (Fig. 2b), as that is the location of the 2016 M w 7.8 Kaikoura earthquake, which is characterized by multiple fault segments with various fault geometries and is considered to be one of the most complicated events in modern observational era. Finally, for the source-side 3-D velocity structure tests, we choose the Java and the Sumatran subduction zones in Indonesia (Figs 2c and d), which are characterized by complicated velocity structures (e.g. coexistence of solid, deep water and strongly varying bathymetry), that are common features for most subduction zones.
In the depth phase and focal mechanism tests, we use a simple 1-D velocity model for the source region to generate the synthetics with a fast numerical code tel3 (Kikuchi & Kanamori 1991;Qian et al. 2017). In 3-D source-side velocity structure tests, we calculate the corresponding 1-D and 3-D synthetic waveforms using a hybrid numerical method named SEM-DSM . This method uses the spectral element method (SEM) to compute wavefields at the boundaries of the source-box, and then propagates them to teleseismic distances using the 1-D GFs pre-computed by the direct solution method (DSM). This hybrid method is able to compute high frequency teleseismic P waves efficiently, which is critical for our purpose.

Depth phases tests
To generate synthetic waveform data, we use a shallow dip angle thrust focal mechanism (Fig. 3c) with a source duration of 1 s at the epicentre location of 39 • N, 142 • E (Fig. 2a). We place the source at the depth of 10 and 30 km, respectively. With a 1-D velocity model, we compute synthetic waveforms for stations in the North American array, European array and Australian array (Fig. 1a). After filtering the waveforms to 0.1-1.0 Hz and processing the waveforms following the procedure mentioned in Section 2, we back-project them (Figs 2a and b) to the source region. Fig. 3 displays the synthetic waveforms (a-b) and the corresponding BP results (c-h). For both events and all three arrays, direct P waves are very coherent (Figs 3a,b), and as expected, they are backprojected as HF radiation signals around (<5 km) the input location (Figs 3c-e). Besides, depth phases (pP, sP) for both events are also coherent at these three arrays, and their strengths are comparable to the direct P waves (Figs 3a and b). As a consequence, these delayed phases are mirrored as artefacts with timing consistent with the arrival times of the depth phases. These artefacts migrate systematically towards the arrays and the offsets are proportional to the depths of the input sources, that is, a ∼7 km offset at ∼5 s after the direct P waves for the 10 km source depth, and a ∼20 km offset at ∼10 s after the direct P waves for the 30 km source depth. The ray path analysis show that the depth phases are back-projected to their reflection points on the surface (Fig. S2). We also observe array discrepancies, that is the artefacts from the North American and Australian arrays migrate further than those from the European array. This is because the pP reflection point (pP phase dominates the North American Array) is located further away from the input epicentre compared with the sP reflection point (sP phases dominate the European array, and pP and sP co-dominate the Australian Array) (Figs 3a, b, Fig. S2). The estimated rupture speed, if we treat the depth phase images as subevents, is ∼1.5-2.0 km s -1 , which is within the range of rupture speed of many earthquakes and therefore can easily mislead the rupture speed estimation.
To better illustrate the impact of depth phases, we also select two earthquakes with moderate magnitudes in the Tohoku region, and back-project their teleseismic P waves recorded by the same arrays as in the synthetic test. One of these earthquakes is an M w 6.7 normal event ( Fig. 4c), which occurred on 11 April 2011, in the inland region of Honshu with a centroid depth of ∼13.0 km (GCMT), and is considered as a shallower event. The other is an M w 6.8 thrust event (Fig. 4d) that occurred on 12 May 2015, offshore Honshu, with a centroid depth of ∼43.9 km (GCMT) and therefore considered as a deeper event. After aforementioned data processing, we back-project the selected waveforms to the source regions using the MUSIC method. As shown in Figs 4(a) and (b), the coherence and strength of depth phases in the real data are similar to that in the synthetic data, although they appear a bit noisier and less coherent. The depth phases in the waveforms of the shallower event are bit more difficult to discriminate from the direct P wave, as they are mixed together. The waveforms of the deeper event exhibit well separated depth phases in the North American and Australian arrays, but only the direct P phase is visible on the European array, due to weak depth phases.
The BP results of the two events are shown in Figs 4(c)-(h). For the shallower earthquake, the recovered HF radiators from all the three arrays are compact, located within 10 km of the epicentre, in which the depth phase is imaged to the location very close to that from the direct phase, as expected from the waveforms. For the deeper earthquake, the BP results from the Australian and American arrays show a similar array-wards migrating depth phase pattern, similar to that in the synthetic tests. These artefacts are located ∼50 km (the North American array, Fig. 4f) and ∼30 km (the Australian array, Fig. 4h) away from the epicentre, at ∼20 s after the beginning of the rupture, which is again consistent to the synthetic tests, leading to a rupture speed of ∼2 km s -1 . The European array results do not suffer from this artefact because of the weak depth phases (Fig. 4b). Note, that the swimming direction ( Fig. 4h) is slightly different from that observed in synthetic test (Fig. 3h), that could be caused by the azimuth distribution ( Fig. 4b) of the used array is not exactly the same as that in the synthetic tests (Fig. 3b). The difference of the focal mechanisms between the real event and that used in the synthetic test could also make some contributions.

Focal mechanism variation test
We select the 2016 M w 7.8 Kaikoura earthquake, New Zealand, to test the rupture scenario that involved complex subevent focal mechanisms. This earthquake was one of the most complex events, and ruptured at least 12 fault segments with varying fault geometries and slip directions, as evidenced by geodetic and seismological data (Wang et al. 2018). Several BP results for this earthquake from different arrays all imaged a unilateral rupture towards the northeast Xu et al. 2018), however, these results differ significantly in the imaged termination location. For example, the Australian array shows termination around the Papatea fault, while the South American array results extend further northeast along the Kekerengu fault , which agrees better with the surface rupture shown in geodetic and field observations (Wang et al. 2018). To describe the kinematic rupture process of earthquakes, these differing BP results are usually simply weighted and stacked to form the rupture image of the earthquake , as is usual for other cases (e.g. Meng et al. 2012b).
The array dependent BP results are likely caused by focal mechanism variations as the rupture progressed, with the last ruptured segment on the Kekerengu fault dominated by strike-slip motion, which differed from the dominant motions on beginning part of the rupture. To further verify and quantify this explanation, we use a multiple point source model simplified from the finite fault model of Wang et al. (2018) and generate synthetic data for tests. For each segment in Wang et al. (2018), we use a point source with a focal mechanism, timing and horizontal location determined by averaging the parameters on each fault segment, except for the largest plate boundary fault segment, which is represented by three point sources (Fig. 5b). To simplify the analysis and focus on the impact of the varying radiation pattern, we fix all the point sources at a depth of 5 km to reduce the complexities introduced by depth phases (e.g. pP and sP). Each point source is assigned the same 1-s source duration and the same 6.0 moment magnitudes. We use this set of multiple point sources to generate 1-D synthetic data (Fig. 5a) for the South American and Australian arrays. These data are then back-projected to the source region by the MUSIC method.
Figs 5(b)-(d) displays the BP results from the Australian and South American arrays. In the first 50 s, the two results are quite similar, both having recovered the source locations between the epicentre and the Papatea fault. However, for the last 20 s of the rupture, the Australian array images only the last point source with very weak energy. In comparison, the South American array BP recover the last source with much stronger energy. Decomposing the total waveforms into contributions of individual source ( Fig.  Figure 3. (a-b) Selected velocity waveforms (0.1-1.0 Hz) for the 10 km (a) and 30 km (b) source depths on the European (red), North American (blue), Australian (black) arrays. (c-h) BP results of waveforms in (a-b) with the array names and input source depths indicated. The beach ball in c shows the focal mechanism of the synthetic earthquake. The black stars denote the epicentres. The circles are BP results, scaled by the peak stacked-power and coloured by the centre time of BP time window relative to the hypocentral time. S3) reveals that the strength of the last point source shows much larger amplitude in the South American array than the Australian array. In the first 40 s, the timing and amplitude of the normalized beamforming power, which is obtained by tracking the slant-stacked HF waveforms in the BP radiators' location, are quite consistent for the two arrays (Figs 5e and g). However, in the later time window (50-60 s), the peak of the beamforming power appears about 10 s earlier in the South American array than the Australian array. This is also primarily caused by focal mechanism variations, that is, the Australian array receives stronger energy from the 11th source, while the South American array records larger amplitude waves from the 9th and 10th sources (Fig. S3). The beamforming power at ∼70 s is much larger according to the South American array than the Australian array, due to the difference in sampling the radiation pattern for the 14th source (Fig. S3). Some sources are imaged with either no or weak power (e.g. the 6th and 13th sources from the Australian array, and the 7th and 13th sources from the South American array), because their waveform amplitudes are relatively small, or these sources are temporally and spatially close to sources with greater amplitudes. In general, the South American array recovers the input sources better than the Australian array. The shape of the beamforming power of the South American array is also slightly more similar to the strength of the moment-rate function (Figs 5c, e and g).

3-D source-side velocity structure test
The 3-D source-side velocity structure can significantly modulate seismic waveforms (e.g. by generating strong and long-lasting codas). This effect increases with frequency as the high frequency waveforms are more sensitive to fine structures (e.g. Okamoto & Takenaka 2009;Qian et al. 2019). In general, determining the impact of the 3-D source-side structure on BP as well as other source inversion procedures is very challenging in three aspects. First, distinguishing the coherent structure-induced signals from source signals is very difficult, especially in high frequency waveforms. Secondly, accurately calculating high-frequency 3-D waveforms is time consuming, particularly for teleseismic distances. Thirdly, a highly resolved structure model that would enable observations to be matched deterministically is usually rare. In this study, to distinguish structure-induced signals from source signals, we use simple point source or simplified finite fault models in our simulations. Next, to accurately calculate the teleseismic 3-D synthetic waveforms, we adopt an efficient hybrid numerical method, SEM-DSM, proposed by . Finally, although high-resolution 3-D velocity models are scarce, we use the available topography and bathymetry model, SRTM15 PLUS (Becker et al. 2009;Olson et al. 2014), resolved to ∼500 m, for our high-frequency waveform simulations. Our simulation details are described below and in Section 3.4.
For the 3-D source-side velocity structure test, we select the Java subduction zone, where large interplate earthquakes occur frequently, with many events rupturing very close to the trench (e.g. the 1994 M w 7.7 and the 2006 M w 7.7 tsunami earthquakes). This area, as well as other trench systems, is known to have an extremely complex velocity structure, characterized by sharply varying bathymetry, up to 10-km water layer, and usually thick sediments. In our simulations, we choose the source box to cover the rupture area of the 2006 M w 7.7 tsunami earthquake, which has been recorded by the dense Eurasian array. The dimension of the source box is 8 • × 8 • × 100 km (lon. × lat. × depth), which is large enough to ensure that the influence of numerical errors from box boundaries onto the BP is negligible (Fig. S4). In the source box, the water layer, bathymetry and topography are incorporated, with the PREM (Dziewonski & Anderson 1981) used as the background velocity model. For the sea region, we fill the sea with the property of water; for land area, we set the material properties above sea level to be the same as in the first layer in PREM. We do not change the interface geometry below the first layer, as the first layer in PREM has a thickness of 15 km and the velocity structure from the second layer is the same as PREM (Fig. S5). The bathymetry and topography data are extracted from the SRTM15 plus model (Becker et al. 2009;Olson et al. 2014). Here, we do not implement the complex heterogeneous velocity model in the simulation, because, first, no such 3-D velocity model is available in this region, and secondly, we focus mainly on the water reverberation phases, which are most sensitive to bathymetry and the water layer (e.g. Yue et al. 2017;Qian et al. 2019). More complex velocity models (e.g. sediments and slabs) can be incorporated in the future tests (see more details in the 'Discussion' section). In the simulation, we discretize the source box into 512 (latitude) × 512 (longitude) × 50 (depth) meshes, in which the water layer is divided vertically into five elements, which results in a numerical accuracy of ∼1.7 s. To compute the 3-D synthetics, we use a point source and place it at the USGS epicentre (9.284 • S, 107.419 • E) and the GCMT centroid depth (20 km) (https://www.globalcmt.org/CMTsearch.html) of the 2006 M w 7.7 Java earthquake with a source duration of 4 s. We then generate the 3-D synthetic waveforms in the Eurasian array (Fig. 2c). We avoid a complex source input here to minimize the confusion caused by both source and structure, so that we can focus on the 3-D structure impact. Synthetics are also calculated for the PREM without and with a 4-km water layer, which approximates the average water depth in this region. To simplify our description, we name these waveforms as 3-D, 1-D-solid and 1-D-water waveforms, respectively. To better understand the waveform features in both frequency and time domain, we filter these waveforms to 0.01-0.6 Hz, and conduct a continuous Morlet wavelet analysis, which is implemented in the Python package, Obspy (Figs S6-S9). The spectrogram of 3-D velocity waveforms displays a remarkable feature of high energy bursts in the 40-80 s time window with dominant frequency of ∼0.15 Hz (Fig. S6), while 1-D-solid and 1-D-water waveforms are much simpler, with negligible coda wave amplitudes after 40 s (Figs S8 and S9). This 0.15 Hz energy burst in 3-D waveforms is ubiquitous even when we use a shorter the source duration of 2.0 s (Fig. S7). Therefore, we filter the waveform to 0.1-0.5 Hz to cover the frequency that dominates the strong coda in the 3-D synthetics. We then back-project these synthetic data with the MUSIC method. Fig. 6(a) displays the waveform comparisons among 1-D-solid, 1-D-water and 3-D synthetic data. As shown, except for the additional phases in 1-D-water column that modulate the waveforms at 10-15 s after the first P wave, the 1-D-solid and 1-D-water waveforms are quite similar and simple. In contrast, the 3-D waveforms are much more complicated, characterized by strong coda waves that persist 70 s after the direct P waves. The strength of coda waves varies with azimuth, where the strongest coda waves are generated towards the northwest direction, and only weak coda waves are excited towards the northeast diction. Strikingly, the coda energy features an obvious temporal pattern, which can be roughly grouped into two stages. In the first stage, the coda energy follows the depth phases from 15 to 20 s after the direct P waves (the black pluses in Fig. 6a). In the second stage, the coda waves become even stronger start 40-50 s after the direct P waves, and last for about 30 s. The coda waves in the second stage are very coherent, and characterized by a clear azimuthal-dependent moveout (marked as the green pluses in Fig. 6a). This kind of temporal and spatial patterns may indicate varying origins of the coda waves.
The BP results for the 1-D-solid, 1-D-water and 3-D synthetic data are displayed in Figs 6(b)-(g). In line with their waveforms, back-projecting 1-D-solid and 1-D-water synthetics derives very simple and clean images, except that the depth phases are also imaged as artefacts, as we demonstrated in Section 3.1. However, the BP results from 3-D waveforms are complicated and intriguing. The HF radiators at 15-25 s are powered by stronger relative energy compared with that in the 1-D-solid result (the circle size in (c) is greater than that in (e), which corresponds to the first-stage coda waves likely generated by the water reverberation above the input source. The coda waves in the second stage are back-projected as an HF radiation swarm located ∼120 km NNE of the input source. The location of this swarm coincides with the sharp bathymetry variation. Note, that the water layer there is ∼1-2 km, corresponding to a resonance frequency of ∼0.2-0.3 Hz, indicating this swarm or the coda energy in this time window is generated by the water resonance along with the trapping effect caused by bathymetry. The normalized beamforming power exhibits strong energy from 40 to 90 s (Fig. 6c), in contrast with almost negligible power in the 1-D results (Figs 6e and g). If one assumes these signals as high frequency sources from rupture or triggered aftershocks, the apparent rupture or wave propagation speed is ∼3-4 km s -1 , which is close to the shear wave speed. These coda waves most likely resulted from the interaction between bathymetry and SV or even SH (3-D effect) waves, which were in that way then converted into P waves. Obviously, such structure-induced BP signals should not be interpreted as the rupture process or triggered aftershocks.
We also provide BP results derived from the real data of the 2006 M w 7.7 Java earthquake in the same frequency band as that in the synthetic data BP (Fig. 7). Detailed analysis of the kinematic process of the earthquake is not the aim of this paper. Instead, we provide a preliminary BP result to verify our synthetic tests. Intriguingly, the BP images multiple high-frequency radiators bursting after 140 s, which are located between the source and the island. These radiators roughly aggregate into two swarms, as highlighted by the black circles in Fig. 7(b). The spatial distribution of BP signals after 140 s does not align with those from the first 140 s, which show a clear linear feature more or less parallel to the trench, as highlighted by the red circle (S1) in Fig. 7(b). The location of some of these late signals (S3) is consistent with the artefacts in our synthetic test. Therefore, S3 is likely caused by the same structure effect. The artefacts at the coast appear at about 140-160s after the first P-wave arrival, which is much later than shown in the point source synthetics test (∼80 after P wave in Fig. 6c). A possible interpretation of this discrepancy is that these artefacts shown in the real data result are likely not caused by the earliest rupture but those at about 70 s, which is corresponding to the first peak in the high frequency power (Fig. 7c). Similarly, the artefact at about 180 s is likely corresponding to the second peak of the power at about 100 s. The origin of S2 is probably more complex and intriguing, as we do not see a signal from S2 in the synthetic tests. This could be caused by a basin structure that is not considered in our synthetic test, or as stated in , splay fault rupture in the overriding plate. To further clarify the source of S2 burst, more deterministic waveform analyses may be needed, but these are beyond the scope of this paper.

Frequency-dependence of BP results
Frequency dependence is a major characteristic of wave propagation, because different frequency components of waveforms, which represent different sensitivity to the structure, may travel along different ray paths (or have different sensitivity kernels), even in a 1-D velocity model. This feature is enhanced in 3-D structures (e.g. diverse scatter distribution and size, attenuation, and heterogeneous structures), and is generally stronger at higher frequencies, as the sensitivity of wavefields to the fine 3-D structure increases with frequency. In addition, earthquakes usually rupture multiple-scale asperities and multiple fault segments with varying geometries, characterized by inhomogeneous rupture speeds and rise times, which can all lead to strong frequency-dependence of energy release. Therefore, the earthquake waveform records, being the convolution of the source and propagation processes, are characterized by frequency-dependent wiggles, and are much more complex at high frequencies. Consequently, back-projecting these waveforms will inevitably result in frequency-dependent features, originating from both sources and/or structures. Disentangling structure and source contributions is extremely challenging, especially when back-projecting high-frequency waveforms. Actually, frequencydependent BP results are usually attributed to the source process (e.g. Koper et al. 2011;Yao et al. 2013), but the source side 3-D velocity structure contribution has been largely ignored.
Here, we use the 2012 M w 7.2 Wharton Basin outer-rise earthquake as a scenario to investigate how frequency-dependence originating from the 3-D structure affects BP results. As we mentioned in the introduction, according to the BP results, the coda signals of this earthquake that arrived ∼50 s after direct P waves have been interpreted either as the dynamically triggered aftershocks or water reverberations of the triggered events (Fan & Shearer 2016a,b, 2018 or as water reverberations of the mainshock based on 2-D simulations (Yue et al. 2017). Here, to further clarify this debate, we focus on the frequency-dependent feature in waveforms and BP results. Similar to the strategy we adopted earlier, we generate synthetic data using a more realistic source-side velocity structure, and then back-project the synthetic data at two representative frequencies to further investigate the structural impact. We also compare the results of synthetic tests with those derived from real data to verify our synthetic test results.
The same SEM-DSM hybrid tool is used to generate 3-D synthetics at teleseismic distances for the Eurasian array, which was used by Fan & Shearer (2016a). The source box size and the meshing strategy used here are the same as those in Section 3.3. To make the source more realistic, we use a finite fault model made available by the USGS (https://earthquake.usgs.gov/earthquakes/ev entpage/usp000jdar/f inite-fault, Fig. S10) as input to calculate the synthetics. The finite fault model is simplified to 23 point sources (Fig. S10), which are aligned on the NWW oriented fault plane. The model releases all of its moment in 20 s (Figs 8c, f and S10). Then, we back-project the waveforms to the source region in two frequency bands, 0.05-0.08 Hz and 0.08-0.3 Hz. We also process the real data for the 2012 M w 7.2 earthquake, and back-project them in the same way as the synthetic data. Here, we mainly focus on relatively low frequencies (i.e. 0.05-0.08 Hz and 0.08-0.3 Hz), as the two bands differ greatly in the time-spectral domain (i.e. the 0.08-0.3 Hz band shows strong energy burst after 40 s, similarly to that we have observed in Section 3.3, while in the 0.05-0.08 Hz band, no energy burst is presented after 40 s (Figs S11-13). Another reason is that the finite fault model was derived at a relatively low frequency, and lacks high-frequency energy, which leads to dominant low frequency contents in the synthetic waveforms (Fig. S12). In the supplementary material, we also provide simulation results of a point source input modified from the GCMT solution of this earthquake and the corresponding BP results (Figs S11 and S14).
The synthetic and real waveforms, and the corresponding BP results, are summarized in Fig. 8. One can clearly see the frequencydependent features in both the synthetic and observed waveforms (Figs 8a,d,g,j). At 0.05-0.08 Hz, both sets of the waveforms show visible coda waves through the ∼100-s time window. The BP results for both data sets in this frequency range are quite similar (Figs 8b,c,h,i), with most of the radiators located near the hypocentre in the first 45 s, although BP results for the real data are distributed along a northwest line, while those for the synthetic data are more concentrated. Interestingly, both results show strong signals at ∼70-100 s, and the location of these radiators more or less overlaps with that from the first 45 s. The later signals are probably caused by the resonance of the P waves in the water right above the source.
At 0.08-0.3 Hz, the coda waves are much stronger than at 0.05-0.08 Hz, still quite coherent, and characterized by clear azimuthaldependent arrivals starting from ∼50 to 60 s after direct P waves (Figs 8d and j). Theses azimuthal-dependent arrivals are more obvious in the synthetics than the real data. Despite the discrepancies between synthetics and real data, which are probably caused by the imperfect structure and source models, the first-order features are fairly consistent. Here, we focus on this first-order complexity, rather than a wiggle-by-wiggle waveform match. For the beginning part of the P waves, BP results of both datasets show clear signals near the epicentre. However, the signals from the synthetic data only last for ∼20-30 s, while signals from the real data last for ∼50 s. This is because the finite fault model releases most of the energy in the first 20 s, while the actual energy release probably lasted for about 50 s from several subevents (see the strong and highly coherent P waves at ∼25 and ∼40 s in Fig. 8j). Also, a strong BP signal located ∼40 km to the northwest of the epicentre lasts from 50 to 90 s in the real data, where the synthetic BP result does not show strong signal. This could be due to the imperfect bathymetry (limited resolution) or other 3-D structures (e.g. a basin), both of which can produce long-lasting resonance. The most intriguing part of the result is the strong energy bursting near the trench at ∼60 s for both the synthetic data and the real data. This energy burst corresponds to the coherent coda waves shown in the waveform record sections. The real data waveforms are less coherent than the synthetic data, probably because they include contributions from both the structure near the epicentre and near the trench (i.e. the real data in this frequency band are composite waveforms), which could also explain the slightly different timing of the near-trench signals. This energy is likely caused by the water reverberations that are trapped near the trench, similar as another case in Illapel (Qian et al. 2019). The resonance frequency of the water reverberation lies in the tested frequency range (0.08-0.3 Hz), therefore produce the long lasted coda waves. The tilted bathymetry produces smaller incident angles for the waves shooting into the ocean than the flat bathymetry, so that more energy is transmitted into the water. The tilted bathymetry also produces larger reflection angles for the rays shooting into the solid earth, therefore more energy is reflected into the water.
In short, we show through synthetic and real data BP analyses that frequency dependence originating from the 3-D structure plays a pivotal role in modulating the wavefields of the 2012 M w 7.2 Wharton Basin outer-rise earthquake, leading to frequency-dependent BP images, which should be taken into consideration when interpreting the corresponding BP results for such earthquakes.

D I S C U S S I O N
We have analysed the impacts of the depth phases, the radiation pattern (or focal mechanism variation), and the 3-D source-side velocity structure on the BP results in our synthetic tests. Such waveform analyses can identify BP artefacts, and deepen our understanding of the sources of uncertainties in BP results.
For shallow (0-30 km) earthquakes, our analysis shows that the artificial signals induced by the depth phases are located up to ∼20 km away from the true source. The impact of depth phases in the 0-30 km depth range is probably secondary, given the uncertainty from other factors, like the traveltime error caused by 3-D sourceside velocity heterogeneities, which can probably result in ∼20 km spatial variation of BP results . However, for intermediate depth (40-100 km) earthquakes, such impact significantly contributes to the uncertainties, as the BP images of depth phases significantly deviate from those of the direct phases and the arrival time of the depth phases is within the earthquakes' duration time. For example, Ye et al. (2014) back-projected waveforms recorded by four teleseismic arrays to derive the rupture images for the 2014 M w 7.9 Rat Islands archipelago, Alaska, earthquake, which had a GCMT (https://www.globalcmt.org/CMTsearch.html) focal depth of 104 km. The beamforming peaks from different arrays differed by more than 50 km, featuring an array-ward shifting pattern. Such difference was possibly caused by the depth phases. One can expect that the depth phases' influence will become larger if the used array is located at the nodal orientation of the target earthquake. For earthquakes deeper than 200 km, the depth phases usually arrive much later than the duration of the earthquakes, and thus can be excluded from the BP time windows. Besides the artefacts caused by depth phases, earthquake depth can also influence the power or intensity of BP results (e.g. Okuwaki et al. 2018), leading to additional ambiguity in interpreting depth-dependent rupture. Yin et al. (2016) suggested BP results of higher frequency (e.g. 0.5-2.0 Hz) waveforms might be less affected by depth phases, as depth phases travel longer in the high attenuated shallow structures. Indeed, depth phases show different frequency characteristics and coherency than the direct phase. Such depth dependent effect is increasing with source depth. In addition, S wave, therefore sP phase, is easier affected by the structure above the source due to stronger attenuation and scattering effects. The stronger 3-D structure effect, including the scattering, topography and structure heterogeneity, also decreases the coherency of the depth phases. Fig. 9 shows the P waveforms at three frequency bands (0.05-0.2, 0.2-1.0, 0.5-2.0 Hz) for the 2014 M w 6.3 intermediate depth (D GCMT = 82.5 km) earthquake occurring in Kyushu, Japan. At all these frequency bands we  (b, e, h, k). We also show the projection of these BP results along profile AA '(b) in (c, f, i, l). The black lines and the triangles in (b, e, h, k) denote the trench location extracted from (Coffin et al. 1997). The blue lines in (c, f, i, l) show the bathymetry along profile AA'; the black lines in (c, f) indicate the moment rate function of the finite fault model; and the green lines in (c, f, i, l) refer to the relative beamforming power.
can observe depth phases in almost all azimuths. It clearly shows that, the sP phase is much stronger at 0.05-0.2 Hz than at higher frequency ranges. The coherency of the depth phases also decreases as the frequency increases. Similar phenomenon could be observed in the synthetic waveform as well (Fig. S15).
Our radiation pattern test demonstrates that for complicated earthquakes rupturing multiple segments with varying geometries, an individual array may only reveal a portion of the rupture process, and combining arrays from different azimuths and analysing them through forward (like our approach) and backwards (e.g. BP) ways should help better understand the rupture. This kind of analysis could be conducted before attributing the discrepancy between BP results and other source models (e.g. finite rupture models) to source complexities (e.g. frequency-dependent rupture), especially 888 Z. Hongyu, W. Shengji and W. Wenbo when a single array is used. Another complication caused by focal mechanism is the power of HF radiators. The normalized and stacked power at an array is usually treated as the relative strength of HF radiation energy during the rupture process. However, the focal mechanism variation, depth phases, and the 3-D source-side structure can modulate the power, and therefore the power may not accurately represent the high-frequency energy release from the source. Such deviation is strongest in the arrays that are located near the nodal orientation of P waves. One possible strategy to relief the uncertainty caused by focal mechanism variation is to use a global network instead of regional networks. However, the validation and resolution of the global array should be synthetically tested, as the waveforms recorded by global or large aperture arrays are much less coherent because of the wider azimuthal ranges in sampling the radiation patterns, and consequently the BP results may be smeared significantly . A better strategy to lessen the uncertainty might be to combine BP with other source inversion methods (e.g. finite fault inversion), as in this study or previously (e.g. Yagi & Okuwaki 2015). In previous studies, while both BP and finite fault results were presented together, different assumptions about what BP and finite fault inversions represent have led to issues that have not been resolved (e.g. how to combine the moment-rate function with power of the HF energy). Our strategy provides an alternative, with a more deterministic analysis of waveforms at broad-band frequency ranges. We foresee further progress when we combine more sophisticated tests (e.g. using dynamic and finite fault rupture models) and observational data.
When the BP frequency approximates the characteristic frequency of specific structural features (e.g. resonance frequency of a basin or water column), or the wavelength approximates structural dimensions (e.g. scattering size), the corresponding waveforms may be easily resonated or modified, therefore resulting in additional BP radiators. These effects may be ubiquitous. For example, a frequency of 0.2 Hz for P waves corresponds to a wavelength of several tens of kilometres, which is comparable to the thickness of slabs (Deal et al. 1999). The impact of slab structure should be considered in BP analysis to further improve the understanding of its contributions. In general, structure induced frequency dependent features are more restricted in a narrow frequency band, while rupture can produce much broader spectrum in the seismic signals. This feature provides a way to mitigate the impact of structure effect. For instance, one can avoid the specific frequency range used in BP once synthetic tests are conducted to show the strong structure effects. Also, synthetic tests can help discriminate the structure-induced BP artefacts, which may be characterized by specific frequency, timing and location, thus to avoid overinterpretation. The synthetic test strategy is tedious but pragmatic. We suggest to collect data of mediate size aftershock/foreshocks scattered in the rupture area of large earthquakes, carefully implement the source-side structure model and generate synthetics; then analyse the synthetics and their spectrograms, and compare the synthetics with real data to clarify contributions from structures. The obtained knowledge eventually should be applied to interpret the BP results and high frequency waveforms of great earthquakes.
In this study, we have not yet tested other aspects of heterogeneous velocity models, such as basins and scatters, which can also produce coherent and complicated 3-D wavefields. The main reason that stops us from testing these factors is lack of a high-resolution 3-D velocity model, and the requirement of its very careful handling, such as very fine meshes to adequately represent a basin model. Details of testing such models are left for our future efforts.
An important application of BP results is to infer the rupture velocity of earthquakes (e.g. , which is a critical parameter of earthquake physics (e.g. Kanamori et al. 1998). We have shown that the depth phase corresponds to a 'rupture speed' of 1.5-2.0 km/s, and the water phases can result in an apparent speed of 3.0-4.0 km s -1 , close to commonly accepted rupture speeds. Thus, factors that can contribute to the uncertainty and artefacts of BP should be carefully considered when estimating the rupture speed, especially the local rupture speed. Realistic cases could be even more complicated, and more careful and comprehensive investigations are required to consolidate BP results.

C O N C L U S I O N
Our tests have revealed how the depth phases, the focal mechanism variation, and the 3-D source-side velocity structure can coherently and systematically complicate waveforms recorded by seismic arrays, and thus could generate systematic uncertainties and artefacts in BP results. Such artefacts could be clarified and thus minimized by conducting simulations like 3-D waveform modeling, using a finite or multiple point source model with more realistic structure models, and then back-projecting the waveforms to compare the results with the input source. Although the subsurface velocity models in most regions are not accurate enough to resolve high-frequency waveforms, specific synthetic test designs can help to resolve ambiguities or avoid misinterpretation of BP results. We suggest that target-oriented synthetic tests and effective cross-validation should be performed before using BP results to characterize earthquakes and infer rupture physics.

A C K N O W L E D G E M E N T S
This project is supported by the Earth Observatory of Singapore grant (M4430255). All seismic data and the station codes are collected through the Data Management Center of the Incorporated Research Institutions for Seismology (IRIS), using Wilber 3. All the figures are plotted using the Generic Mapping Tools (GMT), and the Python Matplotlib package. The seismic data are processed using the Python package, Obspy and the waveforms are aligned and selected using the Python package, aimbat (Lou et al. 2013). We thank Pavel Adamek for his help in clarifying and improving several aspects of the writing. We appraciate comments and suggestions from Jiuxun Yin, Zengxi Ge and editors, which help to improve the quality and presentation of the manuscript. This work comprises Earth Observatory of Singapore contribution no. 268. This research is partly supported by the National Research Foundation Singapore and the Singapore Ministry of Education under the Resarch Centers of Excellence initiative.

S U P P O RT I N G I N F O R M AT I O N
Supplementary data are available at GJ I online. Figure S1. (a) The dense (red triangles) and sparse array configurations (blue triangles). (d) The synthetics with Gaussian noise) and (g) pure Gaussian noise waveforms in the sparse array. Here, we don't show the waveforms in dense array because waveforms in the sparse array are down-sampled from the dense array. The black rectangles in (d) and (g) mark the time window in which the BP is done. (b, c, e, f, h, i) The BP snapshots corresponding to the time window indicated in (d) and (g), with used methods, arrays and waveforms indicated in the title. The MUSIC and CS results are rescaled to 0 to 1, indicated by the colourbar at the bottom. Figure S2. Illustration of the generation of pP (red lines) and sP (blue lines) waves for source depth fixed at 10 km (a) and 30 km (b), respectively. The red and blue numbers show the distance from the reflection points of p and s waves to the epicentres. Here, we set the teleseismic distance to be 50 • , and a PREM model is used to calculate the reflection points of P and S waves. Figure S3. Decomposition of representative synthetic seismograms (0.1-1.0 Hz) in Australian Array and South American Array into individual waveforms generated by different synthetic point sources for the focal mechanism tests. Figure S4. Comparison of waveforms and back-projection results between two box setting, 8.0 × 8.0 • × 100 km, and 7.0 × 7.0 • × 100 (lat. × lon. × depth). (a-c) Comparison for Java case (Section 3.3). (d-i) Comparison for Sumatra case (Section 3.4). In the legend, 8.0 degree means the waveform comparison or BP result comparison is for the 8.0 × 8.0 • × 100 km box, while 7.0 • 7.0 × 7.0 • × 100 km box. Figure S5. Implemented S-wave velocity structures in the first 30 km in depth for Java (a) along the profile AA' in Fig. 6(b), and Sumatra (b) along the profile AA' in Fig. 8(b). Figure S6. Spectrogram of the 3-D synthetic velocity data (EIL station, latitude 29.67 • , longitude 34.95 • ) for a point source model modified from the GCMT solution of the 2006 M w 7.7, Java, Indonesia earthquake. The waveform is filtered to 0.01-0.6 Hz. Here we use the source duration of 4 s to make it more energetic for high frequency contents. Figure S7. Same as Fig. S6 except the source duration is shortened to 2 s. Figure S8. Same as Fig. S6, except the spectrogram is for the 1-Dsolid synthetic velocity. Figure S9. Same as Fig. S6, except the spectrogram is for the 1-D velocity model with water layer. Figure S10. The finite source model of the 2012 M w 7.2 Sumatra earthquake (https://earthquake.usgs.gov/earthquakes/eventpage/us p000jdar/f inite-fault). The red line in the main picture refers to the used rupture trace, and purple star the USGS epicentre. The left insert shows the temporal evolution, which is indicated by the colourbar, and spatial distribution of sources along the rupture trace. Here, the focal beach balls are projected to the vertical plane along the redline in the figure. The right insert shows the moment rate of the source model. Figure S11. Spectrogram of the synthetic velocity data (EIL station, latitude 29.67 • , longitude 34.95 • ) for a point source model modified from the GCMT solution of the 2012 M w 7.2, Wartion Basin, Indonesia earthquake. Here we use a half duration to 2 s for source time function. Figure S12. Same as Fig. S11, except the USGS finite source model is used to generate the synthetics. Legends are the same as in Fig.  S11 (EIL station, latitude 29.67 • , longitude 34.95 • ). Figure S13. Spectrogram of the real velocity waveform data of the same station as in Fig. S11. Figure S14. (a, d) Synthetic waveforms at 0.05-0.08 Hz (a) and 0.08-0.3 Hz (d) generated by using the GCMT solution. The corresponding BP results are shown in (b, e). We also shown the projection these BP results along profile AA'(b) in (c, f). Black lines and triangles in (b, e) denote trench location extracted from [Coffin et al. 1997]. Blue lines in (c, f) show the bathymetry along profile AA', and green lines in (c, f) refer to the relative beamforming power. Figure S15. 1-D synthetic velocity waveforms of the 2014 M w 6.3 Kyushu, Japan earthquake (GCMT solution, https://www.glob alcmt.org/CMTsearch.html) at three frequency bands 0.05-0.2 Hz, 0.2-1.0 Hz, 0.5-2.0 Hz computed by using PREM model with no attenuation (black), low attenuation (blue, Q p = 300, Q s = 150) and high attenuation (red, Q p = 70, Q s = 35), in the shallow part of the model. We use shorter (1 s) source duration to ensure sufficient high frequency energy in the synthetic waveforms. The focal solution used here is 251 • /30 • /163 • (strike/dip/rake).
Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper.