Recording earthquakes for tomographic imaging of the mantle beneath the South Paciﬁc by autonomous MERMAID ﬂoats

We present the ﬁrst 16 months of data returned from a mobile array of 16 freely-ﬂoating diving instruments, named MERMAID for Mobile Earthquake Recording in Marine Areas by Independent Divers, launched in French Polynesia in late 2018. Our 16 are a subset of the 50 MERMAID s deployed over a number of cruises in this vast and understudied oceanic province as part of the collaborative South Paciﬁc Plume Imaging and Modeling (SPPIM) project, under the aegis of the international EarthScope-Oceans consortium. Our objective is the hydroacoustic recording, from within the oceanic water column, of the seismic waveﬁeld generated by earthquakes worldwide, and the nearly real-time transmission by satellite of these data, collected above and in the periphery of the South Paciﬁc Superswell. This region, characterized by anomalously elevated oceanic crust and myriad seamounts, is believed to be the surface expression of deeply-rooted mantle upwellings. Tomographically imaging Earth’s mantle under the South Paciﬁc with data from these novel instruments requires a careful examination of the earthquake-to- MERMAID travel times of the high-frequency P -wave detections within the windows selected for reporting by the discrimination algorithms on board. We discuss a workﬂow suitable for a fast-growing mobile sensor database to pick the relevant arrivals, match them to known earthquakes in global earthquake catalogs, calculate their travel-time residuals with respect to global seismic reference models, characterize their quality, and estimate their uncertainty. We detail seismicity rates as recorded by MERMAID over 16 months, quantify the completeness of our catalog, and discuss magnitude-distance relations of detectability for our network. The projected lifespan of an individual MERMAID is ﬁve years, allowing us to estimate the ﬁnal size of the data set that will be available for future study. To prove their utility for seismic tomography we compare MERMAID data quality against “traditional” land seismometers and their low-cost Raspberry Shake counterparts, using waveforms recovered from instrumented island stations in the geographic neighborhood of our ﬂoats. Finally, we provide the ﬁrst analyses of travel-time anomalies for the new ray paths sampling the mantle under the South Paciﬁc.

Seismic data recorded in the global oceans are sparse in both spatial and temporal coverage, especially in the Southern Hemisphere. Fig. 1 maps the location of every seismic station for which, in principle, data are retrievable from the Incorporated Research Institutions for Seismology (IRIS), showing how underserved the oceans are relative to the continents. The history of seismic studies in and under the oceans, which are complex and costly, is short, and no single seismic instrument has yet combined the ability to deliver high-quality data with autonomy, low cost, low latency, and nim-bleness. Our instrument named MERMAID for Mobile Earthquake Recording in Marine Areas by Independent Divers ) fills a gap in instrumentation by providing low-cost seismoacoustic records suitable for global seismology (Simons et al. 2006b) from the oceans in near real-time  without the requirement of a research vessel for deployment and, being unrecovered, negating the need for a recovery cruise.
To place MERMAID in historical perspective: only about one hundred seismic records from the deep-ocean bottom existed by the early 1960s, according to Bradner (1964). Early attempts to instrument the oceans for regional and global seismology came in the form of seismometers dropped in free fall onto the seafloor from a ship, with a variety of mechanisms for recovery and data retrieval (Ewing & Vine 1938;Bradner 1964;Whitmarsh 1970). Progress toward true instrument autonomy came in the form of freelydrifting telemetered devices, either neutrally-buoyant mid-column floating versions of ocean-bottom sensors (Bradner et al. 1970), or sonobuoys, with a hydrophone loosely suspended from a surface buoy (Reid et al. 1973). Most of these experiments were shortlived due to power restrictions. Longer-lived moored sonobuoys (Kebe 1981) and hydrophones (Fox et al. 1993) provided continuous hydroacoustic data at the expense of requiring seafloor cables to power them, restricting their spatial coverage.
In the last three decades, ocean bottom seismometry with long-life robust, three-component broad-band sensors has flourished (Zhao et al. 1997;Webb 1998;Webb & Crawford 2003;. Nevertheless, to this day such instruments remain physically large and expensive to install (Beauduin et al. 1996;Collins et al. 2001), requiring a specialized research vessel for deployment and recovery (Stephen et al. 2003). Establishing semi-permanent installations Romanowicz et al. 2006) worldwide remains a developing goal for the international community (Montagner et al. 1998 Evolving from single-station cabled seafloor installations (Butler et al. 2000;Petitt et al. 2002;Romanowicz et al. 2006), multi-station, multi-instrument cabled arrays have been rooted on the seafloor offshore Japan (Hirata et al. 2002;Shinohara et al. 2014), the Canadian Northeast Pacific (Barnes et al. 2013;Matabos et al. 2016), and Oregon (Cowles et al. 2010;Toomey et al. 2014;Kelley et al. 2016). These installations provide highquality data with low latency, but they require massive upfront costs, demand costly maintenance, are limited by cable access and, being permanent, cannot be rapidly reinstalled or reassigned in the case of developing seismic crises (e.g., Duennebier et al. 1997).
The current fleet of recovered ocean bottom seismometers (OBS) is autonomous but unable to transmit data while deployed, hence data acquisition and processing are separated by months or years, unless catastrophe precludes recovery (Tolstoy et al. 2006). More recently, wave-powered gliders that float at the surface and may be remotely controlled to remain in the vicinity of an oceanbottom station have been used to relay data from seafloor to shore via acoustic modem and satellite uplink (Berger et al. 2016). This coupling of technologies allows the delivery of seismic data from the seafloor in near real-time. While they have shown promise, such solutions remain fragile and costly and they have not yet enjoyed large-scale deployment. Kohler et al. (2020) proposed a pilot experiment that would see the installation of a long-term broad-band seismic network on the seafloor utilizing the newest advances in wave glider and OBS technologies including in situ battery replacement. Such campaigns, where data are acquired autonomously and in near real-time and with instrument lifespans measured in years instead of weeks or months, will generate data sets complementary to those returned by MERMAID. Beyond gliders, still other solutions to the logistical problem of data recovery are currently being tested, including ocean-bottom systems that periodically release data pods from the seafloor, each with a self-contained telecommunications unit to relay data via satellite upon surfacing (Hammond et al. 2019). Finally, while the age where the cables themselves may act as seismic sensors appears to have arrived (e.g. Lindsey et al. 2019;Sladen et al. 2019;Williams et al. 2019), such technology is in its infancy.
While MERMAID's data sets of hydroacoustic time series, collected by a single hydrophone floating at mid-column water depths, forever will remain less "complete" in comparison with data sets recorded by a well-coupled three-component broad-band oceanbottom seismometer, the benefits provided by the instrument are many. These include its lower manufacturing costs, its logistical simplicity, its algorithmic flexibility (Sukhovich et al. , 2014 in selecting promising seismic phases to report with each surfacing, and its longevity-currently projected to be about five years (⇠250 dive cycles) on a single lithium battery charge. Fulfilling the promise of the first-generation MERMAID instrument ) and substantiating the record accumulated by MERMAIDs of the second generation (Sukhovich et al. 2015;Nolet et al. 2019;, the nearly 1400 records presented here, collected by the current third generation of instruments, are closing the seismic data gap in the world's oceans. Studying the interior of the Earth using seismic tomography (Nolet 2008;Romanowicz 2008;Rawlinson et al. 2010), primarily using P-delay times remains to date MERMAID's primary strength and objective.  and Nolet et al. (2019) have shown that the accuracy of MERMAID's position underwater, interpolated from multiple surfacings, and the accuracy with which the arrival time of seismic P phases can be determined from the often noisy acoustic records, are of sufficiently high quality to constrain velocities for tomographic inversion.  presented a new algorithm for the multiscale estimation of event arrival times and their precision, which closes the loop from detection and discrimination of P waves in the ocean, to the accurate determination of their travel times, to the assessment of their uncertainties.
In this paper we leverage all of these developments and present the first 16 months of data returned by the 16 MERMAIDs owned and operated by Princeton University that were deployed in French Polynesia in August and September 2018. We compare their waveforms with traces available from 20 seismic island stations in the same region, and with records from a set of five comparatively less expensive but increasingly more abundant Raspberry Shake instruments .
We study the statistics and completeness of our growing catalog of seismic data and estimate the total number of tomographicquality records that can be expected to be recorded by each MER-MAID over its projected five-year lifetime. We compute MERMAID travel-time residuals against the one-dimensional (1-D) ak135 velocity model of , correct those for bathymetry and MERMAID's cruising depth and, lastly, readjust them using the fully three-dimensional (3-D) and elliptical P-wave speed model LLNL-G3Dv3 of . We compute signal-tonoise ratios and estimate the uncertainties on our residuals, and compare these statistics with a complementary data set derived from traditional seismometers and Raspberry Shake stations installed on nearby ocean islands. Travel-time residuals will be the input data for future tomographic inversions that use our uncertainties as weights.
Finally, for a taste of the likely signals from the Earth's mantle that will emerge from our data collection we project our carefullymeasured residuals onto their 1-D ray paths to reveal average velocity perturbations that tomographic studies will further investigate.

THE MERMAID INSTRUMENT
The purpose of the MERMAID float is to return seismic data of tomographic quality from the global oceans in near real-time. The instrument (Fig. 2) and its dive cycle (Fig. 3) were inspired by oceanic floats (Swallow 1955;Rossby & Webb 1970;Davis et al. 1992Davis et al. , 2001, which have become ubiquitous in the global oceans (see Gould 2005, for historical perspective).
The international Argo program remains one such project of particular influence because it has been providing the scientific community with a wealth of temperature, salinity, and trajectory data over the last several decades (Lavender et al. 2000;Davis 2005;Roemmich et al. 2009;Abraham et al. 2013). Along with the payload required for in situ observations and hydrographic profiling, a contemporary Argo float is equipped with a hydraulic pump which modulates an expandable bladder that allows it to be neutrally buoyant at many mid-column depths, a Global Positioning  System (GPS) for location tracking, and a satellite link for data transmission. A typical Argo dive cycle begins with the instrument sinking to a depth between 1000-2000 m below the sea surface, where it passively drifts for around 10 days before resurfacing. During ascent, its conductivity-temperature-depth (CTD) sensor measures a roughly vertical column. Once at the surface it acquires a GPS fix, transmits the new data via satellite, and the process repeats. Because they are autonomous and drift at the whim of ocean currents Argo floats are practically guaranteed to sample the water column at a previously unsampled location every time they ascend. Some 4000 Argo floats are actively reporting from within every ocean on Earth, and on average some 800 are being deployed yearly to maintain the fleet. Like MERMAID, they are not designed to be recovered.
The first-generation MERMAID float was a SOLO (Sounding Oceanographic Lagrangian Observer) float (Davis et al. 2001) fitted with a hydrophone and a processing unit to return seismologically viable seismoacoustic data recorded at its parking depth (Simons et al. 2006b). The second-generation MERMAID Sukhovich et al. 2015) was a modified APEX (Autonomous Profiling Explorer) float. The current third-generation MERMAID is a redesign from the ground up by Yann Hello at Géoazur and French engineering firm OSEAN SAS (Hello & Nolet 2020). The autonomous float carries a High Tech HTI-96-Min Hex hydrophone, a Gardner Denver pneumatic pump, a u-blox NEO-M8N GPS unit, a two-way Iridium communication module, Electrochem lithium batteries, and dedicated onboard detection and discrimination software . Once deployed MER-MAID sinks to a predetermined depth (usually 1500 m, adjustable) and records the ambient acoustic wavefield while freely drifting with the mid-column currents. If triggered by seismic activity, or once a threshold time is reached, MERMAID surfaces, transmits the new data, downloads mission-command files via satellite, and repeats the process. Fig. 3 shows the first five dive cycles completed by MERMAID P0012 after its deployment on 10 August 2018, and Fig. 4(a) shows the drift trajectories of all 16 MERMAIDs discussed in this study.
The onboard algorithm used to monitor and process the ambient acoustic wavefield (Sukhovich et al. , 2014 was designed to trigger on tomographic-quality teleseismic P-wave arrivals sensitive to mantle structure. Once parked at depth the hydrophone is switched on and data acquisition starts. The hydroacoustic data are processed in real-time by a short-term average over long-term average (STA/LTA) algorithm (Allen 1978), and written to a Secure Digital (SD) card, which retains them for one year. If the adjustable STA/LTA threshold is exceeded, a windowed section of Sampling the mantle under the South Pacific with MERMAID 5 those data is further interrogated via wavelet decomposition (Simons et al. 2006a), and its energy distribution across six wavelet scales is compared with statistical models of various signals known to exist in the oceans (many of which are not generated by seismic events). A quality criterion encodes the probability that the record under inspection includes a P-wave arrival. If the criterion is high, MERMAID immediately ceases data acquisition and surfaces to transmit the signal. Candidate signals that do not trigger immediate surfacing are stored in memory and marked for transmission at the next opportunity. For all records discussed in this study, MER-MAID transmitted wavelet and scaling coefficients in the Cohen-Daubechies-Feauveau (2,4) basis (Cohen et al. 1992) of a time series originally sampled at 40 Hz and filtered between 0.1 Hz and 10 Hz before digitization. The MERMAID records presented here are seismoacoustic pressure time series sampled at 20 Hz. MERMAID delivers seismic data from the oceans in near realtime with immediate surfacing and data transmission within hours of recording the strongest signals. MERMAIDs are individually programmable and mission parameters such as parking depth, maximum time to remain there, criterion thresholding values to trigger surfacing, and so on, all may be monitored and adjusted thanks to two-way Iridium communication. While the ability exists to request data from the buffer for up to one year prior (which we have done with success), we found the default trigger algorithm to perform well, and in this study we restrict our discussion to those triggered records that MERMAID sent us on its own accord. Indeed the default onboard algorithm was left untouched for the entirety of the deployment for all 16 floats discussed here.

THE SPPIM PROJECT
The 16 Princeton-operated third-generation MERMAIDs of this study are part of the South Pacific Plume Imaging and Modeling (SPPIM) array of 50 MERMAIDs deployed into the South Pacific to study the underlying mantle composition and temperature using seismic tomography. Drifting united under the EarthScope-Oceans banner, these MERMAIDs are supported and maintained by a global consortium (http://earthscopeoceans.org). A profile of EarthScope-Oceans and a detailed deployment history of the SPPIM array are given in the Supplemental Material.

Geographic and geologic context
MERMAIDs drift with the ocean currents-they do not (yet) land on the seafloor like ocean-bottom seismometers. Fig. 4(a) shows the drift trajectories of the numbered floats discussed in this study. Every dot represents one GPS fix taken by MERMAID at the surface, color-coded for time elapsed since its deployment (dark blue for the launch day, dark red for the last GPS fix of 2019). By connecting these dots we obtain an approximate (Davis 2005) map of the ocean currents at the 1500 m parking depth. See Nolet et al. (2019) for drift statistics broken down into shallow-and deep-drift components. Also labeled in Fig. 4(a) are the locations of other seismic sensors against which MERMAID data are compared later in this study. Those are individually marked by triangles, except for the collection of stations on Tahiti, French Polynesia, which is marked by a single larger triangle with the corresponding station codes listed in the legend.
For added geologic and geodynamic context Fig. 4(b) shows a bathymetric map of the same region. We see myriad islands, seamounts (Wessel et al. 2010), hotspot tracks (Wessel & Kroenke 1997), and, in lighter colors, large swaths of anomalously elevated oceanic crust known as the South Pacific Superswell (McNutt & Fischer 1987;McNutt & Judge 1990). Fig. 4(c) is a "vote" map according to the clustering analysis of Cottaar & Lekić (2016), marking the number of models showing anomalously slow P-wave velocities at 2700 km depth. The five models in use are HMSL-P06 (Houser et al. 2008), GyP-SuM (Simmons et al. 2010), LLNL-G3Dv3 , SPani (Tesoniero et al. 2015), and ME2016 (Moulik & Ekström 2016), selecting VP where both VP and VS were available. Cottaar & Lekić (2016) classified P-wave velocities at discrete locations within those five models into three bins: slow, neutral, and fast. Three models concurring are a "majority," and five a "consensus." In Fig. 4(c) we see a consensus among all models that large regions exhibiting anomalously slow P-wave velocities lie at the base of the mantle under the South Pacific (Tanaka et al. 2009a). This region is known as the Pacific Large Low-Velocity Province (LLVP) and is one of two nearly antipodal regions on Earth, the other being the African LLVP (Garnero et al. 2016). Combined, Fig. 4 shows the SPPIM array from the water column to the core-mantle boundary to frame the features above which it drifts. However, the majority of seismoacoustic data that our SPPIM array records traverses mantle features between these two extremes, and the web-based SubMachine tool (Hosseini et al. 2018) can be used to redraw this vote map (Shephard et al. 2017) with various models at different depths.
The exact nature of the interaction between LLVPs and surface features like oceanic hotpots and the South Pacific Superswell has long been a topic of debate (e.g. Davaille 1999;Adam et al. 2014). Compelling evidence in the form of whole-mantle tomography (French & Romanowicz 2015) implies that the former may feed the latter via conduits of hot uprising rock that span, potentially discontinuously, from the core-mantle boundary to the surface. The exact geometries, dimensions, and rooting structures of these conduits within or near the boundaries (Cottaar & Romanowicz 2012) of LLVPs remain an area of active research (Garnero et al. 2016). What is known for certain is that the Pacific LLVP is expansive in breadth and height, purportedly rising to the mantle transition zone under the South Pacific Superswell (e.g. Tanaka et al. 2009b;Cottaar & Lekić 2016), it is characterized by anomalously slow seismic velocities, and it lies under our SPPIM deployment. The teleseismic arrivals recorded by MERMAID are therefore expected to sample slow regions of the deep mantle, and their measurement will help refine future tomographic studies.   Colored traces are MERMAID pressure records and gray traces are velocity seismograms from nearby island stations (traditional and Raspberry Shake). All traces are filtered between 1-5 Hz. Two-digit serial numbers preceding or following each colorful trace identify the recording MERMAID, and island station names are in gray. Theoretical travel-time curves computed in the ak135 velocity model , corresponding to the events quoted in the titles, are overlain as black and gray curves and identified in the legend.

Filling the data gaps
The colorful traces are MERMAID pressure records in the uncorrected units of digital counts. In the range of frequencies (1-5 Hz) that we discuss in this study the response is known to be flat (Guust Nolet, Olivier Gerbaud, and Frédéric Rocca, personal communication, 2021; see Section A3 and the Supplemental Material). Each MERMAID seismogram is arbitrarily color-coded so that it is easily distinguishable. The precise length of each seismogram varies based on the triggering parameters of the STA/LTA algorithm. With current defaults they are generally between 200 and 300 s long. The seismograms in Fig. 5 are demeaned, detrended, and tapered with 0.1-ratio cosine-taper (Tukey) window, and bandpassed between 1-5 Hz using a one-pass, four-pole Butterworth filter. Each trace is normalized for plotting, resulting in arbitrary amplitudes within and between the panels of Figs 5(a)-(d). The black and gray lines correspond to the theoretical travel times of the phase(s) quoted in the legend as computed in the ak135 velocity model  for the event identified in the title. Seismoacoustic phases beyond the P wave, for example the S-wave arrival evident near the end of the MERMAID 09 trace in Fig. 5(a; in yellow), are examined by Simon et al. (2021b).
The gray traces in Fig. 5 are velocity seismograms from nearby stations, again normalized per trace for easy viewing. Station names are labeled inside the right ordinate axis. Overlapping traces were removed. To mimic the MERMAID records we trimmed all seismograms to 250 s, with the theoretical first-arrival time at 100 s (the approximate time of the STA/LTA trigger in MERMAID seismograms). We see that MERMAID signal-to-noise ratios for the events shown compare favorably to those of the island stations. This comparison is formalized in Section 5.2 for high-quality residuals culled from all three instrument platforms.
The MERMAID algorithm identifies time series determined via probabilistic wavelet-subspace analysis to be likely teleseismic Pwave arrivals. The algorithm does not provide arrival-time picks beyond the precision afforded by the underlying STA/LTA detection algorithm, nor is it privy to recent global seismicity. Therefore, to produce record sections like those in Fig. 5 we must first determine if the seismograms match any events in a global catalog. We searched the National Earthquake Information Center Preliminary Determination of Epicenters (NEIC PDE) Bulletin (https://earthquake.usgs.gov/data/comcat/ catalog/us/) for recent events. The Supplemental Material details the matching procedure by which we associate MERMAID seismograms with their corresponding earthquakes.

THE MERMAID EARTHQUAKES CATALOG
We will first take a broad look at the seismic catalog itself, before drilling down to the statistics of the rate of return of identified events for individual MERMAIDs. We will also discuss the completeness of our catalog as compared to other global seismic catalogs available over our study period. The purpose of this section is to answer questions relevant to any new seismic instrument, such as: "How many earthquakes does MERMAID record per year, and what are the distributions of their magnitudes, epicentral distances, and locations?"; "What do the recorded magnitude-distance relations tell us about detectability thresholds?"; and, "What is the probability that any single earthquake will be recorded by any single MERMAID, and how many earthquakes is each projected to record in its lifetime?" 4.1 Earthquake catalog summary: in pictures Fig. 6 plots the distributions of earthquake magnitudes, distances, and signal-to-noise ratios (SNRs) in the first earthquakedetectability diagram for the third-generation MERMAID. The SNR is the estimated variance ratio of the signal and noise segments, as identified by our picking procedure (see the Appendix), applied to a 30 s window filtered between 1-5 Hz and centered on the theoretical first arrival. Fig. 7 plots the ray paths for those earthquakes, binned by event depth. In all, 683 MERMAID seismograms were identified as containing at least one phase arrival associated with one of 288 unique earthquakes. Fig. 6(a) shows that MERMAID sampled a fairly large range of earthquake magnitudes. The smallest earthquake, a mb 4.2 at 97.8 km depth in Tonga Islands, was recorded by P0008 at an epicentral distance of 2.2 . The largest event, a Mw 8.2 on 19 August 2018 at 600.0 km depth in the Fiji Islands Region, was recorded by all five MERMAIDs deployed at that time (see Table 1 for SPPIM-array deployment dates). The mean magnitude of all events recorded is M 6.2. The numbers of MERMAIDs reporting earthquakes, broken down by magnitude, are discussed in Section 4.2. Fig. 6(b) is a histogram of those same earthquakes but now binned in terms of their epicentral distances. We see fairly consistent sampling at a variety of epicentral distances, implying MERMAID samples tomographically useful data at the global scale, Figure 6. Earthquake magnitudes, epicentral distances, and signal-to-noise ratios (SNRs) in our data set. In total, 683 MERMAID seismograms were identified to contain at least one phase arrival associated with one of 288 earthquakes. (a) Histogram of earthquake magnitudes. (b) Distribution of earthquake epicentral distances is roughly uniform out to around 90 , except for the peak around 10 , which arises from frequent M 4-4.9 earthquakes near Fiji sampled by the most proximal floats, mainly P0008. (c) Scatter plot with marker sizes representing the SNRs of individual arrivals. A lower-detection threshold hovers just above M 6 near 160 , where MERMAID picks up core phases. The horizontal strings of points are due to the fact that often more than one MERMAID reports the same earthquake.
including phases which have transited the core of the Earth (Simon et al. 2021b). Finally, Fig. 6(c) plots the SNRs of the first-arriving phases, represented by the size of the marker, as a function of magnitude and epicentral distance.
These are the first global detectability statistics for the thirdgeneration MERMAID-to be compared to the first-generation results shown by Simons et al. (2009). We see trends common to all seismic instruments: small events are preferentially recorded at short epicentral distances before geometrical spreading and attenuation sap them of their energy, while larger events (greater than M 6 in the case of MERMAID) may be recorded globally. Fig. 7 places the data of Fig. 6 into their geographical context. The ray paths between the epicenters and MERMAID locations at the time of recording are binned by event depth: shallowfocus (Fig. 7a, at less than 70 km); intermediate-focus (Fig. 7b, between 70 km and 300 km); and deep-focus (Fig. 7c, greater than 300 km). Listed above each map in Fig. 7 are the numbers of unique events recorded within those depth ranges. We find that MERMAID records shallow events most often, with 275 corresponding seismograms reported by our 16 MERMAIDs , although the counts for Sampling the mantle under the South Pacific with MERMAID 9 Figure 7. Source-receiver ray paths in our data set, separated by event depth (shallow, intermediate, and deep-focus). In each panel, the great-circle paths (black curves) connect the earthquake epicenters (red asterisks) with interpolated locations of MERMAID at the time of recording (orange triangles).
the other depths prove that MERMAID records earthquakes originating at depths from the shallow crust to deep within subducting slabs. The shallowest earthquake in the catalog had its hypocenter at 2.2 km under Northern Alaska, and the deepest ruptured at a depth of 652.4 km in the Fiji Islands Region. Fig. 7 also shows that MERMAID primarily recorded subduction-zone earthquakes occurring along the Pacific Rim, the "Ring of Fire" of nearly continuous chains of volcanoes fed by subducting oceanic crust that encircles the Pacific Ocean from New Zealand to Chile (Rinard Hinga 2015). Approximately 90% of annual global seismicity occurs in this most active of regions. 4.2 Earthquake catalog summary: by the numbers Table 1 is a breakdown of the rate of return of seismograms per MERMAID. The first column lists the MERMAID serial numbers, the second their deployment dates, and the third the total duration in weeks over which each MERMAID was active from deployment to the end of 2019. The fourth and fifth columns list the total number of seismograms returned, and the subset of those identified, respectively, and the sixth column quotes the latter as a percentage. The seventh column lists the average number of seismograms returned per full year of activity, and the eighth column lists the same statistic pertaining to the identified seismograms only. The penultimate row totals the columns, while the ultimate row lists their averages. Some values in Table 1 are rounded to the nearest integer, potentially only after performing the requisite operation implied by the row or column. As such, some values that are reportedly multiples, sums, or means of other table entries may not be entirely self-consistent.
Let us first take a bird's-eye view of the data presented in Table 1 before teasing apart the statistics of the rate of return of individual MERMAIDs. From the penultimate row of Table 1 we see that our 16 MERMAIDs enjoyed a cumulative total of 1118.9 weeks (21.4 years) of deployment in the South Pacific, over which time they recorded and transmitted 1363 seismograms. Of those, 683, around half of the set, were positively associated with events in global catalogs available at the time using the methodology described in the Supplemental Material (see Fig. S1). By summing yearly return rates, we find that our subset of the SPPIM array achieved a cumulative rate of return of 996 seismograms per year of deployment. Given the historical identification rate from column six this equates to about 497 identifications per year. The other seismograms represent myriad diverse signals corresponding to small or local events, oceanic T waves from unidentified sources, as well as a substantial number of what we suspect to be instrument glitches, which almost exclusively affected MERMAIDs P0012, P0013, and P0020. To be clear, the MERMAID catalog contains many seismograms which are unidentified by the standards upheld here, but which do in fact record earthquakes that otherwise went undetected by the global seismic network-not every unidentified event is just noise.
These data are further distilled in the last row of Table 1 where we list the rate of the return of an "average" MERMAID. There we quote the means of the columns, i.e., not weighted by the length of time that any individual MERMAID was deployed. Ergo, the final value in this row is the number of identified seismograms expected from any MERMAID in any given year. Our sample size of 16 is small and limited in time and space, but this number is the first step towards defining the expected long-term output of an "average" MERMAID. The final value in this row is perhaps most relevant for future MERMAID deployments: on average each instrument returned 31 identified seismograms per year. With a projected lifespan of five years, we expect a return of over 150 positively identified seismograms over the lifetime of each MERMAID. The values in the final column which contribute to this mean are broadly distributed, ranging from a maximum of 134 returned by P0008, to a minimum of 9 returned by P0022. This spread is not due to implicit differences in the floats-indeed they all are identical both in manufacturing and software, and the programmable parameters (e.g., parking depth, detection criteria thresholds, etc.) were left unchanged for the duration of the deployments. Rather, this variance is most likely due to the geographic distribution of MERMAIDs. Instrument P0008, the busiest of the group, returned so many identifiable seismograms because it cruised the oceanic region between Fiji and Samoa, near enough to the former (and drifting closer-see Fig. 4) to record many seismograms matched to light and moderate earthquakes whose energy never registered on the more distant MERMAIDs in the open ocean. Apart from simple source-receiver distance considerations, other variables influencing the rates of return are the local oceanic and bathymetric environments. We hypothesize that most notable among these are ocean storms. Mid-column noise has been shown to correlate strongly with wind speed (McCreery & Duennebier 1993;Nichols & Bradley 2016). Additionally, the SNRs of signals received by MERMAID are mediated by factors that do not impact common terrestrial stations. As is the case for any ordinary terrestrial station the noise is time-variable, but perhaps more importantly the impedance along the ray path between a repeating earthquake and MERMAID is also time-variable, in contrast to terrestrial stations that do not drift. A drifting MERMAID may find itself in regions with varying sedimentary cover, attenuating or amplifying incident P-wave energy, resulting in weaker or stronger acoustic conversion at the seafloor (Ewing et al. 1957;Stephen 1988). Multiple additional factors such as local water depth (Lewis & Dorman 1998;, the presence of nearby seamounts and other kinds of rough-bottom topography (Dougherty & Stephen 1991), the width and depth of the SOund Fixing and Ranging (SOFAR) channel (Munk 1974) over the spatial range and across the seasons covered by the SPPIM array, and other unstudied and yet unknown factors may also all play a role in the conversion of energy (Tolstoy & Ewing 1950;Okal 2008) and in determining the local ambient noise field (Gualtieri et al. 2019).
Even after correcting for distance and magnitude we cannot yet disentangle the various factors that contribute to the large variance in the rate of return of individual MERMAIDs. No modeling of the true nature of acoustic conversions under our floats has been performed, indeed even the bathymetry is not well constrained in some areas under the SPPIM array, nor did we correlate wave or storm records with our seismic data (likely the main driver of timevariable background noise levels, see, e.g. Webb & Cox 1986;Babcock et al. 1994;Gualtieri et al. 2013;Farra et al. 2016). It will be interesting to probe whether the MERMAIDs that sent the least data spent the most time in the noisiest regions, stalled over areas of the seafloor with inefficient seismic-acoustic coupling, were muted by some other unidentified disturbance, or were influenced by some combination of all of these factors. Future work will benefit from simulation of seismic-acoustic conversions into realistic ocean layers at (relatively high, ⇠1 Hz) frequencies (e.g. Fernando et al. 2020).

Earthquake catalog completeness and statistics
We now move to comparing our seismic catalog with other global catalogs available at the time. How "complete" is our catalog compared to those others? Conversely, how many global earthquakes did MERMAID miss? No catalog can include all earthquakes of all magnitudes, globally (Kagan 2003), but here we do take the number of events recorded in global seismic catalogs to be the true population size against which we will derive completeness statistics. The histograms of the right column parse the total number of events identified by each float. Note that they do not necessarily imply a per-MERMAID identification rate because they remain unadjusted for deployment duration (e.g., MERMAIDs P0013 and above were not yet deployed during the first M8 event in the final row, which explains why those instruments count at most a single identification). a function of time, beginning from the first deployment of P0008 on 5 August 2018 through the end of 2019. Note that the Princeton fleet of 16 MERMAIDs was not completed until 14 September 2018, marked by a dashed vertical line. The histograms in the right column aggregate these data over time, but separate them by float and by magnitude range. We represent missed events, not reported by any MERMAID, as crosses placed below the zero line. For example, Fig. 8(e) shows that of all M 7 earthquakes that occurred globally while the complete Princeton array was deployed, three remained unreported. Conversely, Fig. 8(g) shows that no events were missed in the magnitude range M 8+ (recall that only five MERMAIDs where deployed during the first such event, and it was reported by all five). So many M 5-5.9 events went undetected that rather than plotting them in Fig. 8(a) the mean miss rate (around 4 events per day) is reported below the zero line. Note the different scaling of the ordinate axes in Fig. 8, which highlights the fact that the rate of re-turn of identified earthquakes correlates strongly with magnitude. Finally, in each of the stem plots one event is highlighted in black. These are the events reported by the greatest number of MERMAIDs within each magnitude range, as previously rendered in the record sections of Fig. 5.
The histograms in Fig. 8 parse the cumulative return per float. Figs 8(b) and 8(d) (M 5-5.9 and M 6-6.9, respectively) visualize the observation of Table 1 that P0008 outpaced all the other floats in terms of reporting identifiable earthquakes (even after adjusting for its longer deployment duration, which we do address in Tables 2-6), which we attribute to its geographic proximity to Fiji and Tonga, as mentioned in Section 4.2. The complementary distribution for M 4-4.9 earthquakes is not shown because most of those light events were missed. Note, however, that over 80% of those positively identified were reported by MERMAID P0008.
We reiterate that these statistics are plotted against the NEIC PDE Bulletin of global events, which itself is incomplete. There  exists a complementary set of uncataloged events that remain undetected by the current global seismic network. MERMAID records some of those events and the analysis of those records is the target of future work, but we can report some statistics here: there remain 245 records after the removal of presumed instrument glitches from the list of unidentified MERMAID seismograms. That results in an upper-bound estimate of ⇠15 additional uncataloged earthquakes (not necessarily unique) detected by every MERMAID during the SPPIM deployment so far, or about 1 additional uncataloged earthquake detected by every MERMAID each month.

Estimating the final size of the earthquake catalog
With Table 1 we found that, between deployment in late 2018 and the end of 2019, each MERMAID in our fleet returned an average of 43 identified seismograms, or about 31 per year. With Fig. 8 we saw how identifications were distributed across different magnitude ranges. Here we extrapolate those historical data to estimate the final size of the complete MERMAID catalog considering the anticipated five-year lifespan of each instrument. As in Table 1, some of the numbers in the following tables are rounded. Tables 2-6 break down the rates of identified returns, projecting how many identifications each float is likely to return in its lifetime. The first column in Tables 2-6 lists the MERMAID serial number. The second quotes the total number of earthquakes that occurred over the deployment duration of that specific float. This number represents the maximum number of earthquakes that each float could have individually identified during its deployment. The third and fourth columns list the number and percentage of those events that were identified. These properly quote the per-MERMAID identification rates implied by the histograms in Fig. 8 but which are obfuscated there by differing deployment durations.
The fifth column of Tables 2-6 is analogous to the final column of Table 1, except here it is further parsed by magnitude. The sixth column estimates the expected total number of identified seis- mograms reported over a five-year lifespan by multiplying the previous column by five. For light and moderate earthquakes especially, this method of estimation is likely sound because they occur so frequently that any annual variability in their sample size is expected to be relatively small. Conversely, great earthquakes occur so infrequently that our projections based upon their statistics only within the time period the SPPIM array could be greatly skewed by anomalous seismicity rates.
To combat the potential issue of anomalous sample sizes skewing our projections we assembled a data set of all events cataloged by IRIS from 1985 through to the end 2014. Thirty years of data surely provides a large enough sample size within each mag- Sampling the mantle under the South Pacific with MERMAID 13   Tables 2-6, where the overline, yr, denotes such an "average" year. In column seven we multiplied these average seismicity rates by the percentage of the total that each float identified (column four) to compute a second estimate of the expected total number of identified seismograms that any individual MERMAID may return in a year. In column eight we again multiplied this value by five to project the final number of earthquakes each float may be expected to identify in its lifetime.
The penultimate row of Tables 2-6 tallies the totals of the columns, and the ultimate row reports their means. The last value in the final row gives our best estimate of the number of identified earthquakes that any given MERMAID will report within that magnitude range over its lifetime. We calculate 23 M 4, 40 M 5, 56 M 6, 29 M 7, and 4 M 8 earthquakes, or just over 150 earthquakes in total. For our fleet of 16 this approaches 2500 identifications. As we have seen, the variance in the rate of return among the floats is large. For example P0008, with its 188 identified events, has already surpassed its expected lifetime-total return.
As in Section 4.2 we conclude that it is likely the geographic location, the frequency and severity of nearby storms, and perhaps the geologic seafloor setting above which MERMAID drifts, that most influence the rate of return of identified seismograms by any individual float. This point is well made in Table 2 columns one through three, where we see that P0011 could have recorded just about as many M 4 earthquakes as P0008, but the former identified none and the latter identified 73.

THE MERMAID RESIDUALS CATALOG
Having understood MERMAID's ability to detect global earthquakes, we now move to discussing the seismograms themselves. In particular, we are interested in the high-precision picking of first-arriving P or p waves, the estimation of uncertainty about those times, and what MERMAID residuals against various velocitymodel predictions may tell us about mantle structure. The rows are ordered from low to high uncertainty. The first three seismograms (Fig. 9a-c) are the lowest-uncertainty records in the MERMAID catalog, and the final three ( Fig. 9j-l) display seismograms with picking uncertainties equal to 0.15 s. The middle rows, Fig. 9(d-f) and Fig. 9(g-i), show seismograms for which the corresponding uncertainties straddle the 33rd and 66th percentiles between these two uncertainty bounds, respectively. Each panel of Fig. 9 plots 30 s of one MERMAID seismogram with its timing relative to the theoretical travel time of the first-arriving P or p wave (dashed black vertical line at 0 s). After tapering and filtering, these segments yielded our picks (solid red vertical lines, with uncertainties shown as dashed red vertical lines at ±2SDerr along the time axis) of the first arriving p or P wave, with the corresponding adjusted residual quoted above each panel. The portion before our arrival-time pick is considered "noise" and colored gray, while the portion after our pick is considered "signal" and colored blue. Thus, the SNRs reported here, and for the remainder of the study, are computed as in eq. (1) but we specifically only consider 30 s windows like those shown in Fig. 9, and we compute the estimated variances directly from the gray and colored segments as partitioned by our picks. On each seismogram we circle the maximum (absolute) amplitude of the first-arriving phase. Its value and the delay time between it and our first-arrival pick are reported in brackets outside the left ordinate and upper abscissa axis, respectively. We define the first-arriving phase window to be the segment of the seismogram starting at our arrival-time pick and ending 1.75 s later. We chose 1.75 s because it is longer than our minimum-retained phase of 1 Hz, ensuring that we capture at least one complete cycle, and it is shorter than 2 s, the estimated roundtrip travel time of the surface-reflected phase (the large-amplitude, opposite-polarity "ghost"). Earthquake parameters including origin time, magnitude, depth, and distance to the recording MERMAID are noted in the upper inset boxes within each panel. Our estimated signal criteria of the SNR of the seismogram and twice the standard deviation of the timing uncertainty on the residual are quoted in the lower inset boxes.

Comparing MERMAID against nearby island stations
To prove the tomographic utility of MERMAID residuals we compare their statistics against measurements made for the same events at island stations located in the oceanographic neighborhood of the slowly dispersing SPPIM array.
We compare our uncalibrated seismoacoustic pressure records (the MERMAID "seismograms") with velocity seismograms from land-based seismic sensors. We then compare the MERMAID catalog of travel-time residuals, SNR estimates, and travel-time uncertainties with a similar catalog that we construct for island seismic stations in the vicinity of MERMAID. Fig. 1 showed MERMAID's oceanic neighborhood. At 32 million km 2 it spans a large portion of the South and some of the North Pacific, nearly 6.5% of Earth's surface, or roughly double the area of Russia. The Appendix details which stations we used for this comparison and how we retrieved and processed their data. We picked all travel time-residuals using the same methodology regardless of station type.
We compare all measurements against the 1-D ak135 model. We perform this simple 1-D comparison first to show that the distributions of residuals from more traditional seismic instruments and MERMAID agree well, indicating that MERMAID is returning tomographically useful data. Later we recompute MERMAID residuals in the fully 3-D, elliptical LLNL-3DGv3 crust and mantle model of  to interrogate the geographic distribution of velocity perturbations in Earth's mantle as recorded by MERMAID. Fig. 10 shows the distributions of p-and P-wave travel-time residuals (top row), their SNRs (middle row), and their uncertainty estimates (bottom row), for traditional seismometers, MERMAID, and Raspberry Shake stations, left, middle, and center, respectively.
The top row of Fig. 10 plots the travel-time residuals for all first-arriving p and P waves whose residual fell between 10 s before and 10 s after their ak135 predicted arrival time. For MERMAID (Fig. 10b), we use the adjusted travel times as explained in the Ap-pendix and label them appropriately as t ? res , while for the traditional (Fig. 10a) and Raspberry Shake stations (Fig. 10c) no adjustments are needed. The only quality control applied to the MERMAID residuals was the rejection of those that exceeded (positively or negatively) the 10 s cutoff. For the residuals in Fig. 10(a) and Fig. 10(c) we required the additional quality criteria that their SNRs (middle row) must be equal to or greater than the minimum over the MERMAID data set (Fig. 10e), and that their uncertainty estimates (bottom row) must be equal to or smaller than the maximum corresponding value in the MERMAID data set (Fig. 10h).
This selection aims to mimic human intervention on the MER-MAID data. Indeed, during manual review often the first author would sometimes reject, for various reasons, a phase-arrival pick that aligned nicely with a theoretical arrival time. Our automated picker, in contrast, reports any signal with an SNR greater than one. Therefore, often by algorithmic necessity the picker will trigger on something that is extremely low-SNR (just above one), which by coincidence also aligns with some theoretical first arrival, but which a human would readily reject (e.g., a reflected phase from a small and distant earthquake). Every single MERMAID seismogram discussed here was reviewed by a human and a phase-arrival was verified to exist, but not every seismogram from the nearby stations was reviewed (rather, each was simply run though the same phasepicking algorithm). Therefore, it was determined that the minimum SNR and the maximum two-standard deviation uncertainty estimates of the identified MERMAID data set could serve to approximately winnow the nearby data to the standards by which the author's eye accepted or rejected a pick. This is clearly an imperfect process, and in doing it we are not implying that noise levels across all three instrument classes are equal (they are not, see the middle row). The number of traces that passed this winnowing process and contributed data to the histograms of Fig. 10(a-c) is quoted above each panel.
The distribution of first-arrival MERMAID residuals in Fig. 10(b) agrees well with the complementary distribution from traditional seismometers in Fig. 10(a), and to a lesser extent the same distribution computed for the Raspberry Shake stations in Fig. 10(c). All are positively biased, meaning that, on average, the travel time of the first-arriving p or P wave was delayed compared to the ak135 reference model. MERMAID's mean residuals fall between the other two instrument classes. Their standard deviations are smaller than they are for the other two instrument classes, bolstering our claim that MERMAID reports data useful for seismic tomography in recording tomography-grade p-and P-wave arrivals.
The middle row of Fig. 10 displays histograms of the (logarithmic) SNRs of the first-arrival residuals. Note that the data extend beyond the abscissa limits shown. The minimum SNR is identical for all three histograms because it is the smallest SNR in the MERMAID data set and it was used as a threshold for the others. Despite this attempt to winnow the data from nearby stations we believe that we see the result of human intervention in the shape of the MERMAID histogram. The MERMAID SNRs have a broad mode near their mean, and the histogram more slowly ramps up to its maximum than compared with the other two instrument types, decreasing differently. This is the likely result of rejecting lower-SNR picks in the MERMAID data set via the manual review step not applied to the other data. Hence our method of winnowing the traditional and Raspberry Shake data using hard thresholds computed from the MERMAID data set is conservative in retaining a greater proportion of high-noise, high-uncertainty traces than would be the case for manual data reduction. Generally speaking, MERMAID SNRs fall between those computed from traditional stations and Raspberry Shake instruments. The median SNRs of the MERMAID and traditional data sets are similar, and both are higher than the same statistic for the Raspberry Shake data set. Interestingly, the maximum SNR among the three differs greatly, with that of the traditional stations being much greater than MERMAID, itself greater than Raspberry Shake.
We have plotted all data that passed our winnowing procedure from each instrument class. In Fig. S2 of the Supplemental Material we recreate Fig. 10 using only earthquakes for which all three instrument classes had at least one station returning data. The limitation to common events implies a selection of events that occurred after the installation of Raspberry Shake stations, the most datalimited instrument class in our study. In this apples-to-apples comparison of the SNRs, traditional seismometers recorded the firstarriving phase with a maximum SNR of 2.6⇥10 7 ; MERMAID with a maximum SNR of 6.4⇥10 4 ; and Raspberry Shake with a maximum SNR of 4.6⇥10 2 . For reference, using eq. (1) and assuming the same amplitude signal, these results equate to a 26 dB reduction in the noise level of a traditional station as compared to MERMAID, and a 21 dB reduction in the noise level of MERMAID compared with Raspberry Shake. Two caveats to consider are that these values do not take differences in epicentral distances into consideration and that MERMAID seismograms can contain high-amplitude reverberations for many tens of seconds after the initial arrival that may artificially inflate the SNR of those pressure records.
A more detailed comparison of the waveforms recorded by all three instrument classes than may be gleaned from Fig. 5 is given in Figs S3-S5 of the Supplemental Material, which plot the 12 highest-SNR seismograms from the three instrument classes, in the same format as Fig. 9, but only considering the data in the common catalog (as in Fig. S2). From those figures it is immediately obvious that the noise levels for both traditional stations and MER-MAID stations are much lower than Raspberry Shake (granted, the former are generally nearer to their respective events), with MER- Sampling the mantle under the South Pacific with MERMAID 17 MAID displaying noise levels more akin to traditional stations than Raspberry Shake. The bottom row of Fig. 10 shows histograms of the uncertainty estimates, 2SDerr, associated with each first-arrival residual, as developed by  and discussed in the Appendix. These values were quoted in the lower-right legends in each panel of Fig. 9. As in the middle row, the data extend beyond the axis limits. We see that the uncertainties on the residuals associated with both traditional stations in Fig. 10(g) and MERMAID floats in Fig. 10(h) display a satisfying exponential decay, with their mode nearer the lower end of the uncertainty scale. Both enjoy the lowest uncertainties at nearly the same proportion. The distribution of Raspberry Shake uncertainties in Fig. 10(i) is quite different from either of the other instrument classes. It is not obviously peaked at the lower end, and it is approximately uniform across the full range. Furthermore, while the lowest and highest uncertainties corresponding to each instrument class are roughly the same, the medians are quite different, as the value associated with Raspberry Shake is nearly four times that of the other two.
One caveat concerning our method of uncertainty estimation is that for comparisons to be usefully made across various instrument types, all data must be downsampled to the sampling frequency of the lowest-sampled instrument. We estimate uncertainties in terms of samples, converted to time via multiplication with the sampling interval. Therefore, given the same estimated sample uncertainty, the timing-uncertainty associated with a 100 Hz Raspberry Shake seismogram that has not been downsampled would be reported as being fivefold lower than that associated with a 20 Hz MERMAID seismogram. In that sense, rather than considering the timing-uncertainty estimates output by our method as absolute times, the uncertainties may better serve as relative metrics for comparisons between and across data sets. Practically speaking, they map an SNR to a time, with which the eye may or may not agree, but they nonetheless provide a convenient means to sort and winnow data.

TOWARD SOUTH PACIFIC P-WAVE TOMOGRAPHY
Having compared the quality of MERMAID residuals to the best data available from permanent island stations in the area, we place them in their geographic context to explore the velocity perturbations that they reveal in the mantle under the South Pacific. In this section we limit ourselves to mantle P waves for use in tomographic inversions. As to the other phases that MERMAID also records, such as shear (S) waves, surface wavetrains, T phases, core phases, and other hydroacoustic conversions from seismic and non-seismic e.g., volcanic, events, see Simon et al. (2021b) and Pipatprathanporn & Simons (2021). Fig. 11 plots the highest-quality first-arrival p-and P-wave travel-time residuals of our MERMAID data set. As in Fig. 10, we rejected any residuals that exceeded ±10 s to ensure our picker triggered on a legitimate phase arrival and not on other spurious energy. We additionally rejected residuals whose two-standard deviation uncertainty estimates, 2SDerr, exceeded 0.15 s, the limit beyond which the first-author's eye began to distrust the picks and/or when it was felt that the uncertainties were underestimated. For reference, Fig. 9 displays the full range of the data plotted in Fig. 11(c) and Fig. 11(d): from the four lowest-uncertainty residuals (top row), through the 33rd and 66th percentiles of uncertainty (second and third row, respectively), to the four highest-uncertainty residuals which passed muster in the bottom row of Fig. 9 (the uncertainty threshold there being 0.15 s). This quality criterion was decided upon after inspecting all seismograms in the MERMAID catalog. The highest uncertainty among all first-arriving P waves in the catalog is 2.03 s.
The residuals in Fig. 11 are coded blue for fast (the first arrival is early compared with theory) and red for slow (the first arrival is late), and they are smeared along their ray paths from source to receiver. We plot them against three velocity models: ak135 in Fig. 11(a; eq. A1); ak135 adjusted for bathymetry and MER-MAID cruising depth in Fig. 11(c; eq. A2); and the fully-elliptical 3-D crust and mantle model LLNL-G3Dv3 in Fig. 11(e; eq. A3). In all three cases the initial residuals were computed in the adjusted ak135 model, t ? ak135 , as is shown in Fig. 9, and then each was individually readjusted using the relative travel-time difference between that model and ak135 or LLNL-G3Dv3 to generate Fig. 11(a) and and Fig. 11(e), respectively. This means that the residuals shown here were not re-picked using slightly adjusted windows computed in the three different models, which is acceptable because the maximum absolute 3-D 1-D travel-time difference for the residuals plotted in Fig. 11(e) is 4.65 s, well within a 30 s window centered on the theoretical first arrival.
In total, 506 residuals passed these quality thresholds for the standard 1-D model, 510 for the adjusted 1-D model, and 509 were retained in the 3-D case. The distributions of those residuals are displayed as lighter bars in the histograms to the right of their corresponding smeared-residual maps in Fig. 11. Their means are marked by a dashed vertical lines. Data extend beyond the axis limits. Two numbers are bracketed in the upper right corner of each histogram. The first quotes the number of residuals plotted in the histogram, and the second the total number of residuals which passed quality-thresholding and are shown on the corresponding map. The statistics quoted for each histogram were computed using the latter set. The darker bars stacked inside each histogram plot the subset of residuals corresponding to earthquakes greater than magnitude 5.5 and at teleseismic distances between 30 and 100 .
Starting with the smeared residuals in Fig. 11(a) and their distributions in Fig 11(b), we generally see large positive anomalies (red; delayed) associated with equatorial ray paths, and loweramplitude negative anomalies (blue; early) associated with more polar ray paths. These data were not corrected for bathymetry or recording depth. Further, the ak135 model used here is spherical and thus does not account for ellipticity (resulting in larger residuals for equatorial than for polar ray paths), or 3-D structure of the crust and mantle. The first point results in an overall mean-shift of around 1 s for all residuals in the histogram in Fig. 11(b), and the second point adds an additional bias whose geographic distribution is governed by back-azimuth. Combined, these obscure the true signal of mantle-velocity anomalies that MERMAID records. The teleseismic subset of residuals numbers 215 and they have a larger average delay of 2.71 s.
The residuals in Figs 11(c and d) have been adjusted for bathymetry and MERMAID cruising depth, though they remain in the spherical ak135 velocity model. As such, the mean-shift in Fig. 11(b) has been reduced by over 1 s in Fig. 11(d), but the signal Figure 11. Smeared travel-time residuals computed against ak135 (a; eq. A1), ak135 corrected for bathymetry and MERMAID cruising depth (c; eq. A2), and LLNL-G3Dv3 (e; eq. A3), and the distributions of those residuals in (b), (d), and (f), respectively. Here we show only the highest-quality residuals in the MERMAID data set: those with maximum two-standard deviation estimated uncertainties, 2SDerr, smaller than 0.15 s (the final row of Fig. 9 shows the three "worst" seismograms that made the cut). The colorbar is in units of absolute time, with its colors encoding the residual between our pick and the theoretical arrival time of the reference model (blue is fast, red is slow). Map (e) and its corresponding histogram ( of Earth's ellipticity remains visible in Fig. 11(c). In fact, that signal is now more pronounced in the North Pacific, with those data generally displaying negative residuals before applying the adjustment. As in the unadjusted 1-D case the subset of teleseismic residuals represented by the darker bars in the histogram shows a higher average bias than the combined data set at 1.64 s, now determined among 217 residuals. Finally, the residuals presented in the penultimate row of Fig. 11 are the closest yet to the real signal of velocity perturbations within the Earth's mantle. They are computed against the fully-3-D and elliptical crust and mantle model LLNL-G3Dv3. Immediately we notice the removal of the signal of Earth's ellipticity; the ray paths through the North Pacific no longer display generally large negative anomalies, and residuals smeared along equatorial ray paths see their generally large positive residuals lowered slightly. The map is still very red, however, implying that the majority of the 3-D residuals recorded by MERMAID displays positive, slow anomalies for all back-azimuths. Fig. 11(f) proves this to be the case, showing that, on average, the residuals recorded by MER-MAID in the South Pacific are 1 s late compared to LLNL-G3Dv3. As before, that delay grows when considering only the teleseismic subset, increasing to 1.17 s among 216 residuals. The final row of maps in Fig. 11 offer zoom ins around the boundary of our SPPIM array ( Fig. 1b and Fig. 4). The map at left plots only those raypaths are that are wholly contained in this region, and the map at right shows P-wave velocity perturbations in the LLNL-G3Dv3 model at 500 km depth, near the average maximum depth sensed by the residuals in Fig. 11(g). The red "seahorse" in this map further illustrates the point made in Fig. 4, namely that the mantle under the South Pacific around and below our SPPIM array is characterized by anonymously slow seismic velocities. The average delay of the regional subset of MERMAID data in Fig. 11(g) is 0.6 s among 305 residuals.
The darker subset of teleseismic residuals in Figs 11(b,d,and f) were winnowed so that their statistics could be directly compared with data collected during the Polynesia Lithosphere and Upper Mantle Experiment (PLUME; Barruol 2002) and broad-band ocean bottom seismographs (BBOBS; Suetsugu et al. 2005) experiments, which saw the deployment of 10 ocean-island broad-band stations and 10 OBS stations in French Polynesia between 2001). The resulting teleseismic P-wave data set including traces from PPTF, PTCN, and RAR totaled 1477 residuals corresponding to 121 earthquakes (Tanaka et al. 2009a). After correcting for deployment length (⇠1.3 years for MERMAID versus a combined ⇠1.8 years when the BBOBS and PLUMES arrays overlapped) and instrument count (16 versus 23; this normalization presumes OBS-detection rates similar to island stations, potentially a poor assumption) this amounts to around 10 high-quality ( Fig. 9) teleseismic residuals retained per MERMAID per year compared with around 35 for BBOBS and PLUME. However, unlike the BBOBS and PLUME data set, we were not privy to continuous multi-month time series, and we were instead presented with data segments preselected for us by the onboard detection algorithm. As such we are confident that we would bolster the MERMAID catalog with appropriate data requests from its buffer. Fig. 11 shows that, more often than not, MERMAID recorded seismic waves that traversed regions of the mantle more slowly than predicted. Interestingly, the distribution of these residuals in Fig. 11(f) has a higher mean than the analogous distribution for the adjusted 1-D-model in Fig. 11(d). It is also satisfying to note that adjustment to the 3-D model lowered their standard deviation to the smallest value (by a small margin) among all three models. Further, like the other histograms in Fig. 11, this one displays positive skewness but, most interestingly, it exhibits the largest positive skewness among all three (338 of the 509 residuals plotted are positive). Many tomographic inversions somewhat underestimate the magnitude of velocity anomalies due to the inversion methods used (Burdick & Lekić 2017), thus it is possible that our observations are illuminating stronger slow anomalies in the mantle, possibly associated with the LLVP. However, importantly, we have not accounted for timing errors caused by earthquake mislocation in any of the figures. That remains a vital preprocessing step to be completed before these data are used for tomographic inversion. Thus, while we are confident in the general trend of delayed residuals in our study area, we are as yet unable to speculate on their specific geographic significance. Indeed, the colors in the maps of Fig. 11 could be as influenced by earthquake mislocation as they are by the signal of Earth's mantle (see especially fig. 3b of Hosseini et al. 2020).

CONCLUSIONS
We described a new seismic instrument, the third-generation Mobile Earthquake Recording in Marine Areas by Independent Divers (MERMAID), which records earthquakes and transmits their seismograms in near real-time from the global oceans. The robotic floats dive to 1500 m depth below the sea surface and passively drift with the currents while monitoring the ambient acoustic wavefield, surfacing only to relay seismic data, their location, and to download new command files. We discussed the South Pacific Plume Imaging and Modeling (SPPIM) project, which has launched an array of 50 MERMAIDs in the South Pacific Ocean, deployed and maintained by a global consortium, EarthScope-Oceans. The array was completed in August 2019, and as of this writing 46 MERMAIDs are reporting data (see http://earthscopeoceans.org), and will be for many years to come. We highlighted the time-variable nature of the locations of the subset of 16 Princeton-operated MER-MAIDs, from their deployment in August 2018 through the end of 2019, whose data were the focus of this study.
We implemented a workflow to process the continuous data stream of incoming seismograms and match them with earthquakes in global catalogs. We reported on the quality and size of the MER-MAID earthquakes catalog, a data product of this study, built up over 16 months of deployment. Our MERMAIDs averaged around 31 event detections per year, equating to more than 150 over their projected five-year lifespan, though we found these numbers to be highly variable between different MERMAIDs, which we hypothesized to be largely controlled by their proximity to areas with different earthquake rates and noise regimes (e.g., frequent storms). We discussed the statistics of completeness for our MERMAID seismic catalog and parsed its numbers by magnitude to reveal the types of earthquakes to which MERMAID proved itself most sensitive. For "typical" global earthquakes, an "average" MERMAID had around a 0.5% chance of recording a M 5, a 9.5% chance of a recording a M 6, a 44% chance for an M 7, and an 81% chance of recording a M 8 earthquake.
We used a procedure to pick, with high precision, the arrival times of phases in MERMAID seismograms, and to estimate their uncertainties. We compared our catalog of first-arrival residuals, another data product of this study, against a similarly-derived catalog computed using all seismic instruments in the general vicinity of the SPPIM deployment. In all, we collected nearly 9000 seismograms from 25 island stations, corresponding to the 288 unique earthquakes recorded by MERMAID. We compared the distributions of first-arrival travel-time residuals, signal-to-noise ratios, and travel-time uncertainties to traditional seismic stations and Raspberry Shake instruments and found that MERMAID had more in common with the former than the latter, proving that MERMAID is indeed recording tomographically useful data.
We winnowed our set of first-arrival p-and P-wave travel time residuals down to the highest-quality subset-just over 500 picks-which we compared against the fully-3-D and elliptical model LLNL-G3Dv3. We found that, on average, those phase arrivals at MERMAID were delayed by 1 s, revealing that the novel ray paths sampled in this study navigated slow regions of the Earth's mantle. We furthermore found that bias increased to 1.17 s when only the subset of teleseismic events was considered. We displayed these residuals smeared along their ray paths to gain a geographic sense for the signature of those velocity anomalies under the South Pacific. These residuals, their weights being dictated by the associated uncertainties computed here, will form the basis of future tomographic inversions to probe the structure beneath the South Pacific Superswell.

DATA AVAILABILITY AND RESOURCES
The International Federation of Digital Seismograph Networks (FDSN) has granted Mobile Earthquake Recorder in Marine Areas by Independent Divers (MERMAID) the network code "MH" (https://fdsn.org/networks/detail/MH/; DOI: 10.7914/SN/MH). MERMAID miniSEED files are at the Institutions for Seismology Data Management Center (IRIS DMC; http://ds.iris.edu/ds/nodes/dmc/) for public distribution alongside their associated metadata and transfer functions, the latter of which became available (Guust Nolet, Olivier Gerbaud, and Frédéric Rocca, personal communication, 2021) as this manuscript went to press. The Supplemental Material contains text files detailing earthquakes and travel-time residuals computed in the three models discussed here. Continuously updated text files of GPS fixes reported by individual MERMAIDs are available at, e.g., http://geoweb.princeton.edu/people/simons/ SOM/P0008_all.txt (the stations described here are numbered P0008 through P0025, excluding P0014 and P0015, which never existed) and described at http://geoweb. princeton.edu/people/simons/SOM/hdr.txt. The Supplemental Material also includes reviewed and expanded GPS files detailing only the fixes most relevant to this study.

ACKNOWLEDGMENTS
This work was supported by the National Science Foundation (DGE-1656466 to J.D.S. and OCE-1917058, EAR-1644399 to F.J.S. and J.C.E.I.). We thank Dr. Olivier Hyvernaud at the French Commissariat à l'Energie Atomique et aux Energies Alternatives, Département Analyse et Surveillance de l'Environnement, for providing seismic data from the Réseau Sismique Polynésien, and for guidance on how to remove their instrument response. Commendation goes to Yann Hello, Sébastien Bonnieux, and the team at OSEAN SAS for the design of the third-generation MERMAID instrument, to Alex Burky for help generating Fig. 11, and to Guust Nolet for his support, advice, and encouragement. We thank Ifremer, Genavir, and commander Jean-François Barazer and crew aboard R/V Alis for a string of successful deployments in the South Pacific. We are grateful to the Associate Editor Dr. Gabi Laske, Dr. Karin Sigloch, and an anonymous reviewer for helpful comments that tremendously improved the manuscript. We appreciate Assistant Editor Dr. Louise Alexander who deftly guided our manuscript from submission to publication.
We are indebted to the many people behind the scenes who collect, organize, store, and disseminate data for use by the community. Next, we do our best to acknowledge each group individually for helping us collect the data set we analyzed; the Supplemental Material contains extra details about how exactly we collected these data from each individual contributor.
Waveform data and instrument response metadata corresponding to nearby stations were provided by the following data centers: Institut de Physique du Globe de Paris In this convention z is depth in m positive below the surface, v is the acoustic velocity in m/s, ✓ is the angle of incidence in degrees, and subscripts "w" and "r" denote those values in water and rock, respectively. Bathymetry at the recording location (zw) is interpolated using the 2014 General Bathymetric Chart of the Oceans (GEBCO) Bathymetric Compilation Group model , and MERMAID depth at the time of trigger (zMER) is measured via its onboard pressure sensor. The standard dive depth is 1500 m. We assume an acoustic velocity of 1500 m/s for water and 5800 m/s for rock, as with the upper layer in ak135. The incidence angle of the water column conversion is given by Snel's law (Nolet 2008), Eq. (A5) yields an adjustment tadj = +0.98 s for a P wave incident at 0 on the seafloor of a 4000 m deep ocean, and recorded by MERMAID at a cruising depth of 1500 m-in other words, for an "average" ocean depth and an "average" MERMAID cruising depth. Teleseismic waveforms bottoming in the lower mantle are incident at small angles on the seafloor. A rule of thumb holds that 1 s should be added to travel times computed in the ak135 velocity model (or, equivalently, 1 s should be removed from MERMAID travel-time residuals computed against ak135 as in eq. A1). The residuals reported by  for the second-generation MERMAID data were not corrected for bathymetry or cruising depth and hence this rule should be applied to the residuals reported there.
Also note that while we have spoken generally in this section about "the" adjusted model, the specific time adjustment applied in eq. (A4) is dependent on source-station geometry (via the incidence angle), ocean depth, and MERMAID cruising depth, and thus differs for every seismogram. The Supplemental Material details these 1-D travel-time adjustments, as well as the analogous 3-D adjustments to convert between ak135 and LLNL-3DGv3, which are also specific to individual seismograms.

A1.4 The uncertainty on the residual
Our AIC-based picking procedure provides uncertainty estimates. Method 1 of , used here, leverages the statistics of the seismogram to construct synthetic sequences from which timing-error distributions are generated via Monte-Carlo resimulation. Every assessed seismogram is simply modeled as a noise segment preceding a signal segment, individually generated by an uncorrelated Gaussian distribution and concatenated at the presumed arrival time. The means and variances of the two segments are estimated from the data themselves. In practice, zero-mean noise and zero-mean signal sequences result in synthetics whose two segments differ only in variance, and which match the SNR and the picked changepoint of the seismogram after which they are modeled. A new AIC arrival-time is picked on each synthetic, and the signed distance between it and the AIC pick on the real seismogram (the assumed truth) is tallied over 1000 simulations to generate the error distribution. We use twice the standard deviation of this distribution, 2SDerr, as our measure of timing uncertainty, in seconds. See Section 8 and the Supplemental Material for links to computer codes to compute the variables discussed in this section.

A2.1 Data retrieval
We queried IRIS for terrestrial seismometers with public data after July 2018. The search returned 19 stations: 14 "traditional" seismic sensors from GEOSCOPE (G), the Australian National Seismograph Network (AU), the Red Sismológica Nacional (C1), and the Global Seismograph Networks IRIS/IDA (II) and IRIS/USGS (IU); and five low-cost Raspberry Shake ) instruments (AM). Table A1 lists these stations and their locations. They amount to one for every 2.3 million km 2 , an area larger than Saudi Arabia, and they are very inhomogeneously clustered on islands. Additionally we obtained data from six short-period seismometers in the Réseau Sismique Polynésien (RSP) maintained by the Centre Polynésien de Prévention des Tsunamis (CPPT), in Papeete, Tahiti, French Polynesia (Talandier 1993). Stations codes and locations are listed in Table A2. Data from RSP have been used to seismically investigate underwater explosions (Reymond et al. 2003), Antarctic icecalving events (Talandier et al. 2002), and submarine volcanism (Wright et al. 2008;Talandier et al. 2016). Fig. 4(a) shows the locations of the nearby stations in Tables A1 and A2 relative to the SPPIM array.
We retrieved every available seismic trace from these stations corresponding to all 288 identified events in our MERMAID catalog beginning five minutes before the first predicted arrival by ak135 .
For each station in Table A1 we RKT -23.1247 -134.9720 (mid period; sampling rate between 1-10 Hz), B* (broad band; 10-80 Hz), H* (high broad band; 80-250 Hz), S* (short period; , and E* (extremely short period; 10-80 Hz) vertical channels. No data from mid-period instruments were returned, and all Raspberry Shake stations were short-period or extremely shortperiod. This yielded 7424 traces. Of those, 6992 were from traditional sensors, with data recorded during all 288 earthquakes in the MERMAID catalog, and 432 from Raspberry Shake instruments, accounting for data recorded during a subset of only 168 of those same earthquakes. Unlike the traditional stations that were in place before MERMAID P0008 was deployed, not all Raspberry Shake stations in Table A1 were installed before the SPPIM deployment. From the short-period instruments at the stations in Table A2 we obtained 1534 traces, corresponding to 284 MERMAID events. These data are not publicly distributed or long-term archived.

A2.2 Data processing and travel-time picking
Each trace had its mean and trend removed, and was tapered at both ends with a symmetric cosine taper of 5% the length of the trace (the SAC defaults). The instrument responses available in the polezero (SACPZ) files were removed by deconvolution using SAC (Goldstein et al. 2003;Goldstein & Snoke 2005), converting the raw data from digital counts into velocity seismograms. Each trace was high-pass filtered above 0.1 Hz and low-passed below 10 Hz. These frequencies were chosen to correspond as closely as possible to the sensitivity band of a MERMAID instrument, whose pressure records are filtered between those bounds before digitization, and whose instrument gain is flat (and negative!) within that bandwidth (see Section A3 and the Supplemental Material).
SACPZ and StationXML files with response data were readily available for the stations in Table A1. SACPZ files were not available for the stations in Table A2. The Supplemental Material contains the necessary details and software to perform instrument correction, which will be of use to others.
Finally, note that Fig. 5 serves merely as a visual aid to appreciate the types of signals that MERMAID records compared with other stations, given the same earthquake. We do not use the gray waveforms as shown there to make first-arrival picks. Rather, for every first-arrival time reported in this study, regardless of instrument, we make the arrival-time picks on segments like those in , not like those shown in Fig. 5. Hence, regardless of instrument type, each trace was processed as described in Section A1. For the island stations, the only difference was that, if required, they were decimated to 20 Hz or 25 Hz to match the sampling frequency of MERMAID, and no bathymetric (or eleva-tion) time corrections were applied. Seismograms were rejected if they were less than 200 s long, if they had any missing data within the taper window described in Section A1.1, or if the theoretical first-arrival time was near enough to an edge to result in the deconvolution taper used to remove the instrument response overlapping with the taper used for arrival-time picking.

A3 MERMAID POLES AND ZEROS
Finally, we print the poles and zeros for the third-generation MER-MAID, as experimentally derived by Guust Nolet, Olivier Gerbaud, and Frédéric Rocca. A report written by those authors entitled, "Determination of poles and zeroes for the Mermaid response," which details the experimental setup and results is additionally included as supplemental material to this study. Note the negative constant.  EarthScope-Oceans (http://earthscopeoceans.org/) represents a multidisciplinary group of geoscientists who are coordinating efforts to create a global network of sensors to monitor the Earth system from within the oceanic environment. It intends to shepherd national projects into the international forum where globally relevant, applicable, and mutually agreed-upon decisions can be made on technological aspects of instrument development, science objectives and priorities on different time scales, data management, dissemination, archiving, and education and outreach efforts; much like IRIS (https://iris.edu/) or Observatories and Research Facilities for European Seismology (ORFEUS; http://orfeus-eu.org/) are doing for the land-based seismological communities today.

S2 THE DEPLOYMENT OF THE SPPIM ARRAY
A 24-hr trial run completed 12 April 2018 was led by Kobe University's Hiroko Sugioka and JAMSTEC's Masayuki Obayashi from the R/V Fukae Maru. During this test deployment MERMAID N0003 recorded an mb 4.9 earthquake originating at 59.4±5.8 km depth, 56±6.6 km east of Ishinomaki, Japan (according to https://earthquake.usgs.gov/earthquakes/eventpage/us2000dyw3/), some 824 km distance from the instrument, which floated 469 m below the surface at the time. The

S3.1 Automated Preliminary Matching
Upon receipt of a fresh seismogram transmitted by MERMAID we immediately wish to determine whether or not the signals it contains correspond to known seismic events. To that end we developed a complete workflow executed in MATLAB to match untagged, raw seismograms to global seismic catalogs with minimal user intervention. This first step discussed next-the algorithmic querying of global catalogs, the Simon, Simons & Irving Figure S1. MERMAID seismogram after automatic preliminary matching. The blue trace in the top panel is the raw seismogram, while the gray traces below are wavelet-subspace projections at five scales, each overlain by their associated Akaike information criterion (AIC) curve (black) and AIC-based arrival-time pick (purple). The top panel is annotated with the theoretical arrival times of various phases from five distinct earthquakes, as noted in the subscripts, computed in the ak135 velocity model, and marked in time by vertical lines. These represent all the phases which have theoretical arrival times within the time window of the seismogram, associated with known global seismic events in the catalogs queried from IRIS. The time of the first-arriving phase associated with the largest earthquake in the set (p 1 ) is marked by a solid red vertical line. Its theoretical arrival time agrees well with the AIC-based arrival-time picks (which are independent of seismology) at the first three scales. The agreement of these two distinct arrival-time estimation methods lends itself to the confident assignment of this seismogram to the "identified" category. During manual review this figure (and a secondary, zoomed in version) is displayed to the researcher, along with the event metadata associated with all potentially matching events, and the researcher is led through a series of intuitive prompts in MATLAB for easy matching and sorting.
tagging of likely events, the annotation of seismograms with their theoretical phase arrival times, and the multiscale detection of phases against which residuals are displayed-occurs automatically and without user intervention after a MERMAID transmits a new seismogram. The preliminary matching process begins with the querying of global seismic catalogs with irisFetch.m (https://github. com/iris-edu/irisFetch-matlab/), a software packaged and distributed by IRIS, for seismic events that occurred in the hour preceding the seismogram. Next, travel times are computed for seismic body waves that are likely to be present in the record using taupTime.m in the MatTaup package (see Data Availability and Resources in the main text) for the ak135 velocity model. Each event with one or more phase-arrivals in the time window of the seismogram is deemed a preliminary match, and all such events are sorted by magnitude (generally the single greatest factor determining phase identification) and saved together as individual structures (a MATLAB data type) in a binary (*.mat) "unreviewed" file. Two figures display the raw seismogram at the top, upon which the theoretical phase-arrival times of the possible events are marked. The panels below plot the wavelet-subspace projections of the seismogram at five scales, each annotated with their arrival-time pick estimated using an Akaike information criterion ). The first plot, Fig. S1, displays the complete seismogram, and the second (not shown here) shows detail about a 100 s window centered on the first arrival of the event with the largest magnitude among all potential Supplemental Material: Sampling the mantle under the South Pacific with MERMAID S3 matches. Usually that is the true match, and thus the boxes within the top panel of Fig. S1 quote those metadata and the recording MERMAID (in this case, P0008). Its theoretical first-arriving phase is highlighted by a solid red line. All other possible phases in the time window of interest are rendered as dashed black lines. All named phases are labeled with subscripts identifying the rank of the associated event in the magnitude-sorted preliminary match list. In Fig. S1, p1 marks the theoretical arrival time of a p wave generated by the first preliminary match, and, S4 is the theoretical arrival time of an S wave from the fourth possible event match. These preliminary matches are automatically generated and the algorithm only requires a SAC file (Helffrich et al. 2013) as input; i.e., the only relevant information ingested by the algorithm in this preliminary-matching stage is a (mobile) receiver location and a time window. Hence, our procedure is not specific to MERMAID data, and we may reasonably assume that it has application beyond the scope of this study. For example, for single-station or array deployments of traditional broad-band land instruments, perhaps in the context of Comprehensive Nuclear Test-Ban-Treaty (CTBT) verification, Raspberry Shakes  or various other forms of crowd-sourced "citizen" seismology, e.g., recorded by mobile phones (Kong et al. 2016) or other low-cost instruments (Cochran et al. 2009;Jeddi et al. 2020), and for classroom seismic installations (Balfour et al. 2014;Subedi et al. 2020), where an experienced researcher may not be available to guide the matching process. While the code is provided with default parameters optimized for MERMAID data, these are easily tunable.

S3.2 Manual winnowing and sorting
The second step of the matching procedure involves manual review to sort the seismograms into two classes: "identified" and "unidentified." Those in the former class will have been assessed to contain energies consistent with phase arrivals corresponding to known earthquakes in global seismic catalogs, both by visual inspection and by considering their travel-time residuals with respect to the AIC picks. For every SAC file reviewed, the two PDFs generated in the first step are automatically opened for inspection, and the interactive program guides the user through a series of prompts to determine if the event can be identified and, if so, which event(s) and phase(s) should be saved.
The process begins with a printout of metadata for all potential events. At all times the user has quick access to all events and their corresponding residuals thanks to their MATLAB structure variables being loaded automatically with each seismogram under review. We refer again to Fig. S1, whose top panel plots the raw seismogram in blue. The arrival times marked on that top panel are the ak135 predictions. The panels below the first plot the subspace projections of the seismogram at five wavelet scales in gray, the amplitude of which corresponds to the left ordinate axis. Overlain in black in each panel is the associated AIC curve, A, used to generate the arrival-time pick at that scale, corresponding to the right ordinate axis. This curve is essentially an inverted likelihood-curve: where it is low, an arrival is likely. The specific AIC arrival-time pick is marked at each scale by a purple vertical line. Quoted in the inset boxes are the corresponding signal-to-noise ratios (SNRs), defined to be the ratio of the maximum-likelihood estimates of the variances of the signal and noise segments, the "signal" being the segment after the AIC pick, and the "noise" the segment preceding it. This definition of the SNR is the same as that in the main text, however, the signal and noise segments are not-there, we focus on the first arrival within a single frequency band (1-5 Hz) and a short time window (30 s); here we consider the complete time series (⇠200-300 s) at each subspace projection (x1-x5) resulting in five SNRs per seismogram. There, the seismograms being analyzed have already been positively matched to an identified event, which differs from the procedure here, where we wish to inspect the full bandwidth of each seismogram via a wavelet multiscale decomposition.
The AIC-based picks are ignorant of seismology. Their agreement, or lack thereof, with the theoretical arrival times of the phases from the match list informs the decision to designate a seismogram as "(un)identified." In the case of Fig. S1, the purple AIC picks at subspace projection scales one (x1) through three (x3) agree well with the theoretical arrival time of the first-arriving p wave in ak135. The AIC picks at the other scales are either low-SNR or very near an edge and may be disregarded. We also note that the picks shown here are not influenced by the edges, whose treatment we describe in , so users need not necessarily be wary of an arrival pick near an edge. However, we have noticed that our AIC picker will on occasion report an extremely short noise or signal segment associated with a time series that has no clear arrival. An example of this behavior is the noise segment in the last panel of Fig. S1, x5, which is extremely abbreviated, consistent with its having low variance, most unlike the variance of the signal segment. Therefore, a low-SNR signal that is very near an edge does warrant a close inspection. Regardless, because of the agreement between the theoretical arrival time of the p wave corresponding to the largest event and the AIC picks at low scales (high frequencies) in Fig. S1, this seismogram would be counted among the identified category. This sorting is accomplished via simple prompts that guide the user through a winnowing process that ultimately results in the seismogram being classified as identified or unidentified, and the relevant event data being saved to a binary (*.mat) "reviewed" file.
Ultimately the decision to mark a seismogram as identified or unidentified comes down to experience processing MERMAID seismograms like the one presented in Fig. S1. The hope, however, is that the workflow developed here is simple enough for new researchers with some experience processing seismic data to quickly grasp and apply it to their own untagged data with minimal training. Indeed, our workflow is already being successfully applied to the 23 SUSTech instruments included in the SPPIM deployment-albeit applied to the same type of data in this case but, importantly, matched by a different researcher.  Fig. 10 of the main text but considering only the subset of data representing events in the MERMAID catalog that occurred while at least one traditional and one Raspberry Shake instrument were also installed-i.e., the catalog of events common to all instrument classes. This comparison is made here because some of the larger events present in the data included in Figs 9 and 10 of the main text are not in the Raspberry Shake catalog because those stations had not yet been installed. Figs S3-S5 each plot the 12 highest-SNR signals in this catalog of common events for traditional island stations, MERMAID and Raspberry Shake island stations, respectively. It is readily apparent that Raspberry Shake instruments are generally noisier than either of the other two instrument classes because the variance of the gray noise segment that precedes the colored signal segment is often visible, whereas for the other two instrument classes this is not the case (the noise looks flat at this range of ordinate values). Also, the uncertainties associated with Raspberry Shake seismograms are generally higher than those of the other two instrument classes. Differences in epicentral distance are not considered here. Seismic Analysis Code (SAC) pole-zero (SACPZ) files specify the frequency response of a digital seismic instrument. They describe how a seismometer converts ground motion to digital counts. The output in digital counts of a seismometer is a record of true ground motion multiplied by the response of the instrument in the frequency domain. With a SACPZ file, one may recover an accurate record of ground motion via deconvolution (division in the frequency domain) of the seismogram with the frequency response. This process is referred to as "removing the instrument response," and it can be accomplished in the SAC program with the TRANSFER command (Helffrich et al. 2013). See also Burky et al. (2021). SACPZ files contain poles, zeros, and a constant. The first two are the complex roots (poles: denominator; zeros: numerator) of the transfer function of the analog instrument, and the last is a constant that describes the gain of the entire system. By analog we mean the physical seismic instrument-for example, an inertial mass held in place by a varying electric current-that intakes ground motion (e.g., m/s) and outputs voltage (V). Following the analog stage, seismometers pass their data through multiple stages of digitization where the voltage is converted to counts. Ignoring any frequency effects during digitization (which SACPZ files do not include), the poles and zeros are sufficient to describe the phase response of the system-i.e., with no constant, the ungained seismogram after deconvolution will have the proper shape but incorrect amplitude. Phase shifts acquired during the digital stages are usually negligible and can be ignored, as noted (in bold) on p. 152 of the Standards for the Exchange of Earthquake Data (SEED) Reference Manual Version 2.4 (2012, https: //fdsn.org/pdf/SEEDManual_V2.4.pdf; hereafter referred to as the SEED manual), p. 409 of the Seismic Analysis Code Users Manual Version 101.6a (2014, https://ds.iris.edu/files/sac-manual/sac_manual.pdf; hereafter referred to as the SAC manual), and as has been independently verified by the authors by comparing waveforms deconvolved with SACPZ and RESP files (the latter of which take into account all digitization stages). We include this comment to make the point that our method of removing the instrument response using the SACPZ file (and not other file standards like RESP or StationXML, which encode information concerning the full cascade of digital filters) is sufficient to recover an accurate record of ground motion for the Réseau Sismique Polynésien (RSP) instruments used in this study.
A SACPZ file may be specified in terms of displacement, velocity, or acceleration. Very commonly a seismometer will physically measure ground velocity, in which case the poles and zeros will likely be reported for the velocity transfer function of the analog stage, and the gain constant (also called the "sensitivity") will be in units like counts/(m/s). SACPZ files were not available for the six stations used in this study from the RSP. However, we were provided the poles and zeros, and a gain constant at a specific frequency, which is enough to write our own SACPZ files. It is important to note that a gain constant at a single frequency is not the same thing as the constant of a SACPZ file. To be unambiguous we will hereafter refer to the former as the sensitivity. Indeed, the sensitivity describes a gain factor at a single frequency, while the SACPZ constant describes the gain factor at all frequencies. Using the notation of the SEED manual, and the pole-zero representation of the transfer function, the frequency response at any stage of the system is (eq. 4 p. 158), where R(f ) is some function of frequency and the S d is the sensitivity. For the analog stage, where A0 is a normalization factor at frequency fs in Hz (note that the normalization factor may be derived at a frequency, fn, different from fs, but this is goes against the SEED convention [p. 157], and is not done here), and Hp(s) is the transfer function at s = 2⇡if rad/s. Note that we assumed that the poles and zeros of Hp(s) are in rad/s (SEED type "A"), and not in Hz (SEED type "B"), as is the convention used in the SEED manual, and which has been our experience when retrieving RESP and StationXML files from both IRIS and International Federation of Digital Seismograph Networks (FDSN) Web Services (https://www.fdsn.org/webservices/). At all stages R(f ) is defined such that its modulus is unity at the specified frequency of the sensitivity, f = fs, leading to the relationship at the analog stage, A0 = 1/Hp(ss).
Therefore, ignoring frequency effects beyond the analog stage, and defining SD to be the multiplicative combination of sensitivities at all stages, the complete frequency response of the entire system at any frequency in Hz is = CHp(s), where C is the constant included in the SACPZ file. With the delivery of seismic data from the Réseau Sismique Polynésien (RSP), we were also provided poles, zeros, and a sensitivity corresponding to each station. Equal for all six stations were their two zeros (0;0)(0;0) and two poles ( 4.44; 4.44)( 4.44;+4.44). The sensitivity of stations PAE and TVO was given as 0.5236 (nm/s)/LSB at 1 Hz, and for stations PMOR, VAH, TBI, and RKT as 0.212 (nm/s)/LSB at 1 Hz. Here, LSB stands for "least significant bit," and in this case refers to digital counts. Therefore, for all six stations, the poles, zeros, and sensitivity frequency of fs = 1 Hz are identical, but the sensitivities differ. As provided, these data required slight transformation before computation of the the constants of eq. (S6). First, the sensitivities were given in terms of velocity per counts, whereas the convention used in the SEED manual (p. 12) and the IRIS and FDSN Web Services specifies those data in terms of counts per unit of ground motion. Therefore, the sensitivities were inverted to convert them to counts/(nm/s). Next, they were converted from nm to m by multiplication with 10 9 to conform to the SEED convention that the transfer function be given in SI units.
Finally, we converted the pole and zero data from velocity (describing the transformation from digital counts to m/s) to displacement (counts to m). This was done to conform to the SAC standard that a TRANSFER to NONE (deconvolution in the SAC program) results in a displacement seismogram. Otherwise, if left as is, TRANSFER to NONE produces a velocity seismogram-when using SACPZ files the SAC TRANSFER function does not automatically convert ground motion units to displacement, if required, as it does with RESP files (SAC Manual,p. 406). To that end, the sensitivities were multiplied by 2⇡fs (recalling that the sensitivities hold at a specific frequency in Hz, but were computed from a transfer function in rad/s), and a zero was added to the set of poles and zeros (resulting from the integration of the complex transfer function). Note that SAC does not use SI units, but rather assumes (and populates the relevant header variables accordingly) that a TRANSFER to NONE results in a displacement seismogram in units of nm/s. However, we chose to prioritize SEED standards over SAC standards (true for the IRIS and FDSN Web Services in our experience), and thus we were careful to apply the proper multiplication factor of 10 9 in the SAC program after deconvolution such that the resultant waveforms were in nm (or nm/s for a TRANSFER to VEL, as was done with all data in the main text from nearby island stations), to properly match the units written to the SAC header variable "IDEP." We therefore report the following displacement SACPZ files in SI units for RSP stations PAE and TVO written using our sacpzconstant.m and cpptsacpzconstant.m routines, ZEROS 3 +0.000000e+00 +0.000000e+00 +0.000000e+00 +0.000000e+00 +0.000000e+00 +0.000000e+00 POLES 2 -4.440000e+00 -4.440000e+00 -4.440000e+00 +4.440000e+00 CONSTANT 2.699191e+09, and for stations PMOR, VAH, TBI, RKT, ZEROS 3 +0.000000e+00 +0.000000e+00 +0.000000e+00 +0.000000e+00 +0.000000e+00 +0.000000e+00 POLES 2 -4.440000e+00 -4.440000e+00 -4.440000e+00 +4.440000e+00 CONSTANT 6.666493e+09.
The functions relevant to this section that compute the SACPZ constants, C, and normalization factors, A0 are accessible at github.
com/joelsimon/omnia/. Included there as well is transfunc.m, a function which may be of use to those interested in the conversion between SACPZ, RESP, and StationXML files, as well as their transformation between displacement, velocity, and acceleration responses. Supplemental Material: Sampling the mantle under the South Pacific with MERMAID S13 Figure S6. Unfiltered seismograms from RSP.PAE (purple), RSP.PMOR (red), G.PPTF (gray) of a nearby great earthquake. The SACPZ files corresponding to the two RSP stations were written by the authors, and that corresponding to G.PPTF was provided by IRIS. The similarity of the waveforms, both in phase and amplitude, proves that our SACPZ files are correct.

S6.3 Verification
Fig. S6 proves that the displacement SACPZ files we wrote for RSP stations are correct. It compares the unfiltered (apart from those corner frequencies specified during deconvolution, see the Appendix of the main text) seismograms, plotted in displacement (nm), corresponding to a great earthquake in the Fiji Islands region that was recorded by three nearby stations. The traces are each aligned on the theoretical arrival time of the first-arriving P wave computed in the ak135 velocity model. Two of the seismograms were recorded by stations in the RSP (PAE and PMOR, in purple and red seismograms), each serving as the archetypal station for their respective group's SACPZ file written by the authors, and the other by station G.PPTF (in gray), for which the displacement SACPZ file was available from IRIS. The distance between each RSP station and G.PPTF is listed inside the axis (8.1 km for PAE and 338.5 km for PMOR), and they are near enough to one another that we would expect the ground motion at the three stations to be very similar, given the magnitude of earthquake. We see that the waveforms agree very well, both in amplitude and phase, both before and after the first-arrival, but especially for the first fifteen seconds after the first arrival. Therefore, we conclude that the two SACPZ files we wrote in Section S6.2 corresponding to six stations in the RSP are correct. We include as further verification an example in the header of sacpzconstant.m that rederives the values printed on an IRIS help page that explains how to convert a velocity RESP file to a displacement SACPZ file (ds.iris.edu/ds/support/faq/24/whatare-the-fields-in-a-resp-file/). That example shows that our SACPZ constant agrees with the one provided by IRIS to within 0.003%, well within acceptable error.