The S-PLUS Transient Extension Program: Imaging Pipeline, Transient Identification, and Survey Optimization for Multi-Messenger Astronomy

We present the S-PLUS Transient Extension Program (STEP): a supernova and fast transient survey conducted in the southern hemisphere using data from the Southern Photometric Local Universe Survey (S-PLUS) Main Survey and the T80-South telescope. Transient astrophysical phenomena have a range of interest that goes through different fields of astrophysics and cosmology. With the detection of an electromagnetic counterpart to the gravitational wave (GW) event GW170817 from a binary neutron stars merger, new techniques and resources to study fast astrophysical transients in the multi-messenger context have increased. In this paper, we present the STEP overview, the SN follow-up data obtained, data reduction, analysis of new transients and deep learning algorithms to optimize transient candidate selection. Additionally, we present prospects and optimized strategy for the search of Gravitational Wave counterparts in the current LIGO/Virgo/Kagra observational run (O4) in the context of T80-South telescope.


INTRODUCTION
Transient surveys have become increasingly important in astronomy, astrophysics and cosmology due to their ability to provide valuable information on astrophysical time-dependent phenomena such as supernovae, variable stars, active galactic nuclei (AGN), tidal disruption events, gamma-ray bursts, and kilonovae (e.g.Andreoni et al. 2022b;Stein et al. 2023;Aleo et al. 2023;Yang et al. 2023).These events span a wide range of luminosities, spectral types, and timescales, often requiring survey strategies uniquely tailored to find specific classes of transients.By systematically observing and analyzing these transient phenomena, astrophysicists can unravel the dynamic process of those events and gain a deeper understanding of their intricacies.
As transient surveys have advanced, the development of technologies such as charge-coupled devices (CCDs) and robotic telescopes has greatly facilitated the systematic search for astrophysical transients (Perlmutter 1989;Filippenko 1992;Richmond et al. 1993).
★ E-mail: andsantos@cbpf.brHistorically, one of the earlier studies about a systematic approach for searching extragalactic sources was developed by Zwicky (1938), increasing the number of discovered events over the next decades.Searches in the southern hemisphere started around 1980 by (Maza 1980).These advancements enable more efficient and automated detection of transient events, significantly increasing the discovery rate, and allowing detailed follow-up observations and analysis.With the combined power of wide-field observations, advanced instruments, and data analysis techniques, transient surveys push the boundaries in astrophysics research, unveiling new phenomena and providing a comprehensive view of the ever-evolving universe.
Additionally, the detection, classification, and analysis of Supernova type Ia changed our understanding of the Universe, by serving as "standard candles" (Branch & Tamman 1992).With consistent luminosity distances, Type Ia supernovae allow astronomers to accurately measure cosmic distances, which ultimately led to the groundbreaking discovery of the universe's accelerated expansion and therefore insights about a novel component known as dark energy (Hubble 1929).Transient surveys have been instrumental in discovering and characterizing these important cosmic explosions arising from mas-sive stars, compact objects, and stellar interactions (e.g., Hamuy et al. 1993;Filippenko 2005;Kulkarni et al. 2007;Bellm et al. 2019;Karambelkar et al. 2023).Light curve data from those celestial objects plays a crucial role in tailoring fundamental aspects of explosive events, such as their energy levels and mass ejecta (Woosley 1988;Gal-Yam et al. 2009), constraints of the presence of circumstellar matter in their environments (Smith 2014), and the formation of compact objects (MacFadyen et al. 2001;Soderberg et al. 2010).It's also possible to use explosive transients for cosmology cases (Riess et al. 1998;Perlmutter et al. 1999).
Beyond the supernovae, transient surveys also contribute to the study of gravitational waves (GWs).The detection of GW events, such as the groundbreaking GW170817, which was initially detected in GW emission (Abbott et al. 2017b) and precisely localized at optical wavelengths (Soares- Santos et al. 2017) opened new opportunities to study the compact objects that lead to LIGO/Virgo GW events.GWs provide unique insights into some of the most energetic and cataclysmic events, including their contribution to Galactic chemical enrichment (Eichler et al. 1989;Argast et al. 2004;Arnould et al. 2007;Metzger et al. 2010;Kasen et al. 2017) and formation of heavier neutron starts (NS) and black holes (BH) (Belczynski et al. 2002;Dominik et al. 2015).Furthermore, the use of gravitational waves in cosmology allows for precise measurements of the Hubble Constant (Abbott et al. 2017a;Soares-Santos et al. 2019;Chen et al. 2018;Bom & Palmese 2023) and provides valuable constraints on cosmological models.With the increased focus on optical transient surveys for studying the physics of explosive transients, multi-messenger astronomy and cosmology, individual surveys are increasingly tailored to focus on some subset of depth, cadence, total area covered and wavelength coverage in order to maximize their own science returns.For example, the Vera C. Rubin's Legacy Survey of Space and Time (LSST; LSST Science Collaboration et al. 2009;Ivezić et al. 2019) will observe the Southern hemisphere with unprecedented depth for a wide-field optical time-domain survey, but without the cadence necessary to find and characterize rapid transients such as kilonovae (Andreoni et al. 2022a) and saturating at ≈17 mag.The synergy between LSST and other wide-field optical survey and follow-up telescopes, especially when their data streams are combined using the latest transient alert brokers and target and observation managers (TOMs; e.g., Volgenau et al. 2022;Coulter et al. 2023), will enable new science that either survey cannot perform alone.
Here we describe an optical time-domain survey using the T80-South telescope located at Cerro Tololo, using imaging from the main S-PLUS Survey, which we call the S-PLUS Transient Extension Program (STEP, PIs: Bom & Kilpatrick).We describe our pipeline and image processing procedures for retrieving and processing image data as well as our methods for identifying and reporting transient candidates.We then describe several science cases that highlight the utility of STEP, particularly for discovering supernovae, follow up and characterization based on their light curves, and follow up of GW events during Ligo-Virgo-KAGRA (LVK) observing runs.
The paper is structured as follows: in Section 2 we describe our scientific goals with this project and how STEP is designed to achieve those goals.In Section 3, we describe each step in our imaging and analysis pipeline starting from obtaining images from the telescope, our reduction process, our difference imaging analysis procedure and finally candidate selection.In Section 4 we detail our deep learning algorithm designed to reject false positives during difference image analysis.In Section 5 we outline extragalactic transients detected during our project.In section 6 we overview the light curve cadence of followed SNe.In section 7 we talk about our strategy and criteria to follow GW events.Finally, in Section 8 we review our conclusions and describe our future plans for this project in the era of LSST time-domain discovery and follow up.

S-PLUS Main Survey Overview and Discovery of
Transients in the Southern Hemisphere We outline the light curve of three followed SNe in appendix A, as our first data release (DR1).
The MS employs a single epoch observation of each field per filter, with a range of seeing from 0.8 to 2 arcsec.In this survey, three exposures are taken for each filter with an exposure time that goes between ∼ 30 and ∼ 60 seconds for the griz broad bands (see Table 5 of Mendes de Oliveira et al. 2019 for exact exposure time of all T80cam filters), dithering a few arcseconds to the east and west of the field center.The primary objective of this strategy is to enhance the accuracy of photometric redshift estimation for objects within the field, targeting those with absolute magnitudes down to   ∼ 20.Furthermore, the redshift information provided by this approach aims to achieve a precision of Δ/(1 + ) = 0.02.The individually captured images are overscan-subtracted, trimmed, bias-subtracted, flat-divided, and fringing subtracted (-band only; see Mendes de Oliveira et al. 2019 for more details on the reduction process and pipeline).Besides these steps, the images are cosmeticcorrected to identify and mask satellite tracks and cosmic rays and then astrometrically solved(see Almeida-Fernandes et al. 2022 for more on the astrometric precision).
Given the previous description of S-PLUS and T80cam, its wide field-of-view camera, and an excellent calibration of its photometric system (Mendes de Oliveira et al. 2019), S-PLUS emerges as a suitable survey for investigating transient phenomena in the Southern sky.Moreover, the survey benefits from its overlap with other timedomain surveys including DES, ATLAS, and PAN-STARRS, which collectively cover various portions of the Southern sky.This survey also fills a critical gap in time-domain coverage of the Southern sky before the Vera C. Rubin Observatory (Ivezić et al. 2019) and La Silla Schmidt Southern Survey (LS4) come online to cover this sky area with high cadence.Given all this information, we argue that S-PLUS is an appropriate instrument for investigating transient phenomena in the Southern sky.

Supernova Follow Up
Supernova follow up in the era of the Rubin Observatory will rely on careful selection of transients from the thousands of potentially valuable transients that will be discovered each night.Even restricting those samples to events accessible by smaller aperture telescopes such as T80S requires selecting from potentially hundreds of events and following those necessary for specific science cases, such as measuring cosmological parameters from SN Ia light curves, constructing SN luminosity functions, testing light curve classifiers, or constraining the incidence of strong circumstellar interaction in core-collapse SNe.All of these science cases can be addressed by synthesizing data from multiple sources using transient brokers (e.g., Narayan et al. 2018;Förster et al. 2021;Möller et al. 2021), photometric classification of light curves (Boone 2019;Muthukrishna et al. 2019;Villar et al. 2019;Sravan et al. 2020), and anomaly identification to prioritize transients with intrinsically peculiar properties (e.g., total ejecta mass or energy, nickel mass, incidence of circumstellar interaction) or those at critical or scientifically useful phases.
Obtaining multi-band light curves for the thousands of transients needed for light curve parameter estimation will require coordination between a global network of telescopes.Telescopes such as T80S can improve the effective cadence of Rubin Observatory by observing between visits from the survey.Moreover, large samples of transient light curves are needed now to train photometric classifiers and anomaly detection.We present supernova follow up in Section 6 and further discuss prospects for follow up in the Rubin era in Section 8.

Gravitational Wave Follow Up During
LIGO/Virgo/KAGRA Observing Run 4 The kilonova counterpart to GW170817 was rapidly declining and required rapid, high-cadence, multi-band follow up to constrain its total ejecta mass and composition, including observations from the T80S (see, e.g., Abbott et al. 2017b;Coulter et al. 2017;Díaz et al. 2017;Kilpatrick et al. 2017;Soares-Santos et al. 2017).Due to the increased sensitivity of LIGO since that time, NS mergers found during observing run 4 (O4) are systematically more distant than GW170817 at 40 Mpc (Abbott et al. 2017b;Soares-Santos et al. 2017) and may in general be more poorly localized.As these events may occur and be detected anywhere in the sky, a global network of robotic telescopes with wide fields of view and multi-band follow up are needed to localize and obtain optical follow up of these events.Within this context, we consider the feasibility of future GW follow up to identify new kilonovae with the T80S.Numerous dedicated efforts have been undertaken since the localization of the GW170817 kilonova to identify additional EM counterparts associated with similar NS mergers (e.g., for events discovered during observing run 3; Andreoni et al. 2020Andreoni et al. , 2019;;Coughlin et al. 2019;Dobie et al. 2019;Goldstein et al. 2019;Gomez et al. 2019;Hosseinzadeh et al. 2019;Lundquist et al. 2019;Ackley et al. 2020;Antier et al. 2020;Coughlin et al. 2020;Morgan et al. 2020;Paterson et al. 2020;Pozanenko et al. 2020;Thakur et al. 2020;Vieira et al. 2020;Watson et al. 2020;Alexander et al. 2021;de Wet et al. 2021;Kilpatrick et al. 2021;Tucker et al. 2022).To improve the observing efficiency of these search efforts and aid in rapidly finding kilonova to maximize the scientific opportunities that are possible with their early light curves (e.g., Arcavi 2018), we consider a new observing strategy optimized for the sensitivity, field of view, and broadband filters on the T80S.Considering this strategy in combination with realistic selection criteria for O4 GW events, we consider the observations necessary to find the next kilonova counterpart in Section 7.

STEP PIPELINE OVERVIEW AND DATA ANALYSIS
The STEP pipeline uses two types of images: archive images and on-the-fly observations.Archive images were obtained before the STEP project started and found in their final reduction form (updated calibration frames, astrometry, and quality checked).These are the images co-added to compose the S-PLUS Main Survey.The on-thefly images are those run through the STEP pipeline as soon as they are observed to enable rapid identification of transients.They are usually "last night images" and are not yet the final form of the reduction process, using older bias and flat frames.They also do not include fringe corrections.Although not the final version, they enable STEP to access the most up-to-date picture of the observed sky and identify possible transient phenomena as they are ongoing, which allows us to follow up on the explosion to acquire further information about the phenomena.
Our methodology to discover transient candidates involves utilizing the comprehensive coverage of the main survey fields in the griz filters to extract information on potential transients through a stateof-the-art pipeline described below and summarized in Figure 1.Our objective is to document and characterize transients identified within these images, leveraging the photometric redshift to derive the luminosity distance and other intrinsic properties of the object, as well as complementary spectroscopy and host galaxy data, when available.The S-PLUS Main Survey is specifically designed to encompass a wide expanse of the southern hemisphere, reaching high declination angles.This expansive coverage facilitates the identification of transient phenomena in the region.
The STEP pipeline operates by processing nightly data from S-PLUS, taking advantage of its ongoing survey to look for transients.S-PLUS does not revisit the same field, thus to obtain template images, we use templates from other surveys that overlap with S-PLUS footprint (e.g.DECam, Pan-STARRS, and SkyMapper).We transfer and digitally subtract our collected images from S-PLUS with the templates derived from those other surveys.By employing this technique, we can identify transient sources within approximately 1 hour of image acquisition.Each identified source is then carefully evaluated using source properties within each image frame, contextual data extracted from the star, minor planet, and galaxy catalogs, and machine learning classification.We describe our use of the Gaia DR3 and minor planet checker in Section 3.3, while the galaxy catalogs used are those that provide context to the distance and thus luminosity scale of each transient detection via spectroscopic or photometric redshifts.Specifically, we make use of the Pan-STARRS Source Types and Redshifts with Machine Learning (STRM) catalog (Beck et al. 2021), the Legacy photometric redshift catalog (Zhou et al. 2021), and the S-PLUS photometric redshift catalog (Mendes de Oliveira et al. 2019).This comprehensive approach ensures the thorough vetting of each transient source.The ultimate goal of our pipeline is to optimize follow-up observations for multi-messenger astronomy, making use of the multi-band photometry available from the S-PLUS Main Survey.The subsequent sections in this paper provide detailed explanations of each step within our pipeline, starting from image acquisition and extending through the transient identification process.

Reduction Processing of T80S and Template Imaging
Before the reduction process and analysis, we begin with the daily collection of science images from the S-PLUS Main Survey (and galactic survey, when available), captured using the griz broad-band filters.The retrieved images have undergone a pre-reduction process, which encompasses several crucial steps, including bias subtraction, flat field correction, overscan subtraction, and trimming.These steps are implemented to enhance the overall quality of the images.Subsequently, the images are transferred to our servers, where an initial pre-processing stage is conducted before they are fed into the reduction process.For astrometric calibration of the raw images collected, we utilize the command-line tools provided by astrometry.net(Lang et al. 2010).
Following these pre-processing steps, the main STEP pipeline takes charge of the subsequent stages, encompassing imaging, photometry, and data analysis.Throughout the forthcoming sections, we provide detailed descriptions of each stage of our software, elucidating their functionalities, and highlighting their integration with essential open-source software tools prevalent within the astronomical community.
Template images are constructed for each unique field and imaging band by communicating the observed fields to a separate server.Our pipeline utilizes a daily cron job that first downloads images saved as FITS files (Wells et al. 1981) from the T80S telescope database for the previous night and communicates the field center and imaging band continuously to the template server.The template server then determines if the field has coverage in Pan-STARRS, DECam, or SkyMapper, and downloads all available imaging covering that field from the Mikulski Archive for Space Telescopes (MAST) for Pan-STARRS (Kaiser et al. 2002), the NOIRLab archive for DECam, or the SkyMapper image cutout service1 (Wolf et al. 2018).To make the photometric calibration and point-spread function (PSF) shape as consistent as possible in each template, our pipeline uses only a single survey to generate each template.From the input images, we perform sky subtraction, normalization using their reported photometric calibration, and convolution to a uniform PSF shape using SWarp (Bertin 2010) to produce an output template image with the input field center and pixel scale of the T80S camera.The final template image is then staged in our pipeline, and a final photometric catalog is generated in the image frame using DoPhot (Schechter et al. 1993) and a zero point empirically estimated for the image by comparing to the Pan-STARRS or SkyMapper griz photometric catalogs.
For the T80S science frames, the pipeline uses IRAF to estimate a geometric distortion solution across the detector using Gaia DR3 astrometric standards and algorithms implemented in msccmatch (Tody 1986).The final distortion solution is saved in TNX format in the image header to regrid each frame to a common pixel scale and field center as the template images at a later stage.
We then use source extractor (sextractor; Bertin & Arnouts 1996) to measure the background levels and generate a catalog of saturated stars.From this catalog and output segmentation maps in sextractor, we mask out saturated stars and the unique diffraction spike pattern for T80S based on the number of saturated pixels within each star.The output file from this stage is a unique mask image created with mana2 .We then use SWarp (Bertin 2010) to regrid the science and mask images, assuring that the science and template images have a common and uniform pixel scale of 0.55 ′′ across the image, a common field center, and a uniform size of 10000×10000 per frame.As with the template images, the final stages of our science image pipeline perform PSF photometry using DoPhot and flux calibration by comparing to the Pan-STARRS DR2 and SkyMapper photometric catalogs.In general, we assume that the bandpasses from each template survey are identical to those from the S-PLUS survey, that is, that S-PLUS  bands are identical to PS1, DECam, and SkyMapper  bands both for calibration and image subtraction.We are exploring updates to our calibration methods for each photometric catalog and source of template images to correct for slight differences, for example, due to mirror coatings, bandpasses, and detector quantum efficiency.Given the calibration systematics achieved from the overall 12-filter S-PLUS system (Mendes de Oliveira et al. 2019), we expect to be able to achieve similar residuals as cross-calibration from other optical surveys (e.g., Finkbeiner et al. 2016;Scolnic et al. 2015), with cross-calibration residuals of a few millimag.

Difference Imaging
After the reduction of both science and template images, we match the template to science images to perform digital image subtraction or difference imaging.This stage is fundamental to finding transients as it enables the automatic detection of significant point sources present in the science image but not in the template (or vice versa).Our difference image procedure utilizes the calibration science and template images, their noise frames, and mask images to perform subtraction in hotpants (Becker 2015), an open-source tool designed for this purpose.hotpants uses common point sources to define a Gaussian convolution kernel for PSF matching between the science and template images then directly subtracts the convolved image from the other image.In general, the template image is deeper than the science image.It has a sharper PSF, red in particular for the PS1 and DECam imaging that makeup >95% of our template imaging.For the remaining ∼5% of our template imaging that comes from the 1.35 m SkyMapper telescope (Wolf et al. 2018), the template image stack is often still deeper than our T80S imaging, but further optimization of the STEP pipeline is ongoing to explore the effect of image depth for these data.In general, we set the convolution direction to convolve the science image.We also use our point source catalog from DoPhot to define the locations of "substamps" in hotpants, that is the regions of the image used to estimate the parameters of the convolution kernel.Finally, given the large field-of-view of the T80S camera, we fit the convolution kernel using a third-order polynomial to estimate the spatial variation of the kernel across the image and a second-order polynomial to subtraction background emission.All other parameters are set to the defaults in hotpants.
After the image subtraction process, we check the quality of the resultant difference image for the average scatter in the fitted kernel.Using the fixed PSF parameters of the template image, we then search for significant point sources and perform photometry using a customized version of DoPhot for this process.This catalog is used to generate cutout images of possible transients to be posted on a webpage for visual inspection and to feed a Neural Network that classifies a given cutoff into real transient or bad subtraction.

Transient Candidate Selection
We identify all candidate transients in our data in every image frame where DoPhot reports a significant detection, clustering these detections for sources within 1 ′′ across separate frames of the same field.
We then perform forced photometry across all frames in the same field at the flux-weighted centroids of each "cluster" using DoPhot.The unforced and forced photometric data, individual centroid positions, incidence of masking within the PSF aperture, cutouts of the science, template, and difference images around the position of the transient are displayed for each candidate in an internal webpage used for visual inspection (not publicly available).(Figure 7).These data were initially visually scanned for real/bogus identification of transients versus subtraction artifacts and other sources of non-astrophysical emission and cosmic rays (flagged as "BAD" sources).Our visual flagging criteria for all other transient sources are stored to generate a training sample for the machine learning process described in Section 4.
From this list of transients, we further vet candidates using methods implemented in (Kilpatrick 2023).We use minor planet catalogs to flag known asteroids within 10 ′′ of the source using the Modified Julian Date and coordinates of each detection ("ASTER-OID" sources) using the Minor Planet Checker 3 .We also check for coincidence (<1 ′′ ) of any source classified as a likely, unblended point source in the Gaia DR3 catalog to flag sources of stellar variability ("VARSTAR" sources; i.e., by crossmatching to the Gaia point-source catalog in Gaia Collaboration et al. 2023).Finally, we crossmatch all candidates to known transients in the Transient Name Server database (Yaron & Gal-Yam 2012) to flag all "CONFIRMED" transients.We incorporate all objects and our manually applied flags into the machine learning process described below.The final reported transients are the result of this deep learning candidate selection.

DEEP LEARNING CANDIDATE SELECTION
In practice, most transient alerts for a given observation night are false positives as the difference imaging process produces various artifacts.The most common example of an artifact we classify as a "Bad Subtraction".When there is a mismatch between the flux from the search and template images, for example, due to slight differences in filter transmission between T80S-Cam and PS1, DECam, or SkyMapper, or if there are variations between the PSFs not properly accounted for by hotpants, the subtraction process results in underand over-subtracted regions in the difference image.The most common bad subtractions are bright stars that appear in the difference image as positive or negative point sources.We acknowledge that this phenomenon is difficult to distinguish from instances of true stellar variability (i.e., "VARSTAR" sources), but in general we find that ∼1% of stars in our images exhibit variability >0.1 mag in optical bands, which is at least two orders of magnitude more common than stars observed by optical time-domain surveys such as Kepler (Mc-Quillan et al. 2012).We infer that the vast majority of these transient alerts are bad subtractions and describe a method for filtering these alerts below.
We are primarily interested in reducing the number of candidates that require visual inspection and automating the transient detection process, for which we adopt a convolutional neural network (CNN) to classify the images as transients or artifacts (i.e., non-transients).Applying CNNs to difference images is a common approach in modern time-domain surveys (Duev et al. 2019;Shandonay et al. 2022) with multiple well-tested procedures in the literature.We present a new method for systematically classifying sources that are optimized to our pipeline and data reduction procedures.
3 https://minorplanetcenter.net To address this classification problem, we made use of a family of CNN models known as Mobilenet (Howard 2017) and prepared the data following the prescription of (Bom et al. 2022).This architecture is based on depth-wise separable convolutions.This architecture demonstrates superior speed and accuracy characteristics in our data set when compared to several different CNN-based models such as VGG (Simonyan & Zisserman 2014), ResNet (He et al. 2015) and Inception models (Szegedy et al. 2015).

Training, Validation, and Test Data Set
The dataset used in this section consists of a set of known real transients observed with T80 and processed by our pipeline, together with different kinds of artifacts produced by the same pipeline after difference imaging.We followed 20 different known supernovae (some of them in the same night) to have enough data to build a set of real transients.The cutout images produced by our transient detection pipeline are made using four SDSS-like  filters, having a cutout per filter.We collect three exposures per band, following the S-PLUS Main Survey observing procedure (Mendes de Oliveira et al. 2019), resulting in 12 different images for each transient candidate observation.Figure 3 shows examples of different image cutouts at the end of our pipeline.
To build the artifact dataset, We randomly selected 4 images (one per filter) over the set of observations of all followed supernovae.From those, we made cutouts of all artifacts (excluding the real transient within the image), having a total sample of 10,000 samples.The Signal-to-noise (SNR) distribution of the total detection within the selected images is shown in figure 2. The supernovae dataset was built more straightforwardly.For each known followed transient, we made a cutout of all exposures per filter, having a total of 233 samples.We downsampled the artifact dataset to match the real transients and made use of data augmentation to have a final dataset.
Following the reduction procedure described in section 3, each image is processed with a search (or science, i.e., the original image from T80S), template (PS1, DECam, or SkyMapper image), and difference images (the subtraction between the two), all of which are simultaneously fed into the network through independent channels in the format of (51×51×3), similar to how the red, green, and blue image arrays that are fed into CNNs in traditional image analyses.To ensure the integrity of our results, we avoid data leakage by assigning all 12 images of each observation to the training, validation, or test dataset.In this way, we prevent any bias in the model's test results that might occur if one of these images was used for training and another for testing.
One of the challenges associated with the training and analysis of this dataset is that the transient candidates are contaminated by large numbers of bad subtractions.This resulted in an unbalanced classification problem.To address this issue, we downsampled the bad subtraction class to have an equal number of transient images.Finally, the total collection of images was partitioned approximately into three subsets, with 67% of the data allocated for training, 13% for validation, and 20% for testing.Finally, an additional set of 96 images was obtained to ensure our model's performance metrics.

Pre-processing
The deep learning methods are sensitive to visual features by construction, so we pre-process the data to enhance visualization, easing the training process.The pre-processing is divided into two main steps: contrast adjustment and normalization.The raw image from S-PLUS goes through pre-processing like overscan, bias and flat field correction, and satellite track removals.We retrieve those images and initiate our reduction pipeline for further analysis.Our pipeline goes through an astrometric solution of all retrieved images.We then proceed to masking of saturated stars, diffraction spikes, and sky background (noise).After the masking stages, We resized all images to a standard of 10.000 x 10.000 points.Finally, our images go through PSF photometry and Flux calibration.The template images go through similar stages (with the exception of diffraction spikes masking and astrometric solution because those steps are not needed in this case.In the end, we perform the difference imaging as a final image product. First, an image contrast adjustment was performed by clipping the image histogram, that is, we chose lower and upper limits and set every pixel above or below those values to be equal to the nearest bound.We choose the upper limited value to be the 99.2 % percentile level for the search, template, and difference images.For the lower bound, we choose the 0.01 percentile.In other terms, every value above the upper limit (percentile 99.2%) would be set to this threshold value, and every value below the lower bound/limit would be set to the percentile 1% of the histogram values of each image.this is done for each channel independently to maintain the unique characteristics of each channel.
Afterward, the images were normalized such that all pixels lie in the range [0,1], where for each image and each channel we subtract the minimum pixel value and divide by the maximum minus the minimum pixel value.This step improves the neural network training convergence.We made use of data augmentation techniques to enhance the training by applying rotations of 90, 180, and 270 degrees, horizontal flips, vertical flips, and combinations of rotations and flips.This increases the dataset by a factor of 8.

Training
A slight modification to the MobilNet architecture was made by adding a flattening layer instead of the two last layers, which were originally an average pooling layer and a fully connected layer with one neuron (Howard 2017).The new architecture was designed to improve the generalization capability of the model.
The previously described model was trained using one of the stateof-art optimizers: the Adam (Kingma & Ba 2014), with a binary-cross entropy loss.We initialize the networks with pre-trained weights from the ImageNet dataset to improve the computing time needed for convergence, overall performance, and stability of training, as observed in (Bom et al. 2021).
The model was trained in a Multi-GPU cluster with 8 RTX 3090 with 24 GB of GPU memory each.The model training was implemented in TensorFlow2 (Abadi et al. 2015).To determine the optimal set of weights for the model, we train the network for an unlimited number of epochs until the Validation Loss value stops improving for 25 epochs.We selected the weights from the epoch with the lowest validation Loss, this method of weight selection maximizes performance on the validation set, which serves as a good indicator of how the model will perform on unseen data.

Results
In a binary classification problem, various metrics can be used to evaluate model performance, but for our particular problem of filtering artifacts while preserving transients, three main metrics are of particular interest: precision, recall, and false positive rate.
The precision (or purity) is defined as the proportion of true positive predictions made by the model among all positive predictions made.It represents the ability of the model to correctly identify true transients among all images classified as transients.The recall (or completeness or true positive rate), is defined as the proportion of true positive predictions made by the model among all actual positives.it represents the ability of the model to correctly identify all true transients.The false positive rate is the proportion of artifacts that are misclassified as transients.
The output of our CNN model is a score that can be interpreted as the probability of a given image being transient.Therefore, it is necessary to define a threshold, , which is used to differentiate between true transients and artifacts.An image set is classified as a transient if the model's probability output exceeds , and as an Artifact if the probability falls below .The threshold that maximizes the F1 score (the harmonic mean of precision and recall) is used as the operating threshold for determining positive and negative results.
With this threshold, the false positive rate was determined to be 3.1%, and the recall was 92.7%.This means that only 3.1% of the images classified as transient were actual Artifacts, and 92.7% of the true transient images were correctly identified as such.A confusion matrix was generated to further analyze the performance, which is shown in Figure 4.
The Receiver Operating Characteristic (ROC) curve shows the trade-off between the True Positive Rate (TPR) and False Positive Rate (FPR) of the model as the threshold is varied in the range [0, 1].The Area in the image refers to the area under the curve (AUC), our model achieved AUC=0.992(Figure 5).Additionally, our model achieved an AUPRC (Area Under Precision-Recall Curve) of 0.99 (Figure 6), which aligns with the AUC from the ROC curve.This indicates the model's capability to accurately identify transients with high purity and maintain a high completeness.Such results hold significant value in our analysis, particularly in scenarios where identifying positive examples is crucial, such as detecting potential multimessengers.

EXTRAGALACTIC TRANSIENTS DISCOVERED IN STEP IMAGING
After classifying transient candidates using the flagging process described in Sections 3 and 4, we remove all likely image artifacts, asteroids, and variable stars before visually inspecting sources that we classify as likely extragalactic transients.In particular, transients close to a source identified as a host galaxy (i.e., with a photometric or spectroscopic redshift from S-PLUS, Legacy, PS1, or the NASA/IPAC Extragalactic Database; Mendes de Oliveira et al. 2019;Zhou et al. 2021;Beck et al. 2021) or at high Galactic latitudes are inspected.
Here we report the identification of two transients found through our analysis method described in Sections 3 and 4. Our first supernova candidate detected within the S-PLUS Main Survey data was in 2022-05-01 at coordinates =10:31:09.221and =+00:19:36.25 (J2000) and reported to Transient Name Server (TNS) at 2022-08-21. 4 .It was discovered at Magnitude 18.18 ± 0.047 (AB system, without MW Extinction correction) on -Sloan band.Figure 7 shows our difference imaging data in the Sloan griz bands for the candidate.
The second candidate found on our or list is the SN 2022tiv5 .It was initially reported by ATLAS (Tonry et al. 2022) in 2022-09-02 at coordinates =21:23:57.450and =-22:44:07 (J2000) with magnitude 18.65 (AB System) in orange-ATLAS filter.It was rediscovered within our project 1 month after the S-PLUS Main Survey visited the same field.It was classified as a type Ia Supernova with redshift  = 0.03 with spectrum data approximately 2 weeks after first reported (Toy et al. 2022).Figure 8 shows our findings for this candidate.

SUPERNOVA FOLLOW-UP
During the observation run of this project we had followed supernovae that we flag as scientific interesting, either by its unique physical nature, rarirty or distance.We collect photometry data with 3 day cadence follow up of those targets with S-PLUS using the same strategy as the Main Survey, when the time between the observations  were available.We collected data for 4 different supernova: 2022crv6 , 2022ann7 , 2022qxu8 and 2022acko9 .We summarize some information on those supernovae in Table 1 and follow-up photometry (apart from SN 2022ann, which was originally presented in Davis et al. 2023) from our pipeline is provided in appendix A.
We show the follow up light curves of SN 2022crv, 2022qxu, and 2022acko in Fig. 9 with comparison to model light curves for photometric classification.As a proof of concept, we utilize the Recommender Engine For Intelligent Transient Tracking (REFITT; Sravan et al. 2020) tool along with PanSNIP (Boone 2019) to photometrically classify our follow up light curves using methods implemented in (Garretson & Milisavljevic 2023).Each of our targets has an easily identifiable host galaxy with a spectroscopic redshift, which we  incorporate into our fit to aid in classification.Although each target has a spectroscopic classification (see Andrews et al. 2022;Izzo et al. 2022;Li et al. 2022), we do not use any information from the transient other than follow-up data from STEP, including sky position and the in-band magnitudes reported in appendix A.
The photometric classification scheme accurately recovers the SN type in each case despite the limited coverage of the overall light curve in some cases.As described in (Boone 2019) and (Garretson & Milisavljevic 2023), the combination of multi-band follow up, high cadences such as those from ZTF (Bellm et al. 2019)    classified transients for cosmology, SN rates, and anomaly detection by providing high cadence (<5 day), multi-band photometry for the large number of transients discovered in the Legacy Survey of Space and Time.Since we can't cover the whole list of targets from LSST given our field-of-view capability, our future plans involve following selected targets, based on our science cases, and changing our untargeted survey to a targeted facility in the optical bands.Those targets can be selected by ranking tiles by the number of targets present on the fields, weighted with enhanced data we have at our disposal for further analysis.

STRATEGY FOR GRAVITATIONAL WAVE FOLLOW-UP DURING LVK O4
To perform a systematic search of optical counterparts of gravitational wave (GW) events, we design a strategy following the prescription presented by (Bom et al. 2023) using the same formalism.The strategy is defined as a set of parameters, namely the exposure times, broadband filter combinations g,r,i,z total probability area coverage, and time at the observation after the burst to detect the Kilonova with two independent passes over the GW probability region.For a given set of observational conditions parametrized by the effective exposure time factor,     a set of observational parameters Θ consisting of filters and exposure times, a choice of probability area coverage Ω in a moment  after the GW alert, we calculate the probability of discovery, (discovery| Ω, ,  eff , Θ, z) defined as: where   is the luminosity distance and    are weights over the simulated light curves at redshift   representing the best fit and its uncertainties (Kilpatrick et al. 2017) of the GW170817, described in terms of a Kasen kilonova model as a blue component (Kasen et al. 2017) (see section 2.3 of Bom et al. 2023, for a description of the kN GW170817-like bright and blue and light curve model) that are within our limiting magnitude   (Θ,     ).Similarly, the light curve simulations were performed by snana (Kessler et al. 2009).We calculate the probability of confirmation   , i.e. the probability of two independent passes to detect the transient, this step is relevant to eliminate spurious detections during the GW counterpart search.We define our strategy by allocating a maximum of 2 nights per GW event.In order to define a strategy, we use the same definitions from (Bom et al. 2023) to select a low telescope time strategy, i.e., a strategy that takes into account a balance between total telescope time used per night of observation while maintaining a high discovery probability.From a grid of possible exposure times (60,90,120,200,300,600,1200,3600) and confidence level areas (0.9, 0.8, 0.7).We define and select a low telescope time strategy the one which uses the minimum telescope time around 10% of the maximum probability of discovery from our simulations run.
For well-localized events, one can describe the sky probability region of a gw event as an ellipse10 .Each confidence interval is described as a smaller ellipse in comparison with an ellipse of higher confidence interval.In our strategy design, we left open the possibility of using deeper exposures on the core confidence levels 0.3 and 0.5 (or "inner" ellipses).We assume the observation can take a maximum of 8 hours per night, 30 readout/slew time per pointing, and dark/grey night.It is also important to remark that our confirmation probability does not include day/night rhythm (we set a fixed time for night duration) and time lost for bad weather.

Gravitational Wave Simulations
For the GW simulations described in this section, we used the BAYESTAR11 (Singer & Price 2016) ecosystem which uses LAL-Suite tools (LIGO Scientific Collaboration 2018).The method used in this section to create the simulated events follows the work described in (Petrov et al. 2022).
We begin our simulation with an injection of 10, 000 binary neutron stars (BNS) merger events with a TaylorF2 waveform limited to a maximum luminosity distance of 500 Mpc uniformly distributed over a sensitive volume where we presumed a Planck18 (Planck Collaboration et al. 2020) cosmology.The sensitivity functions for the Advanced LIGO, VIRGO, and KAGRA detectors were determined using the noise curve data available in the latest LIGO document12 .All detectors have a duty cycle of 70%, following the discussions in (Abbott et al. 2020).The BNS injections follow a Gaussian distribution for the masses, with mean 1.5M ⊙ and standard deviation 1.1M ⊙ .We truncated the neutron star masses to be in the interval 1.1 < M NS < 3.0, taking in consideration the limits for Neutron Stars masses.The spins for both stars are uniformly distributed in  the interval [−0.05, 0.05] in the  axis.After the injection, we use a Matched filter search to obtain detected events, using the same waveform in the injection process for reconstruction, resulting in 1076 detections.Here we define as detection those events which have a SNR > 4 in at least two detectors and a net SNR threshold > 12, where we added the measured SNR with a Gaussian noise.In the end, skymaps are produced using the BAYESTAR algorithm for the searched-out events.Figure 10 summarizes our findings, showing the 90% credible area interval over the 90% credible interval for Lu-minosity Distance, and the discovery probability of each simulated event.
To properly design an observation strategy with the T80-South telescope, we selected events that have a sq-degree coverage in the 90% credible region < 300 sq-deg.791 of the simulated events satisfies this condition.Given the Field of View from the telescope, we can cover up to 100 sq-deg of the sky per night while saving telescope time using our method.The strategy adopted to search for an electromagnetic counterpart during LVK Fourth Observing , and 2022acko as described in Section 6.We photometrically classify our targets using methods implemented in (Garretson & Milisavljevic 2023).In each case, the classifier accurately recovers the spectral type determined by optical spectroscopy and reports to the Transient Name Server.All three sources were calibrated using the Pan-STARRS catalog, and so we use the classifications trained on Pan-STARRS photometry from (Garretson & Milisavljevic 2023), indicated by the "ps1" label for each band.run (hereafter, O4), follows closely the strategy designed in (Bom et al. 2023), adapting the telescope specifics to our project.Figure 11 summarizes our discovery probability given a sample of BNS events, with simulated kilonovae light curves obtained with snana (Kessler et al. 2009).The Low Telescope Time aims to increase the discovery probability while trying to minimize the amount of telescope time used in two passes.
It's important to remark that the simulations used in this work are based on assumptions about the planned sensitivity of LVK detectors.The predicted sensitives are close to those anticipated in mid-2022 for the O4 cycle.However, the current operation for O4 at the time of this writing diverges from the expected designed sensitivity.The Virgo detector, for example, hasn't joined the LIGO Handford and Livingston detectors from its beginning and remains uncertain when it will achieve the proposed sensitivity.The KAGRA detector will participate in O4 for a limited period, with a range for BNS merger below the estimated in this work (80Mpc).The estimate that comes from our analysis retains significant value, in terms of methodology and strategy for observation as an optimistic scenario in O4, although it does not reflect the reality of O4 so far.
In the end, our strategy analysis outputs the best exposure time and filter that maximizes our chances of finding an electromagnetic counterpart.We further refine our tiles list using teglon (Kilpatrick et al. 2021;Coulter et al. 2023), open-source software designed to optimally tile the LVK localization volume by crossmatching known galaxies in the GLADE galaxy catalog, under the assumption that the EM counterpart occurred in a galaxy within in that region.

Kilonovae Sensitivity
We present our probability of discovery in our low-telescope time strategy in figure 10 which shows the area and distance of each event binned by its probability of discovery in the first pass.Considering our blue kN model with absolute magnitude in  band peaks at ∼ −16 (Bom et al. 2023), most events have a discovery probability of ∼ 70 − 80.Most of the strategies chosen by each event selected filters  in both passes (68%) or  (20%), or a combination of both  (12%).The typical exposures selected by our optimization were 60 for the outer probability region and 90 in 37%(33%) of events in the first (second) pass.Another relevant configuration of exposure times was 200 or 300 for 22%(26%) of the core probability region in the first pass (second).The preferred configuration in terms of coverage was 90% probability coverage and deeper exposures in the 30% confidence level core.

Candidate Vetting and Analysis
To help mitigate the presence of false candidates, we employ a candidate vetting pipeline as outlined in section 3, which involves crossmatching our candidates with the minor planet center and transient name server, together with a neural network to help decrease false positives.Furthermore, conducting a secondary pass on the fields on a second night could aid in the identification and exclusion of other categories of transients.Whenever possible, we plan to validate candidates previously reported by the Zwicky Transient Facility (ZTF) via the FINK Broker API.Ultimately, we verify whether the targets in our final list have already been reported to the General Coordinates Network (GCN) and announce those that have not.
In the case of Binary Black Holes (BBHs) mergers, some models predict it's possible to find an electromagnetic counterpart as flares in Active Galactic Nuclei (AGNs) activities (see, e.g.Rodríguez-Ramírez et al. 2023;Bartos et al. 2017;Tagawa et al. 2023;Graham et al. 2023).The flares from those events can last from days to weeks.We select a list of possible targets by crossmatching the 90% credible region from BBHs mergers during O4 with AGN catalogs together with cuts on redshifts from possible host galaxies, decreasing the amount of sky area needed to be covered.

Summary
The STEP Projects works as a transient survey in the Southern Hemisphere, using the capability of T80South Telescope with its 2 sq-deg field of view and the Main Survey observation strategy to report supernovae candidates to the astronomical community.Our project also works as a complementary source of photometric data of scientific interesting supernovae, helping understand the nature of those massive explosions.As in the era of multi-messenger astronomy, STEP will follow up well-localized GW events that are inside T80 coverage, contributing to the discovery of kilonovae candidates and all the physics that comes within.Using a data reduction pipeline for difference imaging analysis, we successfully found and reported the transient AT 2022rri and also recovered the known SN 2022tiv when S-PLUS was visiting the same field, showing our capability in discovering new transients.We estimate whether this is comparable to current wide-field surveys such as ZTF (Bellm et al. 2019) by noting that the average number of ZTF transient discoveries per area of sky surveyed is ≈0.016 transients deg −2 .Our survey covered 1272 deg 2 with a slightly shallower depth on average than ZTF of 0.5 mag, from which we estimate an expected 1272 × 0.016 × 2.512 −0.5 = 12.84 ± 3.58 transient discoveries.Given that our discoveries are not at this rate yet, we expect that improvements in our template strategy, calibration, and machine learning methods will yield a much greater efficiency comparable to modern time-domain surveys.

STEP as a GW Follow-up Program
With the current sensitivity on LVK detectors, new NS mergers are being discovered at distances far exceeding that of 40 Mpc, consistent with the projection of a binary NS merger range >130 Mpc during O4 (Abbott et al. 2020).With the vastly increased search volume, many events will require coordinated follow up between multiple facilities to search the entire two-dimensional localization region to a depth consistent with kilonovae at the luminosity distance provided by the LVK.Moreover, optimizing search strategy in real time requires candidate vetting and follow up and reporting on rapid timescales such that kilonovae can be rapidly identified and observed with other facilities.Our GW strategy (Section 7) and participation in community-wide coordination efforts via NASA GCN (Santos et al. 2023) as well as the GW Treasure Map (Wyatt et al. 2020) will maximize the impact of STEP's search and follow up capabilities.
Throughout O4 and future LVK observing runs, the large localization regions of new compact object mergers will continue to tax the follow up capabilities of large footprint surveys such as ZTF (which is also restricted to  ≳ −30 • ; Coughlin et al. 2019;Graham et al. 2020;Kasliwal et al. 2020) andDECam (Cowperthwaite et al. 2017;Soares-Santos et al. 2017;Andreoni et al. 2019;Goldstein et al. 2019;Garcia et al. 2020;Morgan et al. 2020).By coordinating T80S with these and other GW follow up facilities and optimizing for the science cases outlined above, we will maximize the impact of T80S observations for future multi-messenger events.

The Next STEP in the Rubin Era
With first light at the Rubin Observatory planned in 2024, thousands of new transients will be discovered per night reaching a typical per visit depth of  = 24.5 mag (Ivezić et al. 2019).However, the current nominal strategy will leave ∼20 day gaps in the single-filter light curves for Rubin-discovered transients and saturate on sources at  ≲ 17 mag.This leaves significant room for small aperture, widefield surveys to fill in the light curves of Rubin transients and observe those that peak at brighter magnitudes.
With a complementary filter set to Rubin and located at a similar observing site, STEP is ideally situated to obtain light curves of Rubin-discovered transients at <21 mag.In particular, transients discovered within the S-PLUS main survey footprint have the benefit of multi-band host galaxy information and photometric redshifts to set the basic energy scale of the transient emission (Mendes de Oliveira et al. 2019).Combining this host galaxy information for photometric classification of transients has been shown to significantly improve the accuracy of transient classifiers (Baldeschi et al. 2020;Gagliano et al. 2021).Although Rubin will eventually produce its own photometric redshift catalogs, this will take time and Rubin lacks narrow band UV filters used in S-PLUS that are optimized for photometric redshifts in the local Universe (i.e., at  < 0.1).By focusing on transient searches within the S-PLUS volume, STEP can leverage its ability to obtain high-cadence, multi-band light curves to maximize the science return for nearby Rubin-discovered transients.

Figure 1 .
Figure1.A schematic overview of the reduction pipeline.The raw image from S-PLUS goes through pre-processing like overscan, bias and flat field correction, and satellite track removals.We retrieve those images and initiate our reduction pipeline for further analysis.Our pipeline goes through an astrometric solution of all retrieved images.We then proceed to masking of saturated stars, diffraction spikes, and sky background (noise).After the masking stages, We resized all images to a standard of 10.000 x 10.000 points.Finally, our images go through PSF photometry and Flux calibration.The template images go through similar stages (with the exception of diffraction spikes masking and astrometric solution because those steps are not needed in this case.In the end, we perform the difference imaging as a final image product.

Figure 2 .
Figure 2. SNR distribution per randomly selected image to build an artifact dataset for the machine learning algorithm.

Figure 3 .
Figure 3. Top row: A search image (top left), template image (top middle), and difference between the search and template image highlighting the position of a source of transient emission for a real transient source.Second row: Same as the top, but for a transient embedded in a compact host galaxy.Third row: A bad subtraction for a poorly aligned star, resulting in anomalous sources of emission in the difference image.Bottom row: A diffraction spike that was not fully masked in the science image, resulting in an extended source of emission in the difference image.

Figure 4 .
Figure 4. Confusion Matrix for the CNN responsible for filtering possible artifacts on our final stages.

Figure 5 .
Figure 5. Receiver operator characteristic curve for the trained network using the test sample.The area under the curve (AUC) is 0.992.
and STEP follow up, and host galaxy redshifts (such as those from the S-PLUS Main Survey; Mendes de Oliveira et al. 2019) provide significant leverage in photometric classification, especially once classifiers are trained on full  light curves.In the era of Rubin, transient follow up surveys such as STEP can help increase the number of photometrically

Figure 6 .
Figure 6.Precision-Recall (PR) Curve for the trained model using the test sample.The area under the curve (AUPRC) is 0.99.

Figure 7 .
Figure 7. From left to right: Template image, Science Raw image, and difference image from our candidate at2022rri.

Figure 8 .
Figure 8. From left to right: Template image, science raw image, and difference image from our candidate.This image corresponds to SN 2022tiv.

Figure 9 .
Figure9.STEP light curves of SN 2022crv, 2022qxu, and 2022acko as described in Section 6.We photometrically classify our targets using methods implemented in(Garretson & Milisavljevic 2023).In each case, the classifier accurately recovers the spectral type determined by optical spectroscopy and reports to the Transient Name Server.All three sources were calibrated using the Pan-STARRS catalog, and so we use the classifications trained on Pan-STARRS photometry from(Garretson & Milisavljevic 2023), indicated by the "ps1" label for each band.

Figure 10 .
Figure 10.The 90% credible sky coverage distribution over luminosity distance for O4-like BNS detected events used in this paper.The different marks point to different Discovery probability ranges.

Figure 11 .
Figure 11.Discovery Probability using a Low Telescope Time strategy with 2 passes.The first red vertical line at 190Mpc shows the current BNS Range sensitivity for LVK O4.The second line, at 330Mpc, is the limit for neutron star-black hole range sensitivity.

Table 1 .
List of targeted supernovae during STEP DR1.

Table A1 .
Light curve data for SN 2022crv.

Table A2 .
Light curve data for SN 2022acko.

Table A3 .
Light curve data for SN 2022qxu.