ABSTRACT

Known for their efficiency in analysing large data sets, machine learning-based classifiers have been widely used in wide-field sky survey pipelines. The upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will generate millions of real-time alerts every night, enabling the discovery of large samples of rare events. Identifying such objects soon after explosion will be essential to study their evolution. Using ∼5400 transients from the Zwicky Transient Facility (ZTF) Bright Transient Survey as training and test data, we develop NEEDLE (NEural Engine for Discovering Luminous Events), a novel hybrid (convolutional neural network + dense neural network) classifier to select for two rare classes with strong environmental preferences: superluminous supernovae (SLSNe) preferring dwarf galaxies, and tidal disruption events (TDEs) occurring in the centres of nucleated galaxies. The input data includes (i) cutouts of the detection and reference images, (ii) photometric information contained directly in the alert packets, and (iii) host galaxy magnitudes from Pan-STARRS (Panoramic Survey Telescope and Rapid Response System). Despite having only a few tens of examples of the rare classes, our average (best) completeness on an unseen test set reaches 73 per cent (86 per cent) for SLSNe and 80 per cent (87 per cent) for TDEs. While very encouraging for completeness, this may still result in relatively low purity for the rare transients, given the large class imbalance in real surveys. However, the goal of NEEDLE is to find good candidates for spectroscopic classification, rather than to select pure photometric samples. Our system will be deployed as an annotator on the UK alert broker, Lasair, to provide predictions of real-time alerts from ZTF and LSST to the community.

1 INTRODUCTION

Thanks to modern time-domain sky surveys, such as the Zwicky Transient Facility (ZTF; Bellm et al. 2018), the Asteroid Terrestrial-impact Last Alert System (ATLAS; Tonry et al. 2018), the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS; Chambers et al. 2016), and the All-Sky Automated Survey for Supernovae (ASAS-SN; Shappee et al. 2014), increasing numbers of transients have been discovered, catalogued, and studied. The diversity of their spectra, and even of photometric properties such as absolute magnitude, rise and decline rates, and duration, has led to the identification of new and rare classes of events. Even more encouraging is that the upcoming Legacy Survey of Space and Time (LSST; Ivezić et al. 2019) will significantly increase the transient discovery rate through deeper observations, a wide field of view, and colour information from six filters.

In recent years, the rare classes of superluminous supernovae (SLSNe) and tidal disruption events (TDEs) have been intensively studied, although their intrinsic physical mechanisms remain unclear. SLSNe are ∼10 times brighter than Type Ia supernovae (SNe) and ∼100 times brighter than core-collapse SNe. Their rise time-scales, typically ≳15–30 d, are also longer than those of typical SNe (Quimby et al. 2011; Gal-Yam 2019; Nicholl 2021). The hydrogen-poor events, also termed SLSNe Type I, mostly occur in low-mass dwarf galaxies with high specific star formation rates and low metallicities (Lunnan et al. 2014; Leloudas et al. 2015; Angus et al. 2016; Perley et al. 2016; Chen et al. 2017; Schulze et al. 2017), which provides important hints for finding and identifying such events. Studying SLSNe allows researchers to fill gaps in our understanding of stellar evolution, particularly core-collapse SNe in low-metallicity environments, and to explore the extreme mass loss and rotation of possible massive progenitor stars.

A TDE occurs when a star passes close enough to the massive black hole (MBH) at the centre of a galaxy to be tidally disrupted, leading to accretion onto the MBH with luminous emission and possibly jets (Hills 1975; Rees 1988; Gezari 2021). Such rare events give researchers an opportunity to investigate accretion flows onto otherwise quiescent black holes (at the low end of the MBH mass distribution), with accretion rates that change by orders of magnitude on human time-scales. Compared with SNe and SLSNe, TDEs can be differentiated by their locations in the centres of their host galaxies, as well as by light curves that indicate a roughly constant temperature. This provides helpful information for machine learning algorithms to learn their unique features.

Modern research on transients relies mainly on their spectra in the frequency domain and photometric information in the time domain. Spectra are essential to reveal their chemical compositions and physical properties (mass, velocity, redshift, etc.). However, as photometric images require much shorter exposure times than spectra, they are preferred for observing missions that pursue large night-sky coverage and long-term repeated detection. As the number of transients discovered in wide-field imaging sky surveys grows exponentially, it is no longer possible to obtain spectra for most transients, due to the expensive exposure times required.

The Vera C. Rubin Observatory is planning to conduct the LSST starting in 2025 (Ivezić et al. 2019). LSST will observe the whole Southern sky and part of the Northern sky, including a Wide-Fast-Deep field (90 per cent of the observing time) with seasonal cadence and Deep-Drilling fields with dense and deep detections. Alert brokers, such as the UK alert broker Lasair (Smith et al. 2019), will provide researchers with real-time (within minutes to days) access to transient data. LSST is predicted to produce about 10 million transient alerts (defined as detections of time-varying flux) per night (Kantor 2014). These alerts will include ∼10⁴ SLSNe (Villar, Nicholl & Berger 2018) and 3500–8000 TDEs (Bricman & Gomboc 2020) per year. However, the number of conventional SNe detected each year will be ≳10⁶, meaning that only a small fraction of events will ever be observed spectroscopically. It is therefore essential to identify the most interesting candidates photometrically, in order to prioritize them for spectroscopy.

Machine learning algorithms will play an important role in classifying and filtering these alerts in real time. This project aims to build a hybrid classifier that takes full advantage of various machine learning algorithms and combines different astronomical resources to identify candidate rare transients, such as SLSNe and TDEs, at or before their luminosity peak. For this reason, we are motivated to use only the properties available at the time of an early photometric detection: the early light curve, the associated discovery and reference images, and any catalogued host galaxy, but no information (such as redshift) that would require additional observations. We call this classifier the NEural Engine for Discovering Luminous Events (NEEDLE).

The paper is outlined as follows. Section 2 reviews existing techniques in machine learning classification and why SLSNe and TDEs are promising targets. Section 3 describes the data sources from the ZTF Bright Transient Survey (BTS) and Pan-STARRS, and analyses the correlations between different features and transient types. Section 4 describes the image and metadata pre-processing methods, including a binary classifier to assess image quality. Section 5 presents the model architecture, training and test sets, and development details. Section 6 shows the performance of the classifiers through confusion matrices (CMs) and completeness and purity diagrams, and illustrates the pipeline through which NEEDLE will provide classifications publicly on Lasair. Section 7 discusses transient labelling issues, comparisons with currently popular classifiers, and remaining difficulties and improvements. Finally, Section 8 summarizes this paper.

2 CONTEXTUAL AND MACHINE LEARNING CLASSIFICATION

2.1 The host galaxy matters

The environments of transients show strong correlations with their properties. For example, the rates of typical Type Ia and core-collapse SNe scale with host galaxy stellar mass (Sullivan et al. 2006; Li et al. 2011). The relative fractions of different SN types vary between galaxies with different masses (Graur et al. 2017) and star formation rates (Botticella et al. 2017). The locations within their hosts also vary, with some types of SNe showing strong preferences for occurring in the brightest or bluest parts of their hosts (Fruchter et al. 2006; Kelly & Kirshner 2012; Blanchard, Berger & Fong 2016). The rare transient classes that we are interested in, SLSNe and TDEs, are prime candidates for selection via their environments, as each shows strong biases in their host galaxies.

SLSNe are very unusual in that they show a strong preference (shared only by long gamma-ray bursts) for dwarf star-forming galaxies. SLSN samples also show a high fraction of irregular or interacting galaxies (Chen et al. 2017; Ørum et al. 2020), but overall occur in low-density environments rather than groups or clusters (Cleland, McGee & Nicholl 2023). The locations of SLSNe within their hosts broadly track an exponential disc profile, but many events also occur at large offsets or in regions of low UV flux (Hsu et al. 2023).

TDEs, like active galactic nuclei (AGNs), occur at the centres of galaxies hosting MBHs. However, TDEs are rarely observed in galaxies with masses above ∼few × 10¹⁰ M⊙ (van Velzen et al. 2021; Ramsden et al. 2022), since for the most massive BHs the disruption occurs inside the event horizon. TDE hosts in particular show a large over-representation of recently quenched galaxies (French, Arcavi & Zabludoff 2016) with green colours (Hammerstein et al. 2023; Yao et al. 2023). Compared to typical galaxies, their light profiles tend to be strongly peaked towards the nucleus (Law-Smith et al. 2017; Graur et al. 2018).

Some existing codes employ the context of where a transient appears to aid classification. For example, sherlock, deployed on Lasair, is an integrated database system that classifies transients by cross-matching the position of a transient with all major astronomical catalogues (Smith et al. 2020). By associating transients with galaxies, galaxy nuclei, known AGNs, variables, or very bright stars, sherlock provides a top-level classification of any transient as a likely SN, nuclear transient, AGN, etc. Similarly, using contextual information, the Automatic Learning for the Rapid Classification of Events (ALeRCE) Stamp Classifier takes the first images and alert metadata for an object to provide a preliminary classification as AGN, SN, variable star, asteroid, or bogus (Carrasco-Davis et al. 2021). Baldeschi et al. (2020) present a Random Forest (RF) classifier for galaxy classification based on recent star formation history and morphology, and apply it to the hosts of core-collapse and thermonuclear SNe, showing that the colours and shapes of hosts can separate the two classes better than random guessing.

Other codes go further and attempt to predict the spectroscopic subtype of a transient. Foley & Mandel (2013) and Kisley et al. (2023) use host galaxy photometry alone to provide the probabilities of different types of SNe. ghost (Gagliano et al. 2021) employs a novel gradient ascent method to find the associated host galaxies and, based on host features and angular offset, applies an RF to distinguish SLSNe, Type Ia SNe, and core-collapse SNe. Gagliano et al. (2023) take host properties and light curves as inputs to classify SNe Ia, SNe II, and SNe Ib/c, and obtain increasing accuracy at later phases. Catalogues of post-starburst galaxies can be used to pre-select possible TDEs (French & Zabludoff 2018). In summary, different transients have unique preferences for where they occur, and these can help reveal their likely nature.

2.2 Machine learning architectures on transient classification

Machine learning has been widely applied to astrophysical transient classification tasks, using methods such as boosted decision trees (BDTs; SNGuess, Miranda et al. 2022; Avocado, Boone 2019; sherlock, Smith et al. 2020), RFs (FLEET, Gomez et al. 2020; Baldeschi et al. 2020; the ALeRCE light-curve classifier, Sánchez-Sáez et al. 2021), and neural networks. In particular, deep learning algorithms (deep neural networks) have shown powerful performance in extracting features from data to improve classification without manual feature selection.

Recurrent neural networks (RNNs) are capable of learning correlations between nearby and distant time-steps in time series, and are designed for classification, modelling, and prediction. RNNs can extract features from the light curves of different classes of transients in order to distinguish them. Examples of such codes include Real-time Automated Photometric IDentification (RAPID; Muthukrishna et al. 2019), SuperRAENN (Villar et al. 2020), Superphot (Hosseinzadeh et al. 2020), the classifier for the Gravitational-wave Optical Transient Observer (GOTO; Burhanudin et al. 2021), and the early-time transient classifier of Gagliano et al. (2023). Attention mechanisms have also been applied, for example in TimeModAttn (Pimentel, Estévez & Förster 2022).

On the other hand, convolutional neural networks (CNNs) are mainly designed for visual imagery classification. They generate feature maps of the input data while training, and attempt to associate these features with class labels. Image-based classifiers have not yet attained the widespread use of light-curve classifiers, but experiments to date have shown that this approach is a very promising alternative, as it can take into account the transient position and host galaxy morphology, as discussed in Section 2.1. Transient researchers have implemented CNNs for transient classification in codes such as ALeRCE (Carrasco-Davis et al. 2021), DELIGHT (Förster et al. 2022), and recent work on light curves by Burhanudin & Maund (2023), demonstrating that CNNs can achieve high accuracy in identifying various types of transients.

The above architectures have also shown promising performance in classifying rare events like SLSNe and TDEs. For SLSNe, classifiers using light curves achieve completeness ∼0.69–0.83, and in one case up to 1.00 completeness (Muthukrishna et al. 2019; Sánchez-Sáez et al. 2021; Qu & Sako 2022). For TDEs, existing codes achieve 0.40 completeness (Gomez et al. 2023) at early phases, or better than 0.80 with full light curves (Stein et al. 2024). A detailed review is shown in Table B1.

However, difficulties and limitations remain. CNNs may struggle with low image quality caused by poor signal-to-noise, low resolution, bright nearby objects, or detector cosmetics, resulting in mislabelling. Light curves with sparse cadences and few observations make it difficult for RNNs to extract correlations. In training and test data sets, spectroscopically confirmed SLSNe and TDEs make up only 1–2 per cent of all transient samples, leading to possible under-representation during learning. Many classifiers, such as that of Hložek et al. (2023), have been trained on simulated data sets (e.g. PLAsTiCC; Kessler et al. 2019), which avoids the difficulties that must be overcome when dealing with real data. Finally, any classifier that requires redshift information or the declining part of a light curve may not be suitable for early-time classification.

Novel architectures are required to gain better accuracy. In recent years, hybrid neural networks have become more popular. CNNs combined with artificial neural networks (ANNs; fully connected neural networks) are able to use images and metadata (position, redshift, etc.) together to provide high-accuracy predictions (e.g. GaZNets, Li et al. 2022; ALeRCE, Carrasco-Davis et al. 2021). Other architectures, such as transformers like ASTROMER (Donoso-Oliva et al. 2023), use an autoencoder with positional embedding and self-attention blocks to obtain representations of transients' light curves, which can be further applied to classification and modelling.

In short, more deep learning applications for astronomical study are expected to digest multivariate data. This might include magnitudes or fluxes in the time dimension, images (in one or more filters), and contextual information from existing catalogues. Our goal here is to take the first steps in realizing a hybrid classifier that tries to maximize the information used from images, simple light-curve features, and host galaxy features, and apply this to the case of finding SLSNe and TDEs in wide-field surveys.

3 DATA SET

For this project, we require a training and test set of transients with known classes (based on spectroscopic classifications). In this section, we outline the sources of data used to train and validate our code.

3.1 ZTF bright transients database

Although our ultimate goal is to develop a classifier for LSST, for our initial training and test set before that survey begins we use the ZTF BTS (Bellm et al. 2018; Fremling et al. 2020; Perley et al. 2020). The ZTF public survey covers the entire Northern sky to a depth of ≈20–20.5 mag every 2–3 nights in g and r filters. The BTS has been spectroscopically classifying all ZTF-detected SNe brighter than ≈19 mag since June of 2018. We choose this data set as it is the largest homogeneous set of labelled transients available, and the data are comparable to LSST in terms of imaging cadence and the format of the real-time alerts.

We downloaded the entire ZTF BTS sample brighter than 19 mag, up to 2022 March, and use this as the basis of our sample. This contains 5703 spectroscopically classified transients. Information such as the ZTF object ID, coordinates, discovery date, and spectroscopic type can be found in the ZTF Bright Transient Survey Sample Explorer. After removing duplicates and missing objects in the ZTF database, 5388 ZTF objects are obtained. This includes over 5000 SNe, but only 37 SLSNe and 18 TDEs. We therefore supplement the BTS data set with any SLSNe or TDEs published in ZTF sample papers: TDEs from van Velzen et al. (2021) and Hammerstein et al. (2023), and SLSNe from Chen et al. (2023). Given that some of these objects are already in the BTS data, our total numbers of SLSNe and TDEs are 87 and 64, respectively.

All transients fall into five general categories, shown in Table A1:

  • ‘Common’ supernovae (including all spectroscopic Type Ia, core-collapse, and interacting SNe).

  • Superluminous supernovae (considering here only the hydrogen-poor SLSNe Type I).

  • Tidal disruption events.

  • Possible SNe/transients of ambiguous nature (calcium-rich, gap transients).

  • Non-SN (novae and stellar outbursts).

The latter two categories make up only ∼1 per cent of the sample, and can generally be filtered out by their fast light curves before being passed to the machine learning classifier. We do not include them in our training or test set, but list them in Appendix B for completeness. The first category is very broad, containing 97.2 per cent of events. However, as the task of NEEDLE is to distinguish among SNe, TDEs, and SLSNe, we avoid subdividing the SN class so that more attention can be focused on the rare classes of interest. This is the first version of NEEDLE; our aim is that future versions with improved architecture and more data will be able to perform more fine-grained classification of the various SN subtypes. Table 1 provides counts of objects with image and magnitude data from ZTF, as well as the numbers that also have catalogued host data in the g and r bands from deeper surveys.

Table 1. The number of objects in each class with images and light-curve information from ZTF, as well as those with host galaxy matches in existing catalogues from sherlock. The information is provided separately for the g and r bands.

Band           |          g          |          r
Label          |  SN    SLSN   TDE   |  SN    SLSN   TDE
Object         |  5185  80     62    |  5237  87     64
Object & Host  |  4959  37     62    |  5016  41     64

3.2 Images

We wrote Python scripts to download ZTF cutout images centred on the transient positions, using the ZTF image database Application Programming Interface (API). For each ZTF object, starting from its discovery date to 200 d later, we downloaded all available images in the g and r bands. This includes: the science image – the image taken in each visit, containing the transient flux; the reference image – a stacked image from before the event's discovery, providing a template of the host galaxy and surrounding field; and the difference image – the subtraction of the above two images, containing only transient flux. The requested image size is 1 arcmin: large enough to include most host galaxies, while larger images would include more unrelated sources in the field. Fig. 1 shows examples of ZTF images obtained for the three classes of transients we consider.
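
For illustration, the sketch below shows the shape of this download step in Python; the base URL, path structure, and cutout parameters are assumptions for this sketch rather than the documented interface, and should be checked against the ZTF image service documentation.

import requests

CUTOUT_SIZE = "1arcmin"  # requested cutout size (see text)
# Assumed base URL for the ZTF image archive at IRSA (illustrative only)
BASE_URL = "https://irsa.ipac.caltech.edu/ibe/data/ztf/products"

def download_cutout(image_path, ra, dec, out_file):
    """Fetch one cutout centred on (ra, dec) for a single ZTF exposure."""
    url = f"{BASE_URL}/{image_path}?center={ra},{dec}&size={CUTOUT_SIZE}"
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    with open(out_file, "wb") as fh:
        fh.write(resp.content)

# Example (hypothetical path):
# download_cutout("sci/2021/0501/123456/ztf_..._sciimg.fits",
#                 ra=150.1234, dec=2.5678, out_file="sci_cutout.fits")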

Figure 1. Examples of the science, reference, and difference images of the three general classes: SLSN, TDE, and SN.

3.3 Image metadata

We label each image with metadata including the ZTF object ID, class label (SN, SLSN, or TDE), RA, Dec., image size, and date. For each filter separately, we retrieve the start and end Julian dates between which the object is detected. This information is stored in a JSON file for each object. Although we download all images and their associated metadata for these objects, we find better performance in training when we give our network only one image per object; we therefore use the image metadata to select, for each object, one image from close to the time of light-curve peak.
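
For concreteness, the per-object metadata file has the following flavour; the key names here are illustrative rather than the literal schema used by NEEDLE.

import json

metadata = {
    "ztf_id": "ZTF21aaqawpd",
    "label": "SLSN",                  # SN, SLSN, or TDE
    "ra": 150.1234, "dec": 2.5678,    # degrees
    "image_size": 60,                 # pixels, after pre-processing
    "detections": {
        "g": {"jd_start": 2459300.5, "jd_end": 2459400.5},
        "r": {"jd_start": 2459301.5, "jd_end": 2459398.5},
    },
}
with open("ZTF21aaqawpd.json", "w") as fh:
    json.dump(metadata, fh, indent=2)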

3.4 Light-curve metadata

For each object, we also retrieve its photometry for each available detection through Lasair. Our aim is to include some simple light-curve parameters (features) as additional data for our classifier. We use the Lasair API to query the light curve using the ZTF object ID. After cross-matching with the image data, we found that not every image has a corresponding magnitude, as Lasair contains only the public ZTF photometry. Although more light-curve data would be available by querying the ZTF forced photometry, using the data from Lasair ensures consistent formatting between our training data set and future real-time alert classifications we wish to perform with our trained model. This light-curve data (detection dates and magnitudes) is appended to the metadata file for each object. The simple features extracted are listed in Table 2. For a given detection, NEEDLE takes only the latest science and reference images and the light-curve features from the date of discovery up to the time of detection, that is, it does not have access to information from the complete (future) light curve. This applies also in our training process, to be consistent with the procedure on real-time alerts, as our goal is early classification.
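
A minimal sketch of this query, assuming the public lasair Python client and an API token (placeholder below); the layout of the returned records is our assumption based on the Lasair documentation.

from lasair import lasair_client as lasair

L = lasair("MY_API_TOKEN")                    # placeholder token
results = L.lightcurves(["ZTF21aaqawpd"])     # batch query by ZTF object ID
for det in results[0]["candidates"]:          # assumed response layout
    print(det.get("jd"), det.get("fid"), det.get("magpsf"))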

Table 2. Summary of the light curve and host galaxy features included in our metadata.

Metadata          Feature                        Definition
Transient alerts  m_peak                         Peak magnitude among all existing observations
                  m_discovery                    Magnitude of the discovery observation
                  ΔT_discovery                   Time difference between the current and discovery observations
                  Δm_discovery                   Magnitude difference between the current and discovery observations
                  Δm_discovery / ΔT_discovery    Ratio of the magnitude difference to the time difference since discovery
                  Δm_recent / ΔT_recent          Ratio of the magnitude difference to the time difference since the last detection
Host galaxy       m_gAp                          Host aperture magnitude in the g band
                  m_rAp                          Host aperture magnitude in the r band
                  m_iAp                          Host aperture magnitude in the i band
                  m_zAp                          Host aperture magnitude in the z band
                  m_yAp                          Host aperture magnitude in the y band
                  m_g−r                          Host magnitude difference between the g and r bands
                  m_r−i                          Host magnitude difference between the r and i bands
                  separationArcsec               Distance between the host centre and the transient on the image (arcsec)
                  Δm_host−peak                   Magnitude difference between the host magnitude and m_peak

3.5 Host galaxy metadata

The sherlock software package (Smith et al. 2020) is integrated into Lasair, and automatically provides a contextual classification by cross-matching with a library of historical and ongoing sky survey catalogues. This provides preliminary classifications as transients, variables, artefacts, etc., based on association with nearby galaxies, known cataclysmic variable stars, AGNs, or bright stars. For this project, we query the sherlock table on Lasair to find the coordinates of the most likely host galaxy for each of our transients.

We use these coordinates to retrieve host galaxy magnitudes from Data Release 2 (DR2) of the Pan-STARRS survey. Pan-STARRS is a wide-field imaging system that observes 30 000 deg² of the Northern sky in five broadband filters (g, r, i, z, and y). The stacked depth is up to 23.3, 23.2, 23.1, 22.3, and 21.3 mag, respectively (Chambers et al. 2016; Flewelling et al. 2020). The Pan-STARRS footprint completely overlaps the ZTF coverage, and DR2 contains data taken between 2010 and 2014, prior to the ZTF survey. Therefore, Pan-STARRS should contain all ZTF transient host galaxies brighter than ∼23 mag.

Possible hosts were found by sherlock for most transients, but about half of the SLSN hosts are missed, as they are likely fainter than the Pan-STARRS DR2 limiting magnitude. This is unsurprising given that most SLSNe explode in distant dwarf galaxies. We used the Pan-STARRS DR2 API to obtain the aperture magnitudes in the g, r, i, z, and y bands. Host colours can then be measured as g − r and r − i, and are correlated with the age and star formation rate of the stellar population.
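
A hedged sketch of this lookup against the MAST Pan-STARRS DR2 catalogue API; the endpoint and column names (e.g. gMeanApMag) are our assumptions about that service, not code from the NEEDLE pipeline.

import requests

def ps1_host_mags(ra, dec, radius_deg=3.0 / 3600.0):
    """Return aperture magnitudes of the nearest PS1 DR2 match, if any."""
    url = "https://catalogs.mast.stsci.edu/api/v0.1/panstarrs/dr2/mean.json"
    resp = requests.get(url, params={"ra": ra, "dec": dec,
                                     "radius": radius_deg}, timeout=60)
    resp.raise_for_status()
    rows = resp.json().get("data", [])
    if not rows:
        return None                      # e.g. a host below the PS1 depth
    src = rows[0]
    # Column names assumed from the DR2 mean-object table
    return {band: src.get(f"{band}MeanApMag") for band in "grizy"}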

The full list of metadata used in this study is given in Table 2.

3.6 Data analysis

Before building a model, we check for correlations or clusters within our metadata. Fig. 2 shows simple features obtained from the ZTF r-band light curves. We show the apparent magnitude around the peak, m_peak; the magnitude contrast between transient and host; the elapsed time between first detection and light-curve peak; and a measure of the light-curve slope during the rise, the ‘rising ratio’ (m_peak − m_discovery)/(t_peak − t_discovery).
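
These features can be computed directly from the detection lists; a minimal sketch (the array names and ordering are our own) is:

import numpy as np

def lightcurve_features(jd, mag, host_mag):
    """jd, mag: detection dates (d) and magnitudes, in time order."""
    i_peak = int(np.argmin(mag))               # brightest detection
    m_peak, t_peak = mag[i_peak], jd[i_peak]
    m_disc, t_disc = mag[0], jd[0]             # discovery detection
    rise_time = t_peak - t_disc
    rising_ratio = (m_peak - m_disc) / rise_time if rise_time > 0 else 0.0
    host_contrast = m_peak - host_mag          # transient vs host brightness
    return m_peak, rise_time, rising_ratio, host_contrast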

Figure 2. Corner plot showing distributions of selected light-curve features in the data set. To show the differences between the distributions more clearly, the light-curve slope (m_peak − m_disc)/(t_peak − t_disc) and rise time t_peak − t_disc have been scaled logarithmically.

It can be seen that the distributions of SLSN and TDE apparent magnitudes in our sample are skewed fainter than the distribution for other SNe. This is due in part to the lack of nearby events in these rare classes. The need to include examples from outside the magnitude-limited BTS sample may also bias these events towards fainter magnitudes, but this is unavoidable given the class imbalance.

SLSNe show a much larger contrast at peak with their host galaxies, standing out from SNe and TDEs. Moreover, their rise time-scales are longer than those of other transients. Normal SNe show the fastest rise, with TDEs showing a broad distribution peaking between the other two classes. The median SLSN rising ratio is similar to that of TDEs, but with a smaller spread. To compare some of the key parameters more clearly, Fig. 3 presents the cumulative distributions of host galaxy contrast (m_peak − m_host) and approximate rise time (t_peak − t_discovery), where the three classes show clear differences.

Figure 3. Cumulative distribution functions for transient contrast with host galaxy, m_peak − m_host, and approximate rise time, t_peak − t_discovery.

Similarly, Fig. 4 shows a corner plot for the host galaxy metadata, including magnitudes and colours in the g, r, and i bands, and the offset in arcseconds between the transient coordinates and the host galaxy centroid. Again, the plot shows that SLSNe tend to have the faintest hosts, with a slight bias to bluer colours in g − r. As expected, the host offsets for most TDEs cluster around ∼0.0–1.0 arcsec. SLSNe, with their compact hosts, tend to show small offsets of a few arcseconds, whereas the distribution is much broader for typical SNe in extended galaxies.

Figure 4. Corner plot showing host galaxy magnitudes in the g, r, and i bands, as well as the galaxies' colours in g − r and r − i. We also show the offset, meaning the projected distance between the transient and its likely host in arcseconds.

4 DATA PRE-PROCESSING

In this section, we illustrate the steps used to clean and prepare our data set before training our model.

4.1 Image pre-processing

Some of the images we obtained from the ZTF database were found to have irregular sizes, shapes, and missing pixels. Such issues can be caused by a transient position close to the edge of the detector field of view, or by nearby bright stars that are masked out (but can still leave diffraction spikes or subtraction artefacts). Examples are shown in Fig. 5. These poor-quality images can severely impact the training process, so we identify, modify, or remove them before training, as described below.

Figure 5. Bad and good image samples with the quality-check classifier predictions. Images in the first two rows are judged as ‘bad images’ and removed before being fed to NEEDLE; images in the last two rows are judged as ‘good images’ and fed to NEEDLE.

4.1.1 Image size cutout

Images slightly smaller than 60 × 60 pixels (e.g. 58 × 58 pixels) are expanded to 60 × 60 pixels by repeating the last row or column on each side; those much smaller are removed. Images larger than 60 × 60 pixels are cropped to that size.
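
A sketch of this size regularization with NumPy; the minimum acceptable size below is an assumed cutoff for illustration.

import numpy as np

TARGET = 60
MIN_SIZE = 50          # assumed cutoff below which an image is discarded

def regularize(img):
    if min(img.shape) < MIN_SIZE:
        return None                                  # too small: remove
    pad_y = max(0, TARGET - img.shape[0])
    pad_x = max(0, TARGET - img.shape[1])
    if pad_y or pad_x:
        img = np.pad(img, ((pad_y // 2, pad_y - pad_y // 2),
                           (pad_x // 2, pad_x - pad_x // 2)),
                     mode="edge")                    # repeat edge rows/columns
    return img[:TARGET, :TARGET]                     # crop larger images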

4.1.2 Quality check model

Images with missing or unreliable pixels are tricky to deal with, and such bad images greatly harm the training process. A common feature is that they often have very large standard deviations (σ), much larger than normal images. However, our experiments showed that a quality cut based only on σ cannot remove a small number of problematic images with reasonable standard deviations. We therefore developed a binary CNN to determine whether an image is good or bad.

First, we label images with σ > 1000 as ‘bad’, and manually label a number of additional bad-quality samples (in the g and r bands); the rest are labelled ‘good’. We then feed them into a simple two-layer CNN classifier for training and testing. We use a sample of 2207 objects for training this model, 9.3 per cent of which are bad images. We shuffle this set and randomly choose 200 objects as the validation set.

The output gives the probability that an image is good, as shown in Fig. 5. Images with a confidence greater than 0.5 are passed on for further processing, while bad images are excluded. Figs 6(a) and (b) show the CM and receiver operating characteristic (ROC) curve; the closer the curve is to the upper left corner, the more accurate the classifier. The model rejects 98.4 per cent of bad images, so we apply it as the first stage of our data pre-processing pipeline. In the following experiments, about 12 peak images of ZTF objects are removed, accounting for 0.22 per cent of the whole image set.
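
A minimal sketch of such a two-layer binary CNN in Keras; the filter counts and dense size are illustrative rather than the exact tuned values.

import tensorflow as tf

quality_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(60, 60, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # p(good image)
])
quality_model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])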

Figure 6. Results of our image quality classifier to reject images with large numbers of missing pixels. An image is assigned a binary classification as ‘Good’ or ‘Bad’.

4.1.3 Z-scaling and normalization

Astronomical pixel data can span a large dynamic range within a single image, which can cause problems for classifiers that need to learn faint features. The IRAF z-scale algorithm, designed for displaying images as pixel intensity maps, is widely used to pick out features close to the background level of the image. The algorithm determines a minimum (z-min) and maximum (z-max) pixel value to display (pixels with values outside this range are displayed at zero intensity or saturated).

In our case, we apply the same z-scaling algorithm to replace any NaN values or anomalously faint pixels with the z-min value. We do not apply a mask for z-max, to avoid treating real features (such as a bright transient) as saturated. Min–max normalization is then applied to the scaled data, limiting the values to [0,1].
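
A sketch of this step using astropy's implementation of the z-scale algorithm:

import numpy as np
from astropy.visualization import ZScaleInterval

def zscale_normalize(img):
    zmin, _ = ZScaleInterval().get_limits(img)   # z-max deliberately unused
    img = np.nan_to_num(img, nan=zmin)           # NaNs -> z-min
    img = np.maximum(img, zmin)                  # clip anomalously faint pixels
    return (img - img.min()) / (img.max() - img.min() + 1e-12)  # -> [0, 1]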

4.1.4 Data augmentation

Data augmentation is a technique to create additional artificial samples within a training set. This is particularly helpful when dealing with classes containing few examples, such as our SLSNe and TDEs. Augmentation techniques for images include resizing, random flipping (horizontal or vertical), and random rotation (between 0° and 360°, with any missing data at the edges filled with neighbouring values). For convenience, we implement augmentation as a custom layer placed immediately after the input layer, as sketched below. During training, the images are randomly modified by this layer at each epoch. Flips and rotations ensure that the model is not encouraged to incorrectly learn specific location or orientation features. We do not apply resizing, in order to preserve the pixel scale of the data.
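
A sketch of this augmentation block using the built-in Keras preprocessing layers, which are active only in training mode:

import tensorflow as tf

augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    # factor=1.0 allows the full 0-360 deg range; edges are filled with
    # neighbouring values, as described in the text
    tf.keras.layers.RandomRotation(factor=1.0, fill_mode="nearest"),
], name="augmentation")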

4.2 Metadata pre-processing

Metadata consists of the light-curve features and the host galaxy magnitudes, colours, and offsets; details are given in Table 2. Currently, any missing metadata is replaced with zeros. Although a magnitude, time difference, or offset of zero does have a physical meaning in this case, we find that filling with zeros does not influence classifier performance in our experiments. Alternative methods will be considered in the next version of NEEDLE.

Data standardization is applied for feature scaling: each feature is assumed to follow a Gaussian distribution across all samples, and its values are scaled by the feature's mean and standard deviation. In this way, the model can learn the different feature distributions of the three classes individually. The scaling parameters are stored with the model.
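
A minimal sketch of this standardization; the training-set means and standard deviations are kept so that the same scaling can later be applied to real-time alerts:

import numpy as np

def fit_scaler(X_train):                 # X_train: (n_samples, n_features)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + 1e-12    # guard against zero variance
    return mean, std

def apply_scaler(X, mean, std):
    return (X - mean) / std              # stored with the model for reuse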

4.3 Data compression and indexing

In order to feed a large amount of pre-processed data into the classifier for training on any computing platform, one convenient method is to store and fetch the data in Hierarchical Data Format version 5 (HDF5; Collette 2013). This allows users to transfer data among different facilities easily, and accelerates the training time for parameter optimization. In addition, a custom index is added to each sample participating in training and testing, which lets users easily trace the corresponding ZTF IDs, thereby assisting case studies. Here, we store the image set, metadata set, labels, and sample index in HDF5 format. The training/test split is performed after loading the HDF5 data.
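
A sketch of this packaging step with h5py; the data set names, shapes, and label encoding below are illustrative assumptions.

import h5py
import numpy as np

# Placeholder arrays standing in for the pre-processed data set
images = np.zeros((100, 60, 60, 2), dtype="float32")  # science + reference
metadata = np.zeros((100, 15), dtype="float32")       # features of Table 2
labels = np.zeros(100, dtype="int8")                  # assumed: 0=SN, 1=SLSN, 2=TDE
index = np.arange(100)                                # maps rows back to ZTF IDs

with h5py.File("needle_dataset.h5", "w") as fh:
    fh.create_dataset("images", data=images)
    fh.create_dataset("metadata", data=metadata)
    fh.create_dataset("labels", data=labels)
    fh.create_dataset("index", data=index)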

5 CLASSIFIER ARCHITECTURE AND TRAINING

In this section, we introduce the design of our NEEDLE code and discuss the details of the model architecture.

We build our model within the TensorFlow Keras framework. We implement a custom class called NEEDLE that inherits from keras.Model. This class includes the basic user-defined model functions (train, test, build, predict, loss function, etc.), as well as model plotting and visualization.

5.1 Hybrid neural network

To fully utilize the image and metadata, a hybrid model is required. Inspired by Carrasco-Davis et al. (2021), we build up a model that involves a block of convolutional layers for image inputs and a block of fully connected layers for metadata inputs.

Fig. 7 shows the model architecture. The image block consists of a data augmentation layer (random flipping and rotations) and two convolutional layers, each followed by a MaxPooling layer. The output of the last pooling layer is flattened into a 1D vector and fed into a fully connected dense layer with 64 neurons. The metadata block consists of two fully connected dense layers with 128 neurons each. The two outputs are then concatenated and fed into two further dense layers (192 and 32 neurons, respectively), followed by the output layer.

Figure 7. Model architecture for the full NEEDLE classifier. The only difference between the NEEDLE-T and NEEDLE-TH variants is the length of the metadata input.

Each layer uses a ReLU activation function, with the exception of the final output layer, which uses a softmax activation function to provide the probabilities that an object belongs to each class.
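
The architecture of Fig. 7 can be written compactly in the Keras functional API; the sketch below follows the layer sizes quoted in the text, with the augmentation layer simplified to the built-in preprocessing layers.

import tensorflow as tf
from tensorflow.keras import layers

def build_needle(image_shape=(60, 60, 2), n_meta=15, n_classes=3):
    # Image block: augmentation, two conv/pool stages, then a 64-neuron dense
    img_in = layers.Input(shape=image_shape, name="images")
    x = layers.RandomFlip("horizontal_and_vertical")(img_in)
    x = layers.RandomRotation(1.0, fill_mode="nearest")(x)
    for _ in range(2):
        x = layers.Conv2D(128, 3, activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)

    # Metadata block: two fully connected layers
    meta_in = layers.Input(shape=(n_meta,), name="metadata")
    m = layers.Dense(128, activation="relu")(meta_in)
    m = layers.Dense(128, activation="relu")(m)

    # Concatenate, two further dense layers, then the softmax output
    z = layers.Concatenate()([x, m])
    z = layers.Dense(192, activation="relu")(z)
    z = layers.Dense(32, activation="relu")(z)
    out = layers.Dense(n_classes, activation="softmax")(z)
    return tf.keras.Model([img_in, meta_in], out)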

5.2 Training and test sets

Since the samples of SLSNe and TDEs are very small compared to the large number of normal SNe, training becomes difficult when more than ∼20 per cent of them are put into the test set. Through experiments, we found that some objects are easily classified correctly with high probability regardless of whether they are in the training set or the test set, whereas others are difficult and return poor predictions. A fair approach that still allows us to train our model on a reasonable number of objects is therefore to set a unique random seed, shuffle the data set, and randomly select test objects, repeating this process and training the model several times to average the results. We choose to include 15 SLSNe, 15 TDEs, and 15 SNe in the test set each time, and repeat this process 10 times to calculate the average model performance.

5.3 Weighted loss function

We start from the Keras loss function SparseCategoricalCrossentropy, which is designed for multiclass tasks. As our training data have extremely unbalanced labels, with the majority being SNe, the model would naturally learn more SN features to quickly decrease the loss, resulting in poor predictions for the other classes. One solution is to give more weight to rare labels and less weight to common labels, so that the model can extract the features of the different classes equally. Our weighted loss function is

$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} W_{y_{i}} \log \hat{y}_{i}$    (1)

Here, N is the number of objects in one batch/epoch; $N_{m}$ and $n_{m}$ are the numbers of objects from the mth class in the batch/epoch and in the whole training set, respectively; $W_{m}$ is the class weight for the mth class; and $y_{i}$ and $\hat{y}_{i}$ refer to the true class and the model prediction for one input, respectively.
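
As a concrete illustration, a class-weighted version of SparseCategoricalCrossentropy can be built as below; the inverse-frequency normalization used here for W_m is a common choice and our assumption, not necessarily the exact scheme of equation (1).

import tensorflow as tf

def make_weighted_loss(n_per_class):
    """n_per_class: list of n_m, the training-set count for each class m."""
    n = tf.constant(n_per_class, dtype=tf.float32)
    w = tf.reduce_sum(n) / (len(n_per_class) * n)    # assumed: W_m ~ 1/n_m
    sce = tf.keras.losses.SparseCategoricalCrossentropy(reduction="none")
    def loss(y_true, y_pred):
        per_sample = sce(y_true, y_pred)             # one value per object
        weights = tf.gather(w, tf.cast(tf.reshape(y_true, [-1]), tf.int32))
        return tf.reduce_mean(weights * per_sample)
    return loss

# e.g. model.compile(loss=make_weighted_loss([5185, 80, 62]), ...)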

5.4 Training optimization

To set the learning rate, we employ the ExponentialDecay method, which decreases the learning rate exponentially as the training steps grow. Equation (2) shows the algorithm: $l_{r_0}$ is the initial learning rate, α is the user-defined decay rate, and N and n are the training and user-defined decay steps, respectively. Through experimentation, we find acceptable performance using the following parameters: $l_{r_0} = 0.0002$, α = 0.95, and n = 100.

$l_{r} = l_{r_{0}} \times \alpha^{N/n}$    (2)

An optimizer is the strategy for updating the weights and biases of a neural network, in order to reduce the loss function towards the desired minimum. For this model, we apply Adaptive Moment Estimation (Adam; Kingma & Ba 2014), a stochastic gradient descent algorithm that updates weights using exponential moving averages of the gradients and their squares, controlled by two decay rates.

Overfitting is a non-negligible problem when training: the model extracts noise rather than real features from the training data, resulting in high accuracy on the training set but low accuracy on the validation and test sets. To reduce overfitting, in particular given our small training and validation sets, we use tensorflow.keras.callbacks.EarlyStopping to monitor the training process: training stops when the validation loss has not decreased over the previous three epochs. Fig. 8 shows the training and validation loss trends for 25 randomly selected NEEDLE-TH (transient + host) models (with the test set shuffled each time). The training stops when the validation loss flattens.
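
Putting the pieces of this section together, the training configuration can be sketched as:

import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=2e-4,   # l_r0
    decay_steps=100,              # n
    decay_rate=0.95)              # alpha

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

# model.compile(optimizer=optimizer, loss=weighted_loss)
# model.fit(train_inputs, train_labels, validation_data=val_data,
#           epochs=300, batch_size=128, callbacks=[early_stop])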

Figure 8. Loss trends of 25 randomly selected models for the NEEDLE-TH classifier.

5.5 Optimal network architecture

We have tried and adjusted a variety of architectures and parameters. Given the size of the training set and the limited information in the images, a deep network is not suitable as it will likely lead to rapid overfitting and large fluctuations in the loss function. We therefore experiment with networks with only a few CNN layers.

KerasTuner (O'Malley et al. 2019) is an easy-to-use, scalable hyperparameter optimization framework that allows users to set ranges for the numbers of neurons, activation functions, and learning rates. It automatically runs combinations of configurations in search of optimal solutions. We apply this method to adjust the architecture and hyperparameters of NEEDLE.

The results show that a model with two convolutional layers (each with 128 3 × 3 kernels) for the image inputs, and two fully connected layers (with 64 and 128 neurons) for the metadata inputs, gives the best predictions. The detailed architecture is shown in Fig. 7. The learning rate, batch size, and number of epochs are 3 × 10⁻⁵, 128, and 300, respectively.
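
A minimal sketch of such a search with KerasTuner, reusing the build_needle helper sketched in Section 5.1; the searched values are illustrative.

import keras_tuner as kt
import tensorflow as tf

def model_builder(hp):
    model = build_needle(image_shape=(60, 60, 2), n_meta=15, n_classes=3)
    lr = hp.Choice("learning_rate", [3e-5, 1e-4, 2e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(model_builder, objective="val_loss", max_trials=20)
# tuner.search([train_images, train_meta], train_labels,
#              validation_data=val_data, epochs=50)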

6 EXPERIMENTS AND RESULTS

In this section, we investigate model performance on the ZTF BTS sample. In particular, we aim to determine which metadata are important to include in our training and test sets, the expected purity and completeness, and how confidently we can predict the type of an object at early phases. We also present the NEEDLE pipeline that we are implementing on Lasair.

6.1 Classifier performance with and without host metadata

Initially, we train only with information available from the real-time transient alerts: the science image (here assumed close to the time of maximum light), the reference image, and the transient metadata such as magnitude and time since discovery. For convenience, we call this version NEEDLE-T (for transient). We then retrain the model, this time including the catalogued host galaxy properties obtained from sherlock and Pan-STARRS, and we label this version NEEDLE-TH.

Fig. 9 shows the CMs with the completeness of the three models on the test set. The prediction is decided by the maximal probability among the three classes. The values given in the CMs are the averages of 50 model realizations with randomly shuffled test sets (containing 15 objects per class each time, with the remaining objects in the training set). The initial NEEDLE-T classifier (Fig. 9a) can recognize 80 per cent of normal SNe and 73 per cent of SLSNe in the test set on average. It is worth recalling that more than half of the SLSNe in our sample do not have catalogued hosts; therefore, only 41/87 SLSNe can be included in the full NEEDLE-TH model.

Figure 9. CMs of the NEEDLE-T and NEEDLE-TH (without and with host photometry) classifiers on an unseen test set. For each classifier, the random seed for initializing parameters is unique, and the test/training sets are randomly shuffled before training. The values reported in each CM are the medians over 50 realizations of this process, with the 16 and 86 per cent values of each distribution across all 50 models shown in brackets.

If we train NEEDLE-T only on those objects with detected hosts (enabling a fair comparison later with NEEDLE-TH), the average true positive rate for SLSNe decreases to 60 per cent, and the large range shows that the predictions are less stable with the smaller sample. This is shown in Fig. 9(b). Adding the host metadata in NEEDLE-TH (Fig. 9c) improves the performance for SLSNe to 73 per cent on average despite the smaller sample size, showing the importance of including host galaxy information. Even in the worst-performing model, at least 65 per cent of SLSNe are correctly identified with the help of host magnitudes and colour information, and the highest completeness reaches 86 per cent. We also tested NEEDLE-T (the model in Fig. 9a) on hostless SLSNe (hosts are identified for essentially all TDEs); the completeness is 80 per cent.

This effect is even more pronounced for TDEs. For NEEDLE-TH, the average true positive rate for TDEs grows from 60 per cent to 80 per cent with the addition of host information. While galaxy colours do differ for TDEs compared to the other transient types, this improvement more likely reflects the fact that all TDEs in the sample have a small offset, because they occur in the nuclei of their hosts.

6.2 Completeness and purity

Fig. 10(a) shows the completeness and purity trends on the unseen test set with increasing probability thresholds for classification, in both NEEDLE-TH (corresponding to Fig. 9c) and NEEDLE-T (for only those objects having catalogued hosts, corresponding to Fig. 9b). Here a class is only assigned if p(class) > x for the most probable class. In each case, we show the average and standard deviation of 10 trained models.

Figure 10. Left: completeness and purity for NEEDLE-T and NEEDLE-TH on the test sets, averaged over 50 models. Right: one-versus-rest ROC curves for the full data set, accumulated over 50 models, with classification confidence p(class) > 0.75.

On average, for SLSNe we attain a completeness of 74.9 per cent (57.1 per cent) and a purity of 84.1 per cent (92.3 per cent) for a threshold p(SLSN) ≥ 0.5 (0.75). For TDEs, we obtain a completeness of 77.2 per cent (50.4 per cent) and a purity of 84.1 per cent (92.3 per cent). These results are fairly competitive with other popular classifiers (Table B1), especially considering that we use only single images and limited light-curve information.

We note that the purity achieved with our balanced test set will likely not reflect the purity obtainable in a real survey, due to the large imbalance in rates between SLSNe/TDEs and normal SNe. Therefore, when selecting objects in real time, one may wish to choose a high probability threshold to minimize the absolute numbers of normal SNe misclassified as SLSNe or TDEs.

Fig. 11 shows CMs for transients classified with probability p(class) ≥ 0.75. We show the completeness for NEEDLE-TH on the balanced, unseen test sets (Fig. 11a), and completeness and purity matrices for the full data set (Figs 11b and c). With p(class) ≥ 0.75, NEEDLE-TH can correctly classify 97 per cent of TDEs and 93 per cent of SLSNe in the full data set. Fig. 10(b) also shows that, treating each class as a binary classifier (one versus the rest), the ROC areas all exceed 97 per cent. However, because normal SNe vastly outnumber the rare classes, even a few per cent of contaminating SNe results in a real-world purity of around 20 per cent for the rare classes, showing the importance of choosing a probability threshold carefully. NEEDLE is designed to select young SLSN and TDE candidates for spectroscopic follow-up, rather than to produce large photometric samples. Therefore, a purity of a few tens of per cent is an acceptable price for the high completeness.

Figure 11. CMs of NEEDLE-TH restricted to objects with classification confidence p(class) > 0.75, for the test set and the full data set. The format is the same as Fig. 9. The first two CMs depict the completeness on the test set (Fig. 11a) and the full set (Fig. 11b). The third CM (Fig. 11c) shows the purity of each class in the full set, which is representative of the expected balance in real-time alerts.

We also investigate the importance of including host galaxy metadata. The diagrams show that SLSNe and TDEs essentially always gain higher completeness and purity when host metadata is included.

6.3 Classification from early detections

As NEEDLE is designed to provide a probability for each label after only a few early detections, we also test the average performance of NEEDLE-TH over time since explosion by attempting to classify a time-series of pre-peak detections of 30 randomly selected objects in each class. We show the predicted p(SLSN) against time before peak for 30 SLSNe, and p(TDE) for 30 TDEs, in Fig. 12. As mentioned in Section 3.4, the photometric information is only considered from the discovery date to the selected detection date, to simulate real-time discovery.

Figure 12. Probability heat maps for (a) SLSNe and (b) TDEs using NEEDLE-TH, averaged over 50 models. In each class, 30 objects are randomly sampled for presentation. The x-axis is the date, from 60 days before the peak of the event to the peak date; the y-axis lists the ZTF object names, sorted in descending order of their peak probability. The colour bar corresponds to the probability range.

For most SLSNe (Fig. 12a) and TDEs (Fig. 12b), the probability assigned to the correct class grows as the event approaches peak. This is likely due to the longer baseline over which the light-curve features can be evaluated, indicating that properties such as the light-curve rise time and slope and the host galaxy contrast are important features in NEEDLE. This is particularly apparent for SLSNe, where the magnitude contrast with the host galaxy (which is maximized at light-curve peak) is also an important feature. We notice a few TDEs whose probabilities decrease around peak, and which are predicted to be more likely an SLSN or a normal SN; the reason for this is under investigation.

6.4 Performance on untouched SLSNe and TDEs

To provide a fully independent test of the performance of our model, we download the data for any new SLSNe and TDEs discovered since 2022 March. These objects have never been used in training, validating, or testing the current version of NEEDLE, so they provide an unbiased measurement of NEEDLE's performance. Assuming that we try to identify them before peak, we consider the r-band light curve up to the peak luminosity and the science and reference images for the peak detection.

Table 3 shows the ensemble predictions made by the current versions of NEEDLE-T and NEEDLE-TH for these objects. It can be seen that 67.8 per cent of SLSNe and 87.5 per cent of TDEs are predicted correctly with over 50 per cent confidence. Increasing the threshold from 0.50 to 0.75 decreases this completeness to around 50 per cent, but slightly reduces the rate of confident but incorrect classifications. Overall, this suggests that NEEDLE has successfully learned features of SLSNe and TDEs that generalize well from our limited training set to the broader population, and that we can use the code for classifying new objects from the ZTF alert stream.

Table 3. NEEDLE predictions on known untouched ZTF objects. The ‘0.75’ and ‘0.50’ columns give the classification at confidence thresholds of 0.75 and 0.50, respectively; ‘Highest’ is the class with the highest probability.

ZTF ID           Class    0.75      0.50      Highest   Probability
*ZTF21aaqawpd    SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.004, SLSN-I: 0.988, TDE: 0.008
*ZTF22aadeuwu    SLSN-I   Unclear   Unclear   SN        SN: 0.495, SLSN-I: 0.411, TDE: 0.094
*ZTF22aadqgoa    SLSN-I   Unclear   SN        SN        SN: 0.622, SLSN-I: 0.066, TDE: 0.312
*ZTF22aaljlzq    SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.007, SLSN-I: 0.718, TDE: 0.275
*ZTF22aalzjdc    SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.121, SLSN-I: 0.732, TDE: 0.148
*ZTF22aapjqpn    SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.009, SLSN-I: 0.887, TDE: 0.104
*ZTF22aausnwr    SLSN-I   Unclear   Unclear   TDE       SN: 0.277, SLSN-I: 0.225, TDE: 0.498
*ZTF22abkbmob    SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.065, SLSN-I: 0.693, TDE: 0.243
*ZTF22abvarjh    SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.003, SLSN-I: 0.853, TDE: 0.144
*ZTF22abvngdr    SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.033, SLSN-I: 0.732, TDE: 0.235
*ZTF22abynkpz    SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.004, SLSN-I: 0.993, TDE: 0.003
*ZTF23aagdbbv    SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.009, SLSN-I: 0.727, TDE: 0.264
*ZTF23aawhcjb    SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.011, SLSN-I: 0.972, TDE: 0.017
*ZTF23aaznlgb    SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.011, SLSN-I: 0.854, TDE: 0.135
*ZTF23abjarpv    SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.043, SLSN-I: 0.701, TDE: 0.256
*ZTF23aboebgh    SLSN-I   SN        SN        SN        SN: 0.813, SLSN-I: 0.146, TDE: 0.041
*ZTF23abofwba    SLSN-I   Unclear   TDE       TDE       SN: 0.094, SLSN-I: 0.311, TDE: 0.595
ZTF21aalkhot     SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.093, SLSN-I: 0.551, TDE: 0.355
ZTF21acrbbwi     SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.000, SLSN-I: 1.000, TDE: 0.000
ZTF22aarqrxf     SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.017, SLSN-I: 0.981, TDE: 0.003
ZTF22abcvfgs     SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.080, SLSN-I: 0.521, TDE: 0.399
ZTF22abvcnnl     SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.103, SLSN-I: 0.893, TDE: 0.004
ZTF23aaarvxj     SLSN-I   Unclear   SLSN-I    SLSN-I    SN: 0.026, SLSN-I: 0.504, TDE: 0.470
ZTF23aaccacy     SLSN-I   Unclear   SN        SN        SN: 0.611, SLSN-I: 0.076, TDE: 0.313
ZTF23aazodoj     SLSN-I   SLSN-I    SLSN-I    SLSN-I    SN: 0.102, SLSN-I: 0.897, TDE: 0.001
ZTF23abawhql     SLSN-I   SN        SN        SN        SN: 0.999, SLSN-I: 0.001, TDE: 0.000
ZTF23abjhbcr     SLSN-I   TDE       TDE       TDE       SN: 0.087, SLSN-I: 0.069, TDE: 0.844
ZTF23abjuxso     SLSN-I   Unclear   Unclear   SLSN-I    SN: 0.220, SLSN-I: 0.450, TDE: 0.330
ZTF23aadcbay     TDE      TDE       TDE       TDE       SN: 0.101, SLSN-I: 0.012, TDE: 0.887
ZTF23aamsetv     TDE      TDE       TDE       TDE       SN: 0.073, SLSN-I: 0.100, TDE: 0.827
ZTF23aapyidj     TDE      TDE       TDE       TDE       SN: 0.200, SLSN-I: 0.010, TDE: 0.790
ZTF23aaqdjhi     TDE      TDE       TDE       TDE       SN: 0.094, SLSN-I: 0.005, TDE: 0.901
ZTF23abaujuy     TDE      Unclear   TDE       TDE       SN: 0.363, SLSN-I: 0.034, TDE: 0.603
ZTF23abgnxfv     TDE      Unclear   SN        SN        SN: 0.558, SLSN-I: 0.016, TDE: 0.426
ZTF23abkixdb     TDE      Unclear   TDE       TDE       SN: 0.209, SLSN-I: 0.074, TDE: 0.717
ZTF23abohtqf     TDE      TDE       TDE       TDE       SN: 0.199, SLSN-I: 0.031, TDE: 0.770
ZTFIDClass0.750.50HighestProbability
*ZTF21aaqawpdSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.004, SLSN-I: 0.988, TDE: 0.008
*ZTF22aadeuwuSLSN-IUnclearUnclearSNSN: 0.495, SLSN-I: 0.411, TDE: 0.094
*ZTF22aadqgoaSLSN-IUnclearSNSNSN: 0.622, SLSN-I: 0.066, TDE: 0.312
*ZTF22aaljlzqSLSN-IUnclearSLSN-ISLSN-ISN: 0.007, SLSN-I: 0.718, TDE: 0.275
*ZTF22aalzjdcSLSN-IUnclearSLSN-ISLSN-ISN: 0.121, SLSN-I: 0.732, TDE: 0.148
*ZTF22aapjqpnSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.009, SLSN-I: 0.887, TDE: 0.104
*ZTF22aausnwrSLSN-IUnclearUnclearTDESN: 0.277, SLSN-I: 0.225, TDE: 0.498
*ZTF22abkbmobSLSN-IUnclearSLSN-ISLSN-ISN: 0.065, SLSN-I: 0.693, TDE: 0.243
*ZTF22abvarjhSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.003, SLSN-I: 0.853, TDE: 0.144
*ZTF22abvngdrSLSN-IUnclearSLSN-ISLSN-ISN: 0.033, SLSN-I: 0.732, TDE: 0.235
*ZTF22abynkpzSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.004, SLSN-I: 0.993, TDE: 0.003
*ZTF23aagdbbvSLSN-IUnclearSLSN-ISLSN-ISN: 0.009, SLSN-I: 0.727, TDE: 0.264
*ZTF23aawhcjbSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.011, SLSN-I: 0.972, TDE: 0.017
*ZTF23aaznlgbSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.011, SLSN-I: 0.854, TDE: 0.135
*ZTF23abjarpvSLSN-IUnclearSLSN-ISLSN-ISN: 0.043, SLSN-I: 0.701, TDE: 0.256
*ZTF23aboebghSLSN-ISNSNSNSN: 0.813, SLSN-I: 0.146, TDE: 0.041
*ZTF23abofwbaSLSN-IUnclearTDETDESN: 0.094, SLSN-I: 0.311, TDE: 0.595
ZTF21aalkhotSLSN-IUnclearSLSN-ISLSN-ISN: 0.093, SLSN-I: 0.551, TDE: 0.355
ZTF21acrbbwiSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.000, SLSN-I: 1.000, TDE: 0.000
ZTF22aarqrxfSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.017, SLSN-I: 0.981, TDE: 0.003
ZTF22abcvfgsSLSN-IUnclearSLSN-ISLSN-ISN: 0.080, SLSN-I: 0.521, TDE: 0.399
ZTF22abvcnnlSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.103, SLSN-I: 0.893, TDE: 0.004
ZTF23aaarvxjSLSN-IUnclearSLSN-ISLSN-ISN: 0.026, SLSN-I: 0.504, TDE: 0.470
ZTF23aaccacySLSN-IUnclearSNSNSN: 0.611, SLSN-I: 0.076, TDE: 0.313
ZTF23aazodojSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.102, SLSN-I: 0.897, TDE: 0.001
ZTF23abawhqlSLSN-ISNSNSNSN: 0.999, SLSN-I: 0.001, TDE: 0.000
ZTF23abjhbcrSLSN-ITDETDETDESN: 0.087, SLSN-I: 0.069, TDE: 0.844
ZTF23abjuxsoSLSN-IUnclearUnclearSLSN-ISN: 0.220, SLSN-I: 0.450, TDE: 0.330
ZTF23aadcbayTDETDETDETDESN: 0.101, SLSN-I: 0.012, TDE: 0.887
ZTF23aamsetvTDETDETDETDESN: 0.073, SLSN-I: 0.100, TDE: 0.827
ZTF23aapyidjTDETDETDETDESN: 0.200, SLSN-I: 0.010, TDE: 0.790
ZTF23aaqdjhiTDETDETDETDESN: 0.094, SLSN-I: 0.005, TDE: 0.901
ZTF23abaujuyTDEUnclearTDETDESN: 0.363, SLSN-I: 0.034, TDE: 0.603
ZTF23abgnxfvTDEUnclearSNSNSN: 0.558, SLSN-I: 0.016, TDE: 0.426
ZTF23abkixdbTDEUnclearTDETDESN: 0.209, SLSN-I: 0.074, TDE: 0.717
ZTF23abohtqfTDETDETDETDESN: 0.199, SLSN-I: 0.031, TDE: 0.770

Notes. Objects with * indicate no host information and are therefore predicted by NEEDLE-T; and others are predicted by NEEDLE-TH. Highest represents the label with the largest probability, ‘0.50’ and ‘0.75’ represent different thresholds. The completeness for three levels of thresholds are 0.75, 0.72, and 0.42.

Table 3.

NEEDLE predictions on known untouched ZTF objects.

ZTFIDClass0.750.50HighestProbability
*ZTF21aaqawpdSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.004, SLSN-I: 0.988, TDE: 0.008
*ZTF22aadeuwuSLSN-IUnclearUnclearSNSN: 0.495, SLSN-I: 0.411, TDE: 0.094
*ZTF22aadqgoaSLSN-IUnclearSNSNSN: 0.622, SLSN-I: 0.066, TDE: 0.312
*ZTF22aaljlzqSLSN-IUnclearSLSN-ISLSN-ISN: 0.007, SLSN-I: 0.718, TDE: 0.275
*ZTF22aalzjdcSLSN-IUnclearSLSN-ISLSN-ISN: 0.121, SLSN-I: 0.732, TDE: 0.148
*ZTF22aapjqpnSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.009, SLSN-I: 0.887, TDE: 0.104
*ZTF22aausnwrSLSN-IUnclearUnclearTDESN: 0.277, SLSN-I: 0.225, TDE: 0.498
*ZTF22abkbmobSLSN-IUnclearSLSN-ISLSN-ISN: 0.065, SLSN-I: 0.693, TDE: 0.243
*ZTF22abvarjhSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.003, SLSN-I: 0.853, TDE: 0.144
*ZTF22abvngdrSLSN-IUnclearSLSN-ISLSN-ISN: 0.033, SLSN-I: 0.732, TDE: 0.235
*ZTF22abynkpzSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.004, SLSN-I: 0.993, TDE: 0.003
*ZTF23aagdbbvSLSN-IUnclearSLSN-ISLSN-ISN: 0.009, SLSN-I: 0.727, TDE: 0.264
*ZTF23aawhcjbSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.011, SLSN-I: 0.972, TDE: 0.017
*ZTF23aaznlgbSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.011, SLSN-I: 0.854, TDE: 0.135
*ZTF23abjarpvSLSN-IUnclearSLSN-ISLSN-ISN: 0.043, SLSN-I: 0.701, TDE: 0.256
*ZTF23aboebghSLSN-ISNSNSNSN: 0.813, SLSN-I: 0.146, TDE: 0.041
*ZTF23abofwbaSLSN-IUnclearTDETDESN: 0.094, SLSN-I: 0.311, TDE: 0.595
ZTF21aalkhotSLSN-IUnclearSLSN-ISLSN-ISN: 0.093, SLSN-I: 0.551, TDE: 0.355
ZTF21acrbbwiSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.000, SLSN-I: 1.000, TDE: 0.000
ZTF22aarqrxfSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.017, SLSN-I: 0.981, TDE: 0.003
ZTF22abcvfgsSLSN-IUnclearSLSN-ISLSN-ISN: 0.080, SLSN-I: 0.521, TDE: 0.399
ZTF22abvcnnlSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.103, SLSN-I: 0.893, TDE: 0.004
ZTF23aaarvxjSLSN-IUnclearSLSN-ISLSN-ISN: 0.026, SLSN-I: 0.504, TDE: 0.470
ZTF23aaccacySLSN-IUnclearSNSNSN: 0.611, SLSN-I: 0.076, TDE: 0.313
ZTF23aazodojSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.102, SLSN-I: 0.897, TDE: 0.001
ZTF23abawhqlSLSN-ISNSNSNSN: 0.999, SLSN-I: 0.001, TDE: 0.000
ZTF23abjhbcrSLSN-ITDETDETDESN: 0.087, SLSN-I: 0.069, TDE: 0.844
ZTF23abjuxsoSLSN-IUnclearUnclearSLSN-ISN: 0.220, SLSN-I: 0.450, TDE: 0.330
ZTF23aadcbayTDETDETDETDESN: 0.101, SLSN-I: 0.012, TDE: 0.887
ZTF23aamsetvTDETDETDETDESN: 0.073, SLSN-I: 0.100, TDE: 0.827
ZTF23aapyidjTDETDETDETDESN: 0.200, SLSN-I: 0.010, TDE: 0.790
ZTF23aaqdjhiTDETDETDETDESN: 0.094, SLSN-I: 0.005, TDE: 0.901
ZTF23abaujuyTDEUnclearTDETDESN: 0.363, SLSN-I: 0.034, TDE: 0.603
ZTF23abgnxfvTDEUnclearSNSNSN: 0.558, SLSN-I: 0.016, TDE: 0.426
ZTF23abkixdbTDEUnclearTDETDESN: 0.209, SLSN-I: 0.074, TDE: 0.717
ZTF23abohtqfTDETDETDETDESN: 0.199, SLSN-I: 0.031, TDE: 0.770
ZTFIDClass0.750.50HighestProbability
*ZTF21aaqawpdSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.004, SLSN-I: 0.988, TDE: 0.008
*ZTF22aadeuwuSLSN-IUnclearUnclearSNSN: 0.495, SLSN-I: 0.411, TDE: 0.094
*ZTF22aadqgoaSLSN-IUnclearSNSNSN: 0.622, SLSN-I: 0.066, TDE: 0.312
*ZTF22aaljlzqSLSN-IUnclearSLSN-ISLSN-ISN: 0.007, SLSN-I: 0.718, TDE: 0.275
*ZTF22aalzjdcSLSN-IUnclearSLSN-ISLSN-ISN: 0.121, SLSN-I: 0.732, TDE: 0.148
*ZTF22aapjqpnSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.009, SLSN-I: 0.887, TDE: 0.104
*ZTF22aausnwrSLSN-IUnclearUnclearTDESN: 0.277, SLSN-I: 0.225, TDE: 0.498
*ZTF22abkbmobSLSN-IUnclearSLSN-ISLSN-ISN: 0.065, SLSN-I: 0.693, TDE: 0.243
*ZTF22abvarjhSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.003, SLSN-I: 0.853, TDE: 0.144
*ZTF22abvngdrSLSN-IUnclearSLSN-ISLSN-ISN: 0.033, SLSN-I: 0.732, TDE: 0.235
*ZTF22abynkpzSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.004, SLSN-I: 0.993, TDE: 0.003
*ZTF23aagdbbvSLSN-IUnclearSLSN-ISLSN-ISN: 0.009, SLSN-I: 0.727, TDE: 0.264
*ZTF23aawhcjbSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.011, SLSN-I: 0.972, TDE: 0.017
*ZTF23aaznlgbSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.011, SLSN-I: 0.854, TDE: 0.135
*ZTF23abjarpvSLSN-IUnclearSLSN-ISLSN-ISN: 0.043, SLSN-I: 0.701, TDE: 0.256
*ZTF23aboebghSLSN-ISNSNSNSN: 0.813, SLSN-I: 0.146, TDE: 0.041
*ZTF23abofwbaSLSN-IUnclearTDETDESN: 0.094, SLSN-I: 0.311, TDE: 0.595
ZTF21aalkhotSLSN-IUnclearSLSN-ISLSN-ISN: 0.093, SLSN-I: 0.551, TDE: 0.355
ZTF21acrbbwiSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.000, SLSN-I: 1.000, TDE: 0.000
ZTF22aarqrxfSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.017, SLSN-I: 0.981, TDE: 0.003
ZTF22abcvfgsSLSN-IUnclearSLSN-ISLSN-ISN: 0.080, SLSN-I: 0.521, TDE: 0.399
ZTF22abvcnnlSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.103, SLSN-I: 0.893, TDE: 0.004
ZTF23aaarvxjSLSN-IUnclearSLSN-ISLSN-ISN: 0.026, SLSN-I: 0.504, TDE: 0.470
ZTF23aaccacySLSN-IUnclearSNSNSN: 0.611, SLSN-I: 0.076, TDE: 0.313
ZTF23aazodojSLSN-ISLSN-ISLSN-ISLSN-ISN: 0.102, SLSN-I: 0.897, TDE: 0.001
ZTF23abawhqlSLSN-ISNSNSNSN: 0.999, SLSN-I: 0.001, TDE: 0.000
ZTF23abjhbcrSLSN-ITDETDETDESN: 0.087, SLSN-I: 0.069, TDE: 0.844
ZTF23abjuxsoSLSN-IUnclearUnclearSLSN-ISN: 0.220, SLSN-I: 0.450, TDE: 0.330
ZTF23aadcbayTDETDETDETDESN: 0.101, SLSN-I: 0.012, TDE: 0.887
ZTF23aamsetvTDETDETDETDESN: 0.073, SLSN-I: 0.100, TDE: 0.827
ZTF23aapyidjTDETDETDETDESN: 0.200, SLSN-I: 0.010, TDE: 0.790
ZTF23aaqdjhiTDETDETDETDESN: 0.094, SLSN-I: 0.005, TDE: 0.901
ZTF23abaujuyTDEUnclearTDETDESN: 0.363, SLSN-I: 0.034, TDE: 0.603
ZTF23abgnxfvTDEUnclearSNSNSN: 0.558, SLSN-I: 0.016, TDE: 0.426
ZTF23abkixdbTDEUnclearTDETDESN: 0.209, SLSN-I: 0.074, TDE: 0.717
ZTF23abohtqfTDETDETDETDESN: 0.199, SLSN-I: 0.031, TDE: 0.770

Notes. Objects with * indicate no host information and are therefore predicted by NEEDLE-T; and others are predicted by NEEDLE-TH. Highest represents the label with the largest probability, ‘0.50’ and ‘0.75’ represent different thresholds. The completeness for three levels of thresholds are 0.75, 0.72, and 0.42.

6.5 Redshift and magnitude dependence

The magnitude and redshift distributions of the training set vary between different types of objects, so it is important to test for biases this could introduce in classification. Fig. 13 shows the probability assigned by NEEDLE to every SLSN in the training data set and the unseen validation set. In general, we see only a weak dependence on apparent magnitude, though the scatter in probability appears to decrease slightly at fainter magnitudes. This is likely because these volumetrically rare events have a magnitude distribution clustered close to the survey detection limit; moreover, there are few objects in the ZTF BTS data set fainter than m ∼ 19. Reassuringly, we see no obvious difference in classification confidence between the seen and unseen data sets. The same is true for redshift, though in this case an overall trend in classification confidence is clearer, with few high-redshift SLSNe or TDEs receiving low classification confidence. Since we do indeed expect to find more of these bright transients at high redshift, relative to normal SNe, this trend may suggest the code has learned that some features associated with redshift are useful for classification (e.g. apparent host galaxy size for a given surface brightness).

Figure 13. NEEDLE classification confidence as a function of magnitude and redshift. The ‘old’ and ‘new’ labels refer to the full data set used for training and validation, and the untouched objects from after 2022 March (Table 3), respectively. ‘Hosted’ transients are classified by NEEDLE-TH, and ‘hostless’ by NEEDLE-T.

6.6 Real-time annotation on Lasair

We aim to provide NEEDLE classifications in close to real time via the LSST:UK alert broker, Lasair. Our classifier will digest incoming transients from a pre-filtered Kafka stream produced by a simple Lasair query, using data from ZTF (or LSST in the future), and provide the probabilities of the different classes for each object. To return our classifications to the broker, we make use of the Lasair annotator feature, which allows verified users to add information to the transients database in a format that is queryable by other users.

Fig. 14 shows the process in detail. NEEDLE is trained and tested using the ZTF alerts coming from Lasair. New alerts will be filtered by a customized SQL query to provide only young, reliable, extragalactic, and non-repeating transients. Specifically, we retain events (an illustrative filter is sketched after this list):

  • discovered within the last 60 d,

  • with more than three confident detections (to reduce the chance of bad subtractions),

  • predicted to be an SN or nuclear transient by sherlock (i.e. not a known AGN or Galactic variable).
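Lasair filters of this kind are expressed as SQL over the broker’s object and sherlock tables. The sketch below is indicative only: the table and column names (objects.jdmin, objects.ncand, sherlock_classifications.classification) follow Lasair’s documented schema but may not match the deployed query exactly.

    # Illustrative Lasair-style filter implementing the cuts above.
    # Table/column names are indicative of Lasair's schema, not the exact
    # deployed query; 'NT' denotes a nuclear transient in sherlock.
    FILTER_SQL = """
    SELECT o.objectId
    FROM objects AS o
    JOIN sherlock_classifications AS s ON s.objectId = o.objectId
    WHERE JDNOW() - o.jdmin < 60            -- discovered within the last 60 d
      AND o.ncand > 3                       -- more than three detections
      AND s.classification IN ('SN', 'NT')  -- not a known AGN or variable star
    """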

Figure 14. NEEDLE pipeline design for the alert broker Lasair. NEEDLE receives ZTF (and ultimately LSST) alerts from Lasair via a customized SQL filter to remove old or bogus objects. The science and reference images are contained in the ZTF alerts, or requested from the Rubin Science Platform. If they pass the quality checker, the host metadata are fetched from sherlock and Pan-STARRS. Finally, NEEDLE returns the probabilities for the three classes to the Lasair annotation database, allowing them to be used in subsequent alert filters by any user.

NEEDLE then selects the brightest available detection as the input image, provided it passes the image quality checker. It collects the host coordinates and photometry from sherlock and Pan-STARRS, computes the predicted probabilities from the trained network, and sends them back to Lasair as annotations.

We have tested this process end-to-end with a preliminary version of NEEDLE.
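As an indication of what the final annotation step looks like, the sketch below pushes a NEEDLE prediction back to the broker using the Lasair Python client’s annotator interface. The annotate() call and its arguments follow the public Lasair client documentation, but should be treated as an assumption and checked against the current release; the API token and topic name are placeholders.

    from lasair import lasair_client

    # Sketch of returning NEEDLE probabilities to Lasair as an annotation.
    # annotate() arguments follow the public Lasair client docs (an assumption
    # here); the API token and the 'needle' topic name are placeholders.
    L = lasair_client("my-api-token")
    probs = {"SN": 0.10, "SLSN-I": 0.75, "TDE": 0.15}
    L.annotate(
        topic="needle",                  # annotator ID registered with Lasair
        objectId="ZTF21aaqawpd",
        classification=max(probs, key=probs.get),
        version="1.0",
        explanation="NEEDLE-TH ensemble prediction",
        classdict=probs,                 # queryable by other Lasair users
        url="",
    )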

The latest version of NEEDLE-TH is currently running daily on a remote server and releasing the results as a public stream via Lasair.4 Applying a confidence threshold of 0.75 and a follow-up limit of m < 20.0 mag in the g or r bands, we find ∼6 TDE candidates and ∼7 SLSN candidates per week over the first month of operations. Applying only basic cuts on magnitude and requiring a most recent detection within the last two weeks, we instead find ∼136 events per week that would need to be assessed for follow-up. NEEDLE therefore cuts the alerts down to a manageable level, while still providing a reasonable number of targets for expert eyeballers to assess for follow-up. We note that at least one confirmed SLSN has already been correctly identified by NEEDLE in real-time operations (SN2023aase; Wise et al. 2024).

7 DISCUSSION

7.1 Individual discrepancy among rare transients

As mentioned in Section 5.2, the difficulty of classifying each individual object in our data set varies. One reason for this may be issues with the host galaxy metadata. In the Pan-STARRS survey, very nearby resolved galaxies may be broken into multiple sources by the survey photometry pipeline, resulting in underestimated host magnitudes. Failed host association may also cause issues, leading to the wrong photometry being retrieved. This is a particular problem for SLSNe, where many of the true hosts are not detected.

We also identify several real features of our objects that influence the ease of classification. For SLSNe, we found that those that are easily classified often have relatively high mdiscovery and ΔTdiscovery and low Δmdiscovery, in a slightly bluer and fainter (low g−r and r−i) host galaxy; that is, they are bright with a slow rise and a star-forming host, consistent with classic SLSNe in the literature. SLSNe in slightly more massive galaxies, or with short rise times, are more difficult to separate from normal SNe.

For TDEs, objects are most easily classified if they have a bright mdiscovery and a shorter ΔTdiscovery than typical SLSNe. This is likely because these events occur in the nuclei of galaxies, and so tend to be found closer to peak unless the flux contrast with the host is large. In future work, we will investigate in more detail how to optimize the training process to account for these variations.

7.2 Comparisons to previous classifiers

In recent years, several transient classifiers have been designed that can recognize TDEs and, in particular, SLSNe. Some achieve excellent accuracy for SLSNe by making use of their uniquely slow light curves; for the same reason, many of these classifiers perform better when more light-curve data are available at later phases. Table B1 compares these classifiers with NEEDLE.

The advantage of NEEDLE is that we do not require multiple detections or host redshifts as input, because at LSST depth few galaxies will have spectroscopic redshifts. We use only single-stamp images, alert photometry, and catalogued host magnitudes (when available), enabling an informed real-time prediction from as little as one detection. Furthermore, all data used in training are from real survey detections rather than simulations. It is likely that we could gain even better performance by making use of more detailed light-curve information, and this is the aim of future development. However, the goal of NEEDLE is not to produce pure samples of photometrically classified events, but to provide probabilities of potential SLSNe and TDEs at an early stage to guide spectroscopic follow-up. From this perspective, completeness may be more important than purity.

7.3 Remaining difficulties and future improvement

While the NEEDLE algorithm is performing well on the ZTF data set, we are continuing to develop the code and plan a number of future improvements to deal with current limitations, including:

  • Unbalanced classes. The rare transients we focus on, SLSNe and TDEs, have fewer than 100 examples each. After splitting into training and test sets, even fewer samples are available for model training. Weighted loss functions can mitigate this problem to a certain extent (a generic sketch is given after this list), but feature extraction for rare classes requires more samples and smarter algorithms, such as small-sample learning.

  • Overfitting. The training process is limited by small training and validation sets, resulting in potential overfitting. In other words, the current feature extraction for SLSNe and TDEs might be biased. A detailed discussion is given in Section 7.1.

  • NaN value replacement and padding. Replacing NaN values with zero poses difficulties for classification, since zero has physical meaning for magnitudes and image pixels. However, given the input requirements of neural networks, some kind of padding is inevitable. A possible solution is to impute the missing values based on context and modelling.

  • Missing host information. A large fraction of SLSNe have no catalogued hosts, and in the early years of LSST this fraction will increase at higher redshifts, affecting the other classes too. To mitigate this, we will continue to develop and apply the NEEDLE-T version of the code in parallel for such cases.

  • Including more contaminants in our training set. Currently we assume that contaminants such as AGN and variable stars can be rejected by simple Lasair filters before they reach NEEDLE. This may not be the case in future, deeper surveys like LSST.

  • Smaller LSST cutouts. LSST alert cutouts in real time are much smaller than those from ZTF, and full-size images will only be available after 80 h. To achieve real-time prediction, older images might need to be included in classification and training, rather than just the most recent detection.
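To make the first point above concrete, one standard mitigation is inverse-frequency class weighting in the loss. The sketch below shows the generic technique in a Keras-style workflow; it is not necessarily the exact weighting scheme used in NEEDLE, and the toy class sizes are illustrative.

    import numpy as np

    def inverse_frequency_weights(labels):
        """Map class index -> weight proportional to 1 / class frequency,
        so that rare classes (SLSNe, TDEs) contribute fairly to the loss."""
        classes, counts = np.unique(labels, return_counts=True)
        weights = counts.sum() / (len(classes) * counts)
        return dict(zip(classes.tolist(), weights.tolist()))

    # Toy label array reflecting the class imbalance: 0 = SN, 1 = SLSN, 2 = TDE.
    y_train = np.array([0] * 5000 + [1] * 80 + [2] * 60)
    class_weight = inverse_frequency_weights(y_train)
    # Keras usage (model assumed already compiled):
    # model.fit(x_train, y_train, class_weight=class_weight, epochs=50)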

Additionally, we have further plans for new features, and analyses to improve our training process. These include:

  • A detailed study of misclassified objects. The next step will be to visualize the model behaviour for such objects individually, and try to understand the reasons for misclassification.

  • Including more time-domain information. Rather than one image and a set of simple light-curve features, using more advanced features, including the light curve directly, or even providing a time-series of images, may help to improve performance. For the next classifier, Conv3D and other relevant networks, such as RNNs, will be considered.

  • Early stage classification. The ultimate goal of our classifier is to identify rare events in their early stages, even before their peak. With the addition of images at multiple epochs, we will analyse the trends in accuracy as more observations are added.

8 CONCLUSION

This paper introduces a novel context-based hybrid neural network, capable of providing probabilistic classifications of transients as SLSNe, TDEs, or SNe at early stages in their evolution. The literature suggests that SLSNe are typically found in faint, star-forming dwarf galaxies, while TDEs are located at the centres of host galaxies that are often green in colour and centrally concentrated. Based on these distinctive characteristics, the NEEDLE classifier is specifically developed to exploit this information and identify these sources using only single science and reference image stamps of a transient and its environment, together with simple photometric information from ZTF (and in future LSST) alert packets and catalogued host galaxy magnitudes.

Since half of the hosts of SLSNe are not catalogued, two versions of NEEDLE are developed, differentiated according to whether they include host information. Results show that even without a catalogued host galaxy, we are able to identify 79 per cent of SNe, 76 per cent of SLSNe, and 62 per cent of TDEs, averaged over 10 test sets. When host information is added, the true positive rate for TDEs increases to approximately 72 per cent, and the highest true positive rate for SLSNe increases from 87 per cent to 93 per cent. To mitigate the issue of contamination from common SNe, we recommend a threshold probability of p ≳ 0.75 before assigning a classification. Under these conditions, we can achieve over 95 per cent completeness for SLSNe and TDEs (on the full data set), at the cost of around 20 per cent purity.

Furthermore, photometric information has a greater impact on the predictions for SLSNe and TDEs than for ordinary SNe, in particular because of their longer rise times. Experiments have shown that the fraction of SLSNe and TDEs classified correctly increases as they rise towards the light-curve peak.

Currently, NEEDLE is being implemented on the Lasair alert broker, and is able to process ZTF streaming alerts and submit an annotation containing the probabilities for three classes back to the broker. These public classifications will help to inform spectroscopic follow-up for these rare events. We are continuing to develop NEEDLE, and expect that image-based classification of transients will be a powerful tool in the era of LSST.

ACKNOWLEDGEMENTS

We thank members of the QUB transients discovery team, the QUB Virtual Institute for Data Intensive Research, and the Turing Institute for many helpful conversations. In particular, we thank Aleksandar Novakovic, Richard Gault, and Miguel Arana for their advice on neural networks. We also thank Sean McGee and Sebastian Gomez for helpful feedback on the project.

MN and XS are supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 948381). MN also acknowledges support from an Alan Turing Fellowship and UK Space Agency grant no. ST/Y000692/1. Lasair is supported by the UKRI Science and Technology Facilities Council and is a collaboration between the University of Edinburgh (grant ST/N002512/1) and Queen’s University Belfast (grant ST/N002520/1) within the LSST:UK Science Consortium.

DATA AVAILABILITY

This paper is based on publicly available data. We are making all results from this work publicly available via the LSST:UK Lasair broker, and the data repository, including the trained models, scripts, and HDF5-format data, will be made available via GitHub.

Footnotes

3. This is a particularly important problem for astronomical data as they are inherently noisy.

References

Angus C. R., Levan A. J., Perley D. A., Tanvir N. R., Lyman J. D., Stanway E. R., Fruchter A. S., 2016, MNRAS, 458, 84
Baldeschi A., Miller A., Stroh M., Margutti R., Coppejans D. L., 2020, ApJ, 902, 60
Bellm E. C. et al., 2018, PASP, 131, 018002
Blanchard P. K., Berger E., Fong W.-f., 2016, ApJ, 817, 144
Boone K., 2019, AJ, 158, 257
Botticella M. T. et al., 2017, A&A, 598, A50
Bricman K., Gomboc A., 2020, ApJ, 890, 73
Burhanudin U. F. et al., 2021, MNRAS, 505, 4345
Burhanudin U. F., Maund J. R., 2023, MNRAS, 521, 1601
Carrasco-Davis R. et al., 2021, AJ, 162, 231
Chambers K. C. et al., 2016, The Pan-STARRS1 Surveys
Chen T.-W., Smartt S. J., Yates R. M., Nicholl M., Krühler T., Schady P., Dennefeld M., Inserra C., 2017, MNRAS, 470, 3566
Chen Z. H. et al., 2023, ApJ, 943, 41
Cleland C., McGee S. L., Nicholl M., 2023, MNRAS, 524, 3559
Collette A., 2013, Python and HDF5. O'Reilly Media, Inc., Sebastopol, CA, USA
Donoso-Oliva C., Becker I., Protopapas P., Cabrera-Vives G., Vishnu M., Vardhan H., 2023, A&A, 670, A54
Flewelling H. A. et al., 2020, ApJS, 251, 7
Foley R. J., Mandel K., 2013, ApJ, 778, 167
Förster F. et al., 2022, AJ, 164, 195
Fremling C. et al., 2020, ApJ, 895, 32
French K. D., Arcavi I., Zabludoff A., 2016, ApJ, 818, L21
French K. D., Zabludoff A., 2018, ApJ, 868, 99
Fruchter A. S. et al., 2006, Nature, 441, 463
Gagliano A., Contardo G., Foreman-Mackey D., Malz A. I., Aleo P. D., 2023, ApJ, 954, 6
Gagliano A., Narayan G., Engel A., Carrasco Kind M., LSST Dark Energy Science Collaboration, 2021, ApJ, 908, 170
Gal-Yam A., 2019, ARA&A, 57, 305
Gezari S., 2021, ARA&A, 59, 21
Gomez S., Berger E., Blanchard P. K., Hosseinzadeh G., Nicholl M., Villar V. A., Yin Y., 2020, ApJ, 904, 74
Gomez S., Villar V. A., Berger E., Gezari S., van Velzen S., Nicholl M., Blanchard P. K., Alexander K. D., 2023, ApJ, 949, 113
Graur O., Bianco F. B., Modjaz M., Shivvers I., Filippenko A. V., Li W., Smith N., 2017, ApJ, 837, 121
Graur O., French K. D., Zahid H. J., Guillochon J., Mandel K. S., Auchettl K., Zabludoff A. I., 2018, ApJ, 853, 39
Guillochon J., Parrent J., Kelley L. Z., Margutti R., 2017, ApJ, 835, 64
Hammerstein E. et al., 2023, ApJ, 942, 9
Hills J. G., 1975, Nature, 254, 295
Hložek R. et al., 2023, ApJS, 267, 25
Hosseinzadeh G. et al., 2020, ApJ, 905, 93
Hsu C.-J., Tan J. C., Holdship J., Xu D., Viti S., Wu B., Gaches B., 2023, preprint
Ivezić Ž. et al., 2019, ApJ, 873, 111
Kantor J., 2014, in Wozniak P. R., Graham M. J., Mahabal A. A., Seaman R., eds, The Third Hot-wiring the Transient Universe Workshop. p. 19
Kelly P. L., Kirshner R. P., 2012, ApJ, 759, 107
Kessler R. et al., 2019, PASP, 131, 094501
Kingma D. P., Ba J., 2014, preprint
Kisley M., Qin Y.-J., Zabludoff A., Barnard K., Ko C.-L., 2023, ApJ, 942, 29
Law-Smith J., Ramirez-Ruiz E., Ellison S. L., Foley R. J., 2017, ApJ, 850, 22
Leloudas G. et al., 2015, MNRAS, 449, 917
Li R. et al., 2022, A&A, 666, A85
Li W., Chornock R., Leaman J., Filippenko A. V., Poznanski D., Wang X., Ganeshalingam M., Mannucci F., 2011, MNRAS, 412, 1473
Lunnan R. et al., 2014, ApJ, 787, 138
Miranda N. et al., 2022, A&A, 665, A99
Muthukrishna D., Narayan G., Mandel K. S., Biswas R., Hložek R., 2019, PASP, 131, 118002
Nicholl M., 2021, Astron. Geophys., 62, 5
O’Malley T. et al., 2019, KerasTuner, https://github.com/keras-team/keras-tuner
Ørum S. V., Ivens D. L., Strandberg P., Leloudas G., Man A. W. S., Schulze S., 2020, A&A, 643, A47
Perley D. A. et al., 2016, ApJ, 830, 13
Perley D. A. et al., 2020, ApJ, 904, 35
Pimentel Ó., Estévez P. A., Förster F., 2022, AJ, 165, 18
Qu H., Sako M., 2022, AJ, 163, 57
Quimby R. M. et al., 2011, Nature, 474, 487
Ramsden P., Lanning D., Nicholl M., McGee S. L., 2022, MNRAS, 515, 1146
Rees M. J., 1988, Nature, 333, 523
Sánchez-Sáez P. et al., 2021, AJ, 161, 141
Schulze S. et al., 2017, MNRAS, 473, 1258
Shappee B. et al., 2014, American Astronomical Society Meeting Abstracts #223. p. 236.03
Smith K. W. et al., 2019, Res. Notes Am. Astron. Soc., 3, 26
Smith K. W. et al., 2020, PASP, 132, 085002
Stein R. et al., 2024, ApJ, 965, L14
Sullivan M. et al., 2006, ApJ, 648, 868
Tonry J. L. et al., 2018, PASP, 130, 064505
van Velzen S. et al., 2021, ApJ, 908, 4
Villar V. A. et al., 2020, ApJ, 905, 94
Villar V. A., Nicholl M., Berger E., 2018, ApJ, 869, 166
Wise J., Hinds K., Perley D., Bochenek O., Rich R. M., 2024, Transient Name Server Class. Rep., 2024-468, 1
Yao Y. et al., 2023, ApJ, 955, L6

APPENDIX A: ZTF OBJECT TYPES

Table A1. Summary of ZTF transients with their types, features, and numbers. Our ‘SN’ class includes the first four labels in this table, but in future versions we aim to resolve these SN subtypes.

Label | Type | Feature | Number
SN Ia (4113) | Ia | Thermonuclear explosion of a white dwarf; spectrum lacks hydrogen and helium | 4095
 | Iax | A faint and fast subclass of SNe Ia | 11
 | Ia-CSM | SN Ia interacting with nearby circumstellar material | 7
SN II (899) | II | Core-collapse explosion of a red supergiant ≳ 8 M⊙ | 899
Stripped-envelope SN (363) | Ib/c | Massive stars that have lost their hydrogen (Ib) or hydrogen and helium layers (Ic) | 216
 | IIb | Incomplete envelope stripping; initially show hydrogen lines, but quickly change to resemble an SN Ib | 80
 | Ic-BL | Broad spectral lines due to high velocities, large nickel masses; the only SN type associated with gamma-ray bursts | 47
 | Ibn | SN interacting with a helium-rich CSM | 20
Interacting SN (211) | IIn | Hydrogen emission lines with narrow Doppler widths, indicating low-velocity CSM that has been shock-excited by a collision from the SN ejecta | 183
 | SLSN II | IIn brighter than −21 mag | 28
SLSN (87) | SLSN | 10–100 times brighter than normal SNe, no hydrogen and usually no helium; late spectra resemble SNe Ic. Prefer dwarf galaxies | 87
TDE (64) | TDE | A star approaches close to a supermassive black hole and is pulled apart by tidal forces, leading to fallback and accretion | 64
Other (18) | Gap | Transient with luminosity intermediate between typical SNe and classical novae | 13
 | Ca-rich | Faint and fast transients of ambiguous nature, with strong calcium lines in spectrum | 3
 | Other |  | 2
Non-SN (40) | Novae | Outburst on the surface of an accreting white dwarf | 30
 | ILRT | Intermediate-luminosity red transient | 4
 | LBV | Luminous blue variable: very massive star undergoing eruptive mass loss | 4
 | LRN | Luminous red novae: mergers of low-mass stellar binaries | 2
Sum |  |  | 5795

APPENDIX B: COMPARISON WITH OTHER CLASSIFIERS

Table B1. Comparisons among various transient classifiers for SLSNe and TDEs.

Code | Paper | Model | Data sources | Inputs | Performance for SLSNe and TDEs
tdescore | Stein et al. (2024) | XGBoost | ZTF alerts, Pan-STARRS hosts | 10 features of full light curves, five features from the context | In unbalanced test sets: TDE completeness of 0.77, purity of 0.80
FLEET-SLSN | Gomez et al. (2020) | Random forests | Supernovae: Open Supernova Catalog (Guillochon et al. 2017), ZTF; hosts: SDSS, PS1/3π | Light-curve features: width of light curves, phase offsets, peak magnitudes in g and r bands; host features: apparent magnitudes, half-light radius in r band, offset (same as NEEDLE here), offset normalized by galaxy radius, apparent magnitude difference between transient at peak and host | In unbalanced/observed sets: SLSN purity of about 0.85 and completeness of 0.20
FLEET-TDE | Gomez et al. (2023) | Random forests | Transients: spectroscopically classified transients from TNS, light curves from ZTF; hosts: SDSS, PS1/3π | Similar to FLEET-SLSN | In unbalanced/observed sets: 20 d after discovery, about 40 per cent completeness and about 30 per cent purity; 40 d after discovery, about 40 per cent completeness and about 50 per cent purity
ALeRCE light-curve classifier | Sánchez-Sáez et al. (2021) | Balanced random forests | ZTF light curves, with labels from a variety of catalogues | Detection features: 56 features per band, and 12 features computed using g and r bands, yielding a total of 124 detection features; non-detection features: nine features per band defined using all the non-detections associated with a given source | In balanced test sets: 100 per cent accuracy with 26 per cent deviation; high accuracy, though only 24 SLSN samples
SCONE | Qu & Sako (2022) | CNN | A set of LSST deep drilling field simulations | 2D Gaussian process generating flux heat maps as a function of time and filter (wavelength) | In balanced test sets: without redshift, SLSN accuracy is 0.69, 0.76, and 0.92 at 0, 5, and 50 d after discovery; with redshift, the values are 0.91, 0.93, and 0.97
SN classifier | Burhanudin & Maund (2023) | CNN and transfer learning | Light curves from the Open Supernova Catalog; the Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC) | Referred from Qu & Sako (2022) | In unbalanced test sets: for PLAsTiCC-simulated SLSNe, the accuracy with and without redshift is 0.61 and 0.65, respectively
RAPID | Muthukrishna et al. (2019) | RNN | Simulated data from PLAsTiCC transient models | A matrix with each row composed of the imputed light-curve fluxes for each band, repeated values of the host galaxy redshift, and the MW dust reddening | In (nearly) balanced test sets: for SLSNe, the accuracy is 0.83 (2 d after trigger) to 0.85 (40 d); for TDEs, the numbers are 0.59 and 0.86
SuperRAENN | Villar et al. (2020) | Recurrent autoencoder (RAE) and RFs | Light curves from the Pan-STARRS1 Medium Deep Survey (PS1 MDS) with known redshifts | Gaussian-processed light curves for RAE inputs; RAE latent features then used as inputs for a random forest | In unbalanced test sets: for SLSNe, the completeness and purity are 0.76 and 0.81; with a threshold larger than 0.7, these increase to 0.83 and 0.91, respectively. Host redshift is considered
Superphot | Hosseinzadeh et al. (2020) | RFs | Light curves from the Pan-STARRS1 Medium Deep Survey (PS1 MDS) | Six principal component analysis coefficients of modelled light curves with known redshifts | In unbalanced sets: for SLSNe, the completeness and purity are 0.82 and 0.67, respectively
NEEDLE | This work | CNN + DNN | ZTF BTS, sherlock-predicted hosts, and Pan-STARRS catalogues | Science and reference images in a single band, simple light-curve features, and host galaxy metadata | For SLSNe, averaged completeness is 0.73 and averaged purity is 0.84 in the test sets; for TDEs, the numbers are 0.80 and 0.80