NEural Engine for Discovering Luminous Events (NEEDLE): identifying rare transient candidates in real time from host galaxy images

Known for their efficiency in analyzing large data sets, machine learning classifiers are widely used in wide-field sky surveys. The upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will generate millions of alerts every night, enabling the discovery of large samples of rare events. Identifying such objects soon after explosion will be essential to study their evolution. This requires a machine learning framework that makes use of all available transient and contextual information. Using $\sim5400$ transients from the ZTF Bright Transient Survey as input data, we develop NEEDLE, a novel hybrid classifier to select for two rare classes with strong environmental preferences: superluminous supernovae (SLSNe) preferring dwarf galaxies, and tidal disruption events (TDEs) occurring in the centres of nucleated galaxies. The input data includes detection and reference images, photometric information from the alert packets, and host galaxy magnitudes from Pan-STARRS. Despite having only a few tens of examples of the rare classes, our average (best) completeness on an unseen test set reaches 77% (93%) for SLSNe and 72% (87%) for TDEs. This may still result in a large fraction of false positives for the rare transients, given the large class imbalance in real surveys. However, the goal of NEEDLE is to find good candidates for spectroscopic classification, rather than to select pure photometric samples. Our network is designed with LSST in mind and we expect performance to improve further with the higher resolution images and more accurate transient and host photometry that will be available from Rubin. Our system will be deployed as an annotator on the UK alert broker, Lasair, to provide predictions to the community in real time.


INTRODUCTION
Thanks to modern time-domain sky surveys, such as the Zwicky Transient Facility (ZTF; Bellm et al. 2018), Asteroid Terrestrial impact Last Alert System (ATLAS; Tonry et al. 2018), Panoramic Survey Telescope and Rapid Response System (Pan-STARRS; Chambers et al. 2019), and All-sky Automated Search for Supernovae (ASAS-SN; Shappee et al. 2014), increasing numbers of transients have been discovered, catalogued and studied. The diversity of their spectra, and even photometric properties such as absolute magnitude, rise and decline rate and duration, have led to the identification of new and rare classes of events. Even more encouraging is that the upcoming Legacy Survey of Space and Time (LSST; Ivezić et al. 2019) survey will significantly increase the transient discovery rate through deeper observations, wide field of view and colour information from six filters.
In recent years, novel and rare superluminous supernovae (SLSNe) and tidal disruption events (TDEs) have been intensively studied, although their intrinsic physical mechanisms remain unclear. SLSNe are ∼ 10 times brighter than a Type Ia supernova (SN) and ∼ 100 times brighter than core-collapse SNe. Their rise timescales of ≳ 15 − 30 days are also longer than those of typical supernovae (Quimby et al. 2011; Gal-Yam 2019; Nicholl 2021). The hydrogen-poor events, also termed SLSNe Type I, mostly occur in low-mass dwarf galaxies with high specific star-formation rates and low metallicities (Lunnan et al. 2014; Leloudas et al. 2015; Perley et al. 2016; Angus et al. 2016; Schulze et al. 2017; Chen et al. 2017), which provides important hints for finding and identifying such events. Studying SLSNe allows researchers to fill gaps in our understanding of stellar evolution, particularly core-collapse supernovae in low-metallicity environments, and explore the extreme mass loss and rotation of possible massive progenitor stars.
★ E-mail: xsheng03@qub.ac.uk
A Tidal Disruption Event (TDE) occurs when a star passes close enough to the massive black hole (MBH) at the centre of a galaxy to be disrupted, leading to accretion onto the MBH with luminous emission and possibly jets (Hills 1975; Rees 1988; Gezari 2021). Such rare events provide researchers with an opportunity to investigate accretion flows on quiescent black holes (at the low end of the MBH mass distribution), with accretion rates that change by orders of magnitude on human timescales. Compared with SNe and SLSNe, TDEs can be differentiated by their locations in the centres of their host galaxies as well as light curves that show a constant temperature. This provides helpful information for machine learning algorithms to learn their unique features.
Modern research on transients is mainly conducted on their spectra in the frequency domain and photometric information in the time domain. Spectra are essential to reveal their chemical compositions and physical properties (mass, velocity, redshift, etc). However, as photometric images require much shorter exposure time than spectra, they are preferred for observing missions that pursue large night sky coverage and long-term repeated detection. As the number of transients discovered in wide-field imaging sky surveys grows exponentially, it is no longer possible to obtain spectra for most transients due to the expensive exposure times required.
The Vera C. Rubin Observatory (VRO) is planning to conduct LSST starting in 2025 (Ivezić et al. 2019). LSST will observe the whole Southern sky and part of the Northern sky, including a Wide-Fast-Deep field (90% of the observing time) with seasonal cadence and a Deep-Drilling field with dense and deep detection. Alert brokers, such as the UK alert broker Lasair (Smith et al. 2019), will provide researchers with real-time (within minutes to days) access to transient data. LSST is predicted to detect about 10 million transient alerts (defined as detections of time-varying flux) per night (Kantor 2014). These alerts will include $\sim 10^4$ SLSNe (Villar et al. 2018) and 3,500 − 8,000 TDEs (Bricman & Gomboc 2020) per year. However, the number of conventional SNe detected each year will be $\gtrsim 10^6$, meaning that only a small fraction of events will ever be observed spectroscopically. It is therefore essential to identify the most interesting candidates photometrically, in order to prioritise them for spectroscopy.
Machine learning algorithms will play an important role in classifying and filtering these alerts in real-time. This project aims to build up a hybrid classifier that fully takes advantage of various machine learning algorithms and combines different astronomical resources to identify candidate rare transients, such as SLSNe and TDEs, at or before their luminosity peak. For this reason, we are motivated to use only the properties available at the time of an early photometric detection: the early light curve, the associated discovery and reference images, and any cataloged host galaxy, but no information (such as redshift) that would require additional observations. We call this classifier the NEural Engine for Discovering Luminous Events (NEEDLE).
The paper is organised as follows. Section 2 reviews some of the existing techniques in machine learning classification and why SLSNe and TDEs are promising targets. Section 3 describes the data sources from the ZTF Bright Transient Survey and Pan-STARRS, and analyses the correlations between different features and transients. Section 4 describes the image and metadata pre-processing methods, including a binary classifier to assess image quality. Section 5 presents the model architecture, training and test sets, and development details. Section 6 shows the performance of the classifiers through confusion matrices and completeness and purity diagrams, and illustrates the pipeline of NEEDLE to provide classifications publicly on Lasair. Section 7 discusses transient labelling issues, comparisons with currently popular classifiers, and remaining difficulties and possible improvements. Finally, Section 8 summarises the paper.

The host galaxy matters
The environments of transients show strong correlations with transient properties. For example, the rates of typical Type Ia and core-collapse SNe scale with host galaxy stellar mass (Sullivan et al. 2006; Li et al. 2011). The relative fractions of different SN types vary between galaxies with different masses (Graur et al. 2017) and star-formation rates (Botticella et al. 2017). The locations within their hosts vary, with some types of SNe also showing strong preferences for occurring in the brightest or bluest parts of their hosts (Fruchter et al. 2006; Kelly & Kirshner 2012; Blanchard et al. 2016). The rare transient classes that we are interested in, SLSNe and TDEs, are prime candidates for selection via their environments, as each shows strong biases in their host galaxies. SLSNe are very unusual in that they show a strong preference (shared only by long gamma-ray bursts) for dwarf star-forming galaxies. SLSN samples also show a high fraction of irregular or interacting galaxies (Chen et al. 2017; Ørum et al. 2020), but overall occur in low-density environments rather than groups or clusters (Cleland et al. 2023). The locations of SLSNe within their hosts broadly track an exponential disk profile, but many events also occur at large offsets or in regions of low UV flux (Hsu et al. 2023).
TDEs, like active galactic nuclei (AGN), occur at the centres of galaxies hosting MBHs. However, TDEs are rarely observed in galaxies with masses above $\sim$ few $\times 10^{10}\,M_\odot$ (van Velzen et al. 2021; Ramsden et al. 2022), since for very massive black holes the disruption occurs inside the event horizon. TDE hosts in particular show a large over-representation of recently quenched (French et al. 2016) galaxies with green colours (Hammerstein et al. 2022; Yao et al. 2023). Compared to typical galaxies, their light profiles tend to be strongly peaked towards the nucleus (Law-Smith et al. 2017; Graur et al. 2018).
Some existing codes employ the context of where a transient appears to aid classification. For example, Sherlock, deployed on Lasair, is an integrated massive database system that classifies transients by cross-matching the position of a transient with all major astronomical catalogues (Smith et al. 2020). By associating transients with galaxies, galaxy nuclei, known AGN, variables or very bright stars, Sherlock provides a top-level classification of any transient as a likely SN, nuclear transient, AGN, etc. Similarly, using contextual information, the ALeRCE Stamp Classifier takes the first images and alert metadata for an object to provide a preliminary classification of AGN, SN, variable star, asteroid or bogus (Carrasco-Davis et al. 2021). Baldeschi et al. (2020) present a Random Forest (RF) classifier for galaxy classification based on recent star formation history and morphology, and apply it to the hosts of core-collapse and thermonuclear SNe, indicating that the colours and shapes of hosts can help separate the two classes better than random guessing.
Other codes go further and attempt to predict the spectroscopic sub-type of transient. Foley & Mandel (2013) and Kisley et al. (2022) use purely host galaxy photometry to provide the probabilities of different types of SNe. GHOST (Gagliano et al. 2021) employs a novel gradient ascent method to find the associated host galaxies and, based on the features of hosts and angular offset, applies an RF to distinguish SLSNe, Type Ia SNe and core-collapse SNe. Gagliano et al. (2023) take host properties and light curves as inputs to classify SNe Ia, SNe II, and SNe Ib/c, and obtain increasing accuracy with later phases. In summary, different transients have unique preferences for where they occur, and these can help reveal their likely nature.
Recurrent Neural Networks (RNN) are capable of learning the correlations among close and distant time steps in time series and are designed for classification, modelling and prediction. RNNs can extract features from light curves of different classes of transients to distinguish them. Examples of such codes include RAPID (Muthukrishna et al. 2019), SuperRAENN (Villar et al. 2020), Superphot (Hosseinzadeh et al. 2020), Classifier for GOTO (Burhanudin et al. 2021) and Early-time transient Classifier (Gagliano et al. 2023). Attention mechanisms have also been applied, such as in TimeModAttn (Pimentel et al. 2022).
On the other hand, Convolutional Neural Networks (CNN) are mainly designed for visual imagery classification. They can generate feature maps of the input data while training, and attempt to associate these features with class labels. Image-based classifiers have not yet attained the widespread use of light-curve classifiers, but experiments to date have shown that this approach is a very promising alternative, as it can take into account the transient position and host galaxy morphology, as discussed in Section 2.1. Transient researchers have been implementing CNNs for transient classification with codes such as ALeRCE (Carrasco-Davis et al. 2021), DELIGHT (Förster et al. 2022), and recent work on light curves by Burhanudin & Maund (2023), showing that CNNs are able to achieve high accuracy in identifying various types of transients.
The above architectures have also shown promising performance in classifying rare events like SLSNe and TDEs. For SLSNe, classifiers using light curves achieve completeness ∼ 0.69 − 0.83, and in one case up to 1.00 completeness (Qu & Sako 2022; Muthukrishna et al. 2019; Sánchez-Sáez et al. 2021). For TDEs, existing codes achieve 0.40 completeness (Gomez et al. 2023) at early phases, or better than 0.80 with full light curves (Stein et al. 2023). A detailed review is shown in Table 3.
However, there are still difficulties and limitations remaining. CNNs may struggle with low image quality due to issues of signal-to-noise, resolution, bright nearby objects or detector cosmetics, resulting in mislabelling. RNNs struggle to extract correlations from light curves with sparse cadence and few observations. For training and test data sets, spectroscopically confirmed SLSNe and TDEs make up only 1-2% of all transient samples, so these classes may be under-represented during learning. Many classifiers, such as Hložek et al. (2023), have been trained on simulated data sets (e.g. PLAsTiCC; Kessler et al. 2019), which avoids the difficulties that must be overcome when dealing with real data. Finally, any classifiers that require redshift information or the declining part of a light curve may not be suitable for early-time classification.
Novel architectures are required to gain better accuracy. In recent years, hybrid neural networks have become more popular. CNNs with artificial neural networks (ANN, fully connected neural networks) are able to fully use images and metadata (position, redshift, etc) together to provide high accuracy predictions (e.g. GaZNets, Li et al. 2022; ALeRCE, Carrasco-Davis et al. 2021). Other architectures, such as transformers like ASTROMER (Donoso-Oliva et al. 2022), use an Autoencoder with positional embedding and self-attention blocks to gain the representation of transients' light curves, which can be further applied to classification and modelling.
In short, more deep learning applications for astronomical study are expected to digest multivariate data. This might include magnitudes or fluxes in the time dimension, images (in one or more filters), and contextual information from existing catalogues. Our goal here is to take the first steps in realising a hybrid classifier that tries to maximise the information used from images, simple light curve features and host galaxy features, and apply this to the case of finding SLSNe and TDEs in wide-field surveys.

DATA SET
For this project we require a training and test set of transients with known classes (based on spectroscopic classifications). In this section we outline the sources of data used to train and validate our code.

ZTF bright transients database
Although our ultimate goal is to develop a classifier for LSST, for our initial training and test set before that survey begins we use the Zwicky Transient Facility (ZTF) Bright Transient Survey (BTS) (Bellm et al. 2018; Fremling et al. 2020; Perley et al. 2020). The ZTF public survey covers the entire Northern sky to a depth of ≈ 20 − 20.5 mag every 2-3 nights in the g and r filters. The BTS has been spectroscopically classifying all ZTF-detected supernovae brighter than ≈ 19 mag since June of 2018. We choose this dataset as it is the largest homogeneous set of labelled transients available, and the data are comparable to LSST in terms of imaging cadence and the format of the real-time alerts.
We downloaded the entire ZTF BTS sample brighter than 19 mag, up to March 2022, and use this as the basis of our sample. This contains 5703 spectroscopically classified transients. Information such as ZTF object ID, coordinates, discovery date and spectroscopic type can be found from the ZTF Bright Transient Survey Sample Explorer. After removing duplicates and objects missing from the ZTF database, 5388 ZTF objects remain. This includes over 5000 SNe, but only 37 SLSNe and 18 TDEs. We therefore supplement the BTS data set with any SLSNe or TDEs published in ZTF sample papers. This includes TDEs from Hammerstein et al. (2022) and van Velzen et al. (2021), and SLSNe from Chen et al. (2022). Given that some of these objects are already in the BTS data, our total numbers of SLSNe and TDEs are 87 and 64, respectively.
All transients fall into five general categories, shown in Table A1:
• "Common" Supernovae (including all spectroscopic Type Ia, core-collapse, and interacting SNe)
• Superluminous supernovae (considering here only the hydrogen-poor SLSNe Type I)
• Tidal Disruption Events
• Possible SNe/transients of ambiguous nature (calcium-rich, gap transients)
• Non-SN (novae and stellar outbursts).
The latter two categories make up only 1% of the sample, and can generally be filtered out by their fast light curves before being passed to the machine learning classifier. We do not include them in our training or test set, but include them in the Appendix for completeness. The first category is very broad, containing 97.2% of events. However, as the task of NEEDLE is to distinguish among SNe, TDEs and SLSNe, we avoid sub-dividing the SN class so that more attention can be focused on the rare classes of interest. This is the first version of NEEDLE; our aim is that future versions with improved architecture and more data will be able to perform more fine-grained classification of the various supernova sub-types. Table 1 provides counts of objects with image and magnitude data from ZTF, as well as the numbers that also have cataloged host data in two bands from deeper surveys.

Images
We wrote Python scripts to download ZTF cutout images centered at the transient positions, using the ZTF image database API. For each ZTF object, starting from its discovery date to 200 days later, we downloaded all available images in the g and r bands. This includes: the Science image (the image taken in each visit, containing the transient flux); the Reference image (a stacked image from before the event's discovery, providing a template of the host galaxy and surrounding field); and the Difference image (the subtraction of the above two images, containing only transient flux). The requested image size is 1 arcmin. This size is large enough to include most host galaxies, while larger images would include more unrelated sources in the field. Figure 1 shows examples of ZTF images obtained for the three classes of transients we consider.

Image metadata
We label each image with metadata including ZTF object ID, class label (SN, SLSN or TDE), RA, Dec, image size, and date. We retrieve separately for each filter the start and end Julian dates where the object is detected. This information is stored in a JSON file for each object.
Although we download all images and their associated metadata for these objects, we find better performance in training when we give our network only one image per object. Therefore, when training the model, we use the image metadata to select one image from close to the time of light curve peak.
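The selection of a single image near peak can be illustrated with a minimal sketch. The record layout (a list of dictionaries with hypothetical `file` and `jd` keys) and the function name `select_peak_image` are assumptions made for this example and are not taken from the NEEDLE code.

```python
def select_peak_image(images, peak_jd):
    """Return the image record observed closest in time to the light curve peak.

    `images` is a list of dicts with a Julian date under the (hypothetical)
    key "jd"; `peak_jd` is the Julian date of the light curve peak.
    """
    if not images:
        raise ValueError("no images available for this object")
    return min(images, key=lambda record: abs(record["jd"] - peak_jd))


# Toy metadata for one object (values are illustrative only).
images = [
    {"file": "a.fits", "jd": 2459000.5},
    {"file": "b.fits", "jd": 2459011.5},
    {"file": "c.fits", "jd": 2459030.5},
]
closest = select_peak_image(images, peak_jd=2459012.0)
```

In this toy case the second record is selected, since it lies half a day from the assumed peak date.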

Light curve metadata
For each object, we also retrieve its photometry for each available detection through Lasair. Our aim is to include some simple light curve parameters (features) as additional data for our classifier. We use the Lasair API to query the light curve using the ZTF object ID.
After cross-matching with the image data, we found that not every image has a corresponding magnitude, as Lasair contains only the public ZTF photometry. Although more light curve data would be available by querying the ZTF forced photometry, using the data from Lasair ensures consistent formatting between our training data set and future real-time alert classifications we wish to perform with our trained model. This light curve data (detection dates and magnitudes) is appended to the metadata file for each object. The simple features extracted are listed in Table 2.

Host galaxy metadata
The Sherlock software package (Smith et al. 2020) is integrated into Lasair, and automatically provides a contextual classification by cross-matching with a library of historical and on-going sky survey catalogs. This provides preliminary classifications as transients, variables, artefacts, etc, based on association with nearby galaxies, known cataclysmic variable stars, active galactic nuclei, or bright stars. For this project, we query the Sherlock table on Lasair to find the coordinates of the most likely host galaxy for each of our transients.
We use these coordinates to retrieve host galaxy magnitudes from Pan-STARRS. Possible hosts were found by Sherlock for most transients, but about half of the SLSN hosts are missed, as they are likely fainter than the Pan-STARRS DR2 limiting magnitude. This is unsurprising given that most SLSNe explode in distant dwarf galaxies. We used the Pan-STARRS DR2 API to obtain the Aperture magnitude in the g, r, i, z and y bands. The colours of the host, each measured as the difference between magnitudes in two bands, are correlated with the age and star-formation rate of the stellar population.
The full list of metadata used in this study is given in Table 2.

Data Analysis
Before building a model, we check for correlations or clusters within our metadata. Figure 2 shows simple features obtained from the single-band ZTF light curves. We show the apparent magnitude around the peak $m_{\rm peak}$, the magnitude contrast between transient and host, the elapsed time between first detection and light curve peak, as well as a measure of the light curve slope during the rise, the "rising ratio" $(m_{\rm peak} - m_{\rm discovery})/(t_{\rm peak} - t_{\rm discovery})$.
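As an illustration of how such features could be derived from a list of detections, the sketch below assumes that the rising ratio is the magnitude change between discovery and peak divided by the elapsed time, and that the peak is the brightest detection (smallest magnitude). The function name and the output keys are hypothetical.

```python
def light_curve_features(detections, host_mag=None):
    """Compute simple features from single-band detections.

    `detections` is a list of (julian_date, magnitude) tuples.
    `host_mag` is the host galaxy magnitude, or None if no host is catalogued.
    """
    if not detections:
        raise ValueError("at least one detection is required")
    ordered = sorted(detections)                 # sort by time
    t_disc, m_disc = ordered[0]                  # first detection
    # Brightest point: magnitudes decrease as flux increases.
    t_peak, m_peak = min(ordered, key=lambda d: d[1])
    rise_time = t_peak - t_disc
    # Slope of the rise in mag per day; zero if the peak is the first point.
    rising_ratio = (m_peak - m_disc) / rise_time if rise_time > 0 else 0.0
    contrast = m_peak - host_mag if host_mag is not None else None
    return {
        "m_peak": m_peak,
        "rise_time": rise_time,
        "rising_ratio": rising_ratio,
        "host_contrast": contrast,
    }


features = light_curve_features(
    [(100.0, 19.5), (105.0, 18.9), (110.0, 18.5), (120.0, 18.8)],
    host_mag=20.0,
)
```

A negative host contrast means that the transient at peak is brighter than its host, as is typical for SLSNe in faint dwarf galaxies.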
It can be seen that the distributions of SLSN and TDE apparent magnitudes in our sample skew dimmer than the distribution for other SNe. This is due in part to the lack of nearby events in these rare classes. The need to include examples from outside of the magnitude-limited BTS sample may also bias these events towards fainter magnitudes, but is unavoidable given the class size imbalance.
SLSNe show a much larger contrast at peak with their host galaxies, standing out from SNe and TDEs. Moreover, their rising timescales are longer than those of other transients. Normal SNe show the fastest rise, with TDEs showing a broad distribution peaking in between the other classes. The median SLSN rising ratio is similar to that of TDEs, but the deviation is smaller. To compare some of the key parameters more clearly, Figure 3 presents the cumulative distributions of host galaxy contrast ($m_{\rm peak} - m_{\rm host}$) and approximate rise time ($t_{\rm peak} - t_{\rm discovery}$), where the three classes show clear differences.
Similarly, Figure 4 shows a corner plot for host galaxy metadata, including magnitudes and colours in three bands, and the offset in arcseconds between the transient coordinates and the host galaxy centroid. Again the plot shows that SLSNe tend to have the faintest hosts, with a slight bias to bluer colours. As expected, the host offset for most TDEs is clustered around ∼ 0.0 − 1.0 arcseconds. SLSNe, with their compact hosts, tend to show small offsets of a few arcseconds, whereas the distribution is much broader for typical SNe in extended galaxies.
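For reference, the angular offset between a transient and its host centroid can be computed from their coordinates. The sketch below uses the haversine formula, which remains numerically stable at arcsecond separations; the function name `offset_arcsec` is an assumption for this example and the coordinates are illustrative.

```python
import math


def offset_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two sky positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    # Haversine formula for the great-circle distance on the unit sphere.
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(a))) * 3600.0


# A transient displaced by one arcsecond in declination from its host.
sep = offset_arcsec(150.0, 2.0, 150.0, 2.0 + 1.0 / 3600.0)
```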

DATA PREPROCESSING
In this section, we illustrate the steps used to clean and prepare our data set before training our model.

Image Preprocessing
Some of the images we obtained from the ZTF database were found to have irregular sizes, shapes and missing pixels. Such issues can be caused by a transient position close to the edge of the detector field of view, or nearby bright stars that are masked out (but can still leave diffraction spikes or subtraction artefacts). Examples are shown in Figure 5. These poor quality images can severely impact the training process. We therefore use several methods to identify, modify or delete them before training.

Image size cutout
An image with a shape slightly smaller than 60x60 pixels (for example, 58x58 pixels) is expanded to 60x60 pixels by repeating the last row or column on each side. Images that are much smaller than this are removed. Images larger than 60x60 pixels are reduced to 60x60 pixels.
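The size adjustment can be sketched as follows. This is a minimal illustration and not the NEEDLE implementation: the minimum acceptable size (`min_size=50`), the symmetric padding, and the central cropping are assumptions, since the text does not specify them. Nested lists stand in for image arrays to keep the example free of dependencies.

```python
def _fit_axis(items, target):
    """Pad (by repeating edge elements) or centre-crop a list to `target` length."""
    if len(items) < target:
        missing = target - len(items)
        before = missing // 2
        after = missing - before
        return [items[0]] * before + list(items) + [items[-1]] * after
    start = (len(items) - target) // 2
    return list(items[start:start + target])


def resize_cutout(image, target=60, min_size=50):
    """Return a target x target cutout, or None if the image is too small.

    `image` is a list of equal-length rows; `min_size` is a hypothetical
    threshold below which the cutout is discarded.
    """
    if len(image) < min_size or len(image[0]) < min_size:
        return None
    rows = _fit_axis(image, target)
    return [_fit_axis(row, target) for row in rows]


small = [[r * 58 + c for c in range(58)] for r in range(58)]
fixed = resize_cutout(small)   # 60 x 60, edges repeated once on each side
```

With an array library, the same edge replication is usually expressed as a padding call with an "edge" mode rather than explicit list concatenation.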

Quality check model
Images with missing or unreliable pixels are tricky to deal with, and these bad images greatly harm the training process. One common feature is that such images often have very large standard deviations ($\sigma$), much larger than normal images. However, our experiments showed that a quality cut based only on $\sigma$ still fails to remove a small number of problematic images that have reasonable standard deviations. Therefore, a binary convolutional neural network is developed to determine whether an image is good or bad. Firstly, we label those images with $\sigma > 1000$ as 'bad', and manually select some further examples of bad images (in the g and r bands). We label the others as 'good'. Then we feed them into a simple two-layer CNN classifier for training and testing. The outputs give the probability of being a good image, shown in Figure 5. Images with a confidence greater than 0.5 of being good are passed on for further processing, and the remaining bad images are excluded. Figure 6a and Figure 6b show the confusion matrix and Receiver operating characteristic curve (ROC). The closer the curve is to the upper left corner, the more accurate the classifier is. The model rejects 98.4% of bad images, and so we apply it as the first stage of data preprocessing. In the following experiments, about 12 peak images of ZTF objects are removed, corresponding to 0.22% of the whole image set.
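The initial labelling pass based on the pixel standard deviation can be sketched as below. The treatment of NaN pixels (ignored) and of images with fewer than two valid pixels (labelled 'bad') are assumptions of this example; the manual selection and the CNN that follow are not reproduced here.

```python
import math
import statistics


def initial_quality_label(image, sigma_max=1000.0):
    """Label an image 'bad' if its pixel standard deviation exceeds `sigma_max`.

    `image` is a list of rows of floats. NaN pixels are ignored when computing
    the standard deviation (an assumption of this sketch).
    """
    pixels = [p for row in image for p in row if not math.isnan(p)]
    if len(pixels) < 2:
        return "bad"  # nothing usable to measure
    return "bad" if statistics.pstdev(pixels) > sigma_max else "good"


label = initial_quality_label([[0.0, 5000.0], [0.0, 5000.0]])  # sigma = 2500
```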

Z-scaling and normalization
Astronomical pixel data can span a large dynamic range within a single image, which can cause problems for classifiers that need to learn faint features. The IRAF Z-scale algorithm, designed for displaying images as pixel intensity maps, is widely used to pick out features close to the background level of the image. The algorithm determines a minimum (z-min) and maximum (z-max) pixel value to display (pixels with values outside this range are displayed with the intensity at zero or saturated).
In our case, we apply the same z-scaling algorithm to replace any NaN values or anomalously faint pixels with the z-min value. We do not apply a mask for z-max, to avoid treating real features (such as a bright transient) as saturated. Min-max normalization is then applied to the scaled data, limiting the values to [0,1].
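The two steps above can be sketched as follows, assuming the z-min value has already been computed by a Z-scale implementation (for instance the `ZScaleInterval` class in `astropy.visualization`). The function name `floor_and_normalise` is hypothetical.

```python
import math


def floor_and_normalise(image, z_min):
    """Replace NaN or sub-z_min pixels with z_min, then min-max scale to [0, 1].

    No upper clip is applied, so bright sources keep their relative contrast.
    """
    floored = [
        [z_min if (math.isnan(p) or p < z_min) else p for p in row]
        for row in image
    ]
    lo = min(min(row) for row in floored)
    hi = max(max(row) for row in floored)
    span = hi - lo
    if span == 0:
        # Flat image: every pixel maps to zero.
        return [[0.0 for _ in row] for row in floored]
    return [[(p - lo) / span for p in row] for row in floored]


scaled = floor_and_normalise([[float("nan"), -50.0], [10.0, 110.0]], z_min=10.0)
```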

Data augmentation
Data augmentation is a technique to create additional artificial samples within a training set. This is particularly helpful when dealing with classes containing few examples, such as our SLSNe and TDEs. Augmentation techniques for images include resizing, random flipping (horizontally or vertically), and random rotation (between 0 and 360 degrees, with any missing data at the edges filled with neighboring values). For convenience, the augmentation is implemented as a custom layer placed directly after the input layer. While training, the images are randomly modified by this layer in each epoch. Flips and rotations mean that the model is not encouraged to incorrectly learn specific location or orientation features. We do not apply resizing, in order to preserve the pixel scale of the data.

Metadata preprocessing
Metadata consists of the light curve features and the host galaxy magnitudes, colours and offsets.Details are shown in Table 2. Currently, any missing metadata is replaced with zeros.Although a magnitude, time difference or offset of zero does have a physical meaning in this case, we find that adding zeros doesn't influence classifier performance in our experiments.Alternative methods will be considered in the next version of NEEDLE.
Data standardization is applied for data scaling. Every feature is assumed to follow a Gaussian distribution among all samples, and each value is scaled using the mean and standard deviation of its feature. In this way, the model can learn different feature distributions of the three classes, individually. These scaling parameters are stored with the model.
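A minimal sketch of this standardization is given below. The function names and the dictionary used to hold the scaling parameters are assumptions for illustration; the key point is that the mean and standard deviation are computed once on the training data and then reused unchanged for new objects.

```python
import statistics


def fit_scaler(rows):
    """Compute the per-feature mean and standard deviation of a feature table.

    `rows` is a list of equal-length feature vectors (one per object).
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against zero variance so that constant features do not divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return {"mean": means, "std": stds}


def apply_scaler(rows, scaler):
    """Standardize each feature as (value - mean) / std."""
    return [
        [(v - m) / s for v, m, s in zip(row, scaler["mean"], scaler["std"])]
        for row in rows
    ]


train = [[18.0, 2.0], [19.0, 4.0], [20.0, 6.0]]
scaler = fit_scaler(train)           # stored alongside the trained model
scaled = apply_scaler(train, scaler)
```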

Data compression and indexing
In order to feed a large amount of pre-processed data into the classifier for training on any computing platform, one convenient method is to store and fetch the data with the HDF5 binary data format (Collette 2013). This allows users to transfer data among different facilities easily, and reduces the training time for parameter optimization. In addition, a custom index has been added to each sample participating in training and testing, which can help users easily trace their ZTF IDs, thereby assisting case studies. Here we store the image set, metadata set, labels, and sample index set in HDF5 format. Training/test set separation is conducted after loading the HDF5 data.

CLASSIFIER ARCHITECTURE AND TRAINING
In this section, we introduce the design of our NEEDLE code and discuss the details of the model architecture. We build our model within the TensorFlow Keras framework. We implement a custom class called NEEDLE that inherits from keras.Model. This class includes the basic user-defined model functions (train, test, build, predict, loss function, etc), as well as model plotting and model visualization.

Hybrid neural network
To fully utilize the image and metadata, a hybrid model is required. Inspired by Carrasco-Davis et al. (2021), we build up a model that involves a block of convolutional layers for image inputs and a block of fully-connected layers for metadata inputs.
Figure 7 shows the model architecture. The image block consists of a data augmentation layer (random flipping and rotations) and two convolutional layers, each followed by a MaxPooling layer. The output of the last pooling layer is flattened into a 1D vector and fed into a fully connected dense layer with 64 neurons. The metadata block consists of two fully connected dense layers with 128 neurons each. The two types of outputs are then concatenated and fed into two dense layers (192, 32 neurons, respectively). Finally the outputs are fed into the output layer.
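To make the layer layout concrete, the following is a minimal Keras sketch of a two-branch network with the layer counts and dense-layer widths described above, using ReLU activations and a softmax output over three classes. Several details are assumptions for illustration only and are not given in the text: the number of image channels, the number of metadata features, the convolutional filter counts and kernel sizes, and the use of the built-in `RandomFlip` and `RandomRotation` layers in place of the custom augmentation layer.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_hybrid_model(image_shape=(60, 60, 2), n_meta=10, n_classes=3):
    """Two-branch classifier: CNN for images, dense layers for metadata."""
    # Image branch: augmentation, then two conv + max-pooling stages.
    image_in = layers.Input(shape=image_shape, name="image")
    x = layers.RandomFlip("horizontal_and_vertical")(image_in)
    # factor=0.5 spans a full turn; edges are filled with the nearest pixels.
    x = layers.RandomRotation(0.5, fill_mode="nearest")(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)   # filter count assumed
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)   # filter count assumed
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)

    # Metadata branch: two dense layers with 128 neurons each.
    meta_in = layers.Input(shape=(n_meta,), name="metadata")
    m = layers.Dense(128, activation="relu")(meta_in)
    m = layers.Dense(128, activation="relu")(m)

    # Merge the branches (64 + 128 = 192 features) and classify.
    h = layers.Concatenate()([x, m])
    h = layers.Dense(192, activation="relu")(h)
    h = layers.Dense(32, activation="relu")(h)
    out = layers.Dense(n_classes, activation="softmax")(h)
    return tf.keras.Model(inputs=[image_in, meta_in], outputs=out)


model = build_hybrid_model()
```

The augmentation layers are only active during training, so predictions on new alerts are made on the unmodified images.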
Each layer uses a ReLU activation function. The exception to this is the final output layer, which uses a softmax activation function to provide the probabilities that an object belongs to each class.

Training and test sets
Since the samples of SLSNe and TDEs are very small compared to the large number of normal SNe, training becomes difficult when more than ∼ 20% of them are put into the test set. Through experiments, we found that some objects are easily classified correctly with high probability, regardless of whether they are in the training set or the test set, whereas others are difficult and return poor predictions. Therefore, a fair approach that still allows us to train our model on a reasonable number of objects is to give a unique random seed, shuffle the dataset and randomly select test objects, repeating this process and training the model several times to average the results. We choose to include 15 SLSNe, 15 TDEs and 15 SNe in the test set each time, and repeat this process 10 times to calculate the average model performance.
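The repeated, seeded selection of test objects can be sketched as follows. The function name `split_once`, the use of Python's `random` module, and the toy class counts are assumptions for this example.

```python
import random


def split_once(labels, seed, n_test_per_class=15):
    """Randomly hold out a fixed number of objects per class for testing.

    `labels` is a list of class labels, one per object. Returns the index
    lists (train_idx, test_idx); the same seed always gives the same split.
    """
    rng = random.Random(seed)
    test_idx = []
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        test_idx.extend(rng.sample(members, n_test_per_class))
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    return train_idx, sorted(test_idx)


# Toy label set; ten repetitions with ten different seeds.
labels = ["SN"] * 100 + ["SLSN"] * 40 + ["TDE"] * 30
splits = [split_once(labels, seed) for seed in range(10)]
```

Each repetition would train a fresh model on `train_idx` and evaluate it on `test_idx`, with the final metrics averaged over the ten runs.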

Weighted loss function
We start by importing the loss function SparseCategoricalCrossentropy from Keras. This is designed for multi-class tasks. As our training data have extremely unbalanced labels and the majority are SNe, the model will naturally learn more SN features to quickly decrease the loss function, resulting in poor predictions for other classes. One solution is to give more weight to rare labels and less weight to common labels. In this way, our model can extract features of different classes equally. Our weighted loss function is defined in terms of the following quantities: the number of objects in one batch/epoch; the numbers of objects from each class in the batch/epoch and in the whole training set; the class weight for each class; and y and ŷ, the true class and the model prediction for one input, respectively.
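To illustrate the idea of class weighting, the sketch below implements a weighted sparse categorical cross-entropy in plain Python. It is not the exact loss used in NEEDLE: the inverse-frequency weights, the function names and the toy labels are assumptions made for this example.

```python
import math


def inverse_frequency_weights(train_labels, n_classes):
    """Class weights N_total / (n_classes * N_class): one common choice, assumed here."""
    counts = [0] * n_classes
    for lab in train_labels:
        counts[lab] += 1
    total = len(train_labels)
    return [total / (n_classes * c) if c else 0.0 for c in counts]


def weighted_sparse_cce(y_true, y_prob, weights, eps=1e-12):
    """Batch mean of -w[y] * log(p[y]), with integer class labels y."""
    total = 0.0
    for y, probs in zip(y_true, y_prob):
        total += -weights[y] * math.log(max(probs[y], eps))
    return total / len(y_true)


# Toy training labels: 0 = SN (common), 1 = SLSN, 2 = TDE (rare).
train_labels = [0] * 90 + [1] * 5 + [2] * 5
weights = inverse_frequency_weights(train_labels, n_classes=3)

# The same mistake costs more for a rare object than for a common one.
loss_rare = weighted_sparse_cce([1], [[0.8, 0.1, 0.1]], weights)
loss_common = weighted_sparse_cce([0], [[0.1, 0.8, 0.1]], weights)
```

With these weights, a misclassified SLSN or TDE contributes far more to the loss than a misclassified SN, which counteracts the tendency of the network to learn mostly SN features.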

Training optimization
To set the learning rate, we employ the ExponentialDecay method, which decreases the learning rate exponentially with growing steps while training. Equation 2 shows the algorithm:

$$\eta(s) = \eta_0 \, \gamma^{s/S} \qquad (2)$$

Here $\eta_0$ is the initial learning rate, $\gamma$ is the user-defined decay rate, and $s$ and $S$ are the training step and the user-defined decay steps, respectively. Through experimentation, we find acceptable performance using the following parameters: $\eta_0 = 0.0002$, $\gamma = 0.95$ and $S = 100$.
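As a quick numerical check of this schedule, the sketch below evaluates learning rate = initial rate × decay rate^(step / decay steps), which is the continuous (non-staircase) form of the Keras ExponentialDecay schedule. The function name is hypothetical; the default parameter values are those quoted above.

```python
def exponential_decay(step, initial_lr=2e-4, decay_rate=0.95, decay_steps=100):
    """Learning rate after `step` training steps (non-staircase form)."""
    return initial_lr * decay_rate ** (step / decay_steps)


# Learning rate at the start and after 100, 1000 and 5000 steps.
schedule = [exponential_decay(s) for s in (0, 100, 1000, 5000)]
```

After every 100 steps the learning rate is reduced by a factor of 0.95, so it falls from 0.0002 to 0.00019 after the first 100 steps.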
An optimizer is the strategy for updating the weights and biases of a neural network in order to reduce the loss function towards its minimum. For this model, we apply Adaptive Moment Estimation (Adam; Kingma & Ba 2014), a stochastic gradient descent variant that adapts the update for each weight using exponentially decaying averages of the gradient and of its square, controlled by two decay rates.
Overfitting is a non-negligible problem when training: the model extracts noise rather than real features from the training data, resulting in high accuracy on the training set but low accuracy on the validation and test sets. This is a particularly important problem for astronomical data, as they are inherently noisy. To avoid this, tensorflow.keras.callbacks.EarlyStopping is called to monitor the training process. When the loss for the validation set has not decreased over the previous 3 epochs, the training will stop.
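The stopping rule can be illustrated with a few lines of plain Python. This sketch mimics patience-based early stopping on a list of validation losses; reading "the previous 3 epochs" as a patience of 3 is our assumption, and the loss values are invented for the example.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # new best validation loss: reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Invented validation losses: the best value (0.70) is reached at epoch 2.
stop = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6])
```

In this example training stops at epoch 5, after three consecutive epochs without improvement on the best validation loss, so the later value of 0.6 is never reached.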

Optimal network architecture
We have tried and adjusted a variety of architectures and parameters. Given the size of the training set and the limited information in the images, a deep network is not suitable as it will likely lead to rapid overfitting and large fluctuations in the loss function. We therefore experiment with networks with only a few CNN layers.
KerasTuner (O'Malley et al. 2019) is an easy-to-use, scalable hyperparameter optimization framework that allows users to set ranges of neurons, activation functions, and learning rates. It will automatically run every combination of configurations and search for optimal solutions. We apply this method to adjust the architecture and hyperparameters in NEEDLE.
The results show that a model with two convolutional layers (each with 128 3×3 kernels) for image inputs, and two fully-connected layers (64 and 128 neurons) for metadata inputs, gives the best predictions. The detailed architecture is shown in Figure 7. The learning rate, batch size and number of epochs are $3\times10^{-5}$, 128, and 300, respectively.

EXPERIMENTS AND RESULTS
In this section, we investigate model performance on the ZTF BTS sample. In particular, we aim to determine which metadata are important to include in our training and test sets, the expected purity and completeness, and how confidently we can predict the type of an object at early phases. We also present the NEEDLE pipeline that we are implementing on Lasair.

Classifier performance with & without host metadata
Initially, we train only with information available from the real-time transient alerts: the science image (here assumed close to the time of maximum light), the reference image, and the transient metadata such as magnitude and time since discovery. For convenience we call this version NEEDLE-T (for transient). We then retrain the model, this time including the cataloged host galaxy properties obtained from Sherlock and Pan-STARRS, and we label this version NEEDLE-TH (transient+host).
Figure 8 shows the confusion matrices with the completeness of the three models on the test set. The prediction is decided by the maximal probability among the three classes. The values given in the confusion matrices are the averages of 10 model realisations with randomly shuffled test sets (containing 15 objects per class each time, with the remaining objects in the training set). The initial NEEDLE-T classifier (Figure 8a) can recognize 79% of normal SNe and 76% of SLSNe in the test set on average. It is worth recalling that more than half of the SLSNe in our sample do not have cataloged hosts, so only 41/87 SLSNe can be included in the full NEEDLE-TH model. If we train NEEDLE-T only on these objects with detected hosts (enabling a fair comparison later with NEEDLE-TH), the averaged true positive rate for SLSNe decreases to 55%, and the large range shows that the predictions are less stable with the smaller sample. This is shown in Figure 8b. Adding the host metadata in NEEDLE-TH (Figure 8c) improves the performance for SLSNe to 77% on average, despite the smaller sample size, showing the importance of including host galaxy information. Even in the worst-performing model, at least 53% of SLSNe are correctly identified with the help of host magnitudes and colour information, and the highest completeness reaches 93%.
This effect is even more pronounced for TDEs. For NEEDLE-TH, the average true positive rate for TDEs grows from 57% to 72% with the addition of host information. While galaxy colours do differ for TDEs compared to the other transient types, this improvement more likely reflects the fact that all TDEs in the sample have a small offset because they occur in the nuclei of their hosts.

Completeness and Purity
Figure 9 shows the completeness and purity trends on the unseen test set with increasing probability thresholds for classification, in both NEEDLE-TH (corresponding to Figure 8c) and NEEDLE-T (for only those objects having cataloged hosts, corresponding to Figure 8b). Here a class is only assigned if $P(\mathrm{class})$ exceeds the chosen threshold for the most probable class. In each case we show the average and standard deviation of 10 trained models.
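The thresholding rule and the two metrics can be written compactly as follows. This is a minimal sketch with invented probabilities; the function names are hypothetical, and counting objects that fall below the threshold as missed is one possible convention that we assume here.

```python
def classify(probs, class_names, threshold=0.0):
    """Return the most probable class if its probability exceeds threshold, else None."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best] if probs[best] > threshold else None


def completeness_purity(y_true, y_pred, target):
    """Completeness = TP / (all true targets); purity = TP / (all predicted targets)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == target and p == target)
    n_true = sum(1 for t in y_true if t == target)
    n_pred = sum(1 for p in y_pred if p == target)
    completeness = tp / n_true if n_true else float("nan")
    purity = tp / n_pred if n_pred else float("nan")
    return completeness, purity


classes = ["SN", "SLSN", "TDE"]
# Invented softmax outputs for five objects and their true classes.
probs = [[0.90, 0.05, 0.05], [0.20, 0.70, 0.10], [0.10, 0.80, 0.10],
         [0.30, 0.30, 0.40], [0.15, 0.80, 0.05]]
truth = ["SN", "SLSN", "SLSN", "TDE", "SN"]

pred_loose = [classify(p, classes) for p in probs]
pred_strict = [classify(p, classes, threshold=0.75) for p in probs]
loose = completeness_purity(truth, pred_loose, "SLSN")
strict = completeness_purity(truth, pred_strict, "SLSN")
```

In this toy case raising the threshold removes one true SLSN that was classified with low confidence, so completeness drops; whether purity rises depends on how confident the contaminants are.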
We note that the purity achieved with our balanced test set will likely not reflect the purity obtainable in a real survey, due to the large imbalance in rates between SLSNe/TDEs and normal SNe. Therefore, when selecting objects in real time, one may wish to choose a high probability threshold to minimise the absolute numbers of normal SNe misclassified as SLSNe or TDEs.
Figure 10 shows confusion matrices for transients classified with probability $P(\mathrm{class}) \geq 0.75$. We show the completeness for NEEDLE-TH on the balanced, unseen test sets (Figure 10a) and completeness and purity matrices for the full data set (Figures 10b, 10c). With $P(\mathrm{class}) \geq 0.75$, NEEDLE-TH can correctly classify 95% of TDEs and 97% of SLSNe-I in the full data set. However, even a few per cent SN contamination results in a real-world purity of around 20% for the rare classes, showing the importance of choosing a probability threshold carefully. NEEDLE is designed to select young SLSN and TDE candidates for spectroscopic follow-up, rather than to produce large photometric samples. Therefore, a purity of a few tens of per cent is an acceptable price for the high completeness.
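The effect of class imbalance on purity can be made concrete with a short calculation. In the sketch below, the rare-class fraction of 1% and the SN contamination rate of 4% are illustrative assumptions, not measured values; only the completeness of 95% is taken from the text.

```python
def expected_purity(rare_fraction, completeness, contamination):
    """Purity TP / (TP + FP) for a rare class mixed with a common contaminant.

    rare_fraction: fraction of all objects that belong to the rare class
    completeness:  fraction of rare objects correctly selected
    contamination: fraction of common objects wrongly selected as rare
    """
    tp = rare_fraction * completeness
    fp = (1.0 - rare_fraction) * contamination
    return tp / (tp + fp)


balanced = expected_purity(0.5, 0.95, 0.04)    # balanced test set
realistic = expected_purity(0.01, 0.95, 0.04)  # rare class is 1% of the sample
```

Under these assumptions the purity falls from about 96% in a balanced set to about 19% when the rare class makes up only 1% of the sample, even though the classifier itself is unchanged.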
We also investigate the importance of including host galaxy metadata. The diagrams show that SLSNe and TDEs essentially always gain higher completeness and purity when host metadata is included.

Classification from early detections
As NEEDLE is designed to provide a probability for each label after only a few early detections, we also test the average performance of NEEDLE-TH over time since explosion by attempting to classify a time series of pre-peak detections of 30 randomly selected objects in each class. We show the predicted $P(\mathrm{SLSN})$ against time before peak for 30 SLSNe, and $P(\mathrm{TDE})$ for 30 TDEs, in Figure 11.
For most SLSNe (Fig 11a) and TDEs (Fig 11b), the probability assigned to the correct class grows as the events approach the peak. This is likely due to the longer baseline over which the light curve features can be evaluated, indicating that properties such as light curve rise time and slope and host galaxy contrast are important features in NEEDLE. This is particularly apparent in the case of SLSNe, where magnitude contrast with the host galaxy (which is maximised at light curve peak) is also an important feature.

Real-time annotation on Lasair
We aim to provide NEEDLE classifications in close to real time via the LSST:UK alert broker, Lasair. Our classifier will digest incoming transients from a pre-filtered Kafka stream produced by a simple Lasair query, using data from ZTF (or LSST in the future), and provide the probabilities of different classes for each object. To return our classifications to the broker, we make use of the annotator feature, which allows verified users to add information to the transients database in a format that can be queried by other users.
Figure 12 shows the process in detail. NEEDLE is trained and tested using the ZTF alerts coming from Lasair. New alerts will be filtered by a customized SQL query to provide only young, reliable, extragalactic, non-repeating transients. Specifically, we retain events:
• discovered within the last 60 days
• with more than 3 confident detections (to reduce the chance of bad subtractions)
• predicted to be a Supernova or Nuclear Transient by Sherlock (i.e. not a known AGN or Galactic variable).
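The three selection criteria above can be expressed as a simple predicate. The sketch below is written in Python for illustration rather than as the SQL query itself; the dictionary keys and the class label strings are hypothetical and do not correspond to the actual Lasair schema.

```python
def passes_filter(alert, now_jd, max_age_days=60, min_detections=3):
    """Apply the three criteria to one alert (field names are hypothetical)."""
    young = (now_jd - alert["discovery_jd"]) <= max_age_days
    reliable = alert["n_good_detections"] > min_detections  # "more than 3"
    extragalactic = alert["sherlock_class"] in {"SN", "NT"}  # supernova or nuclear transient
    return young and reliable and extragalactic


now_jd = 2460030.5
alerts = [
    {"discovery_jd": 2460000.5, "n_good_detections": 5, "sherlock_class": "SN"},
    {"discovery_jd": 2459900.5, "n_good_detections": 9, "sherlock_class": "SN"},
    {"discovery_jd": 2460020.5, "n_good_detections": 3, "sherlock_class": "NT"},
    {"discovery_jd": 2460020.5, "n_good_detections": 6, "sherlock_class": "AGN"},
]
kept = [a for a in alerts if passes_filter(a, now_jd)]
```

Only the first toy alert survives: the second is too old, the third has too few detections, and the fourth is associated with a known AGN.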
Then, NEEDLE selects the brightest available detection as the input image, provided it passes the image quality checker. NEEDLE then collects the host coordinates and photometry from Sherlock and Pan-STARRS, computes the predicted probabilities from the trained network, and sends them back to Lasair as annotations.
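A minimal sketch of the detection selection step is given below. It assumes that, if the brightest detection fails the quality check, the next brightest acceptable one is used instead; this fallback, the field names and the stand-in quality checker are our assumptions for illustration. Note that the brightest detection is the one with the smallest magnitude.

```python
def select_input_detection(detections, is_good_image):
    """Return the brightest detection whose image passes the quality check, or None."""
    good = [d for d in detections if is_good_image(d)]
    if not good:
        return None
    return min(good, key=lambda d: d["mag"])  # smaller magnitude means brighter


# Toy detections; "ok" stands in for the output of the image quality classifier.
detections = [
    {"mag": 18.2, "ok": False},
    {"mag": 18.9, "ok": True},
    {"mag": 19.5, "ok": True},
]
chosen = select_input_detection(detections, lambda d: d["ok"])
```

Here the 18.2 mag detection is rejected by the quality check, so the 18.9 mag detection is used as the input image.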
We have tested this process end-to-end with a preliminary version of NEEDLE. Our goal is to run the fully trained NEEDLE model automatically on all ZTF alerts passing our SQL filter, and release the results as a public stream on Lasair, beginning in early 2024.

Individual discrepancy among rare transients
As mentioned in Section 5.2, the difficulty of classifying each individual object in our data set varies. One reason for this may be issues with the host galaxy metadata. In the Pan-STARRS survey, very nearby resolved galaxies may be broken into multiple sources by the survey photometry pipeline, resulting in underestimated host magnitudes. Failed host association may also cause issues, leading to the wrong photometry being retrieved. This is a particular problem for SLSNe, where many of the true hosts are not detected.
We also identify several real features of our objects that influence the ease of classification. For SLSNe, we found that those that are easily classified tend to be bright with a slow rise and to lie in a slightly bluer, faint, star-forming host galaxy, consistent with classic SLSNe in the literature. SLSNe in slightly more massive galaxies, or with short rise times, are more difficult to separate from normal SNe.
For TDEs, objects are most easily classified if they are bright at discovery and have a shorter $\Delta t_{\rm discovery}$ than typical SLSNe. This may arise because these events occur in the nuclei of galaxies, and so tend to be found closer to peak unless the flux contrast with the host is large. In future work we will investigate in more detail how to optimise the training process to account for these variations.

Comparisons to previous classifiers
In recent years, several transient classifiers have been designed that can recognize TDEs and in particular SLSNe. Some of them achieve excellent accuracy for SLSNe by making use of their uniquely slow light curves. For the same reason, many of these classifiers show better performance when more light curve data are available at later phases. Table 3 shows comparisons of these classifiers with our NEEDLE classifier. The advantage of NEEDLE is that we do not require multiple detections or host redshifts as input, because at LSST depth few galaxies will have spectroscopic redshifts. We only use single-stamp images, alert photometry and cataloged host magnitudes (when available), enabling an informed real-time prediction from as little as one detection. Furthermore, all data used in training are from real survey detections rather than simulations. It is likely that we could achieve even better performance by making use of more detailed light curve information, and this is the aim for future development. However, the goal of NEEDLE is not to produce pure samples of photometrically classified events, but to provide probabilities of potential SLSNe and TDEs at an early stage to guide spectroscopic follow-up. From this perspective, completeness may be more important than purity.

Remaining difficulties and future improvement
While the NEEDLE algorithm is performing well on the ZTF data set, we are continuing to develop the code and plan a number of future improvements to deal with current limitations, including:
• Unbalanced classes. The rare transients we focus on, including SLSNe and TDEs, have fewer than 100 examples each. After being split into training and test sets, even fewer are actually used for model training. Weighted loss functions can solve this problem to a certain extent, but feature extraction of rare classes requires more samples and smarter algorithms, such as small-sample learning.
• Missing value replacement and padding. Replacing missing values with zero poses difficulties for classification, since zero has physical meaning for magnitudes and image pixels. However, given the input requirements of neural networks, some kind of padding is inevitable. A possible solution is to fill in the missing values based on context and modelling.
• There is a large fraction of SLSNe without cataloged hosts, and in the early years of LSST, this fraction will increase at higher redshifts, and affect the other classes too. To mitigate this, we will continue to develop and apply the NEEDLE-T version of the code in parallel for such cases.
• Including more contaminants in our training set. Currently we assume that contaminants such as AGN and variable stars can be rejected by simple Lasair filters before they reach NEEDLE. This may not be the case in future, deeper surveys like LSST.
• LSST alert cutouts in real time are much smaller than for ZTF, and full-size images will only be available after 80 hours. To achieve real-time prediction, older images might need to be included in classification and training, rather than just the most recent detection.
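As one simple illustration of filling missing values from context rather than with zeros, the sketch below replaces each missing metadata entry with the median of its column and appends a flag recording which entries were filled. Median imputation is our choice for this example and is not a method used in NEEDLE; the feature values are invented.

```python
import statistics


def impute_with_mask(rows):
    """Fill None entries with the column median and append 0/1 missingness flags."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        medians.append(statistics.median(present) if present else 0.0)
    filled_rows = []
    for r in rows:
        filled = [medians[j] if v is None else v for j, v in enumerate(r)]
        mask = [1.0 if v is None else 0.0 for v in r]
        filled_rows.append(filled + mask)
    return filled_rows


# Toy metadata: (host magnitude, host colour), with missing entries as None.
rows = [[21.0, None], [19.0, 0.5], [None, 0.7]]
imputed = impute_with_mask(rows)
```

The extra flag columns let a network distinguish a genuine measurement from a filled value, which a zero-padded input cannot do.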
Additionally, we have further plans for new features and analyses to improve our training process. These include:
• A detailed study of misclassified objects. The next step will be to visualize the model behaviour for such objects individually, and try to understand the reasons for misclassification.
• Including more time-domain information. Rather than one image and a set of simple light curve features, using more advanced features, including the light curve directly, or even providing a time series of images, may help to improve performance. For the next classifier, Conv3D and other relevant networks, such as Recurrent Neural Networks, will be considered.
• Early stage classification. The ultimate goal of our classifier is to identify rare events in their early stages, even before their peak. With the addition of images at multiple epochs, we will analyze the trends in accuracy as more observations are added.
In balanced test sets: TDE completeness of 77.0%, purity of 80.3%.

Figure 1. Examples of the science, reference and difference images of three general classes: SLSN, TDE and SN.

Machine learning architectures on transient classification
Machine learning has been widely applied to astrophysical transients for classification tasks, such as BDT (SNGuess, Miranda et al. 2022; Avocado, Boone 2019; Sherlock, Smith et al. 2020), RF (FLEET, Gomez et al. 2020; Baldeschi et al. 2020; ALeRCE light curve classifier, Sánchez-Sáez et al. 2021), and Neural Networks. In particular, deep learning algorithms (deep neural networks) have shown powerful performance in extracting features of data to improve classification without manual feature selection.

Figure 4. Corner plot showing host galaxy magnitudes in three bands, as well as two host colours. We also show the offset, meaning the projected distance between the transient and its likely host in arcseconds.

Figure 5. Examples of bad and good images with the quality check classifier predictions. Images with red probabilities are judged as 'bad images' and removed before feeding into NEEDLE.

Figure 7. Model architecture for the full NEEDLE classifier. The only difference between the NEEDLE-T and NEEDLE-TH variants is the length of the metadata input.

Figure 8. Confusion matrices of NEEDLE-T and NEEDLE-TH (without and with host photometry) classifiers on an unseen test set. For each classifier, the random seed for initializing parameters is unique, and the test/training sets are randomly shuffled before training. The values reported in each confusion matrix are the averages for 10 realisations of this process, and the ranges across all 10 models are shown in brackets.

Figure 10. Confusion matrix (CM) of NEEDLE-TH restricted to objects with classification confidence $P(\mathrm{class}) > 0.75$, for the test set and the full data set. The values reported in each confusion matrix are the averages after training 10 times with randomly shuffled test and training sets, and the ranges across all 10 models are shown in brackets. The first two CMs depict the completeness in the test set (Figure 10a) and the full set (Figure 10b). The third CM (Figure 10c) shows the purity of each class in the full set, which is representative of the expected balance in real-time alerts.

Figure 11. Probability heatmaps for SLSNe-I (11a) and TDEs (11b). In each class, 30 objects are randomly selected. The x-axis is the date, starting from 60 days before the peak of the event and ending at the peak date. The y-axis shows the names of the ZTF objects. The colour bar corresponds to the probability range. ZTF objects are sorted in descending order of their peak probability.

Table 1. The number of objects in each class with images and light curve information from ZTF, as well as those with host galaxy matches in existing catalogs from Sherlock. The information is provided separately for each of the two bands.
1 https://sites.astro.caltech.edu/ztf/bts/explorer.php?f=
Corner plot showing distributions of selected light curve features in the data set. To show the difference between distributions more clearly, the light curve slope $(m_{\rm peak} - m_{\rm disc})/(t_{\rm peak} - t_{\rm disc})$ and rise time $t_{\rm peak} - t_{\rm disc}$ have been scaled logarithmically.

Cumulative distribution functions for transient contrast with host galaxy ($m_{\rm peak} - m_{\rm host}$) and approximate rise time ($t_{\rm peak} - t_{\rm discovery}$).

Table 2. Summary of light curve and host galaxy features included in our metadata.
Host colours (two entries): host magnitude difference between two bands
separationArcsec: the distance between the host centre and transient on the image
Host-transient magnitude difference: magnitude difference between the host magnitude and the transient magnitude

Table 3. Comparisons among various transient classifiers for SLSNe and TDEs.
• TiCC transient models. A matrix with each row composed of the imputed light-curve fluxes for each band, repeated values of the host galaxy redshift, and the MW dust reddening. In (nearly) balanced test sets: for SLSNe, the accuracy is 0.83 (2 days after trigger) to 0.85 (40 days); for TDEs, the numbers are 0.59 and 0.86.
• In unbalanced test sets: for SLSNe, the completeness and purity are 0.76 and 0.81; with a threshold larger than 0.7, their values increase to 0.83 and 0.91, respectively. Host redshift is considered.
• Principal Component Analysis coefficients on modelled light curves with known redshifts. In unbalanced sets: for SLSNe, the completeness and purity are 0.82 and 0.67, respectively.
• NEEDLE (this work): Science and reference image in a single band, simple light curve and most galaxy metadata. For SLSNe-I, averaged completeness is 0.77 and averaged purity is 0.82 in the test sets. For TDEs, the numbers are 0.72 and 0.79.