LSST Cadence Strategy Evaluations for AGN Time-series Data in Wide-Fast-Deep Field

Machine learning is a promising tool to reconstruct time-series phenomena, such as variability of active galactic nuclei (AGN), from sparsely-sampled data. Here we use three Continuous Auto-Regressive Moving Average (CARMA) representations of AGN variability -- the Damped Random Walk (DRW) and (over/under-)Damped Harmonic Oscillator (DHO) -- to simulate 10-year AGN light curves as they would appear in the upcoming Vera Rubin Observatory Legacy Survey of Space and Time (LSST), and provide a public tool to generate these for any survey cadence. We investigate the impact on AGN science of five proposed cadence strategies for LSST's primary Wide-Fast-Deep (WFD) survey. We apply for the first time in astronomy a novel Stochastic Recurrent Neural Network (SRNN) algorithm to reconstruct input light curves from the simulated LSST data, and provide a metric to evaluate how well SRNN can help recover the underlying CARMA parameters. We find that the light curve reconstruction is most sensitive to the duration of gaps between observing season, and that of the proposed cadences, those that change the balance between filters, or avoid having long gaps in the {g}-band perform better. Overall, SRNN is a promising means to reconstruct densely sampled AGN light curves and recover the long-term Structure Function of the DRW process (SF$_\infty$) reasonably well. However, we find that for all cadences, CARMA/SRNN models struggle to recover the decorrelation timescale ($\tau$) due to the long gaps in survey observations. This may indicate a major limitation in using LSST WFD data for AGN variability science.

(e.g.Brockwell & Davis 2002 ;Kelly et al. 2009 ;Kozłowski et al. 2010 ).This was later extended to a more flexible and scalable model -the Continuous Autore gressiv e Mo ving Av erage (CARMA) model (Kelly et al. 2014 ;Feigelson, Babu & Caceres 2018 ;Moreno et al. 2019 ).CARMA models are not physical models, but rather a statistical description that characterizes a time-series stochastic process.CARMA models are notated as CARMA( p , q ) where p gives the order of the Autoregressive (AR) process and q gives the description of the Moving Average (MA) process.The AR response can be thought of as the forecasting part, while the MA model gives the input impulse(s).The Power Spectral Density (PSD),2 the Autocorrelation Function (ACF), 3 and the Structure Function (SF)4 can all be calculated for CARMA models.Moreno et al. ( 2019 ) present a detailed CARMA handbook for optical AGN variability, discuss CARMA models and their associated statistical descriptions in full detail, and illustrate the bridge between discrete ARMA and time-continuous CARMA for fitting the irregular sampling of light curves.
The simplest CARMA model, CARMA(1,0) is the well-known Damped Random Walk (DRW).The DRW, and its ACF and SF can be expressed as (1) (2) where α 1 is the C-AR coefficient and β 0 is the coefficient of the random perturbations.In the case of AGN, x corresponds to the flux or magnitude.W ( t ) is a Wiener process, and d W ( t ) means a white noise process with μ = 0 and variance = 1 (Kelly et al. 2014 ).t is the difference between two MJDs.There are two parameters that can be obtained from DRW to capture the statistics of AGN variability: τ , the characteristic damping (or signal decorrelation) time-scale, and SF ∞ , the long-term variability amplitude.
The DRW model is a good description of long-term quasar variability and as such is often applied to data from large-area sky surv e ys.MacLeod et al. ( 2010 ) confirmed that a DRW model fits well for ∼9000 Sloan Digital Sky Survey (SDSS) Stripe 82 quasar light curves, and analysed correlations between the observed variability parameters and AGN physical parameters including black hole mass, redshift, luminosity, and rest-frame wavelength of emission.Suberlak, Ivezi ć & MacLeod ( 2021 ) built on this work by adding Pan-STARRS1 photometry to the SDSS Stripe 82 data, generating light curves up to 15 yr in length.They found that the variability amplitude is a stronger function of the black hole mass, and that it (and τ ) has a weaker dependence on quasar luminosity than initially found in MacLeod et al. ( 2010 ).Kozłowski ( 2016bKozłowski ( , 2017 ) ) investigated systematic biases (photometric noise, etc.) in SF measurements.They applied Monte Carlo simulations to AGN light curves, and showed that accurate estimation of DRW parameters requires the observation sampling to be at least 10 rest-frame decorrelation time-scales.Kozłowski ( 2017 ) note this is because observations shorter than ∼10 τ are insufficient to fully sample the PSD.Thus, due to the limited observation lengths of astronomical surv e ys, the estimations of DRW parameters may fall into this unconstrained region resulting in biases.Kozłowski ( 2016a ) reported that DRW modelling can also work well for non-DRW processes, and should not be regarded as a proxy for the physical process underlying the variable emission.On shorter time-scales, other models may be more appropriate.Kasliwal, Vogeley & Richards ( 2015 ) studied a number of Kepler AGN light curves (Howell et al. 2014 ).Compared with ground-based surveys, Kepler has higher photometric precision and denser observation cadences, but shorter surv e y length.Kasliwal et al. ( 2015 ) found that AGN with Kepler light-curve information had log-PSD slopes steeper than that of DRW, suggesting that the variability may be better captured by another process other than AR(1).
Combining SDSS and data from Kepler's second mission, K2, Kasliwal, Vogeley & Richards ( 2017 ) discuss DRW and a higher order CARMA(2,1) model, the Damped Harmonic Oscillator (DHO), and indicate that an o v erdamped DHO 5 may be a better description of the AGN Zw 229-15 (see figs 1 and 3 from Kasliwal et al. 2017 ).The PSD time-scale features for Zw 229-15 are also reported by Edelson et al. ( 2014 ) and Williams & Carini ( 2015 ).Kov a če vi ć, Popovi ć & Ili ć ( 2020 ) present a method to model AGN variability using a representation of the DHO model with Gaussian processes 5 Whose IR mo v es slowly toward equilibrium.(GPs), and successfully detect variability due to continuum emission and (broad) line emission.Thus, DRWs and DHOs are useful descriptions of quasar light curv es.The ke y goal of CARMA models now is to accurately measure the model parameters from observed quasars, and use this information to study the underlying physics.We summarize the working equations for the DRW = CARMA(1,0) and DHO = CARMA(2,1) models in Table A1 .Table A1 presents the differential equations, input parameters, impulse response (IR, also called Green's function), SF , ACF , Autocovariance Function (ACVF), and PSD for DRW and DHO processes.Table A2 explains all acronyms and notation.Table 1 shows all acronyms used in this paper.
Our ability to study AGN variability will soon be transformed by the Vera Rubin Observatory (VRO), conducting the LSST.The VRO telescope, located on Cerro Pachon in Chile, has an 8.4-m (6.5-m ef fecti ve) primary mirror, with a 9.6-deg 2 field of view, a 3.2gigapixel camera, and six filters ( ugrizy ) covering the wavelength range 320-1050nm.
LSST will repeatedly observe millions of objects, with ≥825 visits for any given point in the survey footprint and a single-visit depth ≈ 24.5 mag in the r band.LSST will co v er the whole Southern sky, and part of the Northern sky, as part of the Wide-Fast-Deep (WFD) surv e y.There are also five Deep Drilling Fields (DDFs; LSST Science Collaboration et al. 2009 ).The WFD will take about 90 per cent observing time, with > 10 million quasars projected to be identified, though the cadence strategies will have an effect on the efficiency of quasar identification (Ivezic 2016 ).This has moti v ated recent white papers from the LSST AGN Science Collaboration for estimating the influence of cadence strategies on AGN astrophysics MNRAS 512, 5580-5600 (2022) studies. 6Kov ace vic et al. ( 2021b ) hav e pro vided statistical proxies to measure the LSST cadence effects on AGN variability observations and Kov ace vic et al. ( 2021a ) provide two metrics: based on AGN time lag and periodicity, and on the SF. .Such models simulate AGN light curves ahead of the start of LSST surv e y operations.Analysing reco v ery of parameters (including e.g.characteristic time-scales, PSDs, and IRs) from the simulated data under different cadences can tell us the potential systematic bias.
With the development of large sky surveys such as SDSS and LSST, astronomy has become a data-intensive science.Consequently, there have been recent attempts to use ML techniques to address the data challenges set by LSST, especially for classification, forecasting, and parameter estimations.For example, the Photometric LSST Astronomical Time Series Classification Challenge included 175 500 simulated AGNs (among other transients) to test classification algorithms (e.g.Boone 2019 ;Kessler et al. 2019 ;Hlo žek et al. 2020 ).Rele v ant to the AGN study (Jankov et al. 2022 ) based on LSST cadence strategies, in this paper, we aim to quantify the influence of different LSST cadence strategies on AGN time-series data, in order to see if contemporary ML algorithms can ef fecti vely reco v er CARMA model parameters.
We implement an SRNN to model quasar light curves and reco v er the DRW and DHO model parameters by GPR.RNN are a popular class of ML connectionist models for sequential modelling and have been used previously in astrophysics applications (e.g.Charnock & Moss 2017 ;Hinners, Tat & Thorp 2018 ;Naul et al. 2018 ;Muthukrishna et al. 2019 ;Becker et al. 2020 ;Escamilla-Rivera, Carvajal Quintero & Capozziello 2020 ;M öller & de Boissi ère 2020 ;Burhanudin et al. 2021 ;Lin & Wu 2021 ).Ho we ver, as noted in Yin & Barucca ( 2021 ), one limitation of RNNs is that the hidden state transition function is entirely deterministic, which can limit the RNNs ability to model processes with high variability.Thus (and as far as we can tell, for the first time in astrophysics research), we implement the SRNN, in order to recreate quasar light curves.The SRNN differs from the traditional RNNs in that the RNN hidden cells invoke a probabilistic (often Gaussian) distribution to generate a level of stochasticity that impro v es longer term temporal forecasting.As such, the SRNN can be somewhat thought of as a combination of an RNN and ideas from a VAE.
We have two main goals: (i) To test how well the SRNN can reco v er and predict observations when dense or uniformly seasonal light curves are set as inputs.
(ii) To predict AGN behaviour during gaps between seasons in 10-yr LSST-simulated light curves, and see how SRNN could help reco v er DRW and DHO parameters.This paper is organized as follows.In Section 2 , we describe our observational data and the methods (including DRW and DHO) we use to generate sample quasar light curves.Section 3 presents the details on the LSST observing strategy and associated cadences.In Section 4 , we describe the ML algorithms we use to e v aluate the model quasar light curves.We report our key results in Section 5 and discuss these results in the context of quasar studies and LSST in Section 6 .Section 7 presents our conclusions.In Appendix A , we write down the fundamental parameters and equations for the CARMA models, and in Appendix B , we detail the implementation of the SRNN.We report all magnitudes on the AB zero-point system (Oke & Gunn 1983 ;Fukugita et al. 1996 ) unless otherwise stated.All logarithms are to the base 10.

Q UA S A R DATA A N D M O D E L L I G H T C U RV E S
In this section, we give the details of our quasar data, including both observed quasars from previous sky surveys, used to provide an input set of representative statistical parameters, and the model light curves we generate.For the observed quasar data we will focus on the well-studied SDSS Stripe 82 field.The key analysis codes of this section are available via a github repository.
For our study we will concentrate on the ∼9000 spectroscopically confirmed quasars from SDSS S82 reported in MacLeod et al.  ( 2010 ). 7MacLeod et al. ( 2010 ) model the time variability using the DRW and measure the characteristic time-scale ( τ ) and an asymptotic rms variability on long time-scales (SF ∞ ).Also reported for this data set is the binary value edge (if the observation is close to the field edge) and a set of probabilities: P like (log likelihood of a DRW solution), P noise (log likelihood of a noise solution), and P inf (log likelihood of τ → ∞ ).
In order to have a sample of objects that have well-fitted DRW parameters, we selected those with edge = 0, P like-P noise > 2, and P like-P inf > 0.05.We also remo v e objects with τ < 0 or τ > 10 5 d to allow convergence in τ .With these selections, the number of quasars in the sample is 7384.We plot the SF ∞ and τ distribution of these quasars in Fig. 1 .

Model quasars
Our aim is to analyse the influence of surv e y cadence on quasar modelling in LSST.We note again that although CARMA models are not physical models, they are appropriate approximations of quasar light-curve properties.As such, we simulate the light curves using a DRW and DHO implementation.We first generate 10-yr light curves (consistent with LSST surv e y length) with dense, daily observations, which can later be sampled at different realistic cadences.The steps to generate light curves are described in the following sections.

DRW-simulated light curves
For the DRW model, two input parameter choices are required: SF ∞ and τ , in addition to the redshift ( z).To generate our model light curves, we sample the SF ∞ -τ parameter space from the S82 data set  1 .We take the values reported at a given SF ∞ -τ coordinate and add a 'scatter' in the range −0.05 < ι < 0.05 (as determined by a random number generator) to each ordinate separately.We generate 30 000 DRW light-curves parameter pairings in this manner.
The four black stars in Fig. 1 are four DRW pairs, and we show their associated light curves, SFs and PSDs in Figs 2 and 3 .Given a fixed value for SF ∞ and a set rest-frame observation length, longer characteristic time-scales ( τ ) will lead to less fluctuation.When τ is short compared with the rest-frame observing duration, the variance of the light curve will tend towards σ 2 and the estimated parameters will approach the underlying values.

DHO-simulated light curves
The literature is not so comprehensive for DHO modelling of UV/optically bright AGN, though there are papers that report the PSD slopes of some quasars are not consistent with DRW (e.g.Kasliwal et al. 2015Kasliwal et al. , 2017 ; ;Moreno et al. 2019 ).Consequently, these illustrate that a more complex CARMA process, DHO, might be more suitable for describing some quasars with (quasi-)periodicity features, such as weak oscillations, though such research is purely based on the statistical analysis of their variability rather than any assumptions of deterministic/periodic physical processes.As such, we build a DHO set including both 'o v erdamped' and 'underdamped' cases.
Five parameters are required as inputs: β 0 , β 1 , ξ , τ decay , and τ QPO [the time-scale of quasi-periodic oscillations (QPOs)] in the 'underdamped' case or τ rise (the time-scale to brighten in response to an impulse) in the 'o v erdamped' case (see Table A2 for details).To build the light curves, the redshift and SF ∞ for each light curve are randomly sampled from the S82 distribution.τ decay is set to vary from 60 to 200 d.MA coefficients β 0 and β 1 are set with constant and small values to ensure that dependent parameters remain in reasonable ranges (see Appendix A1 ).Additionally, we set 2 < ξ < 5 for the o v erdamped case, 0 < ξ < 1 for the underdamped case.In this way, we can calculate the corresponding C-AR coefficients α 1 and α 2 using the recipes in Table 2 .
Fig. 4 shows four DHO light curves, including two underdamped and two o v erdamped, with their IR functions.F or the underdamped DHO, the IR oscillates and gradually returns to a steady state, whereas in the o v erdamped case, it gradually mo v es to its steady state without oscillation.Analogous to Figs 2 and 3 for the DRW model, Fig. 5 depicts the PSD and SF for the underdamped and o v erdamped DHO.

GPRs and Eztao
A GP is a generalization of the Gaussian probability distribution and a GPR model provides uncertainty estimations together with prediction values.GP8 and GPR are discussed e xtensiv ely elsewhere (e.g.Rasmussen & Williams 2006 ).
CARMA model can be well expressed by a GP model which consists of a mean function and a covariance matrix (also called kernel).F or e xample, the kernel for the simplest CARMA process (DRW) can be written as (4) We generate the DRW and DHO light curves using the Eztao Python package (Yu & Richards 2022 ).Eztao is a Python toolkit for conducting time-series analysis using CARMA processes.Building on work by Rybicki & Press ( 1995 ) and in particular Foreman-Mackey et al. ( 2017), EzTao uses celerite (a fast GPR library) to compute the likelihood of a set of proposed CARMA parameters given the input time series.
Here, we use EzTao to model and produce GPRs that give uncertainty estimations together with predictions for DRW and DHO light curves.

Colour-r edshift corr elation and SF ∞
The observed colour of a quasar changes with redshift (e.g.Richards et al. 2001 ).To quantify this, we select 151 362 quasars included in the SDSS DR16 Quasar catalogue (Lyke et al. 2020 ) that are also in the UKIDSS (Lawrence et al. 2007 ) footprint.We convert from UKIDSS/WFCAM9 Vega Y -band to LSST AB y -band,10 and calculate the colours ( u − g ), ( g − r ), ( r − i ), ( i − z), and ( z − y ) from redshift z = 0.00 − 5.00, in redshift bins of z = 0.05.We initially generate mean magnitudes in the g band, and normalize the other bands using these colour relations.
Examining the SF ∞ values for 9258 S82 quasars from MacLeod et al. ( 2010),11 we find that SF ∞ is larger in bluer bands.As SDSS does not provide a y filter, we calculated the SF ∞ and τ ratios for the S82 quasars in six bands (shown in Table 3 ), and fix the ratio of y band SF ∞ ( y )/SF ∞ ( u ) = 0.61, and τ ( y )/ τ ( u ) = 1.26.

LSST photometry and photometric error
Here, we note the expected photometric performance of LSST, and tie this to our synthetic model quasar light curves.Detailed LSST performance metrics are given in Ivezi ć et al. ( 2019 , section 3.2.1),and the expected photometric error in magnitudes for a single visit can be written as  Table 2. DHO parameters for underdamped and o v erdamped cases.These values restrain the range of the real parameters, but they are not applied with fixed increments to a v oid combinations that lead to unrealistic derived statistical parameters (calculated using where σ sys and σ rand are the systematic and random photometric error, respectively, x ≡ 10 0 . 4(m −m 5 ) and m 5 is the typical 5 σ depth of point source at zenith for each visit.Given the fact that the calibration system and procedure are set to maintain σ sys < 0.005 mag, we assume σ sys = 0.004 mag.γ is a band-dependent parameter.Following Table 2 in Ivezi ć et al. ( 2019), we set γ u = 0.038 and γ g , r , i , z, y = 0.039.LSST Operations Simulator (OpSim) provides the m 5 for each visit in a proposed cadence strategy.In this way, for one original simulated photometric value, a photometric error is randomly selected from a Gaussian distribution, with mean equal to 0 and variance equal to σ 2 LSST .

Model light-cur v e data products
Following the steps above, we generate a data set containing light curves generated from the DRW and DHO models.We simulate 200 objects from each of the DRW, DHO-o v erdamped and DHOunderdamped models, and for each object, light curves in six bands are generated, with one observation per day.The mean magnitude, SF ∞ , and τ follow the restrictions detailed in Section 2.2 .These light curves will be used in Section 4 for the SRNN analysis, and are also made available to the community. 12 (RHS).The black and grey dash lines depict τ decay and 5 τ decay , respectively.It is worth noting that when t is equal to 5 τ decay , the SF has been steadily close to SF ∞ .The blue dash line on the right plot represents τ blue , where the o v erdamped SF slopes just decrease (Moreno et al. 2019 ).Document).The baseline design elements for the WFD are: (i) co v er at least 18 000 deg 2 ; (ii) average 825 visits per field, in all filters, o v er 10 yr, and (iii) obtain same-night, same-field revisit 'pairs'.Within these bounds, several key characteristics that will define the main WFD surv e y remain to be determined, including: How the surv e y area is defined ( footprint , for short); how often each WFD field should be revisited ( cadence ) -both for intra-and inter-night visits; and what are the optimal filter distributions for the WFD fields ( filter ) and the optimal intra-night filter pairs for WFD revisits ( colours ).

LSST cadence strategies
The LSST is a complex survey with numerous science drivers.To this end, nearly 200 simulations of observing strate gy hav e been generated that look into ho w dif ferent observ ations will dri ve the science goals and various metrics (see e.g.Lochner et al. 2021 ).
The surv e y strate gies are summarized online13 and we use the v1.7 version.14There are 'families' of observing strategies, with members of each family having related traits, e.g. the visit time family are simulations examining the effect of the length of the individual visits, and the e.g.u long family are simulations bearing on the length of the u -band exposure time.
The LSST OpSim is designed to simulate LSST observing strate gies o v er the 10-yr surv e y, and different strate gies hav e been tested to consider different scientific requirements (LSST Science Collaboration et al. 2017 ).Since this work is concerned with the (i) baseline nexp2 v1.7 10yr , which we call baseline for short.This is the baseline WFD footprint, with the default observing beha viour ha ving visits of 2 × 15-s exposures.
(ii) u long ms 60 v1.7 10yr , which we call u -long for short.Observations in the u filter are taken as single snaps, and we test increasing u -band exposure times.The cadence we choose has 1 × 60 s, with the number of u -band visits left unchanged, resulting in a shift of visits from other filters to compensate for the increase in u -band observing time.With quasars being bright in the UV/optical, we are keen to see if additional u -band exposure improves the recovery of light-curve parameters.
(iii) filterdist indx4 v1.5 10yr , which we designate filterdist for short.The aim is to e v aluate the impact of changing the balance of visits between filters, where again we choose a ' uband heavy' cadence as our focus is on AGN.
(iv) cadence drive gl200 gcbv1.7 10yr , which we designate cadence drive for short.This investigates the impact of reducing the gaps between g -band visits o v er the month, essentially down-weighting the lunar cycle.This aims to a v oid long gaps in g -band co v erage with the goal to impro v e transient disco v ery and variable characterization for longer time-scale objects which require bluer filter co v erage (such as AGN and Supernovae).Our chosen cadence has 200 fill-in g -band visits each night in a contiguous area.
(v) rolling scale0.9nslice3 v1.7 10yr , which we designate rolling for short.A rolling cadence is where some parts of the sky receive a higher number of visits during an 'on' season, followed by a lower number of visits during an 'off' season.During the first 1.5 yr and the final 1.5 yr of the WFD surv e y, half the sk y is co v ered uniformly, allowing for better proper motion co v erage.This leaves 7.0 yr for 'rolling' observations, with the benefit that transient and variable phenomena are better observed, at the cost of each of the middle 7 yr will have no uniform survey coverage.Full details of the rolling cadences are given in Yoachim ( 2021 ).
We summarize the salient details of these strategies in Table 4 .

Light cur v e with fiv e cadence strategies
We selected and tested the LSST cadence simulations on the SciServer and used the Metrics Analysis Framework (Jones et al. 2014 ), to analyse the OpSim-simulated surv e ys.We chose several sky positions in WFD fields, and assume that each position corresponds to a quasar object.Given a set of DRW/DHO parameters, a mean magnitude, a chosen band, and an LSST cadence, we provide a QuasarMetric to return a realistic LSST-cadence light curve with MJDs, magnitude, and magnitude error.This metric can be used for an y WFD surv e y strate gy, so will be beneficial for future AGN lightcurve analysis.
Fig. 6 presents a DRW-simulated light curve as observed under five cadence strategies.In the next section, we will discuss the influence of different strategies on quasar modelling with GPR and ML.

S TO C H A S T I C R E C U R R E N T N E U R A L N E T W O R K S
In this section we give a very high-level theoretical outline of SRNNs.We describe the SRNN architecture for our investigations, and how we implement the SRNN in practice.We note Fabius & van Amersfoort ( 2015 ), Chung et al. ( 2016 ), Fraccaro et al. ( 2016 ), Schmidt & Hofmann ( 2018 ), and in particular the notation from Yin & Barucca ( 2021 ) -as given in Table 5 -as important influences in what follows.

SRNN high-level overview
Derived and inspired from Bayer & Osendorfer ( 2015 ), Fraccaro et al. ( 2016 ) propose the idea of propagating stochasticity in a latent state representation with RNNs.They stack an SSM on deterministic RNNs to achieve a stochastic and sequential generative model and a structured variational inference network, which produce the output sequences and provide the model's posterior distributions, respectively.
This algorithm is particularly suitable for CARMA modelling as there are stochastic features, and CARMA can be represented as a format of SSMs.As such, here we applied SRNN to ingest AGN light curves with different LSST cadences and bands, and output modelled light curves with denser observations.The implementation of our SRNN, with the generative model and the inference model that we use in our study, is outlined in Fig. 7 .

Generative model
The role of the generative model is to establish probabilistic relationships between the target variable y t , the intermediate variables of interest ( h t , z t ), and the input x t .Within the generative model, a key part of how the RNN becomes an SRNN is the SSM.Inside a 'classical' RNN, the evolution of the hidden states h is go v erned by f , a non-linear transition function: h t+ 1 = f ( h t , x t+ 1 ), where x is the input v ector.F or an SSM ho we ver, the hidden states are assumed to be random variables, z t .In our model, the SSM layer latent states are Gaussian distributions.The input of the next layer is randomly sampled from these distributions, thus providing the Downloaded from https://academic.oup.com/mnras/article/512/4/5580/6553834 by guest on 19 April 2022 Figure 6.Identical DRW-simulated light curve (SF ∞ = 0.17, τ = 100, and z = 0) at position RA = 0, Dec = -10, under fiv e different cadence strate gies.The gre y light curv e is the original one with dense observations.For each cadence strategy, u , g , and r bands' cadences are selected.For u long strategy with 60-s exposure time in u band, the photometric error is much smaller than others.Ho we ver, cadence dri ve bearing reducing g -band gaps, some of its observations have large noise.The rolling cadence with scale 0.9 shows apparent trend in seasons after around 1000 d -the number of observations in the prior season grows and downs in its next season.Compared with u long, filterdist for u -band heavy has more overall visits in u band.stochastic features.The key design of our model is to stack an SSM layer on the last GRU h layer (Fig. 7 , left-hand panel).
Given the fact that CARMA can be written as developing Gaussian distributions o v er time-steps, the SSM layer is expected to present similar functions for generating the output sequences.In this way, the model can learn both the long-term dependency within a sequence, as well as the stochastic features of the input sequence.Combining the non-linear gated mechanisms of the RNN with the stochastic transitions of the SSM creates a sequential generative model that is more e xpressiv e than the RNN and better capable of modelling long-term dynamics than the SSM.Fig. 7 a shows the architecture of the generative model.

Inference network
The second part of the SRNN is the inference network.Here, the prior distributions in the generative model learn from the posterior distributions by KL divergence (Kullback & Leibler 1951 ).While training, the generative model and inference network are both implemented, and learn from the backpropagation.Fig. 7 b outlines the inference model architecture.For each step, latent states a t are fed with the combinations of y 1: T and h 1: T , so it has the information of the future time-steps from target sequences as well as previous steps' information from the last h layer in the generative model.In order to let posterior distributions contain the information from the future steps, the RNN layer is reversed in the time dimension.

Loss function
Initially, we wish to maximize the marginal log-likelihood function L , where . Ho we ver, the random variable z t in the non-linear SSM cannot be integrated out analytically (see e.g.Kingma & Welling 2014 , section 2.1).Therefore, we instead aim to maximize the ELBO (also known as the variational lower bound) given as F , with respect to the generative model parameters θ and an inference model parameter, φ.Thus, the objective function of our SRNN is F ( θ, φ), given in Appendix B1 .In practice, minimizing the loss function is more intuitiv e and conv enient for implementation.Therefore, we present our loss function equation ( 7), which can be derived from ELBO.
This is formed from the ne gativ e log likelihood and KL divergence ( D KL ; Kullback & Leibler 1951 ) of the prior and posterior distributions of z .Q and P correspond to the approximate posteriors in the inference network and priors in the generative model, respectively.Furthermore, the magnitude errors are considered in the loss calculation.N lc is the number of light curves per loss calculation.N T means the length of each light curve.m target, t means the t -th observation of the target light curve, and m rec t means the t -th observation of the reconstructed light curve by SRNN.m error t represents the magnitude error in the t -th observation of the LSST-cadence light curve (also the input light curve), with 0 for vacant observations.

Parameter configuration
Our SRNN is built using the open-source software ML library TensorFlo w ( tf ), with K eras as the backend.We used Google Colab as the computing platform, and chose the GPU P100 nodes option.
For the generative model, we set one bidirectional GRU h t layer with 32 neurons, followed by two bidirectional GRU layers -for the μ z and log σ priors -with 32 neurons for each layer.All layers apply the default hyperbolic function tan h as the acti v ation function.We add a 'bidirectional' wrapper to each GRU layer as we found during testing it increased the accuracy of the light-curve reconstruction.For the inference network, similarly, there are three bidirectional GRU layers with 32 neurons, corresponding to a t , the posterior μ z , and log σ , respectively.The dimension of the inputs includes magnitudes y (and magnitude errors y err for DRW and DHO o v erdamped cases).The outputs are the dense light curves, with the observation length identical to the input sequences.
As the goal of applying SRNN is to model the whole light curves, including those dates where there are no observations, it is necessary to discuss how to deal with these vacant time-steps.We cannot impute these values using Autore gressiv e inte grated mo ving average (ARIMA), (e.g.Saputra et al. 2021 ) and Neural-Networkstechniques (e.g.Li et al. 2020 ;Shu et al. 2021 ), as the SRNN would not then need to learn how to fill the gaps.Our solution is: For each dimension, the value is first reduced by its mean values, and then zeros are added for those time-steps where the observations are vacant.In other words, we pre-processed those vacancies with the mean values.As RNNs do not allow 'NaN' values included in inputs, adding zeros can provide these vacancies with initial values.During training, SRNN is expected to learn the correlations between steps and predict the corresponding values for these vacancies.As such, adding zeros has less physical meaning, but simply satisfies the RNN input rules.Ho we ver, when the vacant period is too long, the predicted values will mo v e close to the mean values.15( p) , respectively.A sampling layer is used for randomly sampling a variable from each Gaussian distribution at each step.The output of the sampling layer is further combined with the output of the last h layer, and passed to the output dense layer, to generate output sequences.(b) Inference network.This is for providing the posterior Gaussian distributions.The target output sequences are combined with the out put of the last h layer (also in the generative model), and they are delivered to a reversed GR U layer , called a .In this way, the output of the a layer at each step contains the information both from its future steps (from the target) and its previous step (from h ).Furthermore, an SSM layer is stacked on the a layer, producing the posterior μ ( q ) and log σ 2 ( q) for each step, which is compared with the priors by calculating the KL divergence.During training, this divergence is considered and works with the ne gativ e log likelihood to minimize the whole loss function.
We chose the Adam Optimizer (a stochastic gradient descent method derived from adaptive moment estimation) with a learning rate r = 5 × 10 −3 , β 1 = 0.900, β 2 = 0.999, and = 1 × 10 −7 .The batch size is 128 per training epoch, and the maximal training epoch number is 300.tf.keras.callbacks.EarlyStopping is also applied with the patience set to 3 in order to stop training if the loss function is no longer decreasing.

Data set and training plan
The outputs from the SRNN are light curves.To see how the SRNN performs in 'ideal' circumstances, we input a dense and uniformly sampled light curve with 3650 points (a daily observation for 10 yr) to the SRNN, and target a similarly dense light curve as the output.
The training data involve 7200 light curves, and test data involve 1800 light curves.
To predict the light curve during seasons in the 10-yr LSSTsimulated light curves, and see how SRNN could help reco v er the CARMA DRW and DHO parameters, light curves based on the five considered LSST cadences are also fed as the input of the SRNN, and dense light curves will be the targets.The SRNN is expected to learn the trends between seasons.In this instance, the training data involve 14 400 light curves, and the test data are 3600 light curves.
We also want to predict future observations based on observed light curves, and again see how the SRNN could help reco v er DRW and DHO parameters for light curves with shorter duration.As such, we shortened the length of dense and LSST-cadenced light curves to 6-9 yr, and let SRNN to predict the next 1-4 yr's light curves.
For both the LSST light curves as well as the SRNN predicted daily light curves, we apply the GPR method (Section 2.2.3 ) to measure input layer is also tested, in order to mask all the missing values with zeros and ignore them by deacti v ating their passing neurons while training.This design dramatically slows down the training process, which is about 5 times longer on GPU cores, and the predictions are not as good as the architecture without masking.
CARMA parameters.We note, GPR is only one method to estimate the CARMA parameters of the light curves; it is not as accurate as Markov chain Monte Carlo (MCMC), but more efficient.

R E S U LT S
After generating the model light curves in Section 2 and simulating observations through the LSST cadences as discussed in Section 3 , we use the SRNN described in Section 4 to reconstruct the full light curves and calculate CARMA model parameters.In this section, we report the results.These examples illustrate that the longer the gaps in observations, the lower the accuracy of the model and the larger the MAE.From the plot it can be seen that the SRNN is able to predict the o v erall trend of each gap, which is similar to its real values.It also can simulate the stochastic characteristics of the sequence, by randomly sampling from Gaussian priors (the mean and standard shown in lower two panels) for each step.

LSST light-curve modelling
For LSST-cadence light curves, it is harder for the SRNN to learn the correlations between each time-step, as the cadences are often not uniform and have longer intervals.The SRNN is trained for The modelling of these two cases shows that SRNN is able to predict the gaps between seasons with stochastic characteristics in general, though it fails to predict accurately the specifics of the variability during the gaps.This is unsurprising, as the CARMA process at each step is indeed a (Gaussian) random variable.

Problem of 'filling the gap'
Here, we particularly discuss how SRNN modelling fills in the gaps between distant observations.As can be seen from Figs 8 -11 , SRNN can reconstruct the input observations when gaps are reasonably short compared to the time-scale of variability, but for large gaps, SRNN's general performance is weak.The following factors all affect the SRNN light-curve reconstruction: (i) Number of observations: Unsurprisingly, the number of provided observations is a major determinant how much information that SRNN can digest.(iii) Level of perturbations: high SF ∞ /short τ : Light curves with extremely high variability or very short time-scales are difficult for SRNN to model, since the limited number of observations and long gaps are not sufficient for SRNN to learn these features.
(iv) Quasi-periodicity: Fig 11 shows that SRNN is better at modelling light curves with quasi-periodic features.It can be seen that for these light curves with durations 3500 d and quite different cadences, SRNN is able to predict the observations well, and even only se veral observ ations can help SRNN to greatly reco v er the trends with high accuracy.
(v) Assumption of stationarity: All simulations and fittings in this paper are based on the stationary model CARMA, though the real AGN variability could be non-stationary (Tachibana et al. 2020 ).The main reason for our assumption of stationarity is that at this stage, not enough real and good-quality AGN light curves are provided for the training set (for both inputs and targets), while CARMA (especially DRW) has been a popular model for AGN variability study for years.It is the closest and simplest model that could help to achiev e AGN light-curv e simulations, though it does hav e dra wbacks and discrepancies compared with the real ones.From this perspective, SRNN should not be expected to recover the real short-term events happening in gaps, as CARMA light curves are not generated by deterministic physical processes.
To summarize: For the LSST cadences shown in Figs 9 -11 , long gaps exist between observations, and for the reasons abo v e, the SRNN model struggles to impute the behaviour during these gaps, especially for the non-periodic DRW and DHO-o v erdamped cases.This will turn out to be an important limitation when attempting to infer CARMA parameters from these light curves.

Parameter estimation analysis
The main moti v ation of our paper is to investigate whether using an ML algorithm on (synthetic) LSST quasar light-curve data would be able to detect and/or mitigate any biases in derived CARMA model parameters.Specifically, we have modelled the CARMA(1,0) DRW and CARMA(2,1) DHO processes, the latter in both the o v erdamped and underdamped cases.Here, we report the results of these investigations.

Metric for CARMA parameters
We design a metric, M err , for e v aluating ho w the CARMA parameters are reco v ered by SRNN-modelled daily light curves, compared with LSST-cadence light curves.This metric is used for the comparisons between combinations of different cadences and bands, and it can also be used as an ensemble metric for each cadence with all bands considered.
where N is the number of light curves used in e v aluating the metric.
M is the number of parameters for the rele v ant CARMA model.θ SRNN , θ LSST , and θ in are the parameters reco v ered (by GPR) on the SRNN reconstructed light curves, the parameters recovered from the LSST simulated light curves, and the input parameters, respectively.We calculated the absolute values of differences between θ SRNN and θ LSST , and then divide them with θ in in order to measure each parameter with the same scale.This metric can be extended to any LSST cadences and any bands.
Fig 12 shows our calculated metrics, for each of the five LSST cadences and six LSST bands, in the DRW, DHO-o v erdamped, and DHO-underdamped cases.We also show ensemble metrics for each cadence in each CARMA model.Before metric calculations, those objects with estimated τ decay longer than 10 4 d are regarded as outliers and remo v ed. 17n this plot, it can be seen that the DHO-underdamped case al w ays gains the lowest M err , as the SRNN algorithm can better simulate this kind of light curve, followed by the DRW case.The DHOo v erdamped case al w ays has the largest M err .Regardless of which CARMA model is used, the u -band light curves usually have the largest M err .For most bands, filterdist generally has the lowest (best) M err , which shows that the number of observations plays a key role in parameter estimation.
To make this more explicit, the lower right panel of Fig 12 shows ensemble metrics for each cadence.Here, three conditions are considered.The mean metric value from all LSST bands in a given cadence shows that the differences between baseline, u long and cadence driv e are tin y for the DHO-underdamped and DRW models.The filterdist cadence al w ays gains the lowest mean M err among all cadences, with rolling having the largest value.Such results indicate that filterdist might be the optimal cadence for SRNN modelling for this case, and rolling is the worst option due to the long gaps in co v erage.
Ho we ver, as the large M err in the u band, which is poorly sampled for most cadences, drives the mean metric to higher values, we also present a mean metric that considers only the five redder bands.In this case, cadence drive gains the lowest metric.If we consider single-band modelling, taking only the minimal value (for any band, but typically r ), u long, cadence drive, and filterdist all perform well, with the best cadence depending on the CARMA model used.

DRW parameters analysis
Finally, we investigate the derived parameters and possible biases in SRNN modelling of AGN, concentrating on the DRW case as it is the most popular model used to analyse AGN time series.Analysis for DHO cases will be conducted in future work.
Fig. 13 shows the trend in reco v ered time-scales compared to the model inputs, expressed as log ( ρ out ) and log ( ρ in ), where ρ x ≡ τ x /(10 yr) (the LSST surv e y length).F or small time-scales, the reco v ered τ out from the SRNN-modelled light curves are highly overestimated.This is because a shorter time-scale leads to more perturbations within a gi ven gap.SRNN sho ws worse performance in predicting the highly variable magnitudes in longer gaps and fewer observations, resulting in relatively flat predicted light curves and overestimated time-scales.For longer time-scales, when log ( ρ) ≥ −1, parameter reco v ery with the SRNN is somewhat better than using the LSST data alone, but the time-scales are underestimated in both cases.This is because an observing length longer than ∼8-10 times τ is required to provide sufficient information for unbiased τ estimation (Kozłowski 2017 ;Kozlowski 2021 ;Suberlak et al. 2021 ).Put simply, when τ is longer, e.g. 5 yr, and the observing length is 10 yr, there are not enough samples for calculating the magnitude differences at high t, resulting in underestimation of τ .
SRNN-modelled and LSST-cadence metrics almost o v erlap in r , i , z, and y bands.For u band, in all cadences, ρ out for the SRNN reconstructed light curves is al w ays longer than that for LSST cadence light curves.Examining the ranges of log ( ρ out ) for the six bands, the r , z, and y bands in all cadences allow the best performance: They are closer to the diagonal than other combinations.Especially for z band, some SRNN metrics at small time-scales are very similar to LSST ones.For a given band, the SRNN performance is similar across most cadences, but larger differences are seen in u band -in this case filterdist performs relatively better than the others.In general, the SF out distributions from the SRNN and LSST light curv es o v erlap in most cases e xcept for u band.SF out estimations from u -band SRNN light curves, with the baseline, u long, cadence drive, and rolling cadences, are lower than those from LSST light curves.This is because light curves with sparse observations make it harder for SRNN to learn their features, resulting in smooth predictions during observing gaps.When more observations are allocated (such as in the filterdist cadence), SF out gets closer to the diagonal.

SRNN modelling performance
Our SRNN model shows the ability to reconstruct realistic AGN light curves for the three CARMA models of interest.Ho we ver, we have also identified biases and limitations in this method.In particular, the relative inability to predict observations during long gaps means that modelling results are v ery sensitiv e to season length and variability time-scales, as illustrated in Figs 12 -14 .
The SRNN model clearly struggles to reco v er the damping timescale, predicting τ ∼ 1-2 yr regardless of τ in (Fig 13 ).The problem at short time-scales is due to the problem of gaps in cadence much longer than τ , with the result that the SRNN representation does not capture the true variability in these gaps (Section 5.1.3).At longer τ in , both the SRNN and LSST-cadenced data return underestimates.This is because of the 10-yr surv e y duration: F or τ 1 yr, the data do not have a long enough baseline to give sufficient samples of τ (Kozłowski 2017 ).
The model performs better in reco v ering the SF.As shown in Fig 14 , this can be reco v ered as accurately from the SRNN light curves as from the LSST light curves, in all bands except u , where the lower number of detections likely inhibits SRNN reconstruction.Interestingly, the reco v ered values of SF ∞ are systematically slightly below the input values for both SRNN and LSST light curves.

Comparisons with previous work
The range of ML techniques is vast and here we concentrate on five main artificial neural network types: the RNN, the RAE, the VRAE, the VRNN, and the SRNN.We also include multiband GPR here as the moti v ation of Hu & Tak ( 2020 ) is consistent with this paper.
Very high-level descriptions of these five architectures and astronomical applications are given in Table 6 .We also compare the moti v ations, input formats, architecture, and results of Tachibana et al. ( 2020Tachibana et al. ( ), S ánchez-S áez et al. ( 2021 ) ), Hu & Tak ( 2020 ), and this paper, shown in Table 7 .
Our study represents the first application of an SRNN to astronomical research.The unique design of the SRNN is to stack a SSM layer on the traditional RNN layer in order to learn and produce both the underlying features and corresponding stochastic fluctuations.SRNN is suitable for modelling light curves with MNRAS 512, 5580-5600 (2022) Its encoder learns a representation (encoding) for a set of data, typically for dimensional reduction (2D time sequences to 1D latent variables), by training the network to extract the inherent features from input sequences, and ignore insignificant or noisy data.Then, the representation, or so-called 1D latent variables are fed into a decoder, which decodes the features and generate output sequences.This architecture is design for sequence modelling and forecasting.

Implications in LSST
Our research is expected to help the LSST AGN collaboration to consider optimal cadences for studying AGN variability.To investigate this, we examined the results of our metric e v aluation (equation 8 ) for each LSST band under various classes of proposed LSST cadence (Fig 12 ).For existed cadence strategies, we conclude that filterdist is the best cadence for band-averaged case.As u band al w ays has the worst score due to the fewest total observations, we also only consider a metric with only g , r , i , z, and y bands, and find that cadence drive is fa v oured in this case.If we just look at the 'best' band case, u long is slightly better than baseline and cadence drive.Ho we v er, our e xperiments show that SRNN performs better for uniformly-seasonal light curves (see Fig 8 ) with denser detections, which is reasonable as with more information provided for the SRNN, it can learn the underlying features more easily.Therefore, it might perform better for the DDFs, where the cadences are more dense and uniform compared to the WFD surv e y.

Model selection and caveats
When LSST data emerges, CARMA models will be fitted to the irregular light curves as in Figs 8 -10 .The fitting procedure for CARMA is MLE (e.g.GPR), or a Bayesian variant (e.g.MCMC) if prior constraints on parameters are available.It is worth noting that in our research, we chose the GPR method as it is much less time-consuming than MCMC.As we have a sufficient amount of data for parameter estimations, the biases caused by GPR can be greatly relieved and the estimation distributions are of our interest.
In standard practice, the choice between AR models, such as DRW -CARMA(1,0) and DHO -CARMA(2,1), should be based on a penalized likelihood measure such as the Akaike Information Criterion (AIC; Akaike 1973 ).Best-fitting models are calculated for a range of ( p and q ) and, using the AIC to achieve model selections.The balance between o v ersimplicity v ersus o v ercomple xity of the models is determined by the data.
Mo ving a way from whether DRW or DHO models are preferred for an observed light curve, goodness-of-fit tests will be needed to show that the best-fitting model is adequate.The Anderson-Darling goodness-of-fit test of the cumulative observed versus model brightness is a reasonable tool (Stephens 1974 ) as well as more powerful residual diagnostics including the, e.g.Ljung-Box test (Ljung & Box 1978 ) for autocorrelation and augmented Dickey-Fuller test (Dickey & Fuller 1979 )  MNRAS 512, 5580-5600 (2022) 6.5 SRNN forecasting F orecasting future light-curv e behaviour for a given object is another goal of applying ML to AGN.We tried to use SRNN for forecasting, and tested its ability to predict one or several steps for one run and fold the outputs as the input for the next run, and to predict a length of future light curves for one run.Ho we ver, the results are not promising: Our SRNN can only predict light curves for about one month, with low accuracy.As time increases, the error gets bigger and light curves become increasingly flat.This suggests that the SRNN architecture may not be suitable for forecasting, due to its non-deterministic (random) nature.More experiments will be done in the future.

C O N C L U S I O N S
Cadence strategy plays an important role in light-curve modelling, especially for AGN with long time-scales and high variability features.This paper introduces an SRNN to model realistic LSSTcadence light curves simulated using CARMA models, and provides a quantitative metric to evaluate and compare the performance between five selected LSST cadence strategies.
In this study, we: (i) Investigated the popular CARMA models, which are often applied in AGN light-curve simulations.In addition to the usual DRW model, we also explored DHO models (underdamped and o v erdamped cases), which may be more applicable to some AGN.
(iii) Applied modified SRNNs to AGN light-curve modelling, to achieve stochastic modelling and prediction (iv) Provided a metric, M err , for estimating how well SRNN can help reco v er the input parameters with GPR, compared with pure LSST-cadence light curves.
(v) Concluded that filterdist, cadence drive, and u long strategies are the optimal when six bands, five bands, and the best band are considered, respectively.Ho we ver, as sho wn in e.g.Figs 8 and 13 , the long gaps inherent in all suggested WFD cadences make it extremely difficult for CARMA models, and the SRNN implementation, to reco v er accurate variability time-scales.Moreo v er, in real LSST data most AGN will be fainter than the SDSS sample used to construct our models, and hence photometric noise will introduce further uncertainties.This leads us to conclude that LSST WFD data may not be particularly well suited to AGN variability studies, at least with current methods.Progress may be made by further developing the ML methods, for example by combining our SRNN architecture with the multiband approach of Hu & Tak ( 2020 ), or by focussing on the higher cadenced data from the LSST DDFs for a smaller sample of AGN.The research we have presented here provides a method to quantify the performance of any potential cadence strategy, and we expect this will pro v e useful to the AGN community in preparing for LSST.
Table A1.Summary of the key DRW and DHO equations (Kelly et al. 2009(Kelly et al. , 2014 ; ;Kasliwal et al. 2017 ;Moreno et al. 2019 ).For the RHS of each differential equation, ( t ) corresponds to a white noise process with the mean equal to zero and variance ( σ2 Noise ) equal to one (Kelly et al. 2009 ).For DRW, there is only one root, r 1 , which is equal to −α 1 .For DHO, the r 1 and r 2 mean the two roots of characteristic equations for the LHS of each differential equation.These could be real or complex, corresponding to o v erdamped and underdamped cases, respectively.Further symbols are defined in Table A2 .It is worth noting that the top four sections define the statistical models while the bottom three sections are non-parametric transforms of the CARMA model.

Figure 1 .
Figure 1.The SF ∞ and τ distribution of 7384 quasars in g band selected from the SDSS S82 field, with spectroscopic redshift given by point colour.The dashed line is the regression line, and the four star markers represent four DRW parameter pairs, the associated light curves of which are shown in Fig. 2 .(MacLeod et al. 2010 ) shown in Fig.1.We take the values reported at a given SF ∞ -τ coordinate and add a 'scatter' in the range −0.05 < ι < 0.05 (as determined by a random number generator) to each ordinate separately.We generate 30 000 DRW light-curves parameter pairings in this manner.The four black stars in Fig.1are four DRW pairs, and we show their associated light curves, SFs and PSDs in Figs 2 and 3 .Given a fixed value for SF ∞ and a set rest-frame observation length, longer characteristic time-scales ( τ ) will lead to less fluctuation.When τ is short compared with the rest-frame observing duration, the variance of the light curve will tend towards σ 2 and the estimated parameters will approach the underlying values.

Figure 2 .
Figure 2. Four light curves simulated from the DRW parameters in the SDSS S82 .The mean magnitude is 22.0 mag.

Figure 3 .
Figure 3. Left-hand panel: The PSD for the four e xample DRW light curv es as giv en in Fig. 2 .The PSD is determined by the driving force [ β 0 d W ( t) in DRW equation].For DRW, there is only one AR coefficient β 0 , indicating that slope of PSD will be flat with f −2 after the break point.Right-hand panel: The SF describes the trend of standard variance of magnitude differences with t .The vertical dash lines correspond to τ −1 and τ in each graph, respectively.

Figure 4 .
Figure 4. Four light curves simulated by DHO.Upper two are underdamped and lower two are o v erdamped cases, followed with their IR.For underdamped IR, the black dash line means τ QPO and grey line means τ decay .In overdamped IR, the black line corresponds to τ rise , and grey τ decay .

Figure 5 .
Figure 5. PSD and SF for four DHO light curves with underdamped and overdamped cases.The left-hand side (LHS) and the right-hand side (RHS) of each subplot represent the PSD and SF for underdamped and o v erdamped DHO, respectiv ely.F or Fig. 5 a, ξ means the damping ratio.The v ertical dash lines represent 1/ τ QPO (LHS) and 1/ τ rise (RHS), respectiv ely.F or Fig. 5 b, ζ means the ratio of τ QPO / τ decay (LHS) and τ rise / τ decay(RHS).The black and grey dash lines depict τ decay and 5 τ decay , respectively.It is worth noting that when t is equal to 5 τ decay , the SF has been steadily close to SF ∞ .The blue dash line on the right plot represents τ blue , where the o v erdamped SF slopes just decrease(Moreno et al. 2019 ).

Figure 7 .
Figure 7. (a) The architecture of the generative model (left-hand panel).The inputs layer is connected with a hidden layer (called h ), which can be a GRU or LSTM layer.There could be multiple h layers, and in this diagram we only show one.An SSM layer is interlocked with the last h layer, represented by prior Gaussian distributions N prior in the diagram.Note that the SSM layer is implemented by two GRU layers in the programs, producing μ ( p ) and log σ 2( p) , respectively.A sampling layer is used for randomly sampling a variable from each Gaussian distribution at each step.The output of the sampling layer is further combined with the output of the last h layer, and passed to the output dense layer, to generate output sequences.(b) Inference network.This is for providing the posterior Gaussian distributions.The target output sequences are combined with the out put of the last h layer (also in the generative model), and they are delivered to a reversed GR U layer , called a .In this way, the output of the a layer at each step contains the information both from its future steps (from the target) and its previous step (from h ).Furthermore, an SSM layer is stacked on the a layer, producing the posterior μ ( q ) and log σ 2 ( q) for each step, which is compared with the priors by calculating the KL divergence.During training, this divergence is considered and works with the ne gativ e log likelihood to minimize the whole loss function.

Fig 8
Fig 8 shows results of SRNN modelling for one DRW light curve (SF ∞ = 0.1 mag and τ = 307 d) under different samplings.We test four example samplings: The light curve can be fully sampled, or have observation gaps of 30, 60, or 120 d in-between 30 d of observations.Mean Absolute Error (MAE) is presented for qualifying the influence of the increasing gap length on the whole light-curve modelling.These examples illustrate that the longer the gaps in observations, the lower the accuracy of the model and the larger the MAE.From the plot it can be seen that the SRNN is able to predict the o v erall trend of each gap, which is similar to its real values.It also can simulate the stochastic characteristics of the sequence, by randomly sampling from Gaussian priors (the mean and standard shown in lower two panels) for each step.

Figure 8 .
Figure 8.Multiple SRNN modelling for one DRW process (SF ∞ = 0.1 mag and τ = 307 d) with different uniformly designed cadences.In this experiment, the magnitude errors are set to zero for simplicity.the upper five rows correspond to input light curves with different cadences, which is labelled on the top right of each panel, respectively.The input light curves are shown in red, modelled light curves by SRNN are shown in blue, and the grey means the full light curves (10 yr's observation length; one observation per night).The Mean Absolute Error (MAE) between the model light curve and the target light curve is calculated for each case and displayed in the upper left corner.The last two ro ws sho w the mean and standard deviations of the Gaussian prior distributions used by one example neuron (of the many that contribute in each time-step) in the SSM layer, from which the random variable z t is sampled (Fig.7).The case shown is for the input light curve with 'season 30, gap 120' cadence.
Fig. 11 shows the reconstruction of a quasar with a DHO CARMA process in the underdamped case.The input DHO parameter values are SF ∞ = 0.236, τ QPO = 44.193d, τ decay = 239.414d, and redshift z = 1.372.This plot shows that SRNN is better at learning the

Figure 9 .
Figure 9.The reconstructions of a quasar with a DRW CARMA process, given in the r band with the five LSST cadences.The original DRW parameter values are SF ∞ = 0.218 mag, τ = 691.656d, and redshift z = 1.677.The input light curves are the observed light curves with time dilation considered, which are identical for all five cadences, presented with grey points.Then they are sampled with different cadences respectively, shown in black points with error bars.The reconstructed light curves (by SRNN) are shown with red points.The reco v ered reconstructed parameters (observ ed) are giv en in the top right of each panel.MAE represents the Mean Absolute Error, rec SF ∞ and rec τ correspond to the parameter estimation (by GPR) after reconstruction, which are shown on the top right of each panel.

Figure 10 .
Figure 10.The reconstructions of a quasar with a DHO CARMA process, in the o v erdamped case.Magnitudes are reported in the LSST r band with the LSST cadences from top to bottom panel.The input DHO parameter values are SF ∞ = 0.215, τ rise = 11.208d, τ decay = 1118.304d, and redshift z = 1.233.The reconstructed (observed) DHO parameters are given in the top right of each panel.periodicity characteristic of the DHO-underdamped process.With fe w observ ations, SRNN can predict the v acant observ ations well.

Figure 11 .
Figure 11.The reconstructions of a quasar with a DHO CARMA process, in the underdamped case.The input DHO parameter values are SF ∞ = 0.236, τ QPO = 44.193d, τ decay = 239.414d, and redshift z = 1.372The reconstructed (observed) DHO parameters are given in the top right of each panel.

Figure 12 .
Figure 12.CARMA metric under different LSST cadences.Upper left, upper right, and lower left subplots show the metrics with DRW, DHO-o v erdamped, and DHO-underdamped cases, respecti vely.The lo wer right subplot shows the ensemble metric for different CARMA cases and cadences.The solid lines with round markers correspond to the mean value with each cadence; the solid lines with 'x' marker (the labels with * ) correspond to the mean value without u band; the dashed lines correspond to the minimal value for any band (usually r ) within each cadence.
Fig 14 shows how well SF ∞ can be reco v ered from the SRNNmodelled and LSST light curves.It shows that SF ∞ in all bands and cadences are underestimated.When log (SF in ) increases, log (SF out ) mo v es further a way from the diagonal.The main reason is a positive correlation between SF ∞ and τ (see Fig 1 ): High SF ∞ often

Figure 13 .
Figure 13.τ estimation for DRW processes.ρ in and ρ out are defined as τ in (the input time-scale) and τ out (the reco v ered time-scale from the simulated SRNN or LSST data) divided by 10 yr (the LSST surv e y observation length).Each row corresponds to a different LSST cadence, and each column corresponds to a band.Results from the SRNN-modelled light curves are shown with colours, and results from LSST-cadence light curves are shown in grey.The shaded areas indicate the standard deviation around the median, in 50 bins in ρ in , following outlier rejection.corresponds to longer time-scales, and such light curves require more observation time (time dilation is also considered) to reach the plateau of the SF plot (Fig 3 ).Given the 10-yr LSST survey length, SF ∞ of many objects with long time-scales will be underestimated.In general, the SF out distributions from the SRNN and LSST light curv es o v erlap in most cases e xcept for u band.SF out estimations from u -band SRNN light curves, with the baseline, u long, cadence drive, and rolling cadences, are lower than those from LSST light curves.This is because light curves with sparse observations make it harder for SRNN to learn their features, resulting in smooth predictions during observing gaps.When more observations are allocated (such as in the filterdist cadence), SF out gets closer to the diagonal.

Figure 14 .
Figure 14.Same as Fig 13 but for SF ∞ .SF in and SF out are the real SF ∞ and estimated SF ∞ from SRNN or LSST light curv es, respectiv ely.The fitting lines with colours are for metrics of SRNN-modelled light curves, and the grey ones are for LSST light curves.Table 6.A very high-level description of the four architectures most discussed in this paper.Machine Learning (ML) network Description Recent astrophysics studies Recurrent Neural Network (RNN) for stationarity.Indeed, time-series analyses will likely mo v e a way from the SF (see Emmanoulopoulos, McHardy & Uttley 2010 ) and likely focus more on the ACF.Ultimately, model selection should allow astrophysical insight into the accretion process (e.g.presence or absence of a harmonic oscillator and non-stationary; Tachibana et al. 2020 ).

Table 1 .
Acronyms used in this paper.

Table A1
) -such simulated cases are remo v ed.

Table 3 .
The ratios are the mean values of (SF ∞ or τ in a given band) over mean (SF ∞ or τ in the u band) for the Stripe 82 quasar data.

Table 4 .
Brief o v erview of the five Legacy Survey of Space and Time (LSST) cadence strategies used in our study.Details are taken from the Jupyter Notebook nbviewer .jupyter.org/github/lsst-pst/survey str ategy .